VectorETL documentation

VectorETL by Context Data is a modular Python framework for converting data from diverse sources into vector embeddings and storing those embeddings in a vector database. It supports multiple data sources (databases, cloud storage, and local files), several embedding models (including OpenAI, Cohere, and Google Gemini), and several vector database targets (such as Pinecone, Qdrant, and Weaviate).

The pipeline simplifies the creation and management of vector search systems, so developers and data scientists can build and scale applications that rely on semantic search, recommendations, or other vector-based operations.

Features

Modular architecture with support for multiple data sources, embedding models, and vector databases
Batch processing for efficient handling of large datasets
Configurable chunk size and chunk overlap for text data
Easy integration of new data sources, embedding models, and vector databases
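
To make the chunking and batching features above concrete, here is a minimal, library-independent sketch in plain Python. It is not VectorETL's implementation or API: the function names (chunk_text, batched) and parameter names (chunk_size, chunk_overlap, batch_size) are hypothetical and chosen only for illustration, and the sketch assumes character-based windows, whereas a real pipeline may split on tokens or sentences instead.

```python
from typing import Iterator, List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into character windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    Hypothetical helper for illustration; not part of VectorETL.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no redundant tail chunk is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of items, each holding at most batch_size entries.

    Hypothetical helper for illustration; not part of VectorETL.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```

In a pipeline of this shape, each batch produced by batched(chunk_text(document, 1000, 200), 100) would be sent to the embedding model in a single request and then written to the vector database, which keeps the number of network calls low for large datasets. The overlap preserves some context across chunk boundaries at the cost of embedding a portion of the text twice.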

Information

GitHub

Context Data Website