.. VectorETL documentation master file, created by
   sphinx-quickstart on Wed Aug  7 13:04:31 2024.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

VectorETL documentation
=======================

VectorETL by Context Data is a flexible and modular Python framework designed
to streamline the process of converting diverse data sources into vector
embeddings and storing them in various vector databases. It supports multiple
data sources (databases, cloud storage, and local files), various embedding
models (including OpenAI, Cohere, and Google Gemini), and several vector
database targets (like Pinecone, Qdrant, and Weaviate).

This pipeline aims to simplify the creation and management of vector search
systems, enabling developers and data scientists to easily build and scale
applications that require semantic search, recommendation systems, or other
vector-based operations.

.. image:: https://contextdata.nyc3.digitaloceanspaces.com/rs/images/ContextDataDark.png
   :width: 300
   :alt: Context Data Logo
   :align: center
   :target: https://contextdata.ai/

.. image:: https://github.com/ContextData/VectorETL/raw/main/docs/assets/vector-etl-flow.png
   :width: 800
   :alt: Process Flow
   :align: center

**Features**

1. Modular architecture with support for multiple data sources, embedding
   models, and vector databases
2. Batch processing for efficient handling of large datasets
3. Configurable chunking and overlapping for text data
4. Easy integration of new data sources, embedding models, and vector
   databases

**Information**

* GitHub
* Context Data Website

.. toctree::
   :hidden:
   :maxdepth: 2
   :caption: Contents:

   introduction
   getting_started
   core_concepts
   configuration
   data_sources
   embedding_models
   vector_dbs
   examples
   api_reference
   extending
   troubleshooting