.. VectorETL documentation master file, created by
   sphinx-quickstart on Wed Aug  7 13:04:31 2024.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

VectorETL documentation
=======================


VectorETL by `Context Data <https://contextdata.ai>`_ is a modular Python framework for converting data from diverse sources into vector embeddings and storing them in vector databases. It supports multiple data sources (databases, cloud storage, and local files), several embedding models (including OpenAI, Cohere, and Google Gemini), and several vector database targets (such as Pinecone, Qdrant, and Weaviate).

VectorETL aims to simplify building and managing vector search systems, so that developers and data scientists can build and scale applications that rely on semantic search, recommendations, or other vector-based operations.


.. image:: https://contextdata.nyc3.digitaloceanspaces.com/rs/images/ContextDataDark.png
  :width: 300
  :alt: Context Data Logo
  :align: center
  :target: https://contextdata.ai/

.. image:: https://github.com/ContextData/VectorETL/raw/main/docs/assets/vector-etl-flow.png
  :width: 800
  :alt: Process Flow
  :align: center

**Features**

1. Modular architecture with support for multiple data sources, embedding models, and vector databases
2. Batch processing for efficient handling of large datasets
3. Configurable chunk size and overlap for text data
4. Easy integration of new data sources, embedding models, and vector databases
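
To make the chunking and batching features concrete, here is a minimal, generic sketch in plain Python. It is not VectorETL's own API: the function names ``chunk_text`` and ``batched`` and their parameters are hypothetical, chosen only to illustrate how overlapping chunks and fixed-size batches are typically produced before text is sent to an embedding model.

.. code-block:: python

   from typing import Iterator, List


   def chunk_text(text: str, chunk_size: int, overlap: int) -> List[str]:
       """Split text into chunks of at most chunk_size characters.

       Each chunk after the first starts `overlap` characters before the
       end of the previous one. Hypothetical helper, not part of VectorETL.
       """
       if chunk_size <= 0:
           raise ValueError("chunk_size must be positive")
       if not 0 <= overlap < chunk_size:
           raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

       step = chunk_size - overlap  # how far the window advances each time
       chunks: List[str] = []
       for start in range(0, len(text), step):
           chunks.append(text[start:start + chunk_size])
           # Stop once the current window has reached the end of the text.
           if start + chunk_size >= len(text):
               break
       return chunks


   def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
       """Yield consecutive slices of at most batch_size items.

       Hypothetical helper, not part of VectorETL.
       """
       if batch_size <= 0:
           raise ValueError("batch_size must be positive")
       for i in range(0, len(items), batch_size):
           yield items[i:i + batch_size]


   chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
   batches = list(batched(chunks, batch_size=2))

In this sketch a larger overlap preserves more context across chunk boundaries at the cost of producing more chunks, and therefore more embedding calls; batching keeps the number of items sent per call bounded.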


**Information**

* `GitHub <https://github.com/ContextData/VectorETL>`_
* `Context Data Website <https://contextdata.ai>`_


.. toctree::
   :hidden:
   :maxdepth: 2
   :caption: Contents:

   introduction
   getting_started
   core_concepts
   configuration
   data_sources
   embedding_models
   vector_dbs
   examples
   api_reference
   extending
   troubleshooting