## 9. API Reference

VectorETL exposes a set of core classes and functions for building and customizing ETL pipelines. This section gives an overview of the main components of the API.

### Source Classes

All source classes inherit from the `BaseSource` class:

```python
from abc import ABC, abstractmethod


class BaseSource(ABC):
    @abstractmethod
    def connect(self):
        pass

    @abstractmethod
    def fetch_data(self):
        pass
```

Key source classes include:

- `S3Source`: For Amazon S3 buckets
- `DatabaseSource`: For SQL databases
- `LocalFileSource`: For local file systems
- `DropboxSource`: For Dropbox files
- `GoogleDriveSource`: For Google Drive files

### Embedding Classes

Embedding classes inherit from the `BaseEmbedding` class:

```python
from abc import ABC, abstractmethod


class BaseEmbedding(ABC):
    @abstractmethod
    def embed(self, df, embed_column='__concat_final'):
        pass
```

Key embedding classes include:

- `OpenAIEmbedding`: For OpenAI's embedding models
- `CohereEmbedding`: For Cohere's embedding models
- `GoogleGeminiEmbedding`: For Google's Gemini models
- `AzureOpenAIEmbedding`: For Azure OpenAI Service
- `HuggingFaceEmbedding`: For Hugging Face models

### Target Classes

Target classes inherit from the `BaseTarget` class:

```python
from abc import ABC, abstractmethod


class BaseTarget(ABC):
    @abstractmethod
    def connect(self):
        pass

    @abstractmethod
    def create_index_if_not_exists(self, dimension):
        pass

    @abstractmethod
    def write_data(self, df, columns, domain=None):
        pass
```

Key target classes include:

- `PineconeTarget`: For Pinecone vector database
- `QdrantTarget`: For Qdrant vector database
- `WeaviateTarget`: For Weaviate vector database
- `SingleStoreTarget`: For SingleStore database
- `SupabaseTarget`: For Supabase vector storage

### Utility Functions

VectorETL includes several utility functions that select the right component from a configuration:

- `get_source_class(config)`: Returns the appropriate source class based on configuration
- `get_embedding_model(config)`: Returns the appropriate embedding class based on configuration
- `get_target_database(config)`: Returns the appropriate target class based on configuration

### Orchestrator

The `ETLOrchestrator` class coordinates the entire ETL process. Its interface is shown below with the method bodies omitted:

```python
class ETLOrchestrator:
    def __init__(self, source_config, embedding_config, target_config, embed_columns):
        """Initialize components."""

    def run(self):
        """Run the ETL process."""

    def fetch_data(self):
        """Fetch data from source."""

    def process_and_embed_data(self, df):
        """Process and embed data."""

    def write_to_target(self, df):
        """Write data to target database."""
```
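
To make the three interfaces above more concrete, here is a minimal sketch of how custom subclasses might plug together. It is not VectorETL's own code. The base classes are re-declared locally so the example runs without the library installed, and `InMemorySource`, `LengthEmbedding`, `ListTarget`, and `run_pipeline` are hypothetical names invented for illustration. A plain list of dicts stands in for the `df` argument, and the `embeddings` key and the way `__concat_final` is built by joining the embed columns are assumptions, not documented behavior.

```python
from abc import ABC, abstractmethod


# Local copies of the interfaces documented above (not imported from VectorETL).
class BaseSource(ABC):
    @abstractmethod
    def connect(self): ...

    @abstractmethod
    def fetch_data(self): ...


class BaseEmbedding(ABC):
    @abstractmethod
    def embed(self, df, embed_column="__concat_final"): ...


class BaseTarget(ABC):
    @abstractmethod
    def connect(self): ...

    @abstractmethod
    def create_index_if_not_exists(self, dimension): ...

    @abstractmethod
    def write_data(self, df, columns, domain=None): ...


class InMemorySource(BaseSource):
    """Hypothetical source that serves rows already held in memory."""

    def __init__(self, rows):
        self.rows = rows
        self.connected = False

    def connect(self):
        self.connected = True

    def fetch_data(self):
        # Return copies so later steps do not mutate the caller's data.
        return [dict(row) for row in self.rows]


class LengthEmbedding(BaseEmbedding):
    """Hypothetical toy embedding: [character count, word count]."""

    dimension = 2

    def embed(self, df, embed_column="__concat_final"):
        for row in df:
            text = row[embed_column]
            row["embeddings"] = [float(len(text)), float(len(text.split()))]
        return df


class ListTarget(BaseTarget):
    """Hypothetical target that keeps records in a Python list."""

    def __init__(self):
        self.index_dimension = None
        self.records = []

    def connect(self):
        pass  # nothing to connect to in this sketch

    def create_index_if_not_exists(self, dimension):
        if self.index_dimension is None:
            self.index_dimension = dimension

    def write_data(self, df, columns, domain=None):
        for row in df:
            self.records.append({
                "vector": row["embeddings"],
                "metadata": {c: row[c] for c in columns},
                "domain": domain,
            })


def run_pipeline(source, embedding, target, embed_columns):
    """Hypothetical stand-in for the fetch, embed, write sequence."""
    source.connect()
    rows = source.fetch_data()
    for row in rows:
        # Assumption: the text to embed is the embed columns joined by spaces.
        row["__concat_final"] = " ".join(str(row[c]) for c in embed_columns)
    rows = embedding.embed(rows)
    target.connect()
    target.create_index_if_not_exists(embedding.dimension)
    target.write_data(rows, columns=embed_columns)
    return target.records


source = InMemorySource([{"title": "Hello", "body": "vector world"}])
target = ListTarget()
records = run_pipeline(source, LengthEmbedding(), target, ["title", "body"])
print(records)
```

Because each stage depends only on its abstract interface, any one of the three toy classes could be swapped for another implementation without changing `run_pipeline`, which is the same separation the source, embedding, and target base classes suggest.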