## 10. Extending VectorETL

VectorETL is designed to be easily extensible. You can add new source types, embedding models, and target databases to suit your specific needs.

### Creating Custom Source Modules

To add a new source:

1. Create a new file in the `source_mods` directory (e.g., `my_custom_source.py`).
2. Implement a class that inherits from `BaseSource`:

   ```python
   from .base import BaseSource


   class MyCustomSource(BaseSource):
       def __init__(self, config):
           self.config = config

       def connect(self):
           # Implement connection logic
           raise NotImplementedError

       def fetch_data(self):
           # Implement data fetching logic
           # Return a pandas DataFrame
           raise NotImplementedError
   ```

3. Update `source_mods/__init__.py` to include your new source:

   ```python
   from .my_custom_source import MyCustomSource


   def get_source_class(config):
       # ... existing code ...
       elif source_type == 'MyCustomSource':
           return MyCustomSource(config)
       # ... existing code ...
   ```

### Implementing New Embedding Models

To add a new embedding model:

1. Create a new file in the `embedding_mods` directory (e.g., `my_custom_embedding.py`).
2. Implement a class that inherits from `BaseEmbedding`:

   ```python
   from .base import BaseEmbedding


   class MyCustomEmbedding(BaseEmbedding):
       def __init__(self, config):
           self.config = config

       def embed(self, df, embed_column='__concat_final'):
           # Implement embedding logic
           # Return DataFrame with new 'embeddings' column
           raise NotImplementedError
   ```

3. Update `embedding_mods/__init__.py` to include your new model:

   ```python
   from .my_custom_embedding import MyCustomEmbedding


   def get_embedding_model(config):
       # ... existing code ...
       elif embedding_type == 'MyCustomEmbedding':
           return MyCustomEmbedding(config)
       # ... existing code ...
   ```

### Adding New Vector Database Targets

To add a new vector database target:

1. Create a new file in the `target_mods` directory (e.g., `my_custom_target.py`).
2. Implement a class that inherits from `BaseTarget`:

   ```python
   from .base import BaseTarget


   class MyCustomTarget(BaseTarget):
       def __init__(self, config):
           self.config = config

       def connect(self):
           # Implement connection logic
           raise NotImplementedError

       def create_index_if_not_exists(self, dimension):
           # Implement index creation logic
           raise NotImplementedError

       def write_data(self, df, columns, domain=None):
           # Implement data writing logic
           raise NotImplementedError
   ```

3. Update `target_mods/__init__.py` to include your new target:

   ```python
   from .my_custom_target import MyCustomTarget


   def get_target_database(config):
       # ... existing code ...
       elif target_type == 'MyCustomTarget':
           return MyCustomTarget(config)
       # ... existing code ...
   ```

### Best Practices for Contributions

When extending VectorETL:

1. Follow the existing code style and structure.
2. Write clear docstrings and comments.
3. Include error handling and logging.
4. Write unit tests for your new components.
5. Update the documentation to reflect new features.
6. Consider submitting a pull request to contribute back to the main project.
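
As a rough illustration of the error-handling, logging, and unit-testing practices, the sketch below wires a custom source into a factory function and tests it. It is self-contained and makes several assumptions: `BaseSource` here is a stand-in rather than VectorETL's real base class, the `source_type` and `rows` config keys are hypothetical, and `fetch_data` returns a list of dicts instead of a pandas DataFrame so the example has no third-party dependencies.

```python
import logging

logger = logging.getLogger("vector_etl.example")


class BaseSource:
    """Stand-in for VectorETL's BaseSource; the real base class may differ."""

    def __init__(self, config):
        self.config = config

    def connect(self):
        raise NotImplementedError

    def fetch_data(self):
        raise NotImplementedError


class MyCustomSource(BaseSource):
    """Hypothetical source that serves rows stored in its config."""

    def connect(self):
        # Fail early, with a clear message, if the config is incomplete.
        if "rows" not in self.config:
            raise ValueError("MyCustomSource config requires a 'rows' key")
        logger.info("MyCustomSource ready with %d rows", len(self.config["rows"]))

    def fetch_data(self):
        # A real source would return a pandas DataFrame; a list of dicts
        # keeps this sketch free of third-party dependencies.
        return list(self.config["rows"])


def get_source_class(config):
    """Return a source instance for config['source_type'] (assumed key name)."""
    source_type = config.get("source_type")
    if source_type == "MyCustomSource":
        return MyCustomSource(config)
    logger.error("Unknown source type: %r", source_type)
    raise ValueError(f"Unsupported source type: {source_type!r}")


def test_custom_source_round_trip():
    config = {"source_type": "MyCustomSource", "rows": [{"id": 1, "text": "hello"}]}
    source = get_source_class(config)
    source.connect()
    assert source.fetch_data() == [{"id": 1, "text": "hello"}]


def test_unknown_source_type_is_rejected():
    try:
        get_source_class({"source_type": "Nope"})
    except ValueError as exc:
        assert "Nope" in str(exc)
    else:
        raise AssertionError("expected ValueError for an unknown source type")
```

The two `test_*` functions follow the pytest naming convention, so a test runner such as pytest should be able to collect them as written. The same pattern (a stand-in base class, a small fake, and a test for both the happy path and the failure path) could be applied to embedding and target modules.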