10. Extending VectorETL

VectorETL is designed to be easily extensible. You can add new source types, embedding models, and target databases to suit your specific needs.

Creating Custom Source Modules

To add a new source:

  1. Create a new file in the source_mods directory (e.g., my_custom_source.py).

  2. Implement a class that inherits from BaseSource:

from .base import BaseSource

class MyCustomSource(BaseSource):
    def __init__(self, config):
        self.config = config

    def connect(self):
        # Implement connection logic
        raise NotImplementedError

    def fetch_data(self):
        # Implement data fetching logic
        # Return a pandas DataFrame
        raise NotImplementedError

  3. Update source_mods/__init__.py to include your new source:

from .my_custom_source import MyCustomSource

def get_source_class(config):
    # ... existing code ...
    elif source_type == 'MyCustomSource':
        return MyCustomSource(config)
    # ... existing code ...
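
The registration step can be sketched in isolation as follows. This is a minimal illustration, not the project's actual factory: the BaseSource class here is a stand-in for the one in source_mods/base.py, and reading the type from a 'source_type' config key is an assumption, since the existing lookup code is elided above.

```python
class BaseSource:
    """Stand-in for source_mods.base.BaseSource (the real class is not shown here)."""


class MyCustomSource(BaseSource):
    def __init__(self, config):
        self.config = config

    def connect(self):
        # Placeholder: a real source would open a connection here.
        raise NotImplementedError

    def fetch_data(self):
        # Placeholder: a real source would return a pandas DataFrame here.
        raise NotImplementedError


def get_source_class(config):
    # Assumption: the source type is stored under the 'source_type' key.
    source_type = config.get("source_type")
    if source_type == "MyCustomSource":
        return MyCustomSource(config)
    # Fail loudly on unknown types instead of silently returning None.
    raise ValueError(f"Unsupported source type: {source_type!r}")


source = get_source_class({"source_type": "MyCustomSource"})
print(type(source).__name__)
```

Raising on an unrecognized type is a design choice in this sketch; whether the real factory does the same is not stated here.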

Implementing New Embedding Models

To add a new embedding model:

  1. Create a new file in the embedding_mods directory (e.g., my_custom_embedding.py).

  2. Implement a class that inherits from BaseEmbedding:

from .base import BaseEmbedding

class MyCustomEmbedding(BaseEmbedding):
    def __init__(self, config):
        self.config = config

    def embed(self, df, embed_column='__concat_final'):
        # Implement embedding logic
        # Return DataFrame with new 'embeddings' column
        raise NotImplementedError

  3. Update embedding_mods/__init__.py to include your new model:

from .my_custom_embedding import MyCustomEmbedding

def get_embedding_model(config):
    # ... existing code ...
    elif embedding_type == 'MyCustomEmbedding':
        return MyCustomEmbedding(config)
    # ... existing code ...
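
As a rough illustration of the embed() contract, the sketch below fills the 'embeddings' column with a deterministic hash-based vector per row. It is a placeholder, not a real embedding model: the vectors carry no semantic meaning. The BaseEmbedding class is a stand-in for the one in embedding_mods/base.py, and the 'dimension' config key is hypothetical.

```python
import hashlib


class BaseEmbedding:
    """Stand-in for embedding_mods.base.BaseEmbedding (the real class is not shown here)."""


class HashEmbedding(BaseEmbedding):
    def __init__(self, config):
        self.config = config
        # 'dimension' is a hypothetical config key used only in this sketch.
        self.dimension = int(config.get("dimension", 8))
        if not 1 <= self.dimension <= 32:
            # A SHA-256 digest has 32 bytes, so that is the upper bound here.
            raise ValueError("dimension must be between 1 and 32")

    def _embed_text(self, text):
        # Hash the text and scale the first `dimension` bytes into [0, 1].
        digest = hashlib.sha256(str(text).encode("utf-8")).digest()
        return [byte / 255.0 for byte in digest[: self.dimension]]

    def embed(self, df, embed_column="__concat_final"):
        # Build one vector per row, then store them in the 'embeddings' column.
        df["embeddings"] = [self._embed_text(text) for text in df[embed_column]]
        return df
```

The class relies only on column indexing and assignment, so it should work on a pandas DataFrame whose embed_column holds the text to embed. A real model would replace _embed_text with a call to its own encoder.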

Adding New Vector Database Targets

To add a new vector database target:

  1. Create a new file in the target_mods directory (e.g., my_custom_target.py).

  2. Implement a class that inherits from BaseTarget:

from .base import BaseTarget

class MyCustomTarget(BaseTarget):
    def __init__(self, config):
        self.config = config

    def connect(self):
        # Implement connection logic
        raise NotImplementedError

    def create_index_if_not_exists(self, dimension):
        # Implement index creation logic
        raise NotImplementedError

    def write_data(self, df, columns, domain=None):
        # Implement data writing logic
        raise NotImplementedError

  3. Update target_mods/__init__.py to include your new target:

from .my_custom_target import MyCustomTarget

def get_target_database(config):
    # ... existing code ...
    elif target_type == 'MyCustomTarget':
        return MyCustomTarget(config)
    # ... existing code ...
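
To make the three target methods concrete, here is a minimal in-memory sketch. Several details are assumptions rather than documented behavior: BaseTarget is a stand-in for the class in target_mods/base.py, 'index_name' is a hypothetical config key, columns is treated as the list of metadata columns to store next to each vector, and domain is treated as an optional label attached to every record.

```python
class BaseTarget:
    """Stand-in for target_mods.base.BaseTarget (the real class is not shown here)."""


class InMemoryTarget(BaseTarget):
    def __init__(self, config):
        self.config = config
        self.indexes = None

    def connect(self):
        # A real target would create a database client here.
        self.indexes = {}

    def _index_name(self):
        # 'index_name' is a hypothetical config key used only in this sketch.
        return self.config.get("index_name", "default")

    def create_index_if_not_exists(self, dimension):
        # Idempotent: a second call returns the existing index unchanged.
        name = self._index_name()
        if name not in self.indexes:
            self.indexes[name] = {"dimension": dimension, "records": []}
        return self.indexes[name]

    def write_data(self, df, columns, domain=None):
        index = self.indexes[self._index_name()]
        # Walk the embeddings and the chosen metadata columns row by row.
        rows = zip(df["embeddings"], *(df[column] for column in columns))
        for vector, *values in rows:
            if len(vector) != index["dimension"]:
                raise ValueError("vector length does not match index dimension")
            index["records"].append({
                "vector": list(vector),
                "metadata": dict(zip(columns, values)),
                "domain": domain,
            })
```

A caller would run connect(), then create_index_if_not_exists(dimension), then write_data(...). Checking the vector length before writing surfaces a mismatch between the embedding model and the index early.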

Best Practices for Contributions

When extending VectorETL:

  1. Follow the existing code style and structure.

  2. Write clear docstrings and comments.

  3. Include error handling and logging.

  4. Write unit tests for your new components.

  5. Update the documentation to reflect new features.

  6. Consider submitting a pull request to contribute back to the main project.
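
Points 3 and 4 can be sketched together. The example below is illustrative only: FileBackedSource is a hypothetical component that does not inherit from the real BaseSource, the 'path' config key and the logger name are assumptions, and the tests use only the standard unittest module.

```python
import logging
import unittest

# Hypothetical logger name; use whatever naming the project already follows.
logger = logging.getLogger("vector_etl.custom_source")


class FileBackedSource:
    """Hypothetical source that opens a local text file."""

    def __init__(self, config):
        self.config = config
        self.handle = None

    def connect(self):
        path = self.config.get("path")
        if not path:
            # Validate configuration before touching the file system.
            raise ValueError("config must include a 'path' entry")
        try:
            self.handle = open(path, encoding="utf-8")
        except OSError:
            # Log with traceback, then re-raise so the caller can react.
            logger.exception("Could not open source file %s", path)
            raise


class FileBackedSourceTest(unittest.TestCase):
    def test_missing_path_is_rejected(self):
        with self.assertRaises(ValueError):
            FileBackedSource({}).connect()

    def test_unreadable_file_is_logged_and_reraised(self):
        source = FileBackedSource({"path": "/nonexistent/input.txt"})
        with self.assertLogs(logger, level="ERROR"):
            with self.assertRaises(OSError):
                source.connect()
```

Saved in a test module, these cases can be run with python -m unittest. Logging and re-raising, rather than swallowing the error, keeps failures visible to the pipeline that called the component.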