10. Extending VectorETL
VectorETL is designed to be easily extensible. You can add new source types, embedding models, and target databases to suit your specific needs.
Creating Custom Source Modules
To add a new source:
Create a new file in the source_mods directory (e.g., my_custom_source.py).
Implement a class that inherits from BaseSource:
from .base import BaseSource

class MyCustomSource(BaseSource):
    def __init__(self, config):
        self.config = config

    def connect(self):
        # Implement connection logic
        raise NotImplementedError

    def fetch_data(self):
        # Implement data fetching logic
        # Return a pandas DataFrame
        raise NotImplementedError
Update source_mods/__init__.py to include your new source:
from .my_custom_source import MyCustomSource

def get_source_class(config):
    # ... existing code ...
    elif source_type == 'MyCustomSource':
        return MyCustomSource(config)
    # ... existing code ...
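
The steps above can be sketched end to end in one self-contained file. This is a minimal illustration, not VectorETL's actual code: the BaseSource below is a stand-in for the real base class, the InMemorySource name and the "source_type" and "rows" config keys are assumptions, and fetch_data returns a list of dicts where a real source would return a pandas DataFrame.

```python
from abc import ABC, abstractmethod


class BaseSource(ABC):
    """Stand-in for VectorETL's BaseSource (assumed interface)."""

    @abstractmethod
    def connect(self): ...

    @abstractmethod
    def fetch_data(self): ...


class InMemorySource(BaseSource):
    """Hypothetical source that serves rows listed directly in the config."""

    def __init__(self, config):
        self.config = config
        self.connected = False

    def connect(self):
        # Nothing to open here; a real source would create a client or handle.
        self.connected = True

    def fetch_data(self):
        if not self.connected:
            raise RuntimeError("call connect() before fetch_data()")
        # A real source would wrap these rows in a pandas DataFrame.
        return list(self.config.get("rows", []))


def get_source_class(config):
    # Assumption: the source type is read from config["source_type"].
    source_type = config.get("source_type")
    if source_type == "InMemorySource":
        return InMemorySource(config)
    raise ValueError(f"Unsupported source type: {source_type!r}")


config = {
    "source_type": "InMemorySource",
    "rows": [{"id": 1, "text": "hello"}],
}
source = get_source_class(config)
source.connect()
rows = source.fetch_data()
```

Raising on an unknown type, rather than returning None, makes a misspelled source name in the configuration fail at startup instead of later in the pipeline.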
Implementing New Embedding Models
To add a new embedding model:
Create a new file in the embedding_mods directory (e.g., my_custom_embedding.py).
Implement a class that inherits from BaseEmbedding:
from .base import BaseEmbedding

class MyCustomEmbedding(BaseEmbedding):
    def __init__(self, config):
        self.config = config

    def embed(self, df, embed_column='__concat_final'):
        # Implement embedding logic
        # Return DataFrame with new 'embeddings' column
        raise NotImplementedError
Update embedding_mods/__init__.py to include your new model:
from .my_custom_embedding import MyCustomEmbedding

def get_embedding_model(config):
    # ... existing code ...
    elif embedding_type == 'MyCustomEmbedding':
        return MyCustomEmbedding(config)
    # ... existing code ...
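
To make the embed contract concrete, here is a toy embedder that derives deterministic vectors from a SHA-256 hash. It is a sketch under stated assumptions, not a useful model: the BaseEmbedding class is a stand-in, HashingEmbedding and the "dimension" config key are hypothetical, and a plain dict of columns stands in for the DataFrame so the example runs without pandas.

```python
import hashlib


class BaseEmbedding:
    """Stand-in for VectorETL's BaseEmbedding (assumed interface)."""


class HashingEmbedding(BaseEmbedding):
    """Hypothetical embedder: deterministic vectors derived from SHA-256."""

    def __init__(self, config):
        self.config = config
        # Assumption: the vector size comes from config["dimension"].
        self.dimension = int(config.get("dimension", 8))
        if not 1 <= self.dimension <= 32:
            # A SHA-256 digest has 32 bytes, one per vector component.
            raise ValueError("dimension must be between 1 and 32")

    def _embed_text(self, text):
        digest = hashlib.sha256(str(text).encode("utf-8")).digest()
        # Scale each of the first `dimension` bytes (0-255) into [0.0, 1.0].
        return [byte / 255 for byte in digest[: self.dimension]]

    def embed(self, df, embed_column="__concat_final"):
        # Read the text column, then attach one vector per row.
        df["embeddings"] = [self._embed_text(text) for text in df[embed_column]]
        return df


model = HashingEmbedding({"dimension": 4})
table = model.embed({"__concat_final": ["first doc", "second doc"]})
```

A real implementation would replace _embed_text with a call to the embedding provider, ideally batching the texts in the column rather than embedding them one at a time.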
Adding New Vector Database Targets
To add a new vector database target:
Create a new file in the target_mods directory (e.g., my_custom_target.py).
Implement a class that inherits from BaseTarget:
from .base import BaseTarget

class MyCustomTarget(BaseTarget):
    def __init__(self, config):
        self.config = config

    def connect(self):
        # Implement connection logic
        raise NotImplementedError

    def create_index_if_not_exists(self, dimension):
        # Implement index creation logic
        raise NotImplementedError

    def write_data(self, df, columns, domain=None):
        # Implement data writing logic
        raise NotImplementedError
Update target_mods/__init__.py to include your new target:
from .my_custom_target import MyCustomTarget

def get_target_database(config):
    # ... existing code ...
    elif target_type == 'MyCustomTarget':
        return MyCustomTarget(config)
    # ... existing code ...
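
The three target methods can be illustrated with an in-memory store. Several details here are assumptions rather than documented behavior: BaseTarget is a stand-in, InMemoryTarget is hypothetical, columns is treated as the list of metadata fields to keep, domain as an optional tag stored with each record, and the call order connect, create_index_if_not_exists, write_data is a guess at how the pipeline drives a target. A list of row dicts stands in for the DataFrame (with pandas, df.to_dict("records") yields the same shape).

```python
class BaseTarget:
    """Stand-in for VectorETL's BaseTarget (assumed interface)."""


class InMemoryTarget(BaseTarget):
    """Hypothetical target that keeps vectors in a Python dict."""

    def __init__(self, config):
        self.config = config
        self.connected = False
        self.index = None      # created by create_index_if_not_exists
        self.dimension = None

    def connect(self):
        # A real target would open a client connection to the database here.
        self.connected = True

    def create_index_if_not_exists(self, dimension):
        if self.index is None:
            self.index = {}
            self.dimension = dimension
        elif self.dimension != dimension:
            raise ValueError(
                f"index has dimension {self.dimension}, got {dimension}"
            )

    def write_data(self, df, columns, domain=None):
        if self.index is None:
            raise RuntimeError("call create_index_if_not_exists() first")
        for row in df:
            vector = row["embeddings"]
            if len(vector) != self.dimension:
                raise ValueError(f"row {row['id']!r} has the wrong dimension")
            # Keep only the requested metadata fields alongside the vector.
            metadata = {name: row[name] for name in columns}
            if domain is not None:
                metadata["domain"] = domain
            self.index[row["id"]] = {"vector": vector, "metadata": metadata}


records = [
    {"id": "a", "text": "first doc", "embeddings": [0.1, 0.2]},
    {"id": "b", "text": "second doc", "embeddings": [0.3, 0.4]},
]
target = InMemoryTarget({})
target.connect()
target.create_index_if_not_exists(dimension=2)
target.write_data(records, columns=["text"], domain="docs")
```

Keying the store by id makes repeated writes of the same record overwrite rather than duplicate it, which is the behavior most vector databases expose as an upsert.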
Best Practices for Contributions
When extending VectorETL:
Follow the existing code style and structure.
Write clear docstrings and comments.
Include error handling and logging.
Write unit tests for your new components.
Update the documentation to reflect new features.
Consider submitting a pull request to contribute back to the main project.
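
The points on docstrings, error handling, logging, and unit tests can be combined in one small sketch. The MyCustomSource class, the "rows" config key, and the logger name are hypothetical and exist only to give the tests something to exercise; the project's own test layout may differ.

```python
import logging
import unittest

logger = logging.getLogger("my_custom_source")  # hypothetical logger name


class MyCustomSource:
    """Hypothetical source used only to illustrate the practices above."""

    def __init__(self, config):
        self.config = config

    def fetch_data(self):
        """Return the configured rows, or raise ValueError if none are set."""
        try:
            rows = self.config["rows"]
        except KeyError as exc:
            logger.error("Missing required config key: %s", exc)
            raise ValueError("config must contain 'rows'") from exc
        logger.info("Fetched %d rows", len(rows))
        return rows


class MyCustomSourceTest(unittest.TestCase):
    def test_fetch_returns_rows(self):
        source = MyCustomSource({"rows": [{"id": 1}]})
        self.assertEqual(source.fetch_data(), [{"id": 1}])

    def test_missing_rows_is_logged_and_raised(self):
        # The error must be both logged and surfaced to the caller.
        with self.assertLogs("my_custom_source", level="ERROR"):
            with self.assertRaises(ValueError):
                MyCustomSource({}).fetch_data()


suite = unittest.defaultTestLoader.loadTestsFromTestCase(MyCustomSourceTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Testing the failure path alongside the success path checks that a misconfigured component reports a clear error instead of failing silently.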