## 6. Embedding Models

Embedding models are crucial in transforming your raw data into vector representations. VectorETL supports several popular embedding models, allowing you to choose the one that best fits your needs.

### OpenAI

**Python variable (as JSON)**

```json
{
  "embedding_model": "OpenAI",
  "api_key": "your-openai-api-key",
  "model_name": "text-embedding-ada-002"
}
```

**YAML**

```yaml
embedding:
  embedding_model: "OpenAI"
  api_key: "your-openai-api-key"
  model_name: "text-embedding-ada-002"
```

### Cohere

**Python variable (as JSON)**

```json
{
  "embedding_model": "Cohere",
  "api_key": "your-cohere-api-key",
  "model_name": "embed-english-v2.0"
}
```

**YAML**

```yaml
embedding:
  embedding_model: "Cohere"
  api_key: "your-cohere-api-key"
  model_name: "embed-english-v2.0"
```

### Google Gemini

**Python variable (as JSON)**

```json
{
  "embedding_model": "Google Gemini",
  "api_key": "your-gemini-api-key",
  "model_name": "embedding-001"
}
```

**YAML**

```yaml
embedding:
  embedding_model: "Google Gemini"
  api_key: "your-gemini-api-key"
  model_name: "embedding-001"
```

### Azure OpenAI

**Python variable (as JSON)**

```json
{
  "embedding_model": "Azure OpenAI",
  "api_key": "your-azure-openai-api-key",
  "endpoint": "your-azure-openai-endpoint",
  "version": "2022-12-01",
  "model_name": "text-embedding-ada-002",
  "private_deployment": "Yes",
  "deployment_name": "your-deployment-name"
}
```

**YAML**

```yaml
embedding:
  embedding_model: "Azure OpenAI"
  api_key: "your-azure-openai-api-key"
  endpoint: "your-azure-openai-endpoint"
  version: "2022-12-01"  # API version
  model_name: "text-embedding-ada-002"
  private_deployment: "Yes"  # or "No"
  deployment_name: "your-deployment-name"  # if private_deployment is "Yes"
```

### Hugging Face

**Python variable (as JSON)**

```json
{
  "embedding_model": "Hugging Face",
  "api_key": "your-huggingface-api-key",
  "model_name": "sentence-transformers/all-MiniLM-L6-v2"
}
```

**YAML**

```yaml
embedding:
  embedding_model: "Hugging Face"
  api_key: "your-huggingface-api-key"
  model_name: "sentence-transformers/all-MiniLM-L6-v2"
```

### Choosing the Right Embedding Model

When selecting an embedding model, consider the following factors:

- Language support: Ensure the model supports the languages in your data.
- Embedding dimension: Different models produce embeddings of different sizes.
- Licensing and cost: Be aware of usage limits and pricing for API-based models.
- Performance: Consider the trade-off between embedding quality and computation time/cost.

### Adding Custom Embedding Models

To add a custom embedding model:

1. Create a new file in the `embedding_mods` directory.
2. Implement a new class that inherits from `BaseEmbedding`.
3. Implement the required `embed()` method.
4. Update the `get_embedding_model()` function in `embedding_mods/__init__.py` to include your new model.

Example of a custom embedding class:

```python
from .base import BaseEmbedding


class MyCustomEmbedding(BaseEmbedding):
    def __init__(self, config):
        self.config = config
        # Initialize your model here

    def embed(self, df, embed_column='__concat_final'):
        # Implement embedding logic here
        # Return the dataframe with a new 'embeddings' column
        return df
```
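
Step 4 above asks you to register the new class in `get_embedding_model()`, but the body of that function is not shown here. The following is a minimal sketch of one way such a lookup could work, assuming it dispatches on the `embedding_model` value from the configuration. The `EMBEDDING_MODELS` dictionary, the `"My Custom Model"` key, and the stand-in `BaseEmbedding` class are all hypothetical and exist only so the example runs on its own; check the actual `embedding_mods/__init__.py` before copying this pattern.

```python
class BaseEmbedding:
    """Stand-in for VectorETL's BaseEmbedding so this sketch is self-contained."""

    def __init__(self, config):
        self.config = config

    def embed(self, df, embed_column="__concat_final"):
        raise NotImplementedError


class MyCustomEmbedding(BaseEmbedding):
    def embed(self, df, embed_column="__concat_final"):
        # Placeholder: real logic would add an 'embeddings' column to df.
        return df


# Hypothetical registry: maps the `embedding_model` config value to a class.
EMBEDDING_MODELS = {
    "My Custom Model": MyCustomEmbedding,
}


def get_embedding_model(config):
    """Return an embedding instance for the model named in the config."""
    name = config["embedding_model"]
    try:
        model_cls = EMBEDDING_MODELS[name]
    except KeyError:
        supported = ", ".join(sorted(EMBEDDING_MODELS))
        raise ValueError(
            f"Unsupported embedding model {name!r}; expected one of: {supported}"
        ) from None
    return model_cls(config)


# Usage: the config mirrors the JSON/YAML shape used throughout this section.
config = {"embedding_model": "My Custom Model", "model_name": "my-model"}
model = get_embedding_model(config)
print(type(model).__name__)
```

With a lookup of this shape, adding a model means adding one dictionary entry, and a misspelled `embedding_model` value fails early with a message listing the supported names.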