## 7. Vector Databases (Targets)

Vector databases are specialized databases designed to store and efficiently query vector embeddings. VectorETL supports several popular vector databases as targets for your embedded data.

### Pinecone

**Python variable (as JSON)**

```json
{
  "target_database": "Pinecone",
  "pinecone_api_key": "your-pinecone-api-key",
  "index_name": "my-index",
  "dimension": 1536,
  "metric": "cosine",
  "cloud": "aws",
  "region": "us-west-2"
}
```

**YAML**

```yaml
target:
  target_database: "Pinecone"
  pinecone_api_key: "your-pinecone-api-key"
  index_name: "my-index"
  dimension: 1536
  metric: "cosine"
  cloud: "aws"
  region: "us-west-2"
```

### Qdrant

**Python variable (as JSON)**

```json
{
  "target_database": "Qdrant",
  "qdrant_url": "https://your-qdrant-cluster-url.qdrant.io",
  "qdrant_api_key": "your-qdrant-api-key",
  "collection_name": "my-collection"
}
```

**YAML**

```yaml
target:
  target_database: "Qdrant"
  qdrant_url: "https://your-qdrant-cluster-url.qdrant.io"
  qdrant_api_key: "your-qdrant-api-key"
  collection_name: "my-collection"
```

### Milvus (Zilliz)

**Python variable (as JSON)**

```json
{
  "target_database": "Milvus",
  "host": "https://my-server-url.zillizcloud.com",
  "api_key": "my-zilliz-api-key",
  "collection_name": "my_collection",
  "vector_dim": 1536
}
```

**YAML**

```yaml
target:
  target_database: "Milvus"
  host: "https://my-server-url.zillizcloud.com" # include port if exists (e.g. http://my-milvus-server:19530)
  api_key: "my-zilliz-api-key"
  collection_name: "my_collection"
  vector_dim: 1536 # Dimension of your vector embeddings
```

### Weaviate

**Python variable (as JSON)**

```json
{
  "target_database": "Weaviate",
  "weaviate_url": "https://your-cluster-url.weaviate.network",
  "weaviate_api_key": "your-weaviate-api-key",
  "class_name": "MyClass"
}
```

**YAML**

```yaml
target:
  target_database: "Weaviate"
  weaviate_url: "https://your-cluster-url.weaviate.network"
  weaviate_api_key: "your-weaviate-api-key"
  class_name: "MyClass"
```

### SingleStore

**Python variable (as JSON)**

```json
{
  "target_database": "Single Store",
  "singlestore_host": "your-host.singlestore.com",
  "singlestore_port": 3306,
  "singlestore_user": "your-username",
  "singlestore_password": "your-password",
  "singlestore_database_name": "your-database",
  "singlestore_table": "your-table"
}
```

**YAML**

```yaml
target:
  target_database: "Single Store"
  singlestore_host: "your-host.singlestore.com"
  singlestore_port: 3306
  singlestore_user: "your-username"
  singlestore_password: "your-password"
  singlestore_database_name: "your-database"
  singlestore_table: "your-table"
```

### Supabase

**Python variable (as JSON)**

```json
{
  "target_database": "Supabase",
  "supabase_uri": "your-supabase-connection-string",
  "index_name": "my_index"
}
```

**YAML**

```yaml
target:
  target_database: "Supabase"
  supabase_uri: "your-supabase-connection-string"
  index_name: "my_index"
```

### LanceDB

**Python variable (as JSON)**

```json
{
  "target_database": "LanceDB",
  "lancedb_api_key": "your-lancedb-api-key",
  "project_name": "your-project-name",
  "table_name": "your-table-name"
}
```

**YAML**

```yaml
target:
  target_database: "LanceDB"
  lancedb_api_key: "your-lancedb-api-key"
  project_name: "your-project-name"
  table_name: "your-table-name"
```

### Tembo

**Python variable (as JSON)**

```json
{
  "target_database": "Tembo",
  "host": "your-tembo-host.tembo.io",
  "database_name": "your-database",
  "username": "your-username",
  "password": "your-password",
  "port": 5432,
  "schema_name": "your-schema",
  "table_name": "your-table"
}
```

**YAML**

```yaml
target:
  target_database: "Tembo"
  host: "your-tembo-host.tembo.io"
  database_name: "your-database"
  username: "your-username"
  password: "your-password"
  port: 5432
  schema_name: "your-schema"
  table_name: "your-table"
```

### MongoDB

**Python variable (as JSON)**

```json
{
  "target_database": "MongoDB",
  "mongodb_uri": "your-mongodb-connection-string",
  "database_name": "your-database",
  "collection_name": "your-collection",
  "vector_field": "embedding"
}
```

**YAML**

```yaml
target:
  target_database: "MongoDB"
  mongodb_uri: "your-mongodb-connection-string"
  database_name: "your-database"
  collection_name: "your-collection"
  vector_field: "embedding"
```

### Neo4j

**Python variable (as JSON)**

```json
{
  "target_database": "Neo4j",
  "neo4j_uri": "bolt://your-neo4j-host:7687",
  "username": "neo4j",
  "password": "your-password",
  "vector_property": "embedding",
  "vector_dimensions": 1536,
  "similarity_function": "cosine",
  "graph_structure": {
    "nodes": [
      {
        "label": "Document",
        "properties": ["title", "content"]
      }
    ],
    "relationships": [
      {
        "type": "SIMILAR_TO",
        "start_node": "Document",
        "end_node": "Document"
      }
    ]
  }
}
```

**YAML**

```yaml
target:
  target_database: "Neo4j"
  neo4j_uri: "bolt://your-neo4j-host:7687"
  username: "neo4j"
  password: "your-password"
  vector_property: "embedding"
  vector_dimensions: 1536
  similarity_function: "cosine"
  graph_structure:
    nodes:
      - label: "Document"
        properties: ["title", "content"]
    relationships:
      - type: "SIMILAR_TO"
        start_node: "Document"
        end_node: "Document"
```

### Choosing the Right Vector Database

When selecting a vector database, consider:

- Scalability: How well does it handle large volumes of data?
- Query performance: Speed of similarity searches and filtering operations.
- Integration: Compatibility with your existing infrastructure.
- Features: Support for metadata filtering, real-time updates, etc.
- Hosting: Managed service vs. self-hosted options.
- Cost: Pricing model and operational costs.

### Adding Custom Vector Database Targets

To add a custom vector database target:

1. Create a new file in the `target_mods` directory.
2. Implement a new class that inherits from `BaseTarget`.
3. Implement the required methods: `connect()`, `create_index_if_not_exists()`, and `write_data()`.
4. Update the `get_target_database()` function in `target_mods/__init__.py` to include your new target.

Example of a custom target class:

```python
from .base import BaseTarget


class MyCustomTarget(BaseTarget):
    def __init__(self, config):
        self.config = config

    def connect(self):
        # Implement connection logic here
        pass

    def create_index_if_not_exists(self, dimension):
        # Implement index creation logic here
        pass

    def write_data(self, df, columns, domain=None):
        # Implement data writing logic here
        pass
```
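
To make the four steps for adding a custom target concrete, here is a minimal, dependency-free sketch of a filled-in target. Several things in it are assumptions rather than VectorETL's actual code: the `BaseTarget` class is a local stand-in that only mirrors the three method names listed above, `InMemoryTarget` and the `"InMemory"` database name are hypothetical, `get_target_database()` is shown with a guessed signature and dispatch style, and `df` is treated as a plain list of row dictionaries instead of a DataFrame so the example runs without extra packages.

```python
from abc import ABC, abstractmethod


class BaseTarget(ABC):
    """Stand-in for VectorETL's BaseTarget (assumed interface, not the real class)."""

    @abstractmethod
    def connect(self): ...

    @abstractmethod
    def create_index_if_not_exists(self, dimension): ...

    @abstractmethod
    def write_data(self, df, columns, domain=None): ...


class InMemoryTarget(BaseTarget):
    """Hypothetical target that keeps records in a Python dict, keyed by index name."""

    def __init__(self, config):
        self.config = config
        self.indexes = None    # populated by connect()
        self.dimension = None  # populated by create_index_if_not_exists()

    def connect(self):
        # A real target would create a client or open a connection here.
        self.indexes = {}

    def create_index_if_not_exists(self, dimension):
        # Create the index only when it is missing, so repeated calls are safe.
        name = self.config["index_name"]
        if name not in self.indexes:
            self.indexes[name] = []
        self.dimension = dimension

    def write_data(self, df, columns, domain=None):
        # Assumption: `df` is an iterable of row dicts, not a DataFrame.
        index = self.indexes[self.config["index_name"]]
        for row in df:
            # Keep only the requested columns from each row.
            record = {col: row[col] for col in columns}
            if domain is not None:
                record["domain"] = domain
            index.append(record)


def get_target_database(config):
    """Hypothetical dispatch; the real function in target_mods may differ."""
    targets = {"InMemory": InMemoryTarget}
    name = config["target_database"]
    if name not in targets:
        raise ValueError(f"Unsupported target database: {name}")
    return targets[name](config)


# Usage: the same connect / create index / write sequence a pipeline would run.
target = get_target_database({"target_database": "InMemory", "index_name": "my-index"})
target.connect()
target.create_index_if_not_exists(3)
rows = [{"id": "a", "embedding": [0.1, 0.2, 0.3], "title": "Doc A"}]
target.write_data(rows, columns=["id", "embedding"], domain="demo")
```

Keeping the creation check inside `create_index_if_not_exists()` means the method can be called on every run without wiping existing data, which is the behavior the method name suggests a real target should have as well.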