7. Vector Databases (Targets)
Vector databases are designed to store vector embeddings and query them efficiently. VectorETL supports several popular vector databases as targets for your embedded data. Each target below is shown twice: as a Python variable (JSON) and as the equivalent YAML.
Pinecone
Python variable (as JSON)
{
    "target_database": "Pinecone",
    "pinecone_api_key": "your-pinecone-api-key",
    "index_name": "my-index",
    "dimension": 1536,
    "metric": "cosine",
    "cloud": "aws",
    "region": "us-west-2"
}
YAML
target:
  target_database: "Pinecone"
  pinecone_api_key: "your-pinecone-api-key"
  index_name: "my-index"
  dimension: 1536
  metric: "cosine"
  cloud: "aws"
  region: "us-west-2"
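The two forms carry the same keys; the YAML form nests them under a top-level target key. The following is a minimal sketch of that correspondence using only the standard library. The variable names (pinecone_json, target_config, yaml_equivalent) are illustrative, and how VectorETL itself consumes the configuration is not shown here.

```python
import json

# The "Python variable (as JSON)" form of the Pinecone example.
pinecone_json = """
{
    "target_database": "Pinecone",
    "pinecone_api_key": "your-pinecone-api-key",
    "index_name": "my-index",
    "dimension": 1536,
    "metric": "cosine",
    "cloud": "aws",
    "region": "us-west-2"
}
"""

# json.loads turns the JSON text into a plain Python dict.
target_config = json.loads(pinecone_json)

# The YAML form nests the same keys under a top-level "target" key.
yaml_equivalent = {"target": target_config}

print(yaml_equivalent["target"]["index_name"])    # my-index
print(type(target_config["dimension"]).__name__)  # int
```

Numeric values such as dimension stay numbers in both forms, so they should not be quoted.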
Qdrant
Python variable (as JSON)
{
    "target_database": "Qdrant",
    "qdrant_url": "https://your-qdrant-cluster-url.qdrant.io",
    "qdrant_api_key": "your-qdrant-api-key",
    "collection_name": "my-collection"
}
YAML
target:
  target_database: "Qdrant"
  qdrant_url: "https://your-qdrant-cluster-url.qdrant.io"
  qdrant_api_key: "your-qdrant-api-key"
  collection_name: "my-collection"
Milvus (Zilliz)
Python variable (as JSON)
{
    "target_database": "Milvus",
    "host": "https://my-server-url.zillizcloud.com",
    "api_key": "my-zilliz-api-key",
    "collection_name": "my_collection",
    "vector_dim": 1536
}
YAML
target:
  target_database: "Milvus"
  host: "https://my-server-url.zillizcloud.com"  # Include the port if there is one (e.g. http://my-milvus-server:19530)
  api_key: "my-zilliz-api-key"
  collection_name: "my_collection"
  vector_dim: 1536  # Dimension of your vector embeddings
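Because vector_dim must match the length of the embeddings you write, a quick check before loading can catch a mismatch early. This is a sketch only: check_dimensions is a hypothetical helper, not part of VectorETL, and the sample embeddings are made up for illustration.

```python
def check_dimensions(embeddings, expected_dim):
    """Return the indices of embeddings whose length differs from expected_dim."""
    return [i for i, vec in enumerate(embeddings) if len(vec) != expected_dim]


# Target configuration mirroring the Milvus example (connection keys omitted).
target_config = {
    "target_database": "Milvus",
    "collection_name": "my_collection",
    "vector_dim": 1536,
}

# Illustrative data: the second vector has the wrong length.
embeddings = [[0.0] * 1536, [0.0] * 768, [0.0] * 1536]

bad = check_dimensions(embeddings, target_config["vector_dim"])
print(bad)  # [1]
```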
Weaviate
Python variable (as JSON)
{
    "target_database": "Weaviate",
    "weaviate_url": "https://your-cluster-url.weaviate.network",
    "weaviate_api_key": "your-weaviate-api-key",
    "class_name": "MyClass"
}
YAML
target:
  target_database: "Weaviate"
  weaviate_url: "https://your-cluster-url.weaviate.network"
  weaviate_api_key: "your-weaviate-api-key"
  class_name: "MyClass"
SingleStore
Python variable (as JSON)
{
    "target_database": "Single Store",
    "singlestore_host": "your-host.singlestore.com",
    "singlestore_port": 3306,
    "singlestore_user": "your-username",
    "singlestore_password": "your-password",
    "singlestore_database_name": "your-database",
    "singlestore_table": "your-table"
}
YAML
target:
  target_database: "Single Store"
  singlestore_host: "your-host.singlestore.com"
  singlestore_port: 3306
  singlestore_user: "your-username"
  singlestore_password: "your-password"
  singlestore_database_name: "your-database"
  singlestore_table: "your-table"
Supabase
Python variable (as JSON)
{
    "target_database": "Supabase",
    "supabase_uri": "your-supabase-connection-string",
    "index_name": "my_index"
}
YAML
target:
  target_database: "Supabase"
  supabase_uri: "your-supabase-connection-string"
  index_name: "my_index"
LanceDB
Python variable (as JSON)
{
    "target_database": "LanceDB",
    "lancedb_api_key": "your-lancedb-api-key",
    "project_name": "your-project-name",
    "table_name": "your-table-name"
}
YAML
target:
  target_database: "LanceDB"
  lancedb_api_key: "your-lancedb-api-key"
  project_name: "your-project-name"
  table_name: "your-table-name"
Tembo
Python variable (as JSON)
{
    "target_database": "Tembo",
    "host": "your-tembo-host.tembo.io",
    "database_name": "your-database",
    "username": "your-username",
    "password": "your-password",
    "port": 5432,
    "schema_name": "your-schema",
    "table_name": "your-table"
}
YAML
target:
  target_database: "Tembo"
  host: "your-tembo-host.tembo.io"
  database_name: "your-database"
  username: "your-username"
  password: "your-password"
  port: 5432
  schema_name: "your-schema"
  table_name: "your-table"
MongoDB
Python variable (as JSON)
{
    "target_database": "MongoDB",
    "mongodb_uri": "your-mongodb-connection-string",
    "database_name": "your-database",
    "collection_name": "your-collection",
    "vector_field": "embedding"
}
YAML
target:
  target_database: "MongoDB"
  mongodb_uri: "your-mongodb-connection-string"
  database_name: "your-database"
  collection_name: "your-collection"
  vector_field: "embedding"
Neo4j
Python variable (as JSON)
{
    "target_database": "Neo4j",
    "neo4j_uri": "bolt://your-neo4j-host:7687",
    "username": "neo4j",
    "password": "your-password",
    "vector_property": "embedding",
    "vector_dimensions": 1536,
    "similarity_function": "cosine",
    "graph_structure": {
        "nodes": [
            {
                "label": "Document",
                "properties": [
                    "title",
                    "content"
                ]
            }
        ],
        "relationships": [
            {
                "type": "SIMILAR_TO",
                "start_node": "Document",
                "end_node": "Document"
            }
        ]
    }
}
YAML
target:
  target_database: "Neo4j"
  neo4j_uri: "bolt://your-neo4j-host:7687"
  username: "neo4j"
  password: "your-password"
  vector_property: "embedding"
  vector_dimensions: 1536
  similarity_function: "cosine"
  graph_structure:
    nodes:
      - label: "Document"
        properties: ["title", "content"]
    relationships:
      - type: "SIMILAR_TO"
        start_node: "Document"
        end_node: "Document"
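In graph_structure, each relationship names a start_node and an end_node by label. The sketch below checks that every label a relationship refers to is declared under nodes. The helper undeclared_labels is hypothetical, and whether VectorETL performs a similar check itself is not stated here.

```python
def undeclared_labels(graph_structure):
    """Return, sorted, the labels used by relationships but missing from nodes."""
    declared = {node["label"] for node in graph_structure.get("nodes", [])}
    missing = set()
    for rel in graph_structure.get("relationships", []):
        for key in ("start_node", "end_node"):
            if rel[key] not in declared:
                missing.add(rel[key])
    return sorted(missing)


# The graph_structure from the Neo4j example.
graph_structure = {
    "nodes": [{"label": "Document", "properties": ["title", "content"]}],
    "relationships": [
        {"type": "SIMILAR_TO", "start_node": "Document", "end_node": "Document"}
    ],
}

print(undeclared_labels(graph_structure))  # []
```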
Choosing the Right Vector Database
When selecting a vector database, consider:
Scalability: How well does it handle large volumes of data?
Query performance: Speed of similarity searches and filtering operations.
Integration: Compatibility with your existing infrastructure.
Features: Support for metadata filtering, real-time updates, etc.
Hosting: Managed service vs. self-hosted options.
Cost: Pricing model and operational costs.
Adding Custom Vector Database Targets
To add a custom vector database target:
Create a new file in the target_mods directory.
Implement a new class that inherits from BaseTarget.
Implement the required methods: connect(), create_index_if_not_exists(), and write_data().
Update the get_target_database() function in target_mods/__init__.py to include your new target.
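The last step, registering the new class, might look roughly like the sketch below. The real body of get_target_database() is not reproduced in this section, so the dictionary lookup, the TARGETS name, and the "MyCustomDB" value are all assumptions made for illustration. BaseTarget and MyCustomTarget are defined inline as stand-ins so that the snippet runs on its own.

```python
class BaseTarget:
    """Stand-in for the real BaseTarget, which is not reproduced here."""

    def __init__(self, config):
        self.config = config


class MyCustomTarget(BaseTarget):
    """Stand-in for your custom target class."""


# Hypothetical registry: maps a "target_database" value to a target class.
TARGETS = {
    "MyCustomDB": MyCustomTarget,
}


def get_target_database(config):
    """Return a target instance for config["target_database"]."""
    name = config["target_database"]
    try:
        target_cls = TARGETS[name]
    except KeyError:
        raise ValueError(f"Unsupported target database: {name}") from None
    return target_cls(config)


target = get_target_database({"target_database": "MyCustomDB"})
print(type(target).__name__)  # MyCustomTarget
```

Whatever dispatch style the existing function uses, the value you match on is the one users will put in target_database.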
Example of a custom target class:
from .base import BaseTarget


class MyCustomTarget(BaseTarget):
    """Skeleton for a custom vector database target."""

    def __init__(self, config):
        # Keep the target configuration for use by the methods below
        self.config = config

    def connect(self):
        # Implement connection logic here
        pass

    def create_index_if_not_exists(self, dimension):
        # Implement index creation logic here
        pass

    def write_data(self, df, columns, domain=None):
        # Implement data writing logic here
        pass
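To make the skeleton concrete, here is a minimal in-memory sketch of the three methods. It rests on several assumptions that the section does not confirm: the methods are called in the order connect(), create_index_if_not_exists(), write_data(); df is treated as an iterable of row dicts in place of whatever tabular object the real pipeline passes; and domain is ignored. BaseTarget is a local stand-in, and InMemoryTarget is a hypothetical name.

```python
class BaseTarget:
    """Stand-in for target_mods' BaseTarget so the sketch runs on its own."""


class InMemoryTarget(BaseTarget):
    def __init__(self, config):
        self.config = config
        self.rows = None       # Set by connect()
        self.dimension = None  # Set by create_index_if_not_exists()

    def connect(self):
        # A real target would open a client connection here.
        self.rows = []

    def create_index_if_not_exists(self, dimension):
        # Record the dimension once; later calls leave it unchanged.
        if self.dimension is None:
            self.dimension = dimension

    def write_data(self, df, columns, domain=None):
        # Assumption: `df` is an iterable of row dicts. `domain` is unused here.
        for row in df:
            self.rows.append({col: row[col] for col in columns})


target = InMemoryTarget({"target_database": "InMemory"})
target.connect()
target.create_index_if_not_exists(3)
target.write_data(
    [{"id": "a", "embedding": [0.1, 0.2, 0.3], "extra": 1}],
    columns=["id", "embedding"],
)
print(len(target.rows))  # 1
```

A real target would replace the list with calls to the database client and would typically write in batches.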