7. Vector Databases (Targets)

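Vector databases are specialized databases designed to store and efficiently query vector embeddings. VectorETL supports several popular vector databases as targets for your embedded data. Each target below is shown twice: as a Python variable (a JSON-style dictionary) and as the target block of a YAML configuration.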

Pinecone

Python variable (as JSON)

{
    "target_database": "Pinecone",
    "pinecone_api_key": "your-pinecone-api-key",
    "index_name": "my-index",
    "dimension": 1536,
    "metric": "cosine",
    "cloud": "aws",
    "region": "us-west-2"
}

YAML

target:
  target_database: "Pinecone"
  pinecone_api_key: "your-pinecone-api-key"
  index_name: "my-index"
  dimension: 1536
  metric: "cosine"
  cloud: "aws"
  region: "us-west-2"
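
Before handing a configuration like this to the pipeline, it can help to confirm that no key was dropped or misspelled. The sketch below is a minimal check, assuming the keys shown in the Pinecone example are the expected set. PINECONE_KEYS and missing_keys are hypothetical names used for illustration and are not part of VectorETL.

```python
# Keys taken from the Pinecone example above. Treating all of them as
# required is an assumption, not something VectorETL is known to enforce.
PINECONE_KEYS = (
    "target_database",
    "pinecone_api_key",
    "index_name",
    "dimension",
    "metric",
    "cloud",
    "region",
)


def missing_keys(config, expected=PINECONE_KEYS):
    """Return the expected keys that are absent from config, in order."""
    return [key for key in expected if key not in config]


target = {
    "target_database": "Pinecone",
    "pinecone_api_key": "your-pinecone-api-key",
    "index_name": "my-index",
    "dimension": 1536,
    "metric": "cosine",
    "cloud": "aws",
    "region": "us-west-2",
}

print(missing_keys(target))                            # complete config
print(missing_keys({"target_database": "Pinecone"}))   # incomplete config
```

The same helper could be reused for the other targets by passing a different tuple of expected keys.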

Qdrant

Python variable (as JSON)

{
    "target_database": "Qdrant",
    "qdrant_url": "https://your-qdrant-cluster-url.qdrant.io",
    "qdrant_api_key": "your-qdrant-api-key",
    "collection_name": "my-collection"
}

YAML

target:
  target_database: "Qdrant"
  qdrant_url: "https://your-qdrant-cluster-url.qdrant.io"
  qdrant_api_key: "your-qdrant-api-key"
  collection_name: "my-collection"

Milvus (Zilliz)

Python variable (as JSON)

{
    "target_database": "Milvus",
    "host": "https://my-server-url.zillizcloud.com",
    "api_key": "my-zilliz-api-key",
    "collection_name": "my_collection",
    "vector_dim": 1536
}

YAML

target:
  target_database: "Milvus"
  host: "https://my-server-url.zillizcloud.com"  # Include the port if there is one (e.g. http://my-milvus-server:19530)
  api_key: "my-zilliz-api-key"
  collection_name: "my_collection"
  vector_dim: 1536  # Dimension of your vector embeddings
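
Since vector_dim has to match the length of the embeddings produced upstream, a mismatch is usually rejected by the database at insert time. A check before writing can surface the problem earlier. The following is a minimal sketch under that assumption; check_dimension is a hypothetical helper, and three-dimensional toy vectors stand in for real 1536-dimensional embeddings.

```python
def check_dimension(vectors, vector_dim):
    """Raise ValueError if any vector's length differs from vector_dim."""
    for position, vector in enumerate(vectors):
        if len(vector) != vector_dim:
            raise ValueError(
                f"vector {position} has {len(vector)} values, expected {vector_dim}"
            )
    return True


# Toy data: short vectors keep the example readable.
config = {"collection_name": "my_collection", "vector_dim": 3}
embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

check_dimension(embeddings, config["vector_dim"])  # passes without raising
```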

Weaviate

Python variable (as JSON)

{
    "target_database": "Weaviate",
    "weaviate_url": "https://your-cluster-url.weaviate.network",
    "weaviate_api_key": "your-weaviate-api-key",
    "class_name": "MyClass"
}

YAML

target:
  target_database: "Weaviate"
  weaviate_url: "https://your-cluster-url.weaviate.network"
  weaviate_api_key: "your-weaviate-api-key"
  class_name: "MyClass"

SingleStore

Python variable (as JSON)

{
    "target_database": "Single Store",
    "singlestore_host": "your-host.singlestore.com",
    "singlestore_port": 3306,
    "singlestore_user": "your-username",
    "singlestore_password": "your-password",
    "singlestore_database_name": "your-database",
    "singlestore_table": "your-table"
}

YAML

target:
  target_database: "Single Store"
  singlestore_host: "your-host.singlestore.com"
  singlestore_port: 3306
  singlestore_user: "your-username"
  singlestore_password: "your-password"
  singlestore_database_name: "your-database"
  singlestore_table: "your-table"

Supabase

Python variable (as JSON)

{
    "target_database": "Supabase",
    "supabase_uri": "your-supabase-connection-string",
    "index_name": "my_index"
}

YAML

target:
  target_database: "Supabase"
  supabase_uri: "your-supabase-connection-string"
  index_name: "my_index"

LanceDB

Python variable (as JSON)

{
    "target_database": "LanceDB",
    "lancedb_api_key": "your-lancedb-api-key",
    "project_name": "your-project-name",
    "table_name": "your-table-name"
}

YAML

target:
  target_database: "LanceDB"
  lancedb_api_key: "your-lancedb-api-key"
  project_name: "your-project-name"
  table_name: "your-table-name"

Tembo

Python variable (as JSON)

{
    "target_database": "Tembo",
    "host": "your-tembo-host.tembo.io",
    "database_name": "your-database",
    "username": "your-username",
    "password": "your-password",
    "port": 5432,
    "schema_name": "your-schema",
    "table_name": "your-table"
}

YAML

target:
  target_database: "Tembo"
  host: "your-tembo-host.tembo.io"
  database_name: "your-database"
  username: "your-username"
  password: "your-password"
  port: 5432
  schema_name: "your-schema"
  table_name: "your-table"

MongoDB

Python variable (as JSON)

{
    "target_database": "MongoDB",
    "mongodb_uri": "your-mongodb-connection-string",
    "database_name": "your-database",
    "collection_name": "your-collection",
    "vector_field": "embedding"
}

YAML

target:
  target_database: "MongoDB"
  mongodb_uri: "your-mongodb-connection-string"
  database_name: "your-database"
  collection_name: "your-collection"
  vector_field: "embedding"
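
The vector_field setting presumably names the document field that holds the embedding. Under that assumption, a stored record might be shaped as in the sketch below. to_document is a hypothetical helper, and the document layout VectorETL actually writes may differ.

```python
def to_document(metadata, vector, vector_field):
    """Build one document, storing the embedding under vector_field."""
    if vector_field in metadata:
        raise ValueError(f"metadata already uses the field name {vector_field!r}")
    document = dict(metadata)  # copy so the caller's dict is left untouched
    document[vector_field] = list(vector)
    return document


config = {"collection_name": "your-collection", "vector_field": "embedding"}
doc = to_document({"title": "Example"}, [0.1, 0.2, 0.3], config["vector_field"])
print(doc)
```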

Neo4j

Python variable (as JSON)

{
    "target_database": "Neo4j",
    "neo4j_uri": "bolt://your-neo4j-host:7687",
    "username": "neo4j",
    "password": "your-password",
    "vector_property": "embedding",
    "vector_dimensions": 1536,
    "similarity_function": "cosine",
    "graph_structure": {
        "nodes": [
            {
                "label": "Document",
                "properties": [
                    "title",
                    "content"
                ]
            }
        ],
        "relationships": [
            {
                "type": "SIMILAR_TO",
                "start_node": "Document",
                "end_node": "Document"
            }
        ]
    }
}

YAML

target:
  target_database: "Neo4j"
  neo4j_uri: "bolt://your-neo4j-host:7687"
  username: "neo4j"
  password: "your-password"
  vector_property: "embedding"
  vector_dimensions: 1536
  similarity_function: "cosine"
  graph_structure:
    nodes:
      - label: "Document"
        properties: ["title", "content"]
    relationships:
      - type: "SIMILAR_TO"
        start_node: "Document"
        end_node: "Document"
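
The graph_structure block is the most nested part of any target configuration, so it is the easiest one to get wrong. One useful sanity check is that every relationship points at a label declared under nodes. The sketch below does only that; undeclared_labels is a hypothetical helper, not a VectorETL function.

```python
def undeclared_labels(graph_structure):
    """Return labels used by relationships but missing from the nodes list."""
    declared = {node["label"] for node in graph_structure.get("nodes", [])}
    missing = set()
    for rel in graph_structure.get("relationships", []):
        for endpoint in ("start_node", "end_node"):
            if rel[endpoint] not in declared:
                missing.add(rel[endpoint])
    return sorted(missing)


graph_structure = {
    "nodes": [{"label": "Document", "properties": ["title", "content"]}],
    "relationships": [
        {"type": "SIMILAR_TO", "start_node": "Document", "end_node": "Document"}
    ],
}

# An empty list means every relationship endpoint is a declared node label.
print(undeclared_labels(graph_structure))
```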

Choosing the Right Vector Database

When selecting a vector database, consider:

  • Scalability: how well it handles large volumes of data.

  • Query performance: the speed of similarity searches and filtering operations.

  • Integration: compatibility with your existing infrastructure.

  • Features: support for metadata filtering, real-time updates, and similar capabilities.

  • Hosting: managed service versus self-hosted deployment.

  • Cost: the pricing model and ongoing operational costs.

Adding Custom Vector Database Targets

To add a custom vector database target:

  1. Create a new file in the target_mods directory.

  2. Implement a new class that inherits from BaseTarget.

  3. Implement the required methods: connect(), create_index_if_not_exists(), and write_data().

  4. Update the get_target_database() function in target_mods/__init__.py to include your new target.

Example of a custom target class:

from .base import BaseTarget

class MyCustomTarget(BaseTarget):
    def __init__(self, config):
        self.config = config

    def connect(self):
        # Implement connection logic here
        pass

    def create_index_if_not_exists(self, dimension):
        # Implement index creation logic here
        pass

    def write_data(self, df, columns, domain=None):
        # Implement data writing logic here
        pass
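
To show how the three methods and the dispatch step might fit together, here is a self-contained sketch that stores rows in memory instead of a real database. Several parts are assumptions: the BaseTarget defined here is a stand-in so the example runs on its own, the TARGETS registry is only one way get_target_database() could be written, the call order (connect, then create_index_if_not_exists, then write_data) is a guess, and columns is taken to mean the fields to write. A list of dicts stands in for whatever df is in the real pipeline.

```python
class BaseTarget:
    """Stand-in for VectorETL's BaseTarget so this sketch runs on its own."""

    def connect(self):
        raise NotImplementedError

    def create_index_if_not_exists(self, dimension):
        raise NotImplementedError

    def write_data(self, df, columns, domain=None):
        raise NotImplementedError


class InMemoryTarget(BaseTarget):
    """Toy target that keeps rows in a list instead of a real database."""

    def __init__(self, config):
        self.config = config
        self.connected = False
        self.dimension = None
        self.rows = []

    def connect(self):
        self.connected = True

    def create_index_if_not_exists(self, dimension):
        # Record the dimension only the first time, mirroring "if not exists".
        if self.dimension is None:
            self.dimension = dimension

    def write_data(self, df, columns, domain=None):
        # Assumption: columns lists the fields to keep from each row.
        for row in df:
            self.rows.append({name: row[name] for name in columns})


# Hypothetical registry; the real get_target_database() may be written differently.
TARGETS = {"InMemory": InMemoryTarget}


def get_target_database(config):
    """Look up and instantiate the target class named in the config."""
    name = config.get("target_database")
    if name not in TARGETS:
        raise ValueError(f"Unsupported target: {name!r}")
    return TARGETS[name](config)


target = get_target_database({"target_database": "InMemory"})
target.connect()
target.create_index_if_not_exists(3)
target.write_data(
    [{"id": "a", "embedding": [0.1, 0.2, 0.3], "extra": 1}],
    columns=["id", "embedding"],
)
print(target.rows)
```

Keeping the lookup in a dictionary means a new target is registered by adding one entry, rather than by extending a chain of conditionals.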