7. Vector Databases (Targets)
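Vector databases are specialized databases designed to store and efficiently query vector embeddings. VectorETL supports several popular vector databases as targets for your embedded data. Each target below is configured with the same settings in two equivalent forms: a Python variable (shown as JSON) and a YAML target section.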
Pinecone
Python variable (as JSON)
{
  "target_database": "Pinecone",
  "pinecone_api_key": "your-pinecone-api-key",
  "index_name": "my-index",
  "dimension": 1536,
  "metric": "cosine",
  "cloud": "aws",
  "region": "us-west-2"
}
YAML
target:
  target_database: "Pinecone"
  pinecone_api_key: "your-pinecone-api-key"
  index_name: "my-index"
  dimension: 1536
  metric: "cosine"
  cloud: "aws"
  region: "us-west-2"
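The dimension value must equal the length of the vectors produced by your embedding model (1536 matches, for example, OpenAI's text-embedding-ada-002 and text-embedding-3-small models), and metric selects how similarity between vectors is scored. The plain-Python sketch below does not use VectorETL or the Pinecone client; it only illustrates both ideas with a hypothetical check_dimension helper that rejects mismatched embeddings and the cosine similarity formula that the "cosine" metric refers to.

```python
import math
from typing import Sequence

# Mirrors the "dimension" value in the Pinecone configuration above.
INDEX_DIMENSION = 1536


def check_dimension(vector: Sequence[float], expected: int = INDEX_DIMENSION) -> None:
    """Raise ValueError if an embedding's length differs from the index dimension."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, but the index expects {expected}"
        )


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|): 1.0 for same direction, 0.0 for orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

For instance, cosine_similarity([1.0, 0.0], [0.0, 1.0]) returns 0.0 because the vectors are orthogonal, while two vectors pointing in the same direction score 1.0 regardless of their lengths.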
Qdrant
Python variable (as JSON)
{
  "target_database": "Qdrant",
  "qdrant_url": "https://your-qdrant-cluster-url.qdrant.io",
  "qdrant_api_key": "your-qdrant-api-key",
  "collection_name": "my-collection"
}
YAML
target:
  target_database: "Qdrant"
  qdrant_url: "https://your-qdrant-cluster-url.qdrant.io"
  qdrant_api_key: "your-qdrant-api-key"
  collection_name: "my-collection"
Milvus (Zilliz)
Python variable (as JSON)
{
  "target_database": "Milvus",
  "host": "https://my-server-url.zillizcloud.com",
  "api_key": "my-zilliz-api-key",
  "collection_name": "my_collection",
  "vector_dim": 1536
}
YAML
target:
  target_database: "Milvus"
  host: "https://my-server-url.zillizcloud.com"  # Include the port if your server uses one (e.g., http://my-milvus-server:19530)
  api_key: "my-zilliz-api-key"
  collection_name: "my_collection"
  vector_dim: 1536  # Dimension of your vector embeddings
Weaviate
Python variable (as JSON)
{
  "target_database": "Weaviate",
  "weaviate_url": "https://your-cluster-url.weaviate.network",
  "weaviate_api_key": "your-weaviate-api-key",
  "class_name": "MyClass"
}
YAML
target:
  target_database: "Weaviate"
  weaviate_url: "https://your-cluster-url.weaviate.network"
  weaviate_api_key: "your-weaviate-api-key"
  class_name: "MyClass"
SingleStore
Python variable (as JSON)
{
  "target_database": "Single Store",
  "singlestore_host": "your-host.singlestore.com",
  "singlestore_port": 3306,
  "singlestore_user": "your-username",
  "singlestore_password": "your-password",
  "singlestore_database_name": "your-database",
  "singlestore_table": "your-table"
}
YAML
target:
  target_database: "Single Store"
  singlestore_host: "your-host.singlestore.com"
  singlestore_port: 3306
  singlestore_user: "your-username"
  singlestore_password: "your-password"
  singlestore_database_name: "your-database"
  singlestore_table: "your-table"
Supabase
Python variable (as JSON)
{
  "target_database": "Supabase",
  "supabase_uri": "your-supabase-connection-string",
  "index_name": "my_index"
}
YAML
target:
  target_database: "Supabase"
  supabase_uri: "your-supabase-connection-string"
  index_name: "my_index"
LanceDB
Python variable (as JSON)
{
  "target_database": "LanceDB",
  "lancedb_api_key": "your-lancedb-api-key",
  "project_name": "your-project-name",
  "table_name": "your-table-name"
}
YAML
target:
  target_database: "LanceDB"
  lancedb_api_key: "your-lancedb-api-key"
  project_name: "your-project-name"
  table_name: "your-table-name"
Tembo
Python variable (as JSON)
{
  "target_database": "Tembo",
  "host": "your-tembo-host.tembo.io",
  "database_name": "your-database",
  "username": "your-username",
  "password": "your-password",
  "port": 5432,
  "schema_name": "your-schema",
  "table_name": "your-table"
}
YAML
target:
  target_database: "Tembo"
  host: "your-tembo-host.tembo.io"
  database_name: "your-database"
  username: "your-username"
  password: "your-password"
  port: 5432
  schema_name: "your-schema"
  table_name: "your-table"
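Tembo hosts managed PostgreSQL, where embeddings are commonly stored with the pgvector extension. As a rough sketch, assuming a pgvector-backed table, the following shows how schema_name and table_name could map onto DDL statements. The column layout (id, embedding, metadata), the index name, and the HNSW cosine index are hypothetical choices for illustration, not necessarily what VectorETL creates; the dimension is passed in explicitly because this configuration has no dimension key.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def pgvector_ddl(schema_name: str, table_name: str, dimension: int) -> list[str]:
    """Return SQL statements that prepare a pgvector table (hypothetical layout)."""
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    schema = quote_ident(schema_name)
    table = f"{schema}.{quote_ident(table_name)}"
    index = quote_ident(f"{table_name}_embedding_idx")
    return [
        # pgvector has to be enabled once per database.
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE SCHEMA IF NOT EXISTS {schema};",
        # Hypothetical layout: one row per record, JSONB for metadata.
        f"CREATE TABLE IF NOT EXISTS {table} ("
        f"id TEXT PRIMARY KEY, embedding vector({int(dimension)}), metadata JSONB);",
        # HNSW index using cosine distance (needs pgvector 0.5.0 or later).
        f"CREATE INDEX IF NOT EXISTS {index} ON {table} "
        "USING hnsw (embedding vector_cosine_ops);",
    ]
```

Identifiers are quoted because placeholder names such as your-table contain hyphens, which are not valid in unquoted PostgreSQL identifiers. Each returned string could then be run with any Postgres client, for example through a psycopg cursor's execute method.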
MongoDB
Python variable (as JSON)
{
  "target_database": "MongoDB",
  "mongodb_uri": "your-mongodb-connection-string",
  "database_name": "your-database",
  "collection_name": "your-collection",
  "vector_field": "embedding"
}
YAML
target:
  target_database: "MongoDB"
  mongodb_uri: "your-mongodb-connection-string"
  database_name: "your-database"
  collection_name: "your-collection"
  vector_field: "embedding"
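On MongoDB Atlas, similarity queries over the field named by vector_field are served by an Atlas Vector Search index on that field. The sketch below, which assumes an Atlas deployment and does not reflect how VectorETL itself writes to MongoDB, builds the index definition and a matching $vectorSearch aggregation stage as plain dictionaries. The 1536 dimensions, the cosine similarity, and the index name "vector_index" are assumptions, since this configuration specifies none of them.

```python
from typing import Any, Sequence

# Similarity values accepted by Atlas Vector Search index definitions.
VALID_SIMILARITIES = {"cosine", "euclidean", "dotProduct"}


def atlas_vector_index_definition(
    vector_field: str, num_dimensions: int, similarity: str = "cosine"
) -> dict[str, Any]:
    """Build an Atlas Vector Search index definition for one vector field."""
    if similarity not in VALID_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity!r}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": vector_field,  # same value as "vector_field" in the config
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


def vector_search_stage(
    index_name: str,
    vector_field: str,
    query_vector: Sequence[float],
    limit: int = 5,
    num_candidates: int = 100,
) -> dict[str, Any]:
    """Build a $vectorSearch aggregation stage that queries the index."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least limit")
    return {
        "$vectorSearch": {
            "index": index_name,
            "path": vector_field,
            "queryVector": list(query_vector),
            "numCandidates": num_candidates,
            "limit": limit,
        }
    }
```

With PyMongo, the stage would be the first element of the pipeline passed to collection.aggregate(). Keeping numCandidates well above limit generally trades some speed for better recall.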
Neo4j
Python variable (as JSON)
{
  "target_database": "Neo4j",
  "neo4j_uri": "bolt://your-neo4j-host:7687",
  "username": "neo4j",
  "password": "your-password",
  "vector_property": "embedding",
  "vector_dimensions": 1536,
  "similarity_function": "cosine",
  "graph_structure": {
    "nodes": [
      {
        "label": "Document",
        "properties": [
          "title",
          "content"
        ]
      }
    ],
    "relationships": [
      {
        "type": "SIMILAR_TO",
        "start_node": "Document",
        "end_node": "Document"
      }
    ]
  }
}
YAML
target:
  target_database: "Neo4j"
  neo4j_uri: "bolt://your-neo4j-host:7687"
  username: "neo4j"
  password: "your-password"
  vector_property: "embedding"
  vector_dimensions: 1536
  similarity_function: "cosine"
  graph_structure:
    nodes:
      - label: "Document"
        properties: ["title", "content"]
    relationships:
      - type: "SIMILAR_TO"
        start_node: "Document"
        end_node: "Document"
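The Neo4j settings combine a vector index definition (vector_property, vector_dimensions, similarity_function) with a description of the graph to build. As an illustration only, the sketch below turns such a configuration into Cypher strings: one CREATE VECTOR INDEX statement per node label, using the syntax supported by recent Neo4j 5.x releases (older 5.x versions used the db.index.vector.createNodeIndex procedure instead), plus one MERGE template per relationship. The index naming scheme and the id property used to match nodes are hypothetical; this is not VectorETL's implementation.

```python
from typing import Any


def quote_cypher(name: str) -> str:
    """Backtick-quote a Cypher name (label, property, index, relationship type)."""
    return "`" + name.replace("`", "``") + "`"


def vector_index_statements(config: dict[str, Any]) -> list[str]:
    """Return one CREATE VECTOR INDEX statement per node label in graph_structure."""
    prop = config["vector_property"]
    dims = int(config["vector_dimensions"])
    similarity = config["similarity_function"]
    if similarity not in ("cosine", "euclidean"):
        raise ValueError(f"unsupported similarity_function: {similarity!r}")
    statements = []
    for node in config["graph_structure"]["nodes"]:
        label = node["label"]
        index_name = quote_cypher(f"{label.lower()}_{prop}_index")  # hypothetical name
        statements.append(
            f"CREATE VECTOR INDEX {index_name} IF NOT EXISTS "
            f"FOR (n:{quote_cypher(label)}) ON (n.{quote_cypher(prop)}) "
            "OPTIONS {indexConfig: {"
            f"`vector.dimensions`: {dims}, "
            f"`vector.similarity_function`: '{similarity}'"
            "}}"
        )
    return statements


def relationship_merge_statements(config: dict[str, Any]) -> list[str]:
    """Return MERGE templates; nodes are matched on a hypothetical `id` property."""
    statements = []
    for rel in config["graph_structure"]["relationships"]:
        start = quote_cypher(rel["start_node"])
        end = quote_cypher(rel["end_node"])
        rel_type = quote_cypher(rel["type"])
        statements.append(
            f"MATCH (a:{start} {{id: $start_id}}), (b:{end} {{id: $end_id}}) "
            f"MERGE (a)-[:{rel_type}]->(b)"
        )
    return statements
```

The $start_id and $end_id placeholders are Cypher query parameters, so the same template can be reused for every pair of similar documents.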
Choosing the Right Vector Database
When selecting a vector database, consider:
Scalability: How well it handles large and growing volumes of data.
Query performance: Speed of similarity searches and filtering operations.
Integration: Compatibility with your existing infrastructure.
Features: Support for metadata filtering, real-time updates, etc.
Hosting: Managed service vs. self-hosted options.
Cost: Pricing model and operational costs.
Adding Custom Vector Database Targets
To add a custom vector database target:
1. Create a new file in the target_mods directory.
2. Implement a new class that inherits from BaseTarget.
3. Implement the required methods: connect(), create_index_if_not_exists(), and write_data().
4. Update the get_target_database() function in target_mods/__init__.py to include your new target.
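The final step, updating get_target_database(), is what lets a configuration's target_database value select your new class. VectorETL's actual function may be written differently (for example, as an if/elif chain); the sketch below uses a dictionary registry with stand-in classes so that it runs on its own. The key "MyCustomDB" is hypothetical and stands for whatever value users will put in target_database.

```python
from typing import Any


class PineconeTarget:
    """Stand-in for a built-in target class (illustration only)."""

    def __init__(self, config: dict[str, Any]):
        self.config = config


class MyCustomTarget:
    """Stand-in for the custom target class you add in target_mods."""

    def __init__(self, config: dict[str, Any]):
        self.config = config


# Map each "target_database" value to the class that handles it.
# "MyCustomDB" is a hypothetical name for the new target.
TARGET_CLASSES: dict[str, type] = {
    "Pinecone": PineconeTarget,
    "MyCustomDB": MyCustomTarget,
}


def get_target_database(config: dict[str, Any]):
    """Instantiate the target class selected by config["target_database"]."""
    name = config.get("target_database")
    try:
        target_cls = TARGET_CLASSES[name]
    except KeyError:
        raise ValueError(f"Unsupported target database: {name!r}") from None
    return target_cls(config)
```

A registry keeps the lookup in one place, so adding a target means adding one dictionary entry, and an unknown or missing target_database fails with a clear error instead of silently returning nothing.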
Example of a custom target class:
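from .base import BaseTarget


class MyCustomTarget(BaseTarget):
    def __init__(self, config):
        # Keep the target section of the pipeline configuration for later use
        self.config = config

    def connect(self):
        # Create a client or open a connection to your database here
        pass

    def create_index_if_not_exists(self, dimension):
        # Create the index (or collection/table) for vectors of size
        # `dimension` here, skipping creation if it already exists
        pass

    def write_data(self, df, columns, domain=None):
        # Write the embedded records in the DataFrame `df` to the database here
        pass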
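To see how the three required methods fit together, here is a runnable toy target that keeps vectors in a Python dictionary. It uses a local stand-in for BaseTarget (the real base class may define more), and it assumes hypothetical "id" and "embeddings" columns in the incoming DataFrame and that columns names the metadata columns to keep; check what your pipeline actually produces before relying on any of these names.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, Optional


# Stand-in for VectorETL's BaseTarget so the sketch runs standalone; inside
# target_mods you would use `from .base import BaseTarget` instead.
class BaseTarget(ABC):
    @abstractmethod
    def connect(self): ...

    @abstractmethod
    def create_index_if_not_exists(self, dimension): ...

    @abstractmethod
    def write_data(self, df, columns, domain=None): ...


# Hypothetical column names; adjust them to your pipeline's DataFrame.
ID_COLUMN = "id"
VECTOR_COLUMN = "embeddings"


class InMemoryTarget(BaseTarget):
    """Toy target that stores vectors and metadata in a Python dict."""

    def __init__(self, config: dict[str, Any]):
        self.config = config
        self.store: Optional[dict[str, dict[str, Any]]] = None
        self.dimension: Optional[int] = None

    def connect(self):
        # A real target would create a database client here.
        if self.store is None:
            self.store = {}

    def create_index_if_not_exists(self, dimension):
        # Remember the dimension on the first call; reject a conflicting one later.
        if self.dimension is None:
            self.dimension = dimension
        elif self.dimension != dimension:
            raise ValueError(f"index already exists with dimension {self.dimension}")

    def write_data(self, df, columns, domain=None):
        # `domain` is part of the required signature; this sketch ignores it.
        if self.store is None or self.dimension is None:
            raise RuntimeError("call connect() and create_index_if_not_exists() first")
        # Accept a pandas DataFrame, or any iterable of row dicts for quick tests.
        rows: Iterable[dict[str, Any]] = (
            df.to_dict(orient="records") if hasattr(df, "to_dict") else df
        )
        written = 0
        for row in rows:
            vector = [float(x) for x in row[VECTOR_COLUMN]]
            if len(vector) != self.dimension:
                raise ValueError(f"expected {self.dimension} dimensions, got {len(vector)}")
            metadata = {col: row[col] for col in columns}
            self.store[str(row[ID_COLUMN])] = {"vector": vector, "metadata": metadata}
            written += 1
        return written
```

The call order mirrors a typical load: connect(), then create_index_if_not_exists() with the embedding size, then write_data() for each batch. Validating the dimension at write time catches a mismatched embedding model before bad vectors reach the store.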