Press ESC to exit fullscreen
📖 Lesson ⏱️ 150 minutes

Vector Databases in Practice

ChromaDB, Pinecone, and pgvector — set up, index, and query each

Why Your Vector Database Choice Matters

A vector database has one fundamental job: given a query vector, find the k most similar vectors in your index as fast as possible. Everything else — persistence, metadata filtering, ACID guarantees — is secondary.

But your choice of vector database affects more than performance. It affects your infrastructure complexity, your team’s operational burden, your ability to filter results by metadata, and whether you can keep your data on-premises. Choosing the wrong one means a painful migration when your use case outgrows it.

This lesson covers three databases that together handle 95% of real-world RAG deployments: ChromaDB for local development, Pinecone for managed cloud production, and pgvector for teams already on PostgreSQL.


Before diving into specific databases, understand what they’re doing internally.

When you insert a document chunk with its embedding, the database adds it to an index — specifically, most use an algorithm called HNSW (Hierarchical Navigable Small World). HNSW builds a multi-layer graph where each node (vector) is connected to its nearest neighbors. Searching it is like asking “who are this node’s neighbors, and who are their neighbors, climbing up and down layers until you converge on the nearest vectors to the query.”

This enables approximate nearest neighbor (ANN) search: not guaranteed exact nearest neighbors, but extremely close results in logarithmic time. A brute-force exact search through 1 million 1536-dim vectors takes ~2 seconds. HNSW finds the approximate nearest neighbors in ~5ms.

The “approximate” part rarely matters in practice — if you have 500,000 chunks and the exact nearest neighbor has cosine similarity 0.87 while HNSW returns 0.86, the answer quality is indistinguishable.


Option 1: ChromaDB — Local Development and Prototyping

ChromaDB is the easiest vector database to get started with. No account, no infrastructure, no config files. Just pip install chromadb and you’re storing vectors.

Setup and Indexing

# pip install chromadb langchain-community
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Option A: In-memory only (lost when process exits)
vectorstore_memory = Chroma(
    embedding_function=embeddings,
    collection_name="docs_memory"
)

# Option B: Persistent to disk (recommended)
vectorstore = Chroma(
    embedding_function=embeddings,
    persist_directory="./chroma_db",
    collection_name="product_docs"
)

# Add documents (also handles embedding)
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = PyMuPDFLoader("product_docs.pdf")
documents = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=50)
chunks = splitter.split_documents(documents)

# Index everything in one call
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db",
    collection_name="product_docs"
)

print(f"Indexed {vectorstore._collection.count()} chunks")

Querying

# Basic similarity search
results = vectorstore.similarity_search(
    "How do I configure SSO?",
    k=5
)

for doc in results:
    print(f"Score: N/A | Source: {doc.metadata.get('source')}")
    print(doc.page_content[:200])
    print()

# Similarity search with relevance scores (0 to 1, higher = more similar)
results_with_scores = vectorstore.similarity_search_with_relevance_scores(
    "How do I configure SSO?",
    k=5
)

for doc, score in results_with_scores:
    print(f"Score: {score:.3f} | {doc.page_content[:100]}")

Metadata Filtering

ChromaDB supports filtering on metadata at query time — this is essential for real applications:

# Only retrieve from pages in the "Security" section
results = vectorstore.similarity_search(
    "authentication setup",
    k=5,
    filter={"section": "Security"}
)

# Only from specific source files
results = vectorstore.similarity_search(
    "rate limits",
    k=5,
    filter={"source": "api_reference.pdf"}
)

# Numeric comparisons (e.g., recent documents only)
results = vectorstore.similarity_search(
    "deployment",
    k=5,
    filter={"page": {"$gte": 10, "$lte": 50}}  # pages 10-50 only
)

Loading an Existing Index

# Load a previously persisted index — no re-embedding needed
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings,
    collection_name="product_docs"
)

print(f"Loaded index with {vectorstore._collection.count()} chunks")

Updating the Index

# Add new documents to an existing index
new_docs = loader_new.load()
new_chunks = splitter.split_documents(new_docs)
vectorstore.add_documents(new_chunks)

# Delete by metadata filter (e.g., when a document is updated)
vectorstore._collection.delete(
    where={"source": "old_document.pdf"}
)

When ChromaDB is the right choice:

  • Local development and prototyping
  • Datasets under ~500K documents
  • Single-machine deployments
  • When you want zero infrastructure overhead

Performance: ~5-10ms per query locally on an M2 MacBook for 100K vectors.

When to graduate away from ChromaDB: When you need multi-node deployment, when your vector count exceeds a few million, or when you need sophisticated managed backup/restore.


Option 2: Pinecone — Managed Cloud at Scale

Pinecone is a fully managed vector database service. You don’t run servers, manage indexes, or worry about scaling. This simplicity comes at a cost (pricing starts at ~$70/month for the smallest serverless tier) but eliminates entire categories of operational work.

Setup

# pip install pinecone-client langchain-pinecone
import pinecone
from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore

# Initialize client
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

# Create an index (one-time setup)
index_name = "product-docs"

if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,        # must match your embedding model!
        metric="cosine",
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )
    print(f"Created index '{index_name}'")

# Connect to the index
index = pc.Index(index_name)
print(f"Index stats: {index.describe_index_stats()}")

Indexing Documents

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create vector store connected to Pinecone
vectorstore = PineconeVectorStore(
    index=index,
    embedding=embeddings,
    text_key="text"    # metadata key that stores the text content
)

# Index documents
vectorstore.add_documents(chunks)

# Check status
stats = index.describe_index_stats()
print(f"Total vectors: {stats['total_vector_count']}")

Querying with Namespace Support

Pinecone’s killer feature for multi-tenant applications is namespaces: separate logical partitions within a single index. This is perfect for customer data isolation:

# Index customer A's documents in their own namespace
customer_a_store = PineconeVectorStore(
    index=index,
    embedding=embeddings,
    namespace="customer_a"
)
customer_a_store.add_documents(customer_a_chunks)

# Query only within customer A's namespace
results = customer_a_store.similarity_search(
    "contract renewal terms",
    k=5
)
# Will NEVER return results from customer B's data

Metadata Filtering

# Filter during similarity search
results = vectorstore.similarity_search(
    "authentication setup",
    k=5,
    filter={
        "department": {"$eq": "Engineering"},
        "year": {"$gte": 2024}
    }
)

# Available filter operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin

When Pinecone is the right choice:

  • Production workloads serving multiple users concurrently
  • Need for managed backups, replication, and uptime SLAs
  • Multi-tenant applications (namespaces)
  • Teams that can’t or don’t want to manage infrastructure

Performance: ~20-40ms per query on the serverless tier (network latency dominates).


pgvector is a PostgreSQL extension that adds a vector column type and similarity search operators. If your team already runs PostgreSQL, pgvector means you can add vector search without adding another service to your infrastructure.

Installation

# Using Docker (easiest for local dev)
docker run -d \
  --name postgres-vector \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -p 5432:5432 \
  pgvector/pgvector:pg16
-- In your PostgreSQL instance
CREATE EXTENSION IF NOT EXISTS vector;

Schema Setup

-- Create a table for document chunks
CREATE TABLE document_chunks (
    id          SERIAL PRIMARY KEY,
    content     TEXT NOT NULL,
    embedding   VECTOR(1536),          -- dimension must match embedding model
    source      TEXT,
    page_num    INTEGER,
    section     TEXT,
    created_at  TIMESTAMP DEFAULT NOW()
);

-- Create an HNSW index for fast similarity search
-- (IVFFlat is the alternative; HNSW is faster for most use cases)
CREATE INDEX ON document_chunks 
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Using pgvector with LangChain

# pip install langchain-postgres psycopg2-binary
from langchain_postgres import PGVector
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

CONNECTION_STRING = "postgresql+psycopg2://postgres:mysecretpassword@localhost:5432/mydb"

# Create vector store (creates table automatically if it doesn't exist)
vectorstore = PGVector(
    embeddings=embeddings,
    collection_name="product_docs",
    connection=CONNECTION_STRING,
    use_jsonb=True  # store metadata as JSONB — enables fast filtering
)

# Index documents
vectorstore.add_documents(chunks)

The SQL Superpower: Combine Vector and Relational Queries

The real advantage of pgvector is that you can combine vector similarity with SQL:

# Using raw SQL for complex queries
import psycopg2
import numpy as np

conn = psycopg2.connect(CONNECTION_STRING)
cur = conn.cursor()

# Get the query embedding
query_text = "How do I configure SSO?"
query_embedding = embeddings.embed_query(query_text)

# SQL query: semantic search + date filter in one query
cur.execute("""
    SELECT 
        content,
        source,
        page_num,
        1 - (embedding <=> %s::vector) AS similarity
    FROM document_chunks
    WHERE created_at > NOW() - INTERVAL '90 days'
      AND source != 'archived_docs.pdf'
    ORDER BY embedding <=> %s::vector
    LIMIT 5;
""", (query_embedding, query_embedding))

results = cur.fetchall()
for content, source, page, similarity in results:
    print(f"[{similarity:.3f}] {source} p.{page}: {content[:100]}")

This query does something Pinecone and ChromaDB can’t do efficiently: combine arbitrary SQL filters with vector search in a single database round-trip, with full ACID guarantees and transactional consistency.

Performance Tuning with HNSW Parameters

-- Tune HNSW index for your workload
-- m: number of connections per layer (higher = better recall, more memory)
-- ef_construction: candidates explored at build time (higher = better recall, slower build)
-- ef_search: candidates explored at query time (tune at query time, not index time)

CREATE INDEX ON document_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 100);

-- Set ef_search at query time for recall/latency tradeoff
SET hnsw.ef_search = 100;  -- default is 40; increase for better recall

Benchmark with 500K vectors (PostgreSQL on AWS RDS db.r6g.large):

  • Default HNSW settings: ~15ms per query
  • Tuned HNSW (m=24, ef_search=100): ~20ms per query, ~2% better recall
  • No index (brute force): ~2.1 seconds per query

When pgvector is the right choice:

  • Already running PostgreSQL (don’t add another service)
  • Need to combine vector search with relational queries
  • Need SQL transactions (update chunks atomically)
  • On-premises deployment requirements
  • Need full-text search AND vector search (PostgreSQL has both)

Performance and Feature Comparison

ChromaDBPineconepgvector
Setup time2 minutes10 minutes30 minutes
Local devYesNo (API)Yes
Managed serviceNoYesSelf-hosted
Query latency5-10ms20-40ms10-20ms
Max scale~5M vectorsUnlimited~50M vectors practical
SQL integrationNoNoYes
Multi-tenancyCollectionsNamespacesSchemas/tables
Operational cost$0$70+/monthYour PostgreSQL cost
Learning curveLowLowMedium

The Recommendation

If you’re prototyping or learning: Start with ChromaDB. Zero friction, works immediately, same LangChain API as everything else. The code you write with ChromaDB is 95% identical to what you’d write for any other backend.

If you’re shipping to production and your team hates ops: Use Pinecone. The managed service is genuinely good, and paying $70-150/month to never think about vector database maintenance is a reasonable trade.

If you’re already on PostgreSQL: Use pgvector. One fewer service, one fewer backup strategy, one fewer monitoring dashboard. The SQL integration is powerful for non-trivial filtering requirements.

In the capstone project, we’ll use ChromaDB locally and show the Pinecone swap in a single configuration change — because that’s how LangChain’s abstraction is designed to work.