pgvector vs Pinecone: When to Self-Host Vector Search

Pinecone is the default choice for vector search. It's also $70/month once you outgrow the free tier, and your data lives in someone else's infrastructure.

pgvector runs in Postgres. You already have Postgres.

Let's compare.

The Cost Reality

Pinecone Pricing (as of 2025)

Tier       | Price  | Vectors   | Dimensions
Starter    | Free   | 100k      | Limited
Standard   | $70/mo | 1M        | 1536
Enterprise | Custom | Unlimited | Any

pgvector Pricing

Setup             | Price             | Vectors  | Dimensions
Existing Postgres | $0                | Millions | Any
Managed Postgres  | Your current bill | Millions | Any
New RDS/Cloud SQL | ~$15-50/mo        | Millions | Any

If you already have Postgres, pgvector is free. You're just adding an extension.

Performance Comparison

Query Latency

Scenario     | Pinecone | pgvector
100k vectors | 10-20ms  | 15-30ms
1M vectors   | 15-30ms  | 30-50ms
10M vectors  | 20-40ms  | 50-100ms

Pinecone is faster at scale. But for most RAG applications, 50ms is fine: retrieval latency disappears next to the seconds-long LLM call that follows.

Index Build Time

Vectors | Pinecone | pgvector (IVFFlat) | pgvector (HNSW)
100k    | Minutes  | Seconds            | Minutes
1M      | Minutes  | Minutes            | 10-30 min
10M     | Hours    | Hours              | Hours

Build times are comparable at most scales.

When to Choose Pinecone

Choose Pinecone if:

  1. You need 10M+ vectors — pgvector performance degrades at very large scale
  2. You want zero ops — Pinecone is fully managed
  3. You need advanced features — namespaces, metadata filtering, hybrid search
  4. Cost isn't a concern — Enterprise budgets

Real example: A large e-commerce site with 50M product embeddings. Pinecone makes sense here.

When to Choose pgvector

Choose pgvector if:

  1. You already have Postgres — no new infrastructure
  2. You have <5M vectors — performance is fine
  3. You want data locality — features and embeddings in one database
  4. You're cost-conscious — startups, side projects
  5. You need transactional consistency — embeddings with other data

Real example: A startup building a RAG chatbot with 100k documents. pgvector is plenty.

Fabra's Approach

Fabra uses pgvector by default:

from fabra.core import FeatureStore
from fabra.retrieval import retriever

store = FeatureStore()

# Index documents (embeddings generated automatically)
await store.index("docs", "doc_1", "Your document content here")
await store.index("docs", "doc_2", "Another document")

# Search
@retriever(index="docs", top_k=5)
async def search_docs(query: str):
    pass  # Auto-wired to pgvector
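
Calling the retriever is an ordinary awaited coroutine. A usage sketch (the exact shape of each result depends on Fabra's retriever API):

results = await search_docs("how do I index documents?")
for result in results:
    print(result)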

Why We Chose pgvector

  1. Unified stack — features and embeddings in the same database
  2. Simpler operations — one less service to manage
  3. Cost efficiency — free for existing Postgres users
  4. Good enough performance — <50ms for most use cases
  5. Data locality — joins between vectors and other data (see the sketch below)
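
Data locality is the concrete win: embeddings live in ordinary Postgres tables, so one query can mix vector similarity with joins and filters over business data. A minimal sketch using asyncpg and the pgvector Python package, with a hypothetical documents/users schema (table and column names are illustrative, not Fabra's):

import asyncpg
from pgvector.asyncpg import register_vector

async def nearest_docs_for_plan(dsn: str, query_embedding, plan: str):
    conn = await asyncpg.connect(dsn)
    await register_vector(conn)  # teach asyncpg to send/receive vector values
    # One round trip: cosine similarity plus a relational join and filter
    rows = await conn.fetch(
        """
        SELECT d.content
        FROM documents d
        JOIN users u ON u.id = d.owner_id
        WHERE u.plan = $2
        ORDER BY d.embedding <=> $1
        LIMIT 5
        """,
        query_embedding, plan,
    )
    await conn.close()
    return rows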

Local Development

For local development, Fabra stores embeddings in DuckDB:

FABRA_ENV=development  # DuckDB, no external services

For production with pgvector:

FABRA_ENV=production
FABRA_POSTGRES_URL=postgresql+asyncpg://...

Same code, different backends.

Setting Up pgvector

Option 1: Existing Postgres

-- Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Fabra creates tables automatically

Option 2: Docker

# docker-compose.yml
services:
  postgres:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_PASSWORD: password
    ports:
      - "5432:5432"

Option 3: Managed Services

Most managed Postgres services support pgvector:

  • Supabase — built-in
  • Neon — built-in
  • AWS RDS — enable extension
  • Google Cloud SQL — enable extension
  • Azure — enable extension

Indexing Strategies

pgvector supports two index types:

IVFFlat (Inverted File Flat)

CREATE INDEX ON embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

  • Faster to build
  • Good for <1M vectors
  • Requires tuning lists parameter
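
The pgvector docs suggest starting with lists = rows / 1000 for up to 1M rows and sqrt(rows) beyond that. A small helper capturing that rule of thumb (illustrative, not part of Fabra):

import math

def suggested_ivfflat_lists(row_count: int) -> int:
    # pgvector's documented starting point: rows/1000 up to ~1M rows,
    # sqrt(rows) for larger tables
    if row_count <= 1_000_000:
        return max(row_count // 1000, 1)
    return round(math.sqrt(row_count))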

HNSW (Hierarchical Navigable Small World)

CREATE INDEX ON embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

  • Slower to build
  • Better query performance
  • Good for 1M+ vectors

Fabra uses HNSW by default for production.
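
Both index types also expose a query-time recall knob: hnsw.ef_search (default 40) sets how many candidates HNSW considers, and ivfflat.probes (default 1) sets how many lists IVFFlat scans. A sketch of setting them per session with asyncpg, since Fabra doesn't document whether it exposes these directly:

import asyncpg

async def tune_recall(dsn: str) -> None:
    conn = await asyncpg.connect(dsn)
    # Session-level pgvector settings: higher values improve recall
    # at the cost of slower queries
    await conn.execute("SET hnsw.ef_search = 100")
    await conn.execute("SET ivfflat.probes = 10")
    await conn.close()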

Embedding Generation

Fabra handles embedding generation:

# Default: OpenAI embeddings
store = FeatureStore()

# Or configure a different provider
store = FeatureStore(
    embedding_provider="openai",  # or "cohere", "anthropic"
    embedding_model="text-embedding-3-small"
)

# Index with automatic embedding
await store.index("docs", "doc_1", "Your content here")

Embeddings are cached to avoid redundant API calls.
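
Fabra doesn't document the cache internals; one plausible scheme is keying entries by model plus a content hash, so re-indexing unchanged text never re-calls the embedding API:

import hashlib

def embedding_cache_key(model: str, text: str) -> str:
    # Illustrative only: identical (model, content) pairs map to
    # the same cache entry, so unchanged text is never re-embedded
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{model}:{digest}"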

Hybrid Search

Both Pinecone and pgvector support hybrid search (vector + keyword).

With pgvector:

-- Combine vector similarity with full-text search
SELECT *
FROM documents
WHERE to_tsvector('english', content) @@ to_tsquery('keyword')
ORDER BY embedding <=> query_embedding  -- cosine distance, matches vector_cosine_ops
LIMIT 10;

Fabra supports hybrid search in retrievers:

@retriever(index="docs", top_k=5, hybrid=True)
async def search(query: str):
    pass  # Combines vector and keyword search
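
Fabra doesn't document how it merges the two result lists; a common, simple choice is reciprocal rank fusion (RRF), sketched here:

def reciprocal_rank_fusion(vector_ids: list[str], keyword_ids: list[str], k: int = 60) -> list[str]:
    # Standard RRF: each list contributes 1 / (k + rank) per document;
    # documents ranked well by either search float to the top
    scores: dict[str, float] = {}
    for ids in (vector_ids, keyword_ids):
        for rank, doc_id in enumerate(ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)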

Migration Path

Starting with pgvector doesn't lock you in:

  1. Export embeddings — they're in your database
  2. Upload to Pinecone — standard API (sketched below)
  3. Update retriever config — point to new backend
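
A sketch of steps 1 and 2 with asyncpg and the Pinecone client. The table and column names are hypothetical; Fabra's actual schema may differ:

import asyncpg
from pgvector.asyncpg import register_vector
from pinecone import Pinecone

async def migrate(dsn: str, api_key: str, index_name: str) -> None:
    # Step 1: export embeddings from Postgres (hypothetical table name)
    conn = await asyncpg.connect(dsn)
    await register_vector(conn)
    rows = await conn.fetch("SELECT id, embedding FROM embeddings")
    await conn.close()

    # Step 2: upsert into Pinecone in batches
    index = Pinecone(api_key=api_key).Index(index_name)
    vectors = [{"id": str(r["id"]), "values": list(r["embedding"])} for r in rows]
    for i in range(0, len(vectors), 100):
        index.upsert(vectors=vectors[i : i + 100])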

Fabra will support Pinecone as an alternative backend if demand exists.

Try It

pip install "fabra-ai[ui]"

from fabra.core import FeatureStore
from fabra.retrieval import retriever

store = FeatureStore()

# Index some documents
await store.index("docs", "1", "Fabra is a feature store")
await store.index("docs", "2", "pgvector runs in Postgres")

# Search
@retriever(index="docs", top_k=2)
async def search(query: str):
    pass

results = await search("what is fabra?")
print(results)

Vector search docs →