How to Configure pgvector HNSW Index for Fast RAG Queries
I deployed my RAG application to production, and the vector similarity search was slow. Not just “a bit laggy” slow—queries were taking 2-3 seconds for simple document retrieval. This was unacceptable for a real-time chat interface.
After profiling and debugging, I discovered the culprit: my pgvector HNSW index was using default parameters that didn’t match my dataset size and query patterns.
The Problem
I had set up pgvector following a basic tutorial:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents ( id SERIAL PRIMARY KEY, content TEXT, embedding vector(1536));
-- This was my mistakeCREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);The index worked, but as my dataset grew to 500,000+ embeddings, query latency increased dramatically. The default HNSW parameters weren’t designed for my use case.
Understanding HNSW Parameters
HNSW (Hierarchical Navigable Small World) is an approximate nearest neighbor algorithm. It builds a graph structure where each node connects to its neighbors. The key insight: you trade exact recall for query speed.
Two parameters control index quality:
+------------------+ +------------------+| m = 16 | | m = 32 || | | || o---o---o | | o---o---o---o || | | | | | | X | X | | || o---o---o | | o---o---o---o || | | | | X | X | | || o | | o---o---o---o |+------------------+ +------------------+ Fewer connections More connections Faster build Better recall Lower memory Higher memorym = Number of connections per node in the graph
- Default: 16
- Higher = better recall, more memory, slower build
- Lower = faster build, lower recall
ef_construction = Search depth during index build
- Default: 64
- Higher = better index quality, slower build
- Lower = faster build, potentially worse index
The Solution
For my 500K document dataset, I recreated the index with tuned parameters:
-- Drop the old indexDROP INDEX documents_embedding_idx;
-- Create optimized index for 500K-1M vectorsCREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)WITH (m = 16, ef_construction = 64);
-- Critical: Analyze after bulk operationsANALYZE documents;The build took longer (about 15 minutes vs 5 minutes with defaults), but query latency dropped from 2-3 seconds to 50-100 milliseconds.
Query-Time Tuning
Here’s something tutorials often skip: HNSW has a query-time parameter that affects both speed and accuracy.
-- Set this per query sessionSET hnsw.ef_search = 100;
-- Now run your similarity searchSELECT id, content, 1 - (embedding <=> '[0.1, 0.2, ...]'::vector) as similarityFROM documentsORDER BY embedding <=> '[0.1, 0.2, ...]'::vectorLIMIT 10;The ef_search parameter controls how many nodes the algorithm visits during search:
| ef_search | Recall | Query Time |
|---|---|---|
| 40 | ~90% | ~20ms |
| 100 | ~95% | ~50ms |
| 200 | ~98% | ~100ms |
I tested different values and found ef_search = 100 gives me 95% recall at acceptable latency for my use case.
Dataset Size Guidelines
After testing with various dataset sizes, here’s what I’ve found works:
Dataset Size | m | ef_construction | Build Time | Memory---------------------|----|--------------------|-------------|--------< 100K vectors | 16 | 64 | ~2 min | 1x100K - 1M vectors | 16 | 64 | ~15 min | 1x1M - 10M vectors | 32 | 128 | ~2 hours | 2x> 10M vectors | 48 | 200 | ~8 hours | 3xHNSW vs IVFFlat
I also considered IVFFlat, another pgvector index type. Here’s the tradeoff:
HNSW IVFFlat +-------------+ +-------------+ | Better recall | Lower memory | Slower build | Faster build | No training needed | Needs training data | Good for real-time | Good for batch +-------------+ +-------------+For RAG applications with real-time queries, HNSW is the better choice. IVFFlat performs well for batch similarity searches but requires retraining when data changes significantly.
Common Mistakes I Made
-
Not setting ef_search - I relied on the default, which was too low for my accuracy requirements.
-
Forgetting to ANALYZE - After bulk inserting documents, the query planner had stale statistics. Always run
ANALYZE documents;after bulk operations. -
Wrong operator class - Make sure your index operator matches your query operator:
vector_cosine_opsfor<=>(cosine distance)vector_l2_opsfor<->(L2 distance)vector_ip_opsfor<#>(inner product)
-- Correct: index and query operator matchCREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);SELECT * FROM documents ORDER BY embedding <=> query_vector;
-- WRONG: mismatch will cause slow sequential scansCREATE INDEX ON documents USING hnsw (embedding vector_l2_ops);SELECT * FROM documents ORDER BY embedding <=> query_vector; -- Uses seq scan!- Not testing with production load - My test dataset was 10K vectors. Production had 500K. The performance difference was massive.
Monitoring Index Health
I added a simple monitoring query to check index usage:
SELECT schemaname, tablename, indexname, idx_scan as index_scans, idx_tup_read as tuples_read, idx_tup_fetch as tuples_fetchedFROM pg_stat_user_indexesWHERE indexname LIKE '%embedding%';If idx_scan is low compared to your query volume, PostgreSQL might be choosing sequential scans. This usually means:
- Statistics are stale (run ANALYZE)
ef_searchis too high (slow queries -> planner avoids index)- Dataset is too small (sequential scan is faster anyway)
Final Setup Checklist
Before deploying to production, verify:
-- 1. Check index exists with correct parametersSELECT indexdef FROM pg_indexesWHERE tablename = 'documents' AND indexname LIKE '%hnsw%';
-- 2. Verify statistics are currentSELECT last_analyze FROM pg_stat_user_tablesWHERE relname = 'documents';
-- 3. Test query performance with EXPLAIN ANALYZEEXPLAIN ANALYZESELECT id, contentFROM documentsORDER BY embedding <=> '[...]'::vectorLIMIT 10;
-- Should show "Index Scan" not "Seq Scan"Why This Matters
Query latency directly impacts user experience in RAG applications. A 2-second delay makes the chat feel sluggish. A 50ms response feels instant.
The configuration changes I made took about 20 minutes to implement (mostly index rebuild time), but they transformed my RAG pipeline from “barely usable” to “production ready.”
For smaller datasets, default parameters might work fine. But as you scale, tuning these parameters becomes essential. The good news: pgvector gives you the control you need—you just have to use it.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 pgvector HNSW Index Documentation
- 👨💻 HNSW Algorithm Explained
- 👨💻 PostgreSQL Index Types Comparison
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments