Skip to content

How to Configure pgvector HNSW Index for Fast RAG Queries

I deployed my RAG application to production, and the vector similarity search was slow. Not just “a bit laggy” slow—queries were taking 2-3 seconds for simple document retrieval. This was unacceptable for a real-time chat interface.

After profiling and debugging, I discovered the culprit: my pgvector HNSW index was using default parameters that didn’t match my dataset size and query patterns.

The Problem

I had set up pgvector following a basic tutorial:

initial_setup.sql
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
embedding vector(1536)
);
-- This was my mistake
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

The index worked, but as my dataset grew to 500,000+ embeddings, query latency increased dramatically. The default HNSW parameters weren’t designed for my use case.

Understanding HNSW Parameters

HNSW (Hierarchical Navigable Small World) is an approximate nearest neighbor algorithm. It builds a graph structure where each node connects to its neighbors. The key insight: you trade exact recall for query speed.

Two parameters control index quality:

+------------------+ +------------------+
| m = 16 | | m = 32 |
| | | |
| o---o---o | | o---o---o---o |
| | | | | | | X | X | | |
| o---o---o | | o---o---o---o |
| | | | | X | X | | |
| o | | o---o---o---o |
+------------------+ +------------------+
Fewer connections More connections
Faster build Better recall
Lower memory Higher memory

m = Number of connections per node in the graph

  • Default: 16
  • Higher = better recall, more memory, slower build
  • Lower = faster build, lower recall

ef_construction = Search depth during index build

  • Default: 64
  • Higher = better index quality, slower build
  • Lower = faster build, potentially worse index

The Solution

For my 500K document dataset, I recreated the index with tuned parameters:

optimized_index.sql
-- Drop the old index
DROP INDEX documents_embedding_idx;
-- Create optimized index for 500K-1M vectors
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Critical: Analyze after bulk operations
ANALYZE documents;

The build took longer (about 15 minutes vs 5 minutes with defaults), but query latency dropped from 2-3 seconds to 50-100 milliseconds.

Query-Time Tuning

Here’s something tutorials often skip: HNSW has a query-time parameter that affects both speed and accuracy.

query_with_ef_search.sql
-- Set this per query session
SET hnsw.ef_search = 100;
-- Now run your similarity search
SELECT id, content, 1 - (embedding <=> '[0.1, 0.2, ...]'::vector) as similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;

The ef_search parameter controls how many nodes the algorithm visits during search:

ef_searchRecallQuery Time
40~90%~20ms
100~95%~50ms
200~98%~100ms

I tested different values and found ef_search = 100 gives me 95% recall at acceptable latency for my use case.

Dataset Size Guidelines

After testing with various dataset sizes, here’s what I’ve found works:

Dataset Size | m | ef_construction | Build Time | Memory
---------------------|----|--------------------|-------------|--------
< 100K vectors | 16 | 64 | ~2 min | 1x
100K - 1M vectors | 16 | 64 | ~15 min | 1x
1M - 10M vectors | 32 | 128 | ~2 hours | 2x
> 10M vectors | 48 | 200 | ~8 hours | 3x

HNSW vs IVFFlat

I also considered IVFFlat, another pgvector index type. Here’s the tradeoff:

HNSW IVFFlat
+-------------+ +-------------+
| Better recall | Lower memory
| Slower build | Faster build
| No training needed | Needs training data
| Good for real-time | Good for batch
+-------------+ +-------------+

For RAG applications with real-time queries, HNSW is the better choice. IVFFlat performs well for batch similarity searches but requires retraining when data changes significantly.

Common Mistakes I Made

  1. Not setting ef_search - I relied on the default, which was too low for my accuracy requirements.

  2. Forgetting to ANALYZE - After bulk inserting documents, the query planner had stale statistics. Always run ANALYZE documents; after bulk operations.

  3. Wrong operator class - Make sure your index operator matches your query operator:

    • vector_cosine_ops for <=> (cosine distance)
    • vector_l2_ops for <-> (L2 distance)
    • vector_ip_ops for <#> (inner product)
operator_match.sql
-- Correct: index and query operator match
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
SELECT * FROM documents ORDER BY embedding <=> query_vector;
-- WRONG: mismatch will cause slow sequential scans
CREATE INDEX ON documents USING hnsw (embedding vector_l2_ops);
SELECT * FROM documents ORDER BY embedding <=> query_vector; -- Uses seq scan!
  1. Not testing with production load - My test dataset was 10K vectors. Production had 500K. The performance difference was massive.

Monitoring Index Health

I added a simple monitoring query to check index usage:

check_index_usage.sql
SELECT
schemaname,
tablename,
indexname,
idx_scan as index_scans,
idx_tup_read as tuples_read,
idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes
WHERE indexname LIKE '%embedding%';

If idx_scan is low compared to your query volume, PostgreSQL might be choosing sequential scans. This usually means:

  • Statistics are stale (run ANALYZE)
  • ef_search is too high (slow queries -> planner avoids index)
  • Dataset is too small (sequential scan is faster anyway)

Final Setup Checklist

Before deploying to production, verify:

production_checklist.sql
-- 1. Check index exists with correct parameters
SELECT indexdef FROM pg_indexes
WHERE tablename = 'documents' AND indexname LIKE '%hnsw%';
-- 2. Verify statistics are current
SELECT last_analyze FROM pg_stat_user_tables
WHERE relname = 'documents';
-- 3. Test query performance with EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT id, content
FROM documents
ORDER BY embedding <=> '[...]'::vector
LIMIT 10;
-- Should show "Index Scan" not "Seq Scan"

Why This Matters

Query latency directly impacts user experience in RAG applications. A 2-second delay makes the chat feel sluggish. A 50ms response feels instant.

The configuration changes I made took about 20 minutes to implement (mostly index rebuild time), but they transformed my RAG pipeline from “barely usable” to “production ready.”

For smaller datasets, default parameters might work fine. But as you scale, tuning these parameters becomes essential. The good news: pgvector gives you the control you need—you just have to use it.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments