What is Hybrid Search in RAG? When to Use It (Complete Guide)

My RAG system was failing on simple queries. Users searched for “error code 404” and got nothing relevant. But when they searched for “page not found problem”, suddenly results appeared. The issue? Pure vector search was too semantic—it missed exact term matches that users expected.

That’s when I learned about hybrid search, and honestly, it changed everything.

The Problem with Single-Method Retrieval

I started with pure vector search because it was the “modern” approach. Semantic embeddings, transformer models, the works. But I kept running into these frustrating scenarios:

retrieval-failures.txt
Query: "How to fix CORS error"
Vector Results: Articles about cross-origin, web security, browser policies
User Wanted: Specific CORS error troubleshooting guide

Query: "PostgreSQL connection refused port 5432"
Vector Results: Database connection guides, PostgreSQL tutorials
User Wanted: Exact error message solution

Query: "async await not working"
Vector Results: Asynchronous programming concepts, JavaScript tutorials
User Wanted: Specific debugging steps for await issues

Vector search excels at semantic similarity but struggles with exact terms, error codes, and specific identifiers. Meanwhile, traditional keyword search (BM25) has the opposite problem—great for exact matches, terrible at understanding intent.

comparison-diagram.txt
┌─────────────────────────────────────────────────────────────┐
│                     RETRIEVAL SPECTRUM                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  BM25 (Sparse)                 Vector (Dense)               │
│  ─────────────                 ──────────────               │
│  ✓ Exact term matching         ✓ Semantic understanding     │
│  ✓ Fast and interpretable      ✓ Handles synonyms           │
│  ✓ No model needed             ✓ Concept matching           │
│  ✗ Misses synonyms             ✗ Misses exact terms         │
│  ✗ No semantic understanding   ✗ Black box scoring          │
│                                                             │
│                        HYBRID SEARCH                        │
│                        ─────────────                        │
│                   Both strengths combined                   │
│                      via result fusion                      │
│                                                             │
└─────────────────────────────────────────────────────────────┘
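
To make the left column of the diagram concrete, here is a minimal BM25 scorer written from scratch. It is a sketch, not a production implementation: the whitespace tokenizer, the `k1` and `b` defaults, and the two sample documents are assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    avg_len = sum(len(tokens) for tokens in tokenized) / len(tokenized)
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no lexical overlap means no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / length_norm
        scores.append(score)
    return scores

docs = [
    "error code 404 returned by the server",
    "page not found problem when opening a link",
]
exact = bm25_scores("error code 404", docs)       # only the first doc shares terms
paraphrase = bm25_scores("missing webpage", docs)  # no shared terms with either doc
print(exact, paraphrase)
```

The exact query scores only the document that literally contains its terms, and the paraphrased query scores nothing at all, which is the "misses synonyms" weakness that dense retrieval covers.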

What Hybrid Search Actually Does

Hybrid search runs multiple retrieval methods in parallel and merges their results. The most common combination is BM25 (sparse retrieval) + vector search (dense retrieval), but you can blend any retrieval methods.

Here’s the workflow:

hybrid-search-flow.txt
┌────────────────────────────────────────────────────────────┐
│                                                            │
│        Query: "How to fix CORS error in Express.js"        │
│                                                            │
└─────────────────────────────┬──────────────────────────────┘
           ┌─────────────────────────────────────┐
           │         PARALLEL RETRIEVAL          │
           └─────────────────────────────────────┘
                   ┌──────────┴──────────┐
                   ▼                     ▼
            ┌─────────────┐       ┌─────────────┐
            │    BM25     │       │   Vector    │
            │  Retrieval  │       │  Retrieval  │
            └─────────────┘       └─────────────┘
                   │                     │
                   ▼                     ▼
              Top-K Docs            Top-K Docs
             (Term-based)           (Semantic)
                   │                     │
                   └──────────┬──────────┘
           ┌─────────────────────────────────────┐
           │         RESULT FUSION (RRF)         │
           │     Combine and re-rank results     │
           └─────────────────────────────────────┘
                    Final Top-K Documents

The magic happens in the fusion step. Different strategies exist, but Reciprocal Rank Fusion (RRF) has become the default choice for most implementations.
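
The parallel retrieval step of that workflow can be sketched with a thread pool. Everything named here is hypothetical: `fake_bm25` and `fake_vector` stand in for real backends, and `interleave` is a deliberately naive placeholder fusion so the example stays focused on the concurrency.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest

def hybrid_retrieve(query, retrievers, fuse, top_k=5):
    """Run every retriever concurrently, then hand the ranked lists to `fuse`."""
    with ThreadPoolExecutor(max_workers=len(retrievers)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in retrievers.items()}
        ranked_lists = {name: future.result() for name, future in futures.items()}
    return fuse(ranked_lists)[:top_k]

def interleave(ranked_lists):
    """Placeholder fusion: take one result from each list in turn, skipping duplicates."""
    seen, merged = set(), []
    for tier in zip_longest(*ranked_lists.values()):
        for doc_id in tier:
            if doc_id is not None and doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged

# Hypothetical stand-ins for real BM25 and vector backends
def fake_bm25(query):
    return ["doc1", "doc2", "doc3"]

def fake_vector(query):
    return ["doc4", "doc1", "doc5"]

top = hybrid_retrieve(
    "How to fix CORS error in Express.js",
    {"bm25": fake_bm25, "vector": fake_vector},
    interleave,
    top_k=3,
)
print(top)
```

Because the retrievers are independent, total latency is roughly that of the slower backend rather than the sum of both.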

How Result Fusion Works

I tested three main fusion approaches:

1. Reciprocal Rank Fusion (RRF)

RRF is elegantly simple. For each document, you calculate a score based on its rank in each retrieval method:

rrf_fusion.py
def reciprocal_rank_fusion(results_dict, k=60):
    """
    RRF score = sum(1 / (k + rank)) for each retrieval method
    k is a smoothing constant (default 60 is standard)
    """
    fused_scores = {}
    for method_name, doc_list in results_dict.items():
        for rank, doc in enumerate(doc_list, start=1):
            doc_id = doc['id']
            if doc_id not in fused_scores:
                fused_scores[doc_id] = {
                    'doc': doc,
                    'rrf_score': 0
                }
            # Add contribution from this ranking
            fused_scores[doc_id]['rrf_score'] += 1 / (k + rank)
    # Sort by fused score
    sorted_results = sorted(
        fused_scores.values(),
        key=lambda x: x['rrf_score'],
        reverse=True
    )
    return sorted_results

# Example usage
bm25_results = [
    {'id': 'doc1', 'content': 'CORS error fix...', 'score': 0.95},
    {'id': 'doc2', 'content': 'Express middleware...', 'score': 0.87},
    {'id': 'doc3', 'content': 'Security headers...', 'score': 0.72},
]
vector_results = [
    {'id': 'doc4', 'content': 'Cross-origin setup...', 'score': 0.91},
    {'id': 'doc1', 'content': 'CORS error fix...', 'score': 0.89},
    {'id': 'doc5', 'content': 'API authentication...', 'score': 0.83},
]
all_results = {
    'bm25': bm25_results,
    'vector': vector_results
}
final_ranking = reciprocal_rank_fusion(all_results)
# doc1 appears in both, so it gets highest RRF score

The beauty of RRF: it doesn’t need normalized scores. BM25 scores and vector similarity scores are incomparable, but ranks are always comparable.
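
To see why a document found by both methods wins, it helps to work the arithmetic for the example rankings by hand. The snippet below is only a worked calculation using exact fractions; the document IDs and orderings are the ones from the example, and k=60 is the default discussed above.

```python
from fractions import Fraction

k = 60
rankings = {
    "bm25": ["doc1", "doc2", "doc3"],
    "vector": ["doc4", "doc1", "doc5"],
}

scores = {}
for doc_ids in rankings.values():
    for rank, doc_id in enumerate(doc_ids, start=1):
        scores[doc_id] = scores.get(doc_id, Fraction(0)) + Fraction(1, k + rank)

# doc1 is ranked 1st by BM25 and 2nd by vector search: 1/61 + 1/62
# doc4 is ranked 1st by vector search only: 1/61
for doc_id in sorted(scores, key=scores.get, reverse=True):
    print(doc_id, scores[doc_id], round(float(scores[doc_id]), 5))
```

doc1 ends up at roughly 0.0325, about twice the 0.0164 of doc4, even though doc4 was the top vector hit. The raw scores (0.95, 0.91, and so on) never enter the calculation.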

2. Weighted Score Fusion

This approach requires score normalization:

weighted_fusion.py
def normalize_scores(results):
    """Return copies of the docs with scores min-max scaled to [0, 1]."""
    if not results:
        return []
    scores = [doc['score'] for doc in results]
    min_score = min(scores)
    range_score = max(scores) - min_score
    normalized = []
    for doc in results:
        if range_score == 0:
            value = 1.0  # All scores equal: avoid division by zero
        else:
            value = (doc['score'] - min_score) / range_score
        # Copy the dict so the caller's results are not mutated
        normalized.append({**doc, 'normalized_score': value})
    return normalized

def weighted_fusion(bm25_results, vector_results, bm25_weight=0.4, vector_weight=0.6):
    """
    Combine normalized scores with configurable weights.
    Common splits:
    - 50/50: Equal importance
    - 40/60: Slight preference for semantic
    - 30/70: Strong semantic preference
    """
    # Normalize both result sets (the inputs are left unmodified)
    bm25_normalized = normalize_scores(bm25_results)
    vector_normalized = normalize_scores(vector_results)
    # Create unified doc dictionary
    all_docs = {}
    for doc in bm25_normalized:
        all_docs[doc['id']] = {
            'doc': doc,
            'bm25_score': doc['normalized_score'],
            'vector_score': 0
        }
    for doc in vector_normalized:
        if doc['id'] in all_docs:
            all_docs[doc['id']]['vector_score'] = doc['normalized_score']
        else:
            all_docs[doc['id']] = {
                'doc': doc,
                'bm25_score': 0,
                'vector_score': doc['normalized_score']
            }
    # Calculate weighted final score
    for doc_id, data in all_docs.items():
        data['final_score'] = (
            bm25_weight * data['bm25_score'] +
            vector_weight * data['vector_score']
        )
    # Sort by final score
    sorted_results = sorted(
        all_docs.values(),
        key=lambda x: x['final_score'],
        reverse=True
    )
    return sorted_results

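The problem with weighted fusion: normalization is tricky. BM25 scores are unbounded and their range shifts with the query and the corpus, while cosine similarities often cluster in a narrow band (for example 0.7-0.9 for “similar” documents, depending on the embedding model). Making these comparable requires careful tuning.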
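
A small experiment shows how fragile min-max scaling can be. The score lists below are made-up illustrative values, not measurements: one BM25 outlier at the top, and cosine similarities packed into a narrow band.

```python
def min_max(scores):
    """Scale a list of scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0
    return [(s - lo) / span for s in scores]

bm25 = [28.0, 9.0, 8.5, 8.0]        # illustrative: one outlier dominates the range
cosine = [0.90, 0.88, 0.86, 0.70]   # illustrative: tight cluster plus one weak hit

bm25_norm = min_max(bm25)
cosine_norm = min_max(cosine)

for rank, (b, c) in enumerate(zip(bm25_norm, cosine_norm), start=1):
    print(f"rank {rank}: bm25={b:.3f} cosine={c:.3f}")
```

The second-best BM25 hit is squashed to about 0.05 while the second-best vector hit stays near 0.9, so equal weights no longer mean equal influence. This sensitivity to the shape of each score distribution is one reason rank-based fusion is easier to get right.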

3. Re-ranking with Cross-Encoders

The most accurate but expensive approach:

reranker_fusion.py
from functools import lru_cache
from sentence_transformers import CrossEncoder

@lru_cache(maxsize=1)
def get_reranker(model_name='cross-encoder/ms-marco-MiniLM-L-6-v2'):
    """Load the cross-encoder once and reuse it across calls."""
    return CrossEncoder(model_name)

def rerank_results(query, combined_results, top_k=10):
    """
    Use cross-encoder to re-rank combined results.
    Slower but more accurate than RRF or weighted fusion.
    """
    if not combined_results:
        return []
    # Reuse the cached model instead of reloading it on every query
    reranker = get_reranker()
    # Prepare query-document pairs
    pairs = [[query, doc['content']] for doc in combined_results]
    # Get re-ranking scores
    scores = reranker.predict(pairs)
    # Sort by cross-encoder scores
    scored_results = sorted(
        zip(combined_results, scores),
        key=lambda x: x[1],
        reverse=True
    )
    return [doc for doc, score in scored_results[:top_k]]

Cross-encoders are slow (each query-document pair needs full transformer inference) but extremely accurate. Best for smaller result sets or when accuracy is critical.
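
One way to keep that cost bounded is to re-rank only the head of the fused list. The sketch below assumes a pluggable `score_pair(query, text)` callable; in a real system it would wrap a cross-encoder, while here a hypothetical `term_overlap` scorer stands in so the example runs without a model.

```python
def rerank_top_candidates(query, fused_docs, score_pair, max_candidates=50, top_k=10):
    """Re-rank only the first `max_candidates` fused results with `score_pair`."""
    candidates = fused_docs[:max_candidates]
    scored = [(score_pair(query, doc["content"]), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [doc for _, doc in scored[:top_k]]

# Hypothetical cheap scorer: fraction of query terms that appear in the text.
def term_overlap(query, text):
    terms = set(query.lower().split())
    return len(terms & set(text.lower().split())) / len(terms)

fused = [
    {"id": "a", "content": "express middleware basics"},
    {"id": "b", "content": "fix cors error in express"},
    {"id": "c", "content": "unrelated cooking recipe"},
]
reranked = rerank_top_candidates("fix cors error", fused, term_overlap, top_k=2)
print([doc["id"] for doc in reranked])
```

With `max_candidates` fixed, the number of expensive scoring calls per query stays constant no matter how many documents the first-stage retrievers return.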

fusion-comparison.txt
┌───────────────────────────────────────────────────────────────┐
│                  FUSION STRATEGY COMPARISON                   │
├─────────────────┬──────────────┬─────────────┬────────────────┤
│ Method          │ Speed        │ Accuracy    │ Tuning Effort  │
├─────────────────┼──────────────┼─────────────┼────────────────┤
│ RRF             │ Fast         │ Good        │ Minimal (k=60) │
│ Weighted Score  │ Fast         │ Good        │ Medium         │
│ Cross-Encoder   │ Slow         │ Excellent   │ Minimal        │
└─────────────────┴──────────────┴─────────────┴────────────────┘

After running production RAG systems for months, I’ve developed a simple decision framework:

Use Hybrid Search When:

  • Building production RAG applications (default choice)
  • Users search for error codes, product names, or specific identifiers
  • Document corpus has both technical content and conceptual explanations
  • You want both precision and recall
  • Query patterns are diverse (some semantic, some exact-match)

Use Pure Vector Search When:

  • Document corpus is homogeneous (all similar content types)
  • Queries are primarily semantic/conceptual
  • Dealing with multilingual content where exact terms don’t match
  • Speed is critical and you can tolerate some misses

Use Pure BM25 When:

  • Documents are short and keywords are sufficient
  • You need maximum interpretability for debugging
  • Computational resources are extremely limited
  • Exact term matching is all that matters (e.g., log search)

From the Reddit discussion on hybrid search, the consensus is clear:

“Logs 100%, other than that, always hybrid”

“Blending vector with fulltext searching in mysql/aurora or postgresql is absolutely a great idea”

“By default, hybrid or semantic search. Works for the majority of cases”

This aligns with my production experience. Hybrid search has become the standard for a reason—it works.

Implementation Approaches

PostgreSQL with pgvector

If you’re already using PostgreSQL, adding hybrid search is straightforward:

postgres_hybrid.py
import numpy as np
import psycopg2
from pgvector.psycopg2 import register_vector

def hybrid_search_postgres(query_text, query_embedding, top_k=10):
    """
    Hybrid search using PostgreSQL with pgvector extension.
    Combines full-text search (tsvector) with vector similarity.
    """
    conn = psycopg2.connect("postgresql://user:pass@localhost/db")
    try:
        register_vector(conn)
        cur = conn.cursor()
        # Single query combining keyword and vector search with RRF.
        # Note: ts_rank_cd is PostgreSQL's built-in full-text ranking, not true BM25.
        # Each CTE assigns its own rank first, because a window function
        # cannot be nested inside SUM(); the final SELECT then sums 1 / (60 + rank).
        cur.execute("""
            WITH keyword_results AS (
                SELECT id, content,
                       ROW_NUMBER() OVER (ORDER BY kw_score DESC) AS rnk
                FROM (
                    SELECT
                        id,
                        content,
                        ts_rank_cd(to_tsvector('english', content),
                                   plainto_tsquery('english', %s)) AS kw_score
                    FROM documents
                    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s)
                    ORDER BY kw_score DESC
                    LIMIT %s
                ) kw
            ),
            vector_results AS (
                SELECT id, content,
                       ROW_NUMBER() OVER (ORDER BY distance) AS rnk
                FROM (
                    SELECT id, content, embedding <=> %s AS distance
                    FROM documents
                    ORDER BY distance
                    LIMIT %s
                ) vec
            ),
            combined AS (
                SELECT id, content, rnk FROM keyword_results
                UNION ALL
                SELECT id, content, rnk FROM vector_results
            )
            SELECT
                id,
                content,
                SUM(1.0 / (60 + rnk)) AS rrf_score
            FROM combined
            GROUP BY id, content
            ORDER BY rrf_score DESC
            LIMIT %s;
        """, (query_text, query_text, top_k, np.array(query_embedding), top_k, top_k))
        return cur.fetchall()
    finally:
        conn.close()

The key advantage: everything runs in-database. No need to manage separate search services.
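
Running everything in-database only stays fast if both halves of the query are indexed. The statements below are a sketch that assumes the `documents` table from the query above, with a `content` text column and an `embedding` vector column; the index names are hypothetical, and the HNSW index type assumes pgvector 0.5.0 or newer.

```python
INDEX_STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS vector;",
    # Expression index: must match the to_tsvector('english', content)
    # expression used in the WHERE clause for the planner to use it.
    "CREATE INDEX IF NOT EXISTS documents_content_fts_idx "
    "ON documents USING GIN (to_tsvector('english', content));",
    # Approximate nearest-neighbour index for cosine distance (the <=> operator).
    "CREATE INDEX IF NOT EXISTS documents_embedding_hnsw_idx "
    "ON documents USING hnsw (embedding vector_cosine_ops);",
]

def create_indexes(conn):
    """Run the DDL on an open DB-API connection such as one from psycopg2."""
    with conn.cursor() as cur:
        for statement in INDEX_STATEMENTS:
            cur.execute(statement)
    conn.commit()
```

Without the GIN expression index the keyword half falls back to recomputing `to_tsvector` for every row, and without a vector index the similarity half is an exact scan of the whole table.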

Elasticsearch/OpenSearch

Elasticsearch 8.x and OpenSearch both support hybrid search natively:

elasticsearch_hybrid.py
from elasticsearch import Elasticsearch

def hybrid_search_elasticsearch(query_text, query_vector, index="documents", top_k=10):
    """
    Hybrid search using Elasticsearch RRF pipeline.
    Requires Elasticsearch 8.9+ with kNN enabled.
    """
    es = Elasticsearch(["http://localhost:9200"])
    # Elasticsearch has built-in RRF support since 8.9
    response = es.search(
        index=index,
        size=top_k,
        query={
            "bool": {
                "should": [
                    # BM25 query
                    {
                        "match": {
                            "content": {
                                "query": query_text,
                                "boost": 1.0
                            }
                        }
                    }
                ]
            }
        },
        knn={
            "field": "embedding",
            "query_vector": query_vector,
            "k": top_k,
            "num_candidates": top_k * 10
        },
        rank={
            "rrf": {
                "window_size": top_k * 2,
                "rank_constant": 60
            }
        }
    )
    return response['hits']['hits']

Elasticsearch handles the RRF fusion internally—just configure both query types and specify the rank constant.
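
Both query types need matching fields in the index mapping. The mapping below is a sketch under assumptions: the field names follow the search example, and 384 dimensions is a guess that fits MiniLM-style embedding models, so adjust it to whatever model you use.

```python
# Hypothetical index mapping; field names follow the search example above.
DOCUMENT_MAPPINGS = {
    "properties": {
        "title": {"type": "text"},
        "content": {"type": "text"},   # analyzed text, scored with BM25
        "embedding": {
            "type": "dense_vector",
            "dims": 384,               # assumption: must equal your model's output size
            "index": True,             # build the approximate kNN index
            "similarity": "cosine",
        },
    }
}

def create_documents_index(es, index="documents"):
    """Create the index if it is missing; `es` is an Elasticsearch 8.x client."""
    if not es.indices.exists(index=index):
        es.indices.create(index=index, mappings=DOCUMENT_MAPPINGS)
```

A mismatch between `dims` and the length of the vectors you send is a common cause of indexing errors, so it is worth deriving that number from the embedding model rather than hard-coding it.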

Dedicated Vector Databases (Pinecone, Weaviate, Qdrant)

Most vector databases now support hybrid search:

weaviate_hybrid.py
import weaviate

def hybrid_search_weaviate(query_text, query_vector=None, alpha=0.5, top_k=10):
    """
    Hybrid search using Weaviate (v3 Python client API).
    Alpha controls BM25/vector balance (0=BM25 only, 1=vector only).
    """
    client = weaviate.Client("http://localhost:8080")
    response = (
        client.query
        .get("Document", ["content", "title"])
        .with_hybrid(
            query=query_text,
            alpha=alpha,  # 0.5 = equal balance
            vector=query_vector  # optional, auto-generated if omitted
        )
        .with_limit(top_k)
        .do()
    )
    return response['data']['Get']['Document']

The alpha parameter controls the blend: 0 is pure BM25, 1 is pure vector, 0.5 is equal weight.
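
Conceptually, alpha is a single dial that replaces the two separate weights from weighted fusion. The function below illustrates that idea on already-normalized scores; it is not Weaviate's internal fusion code, and the sample scores are made up.

```python
def blend(bm25_norm, vector_norm, alpha=0.5):
    """alpha=0 keeps only the keyword score, alpha=1 only the vector score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return (1 - alpha) * bm25_norm + alpha * vector_norm

# A document that matches keywords weakly (0.2) but is semantically close (0.8)
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"alpha={alpha}: {blend(0.2, 0.8, alpha):.2f}")
```

Sweeping alpha on a labeled query set is a cheap way to find out whether your users lean toward exact-match or conceptual queries.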

Common Pitfalls I Encountered

1. Score Normalization Hell

When I tried weighted fusion initially, my results were terrible. BM25 scores dominated everything because they weren’t normalized. The fix:

normalization-fix.py
# WRONG: Raw scores
final = 0.5 * bm25_score + 0.5 * vector_score
# BM25 might be 0-30, vector is 0-1. BM25 dominates.
# RIGHT: Normalize first
bm25_normalized = (bm25_score - bm25_min) / (bm25_max - bm25_min)
vector_normalized = (vector_score - vector_min) / (vector_max - vector_min)
final = 0.5 * bm25_normalized + 0.5 * vector_normalized

2. Ignoring Document Length Bias

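BM25 normalizes for document length, but embeddings do not. Cosine similarity ignores vector magnitude and the embedding dimension is fixed, so length by itself does not raise the score. The real issue is that a long chunk covering several topics gets compressed into one vector, which dilutes its specific details and makes scores hard to compare across chunks of very different sizes. Consider chunking to a uniform size.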
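
Uniform chunking can be as simple as a sliding window over words. This is a minimal sketch: the `chunk_size` and `overlap` defaults are arbitrary assumptions, and splitting on whitespace ignores sentence and section boundaries that a real pipeline would probably respect.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into word windows of equal size that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of some duplicated storage.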

3. Not Tuning the RRF Constant

The default k=60 works well, but I found lower values (k=20-40) sometimes help when you want top ranks to matter more:

rrf-tuning.py
# k=60: Smooth ranking, less emphasis on top position
# k=20: Top positions have more influence
# k=10: Very sensitive to rank order
# Test different k values on your validation set.
# evaluate_retrieval and ground_truth are placeholders for your own metric and labels.
for k in [10, 20, 40, 60, 100]:
    results = reciprocal_rank_fusion(all_results, k=k)
    score = evaluate_retrieval(results, ground_truth)
    print(f"k={k}: {score}")
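
The loop above leaves `evaluate_retrieval` undefined, so here is one possible version. It is an assumption about what such a metric could look like, not the one used for the results in this article: it expects the fused list format returned by `reciprocal_rank_fusion` and a list of relevant document IDs for the query.

```python
def recall_at_k(results, relevant_ids, k=10):
    """Share of relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    top_ids = {item["doc"]["id"] for item in results[:k]}
    return len(top_ids & relevant_ids) / len(relevant_ids)

def reciprocal_rank(results, relevant_ids):
    """1 / position of the first relevant result, or 0.0 if none was retrieved."""
    for position, item in enumerate(results, start=1):
        if item["doc"]["id"] in relevant_ids:
            return 1 / position
    return 0.0

def evaluate_retrieval(results, ground_truth, k=10):
    relevant = set(ground_truth)
    return {
        "recall@k": recall_at_k(results, relevant, k),
        "reciprocal_rank": reciprocal_rank(results, relevant),
    }

fused = [{"doc": {"id": doc_id}} for doc_id in ["doc4", "doc1", "doc2"]]
metrics = evaluate_retrieval(fused, ground_truth=["doc1", "doc9"], k=2)
print(metrics)
```

Changing k in RRF only reorders the fused list, so an order-sensitive metric such as reciprocal rank, averaged over many queries, is more likely to show a difference than recall at a large cutoff.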

Advanced Pattern: Multi-Tool Retrieval Agents

For complex RAG systems, I’ve seen teams move beyond simple hybrid search to multi-tool retrieval:

multi-tool-retrieval.txt
┌─────────────────────────────────────────────────────────────┐
│                    AGENT-BASED RETRIEVAL                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│    User Query                                               │
│         │                                                   │
│         ▼                                                   │
│  ┌─────────────┐                                            │
│  │     LLM     │ ─── Analyzes query intent                  │
│  │    Agent    │                                            │
│  └─────────────┘                                            │
│         │                                                   │
│         ├──────────► BM25 Tool (exact terms)                │
│         ├──────────► Vector Tool (semantic)                 │
│         ├──────────► Graph Tool (relationships)             │
│         ├──────────► SQL Tool (structured data)             │
│         └──────────► Web Search Tool (external)             │
│                                                             │
│         ▼                                                   │
│    Synthesized Response                                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘

From the Reddit discussion:

“ReAct, single agent with tools. Each tool is a type of retrieval”

This pattern treats each retrieval method as a tool the agent can invoke based on query analysis. More complex but more flexible.
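
The dispatch part of this pattern can be sketched without any agent framework. In the snippet below every name is hypothetical: the tools are plain callables, and `keyword_planner` is a rule-based stand-in for the LLM call that would normally decide which tools to use. It performs a single selection step, not a full ReAct loop.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], List[str]]
Planner = Callable[[str, List[str]], List[str]]

def run_retrieval_agent(query: str, tools: Dict[str, Tool], choose_tools: Planner) -> Dict[str, List[str]]:
    """Ask the planner which tools to call, then collect each tool's results."""
    selected = choose_tools(query, sorted(tools))
    unknown = [name for name in selected if name not in tools]
    if unknown:
        raise KeyError(f"planner requested unknown tools: {unknown}")
    return {name: tools[name](query) for name in selected}

# Hypothetical rule-based planner standing in for the LLM call.
def keyword_planner(query: str, available: List[str]) -> List[str]:
    chosen = ["vector"]
    if any(ch.isdigit() for ch in query):
        chosen.append("bm25")  # digits hint at error codes, ports, or versions
    return [name for name in chosen if name in available]

tools = {
    "bm25": lambda q: ["keyword-hit"],
    "vector": lambda q: ["semantic-hit"],
}
exact = run_retrieval_agent("connection refused port 5432", tools, keyword_planner)
conceptual = run_retrieval_agent("what is cross-origin sharing", tools, keyword_planner)
print(exact, conceptual)
```

Validating the planner's choices against the registered tool names matters more once an LLM is in the loop, because it can ask for tools that do not exist.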

Key Takeaways

After implementing hybrid search across multiple production RAG systems:

  1. Default to hybrid for any serious RAG application. The combination of BM25 precision and vector semantic understanding covers most query patterns.

  2. Use RRF for fusion. It’s simple, requires no score normalization, and works reliably across different domains.

  3. Tune on your data. Default parameters work well, but evaluate on your specific query patterns and documents.

  4. Consider your infrastructure. PostgreSQL users can stay with a single database. Large-scale systems might benefit from dedicated search engines.

  5. Don’t over-engineer. Start with simple hybrid search. Move to multi-tool agent retrieval only if your use case demands it.

Hybrid search isn’t experimental anymore—it’s the production standard. If you’re building RAG systems today, this should be your default retrieval approach.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can reach me by email.

