Why Does RAG Return Irrelevant Results? A Practical Guide to Fixing Retrieval Quality
Problem
My RAG system seemed to work fine during testing. But in production, users started complaining about irrelevant results. The retrieval would return chunks that were semantically related but contextually wrong.
For example, when a user asked about “model deployment costs,” the system returned chunks about “model training costs” and “model architecture”—all semantically similar, but none answering the actual question.
u/Lucky-Duck-2968 on Reddit described this well: “Retrieval looks fine until irrelevant or slightly off chunks start creeping in.”
Environment
- Python 3.11
- OpenAI embeddings (text-embedding-3-small)
- Pinecone vector database
- Basic cosine similarity retrieval
What happened?
I used the standard approach from tutorials:
```python
def retrieve(query, vector_store, k=5):
    # Plain top-k nearest-neighbour search over the embedded chunks
    results = vector_store.similarity_search(query, k=k)
    return results
```

This worked for straightforward queries. But it failed on:
- Semantically similar but contextually different: “deployment costs” vs “training costs”
- Domain-specific terminology: embeddings may not reliably treat “K8s” and “Kubernetes” as the same thing
- Multi-hop questions: Queries requiring information from multiple documents
The core issue: Vector similarity measures semantic relatedness, not relevance. u/Sea-Wedding9940 on Reddit said it plainly: “Retrieval quality and context handling make or break the whole system.”
How to solve it?
Solution 1: Add Reranking
I added a reranking step after initial retrieval:
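```python
import os

from cohere import Client

# Create the client once and read the key from the environment instead of hard-coding it
# (COHERE_API_KEY is an assumed variable name)
co = Client(os.environ["COHERE_API_KEY"])


def retrieve_with_rerank(query, vector_store, top_k=20, rerank_top_n=5):
    # Initial retrieval with a higher k, so the reranker has more candidates to choose from
    initial_results = vector_store.similarity_search(query, k=top_k)

    # Rerank with a model that reads the query and each document together
    docs = [doc.page_content for doc in initial_results]
    reranked = co.rerank(
        query=query,
        documents=docs,
        top_n=rerank_top_n,
        model="rerank-english-v2.0",
    )

    # Each result's .index points back into initial_results
    return [initial_results[r.index] for r in reranked.results]
```

Rerankers score the query and each document jointly (as a cross-encoder) instead of comparing two independently computed embeddings. That makes them much better at judging actual relevance than vector similarity alone.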
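The Cohere call above is one concrete reranker. The underlying pattern (fetch a wide candidate set, then re-score every query/document pair and keep the best few) can be sketched without any provider. This is a minimal sketch under stated assumptions: `rerank` and `keyword_overlap_score` are hypothetical names, and the keyword-overlap scorer is only a toy stand-in so the example runs offline. In practice `score_fn` would call a cross-encoder model or a rerank API.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer interface: (query, document) -> relevance score.
Scorer = Callable[[str, str], float]


def rerank(query: str, docs: Sequence[str], score_fn: Scorer,
           top_n: int = 5) -> List[Tuple[int, float]]:
    """Score each (query, doc) pair and return (original_index, score) for the top_n."""
    scored = [(i, score_fn(query, doc)) for i, doc in enumerate(docs)]
    # Highest score first; sort is stable, so ties keep the original retrieval order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def keyword_overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query terms found in the doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


candidates = [
    "model training costs depend on GPU hours",
    "model architecture overview",
    "model deployment costs include serving and autoscaling",
]
top = rerank("model deployment costs", candidates, keyword_overlap_score, top_n=2)
```

Returning indices rather than texts mirrors how the Cohere results are used above: each index maps back to the originally retrieved object, so its metadata is not lost.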
Solution 2: Document-Type-Specific Strategies
u/yafitzdev on Reddit made an important discovery: “Each doc type needed a different retrieval harness altogether.”
I created different configurations for different document types:
```python
from dataclasses import dataclass
from enum import Enum


class DocumentType(Enum):
    TECHNICAL_DOC = "technical"
    CONVERSATION = "conversation"
    STRUCTURED = "structured"
    NARRATIVE = "narrative"


@dataclass
class RetrievalConfig:
    chunk_size: int       # characters or tokens, depending on the text splitter
    chunk_overlap: int
    use_reranking: bool
    hybrid_search: bool   # combine vector search with keyword (BM25) search
    top_k_initial: int    # candidates fetched before reranking


# STRUCTURED has no entry yet; add one before routing structured documents through this table
RETRIEVAL_CONFIGS = {
    DocumentType.TECHNICAL_DOC: RetrievalConfig(
        chunk_size=512, chunk_overlap=100,
        use_reranking=True, hybrid_search=True, top_k_initial=20,
    ),
    DocumentType.CONVERSATION: RetrievalConfig(
        chunk_size=256, chunk_overlap=50,
        use_reranking=True, hybrid_search=False, top_k_initial=15,
    ),
    DocumentType.NARRATIVE: RetrievalConfig(
        chunk_size=1024, chunk_overlap=200,
        use_reranking=True, hybrid_search=False, top_k_initial=15,
    ),
}
```

Solution 3: Hybrid Search
I combined vector search with keyword search (BM25):
```python
def hybrid_search(query, vector_store, bm25_index, alpha=0.5):
    # Vector search
    vector_results = vector_store.similarity_search(query, k=10)

    # Keyword search (bm25_index is a BM25 index over the same chunks; its search() interface is assumed)
    keyword_results = bm25_index.search(query, k=10)

    # Reciprocal rank fusion (helper not shown); alpha is assumed to weight the vector list against the keyword list
    return reciprocal_rank_fusion(
        vector_results, keyword_results, alpha=alpha
    )
```

This captures exact matches and domain-specific terminology that embeddings might miss.
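The `reciprocal_rank_fusion` helper is not shown above. Standard reciprocal rank fusion (RRF) gives each document 1 / (k + rank) from every ranked list it appears in, with k commonly set to 60; because it uses only ranks, BM25 scores and cosine similarities never need to be on the same scale. The sketch below is an assumption about how the helper could work: it treats `alpha` as a weight that splits credit between the vector list and the keyword list (a weighted RRF variant) and assumes both retrievers return document IDs in rank order.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    vector_ids: Sequence[str],
    keyword_ids: Sequence[str],
    alpha: float = 0.5,
    k: int = 60,
) -> List[str]:
    """Fuse two ranked lists of document IDs with weighted reciprocal rank fusion.

    Each document receives weight / (k + rank) from every list it appears in (rank is 1-based).
    alpha weights the vector list; (1 - alpha) weights the keyword list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for weight, ranked in ((alpha, vector_ids), (1.0 - alpha, keyword_ids)):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.__getitem__, reverse=True)


vector_ranked = ["doc_training_costs", "doc_deploy_costs", "doc_architecture"]
keyword_ranked = ["doc_deploy_costs", "doc_k8s_pricing"]
fused = reciprocal_rank_fusion(vector_ranked, keyword_ranked)
```

Here the deployment-costs document appears in both lists, so it outranks the training-costs document that vector search alone placed first. Setting `alpha=1.0` reduces the result to the vector ranking.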
Solution 4: Improved Chunking
I added metadata and context headers to each chunk:
```python
def create_chunks_with_metadata(documents):
    # text_splitter, extract_section and generate_summary are helpers not shown in this post
    chunks = []
    for doc in documents:
        doc_chunks = text_splitter.split_text(doc.page_content)
        for i, chunk in enumerate(doc_chunks):
            chunks.append({
                "content": chunk,
                "metadata": {
                    **doc.metadata,
                    "chunk_index": i,
                    "section": extract_section(doc, chunk),
                    "summary": generate_summary(chunk),  # For retrieval
                },
            })
    return chunks
```

The reason
I think the key reason for irrelevant results is that tutorials present retrieval as a solved problem. They show:
Query → Embed → Vector Search → Done

But real retrieval needs:

Query → Query Analysis → Multiple Retrieval Strategies
↓
Reranking
↓
Result Filtering
↓
Context Assembly

The OP on Reddit highlighted “Handling irrelevant retrieval” as one of the hard parts that tutorials gloss over. This gap between tutorial simplicity and production reality causes most RAG failures.
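The fuller pipeline in the diagram above can be sketched end to end in plain Python. This is a minimal illustration rather than a production design: the abbreviation map, the two toy retrievers, the keyword-overlap scorer, and the `min_score` threshold are all hypothetical stand-ins (for a vector store, a BM25 index, and a reranker) so that every stage is visible and the example runs offline. The query-analysis step shows one way to approach the “K8s” vs. “Kubernetes” problem: expanding known abbreviations before retrieval.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set, Tuple


@dataclass
class Chunk:
    doc_id: str
    section: str
    text: str


# Hypothetical abbreviation map for the query-analysis step.
ABBREVIATIONS: Dict[str, str] = {"k8s": "kubernetes"}

Retriever = Callable[[str], List[Chunk]]
Scorer = Callable[[str, Chunk], float]


def _terms(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def analyze_query(query: str) -> str:
    """Query analysis: lowercase, tokenize, and append expansions of known abbreviations."""
    tokens = re.findall(r"\w+", query.lower())
    expansions = [ABBREVIATIONS[t] for t in tokens if t in ABBREVIATIONS]
    return " ".join(tokens + expansions)


def run_pipeline(query: str, retrievers: Sequence[Retriever], score_fn: Scorer,
                 top_n: int = 3, min_score: float = 0.5) -> str:
    q = analyze_query(query)
    # Multiple retrieval strategies: union of candidates, de-duplicated by (doc_id, section).
    candidates: Dict[Tuple[str, str], Chunk] = {}
    for retrieve in retrievers:
        for chunk in retrieve(q):
            candidates.setdefault((chunk.doc_id, chunk.section), chunk)
    # Reranking: score every candidate against the analyzed query, best first.
    scored = sorted(((score_fn(q, c), c) for c in candidates.values()),
                    key=lambda pair: pair[0], reverse=True)
    # Result filtering: drop weak matches, then keep at most top_n.
    kept = [c for score, c in scored if score >= min_score][:top_n]
    # Context assembly: prefix each chunk with a header naming its source.
    return "\n\n".join(f"[{c.doc_id} / {c.section}]\n{c.text}" for c in kept)


# Toy corpus, retrievers, and scorer standing in for real components.
CORPUS = [
    Chunk("costs.md", "Deployment", "Kubernetes deployment costs scale with node count."),
    Chunk("costs.md", "Training", "Training costs scale with GPU hours."),
    Chunk("arch.md", "Overview", "The model architecture uses a transformer encoder."),
]


def keyword_retriever(q: str) -> List[Chunk]:
    """Return chunks whose text shares at least one term with the query."""
    return [c for c in CORPUS if _terms(q) & _terms(c.text)]


def section_retriever(q: str) -> List[Chunk]:
    """Return chunks whose section name appears in the query."""
    return [c for c in CORPUS if c.section.lower() in _terms(q)]


def overlap_score(q: str, c: Chunk) -> float:
    """Fraction of query terms that appear in the chunk text."""
    q_terms = _terms(q)
    return len(q_terms & _terms(c.text)) / len(q_terms) if q_terms else 0.0


context = run_pipeline("K8s deployment costs", [keyword_retriever, section_retriever], overlap_score)
print(context)
```

In this run the training-costs chunk is retrieved by the keyword retriever but dropped by the score threshold, so only the deployment chunk reaches the assembled context, carrying a header that names its source document and section.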
Common Mistakes
Based on my experience and the Reddit discussion:
- Using default chunk sizes without considering document structure
- Single retrieval strategy for all content types
- No reranking—relying only on vector similarity
- Ignoring metadata—not leveraging document structure
- Testing on easy queries—not including edge cases in evaluation
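A small labeled query set makes the last mistake measurable. The sketch below computes two standard retrieval metrics, hit rate@k and mean reciprocal rank (MRR), over queries paired with the document that should come back. The `evaluate` function, the retriever interface, and the canned results are hypothetical; the point is that edge cases such as near-miss phrasings and abbreviations belong in the labeled set alongside the easy queries.

```python
from typing import Callable, Dict, List

# Hypothetical retriever interface: query -> ranked list of document IDs.
Retriever = Callable[[str], List[str]]


def evaluate(retriever: Retriever, labeled: Dict[str, str], k: int = 5) -> Dict[str, float]:
    """Hit rate@k and MRR@k over a {query: relevant_doc_id} mapping."""
    hits, rr_total = 0, 0.0
    for query, relevant_id in labeled.items():
        ranked = retriever(query)[:k]
        if relevant_id in ranked:
            hits += 1
            rr_total += 1.0 / (ranked.index(relevant_id) + 1)  # reciprocal of the 1-based rank
    n = len(labeled)
    return {"hit_rate": hits / n, "mrr": rr_total / n}


# Canned results standing in for a real retriever; the second query is an abbreviation edge case.
FAKE_RESULTS = {
    "model deployment costs": ["training_costs", "deploy_costs", "architecture"],
    "K8s pricing": ["architecture", "training_costs"],
}


def fake_retriever(query: str) -> List[str]:
    return FAKE_RESULTS.get(query, [])


labeled = {"model deployment costs": "deploy_costs", "K8s pricing": "k8s_pricing"}
metrics = evaluate(fake_retriever, labeled, k=5)
```

Running the same labeled set before and after adding reranking or hybrid search turns “it feels better” into a number you can compare.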
Summary
In this post, I showed why RAG returns irrelevant results and how to fix retrieval quality. The key point is that vector similarity alone cannot distinguish between semantically related and actually relevant information.
Fix this by implementing reranking, document-type-specific strategies, and hybrid search. Your LLM’s output quality is limited by your retrieval quality.
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.