Why Does RAG Return Irrelevant Results? A Practical Guide to Fixing Retrieval Quality

Mar 26, 2026

Problem

My RAG system seemed to work fine during testing. But in production, users started complaining about irrelevant results. The retrieval would return chunks that were semantically related but contextually wrong.

For example, when a user asked about “model deployment costs,” the system returned chunks about “model training costs” and “model architecture”—all semantically similar, but none answering the actual question.

u/Lucky-Duck-2968 on Reddit described this well: “Retrieval looks fine until irrelevant or slightly off chunks start creeping in.”

Environment

Python 3.11
OpenAI embeddings (text-embedding-3-small)
Pinecone vector database
Basic cosine similarity retrieval

What happened?

I used the standard approach from tutorials:

def retrieve(query, vector_store, k=5):
    results = vector_store.similarity_search(query, k=k)
    return results

This worked for straightforward queries. But it failed on:

Semantically similar but contextually different: “deployment costs” vs “training costs”
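Domain-specific terminology: Vector similarity may not reliably connect abbreviations or in-house jargon to their full forms (e.g., “K8s” and “Kubernetes”)
Multi-hop questions: Queries whose answer has to be assembled from several documents, which a single top-k search rarely returns together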

The core issue: Vector similarity measures semantic relatedness, not relevance. u/Sea-Wedding9940 on Reddit said it plainly: “Retrieval quality and context handling make or break the whole system.”
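To see why, it helps to look at what a basic similarity search computes. Below is a brute-force sketch of cosine-similarity top-k over in-memory vectors. It illustrates the math only: a managed store such as Pinecone uses approximate nearest-neighbour indexes instead. The 3-dimensional vectors are made-up placeholders, not real embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vec: list[float],
                    chunk_vecs: dict[str, list[float]],
                    k: int = 5) -> list[tuple[str, float]]:
    """Return the k (chunk id, score) pairs most similar to the query vector."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings
chunks = {
    "deployment-costs": [0.9, 0.1, 0.0],
    "training-costs": [0.8, 0.3, 0.0],
    "architecture": [0.1, 0.2, 0.9],
}
print(top_k_by_cosine([1.0, 0.2, 0.0], chunks, k=2))
```

With these toy vectors, both “costs” chunks land at the top with nearly identical scores. Cosine similarity only says how close two vectors are; nothing in it encodes which chunk actually answers the question.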

How to solve it?

Solution 1: Add Reranking

I added a reranking step after initial retrieval:

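import os

from cohere import Client

# Create the client once and reuse it; the key comes from the environment
co = Client(api_key=os.environ["CO_API_KEY"])

def retrieve_with_rerank(query, vector_store, top_k=20, rerank_top_n=5):
    # Stage 1: cheap vector search over a wider candidate pool
    initial_results = vector_store.similarity_search(query, k=top_k)
    if not initial_results:
        return []

    # Stage 2: the reranker scores each (query, chunk) pair jointly
    docs = [doc.page_content for doc in initial_results]

    reranked = co.rerank(
        query=query,
        documents=docs,
        top_n=rerank_top_n,
        model="rerank-english-v3.0",  # model names change; check Cohere's current list
    )

    # Map reranked positions back to the original documents
    return [initial_results[r.index] for r in reranked.results]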

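Rerankers like this read the query and each candidate chunk together (cross-attention), so they can compare query terms against chunk terms directly. That judges actual relevance much better than comparing two embeddings that were computed separately. The trade-off is cost: every candidate needs its own model pass, which is why I fetch a wide pool of 20 cheaply and rerank it down to 5.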

Solution 2: Document-Type-Specific Strategies

u/yafitzdev on Reddit made an important discovery: “Each doc type needed a different retrieval harness altogether.”

I created different configurations for different document types:

from enum import Enum
from dataclasses import dataclass

class DocumentType(Enum):
    TECHNICAL_DOC = "technical"
    CONVERSATION = "conversation"
    STRUCTURED = "structured"
    NARRATIVE = "narrative"

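@dataclass(frozen=True)
class RetrievalConfig:
    chunk_size: int      # unit depends on the text splitter (characters or tokens)
    chunk_overlap: int
    use_reranking: bool
    hybrid_search: bool  # combine vector search with keyword (BM25) search
    top_k_initial: int   # candidates fetched before reranking

# DocumentType.STRUCTURED has no entry yet; add one before routing
# structured documents through this table, or the lookup will raise KeyError
RETRIEVAL_CONFIGS = {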
    DocumentType.TECHNICAL_DOC: RetrievalConfig(
        chunk_size=512, chunk_overlap=100,
        use_reranking=True, hybrid_search=True, top_k_initial=20
    ),
    DocumentType.CONVERSATION: RetrievalConfig(
        chunk_size=256, chunk_overlap=50,
        use_reranking=True, hybrid_search=False, top_k_initial=15
    ),
    DocumentType.NARRATIVE: RetrievalConfig(
        chunk_size=1024, chunk_overlap=200,
        use_reranking=True, hybrid_search=False, top_k_initial=15
    ),
}
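To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window splitter. It assumes both values are counted in characters (the configs above don't state a unit), and `split_with_overlap` is a hypothetical helper, not a library function. Library splitters usually also try to break on paragraph or sentence boundaries, which this sketch does not.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # distance between consecutive window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# A tiny window makes the shared overlap between neighbours easy to see
print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
```

With the NARRATIVE settings (1024/200), each window would start 824 characters after the previous one, so any passage of up to 200 characters that straddles a boundary still appears whole in the next chunk.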

Solution 3: Hybrid Search

I combined vector search with keyword search (BM25):

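def hybrid_search(query, vector_store, bm25_index, alpha=0.5, k=10):
    # Dense retrieval: semantic matches from the vector store
    vector_results = vector_store.similarity_search(query, k=k)

    # Sparse retrieval: exact-term matches (bm25_index is assumed to be a
    # wrapper exposing search(query, k), not a specific library's API)
    keyword_results = bm25_index.search(query, k=k)

    # Reciprocal rank fusion; here alpha weights the vector ranking and
    # (1 - alpha) the keyword ranking. Both lists must identify documents
    # the same way (e.g. by id) so duplicates are merged.
    return reciprocal_rank_fusion(
        vector_results,
        keyword_results,
        alpha=alpha
    )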

This captures exact matches and domain-specific terminology that embeddings might miss.
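Neither `bm25_index` nor `reciprocal_rank_fusion` is defined in the snippet above, so here is a self-contained sketch of both under stated assumptions. Documents are lowercased and split on whitespace, BM25 uses the typical values k1 = 1.5 and b = 0.75, and results are identified by hashable ids (here, list positions). Plain reciprocal rank fusion scores each document as the sum of 1 / (k + rank) across rankings, with k = 60 as the usual constant. Since plain RRF has no `alpha`, this sketch assumes `alpha` weights the vector ranking and `1 - alpha` the keyword ranking. The `BM25` class is a toy, not a library API, and the vector ranking in the demo is invented.

```python
import math
from collections import Counter, defaultdict
from collections.abc import Hashable, Sequence


class BM25:
    """Minimal Okapi BM25 over lowercased, whitespace-tokenized documents."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tokens = [doc.lower().split() for doc in docs]
        self.avgdl = sum(map(len, self.tokens)) / max(len(self.tokens), 1)
        self.tf = [Counter(toks) for toks in self.tokens]
        # Document frequency: how many documents contain each term
        self.df = Counter(term for toks in self.tokens for term in set(toks))

    def score(self, query: str, i: int) -> float:
        n, dl, total = len(self.tokens), len(self.tokens[i]), 0.0
        for term in query.lower().split():
            tf = self.tf[i][term]
            if tf == 0:
                continue  # BM25 only rewards literal term matches
            df = self.df[term]
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            norm = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += idf * tf * (self.k1 + 1) / norm
        return total

    def search(self, query: str, k: int = 10) -> list[int]:
        """Indices of the top-k documents with a non-zero score, best first."""
        scores = {i: self.score(query, i) for i in range(len(self.tokens))}
        ranked = sorted((i for i in scores if scores[i] > 0),
                        key=scores.__getitem__, reverse=True)
        return ranked[:k]


def reciprocal_rank_fusion(vector_results: Sequence[Hashable],
                           keyword_results: Sequence[Hashable],
                           alpha: float = 0.5, k: int = 60) -> list[Hashable]:
    """Weighted RRF: alpha weights the vector ranking, 1 - alpha the keyword one."""
    fused: dict[Hashable, float] = defaultdict(float)
    for weight, ranking in ((alpha, vector_results), (1 - alpha, keyword_results)):
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            fused[doc_id] += weight / (k + rank)
    return sorted(fused, key=fused.__getitem__, reverse=True)


docs = [
    "Our K8s cluster autoscaling drives deployment costs",
    "Training costs depend on GPU hours",
    "Kubernetes is a container orchestration platform",
]
keyword_ranking = BM25(docs).search("k8s deployment costs")
vector_ranking = [2, 1, 0]  # pretend an embedding search ranked doc 0 last
print(reciprocal_rank_fusion(vector_ranking, keyword_ranking))
```

In the demo, the document containing the literal term “K8s” rises from last place in the pretend vector ranking to first after fusion, while the “Kubernetes” document gets no keyword score at all: BM25 only matches terms literally. Fusing ranks rather than raw scores also avoids having to put BM25 scores and cosine similarities on a common scale.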

Solution 4: Improved Chunking

I added metadata and context headers to each chunk:

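def create_chunks_with_metadata(documents, text_splitter):
    # text_splitter: any splitter exposing split_text(str) -> list[str]
    # extract_section and generate_summary are helpers not shown here
    chunks = []
    for doc in documents:
        doc_chunks = text_splitter.split_text(doc.page_content)
        for i, chunk in enumerate(doc_chunks):
            chunks.append({
                "content": chunk,
                "metadata": {
                    **doc.metadata,  # keep source-level fields (title, URL, ...)
                    "chunk_index": i,
                    "section": extract_section(doc, chunk),
                    "summary": generate_summary(chunk),  # extra text to match against
                },
            })
    return chunks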
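The code above stores the section in metadata; the context-header idea applies the same information to the text that actually gets embedded. A minimal sketch, assuming each chunk already knows its document title and section: prepend a short header before embedding, so a chunk that only says “Costs rose after we enabled autoscaling” still carries “Deployment” in its vector. `Chunk` and `with_context_header` are hypothetical names, and the header format is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_title: str
    section: str
    text: str


def with_context_header(chunk: Chunk) -> str:
    """Build the string to embed: a header naming where the chunk came from."""
    header = f"Document: {chunk.doc_title} | Section: {chunk.section}"
    return f"{header}\n\n{chunk.text}"


chunk = Chunk(doc_title="Platform Handbook", section="Deployment > Costs",
              text="Costs rose after we enabled autoscaling.")
print(with_context_header(chunk))
```

Embed the headed string, but you can keep the raw chunk text for display and for the prompt if the header would be redundant there.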

The reason

I think the key reason for irrelevant results is that tutorials present retrieval as a solved problem. They show:

Query → Embed → Vector Search → Done

But real retrieval needs:

Query
  ↓
Query Analysis
  ↓
Multiple Retrieval Strategies
  ↓
Reranking
  ↓
Result Filtering
  ↓
Context Assembly

The OP on Reddit highlighted “Handling irrelevant retrieval” as one of the hard parts that tutorials gloss over. In my experience, this gap between tutorial simplicity and production reality is behind many RAG failures.
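The stages of the real-retrieval flow can be wired together as a small skeleton. This is a sketch under several assumptions, none of them from a specific library: retrievers and the reranker are plain functions passed in, query analysis is reduced to expanding a hypothetical abbreviation table (`ABBREVIATIONS`), rankings are fused with unweighted reciprocal rank fusion, filtering uses an arbitrary `min_score` threshold, and context assembly stops at a character budget. The toy retrievers and word-overlap scorer at the bottom exist only so the example runs.

```python
from typing import Callable

Retriever = Callable[[str], list[str]]  # query -> ranked chunk texts
Scorer = Callable[[str, str], float]    # (query, chunk) -> relevance score

# Hypothetical domain glossary used for query analysis
ABBREVIATIONS = {"k8s": "kubernetes", "llm": "large language model"}


def analyze_query(query: str) -> str:
    """Query analysis: append full forms of known abbreviations."""
    extras = [ABBREVIATIONS[w] for w in query.lower().split() if w in ABBREVIATIONS]
    return " ".join([query, *extras])


def run_pipeline(query: str, retrievers: list[Retriever], scorer: Scorer,
                 min_score: float = 0.3, budget_chars: int = 2000) -> str:
    expanded = analyze_query(query)

    # Multiple retrieval strategies fused by reciprocal rank (k = 60);
    # keying by chunk text also merges duplicates across retrievers
    fused: dict[str, float] = {}
    for retrieve in retrievers:
        for rank, chunk in enumerate(retrieve(expanded), start=1):
            fused[chunk] = fused.get(chunk, 0.0) + 1 / (60 + rank)
    candidates = sorted(fused, key=fused.__getitem__, reverse=True)

    # Reranking: score each candidate against the user's original query
    scored = sorted(((c, scorer(query, c)) for c in candidates),
                    key=lambda pair: pair[1], reverse=True)

    # Result filtering: drop weak matches rather than padding the context
    kept = [c for c, s in scored if s >= min_score]

    # Context assembly: take chunks in rank order until the budget is used up
    context, used = [], 0
    for chunk in kept:
        if used + len(chunk) > budget_chars:
            break
        context.append(chunk)
        used += len(chunk)
    return "\n---\n".join(context)


def vector_search(q: str) -> list[str]:  # toy retriever with canned results
    return ["Kubernetes autoscaling raises deployment costs", "Model training costs"]


def keyword_search(q: str) -> list[str]:  # toy retriever with canned results
    return ["K8s node pricing", "Kubernetes autoscaling raises deployment costs"]


def overlap(q: str, c: str) -> float:  # toy reranker: share of query words in chunk
    words = set(q.lower().split())
    return len(words & set(c.lower().split())) / len(words)


print(run_pipeline("k8s deployment costs", [vector_search, keyword_search],
                   overlap, min_score=0.5))
```

Breaking at the first chunk that doesn't fit keeps the context in strict relevance order; skipping it and trying smaller chunks instead would pack the budget more tightly at the cost of that ordering.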

Common Mistakes

Based on my experience and the Reddit discussion:

Using default chunk sizes without considering document structure
Single retrieval strategy for all content types
No reranking—relying only on vector similarity
Ignoring metadata—not leveraging document structure
Testing on easy queries—not including edge cases in evaluation
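For the last point, even a tiny labeled set helps. Below is a minimal sketch that computes hit rate@k and mean reciprocal rank (MRR) over hand-labeled queries, assuming you can list which chunk ids are relevant for each query. `EVAL_SET` and `fake_search` are made-up placeholders; the set deliberately includes the kinds of edge cases listed above (a near-miss topic and an abbreviation).

```python
from typing import Callable

Search = Callable[[str, int], list[str]]  # (query, k) -> ranked chunk ids


def evaluate(search: Search, eval_set: dict[str, set[str]], k: int = 5) -> dict[str, float]:
    """Hit rate@k: share of queries with at least one relevant chunk in the top k.
    MRR: mean of 1 / rank of the first relevant chunk (0 when none is in the top k)."""
    if not eval_set:
        return {"hit_rate@k": 0.0, "mrr": 0.0}
    hits, rr_total = 0, 0.0
    for query, relevant in eval_set.items():
        ranked = search(query, k)[:k]
        for rank, chunk_id in enumerate(ranked, start=1):
            if chunk_id in relevant:
                hits += 1
                rr_total += 1 / rank
                break  # only the first relevant hit counts
    n = len(eval_set)
    return {"hit_rate@k": hits / n, "mrr": rr_total / n}


# Hypothetical labeled set covering edge cases, not real data
EVAL_SET = {
    "model deployment costs": {"deploy-costs"},    # near-miss topic exists
    "k8s autoscaling limits": {"kubernetes-hpa"},  # abbreviation
}


def fake_search(query: str, k: int) -> list[str]:
    """Stand-in for a real retriever: canned rankings per query."""
    canned = {
        "model deployment costs": ["training-costs", "deploy-costs"],
        "k8s autoscaling limits": ["gpu-quotas"],
    }
    return canned.get(query, [])[:k]


print(evaluate(fake_search, EVAL_SET, k=5))  # {'hit_rate@k': 0.5, 'mrr': 0.25}
```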

Summary

In this post, I showed why RAG returns irrelevant results and how to fix retrieval quality. The key point is that vector similarity alone cannot distinguish between semantically related and actually relevant information.

Fix this with reranking, document-type-specific strategies, hybrid search, and metadata-aware chunking. Your LLM’s output quality is limited by your retrieval quality.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.

Here are the most important links from this article, along with some further resources on this topic:

👨‍💻 Reddit Discussion: Most RAG tutorials are misleading
👨‍💻 Cohere Rerank API

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!