
Vectorless RAG vs Vector RAG: When to Use Each Approach

I deployed my first RAG system last year. It used vector embeddings for everything. Six months later, I watched it fail spectacularly on a simple query: “Find Q3 earnings reports for tech companies.”

The system returned random quarterly filings. Some were Q1. Some were Q2. One was an S-1 filing from a company that went public three years ago. All had cosine similarities above 0.85.

The problem? My embeddings couldn’t tell the difference between “Q3 earnings” and “quarterly report” and “SEC filing.” They all looked the same to the model.

This sent me down a rabbit hole: Do we actually need vectors for RAG? The answer surprised me.

The Vector Search Trap

When I started building RAG systems, I treated vector search as the default. Everyone was doing it:

User Query → Embedding → Vector Search → Context → LLM → Answer

Simple. Elegant. Wrong for many use cases.

The Embedding Crowding Problem

I discovered this the hard way with financial documents. My embedding model (a popular off-the-shelf choice) clustered all corporate filings together:

Query: "Apple Q3 2024 earnings"
Retrieved documents (cosine similarity):
- Apple Q3 2024 10-Q .......... 0.91
- Apple Q2 2024 10-Q .......... 0.89
- Apple Q3 2023 10-Q .......... 0.88
- Microsoft Q3 2024 10-Q ..... 0.87
- Apple S-1 filing (1980) ..... 0.84

All five results are topically related, but only one is correct. Vector similarity has no concept of a “right answer”; it only measures semantic proximity, and here the correct filing beats the wrong ones by just 0.02 to 0.07.
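To make “semantic proximity” concrete, here is a minimal Python sketch of cosine-similarity ranking. The three-dimensional vectors are made-up stand-ins for real embeddings (which typically have hundreds of dimensions), so only the mechanics carry over: every filing that points in roughly the same direction as the query scores close to 1, whether or not it is the right quarter or company.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_cosine(query_vec: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted by descending cosine similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional "embeddings" (hypothetical values, not from a real model).
docs = {
    "apple_q3_2024_10q": [0.90, 0.40, 0.10],
    "apple_q2_2024_10q": [0.90, 0.35, 0.15],
    "msft_q3_2024_10q": [0.80, 0.45, 0.20],
}
query = [0.88, 0.42, 0.12]  # stand-in for the embedded query "Apple Q3 2024 earnings"

for doc_id, score in rank_by_cosine(query, docs):
    print(f"{doc_id}: {score:.3f}")
```

All three toy documents land above 0.99; nothing in the score itself says that two of them are the wrong filing.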

The Binary Match Problem

I tried tuning the similarity threshold. This created a new problem:

Threshold 0.90: Misses relevant docs (low recall)
Threshold 0.85: Includes wrong quarters (low precision)
Threshold 0.80: Returns garbage (unusable)

Different queries need different thresholds. “Capital of France” works at 0.9. “Best restaurants in Paris” needs 0.7. There’s no universal cutoff.
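A single cutoff is just a filter on those scores. This sketch reuses the five similarities from the “Apple Q3 2024 earnings” example; the helper name `apply_threshold` is illustrative.

```python
# Cosine similarities copied from the "Apple Q3 2024 earnings" example.
results = [
    ("Apple Q3 2024 10-Q", 0.91),
    ("Apple Q2 2024 10-Q", 0.89),
    ("Apple Q3 2023 10-Q", 0.88),
    ("Microsoft Q3 2024 10-Q", 0.87),
    ("Apple S-1 filing (1980)", 0.84),
]

def apply_threshold(scored: list[tuple[str, float]], threshold: float) -> list[str]:
    """Keep only documents whose similarity meets one global cutoff."""
    return [doc for doc, score in scored if score >= threshold]

for t in (0.90, 0.85, 0.80):
    print(t, apply_threshold(results, t))
```

On this particular query a 0.90 cutoff happens to keep only the correct filing, while 0.85 already admits wrong quarters and another company. Because right and wrong are only 0.02 apart, a cutoff tuned on one query has little reason to transfer to the next.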

The Linear Fragility Problem

My RAG pipeline had four steps:

1. LLM translates prompt to search query
2. Vector database retrieves documents
3. LLM generates answer from context
4. Return answer to user

Each step can fail silently. When the LLM in step 1 misparses “Paris” as Paris, Texas instead of Paris, France, the entire chain produces wrong results. No feedback loop. No recovery.

The Vectorless Alternative

I started experimenting with BM25 and keyword-based retrieval. Not as a replacement for vectors, but as a different approach with different trade-offs.

What Vectorless RAG Actually Means

Vectorless RAG uses traditional information retrieval techniques:

  • BM25 scoring, a term frequency/inverse document frequency ranking function with document-length normalization
  • Exact matching for identifiers, codes, and structured fields
  • Regular expressions for pattern matching
  • Faceted filtering for categorical data

No embeddings. No vector databases. Just text.
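For the log case, regular expressions plus faceted filtering already cover a lot of ground. A minimal sketch, assuming hypothetical log records and a made-up `code=E####` error-code format:

```python
import re

# Hypothetical log records; the "code=E####" error-code format is an assumption.
LOGS = [
    {"service": "billing", "level": "ERROR", "msg": "payment failed code=E1042 order=A-77"},
    {"service": "billing", "level": "INFO", "msg": "payment ok order=A-78"},
    {"service": "auth", "level": "ERROR", "msg": "token expired code=E2001 user=42"},
    {"service": "billing", "level": "ERROR", "msg": "payment failed code=E1042 order=A-80"},
]

# Regular expression that extracts the error code as an exact identifier.
ERROR_CODE = re.compile(r"\bcode=(E\d{4})\b")

def facet_filter(records: list[dict], **facets) -> list[dict]:
    """Faceted filtering: keep records whose fields equal every requested value."""
    return [r for r in records if all(r.get(k) == v for k, v in facets.items())]

def exact_code_search(records: list[dict], code: str) -> list[dict]:
    """Exact match on the extracted code; "E1043" will never match "E1042"."""
    hits = []
    for r in records:
        m = ERROR_CODE.search(r["msg"])
        if m and m.group(1) == code:
            hits.append(r)
    return hits

billing_errors = facet_filter(LOGS, service="billing", level="ERROR")
print(len(billing_errors))                    # 2
print(len(exact_code_search(LOGS, "E1042")))  # 2
```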

Where It Excels

I rebuilt my financial document search with BM25:

Query: "Q3 2024 earnings Apple"
BM25 scoring:
- "Q3" matches quarter field exactly
- "2024" matches year field exactly
- "Apple" matches company name field exactly
- Combined score prioritizes exact matches
Result: Apple Q3 2024 10-Q (correct on first try)

The structured query treated “Q3” and “2024” not as similar concepts but as exact filters.
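For reference, here is a self-contained sketch of the Okapi BM25 scoring function. It simplifies the walkthrough above: it scores one flat text field rather than separate quarter, year, and company fields, the document titles are hypothetical, and `k1 = 1.5`, `b = 0.75` are commonly used defaults rather than tuned values.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphanumeric runs, and keep hyphenated IDs such as "10-q" whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

class BM25:
    """Okapi BM25 over a small in-memory corpus of plain strings."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]  # term frequency per document
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        df = Counter(t for d in self.docs for t in set(d))  # documents containing each term
        # The "+ 1" inside the log keeps IDF positive even for very common terms.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue  # terms absent from this document contribute nothing
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def search(self, query: str, top_k: int = 3) -> list[tuple[int, float]]:
        scored = [(i, self.score(query, i)) for i in range(len(self.docs))]
        return sorted(scored, key=lambda p: p[1], reverse=True)[:top_k]

# Hypothetical document titles modeled on the filings in the example.
filings = [
    "Apple Q3 2024 10-Q quarterly report",
    "Apple Q2 2024 10-Q quarterly report",
    "Apple Q3 2023 10-Q quarterly report",
    "Microsoft Q3 2024 10-Q quarterly report",
    "Apple S-1 registration statement 1980",
]
bm25 = BM25(filings)
for i, s in bm25.search("Q3 2024 earnings Apple"):
    print(f"{s:.3f}  {filings[i]}")
```

The correct filing ranks first because it is the only one matching all three query terms, while “earnings” contributes nothing because no title contains it: BM25 cannot bridge vocabulary gaps. In this flat version Microsoft Q3 2024 ranks second, since it matches two of the three terms, which is why the walkthrough treats the company as an exact field filter rather than one more keyword.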

Real-World Use Cases

From my experience and industry discussions:

Use Vectorless RAG (BM25/Keyword) when:

  1. Exact matches dominate - Logs, error codes, SKUs, identifiers
  2. Structured, predictable data - Documentation with clear hierarchies, API references
  3. Small datasets - Vector search overhead isn’t justified
  4. High precision required - Compliance documents, legal text, medical codes
  5. Real-time indexing needed - Logs streaming in that need immediate searchability

Use Vector RAG when:

  1. Semantic understanding matters - “Comfortable seating” should match “plush sofas”
  2. Natural language queries - Users ask questions, not keyword lists
  3. Cross-language retrieval - Find similar content across languages
  4. Discovery/exploration - Users don’t know exact terminology
  5. Large, diverse corpora - Semantic similarity helps navigate ambiguity

The Hybrid Approach (What Actually Works)

After months of experimentation, I landed on hybrid search. As one Reddit user put it: “Logs 100%, other than that, always hybrid.”

Why Hybrid Wins

Hybrid search combines multiple signals:

Query Pipeline:
1. LLM extracts structured filters (company, date, doc type)
2. BM25 retrieves exact matches
3. Vector search retrieves semantic matches
4. Reciprocal rank fusion combines scores
5. Agent evaluates results, reformulates if needed

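This architecture addresses both the binary match problem and the linear fragility problem: rank fusion combines ranked lists instead of relying on one global similarity threshold, and the evaluation step adds the feedback loop the linear pipeline lacked.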
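Reciprocal rank fusion scores each document as the sum of `1 / (k + rank)` across the input rankings, with `k = 60` as the commonly used constant. A minimal sketch; the document IDs and the two rankings are hypothetical:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: each doc scores sum(1 / (k + rank)), with ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical outputs of the two retrievers, best first.
bm25_ranking = ["apple_q3_2024", "apple_q3_2023", "apple_q2_2024"]
vector_ranking = ["apple_q3_2023", "apple_q2_2024", "apple_q3_2024"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Because RRF only looks at ranks, BM25 scores and cosine similarities never need to be normalized onto a common scale. It also means a document ranked 2nd and 1st outscores one ranked 1st and 3rd (here the 2023 filing beats the 2024 one), which is one reason to apply the extracted structured filters before fusion rather than rely on fusion alone.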

Implementation Pattern

Here’s the pattern I now use:

┌─────────────────────────────────────────────────────┐
│  User Query                                         │
│  "Show me Apple Q3 earnings"                        │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────┐
│  Query Understanding                                │
│  Extract:                                           │
│  - company: "Apple" (exact match)                   │
│  - quarter: "Q3" (exact match)                      │
│  - year: inferred current (2024)                    │
│  - intent: "financial report" (semantic)            │
└──────────────────────┬──────────────────────────────┘
         ┌─────────────┴─────────────────────┐
         ▼                                   ▼
┌─────────────────┐                 ┌─────────────────┐
│ BM25 Search     │                 │ Vector Search   │
│ (Exact Match)   │                 │ (Semantic)      │
│ company=Apple   │                 │ "earnings"      │
│ quarter=Q3      │                 │ "financial"     │
│ year=2024       │                 │ "report"        │
└────────┬────────┘                 └────────┬────────┘
         │                                   │
         └─────────────┬─────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────┐
│  Reciprocal Rank Fusion (k = 60)                    │
│  BM25 rank 1, vector rank 3 → 1/61 + 1/63 ≈ 0.0323  │
│  BM25 rank 2, vector rank 1 → 1/62 + 1/61 ≈ 0.0325  │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────┐
│  Result Evaluation                                  │
│  Agent checks: Does this match the query intent?    │
│  If no → Reformulate and retry                      │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────┐
│  Final Answer                                       │
└─────────────────────────────────────────────────────┘
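The query-understanding step in the diagram uses an LLM. To keep the sketch runnable without one, the version below swaps in regular expressions and a hypothetical company lookup table; `understand`, `ParsedQuery`, `KNOWN_COMPANIES`, and `STOPWORDS` are illustrative names. As in the diagram, a missing year defaults to the current year.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical lookup table and stopword list; the diagram uses an LLM here instead.
KNOWN_COMPANIES = {"apple": "Apple", "microsoft": "Microsoft"}
STOPWORDS = {"show", "me", "find", "the", "for"}
QUARTER_RE = re.compile(r"\bQ([1-4])\b", re.IGNORECASE)
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")

@dataclass
class ParsedQuery:
    company: Optional[str]  # exact-match filter
    quarter: Optional[str]  # exact-match filter
    year: int               # exact-match filter (inferred if absent)
    semantic_text: str      # leftover words for the vector-search branch

def understand(query: str, today: Optional[date] = None) -> ParsedQuery:
    today = today or date.today()
    words = query.split()
    company = next((KNOWN_COMPANIES[w.lower()] for w in words if w.lower() in KNOWN_COMPANIES), None)
    q = QUARTER_RE.search(query)
    y = YEAR_RE.search(query)
    year = int(y.group(0)) if y else today.year  # "inferred current", as in the diagram
    leftover = [
        w for w in words
        if w.lower() not in KNOWN_COMPANIES
        and w.lower() not in STOPWORDS
        and not QUARTER_RE.fullmatch(w)
        and not YEAR_RE.fullmatch(w)
    ]
    return ParsedQuery(company, f"Q{q.group(1)}" if q else None, year, " ".join(leftover))

print(understand("Show me Apple Q3 earnings", today=date(2024, 11, 1)))
```

Defaulting to the calendar year is a simplification: early in a year, “Q3” may well mean the previous year's third quarter, so a production parser would need a rule for that.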

Agentic Resilience

The key improvement: loops instead of linear pipelines.

Classic RAG:
Query → Search → Retrieve → Answer → Done (or fail)
Agentic RAG:
Query → Search → Evaluate → [Good? Return] / [Bad? Reformulate → Search]

When retrieval fails, the agent can see why and try again. This simple change dramatically improved my system’s reliability.
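The loop can be sketched independently of any particular LLM. In this minimal sketch, `search`, `is_good`, and `reformulate` are hypothetical pluggable callables; in a real agent the last two would typically be LLM calls, while the demo uses trivial stand-ins (keyword search over a two-document index, “non-empty means good”, and “drop the last term”).

```python
from typing import Callable

def agentic_retrieve(
    query: str,
    search: Callable[[str], list[str]],
    is_good: Callable[[str, list[str]], bool],
    reformulate: Callable[[str, list[str]], str],
    max_attempts: int = 3,
) -> tuple[list[str], list[str]]:
    """Search, evaluate, and reformulate until results pass or attempts run out.

    Returns the last results plus every query tried, which is useful for monitoring.
    """
    tried: list[str] = []
    results: list[str] = []
    current = query
    for _ in range(max_attempts):
        tried.append(current)
        results = search(current)
        if is_good(current, results):
            return results, tried
        # Feed the rejected results back so the next query can change course.
        current = reformulate(current, results)
    return results, tried

# Trivial stand-ins for demonstration only.
INDEX = ["apple q3 2024 10-q", "apple q2 2024 10-q"]

def keyword_search(q: str) -> list[str]:
    return [d for d in INDEX if all(w in d.split() for w in q.lower().split())]

def non_empty(q: str, results: list[str]) -> bool:
    return len(results) > 0

def drop_last_term(q: str, results: list[str]) -> str:
    return " ".join(q.split()[:-1])

results, tried = agentic_retrieve("apple q3 2024 earnings", keyword_search, non_empty, drop_last_term)
print(tried)    # ['apple q3 2024 earnings', 'apple q3 2024']
print(results)  # ['apple q3 2024 10-q']
```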

Common Mistakes I Made

Mistake 1: Starting with Vectors

I treated vector search as the default. Wrong.

Build structured query understanding first. Add vectors as a fallback. Vectors are one similarity measure among many, not the foundation.

Mistake 2: Single Similarity Score

I used one cosine similarity threshold for all queries. Different queries need different cutoffs. Different attributes need different similarity measures.

Company name → Near-exact match (Levenshtein distance ≤ 1)
Quarter → Exact match (must be one of Q1-Q4)
Year → Exact match (four-digit year)
Document type → Semantic match (vector similarity)
Topic → Semantic match (vector similarity)
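A sketch of per-attribute matchers for the structured rules above, assuming string-valued fields. `levenshtein` is the standard dynamic-programming edit distance; the other function names are illustrative, and the semantic attributes are left to a vector search that is not shown.

```python
import re

def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions (DP, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]

def company_matches(query_value: str, doc_value: str) -> bool:
    """Near-exact: tolerate one typo, case-insensitively."""
    return levenshtein(query_value.lower(), doc_value.lower()) <= 1

def quarter_matches(query_value: str, doc_value: str) -> bool:
    """Exact: the value must be one of Q1-Q4 and equal the document's quarter."""
    q = query_value.upper()
    return q in {"Q1", "Q2", "Q3", "Q4"} and q == doc_value.upper()

def year_matches(query_value: str, doc_value: str) -> bool:
    """Exact: a four-digit year that equals the document's year."""
    return re.fullmatch(r"\d{4}", query_value) is not None and query_value == doc_value

print(company_matches("Aple", "Apple"), quarter_matches("q3", "Q3"), year_matches("2024", "2023"))
# True True False
```

Plain Levenshtein counts a transposition as two edits, so “Appel” would not match “Apple” at distance ≤ 1 while “Aple” would; Damerau-Levenshtein counts an adjacent transposition as one edit if that matters for your data.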

Mistake 3: Ignoring Domain Specificity

I used off-the-shelf embeddings without evaluating fit. Financial data, medical terminology, and legal documents require domain-aware similarity.

My embeddings couldn’t distinguish between an S-1 and a 10-Q. They were trained on general text, not SEC filings.

Mistake 4: Linear RAG Pipelines

No feedback loops. When retrieval failed, the agent couldn’t see why or try again.

Build evaluation and reformulation into the loop. Let the agent examine results and decide: “Is this what I expected?”

Mistake 5: No Diversity

I returned the top 10 most similar documents. They were often nearly identical. Users need diverse perspectives, not 10 variations of the same result.

Bad: 10 documents all about Apple Q3 2024 iPhone sales
Good: Mix of iPhone, Mac, Services, and comparative analysis
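One standard technique for this is maximal marginal relevance (MMR), which picks each next result by trading its relevance against its similarity to results already selected. The article doesn't say which diversification method was used; this is a minimal sketch with hypothetical relevance scores and a toy topic-based similarity function.

```python
from typing import Callable

def mmr(
    relevance: dict[str, float],
    similarity: Callable[[str, str], float],
    k: int,
    lam: float = 0.7,
) -> list[str]:
    """Maximal marginal relevance.

    Each step picks the candidate maximizing
    lam * relevance(doc) - (1 - lam) * max similarity(doc, already selected).
    lam = 1.0 is pure relevance ranking; lower values favor diversity.
    """
    selected: list[str] = []
    candidates = list(relevance)  # a list (not a set) so ties break deterministically

    def marginal(doc: str) -> float:
        redundancy = max((similarity(doc, s) for s in selected), default=0.0)
        return lam * relevance[doc] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical relevance scores and a toy "same topic = similar" measure.
relevance = {"iphone_a": 0.95, "iphone_b": 0.94, "iphone_c": 0.93, "mac": 0.80, "services": 0.78}
topic = {"iphone_a": "iPhone", "iphone_b": "iPhone", "iphone_c": "iPhone", "mac": "Mac", "services": "Services"}

def same_topic(a: str, b: str) -> float:
    return 1.0 if topic[a] == topic[b] else 0.2

print(mmr(relevance, same_topic, k=3))  # ['iphone_a', 'mac', 'services']
```

Plain top-3 by relevance would return the three iPhone documents; with `lam = 0.7`, MMR returns one iPhone, one Mac, and one Services document. Lowering `lam` pushes harder toward diversity.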

Practical Recommendations

After all this experimentation, here’s what I recommend:

  1. Start with query understanding - Extract structured filters before any retrieval
  2. Use BM25 as the baseline - It’s fast, interpretable, and handles exact term matches reliably
  3. Add vectors for semantic queries - When users ask natural language questions
  4. Combine with reciprocal rank fusion - Balance precision and recall
  5. Build evaluation loops - Let agents check results and retry
  6. Monitor failure modes - When does each approach fail?

For logs and metrics: Vectorless all the way. Exact matches, structured data, real-time indexing.

For user-facing search: Hybrid always. Balance precision (BM25) with recall (vectors).

For agentic RAG: Multiple signals with loops. Agents need diverse inputs to make good decisions.

I learned most of this from painful trial and error, but these resources helped:

  • Doug Turnbull’s analysis on RAG without embeddings opened my eyes to putting query understanding first
  • Reddit discussions on LocalLLaMA provided real-world production experiences
  • PageIndex demonstrates a working vectorless RAG implementation

The key insight: RAG isn’t about replacing search with vectors. It’s about using the right retrieval method for each query component.

Vectors are a tool, not a solution. Sometimes the best tool is no vectors at all.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.

Here are the most important links from this article, along with some further resources that will help you in this area:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
