Why RAG Failed for Large Codebases: The Evolution to Agentic Coding

Mar 24, 2026

Problem

When I tried using AI coding tools on my large codebase, I kept getting useless answers. I’d ask “How does authentication work?” and the AI would return isolated functions without any context about imports, middleware, or database calls.

I tried RAG-based tools. They chunked my code, embedded it, and did semantic search. But the results were garbage because code isn’t like documents.

Why RAG Fails for Code

The fundamental problem: documents are self-contained, but code files are interconnected.

When I ask about authentication, here’s what I need:

auth.ts → imports session.ts → calls middleware in server.ts → queries users table

But RAG does this:

[chunk from auth.ts] → semantic similarity matched "auth"
[chunk from utils.ts] → semantic similarity matched "auth" (false positive)

The dependency chain is completely lost. RAG can’t follow imports because it treats each chunk independently.
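
To make the failure concrete, here is a minimal sketch of chunk-level retrieval. It is not how any particular RAG tool works: I swap real embeddings for simple word overlap, and the file contents are invented for illustration. Only the file names come from my example above.

```python
import re

# Hypothetical chunk contents; one chunk per file to keep the sketch small.
CHUNKS = {
    "auth.ts": "// authentication entry point\n"
               "import { getSession } from './session'\n"
               "export function login(user) { return getSession(user) }",
    "utils.ts": "// helper to format the authentication error message\n"
                "export function formatAuthError(e) { return String(e) }",
    "session.ts": "export function getSession(user) {\n"
                  "  return db.query('select * from users where id = ?', user.id)\n"
                  "}",
    "server.ts": "app.use(requireSession)",
}


def tokens(text):
    """Lowercase alphabetic words; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z]+", text.lower()))


def top_k(query, k):
    """Score every chunk independently against the query and keep the best k."""
    query_tokens = tokens(query)
    scored = [(len(query_tokens & tokens(body)), name) for name, body in CHUNKS.items()]
    # Highest score first; ties broken by file name so the result is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


results = top_k("How does authentication work?", k=2)
print(results)
```

With these invented chunks, utils.ts ranks alongside auth.ts because it mentions the word, while session.ts never surfaces even though auth.ts imports it. Each chunk is scored on its own, so the import line carries no weight.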

The Context Window Problem

Even if retrieval worked, there’s another issue. My codebase has 150,000+ lines. The typical context window is 64k-128k tokens. I can’t fit everything.

So tools try to find the “needle in the haystack.” But for code, I often need the whole haystack: the surrounding context, the imports, the callers, the tests.
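
A rough back-of-the-envelope check of the numbers above. The figure of 10 tokens per line is my assumption, not a measurement. Real code varies a lot by language and style, so treat the result as an order of magnitude.

```python
import math

TOKENS_PER_LINE = 10  # assumption: rough average, not measured on my codebase


def estimated_tokens(lines, tokens_per_line=TOKENS_PER_LINE):
    """Estimate the token count of a codebase from its line count."""
    return lines * tokens_per_line


def windows_needed(lines, window_tokens):
    """How many full context windows it would take to hold the estimate."""
    return math.ceil(estimated_tokens(lines) / window_tokens)


codebase_lines = 150_000
print(estimated_tokens(codebase_lines))         # 1500000
print(windows_needed(codebase_lines, 128_000))  # 12
```

Even with the larger 128k window, the estimate is roughly an order of magnitude too big to fit in one prompt.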

How Cursor Solved It (Partially)

Cursor took a different approach. Instead of naive RAG, they built sophisticated indexing:

Codebase → Merkle Tree → Simhash → Cloud Index Reuse
                                      ↓
                          Incremental Updates (only changed branches)

When I start Cursor on my project:

1. Client computes a Merkle tree hash of the entire codebase
2. Server matches this against existing team indexes (92% average similarity)
3. Reuses the existing index instead of re-vectorizing everything
4. Only updated branches get re-embedded
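
The Merkle tree part of these steps can be sketched in a few lines. This is my own illustration of the general technique, not Cursor's implementation: the nested-dict tree, the function names, and the choice of SHA-256 are all assumptions, and the simhash and index-matching steps are left out.

```python
import hashlib


def merkle(node):
    """Return (hash, children) for a tree of nested dicts.

    A str is a file's contents; a dict maps names to files or directories.
    """
    if isinstance(node, str):
        return hashlib.sha256(node.encode()).hexdigest(), {}
    children = {name: merkle(child) for name, child in sorted(node.items())}
    digest = hashlib.sha256()
    for name, (child_hash, _) in children.items():
        # A directory's hash depends on every child's name and hash.
        digest.update(f"{name}:{child_hash}\n".encode())
    return digest.hexdigest(), children


def changed_files(old, new, prefix=""):
    """List paths in `new` whose hash differs from `old` (deletions are ignored)."""
    old_hash, old_children = old
    new_hash, new_children = new
    if old_hash == new_hash:
        return []  # identical subtree: nothing below needs re-embedding
    if not new_children:
        return [prefix.rstrip("/")]  # a changed or newly added file
    changed = []
    for name, child in new_children.items():
        old_child = old_children.get(name, ("", {}))
        changed += changed_files(old_child, child, prefix + name + "/")
    return changed


before = {"src": {"auth.ts": "v1", "session.ts": "v1"}, "README.md": "hello"}
after = {"src": {"auth.ts": "v2", "session.ts": "v1"}, "README.md": "hello"}
print(changed_files(merkle(before), merkle(after)))  # ['src/auth.ts']
```

The point of the structure is that an unchanged directory is skipped with a single hash comparison, so the cost of an update scales with what changed rather than with the size of the repo.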

This is clever. But Cursor still operates in the “needle in the haystack” paradigm. It’s better retrieval, but still retrieval.

Why Claude Code is Different

Claude Code took a fundamentally different approach: give the AI tools to navigate, not just search.

Here’s the comparison. The first flow is the RAG approach; the second is the agentic approach:

Query: "How does authentication work?"
     ↓
Search index for "auth" chunks
     ↓
Return top-K matching chunks
     ↓
Model generates answer from chunks

Query: "How does authentication work?"
     ↓
Agent reads auth.ts
     ↓
Agent sees import → reads session.ts
     ↓
Agent sees middleware call → reads server.ts
     ↓
Agent traces DB queries → reads schema
     ↓
Agent returns integrated answer with full context

The key difference: the agent follows the actual code structure, not semantic similarity.
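
Here is a minimal sketch of what "following the code structure" means mechanically. It is a deterministic stand-in, not Claude Code's actual behavior: a real agent decides which file to read next through model-driven tool calls, whereas this sketch just parses relative imports with a regex. The file contents and the `trace` helper are hypothetical.

```python
import re
from collections import deque

# Hypothetical in-memory repo; contents are invented for illustration.
FILES = {
    "auth.ts": "import { getSession } from './session';\n"
               "export const login = (u) => getSession(u);",
    "session.ts": "import { requireSession } from './server';\n"
                  "import { users } from './schema';",
    "server.ts": "import { users } from './schema';\n"
                 "app.use(requireSession);",
    "schema.ts": "export const users = table('users');",
    "utils.ts": "export const formatAuthError = (e) => String(e);",
}

# Matches relative imports such as: from './session'
IMPORT_RE = re.compile(r"from\s+['\"]\./([\w-]+)['\"]")


def trace(entry, files):
    """Breadth-first walk over relative imports; returns files in read order."""
    seen, order, queue = {entry}, [], deque([entry])
    while queue:
        path = queue.popleft()
        order.append(path)  # "read" the file
        for module in IMPORT_RE.findall(files[path]):
            target = module + ".ts"
            if target in files and target not in seen:
                seen.add(target)
                queue.append(target)
    return order


print(trace("auth.ts", FILES))
# ['auth.ts', 'session.ts', 'server.ts', 'schema.ts']
```

Note that utils.ts never appears: it mentions auth in a function name, but nothing in the dependency chain imports it.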

The Progressive Authorization Pattern

One thing that made me nervous about AI coding tools was trusting them with my code. Cursor requires reviewing every change. That’s exhausting.

Claude Code introduced progressive authorization:

First tool call: "Allow this file read?" → [Yes] [No] [Yes, don't ask again for reads]

Pattern: AI "earns trust" through correct actions
Result: Eventually fully autonomous operation

This mirrors how I onboard junior developers. I watch them closely at first. As they demonstrate competence, I grant more autonomy. By treating the AI the same way, Claude Code enables true hands-off operation.
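
The pattern itself is simple enough to sketch. This is my own illustration, assuming a gate that remembers "don't ask again" per class of operation; the `PermissionGate` class and the answer strings are hypothetical and say nothing about how Claude Code implements it internally.

```python
class PermissionGate:
    """Asks before each operation until a whole class of operations is approved."""

    def __init__(self, ask):
        # ask(kind, target) must return "yes", "no", or "always".
        self._ask = ask
        self._always_allowed = set()

    def allowed(self, kind, target):
        if kind in self._always_allowed:
            return True  # trust already granted for this class of operation
        answer = self._ask(kind, target)
        if answer == "always":
            self._always_allowed.add(kind)
            return True
        return answer == "yes"


# Scripted answers in place of an interactive prompt.
prompts = []


def scripted(kind, target):
    prompts.append((kind, target))
    return "always" if kind == "read" else "no"


gate = PermissionGate(scripted)
print(gate.allowed("read", "auth.ts"))     # True, after one prompt
print(gate.allowed("read", "session.ts"))  # True, no prompt this time
print(gate.allowed("write", "auth.ts"))    # False, writes still need approval
print(len(prompts))                        # 2
```

Approval is scoped to a kind of operation, so granting reads does not silently grant writes.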

Sub-Agents with Separate Context Windows

The latest evolution is parallel sub-agents. Each agent gets its own context window:

Main Agent
    ├── Sub-agent 1 (context window A): Analyze authentication flow
    ├── Sub-agent 2 (context window B): Review database schema
    └── Sub-agent 3 (context window C): Check test coverage

Result: up to 3x faster than sequential analysis, when the three tasks are independent

This solves the context window limitation differently. Instead of trying to fit everything into one window, Claude Code spawns multiple agents that each focus on a slice of the problem.
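
A minimal sketch of the fan-out shape, assuming each sub-agent is just a function with its own private context list. The `run_subagent` and `fan_out` names are hypothetical, the "work" is a placeholder, and a thread pool stands in for whatever mechanism the real tool uses.

```python
from concurrent.futures import ThreadPoolExecutor

TASKS = [
    "Analyze authentication flow",
    "Review database schema",
    "Check test coverage",
]


def run_subagent(task):
    """Run one task with a private context that is never shared."""
    context = [f"task: {task}"]          # this sub-agent's own context window
    context.append(f"notes for {task}")  # placeholder for real file reads
    # Only a short summary goes back to the main agent, not the full context.
    return {"task": task, "summary": f"done: {task}", "context_size": len(context)}


def fan_out(tasks):
    """Run all sub-agents in parallel and collect results in task order."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        return list(pool.map(run_subagent, tasks))


for result in fan_out(TASKS):
    print(result["summary"])
```

The design choice that matters is the return value: the main agent's window only has to hold three summaries, not everything each sub-agent read along the way.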

What This Means for Developers

The shift from RAG to agentic coding changes what’s possible:

| RAG-based tools | Agentic tools |
| --- | --- |
| Answer questions about code | Navigate and modify code |
| Return chunks | Follow dependencies |
| One-shot retrieval | Iterative exploration |
| Limited by index quality | Limited by agent reasoning |
| No execution capability | Run tests, git operations, builds |

I used to spend time reviewing every AI suggestion in Cursor. With Claude Code, I grant permission once for a class of operations and let it run. The mental overhead is dramatically lower.

Summary

In this post, I explained why RAG fails for large codebases. The key points:

Code has dependencies; documents don’t. Chunking destroys relationships.
Context windows can’t hold enterprise codebases.
Cursor solved this with better indexing, but still operates in retrieval mode.
Claude Code’s agentic approach lets AI navigate code iteratively, following actual dependencies.
Progressive authorization and sub-agents enable autonomous operation at scale.

The future of AI coding isn’t better search. It’s agents that can work with code, not just find it.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
