
Why Does AI Coding Productivity Drop in Large Codebases?

Problem

I’ve been using AI coding assistants extensively over the past year, and I noticed something troubling: my productivity gains weren’t consistent. In greenfield projects, I felt like a superhero—coding 10x faster than before. But as projects grew, that superpower faded. In mid-sized codebases, I was down to 2-3x productivity. In large legacy systems, AI barely helped at all.

This wasn’t just my imagination. A Reddit thread confirmed I wasn’t alone:

“Claude has been great for spinning things up quickly. 10x improvement at least. When codebase gets to mid-size territory, productivity drops to 2-3x. In large codebases, pretty much forget it.”

Why does this happen? And more importantly, what can we do about it?

The Context Window Bottleneck

The root cause isn’t model capability—it’s context management. Large language models have finite context windows, but more importantly, they lack the mental model that developers build over time.

Let me break down the math:

Context Window Math
Small Project: 10,000 lines (~250K tokens)
├─ At or near the limit of the context window
└─ Productivity: 10x
Mid-sized: 100,000 lines (~2.5M tokens)
├─ Exceeds context by 12x
└─ Productivity: 2-3x
Large Enterprise: 1,000,000+ lines (~25M+ tokens)
├─ Exceeds context by 100x+
└─ Productivity: 1-1.5x (near parity)
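The arithmetic behind these figures can be sketched in a few lines of Python. This is a rough estimate, not a measurement: the 25 tokens per line is simply what "10,000 lines (~250K tokens)" implies, and the 200K-token window is an assumption chosen because it reproduces the 12x figure. Real windows vary by model, and real code rarely tokenizes this evenly.

```python
# Rough estimate only: both constants are assumptions, not measurements.
TOKENS_PER_LINE = 25       # implied by "10,000 lines (~250K tokens)"
CONTEXT_WINDOW = 200_000   # assumed window size; varies by model


def context_overflow(lines: int,
                     tokens_per_line: int = TOKENS_PER_LINE,
                     window: int = CONTEXT_WINDOW) -> tuple[int, float]:
    """Return (estimated tokens, multiple of the context window)."""
    tokens = lines * tokens_per_line
    return tokens, tokens / window


if __name__ == "__main__":
    sizes = [("small", 10_000), ("mid-sized", 100_000), ("large", 1_000_000)]
    for label, lines in sizes:
        tokens, ratio = context_overflow(lines)
        print(f"{label}: ~{tokens:,} tokens, {ratio:.2f}x the window")
```

Under these assumptions even the small project sits slightly above a single window, so "fits" should be read loosely. The order-of-magnitude gap for the larger tiers is the part that matters.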

The numbers are stark. But the real problem goes deeper than token limits.

What Actually Breaks Down

When I tried to work on a large codebase with AI assistance, I saw three specific failures:

1. Architecture Blindness

The AI doesn’t understand the architectural decisions made months ago. It suggests patterns that contradict established conventions. One Reddit user put it well:

“System is huge, logic is complex, CC begins struggling with simple things, missing old, new, and deprecated approaches”

I experienced this firsthand. I asked the AI to add a feature, and it proposed using a library we had already deprecated in favor of a custom implementation. It had no way of knowing—we never documented that decision.

2. Mental Model Deficit

Here’s the uncomfortable truth:

“If you’ve generated the whole codebase, your brain has no idea how anything works”

When AI generates code, you lose the understanding that comes from writing it yourself. This creates a vicious cycle: you rely more on AI, understand less, and the AI’s suggestions become less useful because you can’t validate them.

3. Context Dilution

Even when you provide context, the AI gets overwhelmed. It starts mixing approaches, forgetting constraints, and generating code that “works” but breaks subtle invariants.

Strategies That Actually Work

After experimenting with different approaches, here’s what helped me regain productivity:

Work Within Bounded Contexts

Instead of pointing AI at the entire codebase, I narrow its scope. One commenter noted:

“You can work on a large codebase but if you’re doing this as part of a large team, I actually think it works well because every team focuses only on a limited part”

I create focused sessions where I only provide files relevant to the specific feature. This dramatically improves accuracy.
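A focused session like this can be approximated with a small helper that collects only the files under a feature's directories and stops at a token budget. This is a minimal sketch, not a tool I am claiming any assistant ships with: the function name `select_context_files`, the directory layout, and the 4-characters-per-token heuristic are all assumptions for illustration.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # crude heuristic (assumption); real tokenizers differ


def select_context_files(root, feature_dirs, budget_tokens, suffixes=(".py",)):
    """Pick source files under the feature's directories within a token budget.

    Returns (list of paths relative to root, estimated tokens used).
    """
    root = Path(root)
    selected, used = [], 0
    for rel_dir in feature_dirs:
        base = root / rel_dir
        if not base.is_dir():
            continue  # ignore directories that do not exist
        for path in sorted(base.rglob("*")):
            if not path.is_file() or path.suffix not in suffixes:
                continue
            text = path.read_text(encoding="utf-8", errors="ignore")
            cost = len(text) // CHARS_PER_TOKEN + 1
            if used + cost > budget_tokens:
                continue  # skip files that would exceed the budget
            selected.append(path.relative_to(root))
            used += cost
    return selected, used
```

Calling `select_context_files("my-repo", ["billing"], budget_tokens=50_000)` (hypothetical paths) would return only files under `billing/`, leaving the rest of the repository out of the session.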

Context Engineering

I use .claudeignore to exclude irrelevant files and pre-filter context. I also:

  • Provide architectural decision records (ADRs) as context
  • Include pattern documentation upfront
  • Reference specific files rather than asking broad questions
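The ordering in the list above (decisions first, then patterns, then specific files) can be made explicit when assembling a prompt. The sketch below is one possible way to do it, assuming the documents are already loaded as name-to-text dictionaries. `build_context` and the section headings are hypothetical names, not part of any assistant's API.

```python
def build_context(task, adrs, patterns, files):
    """Assemble a prompt: decisions first, then patterns, then specific files.

    Each of adrs, patterns, and files maps a document name to its text.
    Empty groups are skipped so the prompt stays short.
    """
    groups = (
        ("Architecture decisions", adrs),
        ("Patterns and conventions", patterns),
        ("Relevant files", files),
    )
    sections = []
    for heading, docs in groups:
        if not docs:
            continue
        body = "\n\n".join(
            f"--- {name} ---\n{text.strip()}" for name, text in docs.items()
        )
        sections.append(f"## {heading}\n{body}")
    # The task goes last so it is read with all constraints already stated.
    sections.append(f"## Task\n{task.strip()}")
    return "\n\n".join(sections)
```

Putting the task last is a design choice rather than a rule. The point is only that constraints are stated before the request, not discovered after a wrong suggestion.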

Incremental Ownership

I no longer let AI generate everything. Instead:

  1. I write the core logic myself
  2. AI generates boilerplate
  3. I review and understand each piece
  4. I document as I go

This maintains my mental model while still leveraging AI for tedious work.

Documentation as Context

I treat documentation as a first-class context tool:

Documentation Stack for AI Context
Architecture Decision Records (ADRs)
├─ Why we chose X over Y
├─ What patterns we use where
└─ What's deprecated and why
API Contracts
├─ Input/output schemas
├─ Error handling patterns
└─ Authentication flows
Pattern Library
├─ Common code patterns
├─ Domain-specific conventions
└─ Testing patterns
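To make the ADR layer concrete, here is a minimal sketch of a decision record that captures the three things listed above: what was chosen, why, and what is deprecated. The `DecisionRecord` class and its field names are my own illustration, not a standard ADR format, and the example values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionRecord:
    title: str
    decision: str    # what we chose
    rationale: str   # why we chose X over Y
    deprecated: list[str] = field(default_factory=list)  # what to stop using

    def render(self) -> str:
        """Render the record as short plain text suitable for pasting as context."""
        lines = [
            f"ADR: {self.title}",
            f"Decision: {self.decision}",
            f"Why: {self.rationale}",
        ]
        if self.deprecated:
            lines.append("Deprecated: " + ", ".join(self.deprecated))
        return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    adr = DecisionRecord(
        title="HTTP client",
        decision="Use the in-house client",
        rationale="Needed custom retry behavior",
        deprecated=["old-http-lib"],
    )
    print(adr.render())
```

A record like this would have prevented the deprecated-library suggestion described earlier, because the "Deprecated" line states exactly what the AI otherwise has no way of knowing.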

The Practical Trade-off

I’ve learned to set realistic expectations:

Codebase Size     AI Productivity   Best Strategy
<10K lines        10x               Full AI assistance
10-100K lines     3-5x              Bounded contexts + documentation
>100K lines       1.5-2x            AI for boilerplate only

The key insight: context management is a skill. The better the context I provide, the more useful the AI becomes.

Summary

In this post, I explored why AI coding productivity drops in large codebases and what to do about it. The key point is that context management is the bottleneck, not model capability. By working within bounded contexts, engineering context carefully, and maintaining ownership of core logic, you can extend AI’s usefulness well beyond greenfield projects.

The productivity drop is real, but its size is not fixed. With the right strategies, you can hold on to much more of the early gains as your codebase grows, even if the 10x of a greenfield project stays out of reach.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

