Why AI Coding Agents Burn Tokens: The Context Re-Reading Loop

Mar 10, 2026

I tracked my Claude Code usage for a month. The numbers shocked me: 100 million tokens consumed. But here’s what really blew my mind - 99.4% of those were INPUT tokens. Claude was reading far more than it was writing.

For every token written, roughly 166 tokens were read (99.4M ÷ 0.6M ≈ 165.7). Rounded down, that’s a ratio of about 165:1.

Every time Claude Code makes a move - reads a file, runs a command, edits code - it needs the full picture fed back in. It’s not building understanding over time. It’s: forget everything, re-read everything, write 50 lines, forget everything again.

The bottleneck isn’t inference speed. It’s the re-reading loop.

What’s Really Happening

AI coding agents like Claude Code lack persistent memory. Every action requires re-reading all relevant context because the model has no way to “remember” what it learned in previous turns without it being present in the current context window.

This architecture means:

99.4% of tokens are input (reading) vs 0.6% output (writing)
Massive token consumption for seemingly simple tasks
Costs scale linearly with context size, not complexity
Efficiency gains require fundamentally different architecture

The Re-Reading Loop Problem

Here’s how AI coding agents actually work:

1. User makes a request
2. Agent reads relevant files (tokens consumed)
3. Agent understands the codebase (temporarily, in context)
4. Agent makes changes (output tokens - minimal)
5. Context is reset (no persistent memory)
6. Next request - Repeat from step 2

The inefficiency compounds quickly:

Request #1:    Read 50 files → Understand → Edit 1 file
Request #2:    Read 50 files AGAIN → Understand AGAIN → Edit 1 file
Request #3:    Read 50 files AGAIN → Understand AGAIN → Edit 1 file
...
Request #1,289: Read 50 files AGAIN → ...
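To get a feel for how quickly this adds up, here is a minimal Python sketch of the loop as described above. The 50 files and 1,289 requests come from the listing; the 1,500 tokens per file and 500 output tokens per request are made-up averages, and simulate_rereading is a hypothetical helper name, not anything Claude Code exposes.

```python
def simulate_rereading(requests: int, files_per_request: int,
                       tokens_per_file: int, output_tokens_per_request: int):
    """Total input and output tokens when every request re-reads the same files."""
    input_total = 0
    output_total = 0
    for _ in range(requests):
        # Nothing is carried over, so the full set of files is read again.
        input_total += files_per_request * tokens_per_file
        output_total += output_tokens_per_request
    return input_total, output_total


# 1,500 tokens per file and 500 output tokens per request are assumptions.
input_total, output_total = simulate_rereading(
    requests=1_289,
    files_per_request=50,
    tokens_per_file=1_500,
    output_tokens_per_request=500,
)
print(f"Input tokens:  {input_total:,}")
print(f"Output tokens: {output_total:,}")
print(f"Read-to-write ratio: {input_total / output_total:.0f}:1")
```

With these assumed sizes the loop reads about 96.7M tokens and writes about 0.64M, a ratio of 150:1. The point is the shape of the curve, not the exact figures: input grows with every request even when the files never change.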

Real Numbers From My Tracking

100,000,000 total tokens
99,400,000 input tokens (reading)
600,000 output tokens (writing)
Ratio: ~165:1 (read to write)
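The split is easy to double-check in a few lines of Python. The three totals are the ones listed above; the variable names are only for illustration.

```python
# Totals as reported above.
total_tokens = 100_000_000
input_tokens = 99_400_000
output_tokens = 600_000

input_share = input_tokens / total_tokens        # fraction of all tokens that were read
read_write_ratio = input_tokens / output_tokens  # tokens read per token written

print(f"Input share: {input_share:.1%}")
print(f"Read-to-write ratio: {read_write_ratio:.1f}:1")
```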

What This Means for Costs

If you pay $3 per million input tokens and $15 per million output tokens:

Reading cost:  $298.20 (99.4M tokens × $3/M)
Writing cost:  $9.00 (0.6M tokens × $15/M)
Total:         ~$307 for what feels like "simple coding tasks"
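The same arithmetic as a small Python sketch, in case you want to plug in your own totals. The $3 and $15 per-million rates are the ones used in the breakdown above, not a statement about current pricing, and cost_usd is a helper name made up for this example.

```python
INPUT_PRICE_PER_M = 3.00    # USD per million input tokens (rate used above)
OUTPUT_PRICE_PER_M = 15.00  # USD per million output tokens (rate used above)


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost in USD for a token count at a given per-million-token price."""
    return tokens / 1_000_000 * price_per_million


reading_cost = cost_usd(99_400_000, INPUT_PRICE_PER_M)
writing_cost = cost_usd(600_000, OUTPUT_PRICE_PER_M)
total_cost = reading_cost + writing_cost

print(f"Reading: ${reading_cost:,.2f}")
print(f"Writing: ${writing_cost:,.2f}")
print(f"Total:   ${total_cost:,.2f}")
```

Even with output priced at five times the input rate, reading dominates the bill because there is so much more of it.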

Human vs AI: The Memory Gap

The difference is stark. Here’s how humans work:

Day 1:   Learn the codebase → Build mental model
Day 2:   Remember key patterns → Apply them
Day 30:  Deep understanding → Fast iterations
Day 90:  Expert-level → Minimal re-learning needed

And here’s how AI agents work:

Request 1:     Read codebase → No persistent memory
Request 2:     Read codebase again → No persistent memory
Request 100:   Read codebase again → No persistent memory
Request 1,000: Read codebase again → Still no memory

The fundamental difference:

Humans build compressed, evolving mental models
AI agents have no persistent project memory
Every request is essentially “day one” for the AI

It’s like hiring a contractor who forgets your house layout every time they leave the room. Every task requires a full walkthrough of the entire property again.

Why 99.4% Input Tokens

The technical explanation comes down to four key factors:

Context windows are finite - Claude Code has ~200k tokens
No persistent storage - Understanding dies with each turn
Compression is lossy - Summaries lose critical details
Re-reading is mandatory - Can’t skip files you “should” know
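A rough way to see the first factor in practice is to estimate whether a set of files even fits in the window. The sketch below assumes about 4 characters per token, which is only a rule of thumb (real tokenizers vary with language and content), and the 8,000-token reserve for the reply is an arbitrary choice. The function names are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 200_000  # approximate window size mentioned above
CHARS_PER_TOKEN = 4              # rough rule of thumb, an assumption


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


def fits_in_window(file_contents: list[str], reserve_for_output: int = 8_000) -> bool:
    """True if the files plus a reserve for the reply fit in the window."""
    needed = sum(estimate_tokens(text) for text in file_contents)
    return needed + reserve_for_output <= CONTEXT_WINDOW_TOKENS


small_project = ["x" * 4_000] * 10    # 10 files of about 1,000 tokens each
large_project = ["x" * 40_000] * 50   # 50 files of about 10,000 tokens each

print(fits_in_window(small_project))  # True
print(fits_in_window(large_project))  # False
```

Once a project no longer fits, the agent has to pick which files to load for each request, and whatever is left out has to be read again later if it turns out to matter.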

The Compounding Problem

Small project (10 files):
  - Each request: Read ~10 files
  - Manageable token usage

Medium project (100 files):
  - Each request: Read ~20-50 relevant files
  - Token usage explodes

Large project (1000+ files):
  - Each request: Navigate massive codebase
  - Constant re-reading of core modules
  - Token costs become prohibitive

Why output is so small:

Writing code is actually efficient
Most changes are small (add function, fix bug, refactor module)
The inefficiency is entirely on the input side

Current Workarounds (And Why They’re Limited)

Workaround 1: CLAUDE.md Files

Store project context in markdown. The AI reads this instead of re-reading all files.

Limitation: Static, must be manually updated.
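One way to soften the manual-update problem is to at least detect when the file has gone stale. The following is a minimal sketch, not a Claude Code feature: stale_sources is a hypothetical helper that lists source files modified after CLAUDE.md was last saved, and the "*.py" pattern is just an example.

```python
from pathlib import Path


def stale_sources(project_root: Path, context_name: str = "CLAUDE.md",
                  pattern: str = "*.py") -> list[Path]:
    """Return source files modified more recently than the context file."""
    context_path = project_root / context_name
    sources = sorted(project_root.rglob(pattern))
    if not context_path.exists():
        # No context file yet: every source file counts as undocumented.
        return sources
    context_mtime = context_path.stat().st_mtime
    return [path for path in sources if path.stat().st_mtime > context_mtime]


# Example: stale_sources(Path(".")) lists files changed since CLAUDE.md was saved.
```

This only tells you that something changed, not what to write. Keeping the summary accurate is still a manual job.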

Workaround 2: Memory Files

The agent writes notes about the project. Next session reads the notes.

Limitation: Not compressed understanding, just more text to read.
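Mechanically, a memory file is nothing more than append and read-back. Here is a minimal sketch with hypothetical helper names and a one-note-per-line format chosen for simplicity; it is not how any particular agent stores its notes.

```python
from pathlib import Path


def append_note(memory_file: Path, note: str) -> None:
    """Append one single-line note to the end of the memory file."""
    with memory_file.open("a", encoding="utf-8") as f:
        f.write(note.strip() + "\n")


def load_notes(memory_file: Path) -> list[str]:
    """Return all saved notes, or an empty list if no file exists yet."""
    if not memory_file.exists():
        return []
    return memory_file.read_text(encoding="utf-8").splitlines()
```

Everything load_notes returns still has to be placed into the context at the start of the next session, so every saved note is paid for again as input tokens each time it is read.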

Workaround 3: Context Compaction

Summarize old context to fit in window.

Limitation: Loses nuance, “why” decisions fade.
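The trade-off is easy to see in a toy version. In this sketch, compact keeps the most recent turns verbatim and collapses everything older into a single summary line. The summarizer here is a stand-in that just truncates each older turn; real compaction would use a model-written summary, and the function name and parameters are assumptions for illustration.

```python
def compact(turns: list[str], keep_recent: int = 3, summary_chars: int = 80) -> list[str]:
    """Collapse older turns into one summary line and keep recent turns verbatim."""
    if len(turns) <= keep_recent:
        return list(turns)
    split = len(turns) - keep_recent
    older, recent = turns[:split], turns[split:]
    # Stand-in summarizer: keep only the first characters of each older turn.
    summary = " | ".join(turn[:summary_chars] for turn in older)
    return [f"[summary of {len(older)} earlier turns] {summary}"] + recent


history = [f"turn {i}: " + "details " * 40 for i in range(10)]
compacted = compact(history, keep_recent=3, summary_chars=20)

print(len(history), "turns before,", len(compacted), "entries after")
print(sum(map(len, history)), "chars before,", sum(map(len, compacted)), "chars after")
```

Whatever falls outside the kept prefix of each older turn is gone for good, which is exactly where the reasoning behind earlier decisions tends to live.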

The Fundamental Issue

All workarounds are still “reading” solutions. They don’t create true persistent memory - they just reduce what needs to be re-read.

Implications for the Future

What Needs to Change

Persistent project memory (not just saved files)
Compressed, evolving understanding
Efficient retrieval without re-reading everything

When This Changes

Token costs could drop 10-100x
AI coding becomes viable for large enterprises
True “AI pair programmer” experience

Current Trajectory

Models getting larger context windows (helps but doesn’t solve)
Better retrieval systems (RAG, vector databases)
Specialized coding models (but still no persistent memory)
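To make the retrieval idea concrete, here is a deliberately simple sketch: rank files by how many words they share with the request and load only the top few. This is plain keyword overlap, not embeddings or a vector database, and the file names, contents, and function names are all invented for the example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set, keeping underscores so identifiers stay intact."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def top_k_files(request: str, files: dict[str, str], k: int = 2) -> list[str]:
    """Names of the k files sharing the most words with the request."""
    query = tokenize(request)
    scored = [(len(query & tokenize(body)), name) for name, body in files.items()]
    # Highest overlap first; ties broken alphabetically for stable output.
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [name for score, name in ranked[:k] if score > 0]


# Hypothetical project contents.
project = {
    "auth.py": "def login(user, password): check password hash",
    "billing.py": "def charge(card, amount): create invoice",
    "utils.py": "def slugify(text): return text",
}

print(top_k_files("fix the password check in login", project, k=1))  # ['auth.py']
```

Retrieval like this cuts how much gets read per request, but the chosen files are still read in full every time. It narrows the loop rather than removing it.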

Frequently Asked Questions

Why do AI coding agents use so many tokens?

AI coding agents lack persistent memory, requiring complete context re-reading on every turn. This means 99%+ of tokens are input (reading) rather than output (writing).

Is Claude Code inefficient with tokens?

Not inefficient in design - it’s an architectural limitation. Without persistent memory, re-reading context is the only way to maintain understanding across turns.

Will this problem be solved soon?

Context windows are growing (now 200k+ tokens), but true persistent memory requires fundamental architectural changes. Expect incremental improvements, not immediate solutions.

How can I reduce token usage with Claude Code?

Keep projects small and focused
Use CLAUDE.md files for static context
End sessions at natural breakpoints
Don’t expect the AI to “remember” between sessions

Summary

The 100M token tracking experiment revealed a fundamental truth about AI coding agents: they’re not inefficient in their design - they’re working within an architectural constraint. Without persistent memory, the re-reading loop is inevitable.

The 165:1 read-to-write ratio isn’t a bug. It’s the cost of an agent that starts fresh on every turn. Understanding this helps you:

Predict costs accurately - It’s not about the code complexity, it’s about the context size
Optimize strategically - Reduce what needs re-reading, not how much is written
Set realistic expectations - AI coding agents aren’t “forgetful” - they simply have no memory to forget

The future of AI coding isn’t just larger context windows or faster inference. It’s persistent, compressed understanding that grows with your project. Until then, every token spent on re-reading is the price of starting fresh.

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Claude Context Windows
👨‍💻 Claude Pricing
