Why AI Coding Agents Burn Tokens: The Context Re-Reading Loop
I tracked my Claude Code usage for a month. The numbers shocked me: 100 million tokens consumed. But here’s what really blew my mind - 99.4% of those were INPUT tokens. Claude was reading far more than it was writing.
For every token written, roughly 166 were read (99.4M ÷ 0.6M ≈ 165.7). That’s a read-to-write ratio of about 166:1.
Every time Claude Code makes a move - reads a file, runs a command, edits code - the full picture has to be fed back in: the conversation so far, including file contents and command output, is sent to the model again as input. It’s not building understanding over time. The loop is: forget everything, re-read everything, write 50 lines, forget everything again.
The bottleneck isn’t inference speed. It’s the re-reading loop.
What’s Really Happening
AI coding agents like Claude Code lack persistent memory. The underlying model is stateless: each call sees only what is in the current context window, so anything it learned in previous turns has to be sent again as input before it can be used.
This architecture means:
- 99.4% of tokens are input (reading) vs 0.6% output (writing)
- Massive token consumption for seemingly simple tasks
- Per-request cost scales with context size, not task complexity
- Efficiency gains require fundamentally different architecture
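To make the mechanism concrete, here is a toy simulation of a stateless agent. It is a minimal sketch, not how Claude Code is implemented. The file sizes, request count, and edit size are hypothetical numbers. Token counts are supplied directly rather than produced by a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class UsageLog:
    """Running totals of tokens sent to and produced by the model."""
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def input_share(self) -> float:
        """Fraction of all tokens that were input (reading)."""
        total = self.input_tokens + self.output_tokens
        return self.input_tokens / total if total else 0.0


def run_stateless_agent(file_tokens: list[int], requests: int, edit_tokens: int) -> UsageLog:
    """Simulate an agent whose model keeps no state between requests.

    file_tokens: token count of each relevant file (hypothetical values)
    requests:    number of user requests handled
    edit_tokens: tokens the model writes per request (hypothetical edit size)
    """
    log = UsageLog()
    context_tokens = sum(file_tokens)  # everything the model must see to act
    for _ in range(requests):
        # Nothing carries over, so the full context is sent as input again.
        log.input_tokens += context_tokens
        # Only the small edit comes back as output.
        log.output_tokens += edit_tokens
    return log


# Hypothetical project: 50 relevant files of ~2,000 tokens each, 3 requests, 500-token edits.
log = run_stateless_agent([2_000] * 50, requests=3, edit_tokens=500)
print(log.input_tokens, log.output_tokens, f"{log.input_share:.1%}")  # 300000 1500 99.5%
```

Even with only three requests, about 99.5% of the tokens are input. The same 100,000-token context is sent every time, while each edit adds only 500 output tokens.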
The Re-Reading Loop Problem
Here’s how AI coding agents actually work:
1. User makes a request
2. Agent reads relevant files (tokens consumed)
3. Agent understands the codebase (temporarily, in context)
4. Agent makes changes (output tokens - minimal)
5. Context is reset (no persistent memory)
6. Next request - Repeat from step 2

The inefficiency compounds quickly:
Request #1: Read 50 files → Understand → Edit 1 file
Request #2: Read 50 files AGAIN → Understand AGAIN → Edit 1 file
Request #3: Read 50 files AGAIN → Understand AGAIN → Edit 1 file
...
Request #1,289: Read 50 files AGAIN → ...

Real Numbers From My Tracking
- 100,000,000 total tokens
- 99,400,000 input tokens (reading)
- 600,000 output tokens (writing)
- Ratio: ~166:1 (read to write)
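The figures above can be checked with a few lines of arithmetic. This sketch also prices them at the per-million rates the article assumes in its cost section: $3 for input and $15 for output. Those rates are the article's assumption, not a price quote, so check current pricing before relying on them.

```python
# Monthly totals from the tracking above, in millions of tokens.
input_m = 99.4
output_m = 0.6

ratio = input_m / output_m                    # read-to-write ratio
input_share = input_m / (input_m + output_m)  # fraction of tokens that were input

# Assumed prices in dollars per million tokens (the article's figures).
INPUT_PRICE = 3.0
OUTPUT_PRICE = 15.0
input_cost = input_m * INPUT_PRICE
output_cost = output_m * OUTPUT_PRICE

print(f"ratio ≈ {ratio:.1f}:1")            # ratio ≈ 165.7:1
print(f"input share = {input_share:.1%}")  # input share = 99.4%
print(f"cost = ${input_cost:.2f} + ${output_cost:.2f} = ${input_cost + output_cost:.2f}")
# cost = $298.20 + $9.00 = $307.20
```

At these rates the output side costs $9.00 and the total comes to about $307. Input still accounts for roughly 97% of the bill.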
What This Means for Costs
If you pay $3 per million input tokens and $15 per million output tokens:
Reading cost: $298.20 (99.4M tokens × $3/M)
Writing cost: $9.00 (0.6M tokens × $15/M)
Total: ~$307 for what feels like "simple coding tasks"

Human vs AI: The Memory Gap
The difference is stark. Here’s how humans work:
Day 1: Learn the codebase → Build mental model
Day 2: Remember key patterns → Apply them
Day 30: Deep understanding → Fast iterations
Day 90: Expert-level → Minimal re-learning needed

And here’s how AI agents work:
Request 1: Read codebase → No persistent memory
Request 2: Read codebase again → No persistent memory
Request 100: Read codebase again → No persistent memory
Request 1,000: Read codebase again → Still no memory

The fundamental difference:
- Humans build compressed, evolving mental models
- AI agents have no persistent project memory
- Every request is essentially “day one” for the AI
It’s like hiring a contractor who forgets your house layout every time they leave the room. Every task requires a full walkthrough of the entire property again.
Why 99.4% Input Tokens
The technical explanation comes down to four key factors:
- Context windows are finite - Claude Code works within a ~200k-token window
- No persistent storage - Understanding dies with each turn
- Compression is lossy - Summaries lose critical details
- Re-reading is mandatory - The agent can’t skip files it “should” already know
The Compounding Problem
Small project (10 files):
- Each request: Read ~10 files
- Manageable token usage
Medium project (100 files):
- Each request: Read ~20-50 relevant files
- Token usage explodes
Large project (1000+ files):
- Each request: Navigate massive codebase
- Constant re-reading of core modules
- Token costs become prohibitive

Why output is so small:
- Writing code is actually efficient
- Most changes are small (add function, fix bug, refactor module)
- The inefficiency is entirely on the input side
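The project tiers and the small-output point can be combined into one crude per-request model. Input grows with the number of files re-read, while output stays the size of the edit. The average file size and edit size below are hypothetical. The file counts are the ones quoted for small and medium projects.

```python
def per_request_tokens(files_read: int, avg_tokens_per_file: int, edit_tokens: int) -> tuple[int, int]:
    """Crude model of one request: (input tokens, output tokens).

    Input is every file the agent re-reads; output is just the edit it writes.
    """
    return files_read * avg_tokens_per_file, edit_tokens


AVG_TOKENS_PER_FILE = 1_500  # hypothetical average file size
EDIT_TOKENS = 500            # hypothetical size of one small edit

# File counts from the tiers above: ~10 files (small), up to ~50 files (medium).
for label, files in [("small", 10), ("medium", 50)]:
    inp, out = per_request_tokens(files, AVG_TOKENS_PER_FILE, EDIT_TOKENS)
    print(f"{label}: {inp:,} in / {out:,} out = {inp // out}:1")
# small: 15,000 in / 500 out = 30:1
# medium: 75,000 in / 500 out = 150:1
```

Output is identical in both cases and only the reading side grows. That is why cost tracks context size rather than the difficulty of the change.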
Current Workarounds (And Why They’re Limited)
Workaround 1: CLAUDE.md Files
Store project context in a CLAUDE.md markdown file, which Claude Code loads at the start of each session. The AI reads this short summary instead of re-reading every file.
Limitation: Static, must be manually updated.
Workaround 2: Memory Files
The agent writes notes about the project. Next session reads the notes.
Limitation: Not compressed understanding, just more text to read.
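Here is a minimal sketch of the memory-file pattern. It assumes a hypothetical notes file that the agent appends to at the end of each session and reads in full at the start of the next. Length in characters stands in for tokens.

```python
import tempfile
from pathlib import Path


def end_session(notes_path: Path, note: str) -> None:
    """The agent appends what it learned this session to the notes file."""
    with notes_path.open("a", encoding="utf-8") as f:
        f.write(note + "\n")


def start_session(notes_path: Path) -> str:
    """The next session re-reads every note as input."""
    return notes_path.read_text(encoding="utf-8") if notes_path.exists() else ""


read_sizes = []  # characters read at the start of each session
with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "NOTES.md"  # hypothetical file name
    for session in range(1, 4):
        read_sizes.append(len(start_session(notes)))
        end_session(notes, f"- session {session}: note about the project")

print(read_sizes)  # [0, 36, 72]
```

Nothing is compressed. Each session reads every earlier note, so the text to re-read only grows.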
Workaround 3: Context Compaction
Summarize old context to fit in window.
Limitation: Loses nuance; the “why” behind decisions fades.
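Here is a toy version of context compaction, not Claude Code's actual implementation. Once the transcript exceeds a budget (measured here in characters), older messages are folded into a single summary and only the latest ones are kept verbatim. The summarizer is a hypothetical stand-in that keeps each message's first sentence. That is enough to show how the reasons behind decisions get dropped.

```python
from collections.abc import Callable


def compact(messages: list[str], budget_chars: int, keep_recent: int,
            summarize: Callable[[list[str]], str]) -> list[str]:
    """If the transcript is over budget, replace older messages with one summary
    and keep the most recent `keep_recent` messages verbatim."""
    if sum(len(m) for m in messages) <= budget_chars:
        return messages
    split = max(len(messages) - keep_recent, 0)
    old, recent = messages[:split], messages[split:]
    return [summarize(old)] + recent if old else recent


def first_sentence_summary(old: list[str]) -> str:
    """Hypothetical stand-in for an LLM summary: keep only each message's first sentence."""
    return "Summary: " + " ".join(m.split(".")[0] + "." for m in old)


# Hypothetical session history.
history = [
    "Switched the cache to LRU. Reason: the FIFO policy evicted hot keys under load.",
    "Pinned the parser library to an older version. Reason: the newer one changed its error format.",
    "Added a retry to the uploader.",
]
compacted = compact(history, budget_chars=100, keep_recent=1, summarize=first_sentence_summary)
for m in compacted:
    print(m)
# Summary: Switched the cache to LRU. Pinned the parser library to an older version.
# Added a retry to the uploader.
```

The summary keeps what changed but loses both "Reason:" clauses. That is the kind of nuance this workaround gives up.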
The Fundamental Issue
All workarounds are still “reading” solutions. They don’t create true persistent memory - they just reduce what needs to be re-read.
Implications for the Future
What Needs to Change
- Persistent project memory (not just saved files)
- Compressed, evolving understanding
- Efficient retrieval without re-reading everything
When This Changes
- Token costs could drop 10-100x
- AI coding becomes viable for large enterprises
- True “AI pair programmer” experience
Current Trajectory
- Models getting larger context windows (helps but doesn’t solve)
- Better retrieval systems (RAG, vector databases)
- Specialized coding models (but still no persistent memory)
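Retrieval systems attack the input side by reading only the files relevant to a request. Real RAG setups typically rank by embedding similarity from a vector database. This minimal sketch swaps in plain word overlap so it runs without dependencies. The file names and contents are hypothetical.

```python
import re


def terms(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(request: str, files: dict[str, str], k: int) -> list[str]:
    """Return the names of the k files sharing the most terms with the request."""
    query = terms(request)
    ranked = sorted(files, key=lambda name: len(query & terms(files[name])), reverse=True)
    return ranked[:k]


# Hypothetical three-file project.
files = {
    "auth.py": "def login(user, password): check the password hash",
    "billing.py": "def charge(card, amount): create an invoice",
    "ui.py": "def render(page): draw the buttons",
}
print(retrieve("fix the password check in login", files, k=1))  # ['auth.py']
```

Only the selected files would then be sent as input, instead of the whole project.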
Frequently Asked Questions
Why do AI coding agents use so many tokens?
AI coding agents lack persistent memory, requiring complete context re-reading on every turn. This means 99%+ of tokens are input (reading) rather than output (writing).
Is Claude Code inefficient with tokens?
Not inefficient in design - it’s an architectural limitation. Without persistent memory, re-reading context is the only way to maintain understanding across turns.
Will this problem be solved soon?
Context windows are growing (now 200k+ tokens), but true persistent memory requires fundamental architectural changes. Expect incremental improvements, not immediate solutions.
How can I reduce token usage with Claude Code?
- Keep projects small and focused
- Use CLAUDE.md files for static context
- End sessions at natural breakpoints
- Don’t expect the AI to “remember” between sessions
Summary
The 100M token tracking experiment revealed a fundamental truth about AI coding agents: they’re not inefficient in their design - they’re working within an architectural constraint. Without persistent memory, the re-reading loop is inevitable.
The ~166:1 read-to-write ratio isn’t a bug. It’s the cost of an agent that starts fresh on every turn. Understanding this helps you:
- Predict costs accurately - It’s not about the code complexity, it’s about the context size
- Optimize strategically - Reduce what needs re-reading, not how much is written
- Set realistic expectations - AI coding agents aren’t “forgetful” - they simply have no memory to forget
The future of AI coding isn’t just larger context windows or faster inference. It’s persistent, compressed understanding that grows with your project. Until then, every token spent on re-reading is the price of starting fresh.
Final Words + More Resources
My intention with this article was to share my knowledge and experience with others. If you want to get in touch, you can reach me by email: Email me
Here are the most important links from this article, along with some further resources on this topic:
- 👨‍💻 Claude Context Windows
- 👨‍💻 Claude Pricing
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!