How Claude Code Manages Context Window with Haiku Subagents
Problem
I was worried about Claude Code’s token usage. When working with large codebases, I assumed that reading an 8,000-line file meant burning through my context window budget. My mental model was:
Read 8,000-line file → 84,000+ tokens in context → Budget destroyed
I started manually avoiding large files, breaking them into chunks, or pasting only the sections I thought were relevant. This was tedious and often led to missing important context.
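The arithmetic behind that mental model can be sketched in a few lines. The 8,000-line and 84,000-token figures come from the example above; the 200,000-token window is an assumption used purely for illustration, so substitute the limit of whichever model you actually run.

```python
# Back-of-the-envelope estimate of how much context one file read would use.
# CONTEXT_WINDOW is an assumed value for illustration, not a figure from this article.
LINES = 8_000
TOKENS = 84_000
CONTEXT_WINDOW = 200_000  # assumption


def tokens_per_line(tokens: int, lines: int) -> float:
    """Average number of tokens each line contributes."""
    return tokens / lines


def window_share(tokens: int, window: int) -> float:
    """Fraction of the context window a read of this size would occupy."""
    return tokens / window


if __name__ == "__main__":
    print(f"{tokens_per_line(TOKENS, LINES):.1f} tokens per line")
    print(f"{window_share(TOKENS, CONTEXT_WINDOW):.0%} of the window")
```

Under these assumptions a single read works out to about 10.5 tokens per line and 42% of the window, which is why the naive model felt so alarming.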
Then I read a Reddit thread where someone claimed 88-99% token reduction on massive codebases like Redis and the Linux kernel. That seemed impossible if every file read went into the main context.
What I thought was happening
My naive understanding was:
User Request → Main Agent reads entire file → Main Context fills up
        ↓
84,000 tokens burned
        ↓
Performance degrades
I pictured Claude Code reading entire files directly into the main context window, leaving little room for actual reasoning. Each file operation felt like a tax on my token budget.
I even considered strategies like:
- Pre-chunking files before asking Claude to read them
- Manually extracting only the functions I needed
- Limiting file sizes in my projects
None of these felt right. Surely Claude Code had a better solution.
What actually happens
The Reddit thread revealed the architecture I was missing:
Main Agent Context (Sonnet/Opus)
        │
        │──→ [Task: Read file X]
        │            │
        ▼            ▼
  Orchestrates   Haiku Subagent (isolated context)
        │            │
        │←──── [Summary/Result only]
        │
Main context preserved
Claude Code uses Haiku subagents for file reading operations. These subagents:
- Operate in their own isolated context (separate from main agent)
- Read and process files at a fraction of the cost
- Return only relevant summaries to the main agent
- Discard their context after task completion
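The four properties above can be illustrated with a toy model. This is only a sketch of the idea of context isolation, not Claude Code's implementation: `Agent` and `run_subagent_task` are hypothetical names, a "context" is just a list of strings, and size is counted in words rather than real tokens.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy agent whose 'context' is just a list of text chunks."""
    name: str
    context: list[str] = field(default_factory=list)

    def context_size(self) -> int:
        # Crude size measure: whitespace-separated words, not real tokens.
        return sum(len(chunk.split()) for chunk in self.context)


def run_subagent_task(main: Agent, file_text: str, keyword: str) -> str:
    """Read file_text in a throwaway subagent and return only matching lines."""
    sub = Agent(name="subagent")      # fresh, isolated context
    sub.context.append(file_text)     # the full file lives only here
    matches = [ln for ln in file_text.splitlines() if keyword in ln]
    summary = "\n".join(matches)
    main.context.append(summary)      # only the summary reaches the main agent
    return summary                    # sub goes out of scope: its context is discarded


if __name__ == "__main__":
    main = Agent(name="main")
    text = "\n".join(f"line {i} filler" for i in range(1000))
    text += "\nvoid initServer(void) {}"
    run_subagent_task(main, text, "initServer")
    print(main.context_size())
```

The main agent ends up holding a single matching line, while the thousand filler lines only ever existed in the subagent's short-lived context.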
Why this matters
Let me break down the economics:
Without Subagent (my assumption):
- Read 8,000-line file with Sonnet: ~84,000 tokens
- Cost: Expensive, since every token is billed at the main model's rate
- Main context: Polluted with entire file
With Subagent (actual):
- Read 8,000-line file with Haiku: ~84,000 tokens
- But Haiku costs ~1/20th of Opus per token
- Return only relevant function: ~2,700 tokens to main context
- Cost: ~20x cheaper per token for the read, measured against Opus
- Main context: Clean, focused on reasoning
The key insight: Haiku is described as “dirt cheap and very fast”. The subagent reads the entire file, finds what’s relevant, and returns only that to the main agent.
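To see how these figures combine, here is a small relative-cost model. It uses only the numbers quoted above (84,000 tokens read, 2,700 tokens returned, a ~1/20 price ratio) and makes simplifying assumptions: cost is measured in arbitrary "main-model input token" units, and output tokens and tool-call overhead are ignored.

```python
# Relative cost model built from the article's figures.
# Units are "main-model input tokens"; output tokens and overhead are ignored (assumption).
FILE_TOKENS = 84_000
RETURNED_TOKENS = 2_700
SUBAGENT_PRICE_RATIO = 1 / 20  # the ~1/20th figure quoted above


def cost_without_subagent(file_tokens: int) -> float:
    """Main agent reads the whole file at full price."""
    return float(file_tokens)


def cost_with_subagent(file_tokens: int, returned_tokens: int, ratio: float) -> float:
    """Subagent reads the file at a discount; main agent pays for the result."""
    return file_tokens * ratio + returned_tokens


def context_reduction(file_tokens: int, returned_tokens: int) -> float:
    """Fraction of tokens kept out of the main context."""
    return 1 - returned_tokens / file_tokens


if __name__ == "__main__":
    before = cost_without_subagent(FILE_TOKENS)
    after = cost_with_subagent(FILE_TOKENS, RETURNED_TOKENS, SUBAGENT_PRICE_RATIO)
    print(f"cost ratio: {before / after:.1f}x")
    print(f"main-context reduction: {context_reduction(FILE_TOKENS, RETURNED_TOKENS):.1%}")
```

Under these assumptions the overall saving is closer to 12x than 20x, because the returned 2,700 tokens are still billed at the main model's rate. The main-context reduction comes out at about 96.8%, which sits inside the 88-99% range from the Reddit thread.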
How subagents work conceptually
Here’s how I understand the flow now:
# Conceptual pseudocode: read_entire_file, find_function, and subagent_haiku
# are placeholders for illustration, not real APIs.

# What I thought happened (expensive)
def naive_approach():
    # Main agent reads entire file - EXPENSIVE
    large_file = read_entire_file("server.c")  # 84,193 tokens
    function = find_function(large_file, "initServer")
    # Problem: 84K tokens now in main context
    return function

# What actually happens (efficient)
def actual_architecture():
    # Main agent delegates to Haiku subagent
    result = subagent_haiku.task(
        "Find initServer function in server.c"
    )
    # Subagent reads file in ITS context (cheap)
    # Returns only the function definition to main agent
    # Main context stays clean for reasoning
    return result.summary  # ~2,700 tokens
The subagent does the heavy lifting in its own context, then hands back only what’s needed.
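For a version that actually runs, the "find one function and return only that" step can be approximated with plain string handling. This is a simplified sketch, not what a subagent really does: `extract_function` and `build_fake_source` are hypothetical helpers, the source file is a synthetic stand-in rather than real Redis code, and the brace matching ignores braces inside strings or comments.

```python
def extract_function(source: str, name: str) -> str:
    """Return the text of the first definition of `name`, matched by braces.

    Simplified: assumes the definition line contains `name(`, does not end
    with ';', and that braces in strings or comments stay balanced.
    """
    lines = source.splitlines()
    for start, line in enumerate(lines):
        if f"{name}(" in line and not line.rstrip().endswith(";"):
            depth = 0
            seen_open = False
            for end in range(start, len(lines)):
                depth += lines[end].count("{") - lines[end].count("}")
                seen_open = seen_open or "{" in lines[end]
                if seen_open and depth == 0:
                    return "\n".join(lines[start:end + 1])
    return ""


def build_fake_source(filler_functions: int = 500) -> str:
    """Build a synthetic C-like file with one function of interest at the end."""
    parts = [
        f"void helper_{i}(void) {{\n    do_work({i});\n}}\n"
        for i in range(filler_functions)
    ]
    parts.append("void initServer(void) {\n    step_one();\n    step_two();\n}\n")
    return "\n".join(parts)


if __name__ == "__main__":
    source = build_fake_source()
    result = extract_function(source, "initServer")
    print(f"file: {len(source)} chars, returned: {len(result)} chars")
    print(result)
```

Only the returned snippet would need to travel back to the main agent; the rest of the file is read once and then dropped.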
Real-world results
The Reddit discussion showed impressive numbers:
Redis codebase:
- Traditional approach: Millions of tokens
- With subagent architecture: 88-99% reduction
Linux kernel (30M+ lines):
- Would be impossible with naive file reading
- Subagents make it feasible to analyze
My own experience:
- Large file operations feel instant
- Context doesn't bloat
- Can work on big projects without token anxiety
Model selection strategy
Based on this architecture, I now understand why Claude Code uses different models for different tasks:
| Task | Model Used | Why |
|---|---|---|
| File reading/searching | Haiku | Fast, cheap, isolated context |
| Code exploration | Haiku | 90% capability, 3x cost savings |
| Main orchestration | Sonnet | Best coding model for reasoning |
| Complex architecture | Opus | Deepest reasoning for decisions |
This isn’t arbitrary - it’s designed to optimize cost and performance together.
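The table can be restated as a simple lookup. This is only a sketch of the routing idea, not how Claude Code is configured internally: the task labels, the `ROUTING` mapping, and the fallback choice in `pick_model` are all assumptions made for illustration.

```python
from enum import Enum


class Model(Enum):
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"


# Mirrors the table above; the task keys are hypothetical labels.
ROUTING = {
    "file_read": Model.HAIKU,
    "code_search": Model.HAIKU,
    "exploration": Model.HAIKU,
    "orchestration": Model.SONNET,
    "architecture": Model.OPUS,
}


def pick_model(task_kind: str) -> Model:
    """Choose a model tier for a task kind.

    Unknown task kinds fall back to the orchestration model (an assumption).
    """
    return ROUTING.get(task_kind, Model.SONNET)


if __name__ == "__main__":
    for kind in ("file_read", "architecture", "something_else"):
        print(kind, "->", pick_model(kind).value)
```

The point of the design is that cheap, high-volume work goes to the smallest tier and only decisions that need deep reasoning reach the most expensive one.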
Common mistakes I made
Mistake 1: Pasting large files directly into chat
When I paste a 50KB code block, I bypass the subagent optimization. All those tokens go directly to the main context. The fix is simple: let Claude Code read files itself.
Mistake 2: Avoiding large codebases
I thought big projects would be expensive to work with. The subagent architecture means I can explore Redis or the Linux kernel without worrying about context costs.
Mistake 3: Manual file chunking
I wasted time splitting files into smaller pieces. Claude Code already handles this through subagents - my manual effort was redundant.
Mistake 4: Worrying about read costs
I assumed every file read had a heavy token cost. In reality, Haiku is so cheap that reads are essentially free for practical purposes.
Practical takeaways
After understanding this architecture, here’s how I work now:
- Let Claude Code handle file reading
  - Don’t paste code into chat
  - Don’t manually extract sections
  - Trust the subagent system
- Use Haiku for exploration
  - Search operations are fast and cheap
  - Can grep through massive codebases
  - Results come back summarized
- Save main context for reasoning
  - Focus on the problem, not file management
  - Main agent keeps strategic context
  - Subagents handle tactical file operations
- Stop optimizing manually
  - The architecture already optimizes
  - Manual intervention often makes things worse
  - Trust the system design
Context window best practices
For tasks that do require the main context:
Good for main context:
- High-level planning and reasoning
- Code review decisions
- Architecture discussions
- Multi-file coordination
Offload to subagents:
- File reading
- Code searching
- Pattern matching
- Exploratory analysis
Avoid:
- Pasting large files directly
- Long chat history with repeated content
- Redundant context (re-sending same info)
The bottom line
Claude Code’s context window management is built on a smart subagent architecture. Haiku handles the expensive file operations in isolated contexts, returning only what’s needed to the main agent. This achieves:
- 88-99% token reduction on large codebases
- Fast performance (Haiku is optimized for speed)
- Clean main context for actual reasoning
- Cost efficiency (Haiku is ~1/20th the cost)
The key takeaway: Don’t manually optimize for context window - the architecture already does it for you.
Let Claude Code read files directly. Trust the subagent system. Focus your attention on the high-level reasoning tasks that matter.
Final Words + More Resources
My intention with this article was to share my knowledge and experience in the hope that it helps others. If you want to get in touch, you can contact me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 Reddit Discussion on Claude Code Context Management
- 👨💻 Claude Models Documentation
- 👨💻 Claude Code Documentation
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!