Skip to content

How Claude Code Manages Context Window with Haiku Subagents

Problem

I was worried about Claude Code’s token usage. When working with large codebases, I assumed that reading an 8,000-line file meant burning through my context window budget. My mental model was:

Read 8,000-line file → 84,000+ tokens in context → Budget destroyed

I started manually avoiding large files, breaking them into chunks, or pasting only the sections I thought were relevant. This was tedious and often led to missing important context.

Then I read a Reddit thread where someone claimed 88-99% token reduction on massive codebases like Redis and the Linux kernel. That seemed impossible if every file read went into the main context.

What I thought was happening

My naive understanding was:

My Assumed Architecture
User Request → Main Agent reads entire file → Main Context fills up
84,000 tokens burned
Performance degrades

I pictured Claude Code reading entire files directly into the main context window, leaving little room for actual reasoning. Each file operation felt like a tax on my token budget.

I even considered strategies like:

  1. Pre-chunking files before asking Claude to read them
  2. Manually extracting only the functions I needed
  3. Limiting file sizes in my projects

None of these felt right. Surely Claude Code had a better solution.

What actually happens

The Reddit thread revealed the architecture I was missing:

Actual Architecture
Main Agent Context (Sonnet/Opus)
│──→ [Task: Read file X]
│ │
▼ ▼
Orchestrates Haiku Subagent (isolated context)
│ │
│←──── [Summary/Result only]
Main context preserved

Claude Code uses Haiku subagents for file reading operations. These subagents:

  1. Operate in their own isolated context (separate from main agent)
  2. Read and process files at a fraction of the cost
  3. Return only relevant summaries to the main agent
  4. Discard their context after task completion

Why this matters

Let me break down the economics:

Cost Comparison
Without Subagent (my assumption):
- Read 8,000-line file with Sonnet: ~84,000 tokens
- Cost: Expensive
- Main context: Polluted with entire file
With Subagent (actual):
- Read 8,000-line file with Haiku: ~84,000 tokens
- But Haiku costs ~1/20th of Opus per token
- Return only relevant function: ~2,700 tokens to main context
- Cost: ~20x cheaper
- Main context: Clean, focused on reasoning

The key insight: Haiku is described as “dirt cheap and very fast”. The subagent reads the entire file, finds what’s relevant, and returns only that to the main agent.

How subagents work conceptually

Here’s how I understand the flow now:

conceptual_flow.py
# What I thought happened (expensive)
def naive_approach():
# Main agent reads entire file - EXPENSIVE
large_file = read_entire_file("server.c") # 84,193 tokens
function = find_function(large_file, "initServer")
return function
# Problem: 84K tokens now in main context
# What actually happens (efficient)
def actual_architecture():
# Main agent delegates to Haiku subagent
result = subagent_haiku.task(
"Find initServer function in server.c"
)
# Subagent reads file in ITS context (cheap)
# Returns only the function definition to main agent
return result.summary # ~2,700 tokens
# Main context stays clean for reasoning

The subagent does the heavy lifting in its own context, then hands back only what’s needed.

Real-world results

The Reddit discussion showed impressive numbers:

Token Reduction Benchmarks
Redis codebase:
- Traditional approach: Millions of tokens
- With subagent architecture: 88-99% reduction
Linux kernel (30M+ lines):
- Would be impossible with naive file reading
- Subagents make it feasible to analyze
My own experience:
- Large file operations feel instant
- Context doesn't bloat
- Can work on big projects without token anxiety

Model selection strategy

Based on this architecture, I now understand why Claude Code uses different models for different tasks:

TaskModel UsedWhy
File reading/searchingHaikuFast, cheap, isolated context
Code explorationHaiku90% capability, 3x cost savings
Main orchestrationSonnetBest coding model for reasoning
Complex architectureOpusDeepest reasoning for decisions

This isn’t arbitrary - it’s designed to optimize cost and performance together.

Common mistakes I made

Mistake 1: Pasting large files directly into chat

When I paste a 50KB code block, I bypass the subagent optimization. All those tokens go directly to the main context. The fix is simple: let Claude Code read files itself.

Mistake 2: Avoiding large codebases

I thought big projects would be expensive to work with. The subagent architecture means I can explore Redis or the Linux kernel without worrying about context costs.

Mistake 3: Manual file chunking

I wasted time splitting files into smaller pieces. Claude Code already handles this through subagents - my manual effort was redundant.

Mistake 4: Worrying about read costs

I assumed every file read had a heavy token cost. In reality, Haiku is so cheap that reads are essentially free for practical purposes.

Practical takeaways

After understanding this architecture, here’s how I work now:

  1. Let Claude Code handle file reading

    • Don’t paste code into chat
    • Don’t manually extract sections
    • Trust the subagent system
  2. Use Haiku for exploration

    • Search operations are fast and cheap
    • Can grep through massive codebases
    • Results come back summarized
  3. Save main context for reasoning

    • Focus on the problem, not file management
    • Main agent keeps strategic context
    • Subagents handle tactical file operations
  4. Stop optimizing manually

    • The architecture already optimizes
    • Manual intervention often makes things worse
    • Trust the system design

Context window best practices

For tasks that do require the main context:

Context Usage Guidelines
Good for main context:
- High-level planning and reasoning
- Code review decisions
- Architecture discussions
- Multi-file coordination
Offload to subagents:
- File reading
- Code searching
- Pattern matching
- Exploratory analysis
Avoid:
- Pasting large files directly
- Long chat history with repeated content
- Redundant context (re-sending same info)

The bottom line

Claude Code’s context window management is built on a smart subagent architecture. Haiku handles the expensive file operations in isolated contexts, returning only what’s needed to the main agent. This achieves:

  • 88-99% token reduction on large codebases
  • Fast performance (Haiku is optimized for speed)
  • Clean main context for actual reasoning
  • Cost efficiency (Haiku is ~1/20th the cost)

The key takeaway: Don’t manually optimize for context window - the architecture already does it for you.

Let Claude Code read files directly. Trust the subagent system. Focus your attention on the high-level reasoning tasks that matter.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments