How Claude Code Manages Context Window with Haiku Subagents

Mar 20, 2026

Problem

I was worried about Claude Code’s token usage. When working with large codebases, I assumed that reading an 8,000-line file meant burning through my context window budget. My mental model was:

Read 8,000-line file → 84,000+ tokens in context → Budget destroyed

I started manually avoiding large files, breaking them into chunks, or pasting only the sections I thought were relevant. This was tedious and often led to missing important context.

Then I read a Reddit thread where someone claimed 88-99% token reduction on massive codebases like Redis and the Linux kernel. That seemed impossible if every file read went into the main context.

What I thought was happening

My naive understanding was:

User Request → Main Agent reads entire file → Main Context fills up
                    ↓
            84,000 tokens burned
                    ↓
            Performance degrades

I pictured Claude Code reading entire files directly into the main context window, leaving little room for actual reasoning. Each file operation felt like a tax on my token budget.

I even considered strategies like:

Pre-chunking files before asking Claude to read them
Manually extracting only the functions I needed
Limiting file sizes in my projects

None of these felt right. Surely Claude Code had a better solution.

What actually happens

The Reddit thread revealed the architecture I was missing:

Main Agent Context (Sonnet/Opus)
        │
        │──→ [Task: Read file X]
        │           │
        ▼           ▼
   Orchestrates   Haiku Subagent (isolated context)
        │           │
        │←──── [Summary/Result only]
        │
   Main context preserved

Claude Code uses Haiku subagents for file reading operations. These subagents:

Operate in their own isolated context (separate from main agent)
Read and process files at a fraction of the cost
Return only relevant summaries to the main agent
Discard their context after task completion

Why this matters

Let me break down the economics:

Without Subagent (my assumption):
- Read 8,000-line file with Sonnet: ~84,000 tokens
- Cost: Expensive
- Main context: Polluted with entire file

With Subagent (actual):
- Read 8,000-line file with Haiku: ~84,000 tokens
- But Haiku costs ~1/20th of Opus per token
- Return only relevant function: ~2,700 tokens to main context
- Cost: ~20x cheaper
- Main context: Clean, focused on reasoning

The key insight: Haiku is described as “dirt cheap and very fast”. The subagent reads the entire file, finds what’s relevant, and returns only that to the main agent.

How subagents work conceptually

Here’s how I understand the flow now:

# What I thought happened (expensive)
def naive_approach():
    # Main agent reads entire file - EXPENSIVE
    large_file = read_entire_file("server.c")  # 84,193 tokens
    function = find_function(large_file, "initServer")
    return function
    # Problem: 84K tokens now in main context

# What actually happens (efficient)
def actual_architecture():
    # Main agent delegates to Haiku subagent
    result = subagent_haiku.task(
        "Find initServer function in server.c"
    )
    # Subagent reads file in ITS context (cheap)
    # Returns only the function definition to main agent
    return result.summary  # ~2,700 tokens
    # Main context stays clean for reasoning

The subagent does the heavy lifting in its own context, then hands back only what’s needed.

Real-world results

The Reddit discussion showed impressive numbers:

Redis codebase:
- Traditional approach: Millions of tokens
- With subagent architecture: 88-99% reduction

Linux kernel (30M+ lines):
- Would be impossible with naive file reading
- Subagents make it feasible to analyze

My own experience:
- Large file operations feel instant
- Context doesn't bloat
- Can work on big projects without token anxiety

Model selection strategy

Based on this architecture, I now understand why Claude Code uses different models for different tasks:

Task	Model Used	Why
File reading/searching	Haiku	Fast, cheap, isolated context
Code exploration	Haiku	90% capability, 3x cost savings
Main orchestration	Sonnet	Best coding model for reasoning
Complex architecture	Opus	Deepest reasoning for decisions

This isn’t arbitrary - it’s designed to optimize cost and performance together.

Common mistakes I made

Mistake 1: Pasting large files directly into chat

When I paste a 50KB code block, I bypass the subagent optimization. All those tokens go directly to the main context. The fix is simple: let Claude Code read files itself.

Mistake 2: Avoiding large codebases

I thought big projects would be expensive to work with. The subagent architecture means I can explore Redis or the Linux kernel without worrying about context costs.

Mistake 3: Manual file chunking

I wasted time splitting files into smaller pieces. Claude Code already handles this through subagents - my manual effort was redundant.

Mistake 4: Worrying about read costs

I assumed every file read had a heavy token cost. In reality, Haiku is so cheap that reads are essentially free for practical purposes.

Practical takeaways

After understanding this architecture, here’s how I work now:

Let Claude Code handle file reading
- Don’t paste code into chat
- Don’t manually extract sections
- Trust the subagent system
Use Haiku for exploration
- Search operations are fast and cheap
- Can grep through massive codebases
- Results come back summarized
Save main context for reasoning
- Focus on the problem, not file management
- Main agent keeps strategic context
- Subagents handle tactical file operations
Stop optimizing manually
- The architecture already optimizes
- Manual intervention often makes things worse
- Trust the system design

Context window best practices

For tasks that do require the main context:

Good for main context:
- High-level planning and reasoning
- Code review decisions
- Architecture discussions
- Multi-file coordination

Offload to subagents:
- File reading
- Code searching
- Pattern matching
- Exploratory analysis

Avoid:
- Pasting large files directly
- Long chat history with repeated content
- Redundant context (re-sending same info)

The bottom line

Claude Code’s context window management is built on a smart subagent architecture. Haiku handles the expensive file operations in isolated contexts, returning only what’s needed to the main agent. This achieves:

88-99% token reduction on large codebases
Fast performance (Haiku is optimized for speed)
Clean main context for actual reasoning
Cost efficiency (Haiku is ~1/20th the cost)

The key takeaway: Don’t manually optimize for context window - the architecture already does it for you.

Let Claude Code read files directly. Trust the subagent system. Focus your attention on the high-level reasoning tasks that matter.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Reddit Discussion on Claude Code Context Management
👨‍💻 Claude Models Documentation
👨‍💻 Claude Code Documentation

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!