At What Token Count Does Claude's Performance Degrade?

Mar 14, 2026

I was excited when the 1M context window for Claude Opus 4.6 was announced at the same price point. More context is always better, right? Then I started noticing something strange in my long sessions.

The Problem

I was working on a large refactoring task. I’d loaded up my context with multiple files, documentation, and several rounds of back-and-forth discussion. Around 180k tokens, Claude started responding with:

“Actually… let me reconsider… Actually, I think…”

Over and over. The “Actually… Actually…” pattern.

The model wasn’t failing. It was degrading. Quietly.

What the Community Reports

I dug into a Reddit discussion about the 1M context announcement. The top comment wasn’t celebrating - it was asking:

“pretty huge, but how’s the performance drop off?”

Here’s what practitioners reported:

Token Range	What Happens	Source
~140k	“Opus gets dementia”	Anecdotal report
~180k	“Actually… Actually…” loops begin	Community consensus
250k-500k	“Quality starts to tank”	Multiple reports
400k-500k	“Loses track of earlier instructions”	Comment with 7 upvotes

One comment cut to the heart of it:

“In our experience the model starts losing track of earlier instructions somewhere around 400-500k tokens even when the context window technically allows more”

Important clarification: Claude doesn’t appear to “forget” early context. It seems to deprioritize it when newer information conflicts, and where something sits in the context matters:

┌─────────────────────────────────────────────────┐
│  Beginning Context                              │
│  - System prompts                               │
│  - Initial instructions                         │
│  - Highest priority                             │
├─────────────────────────────────────────────────┤
│  Middle Context                                 │
│  - Gets less attention during retrieval         │
│  - "Lost in the middle" phenomenon              │
│  - Lowest retrieval accuracy                    │
├─────────────────────────────────────────────────┤
│  End Context                                    │
│  - Most recent interactions                     │
│  - High priority                                │
│  - Competes with earlier instructions           │
└─────────────────────────────────────────────────┘

The 80% Rule

I found guidance in my own project’s performance rules:

Avoid the last 20% of the context window for:
- Large-scale refactoring
- Feature implementation spanning multiple files
- Debugging complex interactions

This is consistent with the community reports: degradation doesn’t wait for the hard limit. It shows up in the upper portion of the window, and for the 1M window the reports put it well before the last 20%.

Window Size    Safe Threshold    For What Task?
─────────────────────────────────────────────────
200k           ~160k (80%)       Simple tasks
200k           ~100k (50%)       Complex reasoning
1M             ~800k (80%)       Simple tasks (risky)
1M             ~400k-500k        Complex reasoning (recommended)
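The table above is just percentage arithmetic, so here is a minimal sketch of it. The names `SAFE_PERCENT` and `safe_token_budget` are my own for illustration, and the percentages are the rules of thumb from the table, not measured limits. Note that 50% of a 1M window gives 500k, which is the upper end of the 400k-500k range I recommend for complex reasoning.

```python
# Rule-of-thumb percentages from the table above (assumptions, not measured limits).
SAFE_PERCENT = {"simple": 80, "complex": 50}


def safe_token_budget(window_tokens: int, task: str) -> int:
    """Return the token count to stay under for the given task type."""
    if task not in SAFE_PERCENT:
        raise ValueError(f"unknown task type: {task!r}")
    # Integer arithmetic avoids floating-point rounding surprises.
    return window_tokens * SAFE_PERCENT[task] // 100


# Print the budget for each window size and task type.
for window in (200_000, 1_000_000):
    for task in ("simple", "complex"):
        print(f"{window:>9,}  {task:<8} {safe_token_budget(window, task):>9,}")
```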

Signs of Context Degradation

I’ve learned to recognize when Claude’s context is overloaded:

1. The “Actually…” Loop

The model keeps reconsidering without making progress. This signals it’s struggling to reconcile conflicting context.

2. Forgotten Instructions

System prompt says “Use TypeScript strict mode” but later output shows plain JavaScript without types.

3. Quality Regression

Earlier responses: detailed, well-structured
Later responses: shorter, generic, less nuanced

4. Pattern Inconsistency

Earlier: correctly uses existing codebase patterns
Later: suggests patterns contradicting earlier decisions
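The first signal is the easiest to check mechanically. Below is a rough sketch of a heuristic that counts self-correction phrases in a response. The phrase list and the threshold of 3 are my own assumptions, so treat this as a tripwire for a closer look rather than a reliable detector.

```python
import re

# Self-correction openers to look for. This list is an illustrative
# assumption, not an exhaustive or validated set of phrases.
RECONSIDER_PATTERN = re.compile(
    r"\b(actually|let me reconsider|on second thought)\b",
    re.IGNORECASE,
)


def count_reconsiderations(response: str) -> int:
    """Count how many self-correction phrases appear in a response."""
    return len(RECONSIDER_PATTERN.findall(response))


def looks_like_actually_loop(response: str, threshold: int = 3) -> bool:
    """Flag a response once it reaches `threshold` self-corrections (assumed value)."""
    return count_reconsiderations(response) >= threshold
```

A single “actually” is normal in good answers, which is why the check only fires on repeated hits within one response.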

Context Hygiene Strategy

I now manage context like a scarce resource:

Before Complex Task:
1. Fresh session if context >50% full
2. Add only relevant files
3. State critical constraints in current message

During Long Session:
1. Monitor response quality
2. If degradation detected:
   - Summarize current state
   - Start new session with summary
   - Restate critical constraints

After Task Completion:
1. Clear context for unrelated tasks
2. Keep summary if tasks are related

"I have 1M context, so let me add the entire codebase"
- More noise, lower signal-to-noise ratio
- Higher chance of conflicting information
- Deprioritization of important early context

"Keep the session going forever"
- Quality degrades over time
- Earlier instructions deprioritized
- Inconsistent model behavior
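The before/during part of this strategy can be sketched as a small decision helper. Everything here is hypothetical scaffolding: `SessionState`, `next_action`, and the action names are mine, and the token counts would have to come from whatever usage numbers your tooling exposes.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    used_tokens: int            # tokens currently in the context
    window_tokens: int          # total context window size
    degradation_detected: bool = False  # set by you (or a heuristic) during the session


def next_action(state: SessionState, starting_complex_task: bool) -> str:
    """Map the checklist in this section onto one of three actions."""
    usage = state.used_tokens / state.window_tokens
    if state.degradation_detected:
        # During a long session: summarize, restart, restate constraints.
        return "summarize_and_restart"
    if starting_complex_task and usage > 0.5:
        # Before a complex task: fresh session if context is more than 50% full.
        return "fresh_session"
    return "continue"
```

For example, `next_action(SessionState(120_000, 200_000), starting_complex_task=True)` returns `"fresh_session"`, because the context is 60% full.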

The Hidden Cost of 1M Context

The 1M window at the same price seems like a pure win. But consider:

Factor	200k Window	1M Window
Technical capacity	200k tokens	1M tokens
Reliable capacity	~160k (80%)	~400k-500k practically
Token cost per reliable unit	Standard	Potentially higher
Debugging difficulty	Moderate	Higher (more context to analyze)

The 1M context is a capacity feature, not a quality guarantee. You can fit more in, but the model won’t weigh it equally.

What I Do Now

For complex reasoning (refactoring, debugging, multi-file changes):

Stay under 50% of window
Actively prune irrelevant files
Start fresh sessions for unrelated tasks

For simple tasks (single-file edits, documentation):

80% threshold is acceptable
Less sensitive to positioning issues

For long sessions:

Watch for degradation signals
Periodically summarize and reset
Move critical instructions to recent context
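Moving critical instructions into recent context can be as simple as appending them to the message you are about to send. This is a minimal sketch with a hypothetical helper name and wording. It only builds a string and does not call any API.

```python
def with_restated_constraints(message: str, constraints: list[str]) -> str:
    """Append the still-relevant constraints to the end of a message."""
    if not constraints:
        return message
    lines = [message, "", "Constraints that still apply:"]
    lines += [f"- {constraint}" for constraint in constraints]
    return "\n".join(lines)


print(with_restated_constraints(
    "Refactor the auth module.",
    ["Use TypeScript strict mode"],
))
```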

Summary

In this post, I looked at when Claude’s performance degrades, based on real user reports and my own sessions. The key point: reported quality drops somewhere in the 250k-500k token range despite the 1M capacity. Manage context actively, and stay under 80% of the window for simple tasks and under 50% for complex reasoning.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email: Email me

Here are the most important links from this article, along with some further resources on this topic:

👨‍💻 Opus 4.6 now defaults to 1M context! (same pricing)

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!