
Why Did Claude Usage Limit Drop After 1M Context Update?

The Problem: 16% of My Limit Gone in One Word

I was having a long conversation with Claude. Good session, lots of context built up. Then I stepped away for about an hour. When I came back, I sent a simple message: “update?”.

Boom. 16% of my 5-hour usage limit - gone. Just like that.

I thought my usage limit had been secretly reduced. It felt like something was wrong with my account. After digging into what happened, I discovered the real culprit: the 1M context update changed how token consumption works, and I walked right into an expensive trap.

What Actually Happened

On March 13th, 2026, Anthropic increased the maximum context size from 200k to 1M tokens - a 5x increase. This is great for long conversations, but it introduced a hidden cost mechanism that can bite you hard.

Here’s the key insight: Prompt caching has asymmetric costs.

Token Cost Breakdown
+------------------+------------+
| Operation        | Multiplier |
+------------------+------------+
| Input tokens     | 1x         |
| Output tokens    | 5x         |
| Thinking tokens  | 5x         |
| Cache WRITE      | 1.25x-2x   | <- Expensive!
| Cache READ       | 0.1x       | <- Cheap!
+------------------+------------+
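To make the table concrete, here is a minimal sketch that weights one request's token counts by these multipliers. The function name `weighted_cost` and the category keys are hypothetical, and the 1.25x cache-write figure is simply the low end of the range above. The result is an illustrative "effective token" count, not a description of how Anthropic actually meters usage.

```python
# Multipliers copied from the table above. cache_write uses the low end
# (1.25x) of the quoted range, which is an assumption for this example.
MULTIPLIERS = {
    "input": 1.0,
    "output": 5.0,
    "thinking": 5.0,
    "cache_write": 1.25,
    "cache_read": 0.1,
}


def weighted_cost(tokens_by_kind: dict[str, int]) -> float:
    """Sum token counts weighted by their multiplier (hypothetical helper)."""
    return sum(MULTIPLIERS[kind] * count for kind, count in tokens_by_kind.items())


# Same 500k-token history and same short reply; only the cache state differs.
hot = weighted_cost({"cache_read": 500_000, "input": 10, "output": 200})
cold = weighted_cost({"cache_write": 500_000, "input": 10, "output": 200})
print(f"hot cache:  ~{hot:,.0f} effective tokens")
print(f"cold cache: ~{cold:,.0f} effective tokens")
```

In both cases the new message and the reply are a rounding error. Almost all of the cost comes from how the existing history is charged.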

When you’re actively using Claude with a large context, the cache is “hot” - you’re paying the cheap 0.1x rate for cache reads. But caches have a TTL (time-to-live). When you step away for too long, the cache expires and goes “cold.”

Here’s where it gets expensive:

When you send a message after the cache expires, Claude has to re-cache your entire conversation history. That 500k token context you built up? It now costs 1.25x-2x the normal rate to re-cache.

The Cache Expiration Timeline
Time 0:00 - You're chatting, cache is hot
            Cost: 0.1x (cheap cache reads)
    |
Time 0:30 - You step away
            Cache TTL: ~5 min to 1 hour (varies by tier)
    |
Time 1:30 - You return after 1 hour
            Cache is now COLD (expired)
    |
Time 1:31 - You send "update?"
            Cost: 1.25x-2x for ENTIRE context re-cache
            Result: 16% of your limit gone in one message
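The timeline can be replayed as a small simulation. This is a sketch under simplifying assumptions: a fixed 5-minute TTL that every message refreshes, a flat 1.5x write multiplier, and a context size that stays constant between messages. `replay` and its parameters are names made up for this illustration.

```python
def replay(message_times_s, context_tokens, ttl_s=300,
           write_mult=1.5, read_mult=0.1):
    """Return the assumed context cost of each message, in effective tokens.

    message_times_s: send times in seconds, in ascending order.
    The first message, and any message sent more than ttl_s after the
    previous one, pays the cache-write rate; the rest pay the read rate.
    """
    costs = []
    previous = None
    for t in message_times_s:
        cache_hot = previous is not None and (t - previous) <= ttl_s
        costs.append(context_tokens * (read_mult if cache_hot else write_mult))
        previous = t  # each message refreshes the cache timer
    return costs


# Three quick messages, then "update?" after an hour away.
times = [0, 120, 240, 240 + 3600]
for t, cost in zip(times, replay(times, context_tokens=500_000)):
    print(f"t={t:>5}s  ~{cost:,.0f} effective tokens")
```

Under these assumptions the two quick follow-ups are cheap, and the message sent after the break costs as much as the very first one.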

The Math Behind the Spike

Let me break down what happened with actual numbers.

cache_cost_estimate.py
def estimate_cache_cost(context_tokens: int, cache_type: str) -> float:
    """Calculate token costs for different cache states."""
    multipliers = {
        "write": 1.5,    # Illustrative value inside the 1.25x-2x range
        "read": 0.1,     # Cache hit
        "nocache": 1.0,  # Standard input
    }
    return context_tokens * multipliers[cache_type]


# Real-world example
context = 500_000  # 500k tokens accumulated
print(f"Active chat (cache read): {estimate_cache_cost(context, 'read'):,.0f} tokens")
print(f"Returning after break (cache write): {estimate_cache_cost(context, 'write'):,.0f} tokens")
# Output:
# Active chat (cache read): 50,000 tokens
# Returning after break (cache write): 750,000 tokens

That’s a 15x difference between chatting with a hot cache vs. triggering a re-cache.

If you’re on a plan with a 5-hour limit (roughly 450k input tokens), re-caching 500k tokens at 1.5x would cost you around 750k “effective tokens” - which explains the massive usage spike.

Why This Feels Like a Limit Reduction

Before the 1M context update, you physically couldn’t build up a context large enough for this to matter. With a 200k max context, even if the cache expired, re-caching 200k tokens was manageable.

Now? You can accumulate 500k+ tokens in a conversation. When that cache goes cold, the re-cache cost becomes significant.

Before vs After 1M Update
BEFORE (200k max context):
  Max cache write cost: 200k x 1.5 = 300k tokens
  Impact: Manageable
AFTER (1M max context):
  Cache write cost at 500k: 500k x 1.5 = 750k tokens
  Max cache write cost: 1M x 1.5 = 1.5M tokens
  Impact: Can consume 16%+ of limit in one message
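The same arithmetic can be tabulated across the whole 1.25x-2x range instead of a single 1.5x guess. A short sketch: `recache_cost` is a hypothetical helper, and the numbers come from this article's multipliers, not from measured usage.

```python
def recache_cost(context_tokens: int, write_mult: float) -> float:
    """Effective tokens charged to rewrite a cold cache (illustrative)."""
    return context_tokens * write_mult


for context in (200_000, 500_000, 1_000_000):
    low = recache_cost(context, 1.25)
    high = recache_cost(context, 2.0)
    print(f"{context:>9,} tokens -> {low:>9,.0f} to {high:>9,.0f} effective tokens")
```

At the old 200k ceiling the worst case is 400k effective tokens. A full 1M context lands between 1.25M and 2M for a single cold message.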

Practical Strategies to Avoid This Trap

After getting burned by this, I’ve developed some rules:

Rule 1: Stay engaged or start fresh

If you’re stepping away for longer than the cache TTL (as little as 5 minutes, up to an hour), expect a potential usage spike when you return. Consider summarizing key points and starting a new conversation instead.
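One way to weigh "continue" against "summarize and start fresh" is to compare the cold re-cache cost with the cost of producing a summary and seeding a new chat with it. This is a rough sketch. The summary size, the assumption that the summary is generated while the cache is still hot (a cache read plus 5x output) and then cached once in the new chat, and both function names are illustrative choices, not measured behavior. Asking for the summary after the cache has already expired would itself trigger the re-cache, so the comparison only holds if you summarize before stepping away.

```python
def cost_continue(context_tokens: int, write_mult: float = 1.5) -> float:
    """Cold-cache cost of sending one more message in the old chat."""
    return context_tokens * write_mult


def cost_fresh_start(context_tokens: int, summary_tokens: int,
                     read_mult: float = 0.1, output_mult: float = 5.0,
                     write_mult: float = 1.5) -> float:
    """Cost of summarizing while the cache is still hot, then starting over."""
    summarize = context_tokens * read_mult + summary_tokens * output_mult
    seed_new_chat = summary_tokens * write_mult
    return summarize + seed_new_chat


context, summary = 500_000, 5_000  # assumed sizes for illustration
print(f"continue after break: ~{cost_continue(context):,.0f} effective tokens")
print(f"summarize + restart:  ~{cost_fresh_start(context, summary):,.0f} effective tokens")
```

With these assumed sizes the fresh start comes out roughly nine times cheaper, though the gap narrows as the summary grows.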

Rule 2: Monitor your context percentage

cache_monitor.py
class CacheMonitor:
    """Track cache status and warn about expensive re-cache scenarios."""

    def __init__(self, ttl_seconds: int = 300):
        self.last_activity = None  # time of the last message, in seconds
        self.cache_ttl = ttl_seconds
        self.context_tokens = 0

    def record_message(self, current_time: float, context_tokens: int) -> None:
        """Call after each message so the monitor knows the cache was refreshed."""
        self.last_activity = current_time
        self.context_tokens = context_tokens

    def check_before_message(self, current_time: float) -> str:
        if self.last_activity is None:
            return "no_cache"
        elapsed = current_time - self.last_activity
        if elapsed > self.cache_ttl:
            estimated_cost = self.context_tokens * 1.5  # cache write
            return f"WARNING: Cache expired. Re-cache will cost ~{estimated_cost:,.0f} tokens"
        return f"Cache hot. Next message costs ~{self.context_tokens * 0.1:,.0f} tokens"

Rule 3: The 20% rule

From a Reddit thread on this issue: “If you get over 20% context on Opus, start a new chat.”

This is solid advice. Once your context grows beyond 20% of the window, the potential re-cache cost becomes significant enough to warrant a fresh conversation.
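As a quick check, the rule reduces to a single comparison. The sketch below assumes the 20% is measured against the 1M-token window, which the quote does not spell out, and `should_start_fresh` is a hypothetical helper.

```python
def should_start_fresh(context_tokens: int,
                       window_tokens: int = 1_000_000,
                       threshold: float = 0.20) -> bool:
    """True once the context exceeds the given fraction of the window."""
    return context_tokens / window_tokens > threshold


for used in (150_000, 200_000, 500_000):
    print(f"{used:>7,} tokens -> start fresh: {should_start_fresh(used)}")
```

On that reading, the 500k-token conversation from the earlier example is well past the point where a fresh chat is worth considering.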

The “Frozen Brick” Problem

There’s another issue with large cached contexts: they become “frozen bricks” when the cache goes cold.

Once you’ve invested significant tokens into building a large context, you’re incentivized to keep using it. But if the cache expires, you face a painful choice:

  1. Pay the expensive re-cache cost to continue
  2. Abandon the conversation and lose everything it had accumulated

Neither option feels good. This is why starting fresh conversations periodically (especially after completing major tasks) is often the better strategy.

What Anthropic Could Do Better

This situation could be improved with better transparency:

  1. Show cache status in the UI - Let users know when their cache is hot vs. cold
  2. Warn about re-cache costs - Before sending a message that triggers a re-cache, show the estimated cost
  3. Adjust TTLs for larger contexts - Longer cache TTL for larger contexts would reduce the frequency of expensive re-caches

Until these improvements arrive, the responsibility falls on users to understand the mechanics and avoid the trap.

Key Takeaways

  1. Your limits weren’t reduced - The pricing model for large contexts makes cache management critical
  2. Cache writes are expensive - 1.25x-2x multiplier vs. 0.1x for cache reads
  3. Long breaks are dangerous - Cache expiration + large context = usage spike
  4. The 20% rule - Consider starting fresh when context exceeds 20%

The 1M context update is genuinely useful for complex tasks, but it requires more careful management of your conversation state. Stay aware of your cache status, and don’t let a simple “update?” burn through your limit.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email.

