Claude Code Cache TTL: Why That 'Hey' Cost You 22% of Your Usage
Problem
I said “hey” to resume an old Claude Code session. The response came back, but I noticed something alarming:
Usage: 22% of my monthly allocation consumed
For saying “hey.”
I checked my recent activity, thinking there must be a mistake. A simple greeting shouldn’t consume nearly a quarter of my Pro plan budget. But the usage was real. My cache had expired, and Claude had to reprocess my entire conversation context from scratch.
This is the hidden cost of Claude Code’s cache TTL (Time-To-Live) differences between Pro and Max plans.
Environment
- Claude Code CLI with Pro subscription ($20/month)
- Long-running session with extensive codebase context loaded
- Brief pause between interactions (coffee break, meeting, context switch)
- Expected: Minimal usage for simple message
- Actual: 22% usage spike due to cache expiration
What happened?
I thought returning to an existing Claude Code session would be cheap. After all, the context was already loaded, right?
Wrong. The cache had expired during my break.
When I investigated, I discovered the root cause: Pro plan’s cache TTL is only 5 minutes. Max plan users get a 1-hour TTL. This 12x difference has massive cost implications for developers who context-switch frequently.
Cache TTL: The hidden spec
Here’s the breakdown I found:
Plan       | Cache TTL | Rate Limit Multiplier
-----------|-----------|----------------------
Pro ($20)  | 5 minutes | 1x (baseline)
Max ($100) | 1 hour    | 5x
Max ($200) | 1 hour    | 20x

The TTL difference isn’t mentioned prominently in marketing materials. But it fundamentally changes how you should work with Claude Code.
How prompt caching works
Claude Code uses Anthropic’s prompt caching to avoid reprocessing the same context repeatedly:
Operation        | Token Cost Multiplier
-----------------|------------------------------------------
Cache write      | 1.25x normal input tokens
Cache read (hit) | ~0.1x normal input tokens (90% cheaper!)
Fresh processing | 1.0x normal input tokens

This means:
- First time loading context: Pay 1.25x (write to cache)
- Subsequent requests within TTL: Pay ~0.1x (read from cache)
- After TTL expires: Pay full price again (1.0x for fresh processing, or 1.25x when the context is written back to the cache)
The 90% savings from cache hits are substantial for large contexts. But they only apply if your cache is still valid.
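To make the multipliers concrete, here is a minimal Python sketch of the arithmetic. It assumes the three multipliers from the table above apply uniformly to the whole context; `CacheOp` and `effective_tokens` are hypothetical names for illustration, not part of any Anthropic SDK.

```python
from enum import Enum


class CacheOp(Enum):
    """Cost multipliers relative to normal input tokens (values from the table above)."""
    WRITE = 1.25  # first load: context is written to the cache
    READ = 0.10   # cache hit within the TTL (approximate)
    FRESH = 1.00  # no cache involved: context processed at the normal rate


def effective_tokens(context_tokens: int, op: CacheOp) -> float:
    """Return the input-token-equivalent cost of processing the context once."""
    return context_tokens * op.value


if __name__ == "__main__":
    context = 100_000  # illustrative context size
    for op in CacheOp:
        print(f"{op.name:<5} {effective_tokens(context, op):>10,.0f} token-equivalents")
```

The same 100K-token context is cheapest by far when it is read from a valid cache, which is why the TTL matters so much.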
The 5-minute trap
Here’s where Pro users get caught:
Activity            | Duration | Cache Status (Pro)
--------------------|----------|-------------------
Coffee break        | 6 min    | EXPIRED
Check PR/emails     | 10 min   | EXPIRED
Quick standup       | 15 min   | EXPIRED
Lunch               | 30 min   | EXPIRED
Focus on other task | 45 min   | EXPIRED

Almost every natural workflow interruption exceeds 5 minutes. When you return, your cache is gone.
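The table can be reproduced with a small check. This is a sketch under two assumptions: the TTL values are the ones reported in this post, and the clock restarts at your last request, so a break is measured from the last message you sent. The names `cache_expired` and `BREAKS` are hypothetical.

```python
PRO_TTL_MIN = 5    # Pro cache TTL in minutes, as reported in this post
MAX_TTL_MIN = 60   # Max cache TTL in minutes, as reported in this post

# Break durations in minutes, taken from the table above.
BREAKS = {
    "Coffee break": 6,
    "Check PR/emails": 10,
    "Quick standup": 15,
    "Lunch": 30,
    "Focus on other task": 45,
}


def cache_expired(break_minutes: float, ttl_minutes: float) -> bool:
    """True if the break outlasted the TTL.

    Simplification: a break exactly equal to the TTL is treated as still valid.
    """
    return break_minutes > ttl_minutes


if __name__ == "__main__":
    for activity, minutes in BREAKS.items():
        pro = "EXPIRED" if cache_expired(minutes, PRO_TTL_MIN) else "valid"
        mx = "EXPIRED" if cache_expired(minutes, MAX_TTL_MIN) else "valid"
        print(f"{activity:<20} {minutes:>3} min  Pro: {pro:<8} Max: {mx}")
```

Under these assumptions, every break in the table expires the cache on Pro, while none of them would on a 1-hour TTL.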
Real cost comparison
Let me show you what this means in practice:
Pro Plan (cache expired):
- Full re-processing: 100K tokens at normal rate
- Cost: 100K tokens from your quota

Max Plan (cache still valid):
- Cache read: 100K tokens at ~10% cost
- Cost: ~10K tokens from your quota

Difference: 10x more tokens consumed on Pro plan!

For my “hey” incident, I had loaded a large codebase into context. When the cache expired, that entire context had to be reprocessed, costing 22% of my allocation.
Why context matters more than message length
I initially thought my “hey” was expensive because of the response. But that’s not how it works:
Your input ("hey"): ~2 tokens
Claude's response: ~10 tokens
System prompts: ~1,000-4,000 tokens
Previously loaded context: ~50,000-200,000 tokens (if cache expired)
─────────────────────────────────────────────────
Total: Depends on cache status!

The “hey” itself costs almost nothing. But if your cache expired, you’re paying for all the context you loaded earlier.
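A rough sketch of that breakdown in Python shows how little the message itself contributes. It is deliberately simplified: it counts every component as one flat token total, applies the ~0.1x cache-read multiplier only to the previously loaded context, and ignores that output tokens and system prompts may be billed differently. `Request` and `billed_tokens` are hypothetical names.

```python
from dataclasses import dataclass

CACHE_READ_MULTIPLIER = 0.1  # approximate cost of a cache hit vs. normal input


@dataclass
class Request:
    message_tokens: int    # what you typed, e.g. "hey"
    response_tokens: int   # Claude's reply
    system_tokens: int     # system prompts
    context_tokens: int    # previously loaded conversation and codebase context


def billed_tokens(req: Request, cache_hit: bool) -> float:
    """Flat token-equivalent total for one request (simplified model)."""
    context_cost = req.context_tokens * (CACHE_READ_MULTIPLIER if cache_hit else 1.0)
    return req.message_tokens + req.response_tokens + req.system_tokens + context_cost


if __name__ == "__main__":
    hey = Request(message_tokens=2, response_tokens=10,
                  system_tokens=2_000, context_tokens=100_000)
    for hit in (True, False):
        total = billed_tokens(hey, cache_hit=hit)
        label = "cache hit " if hit else "cache miss"
        print(f"{label}: {total:,.0f} token-equivalents")
```

In this model the two-token message is a rounding error either way; the cache status of the context decides the bill.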
Common mistakes I made
Mistake 1: Treating Claude Code like a simple chatbot
I assumed returning to a session would be cheap because I was sending a short message. Reality: The cost depends on your total context, not just your current message.
Mistake 2: Not understanding cache vs rate limits
I knew Max had higher rate limits. I didn’t realize the cache TTL difference was arguably more important for my workflow.
Mistake 3: Ignoring session timing
I’d start a complex analysis, get distracted by meetings or other tasks, then return expecting everything to work the same. Each return after 5+ minutes was a cache miss.
Mistake 4: Context-switching without considering costs
My developer workflow naturally involves switching between tasks - checking PRs, responding to messages, debugging issues. Each switch that exceeded 5 minutes invalidated my cache.
Strategies to maximize cache efficiency on Pro
After understanding the problem, I changed my workflow:
# Strategy 1: Use /compact before breaks
/compact Keep only essential context for current task

# Strategy 2: Monitor your context usage
/context # Shows context utilization and optimization suggestions

# Strategy 3: Use /clear when starting fresh topics
/clear # Resets context (cheaper than cache miss on large context)

Workflow optimization for Pro users:
- Start complex tasks when you have 30+ minutes of uninterrupted time
- Before any break, run /compact to reduce your context size
- If returning after >5 minutes, consider /clear if the previous context isn’t critical
- Batch related questions together instead of spreading them across breaks
- For long-running projects, consider if Max plan’s 1-hour TTL would actually save money
When upgrading to Max makes sense
Calculate your break-even point:
If you typically:
- Load 100K+ tokens of context per session
- Have 5+ context switches per day exceeding 5 minutes
- Work on multiple projects requiring different contexts

Then Max plan's extended TTL can reduce your effective token costs by 5-10x, potentially paying for itself through efficiency gains.

The math: If you have 3 cache expirations per day costing 50K tokens each (150K tokens/day), that’s 4.5M tokens/month in cache misses alone. Max plan would reduce that to ~450K tokens through cache reads - a savings of over 4M tokens.
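If you want to plug in your own numbers, the estimate can be sketched as below. It assumes a 30-day month, that every one of those returns would still be inside the 1-hour TTL on Max, and a ~0.1x cache-read multiplier. `monthly_miss_tokens` is a hypothetical helper name.

```python
def monthly_miss_tokens(expirations_per_day: int,
                        tokens_per_expiration: int,
                        days: int = 30,
                        multiplier: float = 1.0) -> float:
    """Token-equivalents spent per month on returning to a session.

    multiplier=1.0 models a cache miss (full reprocessing);
    multiplier=0.1 models the same returns served as cache reads.
    """
    return expirations_per_day * tokens_per_expiration * days * multiplier


if __name__ == "__main__":
    misses = monthly_miss_tokens(3, 50_000)                  # every return is a miss
    reads = monthly_miss_tokens(3, 50_000, multiplier=0.1)   # every return is a hit
    print(f"Cache misses:    {misses:,.0f} tokens/month")
    print(f"Cache reads:     {reads:,.0f} tokens/month")
    print(f"Estimated saved: {misses - reads:,.0f} tokens/month")
```

Breaks longer than an hour would expire the cache on Max too, so treat the result as an upper bound on the savings.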
Summary
In this post, I explained why returning to old Claude Code sessions can consume disproportionate usage limits:
- Pro plan’s 5-minute cache TTL vs Max plan’s 1-hour TTL is a 12x difference
- Cache hits cost ~90% less than fresh token processing
- Your “hey” costs almost nothing; your expired context costs everything
- Natural workflow interruptions (coffee, meetings, context switches) almost always exceed 5 minutes
- Upgrading to Max isn’t just about rate limits - cache TTL alone may justify the cost for heavy users
The next time you see an unexpected usage spike after resuming a session, check your cache TTL. You might not be doing anything wrong - you’re just hitting the limits of your plan’s caching policy.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- Claude Code Command Reference
- Anthropic Claude Max Subscription Plan
- OpenClaw Session Pruning Documentation
- Anthropic Prompt Caching Announcement
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!