
Claude Code Cache TTL: Why That 'Hey' Cost You 22% of Your Usage

Problem

I said “hey” to resume an old Claude Code session. The response came back, but I noticed something alarming:

Usage Spike
Usage: 22% of my monthly allocation consumed

For saying “hey.”

I checked my recent activity, thinking there must be a mistake. A simple greeting shouldn’t consume nearly a quarter of my Pro plan budget. But the usage was real. My cache had expired, and Claude had to reprocess my entire conversation context from scratch.

This is the hidden cost of Claude Code’s cache TTL (Time-To-Live) differences between Pro and Max plans.

Environment

  • Claude Code CLI with Pro subscription ($20/month)
  • Long-running session with extensive codebase context loaded
  • Brief pause between interactions (coffee break, meeting, context switch)
  • Expected: Minimal usage for simple message
  • Actual: 22% usage spike due to cache expiration

What happened?

I thought returning to an existing Claude Code session would be cheap. After all, the context was already loaded, right?

Wrong. The cache had expired during my break.

When I investigated, I discovered the root cause: Pro plan’s cache TTL is only 5 minutes. Max plan users get a 1-hour TTL. This 12x difference has massive cost implications for developers who context-switch frequently.

Cache TTL: The hidden spec

Here’s the breakdown I found:

Cache TTL Comparison
Plan       | Cache TTL | Rate Limit Multiplier
-----------|-----------|----------------------
Pro ($20)  | 5 minutes | 1x (baseline)
Max ($100) | 1 hour    | 5x
Max ($200) | 1 hour    | 20x

The TTL difference isn’t mentioned prominently in marketing materials. But it fundamentally changes how you should work with Claude Code.

How prompt caching works

Claude Code uses Anthropic’s prompt caching to avoid reprocessing the same context repeatedly:

Cache Cost Structure
Operation        | Token Cost Multiplier
-----------------|------------------------------------------
Cache write      | 1.25x normal input tokens
Cache read (hit) | ~0.1x normal input tokens (90% cheaper!)
Fresh processing | 1.0x normal input tokens

This means:

  • First time loading context: Pay 1.25x (write to cache)
  • Subsequent requests within TTL: Pay ~0.1x (read from cache)
  • After TTL expires: Pay full price again (1.0x for fresh processing, or 1.25x if the context is written back to the cache)

The 90% savings from cache hits are substantial for large contexts. But they only apply if your cache is still valid.
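To make the write/read/expire cycle concrete, here is a minimal sketch that adds up the context cost of a session. The multipliers come from the table above; the function name, its parameters, and the idea of a fixed context size are my own simplifications. I also assume a miss is billed at the cache-write rate because the context gets cached again; pass `miss_multiplier=1.0` if you would rather count it as plain fresh processing.

```python
# Multipliers from the table above, relative to normal input tokens.
CACHE_WRITE = 1.25
CACHE_READ = 0.10


def session_cost(context_tokens, request_minutes, ttl_minutes,
                 miss_multiplier=CACHE_WRITE):
    """Context cost of a session, in normal-input-token equivalents.

    request_minutes: sorted request times, in minutes since the session began.
    Assumption: the context size stays constant and each request resets the TTL.
    """
    total = 0.0
    last_request = None
    for minute in request_minutes:
        # A hit needs a previous request that is still within the TTL window.
        cache_hit = (last_request is not None
                     and minute - last_request <= ttl_minutes)
        total += context_tokens * (CACHE_READ if cache_hit else miss_multiplier)
        last_request = minute
    return total


# Two requests, 10 minutes apart, with 100K tokens of context.
print(session_cost(100_000, [0, 10], ttl_minutes=5))   # second request misses
print(session_cost(100_000, [0, 10], ttl_minutes=60))  # second request hits
```

Under these assumptions the same two requests cost the equivalent of 250K tokens with a 5-minute TTL and 135K with a 1-hour TTL.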

The 5-minute trap

Here’s where Pro users get caught:

Common Break Durations
Activity            | Duration | Cache Status (Pro)
--------------------|----------|-------------------
Coffee break        | 6 min    | EXPIRED
Check PR/emails     | 10 min   | EXPIRED
Quick standup       | 15 min   | EXPIRED
Lunch               | 30 min   | EXPIRED
Focus on other task | 45 min   | EXPIRED

Almost every natural workflow interruption exceeds 5 minutes. When you return, your cache is gone.
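The table boils down to one comparison: idle time against the plan's TTL. A small sketch, using the TTL values I reported above (the `PLAN_TTL` dictionary and `cache_status` helper are hypothetical names, and treating a break of exactly the TTL as still valid is my assumption):

```python
from datetime import timedelta

# TTL values as reported in this post, not read from any official API.
PLAN_TTL = {
    "pro": timedelta(minutes=5),
    "max": timedelta(hours=1),
}

# Break durations from the table above.
BREAKS = {
    "Coffee break": timedelta(minutes=6),
    "Check PR/emails": timedelta(minutes=10),
    "Quick standup": timedelta(minutes=15),
    "Lunch": timedelta(minutes=30),
    "Focus on other task": timedelta(minutes=45),
}


def cache_status(plan, idle):
    """Return 'valid' if the idle time fits inside the plan's TTL."""
    return "valid" if idle <= PLAN_TTL[plan] else "EXPIRED"


for activity, idle in BREAKS.items():
    print(f"{activity}: pro={cache_status('pro', idle)}, "
          f"max={cache_status('max', idle)}")
```

Every break in the list comes back EXPIRED on Pro and valid on Max, which is the whole trap in five lines of output.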

Real cost comparison

Let me show you what this means in practice:

Scenario: 100K tokens of context, returning after 10 minutes

Pro Plan (cache expired):
- Full re-processing: 100K tokens at normal rate
- Cost: 100K tokens from your quota

Max Plan (cache still valid):
- Cache read: 100K tokens at ~10% cost
- Cost: ~10K tokens from your quota

Difference: 10x more tokens consumed on Pro plan!

For my “hey” incident, I had loaded a large codebase into context. When the cache expired, that entire context had to be reprocessed, costing 22% of my allocation.

Why context matters more than message length

I initially thought my “hey” was expensive because of the response. But that’s not how it works:

Token Cost Breakdown for Simple Message
Your input ("hey"): ~2 tokens
Claude's response: ~10 tokens
System prompts: ~1,000-4,000 tokens
Previously loaded context: ~50,000-200,000 tokens (if cache expired)
─────────────────────────────────────────────────
Total: Depends on cache status!

The “hey” itself costs almost nothing. But if your cache expired, you’re paying for all the context you loaded earlier.
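Here is the same breakdown as a quick calculation. The message and response sizes are the estimates from the breakdown above; the 2,000-token system prompt and 100K-token context are illustrative picks from within the stated ranges, and `hey_cost` is a hypothetical helper. For simplicity I discount only the loaded context on a cache hit and count output tokens the same as input tokens, which is a rough approximation.

```python
def hey_cost(context_tokens, cache_hit,
             system_tokens=2_000, message_tokens=2, response_tokens=10):
    """Approximate token cost of sending a short message into an existing session.

    Assumption: a cache hit bills the loaded context at 10% and everything
    else at full price; a miss bills the whole context at full price.
    """
    context_cost = context_tokens // 10 if cache_hit else context_tokens
    return message_tokens + response_tokens + system_tokens + context_cost


hit = hey_cost(100_000, cache_hit=True)
miss = hey_cost(100_000, cache_hit=False)
print(f"cache hit:  {hit:,} tokens")
print(f"cache miss: {miss:,} tokens")
```

With these numbers the message and its response account for 12 tokens out of roughly 102K on a miss, or about 0.01% of the total.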

Common mistakes I made

Mistake 1: Treating Claude Code like a simple chatbot

I assumed returning to a session would be cheap because I was sending a short message. Reality: The cost depends on your total context, not just your current message.

Mistake 2: Not understanding cache vs rate limits

I knew Max had higher rate limits. I didn’t realize the cache TTL difference was arguably more important for my workflow.

Mistake 3: Ignoring session timing

I’d start a complex analysis, get distracted by meetings or other tasks, then return expecting everything to work the same. Each return after 5+ minutes was a cache miss.

Mistake 4: Context-switching without considering costs

My developer workflow naturally involves switching between tasks - checking PRs, responding to messages, debugging issues. Each switch that exceeded 5 minutes invalidated my cache.

Strategies to maximize cache efficiency on Pro

After understanding the problem, I changed my workflow:

Cache-Aware Commands
# Strategy 1: Use /compact before breaks
/compact Keep only essential context for current task

# Strategy 2: Monitor your context usage
/context # Shows context utilization and optimization suggestions

# Strategy 3: Use /clear when starting fresh topics
/clear # Resets context (cheaper than cache miss on large context)

Workflow optimization for Pro users:

  1. Start complex tasks when you have 30+ minutes of uninterrupted time
  2. Before any break, run /compact to reduce your context size
  3. If returning after >5 minutes, consider /clear if the previous context isn’t critical
  4. Batch related questions together instead of spreading them across breaks
  5. For long-running projects, consider whether the Max plan’s 1-hour TTL would actually save you money

When upgrading to Max makes sense

Calculate your break-even point:

Max Plan ROI Calculation
If you typically:
- Load 100K+ tokens of context per session
- Have 5+ context switches per day exceeding 5 minutes
- Work on multiple projects requiring different contexts
Then Max plan's extended TTL can reduce your effective token costs
by 5-10x, potentially paying for itself through efficiency gains.

The math: If you have 3 cache expirations per day costing 50K tokens each (150K tokens/day), that’s 4.5M tokens over a 30-day month in cache misses alone. If each of those breaks stays under an hour, Max plan would reduce that to ~450K tokens through cache reads - a savings of just over 4M tokens.
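If you want to plug in your own numbers, the arithmetic is short enough to script. This sketch assumes a 30-day month, that every break would have stayed inside Max's 1-hour TTL, and that a cache read costs 10% of a miss; the function name is hypothetical.

```python
def monthly_miss_tokens(expirations_per_day, tokens_per_expiration, days=30):
    """Tokens spent per month re-processing context after cache expirations."""
    return expirations_per_day * tokens_per_expiration * days


pro_tokens = monthly_miss_tokens(3, 50_000)  # every break is a full miss
max_tokens = pro_tokens // 10                # same breaks served as cache reads
saved = pro_tokens - max_tokens

print(f"Pro:   {pro_tokens:,} tokens/month")
print(f"Max:   {max_tokens:,} tokens/month")
print(f"Saved: {saved:,} tokens/month")
```

Breaks longer than an hour would be misses on either plan, so treat the result as an upper bound on the savings.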

Summary

In this post, I explained why returning to an old Claude Code session can consume a disproportionate share of your usage limit:

  • Pro plan’s 5-minute cache TTL vs Max plan’s 1-hour TTL is a 12x difference
  • Cache hits cost ~90% less than fresh token processing
  • Your “hey” costs almost nothing; your expired context costs everything
  • Natural workflow interruptions (coffee, meetings, context switches) almost always exceed 5 minutes
  • Upgrading to Max isn’t just about rate limits - cache TTL alone may justify the cost for heavy users

The next time you see an unexpected usage spike after resuming a session, check your cache TTL. You might not be doing anything wrong - you’re just hitting the limits of your plan’s caching policy.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.

