Why Does Claude Code Consume So Much Usage When Returning to Old Sessions?
The Problem
I returned to a Claude Code session after leaving it open overnight, typed “hey” to continue working, and watched my usage drop by 22%. That single word consumed 192,000 tokens.
I stared at the usage breakdown in disbelief:
    Input: "hey" (~3 tokens)
    Output: ~30 tokens
    Cache read: 192,000 tokens
    Cache write: 192,000 tokens
    Total impact: 22% of usage limit

Why did saying “hey” cost me so much?
My Investigation
I assumed Claude Code was somehow broken or overcharging. I dug into the Reddit discussions and found hundreds of users with the same complaint. One user reported that 92% of their tokens were cache reads and only 0.015% was actual output.
Then I found the real culprit: prompt caching TTL.
The Root Cause: Cache Expiry
Claude Code uses prompt caching to avoid re-processing your entire conversation on each message. But this cache has a limited lifetime:
    +-------------+------------------+
    | Tier        | Cache TTL        |
    +-------------+------------------+
    | Pro         | 5 minutes        |
    | Max         | 1 hour           |
    +-------------+------------------+

I left my session overnight. The cache expired. When I returned and sent a message:
    Step 1: Cache miss detected
        |
        v
    Step 2: Entire conversation context must be re-processed
        |
        v
    Step 3: Cache rebuild triggers at 1.25x normal input rate
        |
        v
    Step 4: 150K context → 187.5K tokens charged

The math is brutal:
    CACHED CONTEXT:  150K tokens × 0.1 (cache read rate)   = 15K token equivalent
    EXPIRED CONTEXT: 150K tokens × 1.25 (cache write rate) = 187.5K token cost
    DIFFERENCE:      12.5x more expensive

The Token Economics Breakdown
Understanding the cost structure changed how I view usage:
    +------------------------+------------------+-------------------------+
    | Token Type             | Cost Multiplier  | When Charged            |
    +------------------------+------------------+-------------------------+
    | Input (uncached)       | 1.0x             | Every message w/o cache |
    | Cache read             | ~0.1x            | Reading cached context  |
    | Cache write            | 1.25x            | First msg after miss    |
    +------------------------+------------------+-------------------------+

The cache write is the expensive part. It’s a one-time penalty, but if your context is large (like a big project with 100K+ tokens), that penalty is massive.
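To make the multipliers in the table above concrete, here is a minimal Python sketch that converts a raw token count into a "token equivalent" for each charge type. The names `RATES` and `token_equivalent` are my own, chosen for illustration, and the multipliers are simply the figures from the table, not values read from any API.

```python
# Multipliers copied from the table above (the article's figures, not an API lookup).
RATES = {
    "input": 1.0,         # uncached input
    "cache_read": 0.1,    # reading context that is still cached
    "cache_write": 1.25,  # rebuilding the cache after a miss
}


def token_equivalent(tokens: int, kind: str) -> float:
    """Return the cost of `tokens` expressed in normal-input token equivalents."""
    return tokens * RATES[kind]


if __name__ == "__main__":
    context = 150_000  # the 150K-token session from the example above
    warm = token_equivalent(context, "cache_read")
    cold = token_equivalent(context, "cache_write")
    print(f"warm cache: {warm:,.0f} token equivalents")
    print(f"cold cache: {cold:,.0f} token equivalents")
    print(f"ratio: {cold / warm:.1f}x")
```

The ratio between the two cases depends only on the two multipliers, so it stays the same for any context size; what grows with the context is the absolute penalty.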
My First Attempt at a Solution
I tried to just start new sessions every time. But this created new problems:
    # I thought: just always start fresh
    # Problem: I lose all context, instructions, and conversation history
    def start_fresh_session():
        # Lost: project context
        # Lost: custom instructions
        # Lost: conversation memory
        # Result: Claude has to re-learn everything
        pass

Starting fresh means Claude has no memory of what we were working on. That’s not practical for ongoing projects.
What Actually Works
After experimenting, I found several strategies that help:
Strategy 1: Use Max Tier for 1-Hour TTL
If you’re on Pro tier, the 5-minute cache expiry is brutal. Upgrading to Max tier gives you a 1-hour TTL:
    Pro Tier:
    - Cache expires after 5 minutes
    - Coffee break = cache miss
    - Lunch break = expensive cache rebuild

    Max Tier:
    - Cache expires after 1 hour
    - Coffee break = cache HIT
    - Lunch break = cache HIT
    - Only overnight = cache miss

Strategy 2: Keepalive Messages
For long work sessions, send periodic messages to keep the cache alive:
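    # Set a timer for 4 minutes (Pro) or 50 minutes (Max)
    # Send a simple message to refresh the cache

    # Option A: Manual approach
    # Just type "continue" or "refresh" every few minutes

    # Option B: Automated (if you're away)
    # --continue resumes the most recent session; without it, each call
    # would start a new session instead of touching the existing cache.
    # Note: every ping still reads the full cached context (~0.1x rate).
    while true; do
      sleep 240  # 4 minutes for Pro tier
      claude --continue --print "cache refresh"
    done

Strategy 3: Monitor Cache Status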
Claude Code exposes cache metrics in the status line. Watch for these indicators:
    {
      "context_window": {
        "total_input_tokens": 15234,
        "total_output_tokens": 4521,
        "context_window_size": 200000,
        "used_percentage": 8,
        "current_usage": {
          "input_tokens": 8500,
          "output_tokens": 1200,
          "cache_creation_input_tokens": 5000,  // Watch this!
          "cache_read_input_tokens": 2000       // vs this
        }
      }
    }

When cache_creation_input_tokens spikes, you’re paying the cache write penalty.
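One way to watch for that spike is to compare cache writes against cache reads in the status payload. The sketch below assumes the JSON shape shown above arrives as a string. The helper names `cache_write_share` and `is_cache_rebuild` and the 50% threshold are hypothetical choices of mine, not part of Claude Code.

```python
import json


def cache_write_share(status: dict) -> float:
    """Fraction of cache traffic that was a (costly) write rather than a read."""
    usage = status["context_window"]["current_usage"]
    created = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    total = created + read
    return created / total if total else 0.0


def is_cache_rebuild(status: dict, threshold: float = 0.5) -> bool:
    """Flag a turn where writes dominate; the threshold is an arbitrary assumption."""
    return cache_write_share(status) >= threshold


if __name__ == "__main__":
    # Same numbers as the sample payload above, without the inline comments
    # (plain JSON does not allow them).
    raw = """
    {"context_window": {"current_usage": {
        "input_tokens": 8500,
        "output_tokens": 1200,
        "cache_creation_input_tokens": 5000,
        "cache_read_input_tokens": 2000}}}
    """
    status = json.loads(raw)
    print("write share:", round(cache_write_share(status), 2))
    print("rebuild suspected:", is_cache_rebuild(status))
```

A ratio is used instead of an absolute token count so the same check works for small and large sessions alike.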
Strategy 4: Context Size Management
Large contexts make cache misses expensive. Use .claudeignore to reduce what gets loaded:
    # Reduce context size
    node_modules/
    .git/
    dist/
    *.log
    *.min.js
    coverage/
    .env

Smaller context = smaller cache write penalty.
Common Mistakes I Made
Mistake 1: Leaving Sessions Open Overnight
    Before bed:
      Session: 150K tokens cached

    Morning:
      Cache: EXPIRED (5-minute TTL, many hours idle)

    First message "hey":
      Result: 187.5K tokens charged for cache rebuild

Mistake 2: Ignoring the Usage Breakdown
I saw “22% usage” and panicked. But breaking it down:
    Total tokens charged: 192K+
    - Cache reads: 92% of tokens (charged at 0.1x rate)
    - Actual output: 0.015% of tokens
    - Cache write: The real cost (1.25x rate)

    Lesson: It wasn't the output, it was the cache rebuild

Mistake 3: Confusing Rate Limits with Usage Limits
    Rate Limits:
    - Reset automatically (5-hour, 7-day windows)
    - Block you temporarily

    Usage Limits:
    - Consume your allocation
    - Cache writes affect both

    Cache writes consume rate limit AND usage limit simultaneously

Mistake 4: Not Checking Context Window Size
Big projects = big context = big cache penalty:
    Small project (10K tokens):
      Cache miss penalty: 10K × 1.25 = 12.5K tokens

    Large project (150K tokens):
      Cache miss penalty: 150K × 1.25 = 187.5K tokens

    15x more expensive for large projects!

When to Accept the Cost vs. Start Fresh
I developed a decision tree:
    Is context > 50K tokens?
    |
    +-- NO --> Just pay the cache write, it's manageable
    |
    +-- YES --> How long since last message?
                |
                +-- < 1 hour (Max tier) --> Cache is valid, proceed
                |
                +-- > 1 hour --> Do I need the full context?
                                 |
                                 +-- YES --> Pay the cache write
                                 |
                                 +-- NO --> Start fresh session
                                            Copy only essential context

Summary
Saying “hey” in an old session isn’t expensive because of the word itself. It’s expensive because:
- The prompt cache expired (5 min TTL for Pro, 1 hour for Max)
- Your entire conversation context needs to be re-cached
- Cache writes cost 1.25x the normal input rate
- Large projects with 150K+ tokens get hit hardest
The solution: Either keep the cache alive with periodic messages, or accept the cost and plan for it. Starting fresh makes sense when you don’t need the full context.
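The decision tree from the previous section can also be written as a small function. This is only an illustrative sketch: `Action`, `choose_action`, and the parameter names are hypothetical, and the 50K-token threshold and TTL values are the ones used in this article, not anything Claude Code exposes.

```python
from enum import Enum


class Action(Enum):
    PAY_CACHE_WRITE = "pay the cache write"
    PROCEED = "cache is still valid, proceed"
    START_FRESH = "start a fresh session with only essential context"


def choose_action(
    context_tokens: int,
    idle_minutes: float,
    ttl_minutes: float,
    need_full_context: bool,
    small_context_limit: int = 50_000,  # threshold from the decision tree
) -> Action:
    """Mirror the article's decision tree for returning to an old session."""
    # Small contexts: any rebuild is cheap enough to accept.
    if context_tokens <= small_context_limit:
        return Action.PAY_CACHE_WRITE
    # Large context, but still inside the cache lifetime: nothing to decide.
    if idle_minutes < ttl_minutes:
        return Action.PROCEED
    # Large context and expired cache: only pay if the history is really needed.
    return Action.PAY_CACHE_WRITE if need_full_context else Action.START_FRESH


if __name__ == "__main__":
    # Overnight gap on a 150K-token session with a 1-hour TTL.
    print(choose_action(150_000, idle_minutes=600, ttl_minutes=60, need_full_context=False).value)
```

Passing the TTL in as a parameter keeps the same function usable for both the 5-minute and the 1-hour case.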
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can contact me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- AWS Bedrock: Prompt Caching for Faster Model Inference
- Claude Code Status Line Documentation
- Reddit: Saying 'hey' cost me 22% of my usage limits
- Anthropic SDK Python - Token Counting