Why Does Claude Code Consume So Much Usage When Returning to Old Sessions?

The Problem

I returned to a Claude Code session after leaving it open overnight, typed “hey” to continue working, and watched 22% of my usage limit disappear. That single word cost me 192,000 tokens.

I stared at the usage breakdown in disbelief:

token-usage-breakdown.txt
Input: "hey" (~3 tokens)
Output: ~30 tokens
Cache read: 192,000 tokens
Cache write: 192,000 tokens
Total impact: 22% of usage limit

Why did saying “hey” cost me so much?

My Investigation

I assumed Claude Code was somehow broken or overcharging. I dug into the Reddit discussions and found hundreds of users with the same complaint. One user reported that 92% of their tokens were cache reads, while only 0.015% were actual output.

Then I found the real culprit: prompt caching TTL.

The Root Cause: Cache Expiry

Claude Code uses prompt caching to avoid re-processing your entire conversation on each message. But this cache has a limited lifetime:

cache-ttl-by-tier.txt
+-------------+------------------+
| Tier        | Cache TTL        |
+-------------+------------------+
| Pro         | 5 minutes        |
| Max         | 1 hour           |
+-------------+------------------+
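
To make the table concrete, here is a minimal Python sketch that checks whether an idle session has probably outlived its cache. The TTL values are the ones from the table; the names `CACHE_TTL_MINUTES` and `cache_likely_expired` are my own, and the sketch assumes the TTL clock restarts with every message you send.

```python
# TTLs in minutes, copied from the tier table above.
CACHE_TTL_MINUTES = {"pro": 5, "max": 60}


def cache_likely_expired(tier: str, idle_minutes: float) -> bool:
    """Return True if the session has been idle longer than the tier's TTL.

    Assumption: the TTL is measured from the last message sent.
    """
    return idle_minutes > CACHE_TTL_MINUTES[tier.lower()]


if __name__ == "__main__":
    print(cache_likely_expired("pro", 10))      # a coffee break on Pro
    print(cache_likely_expired("max", 10))      # the same break on Max
    print(cache_likely_expired("max", 8 * 60))  # left open overnight
```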

I left my session overnight. The cache expired. When I returned and sent a message:

what-happens-on-cache-miss.txt
Step 1: Cache miss detected
|
v
Step 2: Entire conversation context must be re-processed
|
v
Step 3: Cache rebuild triggers at 1.25x normal input rate
|
v
Step 4: 150K context → 187.5K tokens charged

The math is brutal:

token-cost-comparison.txt
CACHED CONTEXT:
  150K tokens × 0.1 (cache read rate) = 15K token equivalent
EXPIRED CONTEXT:
  150K tokens × 1.25 (cache write rate) = 187.5K token cost
DIFFERENCE: 12.5x as expensive (187.5K ÷ 15K)
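
The same arithmetic as a small Python sketch, in case you want to plug in your own context size. The two multipliers are the ones used in this article; the function names are mine, and the result is a rough token-equivalent figure, not an exact bill.

```python
CACHE_READ_RATE = 0.1    # multiplier for reading already-cached context
CACHE_WRITE_RATE = 1.25  # multiplier for rebuilding the cache after a miss


def cached_cost(context_tokens: int) -> float:
    """Token-equivalent cost of a message when the cache is still warm."""
    return context_tokens * CACHE_READ_RATE


def expired_cost(context_tokens: int) -> float:
    """Token-equivalent cost of a message when the cache must be rebuilt."""
    return context_tokens * CACHE_WRITE_RATE


if __name__ == "__main__":
    context = 150_000
    hit = cached_cost(context)
    miss = expired_cost(context)
    print(f"hit: {hit:,.0f}  miss: {miss:,.0f}  ratio: {miss / hit:.1f}x")
```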

The Token Economics Breakdown

Understanding the cost structure changed how I view usage:

token-cost-multipliers.txt
+------------------------+------------------+------------------------+
| Token Type             | Cost Multiplier  | When Charged           |
+------------------------+------------------+------------------------+
| Input (uncached)       | 1.0x             | Every message w/o cache|
| Cache read             | ~0.1x            | Reading cached context |
| Cache write            | 1.25x            | First msg after miss   |
+------------------------+------------------+------------------------+

The cache write is the expensive part. It’s a one-time penalty, but if your context is large (like a big project with 100K+ tokens), that penalty is massive.
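
To see how that one-time penalty is spread over a session, here is a rough Python sketch built on the multipliers from the table. It deliberately simplifies: the context stays a fixed size, and every turn either reads the whole context from cache or rewrites it after a miss. `weighted_cost` and `session_cost` are hypothetical helper names, not anything Claude Code exposes.

```python
RATES = {"input": 1.0, "cache_read": 0.1, "cache_write": 1.25}


def weighted_cost(input_tokens: int = 0, cache_read: int = 0,
                  cache_write: int = 0) -> float:
    """Combine raw token counts into one token-equivalent figure."""
    return (input_tokens * RATES["input"]
            + cache_read * RATES["cache_read"]
            + cache_write * RATES["cache_write"])


def session_cost(context_tokens: int, messages: int, cache_misses: int) -> float:
    """Token-equivalent cost of `messages` turns over a fixed-size context.

    Each cache miss rewrites the whole context; every other turn reads it.
    """
    if not 0 <= cache_misses <= messages:
        raise ValueError("cache_misses must be between 0 and messages")
    hits = messages - cache_misses
    return weighted_cost(cache_read=hits * context_tokens,
                         cache_write=cache_misses * context_tokens)


if __name__ == "__main__":
    # Ten turns over a 100K context, with and without one expired cache.
    print(session_cost(100_000, messages=10, cache_misses=0))
    print(session_cost(100_000, messages=10, cache_misses=1))
```

Under these assumptions, a single miss in a ten-message session roughly doubles the total, which is why one stale "hey" stands out so much in the usage breakdown.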

My First Attempt at a Solution

I tried to just start new sessions every time. But this created new problems:

my-first-approach.py
# I thought: just always start fresh
# Problem: I lose all context, instructions, and conversation history
def start_fresh_session():
    # Lost: project context
    # Lost: custom instructions
    # Lost: conversation memory
    # Result: Claude has to re-learn everything
    pass

Starting fresh means Claude has no memory of what we were working on. That’s not practical for ongoing projects.

What Actually Works

After experimenting, I found several strategies that help:

Strategy 1: Use Max Tier for 1-Hour TTL

If you’re on Pro tier, the 5-minute cache expiry is brutal. Upgrading to Max tier gives you a 1-hour TTL:

tier-comparison.txt
Pro Tier:
- Cache expires after 5 minutes
- Coffee break = cache miss
- Lunch break = expensive cache rebuild
Max Tier:
- Cache expires after 1 hour
- Coffee break = cache HIT
- Lunch break = cache HIT
- Only overnight = cache miss

Strategy 2: Keepalive Messages

For long work sessions, send periodic messages to keep the cache alive:

keepalive-approach.sh
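# Set a timer for 4 minutes (Pro) or 50 minutes (Max)
# Send a simple message to refresh the cache
# Option A: Manual approach
# Just type "continue" or "refresh" every few minutes
# Option B: Automated (if you're away)
# Note: every refresh still re-reads the cached context at the cache read rate,
# and this assumes the piped message reaches the session you want kept warm
while true; do
    sleep 240  # 4 minutes for Pro tier
    echo "cache refresh" | claude
done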
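
Keepalives are not free, so it is worth checking when they stop paying off. The sketch below is a back-of-the-envelope estimate using this article's multipliers. It assumes each ping re-reads the full context at the cache read rate and ignores the few tokens of the ping and its reply; the function names are mine.

```python
import math

CACHE_READ_RATE = 0.1
CACHE_WRITE_RATE = 1.25


def keepalive_cost(context_tokens: int, away_minutes: float,
                   interval_minutes: float) -> float:
    """Token-equivalent cost of pinging every `interval_minutes` while away."""
    pings = math.floor(away_minutes / interval_minutes)
    return pings * context_tokens * CACHE_READ_RATE


def rebuild_cost(context_tokens: int) -> float:
    """Token-equivalent cost of letting the cache expire and rebuilding once."""
    return context_tokens * CACHE_WRITE_RATE


def keepalive_is_cheaper(context_tokens: int, away_minutes: float,
                         interval_minutes: float) -> bool:
    return (keepalive_cost(context_tokens, away_minutes, interval_minutes)
            < rebuild_cost(context_tokens))


if __name__ == "__main__":
    for away in (30, 48, 52, 8 * 60):
        print(away, keepalive_is_cheaper(150_000, away, interval_minutes=4))
```

With these numbers the break-even is 1.25 ÷ 0.1 = 12.5 pings, or about 50 minutes of 4-minute pings. Past that point, letting the cache expire and paying for one rebuild is the cheaper option, so an overnight keepalive loop would cost more than the problem it avoids.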

Strategy 3: Monitor Cache Status

Claude Code exposes cache metrics in the status line. Watch for these indicators:

status-line-cache-metrics.json
{
  "context_window": {
    "total_input_tokens": 15234,
    "total_output_tokens": 4521,
    "context_window_size": 200000,
    "used_percentage": 8,
    "current_usage": {
      "input_tokens": 8500,
      "output_tokens": 1200,
      "cache_creation_input_tokens": 5000,  // Watch this!
      "cache_read_input_tokens": 2000       // vs this
    }
  }
}

When cache_creation_input_tokens spikes, you’re paying the cache write penalty.
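
As a rough illustration, this Python sketch reads a payload shaped like the sample above and reports how much of the cached traffic was writes. The field names come from the sample; `cache_write_share`, `looks_like_rebuild`, and the 0.5 threshold are my own arbitrary choices, not anything Claude Code defines.

```python
import json

# Trimmed copy of the sample payload, without the inline comments
# (comments are not valid JSON).
SAMPLE = """
{
  "context_window": {
    "current_usage": {
      "input_tokens": 8500,
      "output_tokens": 1200,
      "cache_creation_input_tokens": 5000,
      "cache_read_input_tokens": 2000
    }
  }
}
"""


def cache_write_share(status: dict) -> float:
    """Fraction of cache traffic that was writes (0.0 if there was none)."""
    usage = status["context_window"]["current_usage"]
    written = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    total = written + read
    return written / total if total else 0.0


def looks_like_rebuild(status: dict, threshold: float = 0.5) -> bool:
    """Flag a turn where writes dominate; the threshold is a guess."""
    return cache_write_share(status) >= threshold


if __name__ == "__main__":
    status = json.loads(SAMPLE)
    print(f"write share: {cache_write_share(status):.0%}")
    print("rebuild suspected:", looks_like_rebuild(status))
```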

Strategy 4: Context Size Management

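Large contexts make cache misses expensive. Use an ignore file such as .claudeignore to reduce what gets loaded (check that your Claude Code version actually honors it before relying on it):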

.claudeignore-example.txt
# Reduce context size
node_modules/
.git/
dist/
*.log
*.min.js
coverage/
.env

Smaller context = smaller cache write penalty.

Common Mistakes I Made

Mistake 1: Leaving Sessions Open Overnight

overnight-mistake.txt
Before bed:
  Session: 150K tokens cached
Morning:
  Cache: EXPIRED (idle for hours, far past the 5 min TTL)
First message "hey":
  Result: 187.5K tokens charged for cache rebuild

Mistake 2: Ignoring the Usage Breakdown

I saw “22% usage” and panicked. But breaking it down:

usage-breakdown-analysis.txt
Total tokens charged: 192K+
- Cache reads: 92% of tokens (charged at 0.1x rate)
- Actual output: 0.015% of tokens
- Cache write: The real cost (1.25x rate)
Lesson: It wasn't the output, it was the cache rebuild

Mistake 3: Confusing Rate Limits with Usage Limits

rate-vs-usage.txt
Rate Limits:
  - Reset automatically (5-hour, 7-day windows)
  - Block you temporarily
Usage Limits:
  - Consume your allocation
Cache writes count against the rate limit AND the usage limit at the same time

Mistake 4: Not Checking Context Window Size

Big projects = big context = big cache penalty:

context-size-impact.txt
Small project (10K tokens):
  Cache miss penalty: 10K × 1.25 = 12.5K tokens
Large project (150K tokens):
  Cache miss penalty: 150K × 1.25 = 187.5K tokens
15x the penalty for the large project (187.5K ÷ 12.5K)

When to Accept the Cost vs. Start Fresh

I developed a decision tree:

cache-decision-tree.txt
Is context > 50K tokens?
|
+-- NO --> Just pay the cache write, it's manageable
|
+-- YES --> How long since last message?
            |
            +-- < 1 hour (Max tier) --> Cache is valid, proceed
            |
            +-- > 1 hour --> Do I need the full context?
                             |
                             +-- YES --> Pay the cache write
                             |
                             +-- NO --> Start fresh session
                                        Copy only essential context
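
The tree translates almost directly into code. This is a minimal Python sketch of it; the 50K threshold and the 1-hour default come from the tree, while the function name `cache_decision` and its parameters are hypothetical.

```python
def cache_decision(context_tokens: int, idle_minutes: float,
                   need_full_context: bool, ttl_minutes: float = 60) -> str:
    """Walk the decision tree above and return the suggested action.

    ttl_minutes defaults to the Max tier's 1 hour; pass 5 for Pro.
    """
    if context_tokens <= 50_000:
        return "pay the cache write"      # small context: manageable
    if idle_minutes < ttl_minutes:
        return "cache is valid, proceed"  # still inside the TTL
    if need_full_context:
        return "pay the cache write"      # expired, but the context is needed
    return "start fresh session"          # expired and not needed


if __name__ == "__main__":
    print(cache_decision(150_000, idle_minutes=8 * 60, need_full_context=False))
    print(cache_decision(150_000, idle_minutes=20, need_full_context=True))
```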

Summary

Saying “hey” in an old session isn’t expensive because of the word itself. It’s expensive because:

  1. The prompt cache expired (5 min TTL for Pro, 1 hour for Max)
  2. Your entire conversation context needs to be re-cached
  3. Cache writes cost 1.25x the normal input rate
  4. Large projects with 150K+ tokens get hit hardest

The solution: Either keep the cache alive with periodic messages, or accept the cost and plan for it. Starting fresh makes sense when you don’t need the full context.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.
