Why Does Claude Code Consume So Much Usage When Returning to Old Sessions?

Mar 26, 2026

The Problem

I returned to a Claude Code session after leaving it open overnight, typed “hey” to continue working, and watched 22% of my usage limit disappear. That single word consumed 192,000 tokens.

I stared at the usage breakdown in disbelief:

Input:           "hey" (~3 tokens)
Output:          ~30 tokens
Cache read:      192,000 tokens
Cache write:     192,000 tokens
Total impact:    22% of usage limit

Why did saying “hey” cost me so much?

My Investigation

I assumed Claude Code was somehow broken or overcharging. I dug into the Reddit discussions and found hundreds of users with the same complaint. One user reported that 92% of their tokens were cache reads and only 0.015% were actual output.

Then I found the real culprit: prompt caching TTL.

The Root Cause: Cache Expiry

Claude Code uses prompt caching to avoid re-processing your entire conversation on each message. But this cache has a limited lifetime:

+-------------+------------------+
| Tier        | Cache TTL        |
+-------------+------------------+
| Pro         | 5 minutes        |
| Max         | 1 hour           |
+-------------+------------------+

I left my session overnight. The cache expired. When I returned and sent a message:

Step 1: Cache miss detected
        |
        v
Step 2: Entire conversation context must be re-processed
        |
        v
Step 3: Cache rebuild triggers at 1.25x normal input rate
        |
        v
Step 4: 150K context → 187.5K tokens charged
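
The miss check in Step 1 comes down to comparing idle time against the TTL. Here is a minimal sketch of that comparison, assuming the TTL values from the table above. The function name and the example timestamps are hypothetical, and this only models the behavior; it is not how Claude Code checks the cache internally.

```python
from datetime import datetime, timedelta

# TTL values copied from the table above (the article's figures).
CACHE_TTL = {
    "pro": timedelta(minutes=5),
    "max": timedelta(hours=1),
}


def cache_expired(last_message: datetime, now: datetime, tier: str) -> bool:
    """Return True when the idle gap exceeds the tier's cache TTL."""
    return now - last_message > CACHE_TTL[tier.lower()]


# Hypothetical timestamps: last message before bed, first message next morning.
before_bed = datetime(2026, 3, 25, 23, 0)
morning = datetime(2026, 3, 26, 8, 0)

print(cache_expired(before_bed, morning, "pro"))
print(cache_expired(before_bed, morning, "max"))
print(cache_expired(before_bed, before_bed + timedelta(minutes=20), "max"))
```

An overnight gap exceeds both TTLs, so either tier would hit the rebuild path in that scenario.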

The math is brutal:

CACHED CONTEXT:
  150K tokens × 0.1 (cache read rate) = 15K token equivalent

EXPIRED CONTEXT:
  150K tokens × 1.25 (cache write rate) = 187.5K token cost

DIFFERENCE: 12.5x more expensive

The Token Economics Breakdown

Understanding the cost structure changed how I view usage:

+------------------------+------------------+------------------------+
| Token Type             | Cost Multiplier  | When Charged           |
+------------------------+------------------+------------------------+
| Input (uncached)       | 1.0x             | Every message w/o cache|
| Cache read             | ~0.1x            | Reading cached context |
| Cache write            | 1.25x            | First msg after miss   |
+------------------------+------------------+------------------------+

The cache write is the expensive part. It’s a one-time penalty per cache miss, but if your context is large (like a big project with 100K+ tokens), that penalty is massive.
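
To compare requests that mix token types, the multipliers in the table can be applied as weights. This is a rough sketch using the article's multipliers. The function name and the two example requests are hypothetical, and output tokens are left out because the table does not give them a multiplier.

```python
# Multipliers from the table above, relative to plain uncached input.
MULTIPLIERS = {
    "input": 1.0,
    "cache_read": 0.1,
    "cache_write": 1.25,
}


def weighted_input_cost(tokens_by_type: dict[str, int]) -> float:
    """Sum token counts weighted by their cost multiplier."""
    return sum(MULTIPLIERS[kind] * count for kind, count in tokens_by_type.items())


# Hypothetical request: a tiny prompt with a 150K context rebuilt after a miss.
after_miss = {"input": 3, "cache_write": 150_000}
# The same request while the cache is still warm.
warm = {"input": 3, "cache_read": 150_000}

print(round(weighted_input_cost(after_miss)))
print(round(weighted_input_cost(warm)))
```

The prompt itself barely registers in either total; the cached context dominates both.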

My First Attempt at a Solution

I tried to just start new sessions every time. But this created new problems:

# I thought: just always start fresh
# Problem: I lose all context, instructions, and conversation history
def start_fresh_session():
    # Lost: project context
    # Lost: custom instructions
    # Lost: conversation memory
    # Result: Claude has to re-learn everything
    pass

Starting fresh means Claude has no memory of what we were working on. That’s not practical for ongoing projects.

What Actually Works

After experimenting, I found several strategies that help:

Strategy 1: Use Max Tier for 1-Hour TTL

If you’re on Pro tier, the 5-minute cache expiry is brutal. Upgrading to Max tier gives you a 1-hour TTL:

Pro Tier:
  - Cache expires after 5 minutes
  - Coffee break = cache miss
  - Lunch break = expensive cache rebuild

Max Tier:
  - Cache expires after 1 hour
  - Coffee break = cache HIT
  - Lunch break = cache HIT
  - Only overnight = cache miss

Strategy 2: Keepalive Messages

For long work sessions, send periodic messages to keep the cache alive:

# Set a timer for 4 minutes (Pro) or 50 minutes (Max)
# Send a simple message to refresh the cache

# Option A: Manual approach
# Just type "continue" or "refresh" every few minutes

# Option B: Automated (if you're away)
# Assumes `claude -c -p` resumes the most recent session in this directory;
# without -c, the command would not continue the existing session.
while true; do
    sleep 240  # 4 minutes for Pro tier
    claude -c -p "cache refresh"
done
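
Keepalives are not free: each ping re-reads the cached context at the cache-read rate. The sketch below estimates when pinging costs less than letting the cache expire, using the article's 0.1x and 1.25x multipliers. The function names are hypothetical, and the model ignores the small input and output of each ping and any context growth it causes.

```python
import math

CACHE_READ_RATE = 0.1    # article's cache-read multiplier
CACHE_WRITE_RATE = 1.25  # article's cache-write multiplier


def keepalive_total(context_tokens: int, idle_minutes: float, interval_minutes: float) -> float:
    """Cost of pinging through the idle period, plus the cached read on return."""
    pings = math.floor(idle_minutes / interval_minutes)
    return (pings + 1) * context_tokens * CACHE_READ_RATE


def rebuild_total(context_tokens: int) -> float:
    """Cost of letting the cache expire and paying one cache write on return."""
    return context_tokens * CACHE_WRITE_RATE


def keepalive_pays_off(context_tokens: int, idle_minutes: float, interval_minutes: float) -> bool:
    """True when pinging is cheaper than a single rebuild."""
    return keepalive_total(context_tokens, idle_minutes, interval_minutes) < rebuild_total(context_tokens)


context = 150_000
print(keepalive_pays_off(context, idle_minutes=30, interval_minutes=4))    # lunch, Pro interval
print(keepalive_pays_off(context, idle_minutes=480, interval_minutes=4))   # overnight, Pro interval
print(keepalive_pays_off(context, idle_minutes=480, interval_minutes=50))  # overnight, Max interval
```

Under these assumptions the break-even is about 11 pings, which is roughly 45 minutes at a 4-minute interval. Beyond that, a single rebuild is the cheaper option.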

Strategy 3: Monitor Cache Status

Claude Code exposes cache metrics in the status line. Watch for these indicators:

{
  "context_window": {
    "total_input_tokens": 15234,
    "total_output_tokens": 4521,
    "context_window_size": 200000,
    "used_percentage": 8,
    "current_usage": {
      "input_tokens": 8500,
      "output_tokens": 1200,
      "cache_creation_input_tokens": 5000,  // Watch this!
      "cache_read_input_tokens": 2000       // vs this
    }
  }
}

When cache_creation_input_tokens spikes, you’re paying the cache write penalty.
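
A small script can make that spike easier to spot. This sketch parses a payload with the same field names as the example above and reports what share of the cached-context tokens were writes. The function name and the 50% warning threshold are my own assumptions, not anything Claude Code defines.

```python
import json

# Payload using the same field names and values as the example above.
status_json = """
{
  "context_window": {
    "current_usage": {
      "input_tokens": 8500,
      "output_tokens": 1200,
      "cache_creation_input_tokens": 5000,
      "cache_read_input_tokens": 2000
    }
  }
}
"""

WARN_THRESHOLD = 0.5  # hypothetical cutoff for "mostly cache writes"


def cache_write_share(status: dict) -> float:
    """Fraction of cached-context tokens that were writes rather than reads."""
    usage = status["context_window"]["current_usage"]
    created = usage["cache_creation_input_tokens"]
    read = usage["cache_read_input_tokens"]
    total = created + read
    return created / total if total else 0.0


share = cache_write_share(json.loads(status_json))
print(f"{share:.0%} of cached-context tokens were cache writes")
if share > WARN_THRESHOLD:
    print("Warning: this request mostly rebuilt the cache")
```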

Strategy 4: Context Size Management

Large contexts make cache misses expensive. Use .claudeignore to reduce what gets loaded:

# Reduce context size
node_modules/
.git/
dist/
*.log
*.min.js
coverage/
.env

Smaller context = smaller cache write penalty.

Common Mistakes I Made

Mistake 1: Leaving Sessions Open Overnight

Before bed:
  Session: 150K tokens cached

Morning:
  Cache: EXPIRED (5 min TTL, idle for many hours)

First message "hey":
  Result: 187.5K tokens charged for cache rebuild

Mistake 2: Ignoring the Usage Breakdown

I saw “22% usage” and panicked. But breaking it down:

Total tokens charged: 192K+
  - Cache reads: 92% of tokens (charged at 0.1x rate)
  - Actual output: 0.015% of tokens
  - Cache write: The real cost (1.25x rate)

Lesson: It wasn't the output, it was the cache rebuild

Mistake 3: Confusing Rate Limits with Usage Limits

Rate Limits:
  - Reset automatically (5-hour, 7-day windows)
  - Block you temporarily

Usage Limits:
  - Consume your allocation
  - Cache writes affect both

Cache writes consume rate limit AND usage limit simultaneously

Mistake 4: Not Checking Context Window Size

Big projects = big context = big cache penalty:

Small project (10K tokens):
  Cache miss penalty: 10K × 1.25 = 12.5K tokens

Large project (150K tokens):
  Cache miss penalty: 150K × 1.25 = 187.5K tokens

15x more expensive for large projects!

When to Accept the Cost vs. Start Fresh

I developed a decision tree:

Is context > 50K tokens?
  |
  +-- NO --> Just pay the cache write, it's manageable
  |
  +-- YES --> How long since last message?
                |
                +-- < cache TTL (5 min Pro, 1 hour Max) --> Cache is valid, proceed
                |
                +-- > cache TTL --> Do I need the full context?
                                    |
                                    +-- YES --> Pay the cache write
                                    |
                                    +-- NO --> Start fresh session
                                               Copy only essential context
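
The tree above can be written as a small function. This is a sketch with hypothetical names, using the 50K threshold from the tree. Comparing idle time against the tier's TTL from the earlier table (rather than a fixed hour) is my reading of the intent.

```python
from datetime import timedelta

# TTLs from the tier table earlier in the article.
CACHE_TTL = {"pro": timedelta(minutes=5), "max": timedelta(hours=1)}
LARGE_CONTEXT_TOKENS = 50_000  # threshold used in the decision tree


def next_step(context_tokens: int, idle: timedelta, tier: str, need_full_context: bool) -> str:
    """Walk the decision tree and return the suggested action."""
    if context_tokens <= LARGE_CONTEXT_TOKENS:
        return "continue"  # small context: any cache write is manageable
    if idle < CACHE_TTL[tier]:
        return "continue"  # cache should still be valid
    if need_full_context:
        return "pay cache write"
    return "start fresh"


print(next_step(10_000, timedelta(hours=10), "pro", True))
print(next_step(150_000, timedelta(minutes=20), "max", True))
print(next_step(150_000, timedelta(hours=9), "max", True))
print(next_step(150_000, timedelta(hours=9), "max", False))
```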

Summary

Saying “hey” in an old session isn’t expensive because of the word itself. It’s expensive because:

  - The prompt cache expired (5 min TTL for Pro, 1 hour for Max)
  - Your entire conversation context needs to be re-cached
  - Cache writes cost 1.25x the normal input rate
  - Large projects with 150K+ tokens get hit hardest

The solution: Either keep the cache alive with periodic messages, or accept the cost and plan for it. Starting fresh makes sense when you don’t need the full context.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 AWS Bedrock: Prompt Caching for Faster Model Inference
👨‍💻 Claude Code Status Line Documentation
👨‍💻 Reddit: Saying 'hey' cost me 22% of my usage limits
👨‍💻 Anthropic SDK Python - Token Counting

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!