What Is the Stale Chat Trap in Claude Code and How Do I Avoid It?

Mar 26, 2026

Problem

I typed “hey” into an old Claude Code session and watched 22% of my monthly usage vanish instantly:

[Returning to session after lunch break]
User: hey
Claude: Hello! How can I help you today?

Usage meter: -22% of monthly quota

Twenty-two percent. For a three-character greeting. That’s when I discovered the “stale chat trap”—a costly pitfall that’s especially brutal with Claude’s larger context windows.

Environment

Claude Code CLI
Session with ~100,000 tokens of conversation history
Break time: ~30 minutes
Expected usage for “hey”: minimal
Actual usage: 22% of monthly limit

What happened?

I was working on a complex project in Claude Code. After a productive morning, I stepped away for lunch. When I returned about 30 minutes later, I sent a simple greeting to continue:

[Session had been idle for 30 minutes]
Me: hey
Claude: Hello! How can I help you continue with the project?

[Usage drops from 78% to 56%]

I couldn’t believe it. The same session that had been consuming 1-2% per complex query suddenly ate 22% for a greeting.

So I tested this with a fresh approach:

[Hypothesis: Starting fresh might be cheaper]

Option A: Continue old session with summary request
Old session: "Sum up our approach so far"
Usage: -15%

Option B: Start new session with pasted summary
New session: [Pasted summary from old session] "Continue from here"
Usage: -0% additional (only the new context was processed)

The difference was staggering. Asking the old session to summarize cost 15%. Starting fresh with a pasted summary cost essentially nothing extra.

How to solve it?

I investigated how Claude’s prompt caching works and discovered the root cause.

My hypothesis: The cache expired during my break

Claude Code uses prompt caching to avoid re-processing:
- System prompts and instructions
- Conversation history
- Previously processed context

But caches have a TTL (Time To Live)...

I then tested the TTL behavior:

Test 1: Message within 5 minutes
[Active session, 2 min gap]
Me: Continue
Claude: Sure!
Usage: ~0.1% (cache hit)

Test 2: Message after 5+ minutes
[Session idle for 10 min]
Me: Continue
Claude: Sure!
Usage: ~15% (cache miss, full re-processing)

The cache TTL is typically 5 minutes. After that, everything gets re-processed from scratch.

Here’s the math for my situation:

My session had 100,000 tokens of history

Cache HIT (within 5 minutes):
- New message: "hey" (~3 tokens)
- Cache lookup: minimal
- Total processed: ~3 tokens
- Cost: negligible

Cache MISS (after 5+ minutes):
- System prompt: ~1,000 tokens
- Conversation history: 100,000 tokens
- New message: 3 tokens
- Response: ~50 tokens
- Total processed: 101,053 tokens
- Cost: ~22% of a 450K token monthly limit

That’s the stale chat trap: returning to an old session after cache expiration triggers full context re-processing.

The solution

I found three effective strategies to avoid this trap.

Strategy 1: Use /compact before walking away

The /compact command compresses your conversation into a summary:

# Before taking a break
/compact

# Claude creates a summary like:
# "We've been working on a React authentication system.
# Completed: login form, JWT handling.
# Next: implement refresh token rotation."

After compaction, your context shrinks from 100,000 tokens to maybe 2,000 tokens. If the cache expires, you only pay for 2,000 tokens instead of 100,000.

Without /compact:
Cache miss cost: 100,000 tokens = 22% of monthly limit

With /compact:
Cache miss cost: 2,000 tokens = 0.4% of monthly limit

Savings: 55x cheaper

Strategy 2: Start fresh when cache is dead

If you’ve been away for more than 5 minutes and your context is large:

# Check your context size first
/context

# If context is large (>20K tokens) and cache is expired:
/clear    # Start fresh

# Then paste only what you need
"I was working on [specific task]. Here's where we left off:
[brief summary of current state]
Please help me continue."

Comparison of approaches:

OLD SESSION APPROACH:
- Re-process 100K tokens of history
- Cost: 22% of monthly limit

NEW SESSION APPROACH:
- Paste 500 tokens of relevant context
- Process: ~500 tokens
- Cost: 0.1% of monthly limit

Strategy 3: Monitor proactively

Make these commands habit:

# Before a break, always run:
/compact

# When returning after >5 min, check:
/context
/cost

# If context is large and cost is climbing:
# Start a new session instead

The reason

The stale chat trap exists because of how prompt caching works:

How Claude Prompt Caching Works:

1. First message in session
   [System prompt: 1K tokens] + [Your message: 50 tokens]
   Total processed: 1,050 tokens
   Cache created: Yes

2. Second message (within 5 min)
   [System prompt: CACHED] + [History: CACHED] + [New message: 50 tokens]
   Total processed: 50 tokens (only the new part)
   Cache refreshed: Yes (TTL reset)

3. Third message (within 5 min of previous)
   [Everything still cached]
   Total processed: ~50 tokens
   Cache refreshed: Yes

4. You walk away for 10 minutes...
   Cache TTL expires (5 min of no cache hits)

5. You return and send "hey"
   [System prompt: RE-PROCESSED] + [History: RE-PROCESSED] + ["hey"]
   Total processed: 101,000+ tokens
   Cache recreated: Yes

The TTL varies by model:

Claude 3.5 Sonnet:     5 minutes (default)
Claude Opus 4.5:       1 hour (extended)
Claude Haiku 4.5:      1 hour (extended)
Claude Sonnet 4.5:     1 hour (extended)

The trap is especially painful with the 200K context window models because:

Context window = 200,000 tokens

Fill it to 50%:  100,000 tokens potentially re-processed
Fill it to 75%:  150,000 tokens potentially re-processed
Fill it to 100%: 200,000 tokens potentially re-processed

A single "hey" could theoretically consume your ENTIRE monthly allocation.

Common mistakes

I made these errors before understanding the trap:

Leaving long sessions open indefinitely - The longer the session, the more expensive the stale trap.
Not using /compact before breaks - Missing a free opportunity to shrink context.
Reviving old sessions out of habit - Sometimes starting fresh is 100x cheaper.
Ignoring the 5-minute TTL - Cache expires faster than you think.
Not checking /cost periodically - Usage creep is invisible until it’s too late.

Best practices

Here’s my new workflow:

START OF SESSION:
1. Begin with a clear objective
2. Use /compact if context grows past 20K tokens

BEFORE BREAKS:
1. Run /compact
2. Note where you left off

AFTER BREAKS (>5 min):
1. Check /context
2. If context > 10K tokens:
   a. Option A: Continue if task is almost done
   b. Option B: Start fresh with summary

END OF SESSION:
1. Run /compact one final time
2. Check /cost to track usage

Summary

The stale chat trap occurs when you return to an old Claude Code session after the prompt cache expires (typically 5 minutes). A simple message forces Claude to re-process your entire conversation history, potentially consuming massive tokens.

To avoid it:

Use /compact before breaks - Shrink your context from 100K to ~2K tokens
Start fresh when cache is expired - Paste only relevant context into a new session
Monitor with /context and /cost - Know your context size and usage

A 5-minute break shouldn’t cost 22% of your monthly usage. With these strategies, it won’t.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Reddit: Saying 'hey' cost me 22% of my usage limits
👨‍💻 Anthropic Prompt Caching Documentation
👨‍💻 Claude Code Documentation

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!