Why Claude Code Drains Token Usage After Session Resume
Problem
I resumed a Claude Code session after lunch. Checked my usage stats. My cache_read_input_tokens were stuck at 15,451 while cache_creation_input_tokens grew to 42,970. That’s a 26% cache ratio—down from 67% when the session started.
Why was my entire conversation being re-processed as new tokens instead of using cached ones?
I assumed it was normal behavior. Maybe long sessions just degrade over time. Then I found a Reddit thread where someone reverse-engineered the minified cli.js and identified the actual bug.
What I Found
A user named Rangizingo on r/ClaudeAI traced the problem to a specific function in Claude Code’s session handling:
function db8(A){ if(A.type==="attachment"&&ss1()!=="ant"){ if(A.attachment.type==="hook_additional_context" &&a6(process.env.CLAUDE_CODE_SAVE_HOOK_ADDITIONAL_CONTEXT))return!0; return!1 // drops EVERYTHING else, including deferred_tools_delta } if(A.type==="progress"&&Ns6(A.data?.type))return!1; return!0}The bug is on line 4: return!1. This function filters what gets saved to session files. For non-Anthropic users (the ss1()!=="ant" check), it strips ALL attachment-type messages—including the deferred_tools_delta records that track which tools have been announced to the model.
How This Breaks Caching
Claude Code saves session state to JSONL files in ~/.claude/projects/. When you resume a session:
┌─────────────────────────────────────────────────────────────────────┐│ NORMAL RESUME (with deferred_tools_delta preserved) │├─────────────────────────────────────────────────────────────────────┤│ 1. Load session file ││ 2. Check deferred_tools_delta: "tools already announced" ││ 3. Skip re-announcing tools ││ 4. System reminders stay at messages[0] ││ 5. Cache prefix unchanged ││ 6. Cache_read tokens: HIGH │└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐│ BUGGY RESUME (deferred_tools_delta stripped by db8) │├─────────────────────────────────────────────────────────────────────┤│ 1. Load session file ││ 2. Check deferred_tools_delta: EMPTY (stripped!) ││ 3. Re-announce ALL tools from scratch ││ 4. System reminders shift from messages[0] to messages[N] ││ 5. Billing hash changes (first user message content differs) ││ 6. cache_control breakpoint shifts ││ 7. Cache_read tokens: LOW (everything is cache_creation) │└─────────────────────────────────────────────────────────────────────┘The message array gets shifted. The billing hash changes. The cache_control breakpoint moves. Your entire conversation history becomes cache_creation_input_tokens instead of cache_read_input_tokens.
The Numbers
The Reddit user provided concrete metrics comparing stock Claude Code to a patched version:
┌──────────────────────────────────────────────────────────────────────┐│ STOCK CLAUDE CODE (BUGGY) │├──────────────────────────────────────────────────────────────────────┤│ cache_read_input_tokens: 15,451 (stuck, never grows) ││ cache_creation_input_tokens: 42,970 (keeps growing) ││ Cache ratio: 26% ││ ││ Result: Every resume re-processes entire conversation as NEW tokens │└──────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐│ PATCHED CLAUDE CODE (FIXED) │├──────────────────────────────────────────────────────────────────────┤│ cache_read_input_tokens: 57,684 (grows properly) ││ cache_creation_input_tokens: 611 (minimal) ││ Cache ratio: 99% ││ ││ Result: Resume uses cached tokens, minimal new creation │└──────────────────────────────────────────────────────────────────────┘That’s a 73 percentage point difference. From 26% to 99% cache utilization.
How to Check Your Own Stats
Run this Python one-liner against your session files:
tail -50 ~/.claude/projects/*/*.jsonl | python3 -c "import sys, jsonfor line in sys.stdin: try: d = json.loads(line.strip()) except: continue u = d.get('usage') or d.get('message',{}).get('usage') if not u or 'cache_read_input_tokens' not in u: continue cr, cc = u.get('cache_read_input_tokens',0), u.get('cache_creation_input_tokens',0) total = cr + cc + u.get('input_tokens',0) print(f'CR:{cr:>7,} CC:{cc:>7,} ratio:{cr/total*100:.0f}%' if total else '')"If your cache ratio is stuck around 25-30% on resumed sessions, you’re hitting this bug.
Why This Matters for Your Wallet
Prompt caching exists to save money. When your conversation is cached, you pay 10% of the normal input token price for cached content. When cache breaks:
Normal cached session: 100K tokens × 10% = ~$0.30Buggy resumed session: 100K tokens × 100% = ~$3.00
Difference: ~10x higher cost per resumed sessionThe Reddit thread’s top comment (218 points) highlighted the irony: “All our software engineers aren’t writing code anymore.” They meant developers spend more time managing AI tools than coding. But the deeper irony? We needed AI to find the bug in the AI tool.
The Root Cause
The db8 function decides what persists in session files. Its logic:
- If message type is “attachment” AND user is NOT on Anthropic’s internal network
- Keep only “hook_additional_context” (if environment variable permits)
- Drop everything else—including
deferred_tools_delta
This filter exists for a reason: reducing session file size, maybe privacy considerations. But it accidentally strips the metadata that makes prompt caching work across session boundaries.
Related Knowledge
What is deferred_tools_delta?
Claude Code uses a tool announcement system. When you start a session, Claude Code “announces” available tools to the model. The deferred_tools_delta attachment records which tools have been announced so far. On resume, Claude Code reads this record to avoid re-announcing.
What is prompt caching?
Anthropic’s prompt caching lets you cache portions of your conversation that stay constant. The first message with cache_control: {"type": "ephemeral"} becomes a cache breakpoint. Subsequent requests can read from that cache instead of re-processing tokens.
Why does the message array shift matter?
The cache breakpoint is tied to message position. When tools get re-announced, they insert new messages, shifting everything downstream. The hash that identifies “same content” changes, breaking the cache match.
Summary
I found that Claude Code’s token drain after session resume is caused by the db8 function stripping deferred_tools_delta records from session files. This breaks the cache prefix on resume, causing your entire conversation to be processed as cache_creation tokens instead of cache_read.
The fix would be preserving deferred_tools_delta attachments in the session save logic. Until Anthropic patches this, long sessions will drain tokens faster than expected—especially if you resume frequently.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 Reddit: Claude Code Token Drain Bug Discussion
- 👨💻 Claude Code Documentation
- 👨💻 Anthropic Prompt Caching Guide
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments