Prompt Caching in Claude Code: 84% of Input Tokens Cached
The 84% Cache Hit Discovery
I tracked 1,289 requests in Claude Code totaling 100.9M tokens. The most surprising number wasn’t the token count — it was the cache hit rate: 84.2M out of 100.3M input tokens were served from cache.
That’s an 84% cache hit rate.
This means prompt caching saved me approximately 74% on input token costs. But here’s what surprised me: the cached requests weren’t any faster. The caching reduces costs, not latency.
How Prompt Caching Works
The Prefix-Based System
Anthropic’s prompt caching works on a simple principle:
Request 1: [System prompt][Files][History][New question] └────── Cached ──────┘
Request 2: [System prompt][Files][History][New question] └────── Cache HIT ─────┘ Only new content processedThe cache stores processed tokens at the beginning of your prompt. When subsequent requests share the same prefix, those tokens are retrieved from cache instead of being reprocessed.
What Gets Cached in Claude Code
| Component | Cache Behavior | Why It Works |
|---|---|---|
| System instructions | Always cached | Identical across requests |
| CLAUDE.md content | Usually cached | Stable project context |
| File contents | Frequently cached | Same files re-read often |
| Conversation history | Mostly cached | Grows incrementally |
Cache Hit Rate Breakdown
From my 100M token dataset:
Total Input Tokens: 100.3M (100%)├── Cached: 84.2M (84%)└── Uncached: 16.1M (16%)The 16% uncached portion represents:
- New files added to context
- Recent messages in the conversation
- My latest prompts and questions
- Any changed content
The Cost Savings
Pricing Comparison
| Token Type | Standard Price | Cached Price | Savings |
|---|---|---|---|
| Input | $3.00/M | $3.00/M | 0% |
| Cached Input | $3.00/M | $0.30/M | 90% |
| Output | $15.00/M | $15.00/M | 0% |
Real Dollar Impact
For my 100.3M input tokens:
Without caching: 100.3M × $3.00/M = $300.90
With 84% cache hit: Cached: 84.2M × $0.30/M = $25.26 Uncached: 16.1M × $3.00/M = $48.30 Total: $73.56
Savings: $227.34 (76%)Per-Request Cost
Without caching: $300.90 / 1,289 = $0.23/requestWith caching: $73.56 / 1,289 = $0.06/requestEach request costs about 6 cents with caching versus 23 cents without.
The Speed Tradeoff
What Caching Actually Does
✓ Reduces: Token processing cost (by 90% for cached tokens)✓ Reduces: API infrastructure load✗ Does NOT reduce: Response generation time✗ Does NOT reduce: Network latency✗ Does NOT reduce: Time to first tokenWhy Speed Doesn’t Improve
When a cache hit occurs:
- The cached tokens are retrieved from storage
- They’re fed into the model exactly like fresh tokens
- The model processes them at the same speed
- Response generation takes the same time
The model still “reads” all the context — it just gets it from cache instead of reprocessing raw text. This saves money but not time.
Maximizing Cache Hits
What Works
DO:- Keep CLAUDE.md stable during a session- Work on related files in sequence- Reference the same files consistently- Use focused, single-purpose sessions
DON'T:- Modify CLAUDE.md frequently mid-session- Jump between unrelated tasks- Add many new files at once- Mix different project contextsCache Invalidation Triggers
Any change to the cached prefix invalidates the cache:
| Action | Impact |
|---|---|
| Edit CLAUDE.md | Cache invalidated |
| Add new files to context | Cache invalidated |
| Long gap (>5 min) | Cache may expire |
| Switch to different model | Cache invalidated |
Practical Workflow
1. Start session with clear intent2. Load all needed files at once3. Keep CLAUDE.md unchanged4. Make related requests in sequence5. Close session when switching contextsWhen Prompt Caching Matters Most
High-Impact Scenarios
Prompt caching provides the most value when:
- Long sessions: Many requests on the same codebase
- Large context: Reading many files repeatedly
- Iterative development: Refining code through multiple turns
- Documentation-heavy projects: CLAUDE.md with substantial content
Low-Impact Scenarios
Caching helps less when:
- Short, single-request sessions
- Frequently changing context
- New projects with minimal established files
- Jumping between unrelated tasks
Summary
I tracked 100M tokens of Claude Code usage and found an 84% cache hit rate on input tokens. This saved approximately 76% on input costs — about $227 for my usage.
The key insight is that prompt caching optimizes cost, not speed. Your bill goes down significantly, but response latency stays the same.
To maximize savings:
- Keep context stable within sessions
- Work on related files together
- Don’t modify CLAUDE.md mid-session
- Start fresh when switching contexts
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 Anthropic Prompt Caching Documentation
- 👨💻 Reddit: I tracked 100M tokens of Coding with Claude Code
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments