Skip to content

Prompt Caching in Claude Code: 84% of Input Tokens Cached

The 84% Cache Hit Discovery

I tracked 1,289 requests in Claude Code totaling 100.9M tokens. The most surprising number wasn’t the token count — it was the cache hit rate: 84.2M out of 100.3M input tokens were served from cache.

That’s an 84% cache hit rate.

This means prompt caching saved me approximately 74% on input token costs. But here’s what surprised me: the cached requests weren’t any faster. The caching reduces costs, not latency.

How Prompt Caching Works

The Prefix-Based System

Anthropic’s prompt caching works on a simple principle:

Cache mechanism
Request 1: [System prompt][Files][History][New question]
└────── Cached ──────┘
Request 2: [System prompt][Files][History][New question]
└────── Cache HIT ─────┘ Only new content processed

The cache stores processed tokens at the beginning of your prompt. When subsequent requests share the same prefix, those tokens are retrieved from cache instead of being reprocessed.

What Gets Cached in Claude Code

ComponentCache BehaviorWhy It Works
System instructionsAlways cachedIdentical across requests
CLAUDE.md contentUsually cachedStable project context
File contentsFrequently cachedSame files re-read often
Conversation historyMostly cachedGrows incrementally

Cache Hit Rate Breakdown

From my 100M token dataset:

Token distribution
Total Input Tokens: 100.3M (100%)
├── Cached: 84.2M (84%)
└── Uncached: 16.1M (16%)

The 16% uncached portion represents:

  • New files added to context
  • Recent messages in the conversation
  • My latest prompts and questions
  • Any changed content

The Cost Savings

Pricing Comparison

Token TypeStandard PriceCached PriceSavings
Input$3.00/M$3.00/M0%
Cached Input$3.00/M$0.30/M90%
Output$15.00/M$15.00/M0%

Real Dollar Impact

For my 100.3M input tokens:

Cost calculation
Without caching:
100.3M × $3.00/M = $300.90
With 84% cache hit:
Cached: 84.2M × $0.30/M = $25.26
Uncached: 16.1M × $3.00/M = $48.30
Total: $73.56
Savings: $227.34 (76%)

Per-Request Cost

Average costs
Without caching: $300.90 / 1,289 = $0.23/request
With caching: $73.56 / 1,289 = $0.06/request

Each request costs about 6 cents with caching versus 23 cents without.

The Speed Tradeoff

What Caching Actually Does

Caching effects
✓ Reduces: Token processing cost (by 90% for cached tokens)
✓ Reduces: API infrastructure load
✗ Does NOT reduce: Response generation time
✗ Does NOT reduce: Network latency
✗ Does NOT reduce: Time to first token

Why Speed Doesn’t Improve

When a cache hit occurs:

  1. The cached tokens are retrieved from storage
  2. They’re fed into the model exactly like fresh tokens
  3. The model processes them at the same speed
  4. Response generation takes the same time

The model still “reads” all the context — it just gets it from cache instead of reprocessing raw text. This saves money but not time.

Maximizing Cache Hits

What Works

Cache optimization
DO:
- Keep CLAUDE.md stable during a session
- Work on related files in sequence
- Reference the same files consistently
- Use focused, single-purpose sessions
DON'T:
- Modify CLAUDE.md frequently mid-session
- Jump between unrelated tasks
- Add many new files at once
- Mix different project contexts

Cache Invalidation Triggers

Any change to the cached prefix invalidates the cache:

ActionImpact
Edit CLAUDE.mdCache invalidated
Add new files to contextCache invalidated
Long gap (>5 min)Cache may expire
Switch to different modelCache invalidated

Practical Workflow

Session strategy
1. Start session with clear intent
2. Load all needed files at once
3. Keep CLAUDE.md unchanged
4. Make related requests in sequence
5. Close session when switching contexts

When Prompt Caching Matters Most

High-Impact Scenarios

Prompt caching provides the most value when:

  1. Long sessions: Many requests on the same codebase
  2. Large context: Reading many files repeatedly
  3. Iterative development: Refining code through multiple turns
  4. Documentation-heavy projects: CLAUDE.md with substantial content

Low-Impact Scenarios

Caching helps less when:

  • Short, single-request sessions
  • Frequently changing context
  • New projects with minimal established files
  • Jumping between unrelated tasks

Summary

I tracked 100M tokens of Claude Code usage and found an 84% cache hit rate on input tokens. This saved approximately 76% on input costs — about $227 for my usage.

The key insight is that prompt caching optimizes cost, not speed. Your bill goes down significantly, but response latency stays the same.

To maximize savings:

  • Keep context stable within sessions
  • Work on related files together
  • Don’t modify CLAUDE.md mid-session
  • Start fresh when switching contexts

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments