Prompt Caching in Claude Code: 84% of Input Tokens Cached

Mar 10, 2026

The 84% Cache Hit Discovery

I tracked 1,289 requests in Claude Code totaling 100.9M tokens. The most surprising number wasn’t the token count — it was the cache hit rate: 84.2M out of 100.3M input tokens were served from cache.

That’s an 84% cache hit rate.

This means prompt caching saved me approximately 74% on input token costs. But here’s what surprised me: the cached requests weren’t any faster. The caching reduces costs, not latency.

How Prompt Caching Works

The Prefix-Based System

Anthropic’s prompt caching works on a simple principle:

Request 1: [System prompt][Files][History][New question]
           └────── Cached ──────┘

Request 2: [System prompt][Files][History][New question]
           └────── Cache HIT ─────┘  Only new content processed

The cache stores processed tokens at the beginning of your prompt. When subsequent requests share the same prefix, those tokens are retrieved from cache instead of being reprocessed.

What Gets Cached in Claude Code

Component	Cache Behavior	Why It Works
System instructions	Always cached	Identical across requests
CLAUDE.md content	Usually cached	Stable project context
File contents	Frequently cached	Same files re-read often
Conversation history	Mostly cached	Grows incrementally

Cache Hit Rate Breakdown

From my 100M token dataset:

Total Input Tokens:    100.3M (100%)
├── Cached:             84.2M (84%)
└── Uncached:           16.1M (16%)

The 16% uncached portion represents:

New files added to context
Recent messages in the conversation
My latest prompts and questions
Any changed content

The Cost Savings

Pricing Comparison

Token Type	Standard Price	Cached Price	Savings
Input	$3.00/M	$3.00/M	0%
Cached Input	$3.00/M	$0.30/M	90%
Output	$15.00/M	$15.00/M	0%

Real Dollar Impact

For my 100.3M input tokens:

Without caching:
  100.3M × $3.00/M = $300.90

With 84% cache hit:
  Cached:  84.2M × $0.30/M = $25.26
  Uncached: 16.1M × $3.00/M = $48.30
  Total:                      $73.56

Savings: $227.34 (76%)

Per-Request Cost

Without caching: $300.90 / 1,289 = $0.23/request
With caching:    $73.56 / 1,289 = $0.06/request

Each request costs about 6 cents with caching versus 23 cents without.

The Speed Tradeoff

What Caching Actually Does

✓ Reduces: Token processing cost (by 90% for cached tokens)
✓ Reduces: API infrastructure load
✗ Does NOT reduce: Response generation time
✗ Does NOT reduce: Network latency
✗ Does NOT reduce: Time to first token

Why Speed Doesn’t Improve

When a cache hit occurs:

The cached tokens are retrieved from storage
They’re fed into the model exactly like fresh tokens
The model processes them at the same speed
Response generation takes the same time

The model still “reads” all the context — it just gets it from cache instead of reprocessing raw text. This saves money but not time.

Maximizing Cache Hits

What Works

DO:
- Keep CLAUDE.md stable during a session
- Work on related files in sequence
- Reference the same files consistently
- Use focused, single-purpose sessions

DON'T:
- Modify CLAUDE.md frequently mid-session
- Jump between unrelated tasks
- Add many new files at once
- Mix different project contexts

Cache Invalidation Triggers

Any change to the cached prefix invalidates the cache:

Action	Impact
Edit CLAUDE.md	Cache invalidated
Add new files to context	Cache invalidated
Long gap (>5 min)	Cache may expire
Switch to different model	Cache invalidated

Practical Workflow

1. Start session with clear intent
2. Load all needed files at once
3. Keep CLAUDE.md unchanged
4. Make related requests in sequence
5. Close session when switching contexts

When Prompt Caching Matters Most

High-Impact Scenarios

Prompt caching provides the most value when:

Long sessions: Many requests on the same codebase
Large context: Reading many files repeatedly
Iterative development: Refining code through multiple turns
Documentation-heavy projects: CLAUDE.md with substantial content

Low-Impact Scenarios

Caching helps less when:

Short, single-request sessions
Frequently changing context
New projects with minimal established files
Jumping between unrelated tasks

Summary

I tracked 100M tokens of Claude Code usage and found an 84% cache hit rate on input tokens. This saved approximately 76% on input costs — about $227 for my usage.

The key insight is that prompt caching optimizes cost, not speed. Your bill goes down significantly, but response latency stays the same.

To maximize savings:

Keep context stable within sessions
Work on related files together
Don’t modify CLAUDE.md mid-session
Start fresh when switching contexts

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Anthropic Prompt Caching Documentation
👨‍💻 Reddit: I tracked 100M tokens of Coding with Claude Code

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!