Why Is Token Usage DOWN with Claude's Larger Context Window?
I kept seeing “100% context used, 0% remaining” in my Claude conversations and my heart rate would spike. I’d immediately run /compact to free up space, convinced I was about to hit a wall.
But then I noticed something strange. My token usage wasn’t going up. It was going down. Way down.
What was going on?
The Old Anxiety Loop
Here’s what I used to do:
- Watch the context window percentage like a hawk
- Run
/compactpreemptively whenever it hit 70-80% - Start fresh conversations to avoid hitting limits
- Re-send the same documentation and system prompts over and over
Sound familiar? I was treating the context window like a gas tank that would strand me if it hit empty.
The Missing Piece: Context Caching
The reason token usage dropped? Context caching.
When you send the same content repeatedly in a conversation, Claude doesn’t re-process it from scratch. It caches the processed tokens and reuses them.
Here’s the math:
Scenario: 100K context document, 1K query
Without caching:- First query: 101K tokens charged- Second query: 101K tokens charged- Tenth query: 101K tokens charged- Total: 1,010K tokens
With caching:- First query: ~101K tokens (cache is built)- Second query: ~1K tokens (cache hit!)- Tenth query: ~1K tokens (still hitting cache)- Total: ~110K tokensThat’s roughly a 90% reduction in token usage. Cached tokens cost significantly less than fresh tokens.
Why I Was Sabotaging Myself
My old habits were actually increasing my costs:
Preemptive compaction - When I ran /compact unnecessarily, I threw away cached context that could have been reused.
Starting fresh conversations - New conversations can’t access cached context from previous ones. Every new chat meant re-processing the same system prompts and documentation.
Misunderstanding the UI - “100% context used” doesn’t mean the conversation must end. It just means the window is full. Claude can continue, and cached tokens stay cheap.
The Right Approach
Now I let the context window fill up naturally. I don’t compact until I actually need to. I keep conversations going instead of starting new ones.
The key insight: static content at the start of your prompt (system instructions, documentation, codebase context) gets cached. Dynamic content (new queries, responses) doesn’t need to be.
So if you’re still compulsively running /compact or starting fresh conversations to “save tokens” - stop. You’re probably spending more, not less.
Summary
In this post, I explained why Claude’s larger context window reduces token usage instead of increasing it. The key point is context caching - static content gets cached and reused across queries, making it dramatically cheaper. Old habits like preemptive compaction and starting fresh conversations actually increase costs by discarding cached context.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 Reddit Discussion: Claude 1M Context Window Token Usage
- 👨💻 Anthropic Prompt Caching Documentation
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments