Why Is Token Usage DOWN with Claude's Larger Context Window?

I kept seeing “100% context used, 0% remaining” in my Claude conversations and my heart rate would spike. I’d immediately run /compact to free up space, convinced I was about to hit a wall.

But then I noticed something strange. My token usage wasn’t going up. It was going down. Way down.

What was going on?

The Old Anxiety Loop

Here’s what I used to do:

  • Watch the context window percentage like a hawk
  • Run /compact preemptively whenever it hit 70-80%
  • Start fresh conversations to avoid hitting limits
  • Re-send the same documentation and system prompts over and over

Sound familiar? I was treating the context window like a gas tank that would strand me if it hit empty.

The Missing Piece: Context Caching

The reason token usage dropped? Context caching.

When you send the same content repeatedly in a conversation, Claude doesn’t re-process it from scratch. It caches the processed tokens and reuses them.

Here’s the math:

Scenario: 100K context document, 1K query

Without caching:
- First query: 101K tokens charged
- Second query: 101K tokens charged
- Tenth query: 101K tokens charged
- Total: 1,010K tokens

With caching:
- First query: ~101K tokens (cache is built)
- Second query: ~1K tokens (cache hit!)
- Tenth query: ~1K tokens (still hitting cache)
- Total: ~110K tokens

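That’s roughly a 90% reduction in charged tokens over ten queries (about 110K instead of 1,010K). This is a simplification: cached tokens are still read on every request, but they cost significantly less than fresh tokens, so the exact saving depends on how cheaply cached tokens are billed.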
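To check the arithmetic above, here is a minimal Python sketch of the same scenario. The function name and the `cached_weight` parameter are my own illustration, not part of any Claude tooling; `cached_weight=0.0` reproduces the simplified numbers in this post (cache hits treated as free), and any other value is just a knob to play with, not a real price.

```python
def total_tokens_charged(context_tokens, query_tokens, num_queries,
                         cached, cached_weight=0.0):
    """Total charged tokens over a conversation (simplified model).

    cached_weight is a hypothetical fraction of the normal rate applied
    to cached context tokens; 0.0 matches the numbers in this post.
    """
    # The first query always pays for the full context plus the query.
    first = context_tokens + query_tokens
    if cached:
        # Later queries pay for the new query plus discounted cached context.
        later = query_tokens + context_tokens * cached_weight
    else:
        # Without caching, every query pays for everything again.
        later = context_tokens + query_tokens
    return first + later * (num_queries - 1)


without_cache = total_tokens_charged(100_000, 1_000, 10, cached=False)
with_cache = total_tokens_charged(100_000, 1_000, 10, cached=True)
reduction = 1 - with_cache / without_cache

print(f"Without caching: {without_cache:,.0f}")  # 1,010,000
print(f"With caching:    {with_cache:,.0f}")     # 110,000
print(f"Reduction:       {reduction:.0%}")       # 89%
```

Raising `cached_weight` above zero shrinks the saving, which is why the 90% figure is best read as an upper bound for this scenario.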

Why I Was Sabotaging Myself

My old habits were actually increasing my costs:

Preemptive compaction - When I ran /compact unnecessarily, I threw away cached context that could have been reused.

Starting fresh conversations - New conversations can’t access cached context from previous ones. Every new chat meant re-processing the same system prompts and documentation.

Misunderstanding the UI - “100% context used” doesn’t mean the conversation must end. It just means the window is full. Claude can continue, and cached tokens stay cheap.

The Right Approach

Now I let the context window fill up naturally. I don’t compact until I actually need to. I keep conversations going instead of starting new ones.

The key insight: static content at the start of your prompt (system instructions, documentation, codebase context) gets cached. Dynamic content (new queries, responses) doesn’t need to be.
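Here is a toy sketch of why ordering matters, assuming the cache reuses only the part of a request that is identical, from the start, to the previous one. Each list item stands in for a block of tokens, and `shared_prefix_len` is a hypothetical helper for illustration, not how Claude measures anything internally.

```python
def shared_prefix_len(previous, current):
    """Count the leading blocks that are identical in both requests."""
    count = 0
    for old_block, new_block in zip(previous, current):
        if old_block != new_block:
            break
        count += 1
    return count


STATIC = ["system instructions", "documentation", "codebase context"]

# Static content first: the second request extends the first one,
# so everything already sent can be reused.
static_first_1 = STATIC + ["query 1"]
static_first_2 = STATIC + ["query 1", "answer 1", "query 2"]

# Dynamic content first: the requests differ at the very first block,
# so nothing after it counts as a shared prefix.
dynamic_first_1 = ["query 1"] + STATIC
dynamic_first_2 = ["query 2"] + STATIC

print(shared_prefix_len(static_first_1, static_first_2))    # 4
print(shared_prefix_len(dynamic_first_1, dynamic_first_2))  # 0
```

Under that assumption, a long-running conversation with stable material at the top keeps a long reusable prefix, while anything that changes early in the prompt resets it.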

So if you’re still compulsively running /compact or starting fresh conversations to “save tokens” - stop. You’re probably spending more, not less.

Summary

In this post, I explained why Claude’s larger context window reduces token usage instead of increasing it. The key point is context caching - static content gets cached and reused across queries, making it dramatically cheaper. Old habits like preemptive compaction and starting fresh conversations actually increase costs by discarding cached context.

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others. If you want to get in touch, you can contact me by email.
