
Does Claude Pro Charge for Context in New Chats? Why First Messages Cost More

Problem

I opened a fresh Claude Pro chat - no history, no projects, no Artifacts. I asked a simple question: “What is 2+2?”

Then I checked my usage:

Usage: 4% of daily limit

4% for a math question? In a completely new chat?

I assumed new chats would have zero overhead. No context to load, no conversation history, no nothing. But my usage meter showed otherwise.

Was Anthropic charging me for “phantom context”?

Environment

  • Claude Pro subscription (Sonnet 4.5)
  • Web interface at claude.ai
  • Fresh chat session (no history)
  • Simple arithmetic question

What happened?

I wanted to understand why Claude Pro charges for new chats with no prior context.

Here’s what I expected:

NEW CHAT COST (my expectation):
├─ Model initialization: 0% (should be pre-loaded)
├─ System prompt: 0% (should be cached)
├─ User prompt: 0.1% (simple "2+2" question)
└─ Total: ~0.1%

Here’s what I saw (the meter only shows the total; the split is my own estimate):

NEW CHAT COST (reality):
├─ Model initialization: 2%
├─ System prompt: 0.5%
├─ User prompt: 1-1.5%
└─ Total: 3-4%

The Reddit thread I found showed I wasn’t alone:

“I ask this question on the web in a new chat, so there’s no context, project or anything else to load”

But users were still seeing 3-4% usage on their first message.

The core confusion: We expected “new chat” to mean “zero startup cost.” We were wrong.

How to solve it?

I tried to understand what was actually happening behind the scenes.

First, I tested with a genuinely simple prompt:

What is 2+2?

Usage: ~3%

Then I tested with an elaborated prompt:

What is 2+2? Please show your work and explain the mathematical principles behind addition.

Usage: ~4%

I realized my “simple” 2+2 question wasn’t so simple. Even the basic version triggered:

1. Model loading into GPU memory
2. System prompt application (Claude's instructions)
3. Prompt tokenization and processing
4. Response generation
5. Response detokenization (tokens back to text)

Each step takes compute, and as far as I can tell each one is reflected in the usage meter.

I tried testing continued chats to compare:

[First message in new chat]
"What is 2+2?"
Usage: 3-4%

[Follow-up in same chat]
"Okay, what about 3+3?"
Usage: ~1%

The follow-up was cheaper because:

  • Model already loaded (no initialization cost)
  • System prompt already applied
  • Only new tokens processed

So the solution for cost optimization:

OPTION 1: Batch questions in one chat
[Chat 1]
"2+2?"
"3+3?"
"4+4?"
Total cost: ~5-6% (3-4% for the first message, ~1% for each follow-up)

OPTION 2: Start new chats for each question
[Chat 1] "2+2?" → 3-4%
[Chat 2] "3+3?" → 3-4%
[Chat 3] "4+4?" → 3-4%
Total cost: ~9-12%
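
The comparison above is simple arithmetic. Here is a minimal Python sketch of it, using the rough percentages from my tests (3-4% for a first message, about 1% per follow-up). The function names are mine and only for illustration:

```python
# Observed ranges from my tests, in percent of the usage limit (rough numbers).
FIRST_MESSAGE = (3.0, 4.0)   # first message in a new chat (low, high)
FOLLOW_UP = (1.0, 1.0)       # each follow-up in the same chat (low, high)

def batched_cost(n_questions):
    """Cost range (low, high) for asking n questions in ONE chat."""
    if n_questions <= 0:
        return (0.0, 0.0)
    extra = n_questions - 1  # every question after the first is a follow-up
    return (FIRST_MESSAGE[0] + extra * FOLLOW_UP[0],
            FIRST_MESSAGE[1] + extra * FOLLOW_UP[1])

def separate_cost(n_questions):
    """Cost range (low, high) for asking n questions in n NEW chats."""
    return (n_questions * FIRST_MESSAGE[0], n_questions * FIRST_MESSAGE[1])

print(batched_cost(3))   # (5.0, 6.0)
print(separate_cost(3))  # (9.0, 12.0)
```

With three questions, batching comes out at roughly half the cost of starting three separate chats.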

The reason

I think the confusion comes from mixing up “context loading” and “model initialization.”

Context Loading (What I thought was happening)

This happens in continued conversations:

CONTINUED CHAT COST BREAKDOWN:
├─ Previous context retrieval: 0.5%
├─ New prompt tokens: 0.5-1%
├─ Response generation: 0.5-1%
└─ Total: 1.5-2.5%

The model needs to:

  • Retrieve conversation history from previous messages
  • Maintain thread continuity
  • Preserve context across turns

This cost increases with conversation length.
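
To illustrate why the cost grows, here is a minimal sketch. It assumes every turn re-reads the whole conversation so far, which is a common way chat models work, but I have not confirmed how Claude Pro actually meters it. The token counts are made-up illustration values:

```python
def tokens_processed_per_turn(turns, system_tokens=0):
    """Return the input tokens read on each turn, ASSUMING the full history
    (system prompt + all earlier prompts and replies) is re-read every time.

    turns: list of (prompt_tokens, reply_tokens) pairs, one per turn.
    """
    history = system_tokens
    per_turn = []
    for prompt_tokens, reply_tokens in turns:
        # Input for this turn = everything so far + the new prompt.
        per_turn.append(history + prompt_tokens)
        # Both the prompt and the reply become part of the history.
        history += prompt_tokens + reply_tokens
    return per_turn

# Three short turns: 6-token prompts, 50-token replies (hypothetical values).
print(tokens_processed_per_turn([(6, 50), (6, 50), (6, 50)]))
# [6, 62, 118]
```

Under this assumption, each later turn reads more tokens than the one before it, even when the new prompt is the same size.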

Model Initialization (What actually happens in new chats)

This happens in fresh conversations:

NEW CHAT COST BREAKDOWN:
├─ Model loading into GPU: 1.5-2%
├─ System prompt application: 0.5%
├─ User prompt processing: 0.5-1%
├─ Response generation: 0.5-1%
└─ Total: 3-4%

The model needs to:

  • Load Claude Sonnet 4.5 weights into GPU memory
  • Apply system prompt (Claude’s base instructions)
  • Process your initial prompt
  • Generate response

This is a fixed cost, regardless of how “simple” your question seems.

Why my “2+2” question cost more than I expected

I realized my question wasn’t just “2+2”. Even the simplest prompt triggers:

Input: "What is 2+2?"
Actual processing:
├─ Tokenize input: ["What", " is", " 2", "+", "2", "?"]
├─ Load model into memory (adds no tokens to the count)
├─ Apply system prompt: ~5K tokens
├─ Process input tokens: ~6 tokens
├─ Generate response: ~50 tokens
└─ Total: ~5,056 tokens processed

Those ~5,000 tokens from the system prompt alone account for most of the 3-4% usage. My “2+2” question was just the tip of the iceberg.
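
As a rough cross-check, the token total above can be converted into a share of the limit. This sketch assumes the ~200K-token daily budget that I estimate in this post. All of these numbers are my estimates, not figures from Anthropic:

```python
SYSTEM_PROMPT_TOKENS = 5_000   # my estimate of the system prompt size
INPUT_TOKENS = 6               # "What is 2+2?"
OUTPUT_TOKENS = 50             # short answer (estimate)
DAILY_LIMIT_TOKENS = 200_000   # my estimate of the Pro budget

total = SYSTEM_PROMPT_TOKENS + INPUT_TOKENS + OUTPUT_TOKENS
share = 100 * total / DAILY_LIMIT_TOKENS  # percent of the assumed budget

print(total)            # 5056
print(round(share, 2))  # 2.53
```

That lands at roughly 2.5%, in the same ballpark as the 3-4% I saw but not an exact match, so treat the breakdown as approximate.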

What Anthropic could do better

The transparency issue:

CURRENT USAGE DISPLAY:
"Usage: 4%"

BETTER USAGE DISPLAY:
"Model initialization: 2%"
"System prompt: 1%"
"Your prompt: 0.5%"
"Response: 0.5%"
"Total: 4%"

Users need to see what they’re paying for. “Phantom context” feels unfair. “Model initialization” makes sense.

How LLM token pricing works

LLM services meter usage by the number of tokens processed. Some rules of thumb:

1 token ≈ 4 characters (English)
1 token ≈ 0.75 words
Claude Pro daily limit: ~200K tokens (my estimate; Anthropic does not publish an exact figure)
200K tokens ≈ 150K words ≈ 300 pages
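
The two rules of thumb above can be turned into a quick estimator. This is only a sketch: `estimate_tokens` is a helper I made up, and a real tokenizer will split text differently, especially short prompts with digits and punctuation:

```python
def estimate_tokens(text):
    """Rough token estimates for English text from two rules of thumb.

    Returns (by_chars, by_words):
      by_chars assumes ~4 characters per token,
      by_words assumes ~0.75 words per token.
    """
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    return (by_chars, by_words)

print(estimate_tokens("What is 2+2?"))  # (3.0, 4.0)
```

Both estimates are lower than the 6 tokens I counted for this prompt, which shows how loose these heuristics are for very short inputs.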

Every request costs:

  • Input tokens (your prompt)
  • Output tokens (Claude’s response)
  • System tokens (Claude’s instructions)

GPU memory and model loading

Claude Sonnet 4.5 is a large model. Anthropic does not publish its size, so the numbers below are my guesses:

Model size: ~200GB
GPU memory required: ~200GB
Loading time: 1-2 seconds

My understanding is that this loading happens on every new chat (or after ~30 minutes of inactivity). It’s not “context” - it’s infrastructure.

Cost optimization strategies

Based on what I learned:

STRATEGY 1: Batch related questions
Start ONE chat for a coding session
Ask all related questions in that thread
Benefit: Pay initialization once, not 5 times

STRATEGY 2: Use Projects for complex workflows
Projects maintain context across chats
Reduces repeated initialization overhead
Benefit: Share context, not re-explain each time

STRATEGY 3: Track your usage patterns
Simple questions: ~3-4% per new chat
Complex questions: ~5-8% per new chat
Continued chats: ~1-2% per follow-up
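
To make those percentages concrete, here is a small sketch that turns a per-message cost into a message count. It assumes the meter runs from 0% to 100% and that every message costs the same, which is a simplification. `max_messages` is a name I made up:

```python
def max_messages(cost_per_message_pct, budget_pct=100.0):
    """How many messages of a given cost (in percent) fit in the budget."""
    return int(budget_pct // cost_per_message_pct)

print(max_messages(4.0))  # 25  (simple questions, each in a new chat at ~4%)
print(max_messages(1.0))  # 100 (follow-ups in an existing chat at ~1%)
```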

Summary

In this post, I explained why Claude Pro charges 3-4% for first messages in new chats, even with no prior context. The key point is that new chats pay for model initialization (loading Claude Sonnet 4.5 into GPU memory and applying the system prompt), not context loading. This is a fixed technical overhead, not a hidden fee.

To optimize your Claude Pro usage, batch related questions in a single conversation rather than starting fresh chats. Continued conversations mostly pay for new tokens (plus a context cost that grows with conversation length), while new chats pay for the full initialization overhead each time.

The transparency issue remains: Anthropic should break down usage by “initialization” vs “prompt” vs “response” so users understand what they’re paying for.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email: Email me

