Does Claude Pro Charge for Context in New Chats? Why First Messages Cost More
Problem
I opened a fresh Claude Pro chat - no history, no projects, no Artifacts. I asked a simple question: “What is 2+2?”
Then I checked my usage:
Usage: 4% of daily limit
4% for a math question? In a completely new chat?
I assumed new chats would have zero overhead. No context to load, no conversation history, no nothing. But my usage meter showed otherwise.
Was Anthropic charging me for “phantom context”?
Environment
- Claude Pro subscription (Sonnet 4.5)
- Web interface at claude.ai
- Fresh chat session (no history)
- Simple arithmetic question
What happened?
I wanted to understand why Claude Pro charges for new chats with no prior context.
Here’s what I expected:
NEW CHAT COST (my expectation):
├─ Model initialization: 0% (should be pre-loaded)
├─ System prompt: 0% (should be cached)
├─ User prompt: 0.1% (simple "2+2" question)
└─ Total: ~0.1%

Here’s what I saw:
NEW CHAT COST (reality):
├─ Model initialization: 2%
├─ System prompt: 0.5%
├─ User prompt: 1-1.5%
└─ Total: 3-4%

The Reddit thread I found showed I wasn’t alone:
“I ask this question on the web in a new chat, so there’s no context, project or anything else to load”
But users were still seeing 3-4% usage on their first message.
The core confusion: We expected “new chat” to mean “zero startup cost.” We were wrong.
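As a quick sanity check on the two breakdowns above, here's a minimal Python sketch that adds up the low and high ends of each line item. The percentages are the rough estimates from those breakdowns, not per-item numbers that claude.ai reports.

```python
# Line items from the two breakdowns above, as (low, high) percentages of usage.
# These are rough estimates, not per-item numbers reported by claude.ai.
EXPECTED = {
    "Model initialization": (0.0, 0.0),
    "System prompt": (0.0, 0.0),
    "User prompt": (0.1, 0.1),
}
OBSERVED = {
    "Model initialization": (2.0, 2.0),
    "System prompt": (0.5, 0.5),
    "User prompt": (1.0, 1.5),
}


def total_range(breakdown: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum the low ends and the high ends of all line items separately."""
    low = sum(lo for lo, _ in breakdown.values())
    high = sum(hi for _, hi in breakdown.values())
    return round(low, 2), round(high, 2)


print("expected:", total_range(EXPECTED))  # expected: (0.1, 0.1)
print("observed:", total_range(OBSERVED))  # observed: (3.5, 4.0)
```

The itemized "reality" figures add up to 3.5-4%, the top of the 3-4% range I saw, and roughly 35-40 times what I expected.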
How to solve it?
I tried to understand what was actually happening behind the scenes.
First, I tested with a genuinely simple prompt:
What is 2+2?
Usage: ~3%
Then I tested a more elaborate prompt:
What is 2+2? Please show your work and explain the mathematical principles behind addition.
Usage: ~4%
I realized my “simple” 2+2 question wasn’t so simple. Even the basic version triggered:
1. Model loading into GPU memory
2. System prompt application (Claude's instructions)
3. Prompt tokenization and processing
4. Response generation
5. Response tokenization

Each step costs tokens. Each step gets counted.
Then I tested a continued chat for comparison:
[First message in new chat]
"What is 2+2?"
Usage: 3-4%

[Follow-up in same chat]
"Okay, what about 3+3?"
Usage: ~1%

The follow-up was cheaper because:
- Model already loaded (no initialization cost)
- System prompt already applied
- Only new tokens processed
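Using the costs from these tests (about 3-4% for a first message and about 1% for a follow-up), a minimal Python sketch can compare asking several questions in one chat against opening a new chat per question. The two constants are my own measurements from above, not official figures.

```python
# Per-message usage (% of limit) from my tests above; observations, not official numbers.
FIRST_MESSAGE = (3.0, 4.0)  # first message in a fresh chat
FOLLOW_UP = (1.0, 1.0)      # each later message in the same chat


def one_chat(questions: int) -> tuple[float, float]:
    """All questions in a single chat: one first message, the rest are follow-ups."""
    if questions <= 0:
        return (0.0, 0.0)
    extra = questions - 1
    return (FIRST_MESSAGE[0] + extra * FOLLOW_UP[0],
            FIRST_MESSAGE[1] + extra * FOLLOW_UP[1])


def new_chat_each(questions: int) -> tuple[float, float]:
    """A fresh chat per question: every question pays the first-message cost."""
    n = max(questions, 0)
    return (n * FIRST_MESSAGE[0], n * FIRST_MESSAGE[1])


print(one_chat(3))       # (5.0, 6.0)
print(new_chat_each(3))  # (9.0, 12.0)
```

For three questions that's about 5-6% in one chat versus 9-12% across three chats, and the gap widens with every extra question because the first-message overhead is paid once instead of per question.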
So the solution for cost optimization:
OPTION 1: Batch questions in one chat
[Chat 1]
"2+2?"
"3+3?"
"4+4?"
Total cost: ~5-6%

OPTION 2: Start new chats for each question
[Chat 1] "2+2?" → 3-4%
[Chat 2] "3+3?" → 3-4%
[Chat 3] "4+4?" → 3-4%
Total cost: ~9-12%

The reason
I think the key reason for the confusion is mixing up “context loading” and “model initialization.”
Context Loading (What I thought was happening)
This happens in continued conversations:
CONTINUED CHAT COST BREAKDOWN:
├─ Previous context retrieval: 0.5%
├─ New prompt tokens: 0.5-1%
├─ Response generation: 0.5-1%
└─ Total: 1.5-2.5%

The model needs to:
- Retrieve conversation history from previous messages
- Maintain thread continuity
- Preserve context across turns
This cost increases with conversation length.
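Here's a minimal Python sketch of why this grows, assuming each new turn re-reads every earlier message. That's how Anthropic's Messages API works (it's stateless, so each request carries the full list of prior messages); how claude.ai handles it internally, including any caching, is an assumption. The token counts are hypothetical.

```python
def context_tokens_per_turn(turns: list[tuple[int, int]]) -> list[int]:
    """Conversation tokens read on each turn if every earlier message is re-read.

    turns: (user_tokens, assistant_tokens) for each exchange, in order.
    Returns, per turn, earlier history + the new user message
    (system prompt excluded, no caching assumed).
    """
    history = 0
    per_turn = []
    for user_tokens, assistant_tokens in turns:
        per_turn.append(history + user_tokens)
        history += user_tokens + assistant_tokens  # this exchange joins the history
    return per_turn


# Hypothetical: three short math exchanges, ~6-8 prompt tokens and ~50 reply tokens each.
print(context_tokens_per_turn([(6, 50), (8, 50), (8, 50)]))  # [6, 64, 122]
```

Each turn carries everything before it, so the per-turn context keeps climbing even when every new question is tiny.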
Model Initialization (What actually happens in new chats)
This happens in fresh conversations:
NEW CHAT COST BREAKDOWN:
├─ Model loading into GPU: 1.5-2%
├─ System prompt application: 0.5%
├─ User prompt processing: 0.5-1%
├─ Response generation: 0.5-1%
└─ Total: 3-4%

The model needs to:
- Load Claude Sonnet 4.5 weights into GPU memory
- Apply system prompt (Claude’s base instructions)
- Process your initial prompt
- Generate response
This is a fixed cost, regardless of how “simple” your question seems.
Why my “2+2” question cost more than I expected
I realized my question wasn’t just “2+2”. Even the simplest prompt triggers:
Input: "What is 2+2?"

Actual processing:
├─ Tokenize input (illustrative split): ["What", " is", " 2", "+", "2", "?"]
├─ Load model into GPU memory: ~200GB of weights (infrastructure, not tokens)
├─ Apply system prompt: ~5K tokens
├─ Process input tokens: ~6 tokens
├─ Generate response: ~50 tokens
└─ Total: ~5,056 tokens processed

Those ~5,000 system-prompt tokens alone account for most of the 3-4% usage. My “2+2” question was just the tip of the iceberg.
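To see how lopsided that request is, here's a minimal Python sketch that totals the token estimates above and compares them with the ~200K-token limit used later in this post. All inputs are rough estimates, and the limit is an assumption rather than a published figure.

```python
# Rough token counts from the breakdown above (estimates, not measured values).
SYSTEM_PROMPT_TOKENS = 5_000
INPUT_TOKENS = 6
OUTPUT_TOKENS = 50
ASSUMED_LIMIT = 200_000  # assumption: the ~200K figure used later in this post

total = SYSTEM_PROMPT_TOKENS + INPUT_TOKENS + OUTPUT_TOKENS
system_share = SYSTEM_PROMPT_TOKENS / total  # fraction of this request that is system prompt
limit_share = total / ASSUMED_LIMIT          # fraction of the assumed limit used

print(total)                  # 5056
print(f"{system_share:.1%}")  # 98.9%
print(f"{limit_share:.2%}")   # 2.53%
```

Under that assumed limit, the tokens alone come to roughly 2.5% of the budget, and about 99% of them are the system prompt rather than my question.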
What Anthropic could do better
The transparency issue:
CURRENT USAGE DISPLAY:
"Usage: 4%"

BETTER USAGE DISPLAY:
"Model initialization: 2%"
"System prompt: 1%"
"Your prompt: 0.5%"
"Response: 0.5%"
"Total: 4%"

Users need to see what they’re paying for. “Phantom context” feels unfair; “model initialization” makes sense.
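As a mockup of that itemized display, here's a minimal Python sketch that renders a per-component breakdown with a total. The function name and the numbers are hypothetical; claude.ai doesn't expose per-component usage like this.

```python
def format_usage(breakdown: dict[str, float]) -> list[str]:
    """Render a per-component usage breakdown plus a total, one line per item."""
    lines = [f"{label}: {pct:g}%" for label, pct in breakdown.items()]
    lines.append(f"Total: {sum(breakdown.values()):g}%")
    return lines


# Hypothetical values matching the "better display" above.
for line in format_usage({
    "Model initialization": 2.0,
    "System prompt": 1.0,
    "Your prompt": 0.5,
    "Response": 0.5,
}):
    print(line)
```

Computing the total from the items, rather than showing it separately, guarantees the lines always add up.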
Related knowledge
How LLM token pricing works
Large language models charge based on tokens processed:
1 token ≈ 4 characters (English)
1 token ≈ 0.75 words

Claude Pro daily limit: ~200K tokens (my rough assumption; Anthropic doesn't publish the limit as a token count)
200K tokens ≈ 150K words ≈ 300 pages

Every request costs:
- Input tokens (your prompt)
- Output tokens (Claude’s response)
- System tokens (Claude’s instructions)
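The rules of thumb above can be turned into a rough estimator. Here's a minimal Python sketch; it is a character-count heuristic, not Claude's actual tokenizer, and the 500 words per page is just the figure implied by "150K words ≈ 300 pages".

```python
import math

CHARS_PER_TOKEN = 4     # rule of thumb for English text
WORDS_PER_TOKEN = 0.75  # rule of thumb for English text
WORDS_PER_PAGE = 500    # assumption implied by "150K words ≈ 300 pages"


def estimate_tokens(text: str) -> int:
    """Rough token count from character length; not a real tokenizer."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def request_tokens(system_tokens: int, prompt: str, response: str) -> int:
    """System + input + output tokens for a single request."""
    return system_tokens + estimate_tokens(prompt) + estimate_tokens(response)


def tokens_to_pages(tokens: int) -> float:
    """Convert tokens to pages via words, using the rules of thumb above."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE


print(estimate_tokens("What is 2+2?"))                      # 3
print(request_tokens(5_000, "What is 2+2?", "2 + 2 = 4."))  # 5006
print(tokens_to_pages(200_000))                             # 300.0
```

The heuristic gives 3 tokens for "What is 2+2?", while the illustrative split earlier in this post has 6 pieces; for short, symbol-heavy text the rule of thumb can be well off, so treat it as a ballpark only.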
GPU memory and model loading
Claude Sonnet 4.5 is a large model:
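Model size: ~200GB
GPU memory required: ~200GB
Loading time: 1-2 seconds

Anthropic doesn't publish these figures, so treat them as rough, unofficial estimates. As I understand it, this loading happens on every new chat (or after ~30 minutes of inactivity). It’s not “context”; it’s infrastructure.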
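Taking those unofficial estimates at face value, the implied data rate is a simple division. Here's a minimal Python sketch; both inputs are the rough figures above, not published specs.

```python
def implied_throughput_gb_s(model_gb: float, load_seconds: float) -> float:
    """Data rate needed to move model_gb of weights in load_seconds."""
    if load_seconds <= 0:
        raise ValueError("load_seconds must be positive")
    return model_gb / load_seconds


# Using the unofficial estimates above: ~200GB loaded in 1-2 seconds.
for seconds in (1, 2):
    print(f"{seconds}s -> {implied_throughput_gb_s(200, seconds):.0f} GB/s")
```

So a 1-2 second load of ~200GB would need on the order of 100-200 GB/s of aggregate throughput.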
Cost optimization strategies
Based on what I learned:
STRATEGY 1: Batch related questions
Start ONE chat for a coding session
Ask all related questions in that thread
Benefit: Pay initialization once, not 5 times

STRATEGY 2: Use Projects for complex workflows
Projects maintain context across chats
Reduces repeated initialization overhead
Benefit: Share context, not re-explain each time

STRATEGY 3: Track your usage patterns
Simple questions: ~3-4% per new chat
Complex questions: ~5-8% per new chat
Continued chats: ~1-2% per follow-up

Summary
In this post, I explained why Claude Pro charges 3-4% for first messages in new chats, even with no prior context. The key point is that new chats pay for model initialization (loading Claude Sonnet 4.5 into GPU memory and applying the system prompt), not context loading. This is a fixed technical overhead, not a hidden fee.
To optimize your Claude Pro usage, batch related questions in a single conversation rather than starting fresh chats. Continued conversations only pay for new tokens, while new chats pay for the full initialization overhead each time.
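To put the Strategy 3 numbers in budget terms, here's a minimal Python sketch that counts how many messages of each kind fit into a full usage budget (100%). The ranges are my own observations, not Anthropic figures, and real sessions mix message types.

```python
import math

# Per-message usage ranges (% of limit) from Strategy 3; my own observations.
RANGES = {
    "simple new chat": (3.0, 4.0),
    "complex new chat": (5.0, 8.0),
    "follow-up": (1.0, 2.0),
}


def messages_that_fit(budget_pct: float, kind: str) -> tuple[int, int]:
    """(worst case, best case) count of one message kind that fits in the budget."""
    low_cost, high_cost = RANGES[kind]
    return math.floor(budget_pct / high_cost), math.floor(budget_pct / low_cost)


for kind in RANGES:
    print(kind, messages_that_fit(100.0, kind))
# simple new chat (25, 33)
# complex new chat (12, 20)
# follow-up (50, 100)
```

By these estimates, a full budget covers roughly 25-33 simple new chats but 50-100 follow-ups, which is the whole case for batching.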
The transparency issue remains: Anthropic should break down usage by “initialization” vs “prompt” vs “response” so users understand what they’re paying for.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email: Email me
Here are the most important links from this article, along with some further resources on this topic:
- 👨‍💻 Reddit: Claude consumes 4% usage on 2+2 question in Pro plan
- 👨‍💻 Anthropic Pricing Documentation
- 👨‍💻 Understanding LLM Token Costs
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!