Why Is My AI Coding Assistant Using So Many Tokens?
Problem
I have a Pro subscription for my AI coding assistant. Every week, I’d run out of tokens by day two or three. I wasn’t doing anything unusual—or so I thought.
When I asked around, I found others with the same problem:
“Token usage has been insanely bad for me the last two weeks. I’m hitting the weekly limit on pro in two days of fairly gentle usage” — Reddit user (Score: 3)
I needed to understand why.
The Hidden Token Costs
AI coding assistants consume tokens in ways you don’t see. Here’s what I discovered:
What You See
- Chat messages (prompts and responses)
- Code generation
- File edits
What You Don’t See
- File reading (entire files processed)
- Code analysis (scanning project structure)
- Context retention (previous conversation turns)
- Background agents (monitoring and analysis)
- Tool invocations (search, read, write operations)
Each hidden operation adds tokens. Let me show you the math.
Token Consumption Breakdown
I analyzed a “simple” task:
    User prompt: "Fix the login bug"
            │
            ▼
    ┌─────────────────────────────────────┐
    │ Step 1: Read auth.py                │ → 5,000 tokens
    │ (500 lines × ~10 tokens/line)       │
    └─────────────────────────────────────┘
            │
            ▼
    ┌─────────────────────────────────────┐
    │ Step 2: Read related imports        │ → 9,000 tokens
    │ (3 files × 300 lines each)          │
    └─────────────────────────────────────┘
            │
            ▼
    ┌─────────────────────────────────────┐
    │ Step 3: Analyze codebase structure  │ → 2,000 tokens
    └─────────────────────────────────────┘
            │
            ▼
    ┌─────────────────────────────────────┐
    │ Step 4: Generate fix                │ → 1,000 tokens
    └─────────────────────────────────────┘
            │
            ▼
    ┌─────────────────────────────────────┐
    │ Step 5: Write edited file           │ → 500 tokens
    └─────────────────────────────────────┘
            │
            ▼
    ┌─────────────────────────────────────┐
    │ Step 6: Context overhead            │ → 1,000 tokens
    └─────────────────────────────────────┘
            │
            ▼
    Total: ~18,500 tokens
One “simple” fix consumed nearly 20,000 tokens. That’s when I understood the problem.
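The arithmetic in the breakdown above can be reproduced in a few lines of Python. This is only a sketch of the estimate: the ~10 tokens per line figure and the flat per-step costs are taken from the example, and the names `TOKENS_PER_LINE` and `estimate_task_tokens` are illustrative, not part of any assistant's API.

```python
# Rough estimate of the "Fix the login bug" task, using the figures from the breakdown.
TOKENS_PER_LINE = 10  # assumed average from the example; real code varies


def estimate_task_tokens(steps):
    """Sum per-step token estimates; each step is a (label, tokens) pair."""
    return sum(tokens for _, tokens in steps)


steps = [
    ("Read auth.py", 500 * TOKENS_PER_LINE),
    ("Read related imports", 3 * 300 * TOKENS_PER_LINE),
    ("Analyze codebase structure", 2_000),
    ("Generate fix", 1_000),
    ("Write edited file", 500),
    ("Context overhead", 1_000),
]

total = estimate_task_tokens(steps)
# Tokens spent purely on reading files (steps 1 and 2).
file_reads = sum(tokens for label, tokens in steps if label.startswith("Read"))

print(f"Total: {total:,} tokens")
print(f"File reads: {file_reads / total:.0%} of the task")
```

Even in this small example, the two file-read steps account for roughly three quarters of the total.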
Why AI Coding Assistants Are Token-Hungry
Whole-File Processing
Unlike IDE features that work on snippets, AI assistants need complete context:
    # The assistant needs:
    # - The entire login.py file
    # - Import dependencies
    # - Related test files
    # - Configuration files
    # - Database models
Every file read adds tokens to the context window.
Context Accumulation
The context window grows with every interaction:
    Message 1: "Fix bug in file A"       → Context: 5,000 tokens
    Message 2: "Now fix bug in file B"   → Context: 10,000 tokens
    Message 3: "Also fix bug in file C"  → Context: 15,000 tokens
Each message keeps the previous context in memory. The context grows with every turn; it does not reset.
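To see why long conversations get expensive, it helps to separate the size of the context from the total tokens processed. The sketch below assumes the assistant re-sends the full conversation history on every turn (common for chat-style APIs, but not confirmed for any particular product) and that each message adds 5,000 tokens, as in the example above. `cumulative_cost` is a hypothetical helper name.

```python
def cumulative_cost(tokens_added_per_turn):
    """Return a list of (context_size, total_processed) after each turn.

    Assumes the whole history is sent again with every new message,
    so each turn is charged for the full context at that point.
    """
    context = 0
    total_processed = 0
    history = []
    for added in tokens_added_per_turn:
        context += added            # context keeps everything so far
        total_processed += context  # the full context is processed again
        history.append((context, total_processed))
    return history


turns = cumulative_cost([5_000, 5_000, 5_000])
for number, (context, total) in enumerate(turns, start=1):
    print(f"Message {number}: context {context:,} tokens, processed so far {total:,}")
```

Under that assumption the context grows linearly, but the tokens processed grow much faster: three messages that each add 5,000 tokens end up processing 30,000 in total.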
Background Agents
From the Reddit discussion:
“Having 50 repos open, multiple worktrees, dozens of agents running continuously” — Reddit user (Score: 5)
Power users run continuous background operations:
- Code review agents
- Error detection agents
- Context gathering agents
- Optimization agents
Each agent maintains its own context overhead.
How I Started Monitoring
I built a simple token monitor:
    import tiktoken
    from datetime import datetime

    class TokenMonitor:
        def __init__(self, model="gpt-4"):
            # tiktoken ships OpenAI encodings; counts for other vendors'
            # models are approximations.
            self.encoding = tiktoken.encoding_for_model(model)
            self.usage_log = []

        def count_tokens(self, text):
            """Count tokens in any text"""
            return len(self.encoding.encode(text))

        def track_operation(self, operation_type, content):
            """Log each operation's token cost"""
            token_count = self.count_tokens(content)
            self.usage_log.append({
                "operation": operation_type,
                "tokens": token_count,
                "timestamp": datetime.now()
            })
            return token_count

        def get_report(self):
            """See where tokens are going"""
            by_type = {}
            for entry in self.usage_log:
                op = entry["operation"]
                by_type[op] = by_type.get(op, 0) + entry["tokens"]

            return {
                "total_tokens": sum(by_type.values()),
                "by_operation": by_type,
                "operations": len(self.usage_log)
            }

    # Usage
    # file_content, user_message, and response are strings captured
    # from your own session.
    monitor = TokenMonitor()
    monitor.track_operation("file_read", file_content)
    monitor.track_operation("user_prompt", user_message)
    monitor.track_operation("assistant_response", response)

    print(monitor.get_report())
The output showed me exactly where tokens were going:
    {
        "total_tokens": 45678,
        "by_operation": {
            "file_read": 35000,  # about 77% of usage!
            "user_prompt": 2000,
            "assistant_response": 8678
        }
    }
File reads were consuming roughly 77% of my tokens.
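Percentages make a report like this easier to read at a glance. The sketch below takes a dictionary in the shape returned by `get_report()` and computes each operation's share of the total. `operation_shares` is a hypothetical helper, not part of `tiktoken`; the sample values are the ones from the output above.

```python
def operation_shares(report):
    """Map each operation type to its percentage of total tokens."""
    total = report["total_tokens"]
    if total == 0:
        return {}  # nothing logged yet; avoid dividing by zero
    return {
        op: round(100 * tokens / total, 1)
        for op, tokens in report["by_operation"].items()
    }


report = {
    "total_tokens": 45678,
    "by_operation": {
        "file_read": 35000,
        "user_prompt": 2000,
        "assistant_response": 8678,
    },
}

shares = operation_shares(report)
# Print the largest consumers first.
for op, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{op}: {pct}%")
```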
How I Reduced Consumption
1. Batch Related Tasks
    Before: Separate requests
    Request 1: "Fix bug A" → 10,000 tokens
    Request 2: "Fix bug B" → 10,000 tokens
    Request 3: "Fix bug C" → 10,000 tokens
    Total: 30,000 tokens

    After: Batched request
    Request: "Fix bugs A, B, C together"
    Total: ~15,000 tokens
2. Reference Files by Name
Instead of re-reading files:
    Inefficient
    Read config.py (1,000 tokens)
    ... later ...
    Read config.py again (1,000 tokens)

    Efficient
    Read config.py once (1,000 tokens)
    Later: "In the config.py we read earlier..."
3. Use Smaller Context
I started new sessions for unrelated tasks instead of letting context grow:
    Session 1: Project A work (close when done)
    Session 2: Project B work (fresh start)
    Session 3: Quick fix (minimal context)
Common Mistakes I Made
Mistake 1: Ignoring context growth. Each message adds to the context, so long conversations are expensive.
Mistake 2: Re-reading the same files. Every read costs tokens. Reference files by name instead.
Mistake 3: Not monitoring usage. I was surprised every time I hit the limit. Now I check daily.
Mistake 4: Assuming tokens = characters. Code often has a higher token density than natural language: 100 characters of code can be 50 tokens.
Mistake 5: Leaving sessions open. Long-running sessions accumulate massive context. Start fresh for new tasks.
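Mistakes 2 and 3 are easier to catch with a little bookkeeping. The sketch below logs each file read and reports the tokens spent on repeat reads of the same file. `ReadTracker` and its methods are hypothetical names for illustration, and it assumes you can hook into or log your assistant's file reads, which not every tool exposes.

```python
from collections import defaultdict


class ReadTracker:
    """Flag repeated reads of the same file and the tokens they cost."""

    def __init__(self, count_tokens):
        # count_tokens: any callable mapping text -> token count,
        # for example TokenMonitor().count_tokens from the monitor above.
        self.count_tokens = count_tokens
        self.reads = defaultdict(list)

    def record_read(self, path, content):
        """Log one read of `path` and return its token cost."""
        tokens = self.count_tokens(content)
        self.reads[path].append(tokens)
        return tokens

    def wasted_tokens(self):
        """Tokens spent on every read after the first, per file."""
        return {
            path: sum(costs[1:])
            for path, costs in self.reads.items()
            if len(costs) > 1
        }


# Stand-in tokenizer for the demo: one token per whitespace-separated word.
tracker = ReadTracker(count_tokens=lambda text: len(text.split()))
config_text = "DEBUG = True\nTIMEOUT = 30\n"

tracker.record_read("config.py", config_text)
tracker.record_read("auth.py", "def login(user): ...\n")
tracker.record_read("config.py", config_text)  # repeat read

print(tracker.wasted_tokens())
```

Passing the tokenizer in as a callable keeps the tracker independent of any one tokenizer library; the word-splitting stand-in is only there so the demo runs without extra dependencies.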
Summary
In this post, I explained why AI coding assistants consume tokens so quickly. The key point is that hidden costs—file reads, context accumulation, and background operations—multiply your visible usage.
Start monitoring your token usage today. You’ll likely find, as I did, that file operations and context growth are the biggest consumers. One simple fix: close what you’re not using and batch related tasks together.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨‍💻 Reddit Discussion: AI Coding Token Usage
- 👨‍💻 OpenAI Token Documentation
- 👨‍💻 Anthropic Context Windows Guide
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!