
Why Is My AI Coding Assistant Using So Many Tokens?

Problem

I have a Pro subscription for my AI coding assistant. Every week, I ran out of tokens by day two or three. I wasn’t doing anything unusual, or so I thought.

When I asked around, I found others with the same problem:

“Token usage has been insanely bad for me the last two weeks. I’m hitting the weekly limit on pro in two days of fairly gentle usage” — Reddit user (Score: 3)

I needed to understand why.

The Hidden Token Costs

AI coding assistants consume tokens in ways you don’t see. Here’s what I discovered:

What You See

Visible token usage
- Chat messages (prompts and responses)
- Code generation
- File edits

What You Don’t See

Hidden token usage
- File reading (entire files processed)
- Code analysis (scanning project structure)
- Context retention (previous conversation turns)
- Background agents (monitoring and analysis)
- Tool invocations (search, read, write operations)

Each hidden operation adds tokens. Let me show you the math.

Token Consumption Breakdown

I analyzed a “simple” task:

Token flow for 'Fix bug in auth.py'
User prompt: "Fix the login bug"
┌─────────────────────────────────────┐
│ Step 1: Read auth.py                │ → 5,000 tokens
│ (500 lines × ~10 tokens/line)       │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Step 2: Read related imports        │ → 9,000 tokens
│ (3 files × 300 lines each)          │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Step 3: Analyze codebase structure  │ → 2,000 tokens
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Step 4: Generate fix                │ → 1,000 tokens
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Step 5: Write edited file           │ → 500 tokens
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Step 6: Context overhead            │ → 1,000 tokens
└─────────────────────────────────────┘
Total: ~18,500 tokens

One “simple” fix consumed nearly 20,000 tokens. That’s when I understood the problem.
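
The arithmetic in the breakdown above can be reproduced in a few lines of Python. This is a minimal sketch: the ~10 tokens per line figure is the rough estimate used above, not a measured value, and the step names and the `TOKENS_PER_LINE` constant are hypothetical labels chosen for illustration.

```python
# Rough per-line estimate taken from the breakdown above; real code varies.
TOKENS_PER_LINE = 10

# Each step maps to its estimated token cost.
steps = {
    "read auth.py": 500 * TOKENS_PER_LINE,               # 500 lines
    "read related imports": 3 * 300 * TOKENS_PER_LINE,   # 3 files x 300 lines
    "analyze codebase structure": 2_000,
    "generate fix": 1_000,
    "write edited file": 500,
    "context overhead": 1_000,
}

total = sum(steps.values())
file_reads = steps["read auth.py"] + steps["read related imports"]

for name, tokens in steps.items():
    print(f"{name:<28} {tokens:>6,}")
print(f"{'total':<28} {total:>6,}")
print(f"file reads: {file_reads / total:.0%} of the total")
```

Under these estimates, the two file-read steps account for 14,000 of the 18,500 tokens, which is why file reads dominate the rest of this post.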

Why AI Coding Assistants Are Token-Hungry

Whole-File Processing

Unlike IDE features that work on snippets, AI assistants often pull in much broader context:

When you ask 'Fix the login bug'
# The assistant needs:
# - The entire login.py file
# - Import dependencies
# - Related test files
# - Configuration files
# - Database models

Every file read adds tokens to the context window.

Context Accumulation

The context window grows with every interaction:

Context growth pattern
Message 1: "Fix bug in file A" → Context: 5,000 tokens
Message 2: "Now fix bug in file B" → Context: 10,000 tokens
Message 3: "Also fix bug in file C" → Context: 15,000 tokens

Each message carries the previous context along with it. The context grows; it does not reset.
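
One consequence worth spelling out: if the assistant re-sends the whole conversation with every request, which is how chat-style model APIs typically work, the tokens processed over a session add up faster than the context size alone suggests. The sketch below assumes 5,000 new tokens per message, as in the pattern above, and ignores any prompt caching or context compaction a particular tool may apply. `context_sizes` and `cumulative_input` are hypothetical helper names.

```python
from itertools import accumulate


def context_sizes(new_tokens_per_message):
    """Context size after each message: a running sum of what was added."""
    return list(accumulate(new_tokens_per_message))


def cumulative_input(new_tokens_per_message):
    """Total tokens processed if the full context is re-sent on every message."""
    return sum(context_sizes(new_tokens_per_message))


added = [5_000, 5_000, 5_000]   # assumed tokens added by messages 1-3
print(context_sizes(added))     # [5000, 10000, 15000]
print(cumulative_input(added))  # 30000
```

Three messages that each add 5,000 tokens end with a 15,000-token context, but the model has processed 30,000 input tokens along the way.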

Background Agents

From the Reddit discussion:

“Having 50 repos open, multiple worktrees, dozens of agents running continuously” — Reddit user (Score: 5)

Power users run continuous background operations:

  • Code review agents
  • Error detection agents
  • Context gathering agents
  • Optimization agents

Each agent maintains its own context overhead.

How I Started Monitoring

I built a simple token monitor:

token_monitor.py
import tiktoken
from datetime import datetime


class TokenMonitor:
    def __init__(self, model="gpt-4"):
        # tiktoken provides OpenAI tokenizers; counts for other models are approximate
        self.encoding = tiktoken.encoding_for_model(model)
        self.usage_log = []

    def count_tokens(self, text):
        """Count tokens in any text"""
        return len(self.encoding.encode(text))

    def track_operation(self, operation_type, content):
        """Log each operation's token cost"""
        token_count = self.count_tokens(content)
        self.usage_log.append({
            "operation": operation_type,
            "tokens": token_count,
            "timestamp": datetime.now()
        })
        return token_count

    def get_report(self):
        """See where tokens are going"""
        by_type = {}
        for entry in self.usage_log:
            op = entry["operation"]
            by_type[op] = by_type.get(op, 0) + entry["tokens"]
        return {
            "total_tokens": sum(by_type.values()),
            "by_operation": by_type,
            "operations": len(self.usage_log)
        }


# Usage (placeholder strings stand in for real file contents and messages)
file_content = "def login(user, password):\n    return check(user, password)\n"
user_message = "Fix the login bug"
response = "I updated login() to validate the password hash."

monitor = TokenMonitor()
monitor.track_operation("file_read", file_content)
monitor.track_operation("user_prompt", user_message)
monitor.track_operation("assistant_response", response)
print(monitor.get_report())

The output showed me exactly where tokens were going:

Example report
{
    "total_tokens": 45678,
    "by_operation": {
        "file_read": 35000,  # ~77% of usage!
        "user_prompt": 2000,
        "assistant_response": 8678
    }
}

File reads were consuming roughly 77% of my tokens.
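
To turn a report like this into percentages, a small helper is enough. This is a sketch that assumes the report dictionary has the `total_tokens` and `by_operation` keys shown above; `operation_shares` is a hypothetical name, not part of any library.

```python
def operation_shares(report):
    """Return each operation's share of total tokens, as a percentage."""
    total = report["total_tokens"]
    if total == 0:
        return {}
    return {
        op: round(100 * tokens / total, 1)
        for op, tokens in report["by_operation"].items()
    }


report = {
    "total_tokens": 45678,
    "by_operation": {
        "file_read": 35000,
        "user_prompt": 2000,
        "assistant_response": 8678,
    },
}

# Print the largest consumers first.
for op, share in sorted(operation_shares(report).items(), key=lambda kv: -kv[1]):
    print(f"{op:<20} {share:5.1f}%")
```

For the example report, file reads come out at 76.6%, responses at 19.0%, and prompts at 4.4%.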

How I Reduced Consumption

1. Batch Related Tasks

I grouped related fixes into one request instead of sending them separately:

Before: Separate requests
Request 1: "Fix bug A" → 10,000 tokens
Request 2: "Fix bug B" → 10,000 tokens
Request 3: "Fix bug C" → 10,000 tokens
Total: 30,000 tokens

After: Batched request
Request: "Fix bugs A, B, C together"
Total: ~15,000 tokens
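
One way to see why batching helps: a large part of each request is shared context (the same files and project structure), and separate requests pay for it repeatedly. The sketch below assumes each 10,000-token request splits into 7,500 tokens of shared context and 2,500 tokens of task-specific work. That split is a hypothetical chosen to match the totals above, not a measurement.

```python
def separate_cost(shared, per_task, n_tasks):
    """Every request reloads the shared context."""
    return n_tasks * (shared + per_task)


def batched_cost(shared, per_task, n_tasks):
    """The shared context is loaded once for all tasks."""
    return shared + n_tasks * per_task


SHARED, PER_TASK, N = 7_500, 2_500, 3  # assumed split of a 10,000-token request

print(separate_cost(SHARED, PER_TASK, N))  # 30000
print(batched_cost(SHARED, PER_TASK, N))   # 15000
```

The more of each request that is shared context, the larger the saving from batching. Tasks that touch unrelated files share little and gain little.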

2. Reference Files by Name

Instead of re-reading files:

Inefficient
Read config.py (1,000 tokens)
... later ...
Read config.py again (1,000 tokens)
Efficient
Read config.py once (1,000 tokens)
Later: "In the config.py we read earlier..."
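
If you log file reads yourself, repeat reads are easy to spot. The following is a minimal sketch, assuming you already have a token count for each read (for example from a monitor like the one above). `ReadTracker` and its fields are hypothetical names for illustration.

```python
class ReadTracker:
    """Tracks file reads and totals the tokens spent on repeat reads."""

    def __init__(self):
        self.first_reads = {}    # path -> tokens for the first read
        self.wasted_tokens = 0   # tokens spent re-reading known files

    def record(self, path, tokens):
        """Record a read; return True if this path was already read."""
        if path in self.first_reads:
            self.wasted_tokens += tokens
            return True
        self.first_reads[path] = tokens
        return False


tracker = ReadTracker()
tracker.record("config.py", 1_000)
repeat = tracker.record("config.py", 1_000)
print(repeat, tracker.wasted_tokens)  # True 1000
```

A nonzero `wasted_tokens` is a hint to refer back to a file by name rather than asking for it to be read again.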

3. Use Smaller Context

I started new sessions for unrelated tasks instead of letting context grow:

Context management
Session 1: Project A work (close when done)
Session 2: Project B work (fresh start)
Session 3: Quick fix (minimal context)
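
The rule I follow can be written down as a simple check. This is only a sketch of a personal heuristic: the 50,000-token budget is an arbitrary assumption, and `should_start_new_session` is a hypothetical helper, not a feature of any assistant.

```python
def should_start_new_session(context_tokens, same_task, budget=50_000):
    """Suggest a fresh session when the task changes or the context is large."""
    return (not same_task) or context_tokens > budget


print(should_start_new_session(12_000, same_task=True))   # False: keep going
print(should_start_new_session(12_000, same_task=False))  # True: unrelated task
print(should_start_new_session(80_000, same_task=True))   # True: context too large
```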

Common Mistakes I Made

Mistake 1: Ignoring context growth. Each message adds to the context, so long conversations are expensive.

Mistake 2: Re-reading the same files. Every read costs tokens. Reference files by name instead.

Mistake 3: Not monitoring usage. I was surprised every time I hit the limit. Now I check daily.

Mistake 4: Assuming tokens = characters. Code often uses more tokens per character than natural language; 100 characters of code can come to around 50 tokens.

Mistake 5: Leaving sessions open. Long-running sessions accumulate massive context. Start fresh for new tasks.

Summary

In this post, I explained why AI coding assistants consume tokens so quickly. The key point is that hidden costs—file reads, context accumulation, and background operations—multiply your visible usage.

Start monitoring your token usage today. You’ll likely find, as I did, that file operations and context growth are the biggest consumers. One simple fix: close what you’re not using and batch related tasks together.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.

