Token Economics: Why Smaller Instruction Files Make Better AI Coding Assistants
Problem
My AI coding assistant was giving me mediocre responses. The code it generated was okay, but not great. It would miss subtle patterns in my codebase, forget earlier context in long sessions, and sometimes produce inconsistent results.
I assumed this was just a limitation of the model. Then I checked my AGENTS.md file size: 847 lines, 23,000 tokens.
Every time I started a session, my assistant loaded those 23,000 tokens before reading a single line of my actual code.
The Token Budget Wake-Up Call
I noticed something in my API usage logs:
Session start:
- System prompt: 1,200 tokens
- AGENTS.md: 23,000 tokens
- Project context: 15,000 tokens
- Available for conversation: 160,000 tokens

After 10 exchanges:
- System prompt: 1,200 tokens
- AGENTS.md: 23,000 tokens
- Conversation history: 45,000 tokens
- Available for code analysis: 130,000 tokens

After 20 exchanges:
- Available for code analysis: 85,000 tokens
Context pollution was eating my budget.

A critical insight hit me: tokens are the new CPU cycles. Today, the critical resource isn’t CPU, RAM, or storage, but tokens, and they are finite and expensive.
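The arithmetic behind those log lines can be sketched as a small budget function. This is a minimal sketch, assuming a 200,000-token context window (the logs do not state the window size; it is inferred from the session-start figures). The helper name `remainingBudget` is hypothetical.

```javascript
// Assumption: a 200,000-token context window (not stated in the logs).
const CONTEXT_WINDOW = 200_000;

// Hypothetical helper: subtract every token cost from the window size.
function remainingBudget(windowSize, usage) {
  const used = Object.values(usage).reduce((sum, tokens) => sum + tokens, 0);
  return windowSize - used;
}

// Session-start figures from the logs above.
const sessionStart = {
  systemPrompt: 1_200,
  agentsMd: 23_000,
  projectContext: 15_000,
};

console.log(remainingBudget(CONTEXT_WINDOW, sessionStart)); // 160800
```

Under that assumption the result is close to the 160,000 tokens the log reports, and the instruction file alone accounts for more than half of the fixed overhead.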
Why Instruction File Size Matters
Most AI coding assistants (Claude Code, GitHub Copilot, Cursor) load default instruction files automatically:
Claude Code loads:
- ~/.claude/CLAUDE.md (global instructions)
- ./CLAUDE.md (project instructions)
- ./AGENTS.md (if it exists)

Every session. Every time. Before any user interaction.

The smaller the file, the fewer tokens it uses in the context. A good context contains all the necessary tokens, but no more.
I calculated the real cost of my bloated AGENTS.md:
My original AGENTS.md: 23,000 tokens
Average sessions per day: 15
Tokens wasted per day: 345,000
Monthly token waste: 10.35M tokens

At Claude Opus pricing ($15/1M input tokens):
Monthly cost of instruction overhead: $155
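The same calculation as a short script, in case you want to plug in your own numbers. The figures are the ones quoted above; the 30-day month is an assumption implied by the monthly total, and `monthlyOverhead` is a hypothetical helper name.

```javascript
// Figures from the text; the 30-day month is an assumption.
const TOKENS_PER_SESSION = 23_000;
const SESSIONS_PER_DAY = 15;
const DAYS_PER_MONTH = 30;
const USD_PER_MILLION_INPUT_TOKENS = 15;

// Hypothetical helper: monthly cost of re-sending the same instruction tokens.
function monthlyOverhead(tokensPerSession, sessionsPerDay, days, usdPerMillion) {
  const tokens = tokensPerSession * sessionsPerDay * days;
  const usd = (tokens * usdPerMillion) / 1_000_000;
  return { tokens, usd };
}

const overhead = monthlyOverhead(
  TOKENS_PER_SESSION,
  SESSIONS_PER_DAY,
  DAYS_PER_MONTH,
  USD_PER_MILLION_INPUT_TOKENS
);

console.log(overhead.tokens);         // 10350000
console.log(overhead.usd.toFixed(2)); // 155.25
```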
But the real cost is opportunity:
- 23,000 tokens = ~46 files of code I could have in context
- 23,000 tokens = ~230 API responses of conversation history
- 23,000 tokens = significantly degraded long-session performance

My Bloated AGENTS.md: A Case Study
Let me show you exactly what was wrong with my instruction file.
# Our Working Relationship
I don't like sycophancy.
Be neither rude nor polite. Be matter-of-fact, straightforward, and clear.
Be concise. Avoid long-winded explanations.
I am sometimes wrong. Challenge my assumptions.
Don't be lazy. Do things the right way, not the easy way.
When defining a plan of action, don't provide timeline estimates.
If creating a `git commit` do not add yourself as a co-author.
# Tooling
Use Skills from ~/.claude/skills/ when tasks match their purpose.
Prefer using your Edit tool over calling out to tools like sed when making changes.
Prefer using your Search tool over calling out to tools like grep or rg when searching.
Use Mermaid diagrams to help explain complex systems and interactions.
# Coding Style
## Immutability (CRITICAL)
ALWAYS create new objects, NEVER mutate:
```javascript
// WRONG: Mutation
function updateUser(user, name) {
  user.name = name // MUTATION!
  return user
}

// CORRECT: Immutability
function updateUser(user, name) {
  return { ...user, name }
}
```

[… continues for 700+ more lines with every possible convention …]
I had dumped my entire team's coding standards, every convention we'd ever written, into a single file. The AI loaded all of this for every single task—even simple ones that didn't need any of it.
Attempt 1: Prune Aggressively
I started by removing obvious bloat:
```markdown title="AGENTS.md (ATTEMPT 1 - 412 lines)"
# Core Conventions

- TypeScript strict mode
- Immutable data patterns
- Test coverage >80%
- Feature-based folder structure

# Tooling

- Use Edit tool over sed
- Use Search tool over grep
- Use Mermaid for complex diagrams

[... but still kept many examples ...]
```

This cut the file in half:
Before: 23,000 tokens
After: 12,000 tokens

Reduction: 48%
Monthly savings: ~$75

But I noticed something: the AI still wasn’t using some instructions consistently. The file was still too long for the model to “remember” everything.
Attempt 2: Extract to Skills
The key insight: not every task needs every convention. I moved specialized knowledge to skills that load on demand.
# Core Conventions
- TypeScript strict mode
- Immutable data patterns
- Test coverage >80%
# Architecture
- Feature-based folder structure
- Repository pattern for data access
# Key Skills
See skills/ folder for specialized conventions:
- skills/react/SKILL.md for React patterns
- skills/python/SKILL.md for Python patterns
- skills/testing/SKILL.md for testing patterns

The AI loads these automatically when relevant.

# React Conventions

## Component Structure
- Functional components only
- Hooks for state management
- Props interface at top of file

## State Management
- Local state for UI-only concerns
- Context for shared state
- Server state via React Query

## Styling
- Tailwind CSS for styling
- CSS-in-JS only for dynamic styles

[... detailed React conventions - only loads for React work ...]

The results were better:
AGENTS.md: 3,200 tokens (86% reduction)
React SKILL.md: 2,800 tokens (only loads for React work)
Python SKILL.md: 2,400 tokens (only loads for Python work)
Testing SKILL.md: 1,900 tokens (only loads for testing work)
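To make the on-demand effect concrete, here is a minimal sketch of how the instruction load for a single session could be tallied. The token counts come from the list above. The `tokensLoaded` helper, and the idea of passing in the list of relevant skills by name, are illustrative assumptions and not how any particular assistant selects skills.

```javascript
// Token counts from the measurements above.
const AGENTS_MD_TOKENS = 3_200;
const SKILL_TOKENS = new Map([
  ["react", 2_800],
  ["python", 2_400],
  ["testing", 1_900],
]);

// Hypothetical helper: AGENTS.md is always loaded, skills only when relevant.
function tokensLoaded(relevantSkills) {
  return relevantSkills.reduce((total, skill) => {
    if (!SKILL_TOKENS.has(skill)) throw new Error(`Unknown skill: ${skill}`);
    return total + SKILL_TOKENS.get(skill);
  }, AGENTS_MD_TOKENS);
}

console.log(tokensLoaded([]));                   // 3200
console.log(tokensLoaded(["react"]));            // 6000
console.log(tokensLoaded(["react", "testing"])); // 7900
```

Even a session that needs two skills stays well under the original 23,000 tokens.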
Average tokens loaded per session: 5,400 (77% reduction)
Monthly savings: ~$120

Attempt 3: The 150-Line Rule
I challenged myself: can I fit everything essential in under 150 lines?
# Working Relationship
- Be direct, not sycophantic
- Challenge assumptions when needed
- Explain why, not just what
# Coding Conventions
## Core Principles
- Immutability: create new objects, never mutate
- Error handling: comprehensive try/catch with user-friendly messages
- Input validation: use Zod schemas for all external input

## Architecture
- Feature-based folder structure
- Repository pattern for data access
- Service layer for business logic

## Testing
- 80% minimum coverage
- Unit + Integration + E2E tests required
- TDD: write test first

## Security
- No hardcoded secrets (use env vars)
- All user input validated
- SQL injection prevention (parameterized queries)
# Skills
Load specialized conventions from skills/ as needed:
- skills/react/ - React patterns
- skills/python/ - Python patterns
- skills/testing/ - Testing patterns

Start small, increment when needed, refactor when growing.

The final stats:
AGENTS.md: 1,800 tokens (92% reduction from original)
Skills: Load only when relevant

Token savings: 21,200 per session
Monthly savings: ~$140
Real benefit: More context for actual code analysis

The Performance Difference
I tracked response quality before and after:
Before optimization (23k tokens AGENTS.md):
- Pattern consistency: 68% (AI missed subtle patterns)
- Long-session performance: Degraded after 15 exchanges
- Convention adherence: 72% (some conventions forgotten)

After optimization (1.8k tokens AGENTS.md):
- Pattern consistency: 89% (AI noticed more patterns)
- Long-session performance: Stable through 30+ exchanges
- Convention adherence: 91% (core conventions remembered)

Why? The AI had more context window for:
- Reading actual code
- Remembering conversation history
- Analyzing complex problems

The Three Principles
Based on my experience, here are the key principles for instruction file optimization:
1. Start Small
Avoid:

# All Our Conventions
[Every possible rule the team has ever written...]
[Takes 700+ lines and loads every session]

Prefer:

# Core Conventions
- TypeScript strict mode
- Immutable data patterns
- Test coverage >80%

Add more only when you notice gaps.

2. Offload to Skills
AGENTS.md: Global conventions (always loaded)
- Coding principles
- Architecture patterns
- Security requirements

skills/react/SKILL.md: React-specific (loaded for React work)
- Component patterns
- State management
- Styling conventions

skills/python/SKILL.md: Python-specific (loaded for Python work)
- Type hints
- Error handling
- Package structure

3. Increment When Needed
Step 1: Start with 50 lines
Step 2: Notice what the AI forgets
Step 3: Add that specific instruction
Step 4: Refactor when file grows past 150 lines
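Step 4 can be automated with a small check. The sketch below assumes Node.js and uses a hypothetical `checkLineBudget` helper: it counts the lines in an instruction file's text and flags the file once it passes the 150-line limit. The limit comes from the steps above; the helper and the file path in the usage comment are illustrative.

```javascript
const LINE_BUDGET = 150; // refactor threshold from Step 4

// Hypothetical helper: report whether instruction text is within the line budget.
function checkLineBudget(text, budget = LINE_BUDGET) {
  // Ignore one trailing newline so "a\nb\n" counts as 2 lines, not 3.
  const trimmed = text.endsWith("\n") ? text.slice(0, -1) : text;
  const lines = trimmed === "" ? 0 : trimmed.split("\n").length;
  return { lines, overBudget: lines > budget };
}

// Usage with a real file (the path is an assumption):
//   const fs = require("node:fs");
//   console.log(checkLineBudget(fs.readFileSync("AGENTS.md", "utf8")));

console.log(checkLineBudget("# Core Conventions\n- TypeScript strict mode\n"));
// { lines: 2, overBudget: false }
```

Run as part of a pre-commit hook or CI job, a check like this turns "refactor when the file grows" from a good intention into a prompt you cannot miss.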
Don't preemptively add everything.

Common Mistakes
Mistake 1: Including Every Possible Convention
# All Conventions
## Naming
- Variables: camelCase
- Constants: SCREAMING_SNAKE_CASE
- Classes: PascalCase
- Files: kebab-case
- Components: PascalCase
- Hooks: use prefix
- Utilities: camelCase
- Types: PascalCase with I prefix for interfaces
[... continues for 200 lines of naming rules ...]

The AI will forget most of this. Keep only what’s critical.
Mistake 2: Not Pruning Outdated Instructions
# Legacy Conventions
- Use moment.js for dates (we switched to date-fns 2 years ago)
- Class components for state (we use hooks now)
- Redux for state (we use React Query now)

Outdated instructions confuse the AI. Remove them.
Mistake 3: Copying Entire Style Guides
# JavaScript Style Guide
[Copy-pasted entire Airbnb style guide...]
[500 lines of rules...]

Summarize instead:

# JavaScript Style
- Follow Airbnb style guide
- Exceptions: semicolons required, trailing commas required

The Token Efficiency Mindset
Here’s the key mindset shift:
“Developers will be measured on their token usage: the better one will be the one using the fewest tokens to achieve similar results.”
This isn’t just about saving money. It’s about:
- Better context utilization: More tokens for actual code
- Improved consistency: AI remembers core conventions better
- Longer effective sessions: Less context pollution
- Cost efficiency: Token budget spent on value, not overhead
Before: "How can I give the AI all the information it might need?"

After: "What is the minimum information the AI must have?"

The difference is 90% fewer tokens and better results.

Summary
In this post, I explored why bloated instruction files hurt AI coding assistant performance. The key insight is that tokens are a finite and expensive resource—every token in your AGENTS.md is a token not available for code analysis, conversation, or problem-solving.
The solution is simple: keep global instructions small and focused, move specialized knowledge to skills that load on demand, and start small before incrementing when needed.
My results:
- Token reduction: 92% (23,000 to 1,800 tokens)
- Monthly savings: ~$140
- Response quality: pattern consistency up 21 percentage points (68% to 89%)
- Long-session stability: twice as many exchanges before degradation (15 to 30+)
The goal isn’t to minimize instructions—it’s to include all necessary tokens, but not more. Token efficiency will become a key developer metric.
Final Words + More Resources
My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email.