Token Economics: Why Smaller Instruction Files Make Better AI Coding Assistants

Mar 24, 2026

Problem

My AI coding assistant was giving me mediocre responses. The code it generated was okay, but not great. It would miss subtle patterns in my codebase, forget earlier context in long sessions, and sometimes produce inconsistent results.

I assumed this was just a limitation of the model. Then I checked my AGENTS.md file size: 847 lines, 23,000 tokens.

Every time I started a session, my assistant loaded those 23,000 tokens before reading a single line of my actual code.

The Token Budget Wake-Up Call

I noticed something in my API usage logs:

Session start:
- System prompt: 1,200 tokens
- AGENTS.md: 23,000 tokens
- Project context: 15,000 tokens
- Available for conversation: 160,000 tokens

After 10 exchanges:
- System prompt: 1,200 tokens
- AGENTS.md: 23,000 tokens
- Conversation history: 45,000 tokens
- Available for code analysis: 130,000 tokens

After 20 exchanges:
- Available for code analysis: 85,000 tokens

Context pollution was eating my budget.
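The arithmetic behind those snapshots is simple enough to script. Here is a minimal sketch, assuming a 200,000-token context window (the logs don't state the window size; 200,000 is what the first snapshot implies) and a hypothetical helper named `remainingTokens`:

```javascript
// Assumed context window size; the logs above do not state it explicitly.
const CONTEXT_WINDOW = 200_000;

// Returns the tokens left after subtracting every fixed and accumulated cost.
function remainingTokens(usage, windowSize = CONTEXT_WINDOW) {
  const used = Object.values(usage).reduce((sum, n) => sum + n, 0);
  return windowSize - used;
}

const sessionStart = remainingTokens({
  systemPrompt: 1_200,
  agentsMd: 23_000,
  projectContext: 15_000,
});

console.log(sessionStart); // 160800
```

Everything passed in `usage` is overhead the model carries before it reads any new code, so shrinking any one entry directly raises the number that comes back.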

A critical insight hit me: tokens are the new CPU cycles. Today the scarce resource isn’t CPU, RAM, or storage, but tokens, and they are both finite and expensive.

Why Instruction File Size Matters

Most AI coding assistants (Claude Code, GitHub Copilot, Cursor) load default instruction files automatically:

Claude Code loads:
- ~/.claude/CLAUDE.md (global instructions)
- ./CLAUDE.md (project instructions)
- ./AGENTS.md (if exists)

Every session. Every time. Before any user interaction.

The smaller the file, the fewer tokens it consumes in the context window. A good context contains every token it needs and no more.

I calculated the real cost of my bloated AGENTS.md:

My original AGENTS.md: 23,000 tokens
Average sessions per day: 15
Tokens wasted per day: 345,000
Monthly token waste: 10.35M tokens

At Claude Opus pricing ($15/1M input tokens):
Monthly cost of instruction overhead: $155
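The same calculation as a small function, so you can plug in your own numbers. This is a sketch under stated assumptions: a 30-day month (which is what the 10.35M figure implies), no prompt-caching discounts, and a hypothetical function name `monthlyOverhead`:

```javascript
// Monthly cost of tokens that are re-sent at the start of every session.
// daysPerMonth = 30 is an assumption that matches the 10.35M figure above.
function monthlyOverhead({ tokensPerSession, sessionsPerDay, pricePerMillion, daysPerMonth = 30 }) {
  const tokens = tokensPerSession * sessionsPerDay * daysPerMonth;
  const cost = (tokens * pricePerMillion) / 1_000_000;
  return { tokens, cost };
}

const overhead = monthlyOverhead({
  tokensPerSession: 23_000,
  sessionsPerDay: 15,
  pricePerMillion: 15, // USD per 1M input tokens, as quoted above
});

console.log(overhead.tokens); // 10350000
console.log(overhead.cost);   // 155.25
```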

But the real cost is opportunity cost:
- 23,000 tokens = ~46 files of code I could have in context
- 23,000 tokens = ~230 API responses of conversation history
- 23,000 tokens = significantly degraded long-session performance

My Bloated AGENTS.md: A Case Study

Let me show you exactly what was wrong with my instruction file.

# Our Working Relationship

I don't like sycophancy.
Be neither rude nor polite. Be matter-of-fact, straightforward, and clear.
Be concise. Avoid long-winded explanations.
I am sometimes wrong. Challenge my assumptions.
Don't be lazy. Do things the right way, not the easy way.
When defining a plan of action, don't provide timeline estimates.
If creating a `git commit` do not add yourself as a co-author.

# Tooling

Use Skills from ~/.claude/skills/ when tasks match their purpose.
Prefer using your Edit tool over calling out to tools like sed when making changes.
Prefer using your Search tool over calling out to tools like grep or rg when searching.
Use Mermaid diagrams to help explain complex systems and interactions.

# Coding Style

## Immutability (CRITICAL)

ALWAYS create new objects, NEVER mutate:

```javascript
// WRONG: Mutation
function updateUser(user, name) {
  user.name = name  // MUTATION!
  return user
}

// CORRECT: Immutability
function updateUser(user, name) {
  return {
    ...user,
    name
  }
}
```

[… continues for 700+ more lines with every possible convention …]

I had dumped my entire team's coding standards, every convention we'd ever written, into a single file. The AI loaded all of this for every single task—even simple ones that didn't need any of it.

Attempt 1: Prune Aggressively

I started by removing obvious bloat:

```markdown title="AGENTS.md (ATTEMPT 1 - 412 lines)"
# Core Conventions

- TypeScript strict mode
- Immutable data patterns
- Test coverage >80%
- Feature-based folder structure

# Tooling

- Use Edit tool over sed
- Use Search tool over grep
- Use Mermaid for complex diagrams

[... but still kept many examples ...]
```

This cut the file in half:

Before: 23,000 tokens
After: 12,000 tokens

Reduction: 48%
Monthly savings: ~$75

But I noticed something: the AI still wasn’t using some instructions consistently. The file was still too long for the model to “remember” everything.

Attempt 2: Extract to Skills

The key insight: not every task needs every convention. I moved specialized knowledge to skills that load on demand.

# Core Conventions

- TypeScript strict mode
- Immutable data patterns
- Test coverage >80%

# Architecture

- Feature-based folder structure
- Repository pattern for data access

# Key Skills

See skills/ folder for specialized conventions:
- skills/react/SKILL.md for React patterns
- skills/python/SKILL.md for Python patterns
- skills/testing/SKILL.md for testing patterns

The AI loads these automatically when relevant.

# React Conventions

## Component Structure
- Functional components only
- Hooks for state management
- Props interface at top of file

## State Management
- Local state for UI-only concerns
- Context for shared state
- Server state via React Query

## Styling
- Tailwind CSS for styling
- CSS-in-JS only for dynamic styles

[... detailed React conventions - only loads for React work ...]

The results were better:

AGENTS.md: 3,200 tokens (86% reduction)
React SKILL.md: 2,800 tokens (only loads for React work)
Python SKILL.md: 2,400 tokens (only loads for Python work)
Testing SKILL.md: 1,900 tokens (only loads for testing work)

Average tokens loaded per session: 5,400 (77% reduction)
Monthly savings: ~$120
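The per-session average depends on how often each skill actually loads, which the numbers above don't spell out. A minimal sketch of that expected-value calculation follows; the `loadRate` values are purely illustrative assumptions (not my measured session mix), and `expectedTokensPerSession` is a hypothetical helper:

```javascript
// Expected tokens loaded per session when skills load only some of the time.
// `loadRate` is the fraction of sessions that trigger a skill (hypothetical values).
function expectedTokensPerSession(baseTokens, skills) {
  return skills.reduce((total, s) => total + s.tokens * s.loadRate, baseTokens);
}

const skills = [
  { name: "react", tokens: 2_800, loadRate: 0.5 },    // assumed: half of sessions
  { name: "python", tokens: 2_400, loadRate: 0.25 },  // assumed
  { name: "testing", tokens: 1_900, loadRate: 0.25 }, // assumed
];

console.log(expectedTokensPerSession(3_200, skills)); // 5675
```

With a different mix of work the average shifts, but it always stays well below the cost of loading every convention in every session.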

Attempt 3: The 150-Line Rule

I challenged myself: can I fit everything essential in under 150 lines?

# Working Relationship

- Be direct, not sycophantic
- Challenge assumptions when needed
- Explain why, not just what

# Coding Conventions

## Core Principles
- Immutability: create new objects, never mutate
- Error handling: comprehensive try/catch with user-friendly messages
- Input validation: use Zod schemas for all external input

## Architecture
- Feature-based folder structure
- Repository pattern for data access
- Service layer for business logic

## Testing
- 80% minimum coverage
- Unit + Integration + E2E tests required
- TDD: write test first

## Security
- No hardcoded secrets (use env vars)
- All user input validated
- SQL injection prevention (parameterized queries)

# Skills

Load specialized conventions from skills/ as needed:
- skills/react/ - React patterns
- skills/python/ - Python patterns
- skills/testing/ - Testing patterns

Start small, increment when needed, refactor when growing.

The final stats:

AGENTS.md: 1,800 tokens (92% reduction from original)
Skills: Load only when relevant

Token savings: 21,200 per session
Monthly savings: ~$140
Real benefit: More context for actual code analysis
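To cross-check the reduction and savings figures for all three attempts, here is a short sketch. It assumes the same 15 sessions per day, a 30-day month, and $15 per 1M input tokens used earlier; `compareToBaseline` is a hypothetical name:

```javascript
const BASELINE_TOKENS = 23_000;     // original AGENTS.md
const SESSIONS_PER_MONTH = 15 * 30; // 15 sessions/day, 30-day month (assumed)
const PRICE_PER_MILLION = 15;       // USD per 1M input tokens

// Compares a per-session token load against the original 23,000-token file.
function compareToBaseline(tokensPerSession) {
  const saved = BASELINE_TOKENS - tokensPerSession;
  return {
    reductionPercent: Math.round((saved / BASELINE_TOKENS) * 100),
    monthlySavings: (saved * SESSIONS_PER_MONTH * PRICE_PER_MILLION) / 1_000_000,
  };
}

console.log(compareToBaseline(12_000)); // { reductionPercent: 48, monthlySavings: 74.25 }
console.log(compareToBaseline(5_400));  // { reductionPercent: 77, monthlySavings: 118.8 }
console.log(compareToBaseline(1_800));  // { reductionPercent: 92, monthlySavings: 143.1 }
```

The results line up with the rounded figures quoted for each attempt (~$75, ~$120, ~$140).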

The Performance Difference

I tracked response quality before and after:

Before optimization (23k tokens AGENTS.md):
- Pattern consistency: 68% (AI missed subtle patterns)
- Long-session performance: Degraded after 15 exchanges
- Convention adherence: 72% (some conventions forgotten)

After optimization (1.8k tokens AGENTS.md):
- Pattern consistency: 89% (AI noticed more patterns)
- Long-session performance: Stable through 30+ exchanges
- Convention adherence: 91% (core conventions remembered)

Why? The AI had more of its context window available for:
- Reading actual code
- Remembering conversation history
- Analyzing complex problems

The Three Principles

Based on my experience, here are the key principles for instruction file optimization:

1. Start Small

Instead of:

# All Our Conventions
[Every possible rule the team has ever written...]
[Takes 700+ lines and loads every session]

Start with:

# Core Conventions
- TypeScript strict mode
- Immutable data patterns
- Test coverage >80%

Add more only when you notice gaps.

2. Offload to Skills

AGENTS.md: Global conventions (always loaded)
   - Coding principles
   - Architecture patterns
   - Security requirements

skills/react/SKILL.md: React-specific (loaded for React work)
   - Component patterns
   - State management
   - Styling conventions

skills/python/SKILL.md: Python-specific (loaded for Python work)
   - Type hints
   - Error handling
   - Package structure
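To make the split concrete, here is a toy sketch of on-demand loading driven by file names. It is only an illustration: real assistants decide which skills to load through their own mechanisms, and the routing table, patterns, and function name below are all hypothetical:

```javascript
// Hypothetical routing table: file-name pattern -> skill file.
const SKILL_RULES = [
  { pattern: /\.(test|spec)\.[jt]sx?$/, skill: "skills/testing/SKILL.md" },
  { pattern: /\.[jt]sx$/, skill: "skills/react/SKILL.md" },
  { pattern: /\.py$/, skill: "skills/python/SKILL.md" },
];

// Returns the instruction files to load for a task touching `files`.
// AGENTS.md is always included; each skill is included at most once.
function instructionFilesFor(files) {
  const selected = new Set(["AGENTS.md"]);
  for (const file of files) {
    for (const { pattern, skill } of SKILL_RULES) {
      if (pattern.test(file)) selected.add(skill);
    }
  }
  return [...selected];
}

console.log(instructionFilesFor(["src/Button.tsx", "src/Button.test.tsx"]));
// [ 'AGENTS.md', 'skills/react/SKILL.md', 'skills/testing/SKILL.md' ]
```

A task that touches only `.py` files would pull in the Python skill and nothing else, which is the whole point: the React and testing conventions cost zero tokens in that session.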

3. Increment When Needed

Step 1: Start with 50 lines
Step 2: Notice what the AI forgets
Step 3: Add that specific instruction
Step 4: Refactor when file grows past 150 lines

Don't preemptively add everything.
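One way to hold yourself to the 150-line limit is a small check you run before committing changes to the file. The sketch below is an assumption-heavy illustration: `checkInstructionFile` is a hypothetical helper, and the 4-characters-per-token ratio is a rough rule of thumb, not a real tokenizer (use your provider's token counter for exact numbers):

```javascript
// Rough heuristic: ~4 characters per token for English text (an approximation).
const CHARS_PER_TOKEN = 4;

// Counts lines, estimates tokens, and reports whether the file fits the line budget.
function checkInstructionFile(text, maxLines = 150) {
  const trimmed = text.trimEnd(); // ignore trailing newlines when counting lines
  const lines = trimmed === "" ? 0 : trimmed.split("\n").length;
  const estimatedTokens = Math.ceil(trimmed.length / CHARS_PER_TOKEN);
  return { lines, estimatedTokens, withinBudget: lines <= maxLines };
}

const sample = [
  "# Core Conventions",
  "",
  "- TypeScript strict mode",
  "- Immutable data patterns",
].join("\n");

const report = checkInstructionFile(sample);
console.log(report.lines, report.withinBudget); // 4 true
```

When `withinBudget` turns false, that is the signal to refactor: move a section into a skill rather than raising the limit.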

Common Mistakes

Mistake 1: Including Every Possible Convention

# All Conventions

## Naming
- Variables: camelCase
- Constants: SCREAMING_SNAKE_CASE
- Classes: PascalCase
- Files: kebab-case
- Components: PascalCase
- Hooks: use prefix
- Utilities: camelCase
- Types: PascalCase with I prefix for interfaces
[... continues for 200 lines of naming rules ...]

The AI will forget most of this. Keep only what’s critical.

Mistake 2: Not Pruning Outdated Instructions

# Legacy Conventions
- Use moment.js for dates (we switched to date-fns 2 years ago)
- Class components for state (we use hooks now)
- Redux for state (we use React Query now)

Outdated instructions confuse the AI. Remove them.
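A simple scan can surface candidates for removal. This is a minimal sketch, assuming you maintain a list of retired terms yourself; `RETIRED_TERMS` and `findStaleInstructions` are hypothetical names, and every hit still needs a human look, since a line like "do not use moment.js" would also match:

```javascript
// Hypothetical list of terms the team has retired; maintain it alongside the codebase.
const RETIRED_TERMS = ["moment.js", "class components", "redux"];

// Returns every instruction line that still mentions a retired term.
function findStaleInstructions(text, retiredTerms = RETIRED_TERMS) {
  return text
    .split("\n")
    .map((line, index) => ({ line: line.trim(), number: index + 1 }))
    .filter(({ line }) => {
      const lower = line.toLowerCase();
      return retiredTerms.some((term) => lower.includes(term.toLowerCase()));
    });
}

const agentsMd = [
  "- Use date-fns for dates",
  "- Use moment.js for dates",
  "- Redux for state",
].join("\n");

console.log(findStaleInstructions(agentsMd).map((hit) => hit.number)); // [ 2, 3 ]
```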

Mistake 3: Copying Entire Style Guides

# JavaScript Style Guide
[Copy-pasted entire Airbnb style guide...]
[500 lines of rules...]

Summarize instead:

# JavaScript Style
- Follow Airbnb style guide
- Exceptions: semicolons required, trailing commas required

The Token Efficiency Mindset

Here’s the key mindset shift:

“Developers will be measured on their token usage: the better one will be the one using the fewest tokens to achieve similar results.”

This isn’t just about saving money. It’s about:

- Better context utilization: More tokens for actual code
- Improved consistency: AI remembers core conventions better
- Longer effective sessions: Less context pollution
- Cost efficiency: Token budget spent on value, not overhead

Before:
"How can I give the AI all the information it might need?"

After:
"What is the minimum information the AI must have?"

In my case, the difference was roughly 90% fewer instruction tokens and better results.

Summary

In this post, I explored why bloated instruction files hurt AI coding assistant performance. The key insight is that tokens are a finite and expensive resource—every token in your AGENTS.md is a token not available for code analysis, conversation, or problem-solving.

The solution is simple: keep global instructions small and focused, move specialized knowledge to skills that load on demand, and start small, adding instructions only when you notice a gap.

My results:

Token reduction: 92% (23,000 to 1,800 tokens)
Monthly savings: ~$140
Response quality: pattern consistency up 21 percentage points (68% to 89%)
Long-session stability: about twice as many exchanges before degradation (15 to 30+)

The goal isn’t to minimize instructions; it’s to include every necessary token and nothing more. I expect token efficiency to become a key developer metric.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email.

Here are the most important links from this article, along with some further resources on this topic:

👨‍💻 Context Windows in LLMs
👨‍💻 Anthropic Token Counting Documentation

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!