How to Reduce Token Usage in Claude Code: Practical Tips

Mar 10, 2026

After tracking 100 million tokens in Claude Code, I discovered something critical: 99.4% of those tokens were input. Claude was reading 166 times more than it was writing.

This revelation changed my entire approach to using Claude Code. The bottleneck isn’t inference speed or code generation - it’s the re-reading loop. Every action requires re-reading the entire relevant context.

The good news? Once you understand this, you can reduce token usage by 30-50%. Not by writing less code, but by being smarter about what gets read.

The Real Opportunity

Here’s the insight that matters:

Wrong approach:  "Write less code"
Right approach:  "Read less context"

99.4% of tokens = input (reading)
0.6% of tokens = output (writing)

Optimize the 99.4%, not the 0.6%
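
As a quick check of the arithmetic behind the "166 times" figure, here is a minimal sketch using only the two percentages and the 100 million total quoted above. The variable names are illustrative.

```python
TOTAL_TOKENS = 100_000_000  # tracked total quoted in this article

input_share = 99.4   # percent of tokens that were input (reading)
output_share = 0.6   # percent of tokens that were output (writing)

# Absolute token counts implied by the two shares
input_tokens = TOTAL_TOKENS * input_share / 100
output_tokens = TOTAL_TOKENS * output_share / 100

# How many tokens are read for every token written
ratio = input_share / output_share

print(f"Input:  {input_tokens:,.0f} tokens")
print(f"Output: {output_tokens:,.0f} tokens")
print(f"Read-to-write ratio: about {ratio:.0f}x")
```

Even halving the output would move the total by only about 0.3%, which is why the rest of this article focuses on input.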

Let me show you exactly how I reduced my token usage after realizing this.

Strategy 1: Optimize Your CLAUDE.md

Your CLAUDE.md file is read on every single turn. Keep it under 200 lines.

The Problem

I had a bloated CLAUDE.md with everything I thought Claude might need:

# Project Overview

This is a Next.js 14 project with TypeScript, Tailwind CSS, and PostgreSQL...

## Full Architecture

[100 lines of detailed architecture documentation]

## All Components

[80 lines listing every component]

## Every API Endpoint

[150 lines documenting all endpoints]

## Coding Standards

[50 lines of generic coding guidelines]

## Git Workflow

[30 lines of git commands]

## Testing Requirements

[40 lines of testing philosophy]

Every turn, Claude read all 450 lines. That’s wasted tokens on information it rarely needed.

The Solution

I trimmed it to essentials only:

# Project Overview
Next.js 14 + TypeScript + PostgreSQL + Tailwind

## Key Patterns
- Server Components by default
- Repository pattern for data access
- Zod for all validation

## Critical Files
- `src/lib/db.ts` - Database connection
- `src/lib/auth.ts` - Authentication
- `src/middleware.ts` - Route protection

## Coding Standards
- Functional components only
- No mutation - always return new objects
- Error handling required on all async operations

## Testing
- 80% coverage minimum
- Jest + Testing Library
- Run `pnpm test` before commits

Token savings: ~60% on CLAUDE.md reads

The Rule

✅ Include: Project-specific patterns, critical file locations
✅ Include: Non-obvious conventions unique to this project
✅ Include: Common pitfalls and how to avoid them

❌ Remove: Generic coding standards (Claude knows these)
❌ Remove: Exhaustive file listings (Claude can list files itself when it needs them)
❌ Remove: Documentation that can be inferred from code
❌ Remove: Git commands (Claude knows git)
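
To keep the file honest over time, a small audit script can help. The sketch below is not part of Claude Code. It counts non-empty lines against the 200-line target from this article and estimates tokens using a rough 4-characters-per-token rule of thumb, which is an assumption and not a real tokenizer. The function name `audit_claude_md` is made up for illustration.

```python
from pathlib import Path

MAX_LINES = 200  # target from this article, not a limit enforced by Claude Code


def audit_claude_md(text: str, max_lines: int = MAX_LINES) -> dict:
    """Count non-empty lines and roughly estimate tokens for a CLAUDE.md body."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Assumption: ~4 characters per token, a coarse heuristic for English text.
    est_tokens = len(text) // 4
    return {
        "lines": len(lines),
        "est_tokens": est_tokens,
        "over_budget": len(lines) > max_lines,
    }


if __name__ == "__main__":
    path = Path("CLAUDE.md")
    if path.exists():
        print(audit_claude_md(path.read_text(encoding="utf-8")))
```

Running it from the project root before a session gives a quick signal that the file has started to grow again.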

Strategy 2: Use .claudeignore Aggressively

Claude Code reads files to understand your project. But not all files deserve to be read.

What Claude Doesn’t Need to Read

Create a .claudeignore file in your project root:

# Dependencies
node_modules/
.pnpm-store/

# Build outputs
.next/
dist/
build/
out/

# Cache directories
.cache/
.turbo/
.eslintcache/

# Test coverage
coverage/
.nyc_output/

# Generated files
*.generated.ts
*.generated.js

# Lock files (Claude doesn't need to read dependency versions)
package-lock.json
pnpm-lock.yaml
yarn.lock

# Environment files (for security AND token savings)
.env.local
.env.*.local

# Large static assets
public/images/
public/videos/
*.svg
*.png
*.jpg

# Documentation (read manually when needed)
docs/
*.md
!CLAUDE.md

# Config files Claude rarely needs
.vscode/
.idea/
*.config.js

Token Impact

Project scan: Read 1,247 files
Average context per request: 45,000 tokens

After .claudeignore:
Project scan: Read 312 files
Average context per request: 18,000 tokens

Savings: 60% reduction in context reading

The Heuristic

Ask: "Does Claude need to read this to write better code?"

YES: Keep it readable
  - Source code (.ts, .tsx, .js, .jsx)
  - Type definitions (.d.ts)
  - Test files (.test.ts, .spec.ts)
  - Key configuration (tsconfig.json, tailwind.config.ts)

NO: Add to .claudeignore
  - Dependencies (node_modules)
  - Build outputs (dist, .next)
  - Lock files (package-lock.json)
  - Assets (images, fonts)
  - Generated code
  - Documentation (except CLAUDE.md)
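
One way to apply this heuristic is to rank files by size and review the biggest ones first. The sketch below is a standalone helper, not a Claude Code feature. The name `largest_files`, the skipped directories, and the 4-bytes-per-token estimate are all assumptions chosen for illustration.

```python
import os
from pathlib import Path

SKIP_DIRS = {".git", "node_modules"}  # pruned up front so the walk stays fast


def largest_files(root: str, top: int = 10) -> list[tuple[int, str]]:
    """Return (estimated_tokens, relative_path) for the biggest files under root."""
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Mutating dirnames in place tells os.walk not to descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            try:
                size = path.stat().st_size
            except OSError:
                continue  # broken symlink or permission problem
            # Assumption: ~4 bytes per token; only a rough ranking signal.
            results.append((size // 4, str(path.relative_to(root))))
    return sorted(results, reverse=True)[:top]


if __name__ == "__main__":
    for tokens, rel_path in largest_files("."):
        print(f"{tokens:>10,}  {rel_path}")
```

Lock files, generated code, and large assets tend to show up at the top of this list. Those are the entries most worth asking the question above about.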

Strategy 3: Strategic Session Management

Every session starts fresh. Use this to your advantage.

The Wrong Approach

One long session for everything:

Session start: 9 AM
- Feature A implementation (50k tokens)
- Bug fix in Feature A (45k tokens - re-reading Feature A context)
- Feature B implementation (60k tokens - reading Features A + B)
- Refactor shared utility (70k tokens - re-reading everything)
- More bugs in Feature A (55k tokens - re-reading again)
Session end: 6 PM
Total: 280k tokens

The Right Approach

Fresh sessions for distinct tasks:

Session 1 (9-11 AM): Feature A implementation
  - Start fresh
  - Focus on Feature A only
  - End session
  Tokens: 40k

Session 2 (11 AM-1 PM): Feature B implementation
  - Start fresh
  - Focus on Feature B only
  - End session
  Tokens: 35k

Session 3 (2-4 PM): Shared refactor
  - Start fresh
  - Focus on shared utilities
  - End session
  Tokens: 25k

Session 4 (4-5 PM): Bug fixes
  - Start fresh
  - Fix bugs in isolation
  Tokens: 20k

Total: 120k tokens (57% savings)
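
The totals and the 57% figure follow directly from the per-task numbers in the two scenarios above. A minimal sketch of the arithmetic, with values in thousands of tokens:

```python
# Per-task token counts (in thousands) from the two scenarios above
long_session = [50, 45, 60, 70, 55]
fresh_sessions = [40, 35, 25, 20]

long_total = sum(long_session)
fresh_total = sum(fresh_sessions)

# Fraction of tokens saved by splitting the work into fresh sessions
savings = 1 - fresh_total / long_total

print(f"One long session: {long_total}k tokens")
print(f"Fresh sessions:   {fresh_total}k tokens")
print(f"Savings:          {savings:.0%}")
```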

The Principle

✅ Start fresh session for:
  - New feature development
  - Refactoring tasks
  - Bug investigation
  - Documentation updates

❌ Don't let sessions drag on:
  - Context accumulates
  - Each turn re-reads more
  - Efficiency degrades

Strategy 4: Maximize Prompt Caching

Claude supports prompt caching: when a request begins with the same content as a recent one, that repeated prefix is billed at a reduced rate. Use it.

What Gets Cached

High cache value:
  - CLAUDE.md (read every turn)
  - Large source files (read frequently)
  - System prompts (always present)

Low cache value:
  - User messages (unique each time)
  - Small files (minimal savings)
  - Generated output (never re-read)

Implementation

Claude Code automatically caches:

- System prompts
- CLAUDE.md content
- Frequently accessed files

You can optimize this by:

1. Keep CLAUDE.md stable (don't edit mid-session)
2. Avoid reformatting entire files
3. Make targeted edits, not wholesale rewrites
4. Use consistent file structure

Cache Hit Rate

Without optimization:
  - Cache hit rate: ~40%
  - Effective token cost: $2.50/M input

With optimization:
  - Cache hit rate: ~75%
  - Effective token cost: $1.25/M input

Savings: 50% reduction in input costs
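
To see how hit rate translates into cost, here is a simplified model. It rests on two illustrative assumptions: a hypothetical base price of $3.00 per million input tokens, and cached reads billed at 10% of that price. It also ignores any surcharge for writing to the cache. Its dollar figures therefore won't match the ones above, but the relative effect is similar.

```python
def effective_input_price(base_price: float, hit_rate: float,
                          cache_read_multiplier: float = 0.1) -> float:
    """Blended $/M input tokens when `hit_rate` of the input is served from cache."""
    uncached = (1 - hit_rate) * base_price
    cached = hit_rate * base_price * cache_read_multiplier
    return uncached + cached


BASE = 3.00  # hypothetical $ per million input tokens (assumption)

before = effective_input_price(BASE, 0.40)
after = effective_input_price(BASE, 0.75)

print(f"40% hit rate: ${before:.3f}/M")
print(f"75% hit rate: ${after:.3f}/M")
print(f"Reduction:    {1 - after / before:.0%}")
```

Under these assumptions, moving from a 40% to a 75% hit rate cuts the blended input price by roughly half, in line with the savings quoted above.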

Strategy 5: Context Scope Control

Don’t let Claude read everything when it only needs one module.

Explicit Scope

When asking for changes:

❌ Bad: "Fix the authentication bug"
   → Claude reads entire codebase

✅ Good: "In src/lib/auth.ts, the validateToken function
          returns true for expired tokens. Fix the expiration check."
   → Claude focuses on auth.ts

File References

❌ Bad: "Add a new API endpoint"
   → Claude scans all files looking for API patterns

✅ Good: "Add a new endpoint to src/app/api/users/route.ts
          following the pattern in src/app/api/posts/route.ts"
   → Claude reads two specific files

The Impact

Unscoped request:
  Files read: 45
  Tokens consumed: 35,000

Scoped request:
  Files read: 3
  Tokens consumed: 4,200

Savings: 88% reduction for that request

Practical Checklist

Here’s my daily checklist for token efficiency:

Before starting:
□ CLAUDE.md under 200 lines?
□ .claudeignore configured?
□ Session has clear, narrow scope?

During session:
□ Scope requests to specific files/modules?
□ Ending session at natural breakpoints?
□ Avoiding context-heavy tangents?

After session:
□ Review token usage (the /cost command shows session totals)
□ Identify what Claude read unnecessarily
□ Update .claudeignore if needed

Expected Token Savings

Based on my experiments:

Strategy 1: Optimized CLAUDE.md      → 15-25% savings
Strategy 2: .claudeignore            → 20-40% savings
Strategy 3: Session management       → 10-30% savings
Strategy 4: Prompt caching           → 10-20% savings
Strategy 5: Context scoping          → 15-50% savings

Combined potential:                  → 30-50% total reduction

The exact savings depend on your project size and workflow. Larger projects see bigger gains from .claudeignore and scoping. Smaller projects benefit more from session management.
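
The per-strategy ranges don't simply add up. As a rough sanity check, the sketch below compounds them as if each strategy removed its share of whatever tokens remain. That independence assumption is mine, for illustration only. The measured combined figure above is lower than this naive model, which is consistent with the strategies overlapping: a file that is already ignored cannot be saved again by scoping.

```python
from math import prod

# Low and high ends of the per-strategy ranges listed above
low = [0.15, 0.20, 0.10, 0.10, 0.15]
high = [0.25, 0.40, 0.30, 0.20, 0.50]


def combined(savings: list[float]) -> float:
    """Total savings if each strategy independently trims the remaining tokens."""
    return 1 - prod(1 - s for s in savings)


print(f"Naive compounding, low end:  {combined(low):.0%}")
print(f"Naive compounding, high end: {combined(high):.0%}")
```

Treat the naive result as a ceiling, not a forecast. The 30-50% range is the more realistic expectation.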

Common Mistakes

Mistake 1: Removing Useful Context

❌ Too aggressive:
   .claudeignore includes src/tests/
   → Claude can't understand test patterns
   → Writes untestable code

✅ Balanced:
   .claudeignore includes coverage/
   → Still reads test files
   → Understands testing patterns

Mistake 2: Session Fragmentation

❌ New session for every tiny change
   → Lost context between related fixes
   → Claude re-reads same files repeatedly

✅ Natural session boundaries:
   → One session per logical task
   → Related changes stay together

Mistake 3: Generic CLAUDE.md

❌ Generic content:
   "Always write clean code"
   "Use meaningful variable names"
   → Claude already knows this
   → Wastes tokens on every read

✅ Project-specific content:
   "All API routes must check rate limits before auth"
   "Database queries use the repository pattern in src/repositories/"
   → Claude might not infer this
   → Actually useful

Frequently Asked Questions

Should I optimize CLAUDE.md for token usage or helpfulness?

Helpfulness first. An extra 100 lines that prevent mistakes are worth more than the token savings. But remove redundancy and generic advice.

How often should I start new sessions?

At natural breakpoints: after completing a feature, before starting a new task, when switching contexts. Not mid-task.

Does .claudeignore affect all Claude Code features?

Yes. Claude won’t read ignored files for any operation. Make sure you’re not hiding files it genuinely needs.

What’s the optimal CLAUDE.md size?

Under 200 lines is a good target. If you need more, consider whether the extra content is truly project-specific.

Can prompt caching compensate for larger CLAUDE.md?

Partially. Caching reduces the cost of repeated reads, but not the context window usage. Keep CLAUDE.md focused regardless.

Summary

Reducing token usage in Claude Code isn’t about writing less code. It’s about reading less context.

The key insights:

- 99.4% input ratio means your optimization target is reading, not writing
- CLAUDE.md should be under 200 lines of project-specific, non-obvious information
- .claudeignore prevents unnecessary file reads - use it aggressively
- Session management leverages fresh starts to avoid context accumulation
- Prompt caching and scoping reduce repeated reads of the same content

The strategies compound. A CLAUDE.md under 200 lines, a well-configured .claudeignore, strategic sessions, and scoped requests together can reduce your token usage by 30-50%.

Start with .claudeignore - it’s the easiest win. Then audit your CLAUDE.md. Finally, be mindful of session boundaries and request scoping.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email.

Here are the most important links from this article, along with some further resources on this topic:

👨‍💻 Claude Prompt Caching
👨‍💻 Claude Context Windows
