Best Practices for Minimizing Shell Output Token Usage in AI Coding Tools
Problem
When I was using Codex CLI for a complex refactoring task, my token budget got exhausted halfway through. I had barely done anything - or so I thought.
After analyzing my usage logs, I found the culprit: exec_command + write_stdin had produced 90.3% of all tool-output characters. My shell commands were eating my token budget alive.
Here’s what my typical commands looked like:

    # What I was running
    git diff main...feature-branch

    # The output was massive - thousands of lines
    # This alone consumed ~15,000 tokens

I realized that every time I ran git diff, rg, or checked logs, I was dumping massive amounts of text into my AI’s context window. No wonder I kept hitting token limits.
Environment
- Codex CLI / Claude Code
- Large codebase with extensive git history
- Multiple test suites with verbose output
- Log files growing to thousands of lines
What happened?
I was working on a feature branch and needed to understand what changed. My workflow looked like this:

    # Check what changed
    git diff main...feature-branch

    # Search for a function
    rg "authenticateUser"

    # Check test results
    npm test

    # View recent logs
    tail -100 logs/app.log

Each of these commands produced hundreds or thousands of lines of output. The AI had to process all of it, consuming tokens rapidly.
The core issue: I was treating AI coding tools like traditional terminal sessions. But with AI, every character in the output counts against my token budget.
How to solve it?
Attempt 1: Just cap everything with head
I tried adding | head -100 to everything:

    git diff main...feature-branch | head -100
    rg "authenticateUser" | head -100
    npm test 2>&1 | head -100

This helped, but I often missed critical information. The error I needed might be on line 150, not in the first 100 lines.
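One way to soften the "line 150" problem is to make the cap announce what it dropped. The following is a minimal sketch, not something from my original setup: `cap` is a hypothetical helper name, and it uses awk instead of head so it can count the lines it omits.

```bash
# cap [N]: print the first N lines of stdin (default 100), then report
# how many lines were dropped. Hypothetical helper, not a standard tool.
# awk reads the whole stream, so the producing command runs to completion.
cap() {
  awk -v n="${1:-100}" '
    NR <= n { print }
    END { if (NR > n) printf "... [%d more lines omitted]\n", NR - n }
  '
}
```

Usage would look like `git diff main...feature-branch | cap 100`. The trailing marker costs a few tokens but tells the AI that more output exists, so it can ask for a narrower query instead of assuming it saw everything. Because it reads all input, it is not suited to endless streams.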
Attempt 2: Filter before capping
I refined my approach to filter first, then cap:

    # Show only errors from tests
    npm test 2>&1 | grep -E "(FAIL|ERROR)" | head -50

    # Show only errors from logs
    grep -E "(ERROR|FATAL)" logs/app.log | tail -50

This was better - I got relevant information without the noise.
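Filtering has its own failure mode: if the pattern matches nothing, the output is empty and a failure in an unexpected format goes unseen. A rough sketch of a guard against that, assuming a hypothetical helper called `filter_cap` (the name and the 10-line fallback are my own choices for illustration):

```bash
# filter_cap PATTERN [N]: keep lines matching PATTERN (extended regex),
# capped at N lines (default 50). If nothing matches, fall back to the
# last 10 lines so an unexpected failure format is not silently hidden.
filter_cap() {
  local pattern=$1 limit=${2:-50} tmp
  tmp=$(mktemp) || return 1
  cat > "$tmp"                      # buffer stdin so it can be read twice
  if grep -q -E -- "$pattern" "$tmp"; then
    grep -E -- "$pattern" "$tmp" | head -n "$limit"
  else
    echo "[no lines matched '$pattern'; showing last 10 lines]"
    tail -n 10 "$tmp"
  fi
  rm -f "$tmp"
}
```

Usage would look like `npm test 2>&1 | filter_cap '(FAIL|ERROR)' 50`.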
Attempt 3: The incremental narrowing pattern
The real breakthrough came when I adopted a three-phase approach:
Phase 1: Broad scan (minimal output)

    # Just get file names, not content
    rg -l "authenticateUser" --type ts

Phase 2: Focused read (targeted sections)

    # Look at specific matches with context
    rg "authenticateUser" src/auth/login.ts -A 5 -B 2

Phase 3: Exact slice (precise extraction)

    # Get exact line numbers for editing
    rg "authenticateUser" src/auth/login.ts -n | head -5

This pattern gave me the information I needed while consuming a fraction of the tokens.
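The three phases can also be wrapped as small functions. This is only a sketch: the names `scan`, `focus`, and `slice` are hypothetical, and I have swapped in grep and sed for rg so the example runs on machines without ripgrep.

```bash
# Phase 1: list files under DIR (default ".") that contain PATTERN, names only.
scan()  { grep -rl -- "$1" "${2:-.}" | head -n 20; }

# Phase 2: show numbered matches in one FILE with 2 lines before and 5 after.
focus() { grep -n -B 2 -A 5 -- "$1" "$2" | head -n 60; }

# Phase 3: print exactly lines START..END of FILE.
slice() { sed -n "${2},${3}p" "$1"; }
```

A session would then go `scan authenticateUser src`, `focus authenticateUser src/auth/login.ts`, and finally `slice src/auth/login.ts 40 55` once the line range is known (the range here is made up for illustration).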
The solution that works
I now follow these rules:
1. Cap outputs by default

    # Always cap unless you need more
    your_command 2>&1 | head -100

    # For filtered output, use smaller cap
    your_command 2>&1 | grep -E "(ERROR|FAIL)" | head -50

2. Use the incremental narrowing pattern

    Broad Scan   →   Focused Read   →   Exact Slice
        ↓                  ↓                 ↓
    File list        Match context      Line numbers
    (10 tokens)      (100 tokens)       (20 tokens)

Instead of one 5000-token dump, I use three small queries totaling ~130 tokens.
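To put rough numbers on a command before adopting it, the output can be piped through a character-based estimator. This sketch assumes about 4 characters per token, a common rule of thumb rather than an exact figure; real tokenizers vary by model and content. `est_tokens` is a hypothetical name.

```bash
# est_tokens: read stdin and print a rough token estimate.
# Assumption: ~4 characters per token (rule of thumb, not exact).
# Rounds up, so 1-4 characters count as 1 token.
est_tokens() {
  wc -c | awk '{ printf "%d\n", ($1 + 3) / 4 }'
}
```

For example, `rg -l "authenticateUser" --type ts | est_tokens` gives a quick sense of what a Phase 1 scan costs compared with the full-content search.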
3. Git diff optimization

    # WRONG: Full diff
    git diff main...feature-branch  # ~15,000 tokens

    # CORRECT: Summary first
    git diff --stat main...feature-branch | head -20  # ~200 tokens

    # Then examine specific files if needed
    git diff main...feature-branch -- src/auth/ | head -100  # ~500 tokens

4. Search optimization

    # WRONG: Broad search
    rg "import" --type ts  # Returns thousands of matches

    # CORRECT: Specific search with limits
    rg "import.*AuthService" --type ts -C 2 | head -50

    # Even better: Count first
    # rg -c prints "path:count" per file, so sum the counts to get total matches
    count=$(rg -c "pattern" --type ts | awk -F: '{ n += $NF } END { print n + 0 }')
    if [ "$count" -lt 20 ]; then
      rg "pattern" --type ts -C 2
    else
      echo "Found $count matches. Use a more specific pattern."
    fi

5. Test output filtering
Create a PreToolUse hook in ~/.claude/hooks/filter-test-output.sh:

    #!/bin/bash
    # Read the hook payload (JSON) from stdin and extract the shell command
    input=$(cat)
    cmd=$(echo "$input" | jq -r '.tool_input.command')

    # Keep the regex in a variable so the spaces in it are handled safely
    test_re='^(npm test|pytest|go test|cargo test)'

    # If running tests, filter to show only failures
    if [[ "$cmd" =~ $test_re ]]; then
      filtered_cmd="$cmd 2>&1 | grep -A 5 -E '(FAIL|ERROR|error:)' | head -100"
      # Build the response with jq so quotes in the command are escaped correctly
      jq -n --arg cmd "$filtered_cmd" \
        '{hookSpecificOutput: {hookEventName: "PreToolUse", permissionDecision: "allow", updatedInput: {command: $cmd}}}'
    else
      echo "{}"
    fi

This hook automatically filters test output before it reaches the AI.
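Before wiring a hook like this into a live session, it helps to check the rewriting step on its own. The sketch below pulls that step into a standalone function with no jq dependency; `rewrite_test_cmd` is a hypothetical name, and the list of test runners simply mirrors the one in the hook.

```bash
# rewrite_test_cmd CMD: if CMD starts with a known test runner, append a
# failure filter and a 100-line cap; otherwise print CMD unchanged.
rewrite_test_cmd() {
  local cmd=$1
  case "$cmd" in
    "npm test"*|pytest*|"go test"*|"cargo test"*)
      printf "%s 2>&1 | grep -A 5 -E '(FAIL|ERROR|error:)' | head -100\n" "$cmd"
      ;;
    *)
      printf '%s\n' "$cmd"
      ;;
  esac
}
```

Running `rewrite_test_cmd 'pytest tests/'` prints the command the AI would actually execute. One thing to keep in mind: once the test runner is piped through grep and head, the pipeline's exit status is no longer the test runner's, so a passing or failing run has to be judged from the filtered text.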
6. Configure token limits
Set environment variables to enforce hard limits. In Claude Code, MAX_MCP_OUTPUT_TOKENS caps the size of MCP tool responses:

    # In your shell profile
    export MAX_MCP_OUTPUT_TOKENS=50000

    # Or in Claude Code settings.json
    {
      "env": {
        "MAX_MCP_OUTPUT_TOKENS": "50000"
      }
    }

7. Create reusable log budget commands
Add these to your shell aliases or functions:

    # Default: last 50 lines
    alias logs='tail -50 logs/app.log'

    # Escalated: last 500 lines (use sparingly)
    alias logs-full='tail -500 logs/app.log'

    # Filtered: errors only
    alias logs-err='grep -E "(ERROR|FATAL)" logs/app.log | tail -50'

The results
Here are the token savings I achieved:
| Scenario | Before (tokens) | After (tokens) | Savings |
|---|---|---|---|
| git diff large branch | 15,000 | 500 | 97% |
| rg broad search | 8,000 | 200 | 98% |
| Test suite output | 5,000 | 300 | 94% |
| Log file dump | 10,000 | 400 | 96% |
Overall, I reduced my token consumption from shell outputs by 60-80%.
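The Savings column is just the relative reduction, (before - after) / before, rounded to the nearest percent. A small sketch for reproducing it on your own measurements, with `savings` as a hypothetical helper name:

```bash
# savings BEFORE AFTER: percentage reduction from BEFORE to AFTER,
# rounded to the nearest whole percent.
savings() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%d%%\n", int(100 * (b - a) / b + 0.5) }'
}
```

For the first row, `savings 15000 500` works out to 14,500 / 15,000, or about 96.7%, which rounds to 97%.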
The reason this works
The key insight is that AI context windows are finite resources. When you dump 10,000 lines of output, you’re not just wasting tokens - you’re pushing useful context out of the window.
The incremental narrowing pattern works because:
- Broad scans give overview - You learn where to look without seeing everything
- Focused reads give context - You examine only what’s relevant
- Exact slices give precision - You extract just what you need
Each phase adds minimal tokens while building understanding progressively.
The 90.3% statistic makes sense when you think about it:
- git diff on large changesets outputs thousands of lines
- rg without filters dumps matches from the entire codebase
- Log files stream endlessly
- Test outputs include full details for hundreds of tests
- Build logs capture verbose compiler output
All of this goes directly into the AI’s context, consuming tokens that could be used for actual problem-solving.
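To check a figure like the 90.3% share against your own sessions, per-tool output sizes can be tallied from a usage export. The sketch below assumes a hypothetical tab-separated file with one tool call per line (tool name, then output character count); real log formats differ between tools, so the parsing step would need adapting. `tool_share` is a made-up name.

```bash
# tool_share FILE: FILE is a hypothetical tab-separated export with one
# tool call per line: <tool name><TAB><output characters>.
# Prints each tool's share of all output characters, largest first.
tool_share() {
  awk -F '\t' '
    { chars[$1] += $2; total += $2 }
    END {
      if (total == 0) exit
      for (t in chars) printf "%.1f%%\t%s\n", 100 * chars[t] / total, t
    }
  ' "$1" | sort -rn
}
```

If one tool sits far above the rest, as exec_command did for me, that is where capping and filtering will pay off first.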
Common mistakes to avoid
Mistake 1: Running cat on large files
- Fix: Use head, tail, or grep to extract relevant sections
Mistake 2: Broad rg without filters
- Fix: Use --max-count, --type, or pipe to head
Mistake 3: Full test output on every run
- Fix: Filter to show only failures with grep -E "(FAIL|ERROR)"
Mistake 4: No output limits in configuration
- Fix: Set MAX_MCP_OUTPUT_TOKENS and use PreToolUse hooks
Mistake 5: Over-filtering during investigation
- Fix: Use --full flag escalation when you actually need more context
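The --full escalation is a convention of mine, not a flag that any of these tools provide. One possible way to implement it is a small wrapper; `run_capped` and the `CAP_LINES` variable are hypothetical names used only for this sketch.

```bash
# run_capped [--full] COMMAND [ARGS...]
# Runs COMMAND with stderr merged into stdout. Output is capped at
# CAP_LINES lines (default 100) unless --full is the first argument.
run_capped() {
  if [ "$1" = "--full" ]; then
    shift
    "$@" 2>&1
  else
    "$@" 2>&1 | head -n "${CAP_LINES:-100}"
  fi
}
```

With this in place, `run_capped npm test` stays within the default cap, and `run_capped --full npm test` is the deliberate, visible escalation for the cases where over-filtering gets in the way.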
AGENTS.md template
I added this section to my project’s AGENTS.md to enforce these rules:
# Shell Output Management

## Output Capping Rules

1. **Default cap: 100 lines** for all command outputs
2. **Escalation path: --full flag** required for unlimited output
3. **Filter priority:** errors > warnings > info > debug

## Incremental Narrowing Pattern

Phase 1: Broad discovery
```bash
find . -name "*.ts" | head -20  # File list, capped
```

Phase 2: Targeted search
```bash
rg "pattern" --type ts -l | head -10  # Matching files, capped
```

Phase 3: Precise extraction
```bash
rg "pattern" file.ts -A 3 -B 1  # Context around match
```

## Git Output Patterns

```bash
# Summary first
git diff --stat | head -20

# Then targeted
git diff -- src/auth/ | head -100
```

This ensures the AI follows these rules automatically when working on my project.
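The "summary first, then targeted" git pattern leaves open which files deserve the targeted look. A rough sketch of one way to decide, assuming a hypothetical helper `top_changed` that ranks files by churn from `git diff --numstat` output:

```bash
# top_changed [N]: read `git diff --numstat` output on stdin and print the
# N files with the most changed lines (added + deleted), largest first.
# Binary files, which numstat reports as "-", are skipped.
top_changed() {
  awk -F '\t' '$1 != "-" { printf "%d\t%s\n", $1 + $2, $3 }' |
    sort -rn | head -n "${1:-5}"
}
```

Usage would look like `git diff --numstat main...feature-branch | top_changed 5`. Large generated files such as lockfiles tend to surface at the top, which is a useful hint about what to exclude from the targeted diff.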
Summary
In this post, I showed how shell outputs accounted for over 90% of the tool-output characters in my AI coding sessions and how I reduced the tokens they consume by 60-80%. The key strategies are: cap outputs by default, use the incremental narrowing pattern, filter before capping, and configure hard token limits.
The incremental narrowing pattern is the most important: start with broad scans (file lists), move to focused reads (specific matches with context), then exact slices (precise line numbers). This gives you the information you need while consuming a fraction of the tokens.