Best Practices for Minimizing Shell Output Token Usage in AI Coding Tools

Problem

When I was using Codex CLI for a complex refactoring task, my token budget got exhausted halfway through. I had barely done anything - or so I thought.

After analyzing my usage logs, I found the culprit: exec_command + write_stdin had produced 90.3% of all tool-output characters. My shell commands were eating my token budget alive.

Here’s what my typical commands looked like:

Terminal window
# What I was running
git diff main...feature-branch
# The output was massive - thousands of lines
# This alone consumed ~15,000 tokens

I realized that every time I ran git diff, rg, or checked logs, I was dumping massive amounts of text into my AI’s context window. No wonder I kept hitting token limits.

Environment

  • Codex CLI / Claude Code
  • Large codebase with extensive git history
  • Multiple test suites with verbose output
  • Log files growing to thousands of lines

What happened?

I was working on a feature branch and needed to understand what changed. My workflow looked like this:

Terminal window
# Check what changed
git diff main...feature-branch
# Search for a function
rg "authenticateUser"
# Check test results
npm test
# View recent logs
tail -100 logs/app.log

Each of these commands produced hundreds or thousands of lines of output. The AI had to process all of it, consuming tokens rapidly.

The core issue: I was treating AI coding tools like traditional terminal sessions. But with AI, every character in the output counts against my token budget.

How to solve it?

Attempt 1: Just cap everything with head

I tried adding | head -100 to everything:

Terminal window
git diff main...feature-branch | head -100
rg "authenticateUser" | head -100
npm test 2>&1 | head -100

This helped, but I often missed critical information. The error I needed might be on line 150, not in the first 100 lines.
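One way to soften this problem is to keep both ends of the output, since test runners and build tools tend to print their summary last. The sketch below is a hypothetical helper (the name `clip` and its defaults are my own, not part of any tool) that prints the first and last lines of a stream and notes how many lines were dropped in between. It buffers the whole stream in awk, so it assumes output that fits comfortably in memory.

```bash
# Hypothetical helper: keep the first H and last T lines of stdin,
# replacing the middle with a one-line marker.
clip() {
  local head_n="${1:-40}" tail_n="${2:-40}"
  awk -v h="$head_n" -v t="$tail_n" '
    { lines[NR] = $0 }                      # buffer every line
    END {
      if (NR <= h + t) {                    # short output: print it all
        for (i = 1; i <= NR; i++) print lines[i]
      } else {
        for (i = 1; i <= h; i++) print lines[i]
        printf "... [%d lines omitted] ...\n", NR - h - t
        for (i = NR - t + 1; i <= NR; i++) print lines[i]
      }
    }'
}
```

Usage would look like `npm test 2>&1 | clip 30 30`. It still misses an error buried in the middle, which is why filtering (next attempt) matters.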

Attempt 2: Filter before capping

I refined my approach to filter first, then cap:

Terminal window
# Show only errors from tests
npm test 2>&1 | grep -E "(FAIL|ERROR)" | head -50
# Show only errors from logs
grep -E "(ERROR|FATAL)" logs/app.log | tail -50

This was better - I got relevant information without the noise.
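A drawback of `grep ... | head -50` is that it silently hides how many matches were cut off. As a rough sketch, the hypothetical function below (the name `filter_cap` is an assumption for illustration) filters and caps in one pass and appends a note when matches were dropped, so the AI knows whether it is seeing everything.

```bash
# Hypothetical helper: print at most CAP lines matching PATTERN (an awk
# extended regex) and report how many further matches were not shown.
filter_cap() {
  local pattern="$1" cap="${2:-50}"
  awk -v pat="$pattern" -v cap="$cap" '
    $0 ~ pat { n++; if (n <= cap) print }
    END { if (n > cap) printf "[%d more matching lines not shown]\n", n - cap }'
}
```

For example: `npm test 2>&1 | filter_cap 'FAIL|ERROR' 50`.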

Attempt 3: The incremental narrowing pattern

The real breakthrough came when I adopted a three-phase approach:

Phase 1: Broad scan (minimal output)

Terminal window
# Just get file names, not content
rg -l "authenticateUser" --type ts

Phase 2: Focused read (targeted sections)

Terminal window
# Look at specific matches with context
rg "authenticateUser" src/auth/login.ts -A 5 -B 2

Phase 3: Exact slice (precise extraction)

Terminal window
# Get exact line numbers for editing
rg "authenticateUser" src/auth/login.ts -n | head -5

This pattern gave me the information I needed while consuming a fraction of the tokens.
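The three phases can also be wrapped in small shell functions. The sketch below uses `grep` and `sed` as stand-ins for `rg` so that it runs on machines without ripgrep; the function names (`scan`, `focus`, `slice`) and the caps are assumptions for illustration, not part of any tool.

```bash
# Phase 1: which .ts files mention the symbol? (file names only, capped)
scan() { grep -rl --include='*.ts' -- "$1" "${2:-.}" | head -20; }

# Phase 2: matches in one file with line numbers and a little context
focus() { grep -n -B 2 -A 5 -- "$1" "$2" | head -60; }

# Phase 3: print an exact line range (start, end) from one file
slice() { sed -n "${2},${3}p" "$1"; }
```

A session would then be `scan authenticateUser src`, followed by `focus authenticateUser src/auth/login.ts`, and finally `slice src/auth/login.ts 40 55` once the line numbers are known.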

The solution that works

I now follow these rules:

1. Cap outputs by default

Terminal window
# Always cap unless you need more
your_command 2>&1 | head -100
# For filtered output, use smaller cap
your_command 2>&1 | grep -E "(ERROR|FAIL)" | head -50

2. Use the incremental narrowing pattern

Broad Scan    →   Focused Read    →   Exact Slice
    ↓                  ↓                   ↓
File list         Match context       Line numbers
(10 tokens)       (100 tokens)        (20 tokens)

Instead of one 5000-token dump, I use three small queries totaling ~130 tokens.

3. Git diff optimization

Terminal window
# WRONG: Full diff
git diff main...feature-branch # ~15,000 tokens
# CORRECT: Summary first
git diff --stat main...feature-branch | head -20 # ~200 tokens
# Then examine specific files if needed
git diff main...feature-branch -- src/auth/ | head -100 # ~500 tokens
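When the `--stat` summary is itself long, it can help to rank files by how much they changed and only open the top few. A minimal sketch, assuming the tab-separated output of `git diff --numstat` (added, deleted, path) on stdin; the function name `top_churn` is hypothetical.

```bash
# Hypothetical helper: read `git diff --numstat` output and print the
# N files with the most changed lines (added + deleted), largest first.
# Binary files are reported by git as "-  -  path" and are skipped here.
top_churn() {
  local n="${1:-10}"
  awk -F'\t' '$1 != "-" { printf "%d\t%s\n", $1 + $2, $3 }' |
    sort -rn |
    head -n "$n"
}

# Example (not executed here):
#   git diff --numstat main...feature-branch | top_churn 5
```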

4. Search optimization

Terminal window
# WRONG: Broad search
rg "import" --type ts # Returns thousands of matches
# CORRECT: Specific search with limits
rg "import.*AuthService" --type ts -C 2 | head -50
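# Even better: Count first
# rg -c prints "file:count" per file, so sum the counts to get a total
count=$(rg -c "pattern" --type ts | awk -F: '{ total += $NF } END { print total + 0 }')
if [ "$count" -lt 20 ]; then
  rg "pattern" --type ts -C 2
else
  echo "Found $count matching lines. Use a more specific pattern."
fi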

5. Test output filtering

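Create a PreToolUse hook in ~/.claude/hooks/filter-test-output.sh (the script also has to be registered as a PreToolUse hook for the Bash tool in your Claude Code settings before it runs):

#!/bin/bash
# Read the hook payload (JSON) from stdin
input=$(cat)
cmd=$(printf '%s' "$input" | jq -r '.tool_input.command // empty')
# Keep the regex in a variable so the spaces inside it are parsed reliably
test_cmd_re='^(npm test|pytest|go test|cargo test)'
# If running tests, rewrite the command to show only failures
if [[ "$cmd" =~ $test_cmd_re ]]; then
  filtered_cmd="$cmd 2>&1 | grep -A 5 -E '(FAIL|ERROR|error:)' | head -100"
  # Build the response with jq so quotes inside the command are escaped correctly
  jq -cn --arg cmd "$filtered_cmd" \
    '{hookSpecificOutput: {hookEventName: "PreToolUse", permissionDecision: "allow", updatedInput: {command: $cmd}}}'
else
  echo "{}"
fi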

This hook automatically filters test output before it reaches the AI.
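One limitation of rewriting the command into a `grep | head` pipeline is that the pipeline's exit status comes from `head`, not from the test runner, and the unfiltered output is gone. As an alternative sketch (the function name `quiet_run` and the log location are assumptions), a wrapper can save the full output to a temporary file, print only the failure lines, and return the real exit status.

```bash
# Hypothetical wrapper: run a command, keep its full output on disk,
# show only failure lines, and preserve the command's exit status.
quiet_run() {
  local log status
  log=$(mktemp "${TMPDIR:-/tmp}/quiet_run.XXXXXX")
  "$@" >"$log" 2>&1
  status=$?
  # Show failures with a little trailing context, capped at 100 lines
  grep -E -A 5 'FAIL|ERROR|error:' "$log" | head -100
  echo "[exit $status; $(wc -l <"$log" | tr -d ' ') lines saved to $log]"
  return "$status"
}
```

With `quiet_run npm test`, a passing run prints only the one-line footer, and the full log is still available if a targeted `grep` or `sed -n` on the saved file is needed later.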

6. Configure token limits

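Set environment variables to enforce hard limits. Note that MAX_MCP_OUTPUT_TOKENS caps the output of MCP tools; as far as I can tell from the Claude Code documentation, plain shell output is governed by a separate variable, BASH_MAX_OUTPUT_LENGTH (measured in characters), so check the settings reference for your version before relying on either one:

Terminal window
# In your shell profile
export MAX_MCP_OUTPUT_TOKENS=50000

Or in Claude Code settings.json:

{
  "env": {
    "MAX_MCP_OUTPUT_TOKENS": "50000"
  }
}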

7. Create reusable log budget commands

Add these to your shell aliases or functions:

Terminal window
# Default: last 50 lines
alias logs='tail -50 logs/app.log'
# Escalated: last 500 lines (use sparingly)
alias logs-full='tail -500 logs/app.log'
# Filtered: errors only
alias logs-err='grep -E "(ERROR|FATAL)" logs/app.log | tail -50'
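Aliases cannot take a different log file or an escalation flag, so a function is sometimes more convenient. A small sketch, assuming the same logs/app.log default as above; the name `applog` and the `--full` flag are my own conventions here, not options of `tail`.

```bash
# Hypothetical helper: budgeted log view.
#   applog [--full] [file]
# Default is the last 50 lines; --full escalates to the last 500.
applog() {
  local file="logs/app.log" limit=50
  if [ "$1" = "--full" ]; then
    limit=500
    shift
  fi
  if [ -n "$1" ]; then
    file="$1"
  fi
  tail -n "$limit" "$file"
}
```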

The results

Here’s the token savings I achieved:

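| Scenario | Before (tokens) | After (tokens) | Savings |
|---|---|---|---|
| git diff large branch | 15,000 | 500 | 97% |
| rg broad search | 8,000 | 200 | 98% |
| Test suite output | 5,000 | 300 | 94% |
| Log file dump | 10,000 | 400 | 96% |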

Overall, I reduced my token consumption from shell outputs by 60-80%.
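To get comparable before/after numbers for your own commands, a rough estimate is usually enough. The sketch below assumes the common rule of thumb of about 4 characters per token for English-like text; real tokenizers vary, so treat the result as an approximation. The function name `est_tokens` is hypothetical.

```bash
# Hypothetical helper: estimate tokens in stdin as ceil(bytes / 4).
# The 4-bytes-per-token ratio is a rule of thumb, not an exact count.
est_tokens() {
  local chars
  chars=$(wc -c)
  echo $(( (chars + 3) / 4 ))
}
```

For example, comparing `git diff main...feature-branch | est_tokens` with `git diff --stat main...feature-branch | head -20 | est_tokens` shows how much a single change of habit saves.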

The reason this works

The key insight is that AI context windows are finite resources. When you dump 10,000 lines of output, you’re not just wasting tokens - you’re pushing useful context out of the window.

The incremental narrowing pattern works because:

  1. Broad scans give overview - You learn where to look without seeing everything
  2. Focused reads give context - You examine only what’s relevant
  3. Exact slices give precision - You extract just what you need

Each phase adds minimal tokens while building understanding progressively.

The 90.3% statistic makes sense when you think about it:

  • git diff on large changesets outputs thousands of lines
  • rg without filters dumps every match in the codebase
  • Log files stream endlessly
  • Test outputs include full details for hundreds of tests
  • Build logs capture verbose compiler output

All of this goes directly into the AI’s context, consuming tokens that could be used for actual problem-solving.

Common mistakes to avoid

Mistake 1: Running cat on large files

  • Fix: Use head, tail, or grep to extract relevant sections

Mistake 2: Broad rg without filters

  • Fix: Use --max-count, --type, or pipe to head

Mistake 3: Full test output on every run

  • Fix: Filter to show only failures with grep -E "(FAIL|ERROR)"

Mistake 4: No output limits in configuration

  • Fix: Set MAX_MCP_OUTPUT_TOKENS and use PreToolUse hooks

Mistake 5: Over-filtering during investigation

  • Fix: Escalate deliberately when you actually need more context, for example with the logs-full alias above or the --full convention from the AGENTS.md template below

AGENTS.md template

I added this section to my project’s AGENTS.md to enforce these rules:

# Shell Output Management

## Output Capping Rules

1. **Default cap: 100 lines** for all command outputs
2. **Escalation path: --full flag** required for unlimited output
3. **Filter priority:** errors > warnings > info > debug

## Incremental Narrowing Pattern

Phase 1: Broad discovery

```bash
find . -name "*.ts" | head -20 # File list, capped
```

Phase 2: Targeted search

```bash
rg "pattern" --type ts -l | head -10 # Matching files, capped
```

Phase 3: Precise extraction

```bash
rg "pattern" file.ts -A 3 -B 1 # Context around match
```

## Git Output Patterns

```bash
# Summary first
git diff --stat | head -20
# Then targeted
git diff -- src/auth/ | head -100
```

This ensures the AI follows these rules automatically when working on my project.

Summary

In this post, I showed how shell output accounted for more than 90% of the tool-output characters in my AI coding sessions, and how to cut the tokens it consumes by 60-80%. The key strategies are: cap outputs by default, use the incremental narrowing pattern, filter before capping, and configure hard token limits.

The incremental narrowing pattern is the most important: start with broad scans (file lists), move to focused reads (specific matches with context), then exact slices (precise line numbers). This gives you the information you need while consuming a fraction of the tokens.