Best Practices for Minimizing Shell Output Token Usage in AI Coding Tools

Mar 2, 2026

Problem

When I was using Codex CLI for a complex refactoring task, my token budget got exhausted halfway through. I had barely done anything - or so I thought.

After analyzing my usage logs, I found the culprit: exec_command + write_stdin had produced 90.3% of all tool-output characters. My shell commands were eating my token budget alive.

Here’s what my typical commands looked like:

# What I was running
git diff main...feature-branch

# The output was massive - thousands of lines
# This alone consumed ~15,000 tokens

I realized that every time I ran git diff, rg, or checked logs, I was dumping massive amounts of text into my AI’s context window. No wonder I kept hitting token limits.

Environment

Codex CLI / Claude Code
Large codebase with extensive git history
Multiple test suites with verbose output
Log files growing to thousands of lines

What happened?

I was working on a feature branch and needed to understand what changed. My workflow looked like this:

# Check what changed
git diff main...feature-branch

# Search for a function
rg "authenticateUser"

# Check test results
npm test

# View recent logs
tail -100 logs/app.log

Each of these commands produced hundreds or thousands of lines of output. The AI had to process all of it, consuming tokens rapidly.

The core issue: I was treating AI coding tools like traditional terminal sessions. But with AI, every character in the output counts against my token budget.
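
To see what a command would cost before letting it into the context, it helps to measure its output first. The sketch below uses a hypothetical helper, `estimate_tokens` (my own name, not part of any tool), and assumes roughly four bytes per token, which is only a rule of thumb that varies by tokenizer and content:

```shell
# estimate_tokens: hypothetical helper that reads stdin and prints a rough
# token estimate. Assumes ~4 bytes per token (a heuristic, not an exact count).
estimate_tokens() {
  local bytes
  bytes=$(wc -c | tr -d ' ')
  # Round up so that any non-empty output costs at least one token
  echo "$(( (bytes + 3) / 4 ))"
}

# Usage: measure a command instead of dumping its output
# git diff main...feature-branch | estimate_tokens
```

Running the measurement first costs a handful of tokens and tells you whether the full output is worth reading at all.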

How to solve it?

Attempt 1: Just cap everything with head

I tried adding | head -100 to everything:

git diff main...feature-branch | head -100
rg "authenticateUser" | head -100
npm test 2>&1 | head -100

This helped, but I often missed critical information. The error I needed might be on line 150, not in the first 100 lines.
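
One way to soften this problem, before moving to filtering, is to keep both ends of the output, since errors and summaries often land near the end. This is a minimal sketch under that assumption; `cap_ends` is a hypothetical helper name, and it buffers the whole input in memory, so it is meant for outputs of thousands of lines, not gigabytes:

```shell
# cap_ends: hypothetical helper that keeps the first and last N lines of stdin
# (default 50) and reports how many lines were dropped in between.
cap_ends() {
  local n="${1:-50}"
  awk -v n="$n" '
    { lines[NR] = $0 }
    END {
      # Short input: print everything unchanged
      if (NR <= 2 * n) { for (i = 1; i <= NR; i++) print lines[i]; exit }
      for (i = 1; i <= n; i++) print lines[i]
      printf "... [%d lines omitted] ...\n", NR - 2 * n
      for (i = NR - n + 1; i <= NR; i++) print lines[i]
    }'
}

# Usage:
# npm test 2>&1 | cap_ends 50
```

It still misses anything in the middle, which is why filtering turned out to matter more than capping.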

Attempt 2: Filter before capping

I refined my approach to filter first, then cap:

# Show only errors from tests
npm test 2>&1 | grep -E "(FAIL|ERROR)" | head -50

# Show only errors from logs
grep -E "(ERROR|FATAL)" logs/app.log | tail -50

This was better - I got relevant information without the noise.
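
One caveat with these pipelines: by default the pipeline's exit status is that of the last command (`head` or `tail`), so a failing test run can look like a success. The following is a sketch of one way around that, assuming a temporary file is acceptable; `run_filtered` is a hypothetical helper name, and the `FAIL|ERROR` pattern is the same one used above:

```shell
# run_filtered: hypothetical wrapper that runs a command, prints only the
# matching lines (capped at 50), and preserves the command's exit status.
run_filtered() {
  local log status
  log="$(mktemp)"
  "$@" >"$log" 2>&1
  status=$?
  grep -E "(FAIL|ERROR)" "$log" | head -n 50
  # One summary line so an empty filter result is not mistaken for success
  echo "exit status: $status ($(wc -l <"$log" | tr -d ' ') lines total)"
  rm -f "$log"
  return "$status"
}

# Usage:
# run_filtered npm test
```

The summary line costs a few tokens but tells the AI both whether the command failed and how much output was held back.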

Attempt 3: The incremental narrowing pattern

The real breakthrough came when I adopted a three-phase approach:

Phase 1: Broad scan (minimal output)

# Just get file names, not content
rg -l "authenticateUser" --type ts

Phase 2: Focused read (targeted sections)

# Look at specific matches with context
rg "authenticateUser" src/auth/login.ts -A 5 -B 2

Phase 3: Exact slice (precise extraction)

# Get exact line numbers for editing
rg "authenticateUser" src/auth/login.ts -n | head -5

This pattern gave me the information I needed while consuming a fraction of the tokens.
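
Once Phase 3 has produced line numbers, the last step is to read only that range. A minimal sketch, assuming you already know the start and end lines; `slice` is a hypothetical helper name, not an existing command:

```shell
# slice: hypothetical helper that prints lines START..END of FILE,
# each prefixed with its line number, and stops reading after END.
slice() {
  local file="$1" start="$2" end="$3"
  awk -v s="$start" -v e="$end" '
    NR > e { exit }
    NR >= s { printf "%d: %s\n", NR, $0 }
  ' "$file"
}

# Usage: read only lines 40-60 of the file found in Phase 2
# slice src/auth/login.ts 40 60
```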

The solution that works

I now follow these rules:

1. Cap outputs by default

# Always cap unless you need more
your_command 2>&1 | head -100

# For filtered output, use smaller cap
your_command 2>&1 | grep -E "(ERROR|FAIL)" | head -50

2. Use the incremental narrowing pattern

Broad Scan → Focused Read → Exact Slice
    ↓              ↓             ↓
  File list    Match context   Line numbers
  (10 tokens)   (100 tokens)   (20 tokens)

Instead of one 5000-token dump, I use three small queries totaling ~130 tokens.

3. Git diff optimization

# WRONG: Full diff
git diff main...feature-branch  # ~15,000 tokens

# CORRECT: Summary first
git diff --stat main...feature-branch | head -20  # ~200 tokens

# Then examine specific files if needed
git diff main...feature-branch -- src/auth/ | head -100  # ~500 tokens
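
When the `--stat` summary is itself long, it can help to rank files by how much they changed and look at the biggest ones first. The sketch below relies on `git diff --numstat`, which prints added lines, deleted lines, and the path separated by tabs; `rank_numstat` is a hypothetical helper name, and skipping binary files (shown as `-`) is my own choice:

```shell
# rank_numstat: hypothetical helper that reads `git diff --numstat` output on
# stdin and prints the N most-changed files (default 5) as "total<TAB>path".
rank_numstat() {
  awk -F'\t' '$1 != "-" { printf "%d\t%s\n", $1 + $2, $3 }' \
    | sort -rn \
    | head -n "${1:-5}"
}

# Usage:
# git diff --numstat main...feature-branch | rank_numstat 5
```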

4. Search optimization

# WRONG: Broad search
rg "import" --type ts  # Returns thousands of matches

# CORRECT: Specific search with limits
rg "import.*AuthService" --type ts -C 2 | head -50

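# Even better: Count first
# rg -c prints "file:count" per file, so sum the counts to get the total
count=$(rg -c "pattern" --type ts | awk -F: '{ total += $NF } END { print total + 0 }')
if [ "$count" -lt 20 ]; then
  rg "pattern" --type ts -C 2
else
  echo "Found $count matches. Use more specific pattern."
fi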

5. Test output filtering

Create a PreToolUse hook in ~/.claude/hooks/filter-test-output.sh:

#!/bin/bash
# Read the hook payload (JSON) from stdin and extract the shell command
input=$(cat)
cmd=$(echo "$input" | jq -r '.tool_input.command')

# If running tests, filter to show only failures
if [[ "$cmd" =~ ^(npm test|pytest|go test|cargo test) ]]; then
  filtered_cmd="$cmd 2>&1 | grep -A 5 -E '(FAIL|ERROR|error:)' | head -100"
  # Build the JSON with jq so quotes in the command are escaped correctly
  jq -cn --arg cmd "$filtered_cmd" \
    '{hookSpecificOutput: {hookEventName: "PreToolUse", permissionDecision: "allow", updatedInput: {command: $cmd}}}'
else
  echo "{}"
fi

This hook automatically filters test output before it reaches the AI.

6. Configure token limits

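Set environment variables to enforce hard limits. In Claude Code, MAX_MCP_OUTPUT_TOKENS caps the size of MCP tool responses, so treat it as a backstop alongside the shell-side caps above: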

# In your shell profile
export MAX_MCP_OUTPUT_TOKENS=50000

# Or in Claude Code settings.json

{
  "env": {
    "MAX_MCP_OUTPUT_TOKENS": "50000"
  }
}

7. Create reusable log budget commands

Add these to your shell aliases or functions:

# Default: last 50 lines
alias logs='tail -50 logs/app.log'

# Escalated: last 500 lines (use sparingly)
alias logs-full='tail -500 logs/app.log'

# Filtered: errors only
alias logs-err='grep -E "(ERROR|FATAL)" logs/app.log | tail -50'
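
Aliases cannot take arguments, so each budget needs its own alias. A function is one way to fold the three into a single command with an adjustable budget. This is a sketch only; `log_tail` and its argument order are hypothetical:

```shell
# log_tail: hypothetical helper. Prints the last LINES lines of FILE
# (default 50), optionally keeping only lines that match PATTERN.
log_tail() {
  local file="$1" lines="${2:-50}" pattern="${3:-}"
  if [ -n "$pattern" ]; then
    grep -E "$pattern" "$file" | tail -n "$lines"
  else
    tail -n "$lines" "$file"
  fi
}

# Usage:
# log_tail logs/app.log                      # default budget
# log_tail logs/app.log 500                  # escalated (use sparingly)
# log_tail logs/app.log 50 "(ERROR|FATAL)"   # errors only
```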

The results

Here’s the token savings I achieved:

| Scenario | Before (tokens) | After (tokens) | Savings |
| --- | --- | --- | --- |
| `git diff` large branch | 15,000 | 500 | 97% |
| `rg` broad search | 8,000 | 200 | 98% |
| Test suite output | 5,000 | 300 | 94% |
| Log file dump | 10,000 | 400 | 96% |

Across whole sessions, the overall reduction was smaller than these per-command figures: my token consumption from shell outputs dropped by 60-80%.
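
The per-scenario savings are simply (before - after) / before. As a quick sketch for checking your own numbers, with `savings_pct` being a hypothetical helper name:

```shell
# savings_pct: hypothetical helper that prints the percentage saved when
# going from BEFORE tokens to AFTER tokens, to one decimal place.
savings_pct() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

savings_pct 15000 500   # git diff on a large branch: prints 96.7
```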

The reason this works

The key insight is that AI context windows are finite resources. When you dump 10,000 lines of output, you’re not just wasting tokens - you’re pushing useful context out of the window.

The incremental narrowing pattern works because:

Broad scans give overview - You learn where to look without seeing everything
Focused reads give context - You examine only what’s relevant
Exact slices give precision - You extract just what you need

Each phase adds minimal tokens while building understanding progressively.

The 90.3% statistic makes sense when you think about it:

git diff on large changesets outputs thousands of lines
rg without filters dumps every match in the codebase
Log files stream endlessly
Test outputs include full details for hundreds of tests
Build logs capture verbose compiler output

All of this goes directly into the AI’s context, consuming tokens that could be used for actual problem-solving.

Common mistakes to avoid

Mistake 1: Running cat on large files

Fix: Use head, tail, or grep to extract relevant sections

Mistake 2: Broad rg without filters

Fix: Use --max-count, --type, or pipe to head

Mistake 3: Full test output on every run

Fix: Filter to show only failures with grep -E "(FAIL|ERROR)"

Mistake 4: No output limits in configuration

Fix: Set MAX_MCP_OUTPUT_TOKENS and use PreToolUse hooks

Mistake 5: Over-filtering during investigation

Fix: Escalate deliberately (a larger cap, or the --full convention from the AGENTS.md template below) when you actually need more context
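
The cap-by-default and escalate-on-request habits can be combined in one wrapper. The following is a minimal sketch, assuming a 100-line default; `capped` is a hypothetical helper, and `--full` here is a flag of this wrapper, not of the underlying commands:

```shell
# capped: hypothetical wrapper that runs a command with stderr merged into
# stdout and caps the output at 100 lines unless called with --full first.
capped() {
  if [ "${1:-}" = "--full" ]; then
    shift
    "$@" 2>&1
  else
    "$@" 2>&1 | head -n 100
  fi
}

# Usage:
# capped git log --oneline          # first 100 lines only
# capped --full git log --oneline   # deliberate escalation
```

Because the uncapped path requires typing an extra flag, full output becomes a conscious choice instead of the default.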

AGENTS.md template

I added this section to my project’s AGENTS.md to enforce these rules:

# Shell Output Management

## Output Capping Rules

1. **Default cap: 100 lines** for all command outputs
2. **Escalation path: --full flag** required for unlimited output
3. **Filter priority:** errors > warnings > info > debug

## Incremental Narrowing Pattern

Phase 1: Broad discovery

```bash
find . -name "*.ts" | head -20           # File list, capped
```

Phase 2: Targeted search

```bash
rg "pattern" --type ts -l | head -10     # Matching files, capped
```

Phase 3: Precise extraction

```bash
rg "pattern" file.ts -A 3 -B 1           # Context around match
```

## Git Output Patterns

```bash
# Summary first
git diff --stat | head -20

# Then targeted
git diff -- src/auth/ | head -100
```

This ensures the AI follows these rules automatically when working on my project.

Summary

In this post, I showed how shell output came to account for over 90% of tool-output characters in my AI coding sessions, and how to cut its token cost by 60-80%. The key strategies are: cap outputs by default, use the incremental narrowing pattern, filter before capping, and configure hard token limits.

The incremental narrowing pattern is the most important: start with broad scans (file lists), move to focused reads (specific matches with context), then exact slices (precise line numbers). This gives you the information you need while consuming a fraction of the tokens.
