How to Implement Autonomous Iteration Loops in Claude Code for Self-Improving Automation
Problem
I wanted Claude Code to improve my code automatically until a measurable goal was reached. But every time I tried, I ran into the same issues:
- Claude would make multiple changes at once, and I couldn’t tell which one helped
- When something broke, I had no way to automatically revert
- There was no systematic way to track whether metrics were actually improving
- I had to manually verify every change and decide whether to keep or discard it
Here’s what my typical workflow looked like:
1. Ask Claude to improve test coverage
2. Claude makes 10 changes across 5 files
3. I run tests - some pass, some fail
4. I try to figure out which change broke what
5. I manually revert the bad changes
6. I ask Claude to try again
7. Repeat until exhausted
This was slow, error-prone, and required constant manual oversight. I needed a way to make Claude autonomously iterate with built-in verification and automatic rollback.
What I discovered
I found a Reddit discussion about implementing Karpathy’s autoresearch pattern in Claude Code. The key insight was:
“You define a goal, a metric, and a verification command … then Claude loops forever: make one atomic change -> git commit -> verify -> keep if improved, revert if not -> repeat”
This pattern creates a self-improving loop where:
- Every improvement stacks (committed to git)
- Every failure auto-reverts (no manual cleanup)
- Progress is logged in a TSV file
- The loop continues until the goal metric is achieved
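The keep-or-revert rule at the heart of this loop can be sketched as a small shell function. This is a minimal sketch, not the skill's actual logic: the name `decide_action` and the `higher`/`lower` direction argument are assumptions added for illustration, since the quoted description only says "keep if improved".

```shell
#!/bin/sh
# Hypothetical helper: decide whether an iteration is kept or reverted.
# Usage: decide_action OLD NEW VERIFY_STATUS [higher|lower]
decide_action() {
  old=$1 new=$2 status=$3 direction=${4:-higher}

  # A failed verification always means revert, whatever the metric says.
  if [ "$status" -ne 0 ]; then
    echo reverted
    return
  fi

  # awk handles decimal metrics such as 45.2, which `[ -gt ]` cannot.
  if awk -v o="$old" -v n="$new" -v d="$direction" 'BEGIN {
        if (d == "lower") exit !(n < o)
        exit !(n > o)
      }'; then
    echo kept
  else
    echo reverted
  fi
}
```

For a metric where smaller is better (bundle size, response time), pass `lower` as the fourth argument, for example `decide_action 120 98.5 0 lower`.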
How it works
The core concept is building an autonomous iteration loop with three components:
┌─────────────────────────────────────────────────────────────┐
│                  Autonomous Iteration Loop                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐              │
│   │   Make   │    │  Commit  │    │  Verify  │              │
│   │  Change  │───▶│  to Git  │───▶│  Metric  │              │
│   └──────────┘    └──────────┘    └──────────┘              │
│        ▲                               │                    │
│        │                               ▼                    │
│        │                       ┌───────────────┐            │
│        │                       │   Improved?   │            │
│        │                       └───────────────┘            │
│        │                          │         │               │
│        │                      YES │         │ NO            │
│        │                          ▼         ▼               │
│        │                     ┌────────┐ ┌────────┐          │
│        │                     │  Keep  │ │ Revert │          │
│        └─────────────────────┤        │ │        │          │
│                              └────────┘ └────────┘          │
│                                                             │
│   Goal: Test coverage >= 80%                                │
│   Metric: Current coverage percentage                       │
│   Verification: npm test -- --coverage                      │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Component 1: The Goal
Define what success looks like. This must be specific and measurable:
- Test coverage reaches 80%
- Bundle size drops below 100KB
- Lighthouse score exceeds 90
- API response time under 200ms
- SEO score reaches 95
Component 2: The Metric
A single number you can extract programmatically:
# Test coverage (percent)
npm test -- --coverage 2>&1 | grep "All files" | awk '{print $4}'
# Bundle size (KB)
du -k dist/bundle.js | awk '{print $1}'
# Lighthouse performance score (reported as 0-1, scaled here to 0-100)
npx lighthouse https://example.com --output=json | jq '.categories.performance.score * 100'
# API response time (seconds, so a 200ms goal corresponds to 0.2)
curl -w "%{time_total}" -o /dev/null -s https://api.example.com/endpoint
Component 3: The Verification Command
A command that runs tests or checks and exits with success/failure:
# Run tests and check coverage
npm test -- --coverage --coverageThreshold='{"global":{"lines":80}}'
# Build and check bundle size
npm run build && test "$(du -k dist/bundle.js | cut -f1)" -lt 100
# Run lighthouse check
npx lighthouse https://example.com --budget-path=budget.json
Implementing the loop
I created a Claude Code skill that implements this pattern:
# Autonomous Iteration Skill
## Purpose
Improve code iteratively until a goal metric is achieved.
## Parameters
- goal: Description of what success looks like
- metric_command: Shell command that outputs the current metric value
- verify_command: Shell command that exits 0 on success, non-zero on failure
- max_iterations: Maximum iterations before stopping (default: 100)
## Process
1. Read current metric value
2. Make ONE atomic change toward the goal
3. Commit change with descriptive message
4. Run verification command
5. If verification passes and metric improved:
   - Keep the change
   - Log progress to iteration-log.tsv
   - Continue to next iteration
6. If verification fails or the metric did not improve:
   - Revert the last commit
   - Log failure to iteration-log.tsv
   - Try a different approach
7. Repeat until goal is achieved or max_iterations reached
The skill implementation
# Autonomous Iteration Loop
You are an autonomous improvement agent. Your job is to iteratively improve code until a specific goal is achieved.
## Input Parameters
- goal: {{goal}}
- metric_command: {{metric_command}}
- verify_command: {{verify_command}}
- max_iterations: {{max_iterations}}
## Initialization
1. Create a log file: iteration-log.tsv (keep it untracked, e.g. via .gitignore, so that `git add -A` and `git reset --hard` do not commit it or roll it back)
2. Add header: iteration | timestamp | metric | action | commit_sha
3. Run metric_command to get baseline
4. Record baseline as iteration 0
## Main Loop (repeat until goal achieved or max_iterations)
### Step 1: Analyze Current State
Run the metric command and verification:
```bash
current_metric=$( {{metric_command}} )
{{verify_command}}
verify_result=$?
```
### Step 2: Check Goal
If goal is achieved:
- Log success
- Report final results
- STOP
### Step 3: Make One Atomic Change
- Analyze what needs improvement
- Make exactly ONE focused change
- Do NOT make multiple changes at once
### Step 4: Commit Change
```bash
git add -A
git commit -m "iteration: [describe single change]"
current_sha=$(git rev-parse HEAD)
```
### Step 5: Verify
Run verification command:
```bash
{{verify_command}}
```
### Step 6: Evaluate Results
Run metric command again:
```bash
new_metric=$( {{metric_command}} )
```
If verification passed AND new_metric is better than current_metric (greater for coverage or scores, smaller for size or latency):
```bash
# Keep the change
echo "{{iteration}} | $(date -Iseconds) | $new_metric | kept | $current_sha" >> iteration-log.tsv
```
Else:
```bash
# Revert the change
git reset --hard HEAD~1
echo "{{iteration}} | $(date -Iseconds) | $current_metric | reverted | N/A" >> iteration-log.tsv
```
### Step 7: Continue
Increment iteration counter and repeat from Step 1.
## Important Rules
- Always make ONE change per iteration
- Always commit before verifying
- Always revert if verification fails or metric worsens
- Always log each iteration
- Stop when goal is achieved or max_iterations reached
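Outside the skill, steps 4 to 6 can also be dry-run as a plain shell function. The following is a sketch under stated assumptions, not the skill itself: `run_iteration` and `LOG_FILE` are names made up here, `METRIC_CMD` and `VERIFY_CMD` stand in for the skill's parameters, the change is assumed to be in the working tree already, higher metric values are assumed to be better, and the log file is assumed to be untracked so that a revert does not roll it back.

```shell
#!/bin/sh
# Sketch of one commit -> verify -> keep/revert step.
# Assumes METRIC_CMD and VERIFY_CMD are set and LOG_FILE is not tracked by git.
LOG_FILE=${LOG_FILE:-iteration-log.tsv}

run_iteration() {
  iteration=$1 baseline=$2

  # Commit first so that a failure can be undone with a single reset.
  git add -A
  git commit -q -m "iteration $iteration: single atomic change" || return 2
  sha=$(git rev-parse --short HEAD)

  # Only measure when the verification command succeeds.
  if sh -c "$VERIFY_CMD" >/dev/null 2>&1; then
    metric=$(sh -c "$METRIC_CMD")
  else
    metric=$baseline
  fi

  # Keep only a strict improvement (higher is better, as with coverage).
  if awk -v o="$baseline" -v n="$metric" 'BEGIN { exit !(n > o) }'; then
    action=kept
  else
    git reset -q --hard HEAD~1
    action=reverted metric=$baseline sha=N/A
  fi

  # Same pipe-delimited row layout as the skill's log.
  echo "$iteration | $(date -u +%Y-%m-%dT%H:%M:%SZ) | $metric | $action | $sha" >> "$LOG_FILE"
  echo "$action"
}
```

The function prints `kept` or `reverted`, so a surrounding loop can count attempts and stop at `max_iterations`.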
Real-world example
I used this pattern to improve test coverage in a Node.js project:
```bash title="Running the iteration skill"
# Initial state: 45% test coverage
# Goal: 80% test coverage
# Invoke the skill
/iterate \
  goal="Test coverage >= 80%" \
  metric_command="npm test -- --coverage 2>&1 | grep 'All files' | awk '{print $4}' | tr -d '%'" \
  verify_command="npm test" \
  max_iterations=50
```
The skill ran through iterations:
iteration | timestamp | metric | action | commit_sha
0 | 2026-03-15T10:00:00 | 45 | baseline | N/A
1 | 2026-03-15T10:02:15 | 47 | kept | a1b2c3d
2 | 2026-03-15T10:04:30 | 52 | kept | e4f5g6h
3 | 2026-03-15T10:06:45 | 48 | reverted | N/A
4 | 2026-03-15T10:09:00 | 55 | kept | i7j8k9l
5 | 2026-03-15T10:11:15 | 58 | kept | m2n3o4p
...
23 | 2026-03-15T11:45:00 | 81 | kept | x9y0z1a
Each iteration:
- Made one atomic change (adding tests for a single function)
- Committed the change
- Ran verification (all tests must pass)
- Checked metric (coverage percentage)
- Kept or reverted based on results
After 23 iterations, coverage reached 81% and the goal was achieved.
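A log in this layout is easy to summarize after a run. The sketch below assumes the pipe-delimited format with a header row shown in this post; `summarize_log` is a hypothetical helper name, not part of the skill.

```shell
#!/bin/sh
# Summarize a pipe-delimited iteration log: kept/reverted counts plus the
# baseline metric and the metric of the last kept iteration.
summarize_log() {
  awk -F'|' '
    NR == 1 { next }                          # skip the header row
    { gsub(/ /, "", $3); gsub(/ /, "", $4) }  # trim padding around fields
    $4 == "baseline" { first = $3 }
    $4 == "kept"     { kept++; last = $3 }
    $4 == "reverted" { reverted++ }
    END {
      printf "kept=%d reverted=%d start=%s end=%s\n", kept, reverted, first, last
    }
  ' "$1"
}
```

Calling `summarize_log iteration-log.tsv` prints a single line, which gives a quick sense of how many attempts were wasted on reverted changes.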
What can go wrong
Mistake 1: Making multiple changes at once
# WRONG: Multiple changes in one iteration
- Add test for function A
- Refactor function B
- Update config for function C
- Commit all together
If something breaks, you don't know which change caused it.
The fix is to enforce one atomic change per iteration:
# CORRECT: One change per iteration
Iteration 1: Add test for function A -> commit -> verify -> keep/revert
Iteration 2: Add test for function B -> commit -> verify -> keep/revert
Iteration 3: Add test for function C -> commit -> verify -> keep/revert
Mistake 2: Not committing before verification
# WRONG: Verify then commit
- Make change
- Run verification
- If pass, then commit
Problem: If you forget to commit, you lose track of what worked, and a failed change has to be cleaned up by hand instead of with a single revert.
The fix is to always commit first:
# CORRECT: Commit then verify
- Make change
- Commit immediately
- Run verification
- If fail, revert commit
Mistake 3: Metric that doesn’t match the goal
# WRONG: Mismatched metric and goal
Goal: Improve performance
Metric: Lines of code
Adding more code doesn't necessarily improve performance.
The fix is to ensure the metric directly measures the goal:
# CORRECT: Aligned metric and goal
Goal: Improve performance
Metric: API response time in milliseconds
Mistake 4: No maximum iterations
# WRONG: Infinite loop potential
The skill runs until goal is achieved, but what if the goal is impossible?
The fix is to always set a maximum:
# CORRECT: Safety limit
max_iterations=100
Advanced patterns
Pattern 1: Multiple metrics
You can track multiple metrics and only keep changes that improve the primary metric without making any of the others worse:
metrics:
  - coverage: npm test -- --coverage 2>&1 | grep 'All files' | awk '{print $4}'
  - bundle_size: du -k dist/bundle.js | awk '{print $1}'
  - lint_errors: npm run lint 2>&1 | grep -c "error"
keep_if:
  coverage: increased
  bundle_size: decreased_or_same
  lint_errors: decreased_or_same
Pattern 2: Database-driven loops
Combine with MCP servers to create loops that respond to real data:
metric_command: |
  psql -c "SELECT AVG(response_time) FROM api_logs WHERE created_at > NOW() - INTERVAL '1 hour'" -t
goal: "Average API response time < 200ms"
# Each iteration can:
# 1. Query database for slow endpoints
# 2. Optimize one query or add one index
# 3. Measure impact on average response time
# 4. Keep or revert based on results
Pattern 3: Analytics-driven loops
Use analytics APIs to drive improvements:
metric_command: |
  curl -s "https://analytics.example.com/api/lighthouse?url=mysite.com" | jq '.performance'
goal: "Lighthouse performance score >= 90"
# Each iteration can:
# 1. Analyze Lighthouse report
# 2. Make one optimization
# 3. Re-run Lighthouse
# 4. Keep or revert based on score change
Proven use cases
Based on the Reddit discussion, this pattern works well for:
- Test coverage improvement: Add tests one function at a time, measure coverage
- Bundle size optimization: Remove one unused import/dependency at a time, measure size
- Lighthouse score enhancement: Make one performance fix at a time, measure score
- API response time reduction: Optimize one query at a time, measure latency
- SEO score optimization: Fix one issue at a time, measure score
The key insight is that these are all measurable goals where:
- You can make incremental progress
- Each change can be independently verified
- Metrics provide clear feedback on whether a change helped
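For goals of this kind, checking whether the loop can stop comes down to comparing the current metric against a threshold. A minimal sketch, assuming the goal can be expressed as a comparison operator plus a target number; `goal_reached` is a hypothetical helper, not something the skill defines:

```shell
#!/bin/sh
# Usage: goal_reached METRIC OPERATOR TARGET
# Exits 0 when the goal is met, 1 when it is not, 2 on an unknown operator.
goal_reached() {
  awk -v m="$1" -v op="$2" -v t="$3" 'BEGIN {
    if (op == ">=") exit !(m >= t)
    if (op == "<=") exit !(m <= t)
    if (op == ">")  exit !(m > t)
    if (op == "<")  exit !(m < t)
    exit 2
  }'
}
```

For example, `goal_reached 81 ">=" 80` succeeds for the coverage goal above, and `goal_reached 98 "<" 100` succeeds for a bundle size goal measured in KB.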
Summary
In this post, I showed how to implement autonomous iteration loops in Claude Code. The key points are:
- Define a clear, measurable goal
- Create a metric command that outputs a single number
- Create a verification command that passes or fails
- Loop: make one change -> commit -> verify -> keep or revert
- Log each iteration for tracking
This pattern turns Claude Code into an autonomous agent that can systematically improve code without constant manual oversight. Every improvement is preserved in git history, and every failure is automatically rolled back.