How to Implement Autonomous Iteration Loops in Claude Code for Self-Improving Automation

Mar 15, 2026

Problem

I wanted Claude Code to improve my code automatically until a measurable goal was reached. But every time I tried, I ran into the same issues:

Claude would make multiple changes at once, and I couldn’t tell which one helped
When something broke, I had no way to automatically revert
There was no systematic way to track whether metrics were actually improving
I had to manually verify every change and decide whether to keep or discard it

Here’s what my typical workflow looked like:

1. Ask Claude to improve test coverage
2. Claude makes 10 changes across 5 files
3. I run tests - some pass, some fail
4. I try to figure out which change broke what
5. I manually revert the bad changes
6. I ask Claude to try again
7. Repeat until exhausted

This was slow, error-prone, and required constant manual oversight. I needed a way to make Claude autonomously iterate with built-in verification and automatic rollback.

What I discovered

I found a Reddit discussion about implementing Karpathy’s autoresearch pattern in Claude Code. The key insight was:

“You define a goal, a metric, and a verification command … then Claude loops forever: make one atomic change -> git commit -> verify -> keep if improved, revert if not -> repeat”

This pattern creates a self-improving loop where:

Every improvement stacks (committed to git)
Every failure auto-reverts (no manual cleanup)
Progress is logged in a TSV file
The loop continues until the goal metric is achieved

How it works

The core concept is building an autonomous iteration loop with three components:

┌─────────────────────────────────────────────────────────────┐
│                  Autonomous Iteration Loop                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐            │
│   │  Make    │    │  Commit  │    │ Verify   │            │
│   │  Change  │───▶│  to Git  │───▶│  Metric  │            │
│   └──────────┘    └──────────┘    └──────────┘            │
│        ▲                                 │                 │
│        │                                 ▼                 │
│        │                         ┌───────────────┐        │
│        │                         │   Improved?    │        │
│        │                         └───────────────┘        │
│        │                            │         │           │
│        │                       YES  │         │  NO       │
│        │                            ▼         ▼           │
│        │                     ┌────────┐  ┌────────┐       │
│        │                     │  Keep  │  │ Revert │       │
│        └─────────────────────┤        │  │        │       │
│                              └────────┘  └────────┘       │
│                                                             │
│   Goal: Test coverage >= 80%                                │
│   Metric: Current coverage percentage                       │
│   Verification: npm test -- --coverage                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Component 1: The Goal

Define what success looks like. This must be specific and measurable:

- Test coverage reaches 80%
- Bundle size drops below 100KB
- Lighthouse score exceeds 90
- API response time under 200ms
- SEO score reaches 95

Component 2: The Metric

A single number you can extract programmatically:

# Test coverage
npm test -- --coverage 2>&1 | grep "All files" | awk '{print $4}'

# Bundle size
du -k dist/bundle.js | awk '{print $1}'

# Lighthouse score
npx lighthouse https://example.com --output=json | jq '.categories.performance.score'

# API response time
curl -w "%{time_total}" -o /dev/null -s https://api.example.com/endpoint

Component 3: The Verification Command

A command that runs tests or checks and exits with success/failure:

# Run tests and check coverage
npm test -- --coverage --coverageThreshold='{"global":{"lines":80}}'

# Build and check bundle size
npm run build && test $(du -k dist/bundle.js | cut -f1) -lt 100

# Run lighthouse check
npx lighthouse https://example.com --budget-path=budget.json

Implementing the loop

I created a Claude Code skill that implements this pattern:

# Autonomous Iteration Skill

## Purpose
Improve code iteratively until a goal metric is achieved.

## Parameters
- goal: Description of what success looks like
- metric_command: Shell command that outputs the current metric value
- verify_command: Shell command that exits 0 on success, non-zero on failure
- max_iterations: Maximum iterations before stopping (default: 100)

## Process
1. Read current metric value
2. Make ONE atomic change toward the goal
3. Commit change with descriptive message
4. Run verification command
5. If verification passes and metric improved:
   - Keep the change
   - Log progress to iteration-log.tsv
   - Continue to next iteration
6. If verification fails or metric worsened:
   - Revert the last commit
   - Log failure to iteration-log.tsv
   - Try a different approach
7. Repeat until goal is achieved or max_iterations reached

The skill implementation

# Autonomous Iteration Loop

You are an autonomous improvement agent. Your job is to iteratively improve code until a specific goal is achieved.

## Input Parameters
- goal: {{goal}}
- metric_command: {{metric_command}}
- verify_command: {{verify_command}}
- max_iterations: {{max_iterations}}

## Initialization
1. Create a log file: iteration-log.tsv
2. Add header: iteration | timestamp | metric | action | commit_sha
3. Run metric_command to get baseline
4. Record baseline as iteration 0

## Main Loop (repeat until goal achieved or max_iterations)

### Step 1: Analyze Current State
Run the metric command and verification:
```bash
current_metric=$( {{metric_command}} )
{{verify_command}}
verify_result=$?

Step 2: Check Goal

If goal is achieved:

Log success
Report final results
STOP

Step 3: Make One Atomic Change

Analyze what needs improvement
Make exactly ONE focused change
Do NOT make multiple changes at once

Step 4: Commit Change

git add -A
git commit -m "iteration: [describe single change]"
current_sha=$(git rev-parse HEAD)

Step 5: Verify

Run verification command:

{{verify_command}}

Step 6: Evaluate Results

Run metric command again:

new_metric=$( {{metric_command}} )

If verification passed AND new_metric > current_metric:

# Keep the change
echo "{{iteration}} | $(date -Iseconds) | $new_metric | kept | $current_sha" >> iteration-log.tsv

Else:

# Revert the change
git reset --hard HEAD~1
echo "{{iteration}} | $(date -Iseconds) | $current_metric | reverted | N/A" >> iteration-log.tsv

Step 7: Continue

Increment iteration counter and repeat from Step 1.

Important Rules

Always make ONE change per iteration
Always commit before verifying
Always revert if verification fails or metric worsens
Always log each iteration
Stop when goal is achieved or max_iterations reached

## Real-world example

I used this pattern to improve test coverage in a Node.js project:

```bash title="Running the iteration skill"
# Initial state: 45% test coverage
# Goal: 80% test coverage

# Invoke the skill
/iterate \
  goal="Test coverage >= 80%" \
  metric_command="npm test -- --coverage 2>&1 | grep 'All files' | awk '{print $4}' | tr -d '%'" \
  verify_command="npm test" \
  max_iterations=50

The skill ran through iterations:

iteration | timestamp           | metric | action   | commit_sha
0         | 2026-03-15T10:00:00 | 45     | baseline | N/A
1         | 2026-03-15T10:02:15 | 47     | kept     | a1b2c3d
2         | 2026-03-15T10:04:30 | 52     | kept     | e4f5g6h
3         | 2026-03-15T10:06:45 | 48     | reverted| N/A
4         | 2026-03-15T10:09:00 | 55     | kept     | i7j8k9l
5         | 2026-03-15T10:11:15 | 58     | kept     | m2n3o4p
...
23        | 2026-03-15T11:45:00 | 81     | kept     | x9y0z1a

Each iteration:

Made one atomic change (adding tests for a single function)
Committed the change
Ran verification (all tests must pass)
Checked metric (coverage percentage)
Kept or reverted based on results

After 23 iterations, coverage reached 81% and the goal was achieved.

What can go wrong

Mistake 1: Making multiple changes at once

# WRONG: Multiple changes in one iteration
- Add test for function A
- Refactor function B
- Update config for function C
- Commit all together

If something breaks, you don't know which change caused it.

The fix is to enforce one atomic change per iteration:

# CORRECT: One change per iteration
Iteration 1: Add test for function A -> verify -> commit/revert
Iteration 2: Add test for function B -> verify -> commit/revert
Iteration 3: Add test for function C -> verify -> commit/revert

Mistake 2: Not committing before verification

# WRONG: Verify then commit
- Make change
- Run verification
- If pass, then commit

Problem: If you forget to commit, you lose track of what worked.

The fix is to always commit first:

# CORRECT: Commit then verify
- Make change
- Commit immediately
- Run verification
- If fail, revert commit

Mistake 3: Metric that doesn’t match the goal

# WRONG: Mismatched metric and goal
Goal: Improve performance
Metric: Lines of code

Adding more code doesn't necessarily improve performance.

The fix is to ensure metric directly measures the goal:

# CORRECT: Aligned metric and goal
Goal: Improve performance
Metric: API response time in milliseconds

Mistake 4: No maximum iterations

# WRONG: Infinite loop potential
The skill runs until goal is achieved, but what if the goal is impossible?

The fix is to always set a maximum:

# CORRECT: Safety limit
max_iterations=100

Advanced patterns

Pattern 1: Multiple metrics

You can track multiple metrics and only keep changes that improve ALL of them:

metrics:
  - coverage: npm test -- --coverage 2>&1 | grep 'All files' | awk '{print $4}'
  - bundle_size: du -k dist/bundle.js | awk '{print $1}'
  - lint_errors: npm run lint 2>&1 | grep -c "error"

keep_if:
  coverage: increased
  bundle_size: decreased_or_same
  lint_errors: same_or_decreased

Pattern 2: Database-driven loops

Combine with MCP servers to create loops that respond to real data:

metric_command: |
  psql -c "SELECT AVG(response_time) FROM api_logs WHERE created_at > NOW() - INTERVAL '1 hour'" -t

goal: "Average API response time < 200ms"

# Each iteration can:
# 1. Query database for slow endpoints
# 2. Optimize one query or add one index
# 3. Measure impact on average response time
# 4. Keep or revert based on results

Pattern 3: Analytics-driven loops

Use analytics APIs to drive improvements:

metric_command: |
  curl -s "https://analytics.example.com/api/lighthouse?url=mysite.com" | jq '.performance'

goal: "Lighthouse performance score >= 90"

# Each iteration can:
# 1. Analyze Lighthouse report
# 2. Make one optimization
# 3. Re-run Lighthouse
# 4. Keep or revert based on score change

Proven use cases

Based on the Reddit discussion, this pattern works well for:

Test coverage improvement: Add tests one function at a time, measure coverage
Bundle size optimization: Remove one unused import/dependency at a time, measure size
Lighthouse score enhancement: Make one performance fix at a time, measure score
API response time reduction: Optimize one query at a time, measure latency
SEO score optimization: Fix one issue at a time, measure score

The key insight is that these are all measurable goals where:

You can make incremental progress
Each change can be independently verified
Metrics provide clear feedback on whether change helped

Summary

In this post, I showed how to implement autonomous iteration loops in Claude Code. The key points are:

Define a clear, measurable goal
Create a metric command that outputs a single number
Create a verification command that passes or fails
Loop: make one change -> commit -> verify -> keep or revert
Log each iteration for tracking

This pattern turns Claude Code into an autonomous agent that can systematically improve code without constant manual oversight. Every improvement is preserved in git history, and every failure is automatically rolled back.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Reddit: Claude Code skill applying Karpathy's autoresearch

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!