
Build Automated Verification Loops for AI Coding: Near-Zero Cost Error Detection

I was pairing with an AI assistant on a feature implementation. After it finished coding, I ran the tests manually, saw three failures, copied the error messages, pasted them back to the AI, waited for it to fix, ran tests again, and repeated this cycle five times.

That’s when it hit me: I was the bottleneck in my own AI coding workflow.

The Problem

Manual verification with AI coding looks like this:

AI writes code → I run tests → I copy errors → I paste to AI → AI fixes → repeat

Every error requires my manual intervention. The feedback loop is slow. I’m essentially acting as a slow, error-prone messenger between the test runner and the AI.

The worst part? This is considered “normal” in most AI coding workflows.

But here’s the thing: if you need to manually copy-paste errors to the AI, the verification loop isn’t built yet.

First Attempt: Run Tests After Implementation

My first attempt was simple. I asked the AI to run tests immediately after implementing code.

npm test && npm run typecheck

The AI would implement the feature, then run this command and see the failures directly. Because of the &&, the typecheck only runs when the tests pass.

But there was a problem. The test output was enormous:

PASS src/utils/parser.test.ts
PASS src/utils/formatter.test.ts
PASS src/services/api.test.ts
PASS src/components/Button.test.tsx
... (50 more passing tests)
FAIL src/services/user.test.ts
✕ should validate email format
✕ should handle missing fields
PASS src/utils/date.test.ts
PASS src/hooks/useAuth.test.ts
... (30 more passing tests)

The AI had to read past more than 80 passing tests to find the 2 failures. In a codebase with hundreds of tests, the passing tests would consume most of the context window before the AI even saw the failures.

This approach flooded the context with irrelevant information.

Second Attempt: Filter to Failures Only

I needed to show only what mattered: the failures.

npm test 2>&1 | grep -A5 "FAIL\|Error"

This filtered the output to lines containing “FAIL” or “Error”, plus the 5 lines that follow each match.

Better, but not good enough. Some test frameworks report failures differently, and the TypeScript compiler prints its errors with a lowercase “error TS” prefix, which this case-sensitive pattern misses. I needed a more robust solution.
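One way to make the filter less brittle is to match several failure shapes at once. The following is a minimal sketch under stated assumptions: `filter_failures` is a hypothetical helper name, and the patterns assume Jest-style `FAIL` and `✕` lines, TypeScript `error TS` codes, and ESLint-style `line:col  error` rows. Other tools will need other patterns.

```shell
#!/bin/bash
# filter_failures: read combined check output on stdin and print only the
# lines that look like failures, plus up to 3 lines of trailing context.
# The patterns below are assumptions; adjust them to your tools' output.
filter_failures() {
  grep -E -A3 \
    -e '^FAIL ' \
    -e '✕' \
    -e 'error TS[0-9]+' \
    -e '^[[:space:]]*[0-9]+:[0-9]+[[:space:]]+error' \
    | grep -v -E '^(PASS |--$)' \
    || true   # no match is not an error for the caller
}
```

It could be used as `npm test 2>&1 | filter_failures`. The second `grep` drops passing lines and group separators that the context window would otherwise pull back in.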

Third Attempt: Structured Verification Script

I wrote a verification script that ran tests, typecheck, and lint, then filtered the combined output:

verify.sh
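#!/bin/bash
# Run all checks, capturing the output and exit status of each
test_output=$(npm test 2>&1); test_status=$?
typecheck_output=$(npm run typecheck 2>&1); typecheck_status=$?
lint_output=$(npm run lint 2>&1); lint_status=$?
# Trust exit codes for pass/fail; a word like "error" can appear in passing output
if [ "$test_status" -eq 0 ] && [ "$typecheck_status" -eq 0 ] && [ "$lint_status" -eq 0 ]; then
  echo "All checks passed"
  exit 0
fi
# Extract failure lines with surrounding context
# (printf avoids echo -e mangling backslashes in the captured output)
failures=$(printf '%s\n' "$test_output" "$typecheck_output" "$lint_output" | grep -B2 -A10 "FAIL\|error\|Error")
echo "VERIFICATION FAILED:"
echo "$failures"
exit 1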

Now the AI could run ./verify.sh and see only the failures. This was better, but still required the AI to manually run the script.

I was still in the loop. The AI wasn’t automatically seeing the failures after each edit.
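Exit codes are a sturdier failure signal than text patterns alone. As a minimal sketch (the helper names `run_check` and `verify_all` are hypothetical, and the npm script names are the ones used above), each check can stay silent on success and print its labeled output only when it fails:

```shell
#!/bin/bash
# run_check NAME CMD...: run a command, print nothing on success, and print
# its labeled output only when it exits with a non-zero status.
run_check() {
  local name=$1; shift
  local output status
  output=$("$@" 2>&1)
  status=$?
  if [ "$status" -ne 0 ]; then
    printf '== %s failed (exit %d) ==\n%s\n' "$name" "$status" "$output"
  fi
  return "$status"
}

# verify_all: run every check, even when an earlier one has failed.
verify_all() {
  local failed=0
  run_check "tests" npm test || failed=1
  run_check "typecheck" npm run typecheck || failed=1
  run_check "lint" npm run lint || failed=1
  [ "$failed" -eq 0 ] && echo "All checks passed"
  return "$failed"
}
```

Calling `verify_all` at the end of a script gives one pass/fail status for the whole run. The failure output can still be piped through a filter if a single check is very verbose.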

Fourth Attempt: Hook-Based Verification

What if verification happened automatically after every file edit?

I configured a post-tool-use hook:

{
"hooks": {
"PostToolUse": [
{
"matcher": "Edit|Write",
"hooks": [
{
"type": "command",
"command": "./verify.sh"
}
]
}
]
}
}

Now every time the AI edited a file, verification ran automatically. The hook filtered the output and only showed failures.
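How the hook's output reaches the AI depends on the tool's hook contract. The config shape above resembles Claude Code's hook settings. If that is the tool in use, its hook reference describes exit code 2 as the status that feeds a PostToolUse hook's stderr back to the model, while other non-zero codes are treated as non-blocking errors. The sketch below is an assumption-laden wrapper, not a confirmed part of this setup: `run_for_agent` and `HOOK_FAIL_CODE` are hypothetical names.

```shell
#!/bin/bash
# run_for_agent CMD...: run a verification command. On failure, replay its
# output on stderr and return HOOK_FAIL_CODE (default 2).
# Both the stderr channel and the default code are assumptions about the
# agent tool's hook contract; check its documentation before relying on them.
run_for_agent() {
  local output
  if output=$("$@" 2>&1); then
    return 0
  fi
  printf '%s\n' "$output" >&2
  return "${HOOK_FAIL_CODE:-2}"
}
```

A wrapper script that ends with `run_for_agent ./verify.sh` could then be used as the hook command in place of calling `./verify.sh` directly.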

But there was a new problem: noise.

Every edit triggered verification. If the AI made 10 small edits to fix one issue, the hook ran 10 times, and the first 9 runs showed largely the same failures. This wasted time and context.
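One way to cut the repeated output is to remember what the previous run reported and print the failures only when they change. This is a minimal sketch, assuming a writable state file. `print_if_changed` is a hypothetical helper, not part of any hook API.

```shell
#!/bin/bash
# print_if_changed STATE_FILE: read failure text on stdin. Print it in full
# only when it differs from what the previous run stored in STATE_FILE;
# otherwise print a one-line reminder. Empty input prints nothing.
print_if_changed() {
  local state_file=$1
  local current previous=""
  current=$(cat)
  [ -f "$state_file" ] && previous=$(cat "$state_file")
  printf '%s' "$current" > "$state_file"
  if [ -z "$current" ]; then
    return 0
  fi
  if [ "$current" = "$previous" ]; then
    echo "(same failures as previous run)"
  else
    printf '%s\n' "$current"
  fi
}
```

Usage could look like `echo "$failures" | print_if_changed .verify-last`, where `.verify-last` is an arbitrary file name chosen for this example.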

Fifth Attempt: Intelligent Filtering

I added a debounce mechanism and smart filtering:

verify-smart.sh
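#!/bin/bash
# Debounce: record this run, wait 2 seconds, and skip if a newer run has started
stamp="${TMPDIR:-/tmp}/verify-smart.stamp"
token="$$-$RANDOM"
echo "$token" > "$stamp"
sleep 2
[ "$(cat "$stamp")" = "$token" ] || exit 0
# Run verification (--reporter=verbose assumes a runner that accepts it, such as Vitest)
output=$(npm test -- --reporter=verbose 2>&1; npm run typecheck 2>&1)
# Filter to failures, max 50 lines
failures=$(echo "$output" | grep -B2 -A10 "FAIL\|error TS" | head -50)
if [ -n "$failures" ]; then
  echo "$failures"
  exit 1
fi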
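A plain `head -50` cuts the output silently, so the reader cannot tell whether more failures exist. A small variation, sketched here with a hypothetical `cap_lines` helper, appends a note with the number of dropped lines:

```shell
#!/bin/bash
# cap_lines MAX: print at most MAX lines from stdin, then a one-line note
# saying how many lines were dropped, so truncation is visible to the reader.
cap_lines() {
  local max=$1
  awk -v max="$max" '
    NR <= max { print }
    END { if (NR > max) printf "... (%d more lines truncated)\n", NR - max }
  '
}
```

In the script above it would replace `head -50` with `cap_lines 50`.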

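The hook configuration changed to call the new script and set a timeout (check which unit your tool expects for this value). The matcher still selects tools by name (Edit, Write), so any restriction to relevant file types has to live in the script itself: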

{
"hooks": {
"PostToolUse": [
{
"matcher": "Edit|Write",
"hooks": [
{
"type": "command",
"command": "./verify-smart.sh",
"timeout": 30000
}
]
}
]
}
}
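For the file-type restriction, a minimal sketch could look like the following. It assumes the edited file's path is available to the script. How that path arrives is tool-specific and not shown here. `is_relevant_file` and the extension list are illustrative choices for a TypeScript project.

```shell
#!/bin/bash
# is_relevant_file PATH: succeed only for files whose edits should trigger
# verification. The extension list is an assumption for a TypeScript project.
is_relevant_file() {
  case "$1" in
    *.ts|*.tsx|*.js|*.jsx) return 0 ;;
    *) return 1 ;;
  esac
}
```

Near the top of the verification script, a line such as `is_relevant_file "$edited_path" || exit 0` would skip the run for documentation or config edits, where `edited_path` is a hypothetical variable holding the path.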

This was close to what I wanted, but still not fully automated.

The Solution: AI Writes Tests During Implementation

The key insight: tests shouldn’t be an afterthought.

Instead of:

  1. AI implements feature
  2. Run tests (failures appear)
  3. AI fixes

I changed my workflow to:

  1. AI implements feature AND writes tests together
  2. Verification runs automatically
  3. AI sees failures and fixes

┌─────────────────────────────────────────────────────────┐
│                     AI CODING LOOP                      │
│                                                         │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐          │
│   │  Write   │───▶│  Verify  │───▶│   Fix    │          │
│   │ Code+Test│    │   Auto   │    │ Failures │          │
│   └──────────┘    └──────────┘    └──────────┘          │
│        ▲                                │               │
│        └────────────────────────────────┘               │
│                                                         │
│   Human input: only the initial task brief              │
└─────────────────────────────────────────────────────────┘

I started writing task briefs with explicit verification sections:

## Task: Implement user authentication
### Implementation
- Create AuthService class with login/logout methods
- Add session management with JWT tokens
- Implement password hashing with bcrypt
### Tests (write alongside implementation)
- Unit tests for AuthService methods
- Test invalid credentials handling
- Test token expiration
### Verification
- Run: `npm test`
- Typecheck: `npm run typecheck`
- Filter: Show only failures
- AI should see failures directly, iterate to fix

The AI now writes tests as part of the implementation, not after. Verification runs automatically. Failures are filtered. The loop is closed.
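The “tests alongside implementation” rule can itself be checked mechanically. The sketch below is one hedged way to do it: `missing_tests` is a hypothetical helper, and the sibling `name.test.ext` convention is an assumption based on the file names shown earlier in this article.

```shell
#!/bin/bash
# missing_tests FILE...: for each source file, print its path when no sibling
# test file (name.test.ext) exists next to it. Test files are skipped.
missing_tests() {
  local file base ext
  for file in "$@"; do
    case "$file" in
      *.test.*) continue ;;  # already a test file
    esac
    base=${file%.*}
    ext=${file##*.}
    [ -f "$base.test.$ext" ] || echo "$file"
  done
}
```

Run against the files touched in a task, any output points at an implementation file that was added without a test.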

Why This Matters

Near-zero verification cost means you can iterate rapidly.

Before:

  • Implement feature: 2 minutes
  • Run tests manually: 30 seconds
  • Copy-paste errors: 1 minute
  • AI fixes: 1 minute
  • Repeat the last three steps 3-5 times: roughly 10-15 minutes total

After:

  • Implement feature with tests: 2 minutes
  • Verification runs automatically: 0 seconds of my time
  • AI sees filtered failures immediately: 0 seconds of my time
  • AI fixes: 1 minute
  • Total: about 3 minutes when one fix is enough

The gain isn’t just time. It’s also lower cognitive load. You don’t need to context-switch between coding and verification. The AI handles both.

Common Mistakes

I’ve made all of these:

Mistake 1: Running tests manually after AI finishes

This breaks the flow. The AI should run tests, not you.

Mistake 2: Dumping entire test output into context

100 passing tests consume tokens and hide the 2 failures that matter.

Mistake 3: Tests as afterthought

If tests are written after implementation, the verification loop starts too late.

Mistake 4: Skipping type checking

TypeScript catches issues tests miss. Always include typecheck in verification.

The Litmus Test

Ask yourself: “Do I still copy-paste errors to the AI by hand?”

If the answer is yes, the verification loop isn’t built. It is finished only when the AI sees failures directly and fixes them without human mediation.

This pattern applies beyond testing:

  • Linting: Auto-fix style issues without human review
  • Type checking: Fix type errors before they become runtime errors
  • Security scanning: Catch vulnerabilities during development, not in production
  • Performance profiling: Detect regressions during feature work

The principle is the same: automate the feedback loop, remove the human messenger.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
