Build Automated Verification Loops for AI Coding: Near-Zero Cost Error Detection
I was pairing with an AI assistant on a feature implementation. After it finished coding, I ran the tests manually, saw three failures, copied the error messages, pasted them back to the AI, waited for it to fix them, ran the tests again, and repeated this cycle five times.
That’s when it hit me: I was the bottleneck in my own AI coding workflow.
The Problem
Manual verification with AI coding looks like this:
```text
AI writes code → I run tests → I copy errors → I paste to AI → AI fixes → repeat
```
Every error requires my manual intervention, so the feedback loop can only move as fast as I do. I'm essentially acting as a slow, error-prone messenger between the test runner and the AI.
The worst part? This is considered “normal” in most AI coding workflows.
But here’s the thing: if you need to manually copy-paste errors to the AI, the verification loop isn’t built yet.
First Attempt: Run Tests After Implementation
My first attempt was simple. I asked the AI to run tests immediately after implementing code.
```bash
npm test && npm run typecheck
```
The AI would implement the feature, then run this command and see the failures directly.
But there was a problem. The test output was enormous:
```text
PASS src/utils/parser.test.ts
PASS src/utils/formatter.test.ts
PASS src/services/api.test.ts
PASS src/components/Button.test.tsx
... (50 more passing tests)
FAIL src/services/user.test.ts
  ✕ should validate email format
  ✕ should handle missing fields
PASS src/utils/date.test.ts
PASS src/hooks/useAuth.test.ts
... (30 more passing tests)
```
The AI had to scroll through 80 passing tests to find the 2 failures. In a codebase with hundreds of tests, the passing tests would consume most of the context window before the AI even saw the failures.
This approach flooded the context with irrelevant information.
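To put a rough number on that cost, you can estimate how many tokens a run's output would take up. This sketch uses the common rule of thumb of about four characters per token for English text; real tokenizers vary, and `estimate_tokens` is a hypothetical helper, not part of any tool:

```bash
#!/usr/bin/env bash
# Sketch: rough token estimate for whatever is piped in.
# Assumption: ~4 bytes per token, a rule of thumb; real tokenizers differ.

estimate_tokens() {
  local chars
  chars=$(wc -c)              # byte count of stdin (may include leading spaces on macOS)
  echo $(( chars / 4 ))       # arithmetic expansion ignores the whitespace
}

# Example:
#   npm test 2>&1 | estimate_tokens
#   npm test 2>&1 | grep -A5 FAIL | estimate_tokens
```

Comparing the two example pipelines gives a quick sense of how much context a failure filter saves on your own test suite.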
Second Attempt: Filter to Failures Only
I needed to show only what mattered: the failures.
```bash
npm test 2>&1 | grep -A5 "FAIL\|Error"
```
This filtered the output down to lines containing "FAIL" or "Error", plus the 5 lines after each match.
Better, but not good enough. Some test frameworks format failures differently, and TypeScript prefixes its diagnostics with a lowercase `error TS<code>:`, which this case-sensitive pattern never matches. I needed a more robust solution.
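A sketch of a slightly sturdier filter, assuming Jest- or Vitest-style `FAIL <file>` headers and `tsc` diagnostics such as `error TS2322:`. `FAIL_PATTERN` and `filter_failures` are illustrative names. Using `grep -E` makes the alternation explicit instead of relying on `\|`, which is a GNU extension to basic regular expressions:

```bash
#!/usr/bin/env bash
# Sketch: keep only failure-related lines from combined test + typecheck output.
# Assumption: Jest/Vitest "FAIL <file>" headers and tsc "error TS<code>" lines;
# extend FAIL_PATTERN for other tools.

FAIL_PATTERN='(^|[[:space:]])FAIL[[:space:]]|error TS[0-9]+'

filter_failures() {
  # -E: extended regex, so "|" is alternation without backslashes
  # -A5: also print the 5 lines after each match (usually the assertion details)
  grep -E -A5 -- "$FAIL_PATTERN"
}

# Example: { npm test; npm run typecheck; } 2>&1 | filter_failures
```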
Third Attempt: Structured Verification Script
I wrote a verification script that ran tests, typecheck, and lint, then filtered the combined output:
```bash
#!/bin/bash

# Run all checks, capturing stdout and stderr
test_output=$(npm test 2>&1)
typecheck_output=$(npm run typecheck 2>&1)
lint_output=$(npm run lint 2>&1)

# Extract failures, with 2 lines of context before and 10 after each match.
# printf (rather than echo -e) keeps backslashes in tool output literal.
failures=$(printf '%s\n%s\n%s\n' "$test_output" "$typecheck_output" "$lint_output" \
  | grep -B2 -A10 "FAIL\|error\|Error")

if [ -n "$failures" ]; then
  echo "VERIFICATION FAILED:"
  echo "$failures"
  exit 1
fi

echo "All checks passed"
```
Now the AI could run `./verify.sh` and see only the failures. This was better, but it still depended on the AI remembering, or being told, to run the script.
I was still in the loop. The AI wasn’t automatically seeing the failures after each edit.
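One weakness of grepping for `error` is that it also matches harmless text (an ESLint summary like `0 errors, 3 warnings`, or a test whose name contains "error") and misses tools whose failures don't use the word. A sketch of an alternative that trusts each command's exit status instead, prints output only for failing checks, and caps it at `MAX_LINES` with a note about what was cut. `run_check`, `verify_all`, and the 40-line default are illustrative assumptions:

```bash
#!/usr/bin/env bash
# Sketch: judge each check by its exit status instead of grepping its text.
# Passing checks print nothing; failing ones print at most MAX_LINES lines.

MAX_LINES=${MAX_LINES:-40}

run_check() {
  local name=$1
  shift
  local output status total
  output=$("$@" 2>&1)
  status=$?
  [ "$status" -eq 0 ] && return 0          # passing checks stay silent
  echo "=== $name failed (exit $status) ==="
  printf '%s\n' "$output" | head -n "$MAX_LINES"
  total=$(( $(printf '%s\n' "$output" | wc -l) ))
  if [ "$total" -gt "$MAX_LINES" ]; then
    echo "... $(( total - MAX_LINES )) more lines omitted"
  fi
  return "$status"
}

verify_all() {
  local failed=0
  # Run every check even when an earlier one fails, so one run reports everything.
  run_check "tests" npm test || failed=1
  run_check "typecheck" npm run typecheck || failed=1
  run_check "lint" npm run lint || failed=1
  [ "$failed" -eq 0 ] && echo "All checks passed"
  return "$failed"
}
```

`verify_all` keeps going after a failing check, which also sidesteps the `&&` short-circuit in the first attempt, where a test failure meant the typecheck never ran.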
Fourth Attempt: Hook-Based Verification
What if verification happened automatically after every file edit?
I configured a post-tool-use hook:
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "./verify.sh" }
        ]
      }
    ]
  }
}
```
Now every time the AI edited a file, verification ran automatically. The hook filtered the output and only showed failures.
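One detail worth checking: this config follows Claude Code's settings format, and assuming Claude Code is the hook runner, its documented convention is that a `PostToolUse` hook's stderr is fed back to the model only when the hook exits with code 2; other non-zero codes show stderr to the user, and stdout is not sent to the model. `verify.sh` as written prints to stdout and exits 1. A small wrapper, sketched under that assumption (`as_hook_feedback` is a hypothetical name), adapts any check to the convention:

```bash
#!/usr/bin/env bash
# Sketch: adapt a verification command to Claude Code's hook exit-code convention.
# Assumption: for PostToolUse hooks, exit code 2 sends stderr to the model.

as_hook_feedback() {
  local output status
  output=$("$@" 2>&1)
  status=$?
  if [ "$status" -ne 0 ]; then
    printf '%s\n' "$output" >&2   # stderr is what the model sees on exit 2
    return 2
  fi
  return 0                        # success: stay quiet so nothing enters the context
}

# In a hook script:
#   as_hook_feedback ./verify.sh
#   exit $?
```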
But there was a new problem: noise.
Every edit triggered verification. If the AI made 10 small edits to fix one issue, the hook ran 10 times, and the first 9 runs showed the same failures. This wasted time and context.
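One way to cut the repeated output is to fingerprint the filtered failures and, when a run produces exactly the same text as the previous one, print a one-line note instead. A sketch, assuming a state file under `$TMPDIR` (`STATE_FILE` and `report_failures` are hypothetical names); it uses `cksum` because that utility is POSIX and available on both Linux and macOS:

```bash
#!/usr/bin/env bash
# Sketch: don't repeat identical failure output between consecutive hook runs.
# Assumption: a single state file per project is enough (no concurrent sessions).

STATE_FILE=${STATE_FILE:-${TMPDIR:-/tmp}/verify-last-failures.cksum}

report_failures() {
  local failures=$1 current previous="" lines
  if [ -z "$failures" ]; then
    rm -f "$STATE_FILE"                          # clean run: forget the last failure set
    return 0
  fi
  current=$(printf '%s' "$failures" | cksum)     # fingerprint of the failure text
  if [ -f "$STATE_FILE" ]; then
    previous=$(cat "$STATE_FILE")
  fi
  printf '%s\n' "$current" > "$STATE_FILE"
  if [ "$current" = "$previous" ]; then
    lines=$(( $(printf '%s\n' "$failures" | wc -l) ))
    echo "Unchanged since the previous check: ${lines}-line failure output omitted."
  else
    printf '%s\n' "$failures"
  fi
  return 1
}
```

This only helps if the filtered text is stable between runs: timings, temp paths, or random ports in the output change the fingerprint every time, so they would need to be stripped first.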
Fifth Attempt: Intelligent Filtering
I added a debounce mechanism and smart filtering:
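```bash
#!/bin/bash
# Debounce: stamp this run, wait 2 seconds, and exit quietly if a newer edit
# re-stamped the file meanwhile (this only helps if hook runs can overlap)
stamp_file="${TMPDIR:-/tmp}/verify-smart.stamp"
my_stamp="$$-$(date +%s)"
echo "$my_stamp" > "$stamp_file"
sleep 2
[ "$(cat "$stamp_file")" = "$my_stamp" ] || exit 0

# Run verification (--reporter=verbose is a Vitest option; adjust for your runner)
output=$(npm test -- --reporter=verbose 2>&1; npm run typecheck 2>&1)

# Filter to failures, max 50 lines
failures=$(echo "$output" | grep -B2 -A10 "FAIL\|error TS" | head -n 50)

if [ -n "$failures" ]; then
  # Assuming Claude Code runs this hook: exit code 2 sends stderr to the model
  echo "$failures" >&2
  exit 2
fi
```
The hook configuration also changed to point at the new script and set a timeout: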
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "./verify-smart.sh",
            "timeout": 30
          }
        ]
      }
    ]
  }
}
```
This was close to what I wanted, but still not fully automated.
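In this config format the `matcher` is compared against the tool name (`Edit`, `Write`), not the path of the edited file, so restricting verification to relevant file types has to happen inside the script. Assuming the hook runner is Claude Code, whose hooks receive the tool call as JSON on stdin with the edited path under `tool_input.file_path`, and that `jq` is installed, the check might look like this; `should_verify` and its list of file patterns are illustrative assumptions:

```bash
#!/usr/bin/env bash
# Sketch: skip verification for edits that cannot affect tests or types.
# Assumption: the file patterns below suit a TypeScript/JavaScript project.

should_verify() {
  case "$1" in
    *.ts | *.tsx | *.js | *.jsx | package.json | */package.json | tsconfig*.json | */tsconfig*.json)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# At the top of the hook script (reads the hook payload from stdin):
#   file_path=$(jq -r '.tool_input.file_path // empty')
#   should_verify "$file_path" || exit 0   # irrelevant file: stay silent
```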
The Solution: AI Writes Tests During Implementation
The key insight: tests shouldn’t be an afterthought.
Instead of:
- AI implements feature
- Run tests (failures appear)
- AI fixes
I changed my workflow to:
- AI implements feature AND writes tests together
- Verification runs automatically
- AI sees failures and fixes
```text
┌─────────────────────────────────────────────────────────┐
│                     AI CODING LOOP                      │
│                                                         │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐          │
│   │  Write   │───▶│  Verify  │───▶│   Fix    │          │
│   │ Code+Test│    │   Auto   │    │ Failures │          │
│   └──────────┘    └──────────┘    └──────────┘          │
│        ▲                               │                │
│        └───────────────────────────────┘                │
│                                                         │
│   Human input: only the initial task brief              │
└─────────────────────────────────────────────────────────┘
```
I started writing task briefs with explicit verification sections:
```markdown
## Task: Implement user authentication

### Implementation
- Create AuthService class with login/logout methods
- Add session management with JWT tokens
- Implement password hashing with bcrypt

### Tests (write alongside implementation)
- Unit tests for AuthService methods
- Test invalid credentials handling
- Test token expiration

### Verification
- Run: `npm test`
- Typecheck: `npm run typecheck`
- Filter: Show only failures
- AI should see failures directly, iterate to fix
```
The AI now writes tests as part of the implementation, not after. Verification runs automatically. Failures are filtered. The loop is closed.
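If you want verification to notice when an implementation arrives without its tests, one option is a check over the changed files. A sketch, assuming tests sit next to their sources as `name.test.ts` or `name.test.tsx` (the convention visible in the test output earlier in this article); `missing_tests` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch: list changed TypeScript source files that have no sibling test file.
# Assumption: tests live next to sources as name.test.ts / name.test.tsx.

missing_tests() {
  local src base ext
  for src in "$@"; do
    case "$src" in
      *.test.ts | *.test.tsx | *.d.ts) continue ;;  # tests and declarations need no test
      *.ts | *.tsx) ;;
      *) continue ;;                                # not TypeScript source
    esac
    ext=${src##*.}                                  # "ts" or "tsx"
    base=${src%.*}                                  # path without the extension
    [ -f "$base.test.$ext" ] || printf '%s\n' "$src"
  done
}

# Example (paths relative to the repo root):
#   missing_tests $(git diff --name-only HEAD)
```

A non-empty result could be reported the same way as a test failure, so the AI is told which files still need tests rather than a human noticing later.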
Why This Matters
Near-zero verification cost means you can iterate rapidly.
Before:
- Implement feature: 2 minutes
- Run tests manually: 30 seconds
- Copy-paste errors: 1 minute
- AI fixes: 1 minute
- Repeat 3-5 times: 5-10 minutes total
After:
- Implement feature with tests: 2 minutes
- Verification runs automatically: 0 seconds of my time
- AI sees filtered failures immediately: 0 seconds
- AI fixes: 1 minute
- Total: 3 minutes
The speedup isn’t just time. It’s cognitive load. You don’t need to context-switch between coding and verification. The AI handles both.
Common Mistakes
I’ve made all of these:
Mistake 1: Running tests manually after AI finishes
This breaks the flow. The AI should run tests, not you.
Mistake 2: Dumping entire test output into context
100 passing tests consume tokens and hide the 2 failures that matter.
Mistake 3: Tests as afterthought
If tests are written after implementation, the verification loop starts too late.
Mistake 4: Skipping type checking
TypeScript catches issues tests miss. Always include typecheck in verification.
The Litmus Test
Ask yourself: “If I need to manually copy-paste errors to the AI, is the verification loop built?”
No. The loop isn’t finished until the AI sees failures directly and fixes them without human mediation.
Related Knowledge
This pattern applies beyond testing:
- Linting: Auto-fix style issues without human review
- Type checking: Fix type errors before they become runtime errors
- Security scanning: Catch vulnerabilities during development, not in production
- Performance profiling: Detect regressions during feature work
The principle is the same: automate the feedback loop, remove the human messenger.