
How to Build Self-Verifying AI Coding Workflows That Catch Their Own Errors

I kept hitting the same problem in my AI coding workflow: the model would generate code, I’d review it, catch an error, ask it to fix the error, and repeat. Being the human verification bottleneck was exhausting.

Then I saw a comment on r/ClaudeAI that clicked: “Not ‘trust but verify.’ Not ‘review the output.’ Build verification into the process itself so bad output can’t ship.”

The Old Workflow: Human as Quality Gate

My old workflow looked like this:

Manual review bottleneck
1. AI generates code
2. Human reviews output
3. Human catches errors
4. Human asks AI to fix
5. Repeat until correct

The problems were obvious in hindsight:

  • I became the verification bottleneck
  • Errors slipped through when I got tired
  • As context grew, my attention degraded

The worst part: the model was perfectly capable of seeing its own errors. It just needed to be shown them before I saw the output.

The Shift: System Verification, Not Model Verification

The key distinction that changed everything:

Wrong vs. Right approach
WRONG: Ask the model to verify its own output
  "Please check your code for errors"
  Result: Model hallucinates that it checked

RIGHT: System verifies, model sees result
  Code runs typecheck
  If fail: model sees error, regenerates
  If pass: human sees output

The model doesn’t verify itself. The system verifies. The model responds to verification results.

A Concrete Example: ASCII Diagram Verification

I saw this pattern play out with ASCII diagram generation. Models struggle with ASCII art alignment because they generate text left-to-right without spatial awareness.

The old approach:

Manual verification loop
Model generates diagram
→ Human: "The corners don't connect"
→ Model regenerates
→ Human: "Still broken on the right side"
→ Model regenerates
→ Human finally approves

The self-verifying approach:

Self-verifying diagram generation
def generate_diagram(spec):
    for attempt in range(MAX_ATTEMPTS):
        grid_output = grid_engine.generate(spec)
        verification_result = verifier.check(grid_output)
        if verification_result.passed:
            return grid_output.render()
        # Model sees failure, adjusts, tries again
        spec = adjust_from_errors(spec, verification_result.errors)
    raise VerificationFailed()
# User only sees successfully verified output

The verifier checks: do corners connect? Are columns aligned? The model never “verifies” anything. It responds to verification failures.
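To make the two checks concrete, here is a minimal sketch of what such a verifier might look like for a single rectangular box drawn with `+`, `-`, and `|`. The function name `check_box` and the error messages are hypothetical, and a real verifier would need to handle nested boxes, arrows, and labels.

```python
def check_box(diagram: str) -> list[str]:
    """Return the problems found in a single ASCII box; an empty list means it passed."""
    lines = diagram.splitlines()
    errors = []
    if len(lines) < 2:
        return ["diagram needs at least a top and a bottom border"]

    # Column alignment: every row must be exactly as wide as the top border.
    width = len(lines[0])
    for row, line in enumerate(lines):
        if len(line) != width:
            errors.append(f"row {row}: width {len(line)}, expected {width}")

    # Corners: the top and bottom borders must start and end with '+'.
    for row in (0, len(lines) - 1):
        line = lines[row]
        if not (line.startswith("+") and line.endswith("+")):
            errors.append(f"row {row}: border corners do not connect")

    # Sides: every interior row must start and end with '|'.
    for row in range(1, len(lines) - 1):
        line = lines[row]
        if not (line.startswith("|") and line.endswith("|")):
            errors.append(f"row {row}: side walls missing")

    return errors


good = "+----+\n| ok |\n+----+"
bad = "+----+\n| ok  |\n+---+"
print(check_box(good))  # []
print(check_box(bad))   # two width errors, for rows 1 and 2
```

The error strings are what the model would see on a failed attempt, so they name the row and the expected width rather than just saying "broken".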

The Acceptance Criteria Pattern

I now define tasks with explicit acceptance criteria that the system checks:

Task with acceptance criteria
task: "Implement user authentication"
acceptance_criteria:
- tests_pass: true
- typecheck_clean: true
- no_console_logs: true
- coverage_minimum: 80

The system runs these checks. The model cannot mark the task complete until they pass.
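One way the system side of this could work is to map each criterion to a command and treat a non-zero exit code as a failure. The sketch below assumes that mapping; the names `CheckResult`, `run_check`, and `unmet_criteria` are hypothetical, and the demo uses stand-in commands so it runs without a real project.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str  # what the model sees when the check fails


def run_check(name: str, command: list[str]) -> CheckResult:
    """Run one command; exit code 0 counts as a pass."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr)


def unmet_criteria(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run every check and return only the failures."""
    results = [run_check(name, command) for name, command in checks.items()]
    return [result for result in results if not result.passed]


# In a real project the mapping might look like (assumption, adjust to your setup):
#   {"tests_pass": ["npm", "test"], "typecheck_clean": ["npx", "tsc", "--noEmit"]}
# Stand-in commands are used here so the sketch runs anywhere.
demo_checks = {
    "tests_pass": [sys.executable, "-c", "print('all tests passed')"],
    "typecheck_clean": [
        sys.executable,
        "-c",
        "import sys; print('simulated type error'); sys.exit(1)",
    ],
}

failures = unmet_criteria(demo_checks)
task_complete = not failures  # the model's own claim plays no part in this
for failure in failures:
    print(f"{failure.name} failed:\n{failure.output}")
```

The point of the design is that `task_complete` is derived only from exit codes, so there is nothing for the model to assert.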

This shifts the failure mode from “hallucination” to “missing files”:

Failure mode shift
Before: Model claims it wrote tests (hallucination)
        → Human discovers missing tests later
After:  System checks: do test files exist?
        → Model sees "tests not found", generates them

Missing files are trivial to debug. Hallucinations are not.

Why This Works

The insight from the Reddit thread:

“Once you start checking programmatically, the failure mode shifts from hallucination to missing files. Way easier to debug.”

Rules degrade as context grows. Infrastructure doesn’t.

A CLAUDE.md rule that says “always run tests after editing” works until the context fills up and the model forgets. A script that runs tests after every edit works forever.

Implementation Patterns I Use

Post-Edit Typechecks

Run typecheck on the file you just touched, not the whole project:

Targeted typecheck
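# After editing src/auth/login.ts
# Note: when files are passed on the command line, tsc ignores tsconfig.json,
# so pass any compiler options the project needs explicitly.
tsc --noEmit src/auth/login.ts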
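A post-edit hook around that command could be sketched as follows. This assumes `tsc` is reachable through `npx`; the names `typecheck_command` and `on_file_edited` are hypothetical, and the `run` parameter exists only so the hook can be exercised without a TypeScript toolchain.

```python
import subprocess
from pathlib import Path
from typing import Optional


def typecheck_command(path: str) -> list[str]:
    # Assumes tsc is installed in the project and reachable through npx.
    return ["npx", "tsc", "--noEmit", path]


def on_file_edited(path: str, run=subprocess.run) -> Optional[str]:
    """Return compiler output to show the model, or None if there is nothing to fix."""
    if Path(path).suffix not in {".ts", ".tsx"}:
        return None  # not a TypeScript file, nothing to typecheck
    proc = run(typecheck_command(path), capture_output=True, text=True)
    if proc.returncode == 0:
        return None
    return proc.stdout + proc.stderr
```

Calling `on_file_edited("src/auth/login.ts")` after each edit gives either `None` or the raw compiler output, which can be handed back to the model unchanged.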

Pre-Task-Completion Test Suite

Before the model can claim a task is done, the test suite runs:

Task completion flow
1. Model claims task complete
2. System runs: npm test
3. If fail: Model sees failures, continues
4. If pass: Task marked complete
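The four steps above can be sketched as a small gate function. Everything here is an assumption for illustration: `complete_task`, the retry budget, and the stand-in model and test runner, which replace a real agent and a real `npm test` call.

```python
from typing import Callable

MAX_ROUNDS = 3  # assumed retry budget


def complete_task(
    work: Callable[[str], None],
    run_tests: Callable[[], tuple[bool, str]],
    max_rounds: int = MAX_ROUNDS,
) -> bool:
    """Let the model work until the test suite passes or the budget runs out.

    `work` receives the previous test output (empty on the first round).
    `run_tests` returns (passed, output), for example by wrapping `npm test`.
    """
    feedback = ""
    for _ in range(max_rounds):
        work(feedback)                # 1. model claims the task is complete
        passed, output = run_tests()  # 2. system runs the suite
        if passed:
            return True               # 4. only now is the task marked complete
        feedback = output             # 3. model sees failures and continues
    return False


# Stand-in model and test runner so the sketch runs without a real project.
state = {"bug_fixed": False}
seen_feedback = []


def fake_model(feedback: str) -> None:
    seen_feedback.append(feedback)
    if "FAIL" in feedback:
        state["bug_fixed"] = True


def fake_tests() -> tuple[bool, str]:
    if state["bug_fixed"]:
        return True, "ok"
    return False, "FAIL: login rejects valid password"


print(complete_task(fake_model, fake_tests))  # True, reached on the second round
```

The budget matters: without `max_rounds`, a model that cannot fix the failure would loop forever instead of escalating to a human.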

Artifact Existence Checks

For generated artifacts (images, files, configs), verify existence:

Artifact verification
Task:   "Generate 5 component files"
Check:  Do all 5 files exist?
        Do they contain expected exports?
Result: Model cannot proceed until check passes
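A minimal sketch of such a check, assuming the artifacts are TypeScript component files: the function name `check_artifacts` and the file names are hypothetical, and the export test is a rough textual match that would miss forms like `export { Button }`.

```python
import re
import tempfile
from pathlib import Path


def check_artifacts(root: Path, expected: dict[str, list[str]]) -> list[str]:
    """`expected` maps a relative file path to the export names it must contain.

    Returns a list of problems; an empty list means every artifact checks out.
    """
    problems = []
    for rel_path, exports in expected.items():
        file = root / rel_path
        if not file.is_file():
            problems.append(f"{rel_path}: file not found")
            continue
        source = file.read_text(encoding="utf-8")
        for name in exports:
            # Rough match for `export function Name`, `export const Name`, and similar.
            pattern = (
                r"\bexport\s+(?:default\s+)?(?:async\s+)?"
                rf"(?:function|const|class|let|var)\s+{re.escape(name)}\b"
            )
            if not re.search(pattern, source):
                problems.append(f"{rel_path}: missing export {name}")
    return problems


# Demo in a temporary directory: one good file, one without an export, one missing.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Button.tsx").write_text("export function Button() { return null; }\n", encoding="utf-8")
    (root / "Card.tsx").write_text("function Card() { return null; }\n", encoding="utf-8")
    problems = check_artifacts(
        root,
        {"Button.tsx": ["Button"], "Card.tsx": ["Card"], "Modal.tsx": ["Modal"]},
    )

print(problems)
```

Each problem string names the file and what is wrong with it, which is the "missing files" style of failure the model can act on directly.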

Common Mistakes I Made

Treating verification as a prompt

This doesn't work
Prompt: "Please verify your output is correct before responding."
Result: Model says "I verified it" without actually checking.

Running verification once at the end

This misses intermediate errors
1. Model writes 10 files
2. Run verification
3. Find errors in file 2
4. Model now has to remember what it did in file 2

Better: verify after each file.
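As a sketch of per-file verification, the loop below checks each file right after it is written and retries while the context for that file is still fresh. The names `write_with_checks`, `write_file`, and `verify` are hypothetical, and the stand-ins at the bottom replace a real model call and a real typecheck.

```python
from typing import Callable, Iterable, Optional


def write_with_checks(
    paths: Iterable[str],
    write_file: Callable[[str, Optional[str]], None],
    verify: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> list[str]:
    """Write files one at a time, verifying each before moving to the next.

    `write_file(path, error)` asks the model to write (or rewrite) one file.
    `verify(path)` returns an error message, or None when the file is fine.
    Raises RuntimeError at the first file that still fails after `max_attempts`.
    """
    done = []
    for path in paths:
        error = None
        for _ in range(max_attempts):
            write_file(path, error)
            error = verify(path)
            if error is None:
                break
        if error is not None:
            raise RuntimeError(f"{path} failed verification: {error}")
        done.append(path)
    return done


# Stand-ins: pretend file2.ts needs one retry before it verifies.
attempts = {}


def fake_write(path: str, error: Optional[str]) -> None:
    attempts[path] = attempts.get(path, 0) + 1


def fake_verify(path: str) -> Optional[str]:
    if path == "file2.ts" and attempts[path] < 2:
        return "type error on line 3"
    return None


print(write_with_checks(["file1.ts", "file2.ts", "file3.ts"], fake_write, fake_verify))
```

Here the error in the second file is caught and fixed before the third file is started, so later files never build on a broken one.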

Checking too many things

Over-verification
acceptance_criteria:
- tests_pass: true
- typecheck_clean: true
- lint_clean: true
- coverage_minimum: 100
- no_any_types: true
- cyclomatic_complexity: under_10
- documentation_complete: true

Focus on what matters for the task. Over-verification creates friction without value.

Relying on AI to verify without code

The model cannot “check” anything. It can only respond to check results. Verification requires actual code execution.

When This Pattern Shines

This approach is most valuable when:

  • You’re iterating on the same type of output repeatedly
  • Errors follow predictable patterns
  • Verification can be automated with code
  • The cost of a bug escaping is high

It’s overkill when:

  • The task is a one-off that won’t repeat
  • Errors are hard to detect programmatically
  • The verification code would be more complex than the task

The Mental Model

Workflow comparison
Traditional:
  Human → AI → Human reviews → Human fixes
  (Human as quality gate)

Self-verifying:
  Human → AI → System verifies → AI fixes if needed → Human
  (Human only sees verified output)

The model proposes. The system verifies. You never see the broken version.

Summary

Stop reviewing AI output manually. Build verification into the process so bad output literally cannot ship. The model proposes, the system verifies, and you only see the verified result.

This is the difference between an AI assistant that needs constant supervision and one that can operate autonomously. The verification infrastructure you build today will work long after your prompt rules have been forgotten.

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email.

