How to Build Self-Verifying AI Coding Workflows That Catch Their Own Errors
I kept finding the same problem in my AI coding workflow: the model would generate code, I’d review it, catch an error, ask it to fix, and repeat. The human verification bottleneck was exhausting me.
Then I saw a comment on r/ClaudeAI that clicked: “Not ‘trust but verify.’ Not ‘review the output.’ Build verification into the process itself so bad output can’t ship.”
The Old Workflow: Human as Quality Gate
My old workflow looked like this:
1. AI generates code
2. Human reviews output
3. Human catches errors
4. Human asks AI to fix
5. Repeat until correct

The problems were obvious in hindsight:
- I became the verification bottleneck
- Errors slipped through when I got tired
- As context grew, my attention degraded
The worst part: the model was perfectly capable of seeing its own errors. It just needed to be shown them before I saw the output.
The Shift: System Verification, Not Model Verification
The key distinction that changed everything:
WRONG: Ask the model to verify its own output
    "Please check your code for errors"
    Result: Model hallucinates that it checked

RIGHT: System verifies, model sees result
    Code runs typecheck
    If fail: model sees error, regenerates
    If pass: human sees output

The model doesn’t verify itself. The system verifies. The model responds to verification results.
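Here is a minimal sketch of that loop. It assumes the model is reachable through some `generate(feedback)` callable, which is a hypothetical stand-in rather than a real API, and it uses Python's built-in `compile()` as the system-side check. The names `CheckResult`, `syntax_check`, and `generate_until_verified` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    passed: bool
    error: Optional[str] = None


def syntax_check(source: str) -> CheckResult:
    """System-side check: actually try to compile the generated code."""
    try:
        compile(source, "<generated>", "exec")
    except SyntaxError as exc:
        return CheckResult(False, f"line {exc.lineno}: {exc.msg}")
    return CheckResult(True)


def generate_until_verified(
    generate: Callable[[Optional[str]], str],
    check: Callable[[str], CheckResult] = syntax_check,
    max_attempts: int = 3,
) -> str:
    """Return output only once the check passes; feed real errors back otherwise."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = generate(feedback)  # hypothetical model call
        result = check(candidate)
        if result.passed:
            return candidate  # the only version a human ever sees
        feedback = result.error  # the model responds to this, not to a prompt
    raise RuntimeError(
        f"no verified output after {max_attempts} attempts: {feedback}"
    )
```

A syntax check is only the cheapest possible gate. The same shape should work with a typechecker or test runner swapped in as `check`.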
A Concrete Example: ASCII Diagram Verification
I saw this pattern play out with ASCII diagram generation. Models struggle with ASCII art alignment because they generate text left-to-right without spatial awareness.
The old approach:
Model generates diagram
→ Human: "The corners don't connect"
→ Model regenerates
→ Human: "Still broken on the right side"
→ Model regenerates
→ Human finally approves

The self-verifying approach:
    def generate_diagram(spec):
        for attempt in range(MAX_ATTEMPTS):
            grid_output = grid_engine.generate(spec)
            verification_result = verifier.check(grid_output)

            if verification_result.passed:
                return grid_output.render()

            # Model sees failure, adjusts, tries again
            spec = adjust_from_errors(spec, verification_result.errors)

        raise VerificationFailed()

    # User only sees successfully verified output

The verifier checks: do corners connect? Are columns aligned? The model never “verifies” anything. It responds to verification failures.
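One way such a verifier might look, as a sketch. It assumes each diagram is a single rectangle drawn with `+`, `-`, and `|`, which is a simplification chosen for illustration. `check_box` is a hypothetical helper, not the verifier from the workflow described above.

```python
def check_box(diagram: str) -> list[str]:
    """Return the problems found in a single +--+ style box (empty if none)."""
    lines = diagram.splitlines()
    errors: list[str] = []
    if len(lines) < 2:
        return ["box needs at least a top and a bottom border"]

    # Are columns aligned? Every row must be as wide as the top border.
    width = len(lines[0])
    for row, line in enumerate(lines):
        if len(line) != width:
            errors.append(f"row {row}: width {len(line)}, expected {width}")

    # Do corners connect? Top and bottom borders must look like +----+.
    for row in (0, len(lines) - 1):
        line = lines[row]
        if len(line) < 2 or line[0] != "+" or line[-1] != "+":
            errors.append(f"row {row}: border must start and end with '+'")
        elif set(line[1:-1]) - {"-"}:
            errors.append(f"row {row}: border must be '-' between corners")

    # Interior rows need a wall on both sides.
    for row in range(1, len(lines) - 1):
        line = lines[row]
        if not line or line[0] != "|" or line[-1] != "|":
            errors.append(f"row {row}: side walls must be '|'")

    return errors
```

The returned strings are meant to be fed back to the model as-is, so each one names the row and the rule that failed.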
The Acceptance Criteria Pattern
I now define tasks with explicit acceptance criteria that the system checks:
    task: "Implement user authentication"
    acceptance_criteria:
      - tests_pass: true
      - typecheck_clean: true
      - no_console_logs: true
      - coverage_minimum: 80

The system runs these checks. The model cannot mark the task complete until they pass.
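A sketch of how a system might enforce criteria like these, assuming each criterion maps to a plain check over facts the system has measured. The `Workspace` fields and function names below are illustrative assumptions. In practice the values would come from actually running the test runner, typechecker, and coverage tool.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Facts measured by the system (hypothetical shape, filled in by real tools)."""
    tests_passed: bool
    typecheck_errors: int
    coverage_percent: float
    sources: dict[str, str] = field(default_factory=dict)  # path -> file text


def unmet_criteria(criteria: dict, ws: Workspace) -> list[str]:
    """Return a readable reason for every criterion that is not met."""
    unmet = []
    if criteria.get("tests_pass") and not ws.tests_passed:
        unmet.append("tests_pass: test suite failed")
    if criteria.get("typecheck_clean") and ws.typecheck_errors > 0:
        unmet.append(f"typecheck_clean: {ws.typecheck_errors} type errors")
    if criteria.get("no_console_logs"):
        offenders = sorted(
            path for path, text in ws.sources.items() if "console.log" in text
        )
        if offenders:
            unmet.append(f"no_console_logs: found in {', '.join(offenders)}")
    minimum = criteria.get("coverage_minimum")
    if minimum is not None and ws.coverage_percent < minimum:
        unmet.append(f"coverage_minimum: {ws.coverage_percent}% < {minimum}%")
    return unmet


def can_mark_complete(criteria: dict, ws: Workspace) -> bool:
    """The task counts as complete only when the system finds nothing unmet."""
    return not unmet_criteria(criteria, ws)
```

The point of returning reasons rather than a bare boolean is that the model gets something concrete to act on when the gate refuses.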
This shifts the failure mode from “hallucination” to “missing files”:
Before:
    Model claims it wrote tests (hallucination)
    → Human discovers missing tests later

After:
    System checks: do test files exist?
    → Model sees "tests not found", generates them

Missing files are trivial to debug. Hallucinations are not.
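The existence check itself can be very small. A sketch, assuming tests live next to their sources under a `<name>.test.<ext>` naming convention. That convention is an assumption for illustration and `missing_test_files` is a hypothetical helper, so adjust both to the project's layout.

```python
from pathlib import Path


def missing_test_files(source_files: list[str], root: Path) -> list[str]:
    """Return the expected test files that do not exist or are empty."""
    missing = []
    for source in source_files:
        src = Path(source)
        # src/auth/login.ts -> src/auth/login.test.ts (assumed convention)
        expected = root / src.with_name(f"{src.stem}.test{src.suffix}")
        if not expected.is_file() or expected.stat().st_size == 0:
            missing.append(expected.relative_to(root).as_posix())
    return missing
```

An empty result means every claimed test file is really on disk. A non-empty one is exactly the "tests not found" message the model needs to see.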
Why This Works
The insight from the Reddit thread:
“Once you start checking programmatically, the failure mode shifts from hallucination to missing files. Way easier to debug.”
Rules degrade as context grows. Infrastructure doesn’t.
A CLAUDE.md rule that says “always run tests after editing” works until the context fills up and the model forgets. A script that runs tests after every edit works forever.
Implementation Patterns I Use
Post-Edit Typechecks
Run typecheck on the file you just touched, not the whole project:
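    # After editing src/auth/login.ts
    tsc --noEmit src/auth/login.ts

One caveat: when files are passed on the command line, tsc ignores tsconfig.json, so any compiler options the project relies on have to be passed explicitly.

Pre-Task-Completion Test Suite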
Before the model can claim a task is done, the test suite runs:
1. Model claims task complete
2. System runs: npm test
3. If fail: Model sees failures, continues
4. If pass: Task marked complete

Artifact Existence Checks
For generated artifacts (images, files, configs), verify existence:
    Task: "Generate 5 component files"
    Check: Do all 5 files exist? Do they contain expected exports?
    Result: Model cannot proceed until check passes

Common Mistakes I Made
Treating verification as a prompt
    Prompt: "Please verify your output is correct before responding."
    Result: Model says "I verified it" without actually checking.

Running verification once at the end
1. Model writes 10 files
2. Run verification
3. Find errors in file 2
4. Model now has to remember what it did in file 2

Better: verify after each file.
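A sketch of per-file verification, assuming files arrive one at a time as `(name, content)` pairs and that `check` is whatever real check fits the task. Both `verify_incrementally` and the pair format are illustrative assumptions.

```python
from typing import Callable, Iterable, Optional


def verify_incrementally(
    files: Iterable[tuple[str, str]],
    check: Callable[[str, str], list[str]],
) -> tuple[list[str], Optional[tuple[str, list[str]]]]:
    """Check each (name, content) pair as it arrives; stop at the first failure.

    Returns the names verified so far and, if something failed, the failing
    name with its errors, so the fix happens while that file is still fresh.
    """
    verified: list[str] = []
    for name, content in files:
        errors = check(name, content)
        if errors:
            return verified, (name, errors)
        verified.append(name)
    return verified, None
```

If `files` is a generator, later files are not even produced until the earlier ones have passed, which keeps a mistake in file 2 from being copied into files 3 through 10.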
Checking too many things
    acceptance_criteria:
      - tests_pass: true
      - typecheck_clean: true
      - lint_clean: true
      - coverage_minimum: 100
      - no_any_types: true
      - cyclomatic_complexity: under_10
      - documentation_complete: true

Focus on what matters for the task. Over-verification creates friction without value.
Relying on AI to verify without code
The model cannot “check” anything. It can only respond to check results. Verification requires actual code execution.
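As a sketch of what "actual code execution" can look like, the helper below runs a check command and returns its exit status and output so both can be handed back to the model. `run_check` and `CommandResult` are hypothetical names. The commands in the usage comment are the ones mentioned earlier in this article and assume those tools are installed.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class CommandResult:
    passed: bool
    output: str


def run_check(command: list[str], timeout_s: float = 300) -> CommandResult:
    """Run a real check and capture what it printed."""
    try:
        proc = subprocess.run(
            command,
            capture_output=True,  # collect stdout and stderr instead of printing
            text=True,            # decode bytes to str
            timeout=timeout_s,
        )
    except FileNotFoundError:
        return CommandResult(False, f"command not found: {command[0]}")
    except subprocess.TimeoutExpired:
        return CommandResult(False, f"timed out after {timeout_s}s")
    # Exit code 0 is the only thing that counts as a pass.
    return CommandResult(proc.returncode == 0, proc.stdout + proc.stderr)


# Example usage (assumes the tools are on PATH):
#   run_check(["tsc", "--noEmit", "src/auth/login.ts"])
#   run_check(["npm", "test"])
```

The pass/fail decision comes from the process exit code, never from anything the model says about its own work.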
When This Pattern Shines
This approach is most valuable when:
- You’re iterating on the same type of output repeatedly
- Errors follow predictable patterns
- Verification can be automated with code
- The cost of a bug escaping is high
It’s overkill when:
- The task is a one-off that won’t repeat
- Errors are hard to programmatically detect
- The verification code would be more complex than the task
The Mental Model
Traditional:
    Human → AI → Human reviews → Human fixes
    (Human as quality gate)

Self-verifying:
    Human → AI → System verifies → AI fixes if needed → Human
    (Human only sees verified output)

The model proposes. The system verifies. You never see the broken version.
Summary
Stop reviewing AI output manually. Build verification into the process so bad output literally cannot ship. The model proposes, the system verifies, and you only see the verified result.
This is the difference between an AI assistant that needs constant supervision and one that can operate autonomously. The verification infrastructure you build today will work long after your prompt rules have been forgotten.