What to Do When AI Can't Debug Your Code
I was stuck. After three attempts, Codex still couldn’t fix my authentication bug. Each suggestion looked reasonable, but the error persisted. Same TypeError, same stack trace, same frustration.
The pattern was familiar: I’d paste the error, AI would suggest a fix, I’d apply it, test it, and… still broken. Round and round, eating up 30 minutes with nothing to show for it.
Then I changed my approach. I stopped asking the AI to fix the bug and started investigating myself. What I found surprised me—and changed how I work with AI coding assistants forever.
The Problem: AI Gets Stuck in Guessing Mode
When AI repeatedly fails to solve a problem, it’s usually not because the solution is too complex. It’s because the AI lacks concrete evidence about what’s actually wrong.
Here’s what happens when you keep asking AI to fix:
```
Developer: "Fix this authentication bug"
AI: *suggests fix based on error message*
Developer: "Still failing"
AI: *suggests different fix based on same error*
Developer: "Same error"
AI: *suggests similar fix*
# AI is guessing because it doesn't have the real data
```
The AI is debugging in the abstract. It sees your error message but not your data. It understands your code structure but not your runtime state. Without concrete evidence, it’s shooting in the dark.
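One way to give the AI runtime state instead of just the error line is to capture each frame’s local variables at the moment the exception happens. The sketch below uses the standard library’s `traceback.TracebackException.from_exception(..., capture_locals=True)`. `is_unexpired` and `collect_evidence` are hypothetical stand-ins for the failing code, not the author’s functions.

```python
import traceback
from datetime import datetime
from typing import Optional


def is_unexpired(expires_at: Optional[datetime]) -> bool:
    # Hypothetical stand-in for the failing check
    now = datetime.now()
    return now < expires_at  # raises TypeError when expires_at is None


def failure_report(exc: BaseException) -> str:
    """Format the traceback together with the local variables of every frame."""
    tb = traceback.TracebackException.from_exception(exc, capture_locals=True)
    return "".join(tb.format())


def collect_evidence() -> str:
    try:
        is_unexpired(None)
    except TypeError as exc:
        return failure_report(exc)
    return "no error"


print(collect_evidence())
```

The report lists each frame’s locals (for example `expires_at = None` under the failing frame), which is exactly the data the bare error message hides. Values are printed with `repr()`, so scrub secrets such as tokens or passwords before pasting the report into a chat.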
The Insight: A Reddit Post Changed Everything
I found a post on r/codex that crystallized this problem:
“After three unsuccessful attempts, Codex still couldn’t fix the issue. So I investigated the data myself and wrote the root cause you see on the first screen—something Codex initially disagreed with.”
The poster had done something I hadn’t: they investigated manually instead of asking AI to investigate for them.
The key insight came next:
“Then I asked it to write a test for the case and reproduce the steps causing the problem. Once it did that, it fixed the issue.”
Wait—ask for a test, not a fix?
“A lot of the time the model is not really stuck on code, it is stuck on having the wrong frame for the problem. Once you write down the actual failure mode and force a repro or test, it stops wandering and gets useful fast.”
That was it. The AI wasn’t stuck on the solution—it was stuck on understanding the problem.
The Protocol: Human-AI Handoff for Debugging
I developed a workflow that works reliably when AI gets stuck:
```
┌─────────────────────────────────────────────────────────────┐
│ AI DEBUGGING DECISION TREE                                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ 1. Ask AI to fix → Success? → Done                          │
│    │                                                        │
│    No (after 2-3 tries)                                     │
│    ↓                                                        │
│ 2. STOP asking AI to fix                                    │
│    ↓                                                        │
│ 3. Investigate manually                                     │
│    - Check logs, data, traces                               │
│    - Form root cause hypothesis                             │
│    - Write down diagnosis                                   │
│    ↓                                                        │
│ 4. Ask AI to write TEST (not fix)                           │
│    - Provide your diagnosis                                 │
│    - Request failing test case                              │
│    ↓                                                        │
│ 5. Run test → Confirms bug?                                 │
│    │                                                        │
│    Yes                                                      │
│    ↓                                                        │
│ 6. Ask AI to make test pass                                 │
│    - AI can now self-verify                                 │
│    - Clear success criteria                                 │
│    ↓                                                        │
│ 7. Test passes → Done                                       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
Let me show you how this worked in practice with my authentication bug.
Case Study: The Authentication Bug
What Didn’t Work: Repeated Fix Requests
```
# Attempt 1
Me: "Fix this authentication bug - I'm getting TypeError in token validation"
AI: "The issue is likely in the comparison. Try adding a null check..."
Me: "Still getting TypeError"
# 10 minutes wasted

# Attempt 2
Me: "Same error, here's the full stack trace..."
AI: "I see the problem now. The token object might not have the expires_at attribute..."
Me: "No, it has the attribute. Still failing."
# 10 more minutes wasted

# Attempt 3
Me: "Same TypeError. Can you fix it?"
AI: "Try converting the timestamp to datetime..."
# Still broken. 30 minutes total, no progress.
```
What Did Work: Investigate First, Test Second, Fix Third
Step 1: Manual Investigation
I opened my database and checked the actual token data:
```python
# What I found in the database (SQLAlchemy session; .is_(None) renders IS NULL):
legacy_tokens = db.query(Token).filter(Token.expires_at.is_(None)).all()
print(f"Found {len(legacy_tokens)} legacy tokens")
# Output: Found 847 legacy tokens

# The actual error in my logs:
# TypeError: '<' not supported between instances of 'datetime' and 'NoneType'
```
There it was. Legacy tokens from a migration had expires_at = None. When the code evaluated datetime.now() < token.expires_at, Python raised TypeError, because a datetime can’t be ordered against None.
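If you want to confirm the hypothesis before writing it down, a few lines of plain Python are enough. This sketch uses hypothetical in-memory expiry values in place of the real tokens table: it counts the `None` rows and re-runs the same comparison to check that it raises the same kind of `TypeError`. `count_legacy` and `comparison_error` are illustrative names.

```python
from datetime import datetime, timedelta
from typing import Optional


def count_legacy(expiries: list[Optional[datetime]]) -> int:
    """Count tokens with no expiry (legacy rows from before the migration)."""
    return sum(1 for expires_at in expiries if expires_at is None)


def comparison_error(expires_at: Optional[datetime]) -> Optional[str]:
    """Re-run the validation comparison and return the error message, if any."""
    try:
        datetime.now() < expires_at  # same comparison as the validation code
    except TypeError as exc:
        return str(exc)
    return None


# Hypothetical sample standing in for rows from the tokens table
sample = [
    datetime.now() + timedelta(days=1),  # valid token
    None,                                # legacy token
    datetime.now() - timedelta(days=1),  # expired token
]

print(f"Found {count_legacy(sample)} legacy tokens")
print(comparison_error(None))  # the TypeError message
print(comparison_error(datetime.now() + timedelta(days=1)))  # no error: None
```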
Step 2: Document My Diagnosis
I wrote down exactly what I found:
```
ROOT CAUSE:
The bug is in token validation. When expires_at is None
(legacy tokens created before the migration), the comparison
'datetime.now() < token.expires_at' raises TypeError.

Evidence:
- 847 legacy tokens have expires_at = None
- Error occurs at line 42 in auth.py
- Legacy tokens should be treated as expired, not crash
```
Step 3: Ask AI to Write a Test, Not a Fix
```
Write a test that reproduces this bug:

Given: a token with expires_at = None (legacy token)
When: we call validate_token(token)
Then: it should return False (legacy tokens are treated as expired)
      instead of raising TypeError

Root cause: The comparison 'datetime.now() < token.expires_at'
fails when expires_at is None.

Here's the validation code:
[provided relevant code snippet]
```
Step 4: AI Writes the Test
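```python
from datetime import datetime

from auth import Token, validate_token  # assumed import path; adjust to your project


def test_validate_token_handles_none_expires_at():
    """Legacy tokens with expires_at=None should be treated as expired, not crash."""
    legacy_token = Token(
        user_id=1,
        created_at=datetime.now(),
        expires_at=None,  # Legacy token from before the migration
    )

    # Before the fix, this call raises TypeError, so the test fails
    # and reproduces the bug. After the fix, it returns False.
    assert validate_token(legacy_token) is False
```
I ran the test. It failed with the same TypeError, exactly as expected. Bug reproduced!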
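A reproduction test earns its keep only if it fails while the bug exists and passes once the fix lands. The sketch below checks that property in plain Python, assuming a hypothetical `Token` dataclass plus a buggy and a fixed copy of the validator; `check_legacy_token_is_expired` and `fails_against` are illustrative names, not pytest features.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class Token:
    # Hypothetical stand-in for the real Token model
    user_id: int
    created_at: datetime
    expires_at: Optional[datetime]


Validator = Callable[[Token], bool]


def validate_token_buggy(token: Token) -> bool:
    # Same comparison as the original bug: raises TypeError for None
    return datetime.now() < token.expires_at


def validate_token_fixed(token: Token) -> bool:
    if token.expires_at is None:
        return False  # legacy tokens are treated as expired
    return datetime.now() < token.expires_at


def check_legacy_token_is_expired(validate: Validator) -> None:
    legacy = Token(user_id=1, created_at=datetime.now(), expires_at=None)
    assert validate(legacy) is False


def fails_against(validate: Validator) -> bool:
    """Return True if the reproduction check fails for this implementation."""
    try:
        check_legacy_token_is_expired(validate)
    except (AssertionError, TypeError):
        return True
    return False


print(fails_against(validate_token_buggy))  # True: the bug is reproduced
print(fails_against(validate_token_fixed))  # False: the fix makes it pass
```

A test that only wraps the call in `pytest.raises(TypeError)` inverts this: it passes while the bug is present and fails after the fix, so it can’t serve as the failing test you hand to the AI.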
Step 5: Now Ask AI to Fix
```
The test test_validate_token_handles_none_expires_at is failing.
Fix the validate_token function to handle None for expires_at.
Legacy tokens (expires_at=None) should be treated as expired.
```
Step 6: AI Fixes with Confidence
```python
from datetime import datetime


def validate_token(token: Token) -> bool:
    """Validate that the token is not expired. Legacy tokens are considered expired."""
    if token.expires_at is None:
        return False  # Legacy tokens are expired
    return datetime.now() < token.expires_at
```
Test passed. Done.
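Once the legacy case passes, it costs little to pin down the cases next to it as well, so a later refactor can’t reintroduce the crash or flip valid tokens to expired. This is a sketch with a hypothetical `Token` dataclass and a table of cases; under pytest, the same table could drive `@pytest.mark.parametrize`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Token:
    # Hypothetical stand-in for the real Token model
    user_id: int
    created_at: datetime
    expires_at: Optional[datetime]


def validate_token(token: Token) -> bool:
    """The fixed validator: legacy tokens (expires_at=None) are expired."""
    if token.expires_at is None:
        return False
    return datetime.now() < token.expires_at


# (description, expires_at, expected result of validate_token)
CASES = [
    ("legacy token", None, False),
    ("expires in one hour", datetime.now() + timedelta(hours=1), True),
    ("expired one hour ago", datetime.now() - timedelta(hours=1), False),
]


def test_validate_token_cases() -> None:
    for description, expires_at, expected in CASES:
        token = Token(user_id=1, created_at=datetime.now(), expires_at=expires_at)
        assert validate_token(token) is expected, description


test_validate_token_cases()
print("all cases pass")
```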
Why This Works: Cognitive Division of Labor
The key insight is that humans and AI have complementary strengths:
```
HUMANS EXCEL AT:          AI EXCELS AT:
- Investigation           - Implementation
- Pattern recognition     - Test writing
- Domain knowledge        - Systematic fixes
- Data inspection         - Code generation
- Hypothesis formation    - Verification

COMBINED WORKFLOW:
Human investigates → Documents finding → AI tests → AI fixes
```
When AI lacks evidence, it guesses. When AI has a failing test, it solves.
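The “Documents finding → AI tests” handoff is easy to make repeatable. This sketch turns a written diagnosis into a test request shaped like the one in the case study. `Diagnosis` and `build_test_request` are hypothetical helpers, and the prompt wording is one reasonable template, not a required format.

```python
from dataclasses import dataclass, field


@dataclass
class Diagnosis:
    """What the human found during investigation (hypothetical structure)."""
    root_cause: str
    given: str
    when: str
    then: str
    evidence: list[str] = field(default_factory=list)


def build_test_request(d: Diagnosis, code_snippet: str) -> str:
    """Build a prompt that asks for a failing test, not a fix."""
    evidence = "\n".join(f"- {item}" for item in d.evidence)
    return (
        "Write a test that reproduces this bug. Do not fix the code yet.\n\n"
        f"Given: {d.given}\n"
        f"When: {d.when}\n"
        f"Then: {d.then}\n\n"
        f"Root cause: {d.root_cause}\n\n"
        f"Evidence:\n{evidence}\n\n"
        f"Here's the relevant code:\n{code_snippet}\n"
    )


diagnosis = Diagnosis(
    root_cause="'datetime.now() < token.expires_at' fails when expires_at is None.",
    given="a token with expires_at = None (legacy token)",
    when="we call validate_token(token)",
    then="it returns False (legacy tokens are treated as expired)",
    evidence=[
        "847 legacy tokens have expires_at = None",
        "Error occurs at line 42 in auth.py",
    ],
)
print(build_test_request(diagnosis, "<paste validate_token here>"))
```

Keeping the diagnosis as data rather than free text makes it harder to skip a field, which is the point of the protocol: the AI gets the failure mode and the expected behavior, not just the error.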
The Efficiency Comparison
Let’s compare the two approaches:
```
┌──────────────────────────────────────────────────────────┐
│ APPROACH: Ask AI to fix directly                         │
├──────────────────────────────────────────────────────────┤
│ AI attempts:   3-5+                                      │
│ Investigation: 0 min (just pasting errors)               │
│ Outcome:       Often fails, AI stuck in guess loop       │
│ Total time:    30+ minutes                               │
└──────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────┐
│ APPROACH: Investigate + Test + Fix                       │
├──────────────────────────────────────────────────────────┤
│ AI attempts:   2 (1 test, 1 fix)                         │
│ Investigation: 10-15 min (including diagnosis)           │
│ Outcome:       Succeeds reliably                         │
│ Total time:    15-20 minutes                             │
└──────────────────────────────────────────────────────────┘
```
The “lazy” approach of just asking AI to fix takes longer and fails more often. The “disciplined” approach of investigating first is actually faster.
Common Mistakes to Avoid
I’ve made all of these mistakes. Learn from my failures:
Mistake 1: Asking AI to fix without evidence
❌ "Fix this bug" → AI has no context, will guess
✅ "Here's the root cause: [specific diagnosis]. Write a test that reproduces it." → AI has a target to aim for
Mistake 2: Refusing to investigate manually
❌ "AI should do all the work" → AI can't inspect your database, logs, or runtime state
✅ "I'll investigate the data, AI will implement the fix" → The division of labor plays to each side's strengths
Mistake 3: Accepting AI’s initial disagreement
❌ AI: "That diagnosis doesn't seem right..." → You start doubting your own evidence
✅ Stick to your evidence. If you found the bug in the data, you're right until proven otherwise.
Mistake 4: Asking AI to fix before there’s a test
❌ Ask AI to fix → Hope it works → Test manually → Repeat
✅ Ask AI to test → Verify bug reproduced → Ask AI to fix → Test passes
Mistake 5: Giving up after AI fails
❌ "AI couldn't fix it, must be too hard" → You gave up at the exact moment you should change approach
✅ "AI failed 3 times. Time to investigate myself." → AI failure is a signal, not a stopping point
When to Use This Protocol
Not every debugging session needs this workflow. Here’s when to switch approaches:
```
┌────────────────────────────────────────────────────────┐
│ USE PROTOCOL WHEN:                                     │
├────────────────────────────────────────────────────────┤
│ - AI has failed 2-3 times on the same bug              │
│ - Bug involves data/state you can inspect              │
│ - Error message is generic (TypeError, NoneType)       │
│ - AI suggestions "feel right" but don't work           │
│ - You suspect domain-specific or legacy issues         │
└────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────┐
│ SKIP PROTOCOL (ASK AI DIRECTLY) WHEN:                  │
├────────────────────────────────────────────────────────┤
│ - Bug is clearly in code you can see                   │
│ - Error message is specific (missing import, typo)     │
│ - AI has good context from previous messages           │
│ - This is your first fix attempt                       │
└────────────────────────────────────────────────────────┘
```
The Mindset Shift
The hardest part isn’t the technique—it’s the mindset. We’ve been trained to think “AI should do everything.” But AI is a tool, not a replacement for engineering thinking.
When I hit a bug now, I ask myself: “What does AI know that I don’t?” Usually, it’s syntax and patterns. “What do I know that AI doesn’t?” Usually, it’s data, state, and domain context.
The most effective debugging happens when I combine my investigation with AI’s implementation speed.
What I Learned
After adopting this protocol, my debugging sessions changed:
- Before: 30 minutes of back-and-forth with AI, high frustration, uncertain outcomes
- After: 15 minutes of investigation + 5 minutes with AI, reliable success
The lesson: AI failure is information, not a dead end. When AI can’t fix your bug, it’s telling you it needs more context. Your job is to provide that context through investigation and tests.
Next time your AI coding assistant gets stuck, stop asking it to fix. Start investigating. Write a test. Then let AI succeed.
Final Words + More Resources
My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email: Email me
Here are the most important links from this article, along with some further resources on this topic:
- 👨💻 Reddit r/codex Discussion
- 👨💻 Anthropic Claude Debugging Guide
- 👨💻 OpenAI Prompt Engineering Guide
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!