Why LLMs Fail at Visual Tasks Like ASCII Diagrams (And How to Fix It)
I asked Claude to draw an ASCII diagram. It produced broken arrows.
So I asked again. Different prompt. Still broken.
Then I tried GPT-4. Same problem. Different broken arrows, but still broken.
I spent two hours tweaking prompts. “Be more careful.” “Check alignment.” “Use fixed-width font.” Nothing worked.
That’s when I realized: I was trying to prompt my way out of a structural limitation.
The Problem: Models Can’t See What They Draw
LLMs generate text token by token. Left to right. Top to bottom. They don’t have spatial awareness of what they’ve already created.
When a model draws:
+---+      +---+
| A |----->| B |
+---+      +---+
                |
                v
          +----+
          | C |
          +----+
It doesn’t know the arrow from B to C is misaligned until after it’s committed to the output. By then, it’s too late.
The model literally cannot verify its own visual work. It’s blind to the structure it just created.
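A program has no trouble catching this kind of break, because code can index text by row and column. Below is a minimal sketch that assumes `|` marks vertical shafts and `v` marks downward arrowheads. `orphan_arrowheads` is a hypothetical helper. It is deliberately naive and will misfire on labels that contain a lowercase `v`.

```python
def orphan_arrowheads(diagram: str) -> list[tuple[int, int]]:
    """Return (row, col) of every 'v' arrowhead with no '|' shaft directly above it."""
    lines = diagram.splitlines()
    width = max((len(line) for line in lines), default=0)
    # Pad every row to the same width so (row, col) lookups never go out of range
    grid = [line.ljust(width) for line in lines]
    problems = []
    for row, line in enumerate(grid):
        for col, ch in enumerate(line):
            if ch == "v":
                # The cell directly above the head must be part of the shaft
                above = grid[row - 1][col] if row > 0 else " "
                if above != "|":
                    problems.append((row, col))
    return problems


broken = "+---+\n| B |\n+---+\n   |\n  v"
print(orphan_arrowheads(broken))  # [(4, 2)]
```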
Why All the “Fixes” Failed
I tried everything I could think of:
[x] "Be more careful with alignment" -> Still broken
[x] "Check your output before finishing" -> Model can't check what it can't see
[x] "Use this template" -> Works for template, fails on variations
[x] "Draw in a grid" -> Model loses track of position
[x] Screenshot + vision model verification -> Too slow, still unreliable
Every workaround shared the same flaw: I was asking the model to do something it structurally cannot do.
Then I saw a thread on r/ClaudeAI. Alex Ellis (OpenFaaS founder) had posted about the same problem. 2.8K views, tons of replies. Every single response suggested fixing the output: manual vim fixes, Python validators, switching to Mermaid, using Excalidraw.
Nobody asked why the model fails at this task.
The Structural Explanation
The issue isn’t prompting. It’s architecture.
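Model generates: +---+
                 | A |   <- Has no idea where the right edge is
                 +---+   <- Just placing characters

Model generates: +---+
                 | B |   <- Where did A end? Can't check.
                 +---+

Model generates: ----->  <- Arrow starts... somewhere? Hope it lines up.

The model has no spatial map of what it has written. It sees a one-dimensional sequence of tokens, and a single token often covers several characters (a run of dashes or spaces can be one token). Knowing which column it is in therefore means counting characters it never sees one at a time. It generates each token based on probability, not visual coherence. When it places the arrow shaft, it’s guessing where it should connect based on patterns from training data, not actual position.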
This is why “check your output” prompts don’t work. The model can reread its text, but it can’t “see” the visual structure. It is in the same position you would be in if someone read a diagram to you one character at a time and asked whether it was aligned.
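Once positions are explicit, the column arithmetic the model has to do implicitly becomes trivial. Below is a minimal sketch that assumes boxes are drawn as `| label |` with one space of padding on each side. `box_columns` is a hypothetical helper.

```python
def box_columns(col: int, label: str, padding: int = 1) -> tuple[int, int]:
    """Return the (left, right) columns of the '|' edges of a box starting at `col`."""
    inner = len(label) + 2 * padding      # characters between the two edges
    return col, col + inner + 1


a_left, a_right = box_columns(0, "A")
shaft = "----->"
arrow_start = a_right + 1                 # the shaft must begin right after A's edge
b_left, b_right = box_columns(arrow_start + len(shaft), "B")

row = "| A |" + shaft + "| B |"
print(a_right, arrow_start, b_left)       # 4 5 11
```

With the numbers computed instead of guessed, `row[a_right]`, `row[b_left]` and `row[b_right]` are all `|` by construction.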
The Real Fix: Replace Visual with Code
I stopped asking the model to draw. I started asking it to generate coordinates.
Instead of:
Prompt: "Draw an ASCII diagram with three connected boxes"
I built a grid engine with a coordinate API:
# Model generates coordinates, not ASCII
# Helper methods (_find_box, _can_connect, _draw_box, _draw_arrow) are not shown in this listing.
class GridDiagram:
    def __init__(self, width=80, height=40):
        self.boxes = []
        self.arrows = []
        # Character canvas, filled in only by render()
        self.grid = [[' ' for _ in range(width)] for _ in range(height)]

    def add_box(self, label, row, col, width=3, height=2):
        # Record position and size only; nothing is drawn yet
        self.boxes.append({
            'label': label, 'row': row, 'col': col,
            'width': width, 'height': height
        })

    def add_arrow(self, from_label, to_label, direction='right'):
        self.arrows.append({
            'from': from_label, 'to': to_label, 'direction': direction
        })

    def verify(self):
        """Check all connections are valid before rendering."""
        errors = []
        for arrow in self.arrows:
            from_box = self._find_box(arrow['from'])
            to_box = self._find_box(arrow['to'])
            # An arrow that names a missing box can never be drawn
            if from_box is None or to_box is None:
                errors.append(f"Arrow {arrow['from']} -> {arrow['to']} references a missing box")
            elif not self._can_connect(from_box, to_box, arrow['direction']):
                errors.append(f"Cannot connect {arrow['from']} to {arrow['to']}")
        return len(errors) == 0, errors

    def render(self):
        """Only called after verification passes."""
        for box in self.boxes:
            self._draw_box(box)
        for arrow in self.arrows:
            self._draw_arrow(arrow)
        return '\n'.join(''.join(row) for row in self.grid)

The model’s job becomes:
diagram = GridDiagram()
diagram.add_box("A", row=0, col=0)
diagram.add_box("B", row=0, col=10)
diagram.add_box("C", row=3, col=7)
diagram.add_arrow("A", "B", direction="right")
diagram.add_arrow("B", "C", direction="down")

# verify() returns an (ok, errors) pair
ok, errors = diagram.verify()
if not ok:
    # Send `errors` back to the model and ask it to regenerate coordinates
    pass
else:
    # Only render if verification passes
    print(diagram.render())

The model never has to “see” alignment. It follows an API. Code verifies the result.
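The class above leaves its helpers out. The self-contained sketch below fills them in for the two directions used in the example, `right` and `down`, to show the approach end to end. `MiniGrid` is a hypothetical stand-in. Its convention that `width`/`height` are interior sizes, and its rule that an arrow leaves from the middle of an edge, are assumptions of this sketch.

```python
class MiniGrid:
    """Hypothetical minimal grid engine: boxes plus right/down arrows."""

    def __init__(self, width=40, height=12):
        self.width, self.height = width, height
        self.boxes = {}   # label -> outer geometry {"row", "col", "w", "h"}
        self.arrows = []  # (from_label, to_label, direction)

    def add_box(self, label, row, col, width=3, height=1):
        # width/height are interior sizes; the border adds one cell on each side
        self.boxes[label] = {"row": row, "col": col, "w": width + 2, "h": height + 2}

    def add_arrow(self, src, dst, direction="right"):
        self.arrows.append((src, dst, direction))

    @staticmethod
    def _inside(cell, b):
        r, c = cell
        return b["row"] <= r < b["row"] + b["h"] and b["col"] <= c < b["col"] + b["w"]

    @staticmethod
    def _overlap(a, b):
        return (a["row"] < b["row"] + b["h"] and b["row"] < a["row"] + a["h"]
                and a["col"] < b["col"] + b["w"] and b["col"] < a["col"] + a["w"])

    @staticmethod
    def _route(a, b, direction):
        """Cells the arrow occupies (last one is the head), or None if it cannot connect."""
        if direction == "right":
            r = a["row"] + a["h"] // 2                 # middle row of a's right edge
            cols = range(a["col"] + a["w"], b["col"])  # gap between the two boxes
            if cols and b["row"] < r < b["row"] + b["h"] - 1:
                return [(r, c) for c in cols]
        elif direction == "down":
            c = a["col"] + a["w"] // 2                 # middle column of a's bottom edge
            rows = range(a["row"] + a["h"], b["row"])
            if rows and b["col"] < c < b["col"] + b["w"] - 1:
                return [(r, c) for r in rows]
        return None

    def verify(self):
        errors = []
        labels = list(self.boxes)
        for i, name in enumerate(labels):
            b = self.boxes[name]
            if b["row"] < 0 or b["col"] < 0 or b["row"] + b["h"] > self.height or b["col"] + b["w"] > self.width:
                errors.append(f"Box {name} does not fit on the canvas")
            for other in labels[i + 1:]:
                if self._overlap(b, self.boxes[other]):
                    errors.append(f"Boxes {name} and {other} overlap")
        for src, dst, direction in self.arrows:
            if src not in self.boxes or dst not in self.boxes:
                errors.append(f"Arrow {src}->{dst} references a missing box")
                continue
            path = self._route(self.boxes[src], self.boxes[dst], direction)
            if path is None:
                errors.append(f"Cannot connect {src} to {dst} going {direction}")
            elif any(self._inside(cell, b) for b in self.boxes.values() for cell in path):
                errors.append(f"Arrow {src}->{dst} is blocked by a box")
        return not errors, errors

    def render(self):
        ok, errors = self.verify()
        if not ok:
            raise ValueError("; ".join(errors))  # never draw an unverified diagram
        grid = [[" "] * self.width for _ in range(self.height)]
        for label, b in self.boxes.items():
            top, left = b["row"], b["col"]
            bottom, right = top + b["h"] - 1, left + b["w"] - 1
            for c in range(left, right + 1):
                grid[top][c] = grid[bottom][c] = "-"
            for r in range(top, bottom + 1):
                grid[r][left] = grid[r][right] = "|"
            for r, c in ((top, left), (top, right), (bottom, left), (bottom, right)):
                grid[r][c] = "+"
            for i, ch in enumerate(label[: b["w"] - 2].center(b["w"] - 2)):
                grid[top + 1][left + 1 + i] = ch  # label on the first interior row
        for src, dst, direction in self.arrows:
            path = self._route(self.boxes[src], self.boxes[dst], direction)
            shaft, head = ("-", ">") if direction == "right" else ("|", "v")
            for r, c in path[:-1]:
                grid[r][c] = shaft
            r, c = path[-1]
            grid[r][c] = head
        return "\n".join("".join(row).rstrip() for row in grid).rstrip("\n")


g = MiniGrid()
g.add_box("A", row=0, col=0)
g.add_box("B", row=0, col=11)
g.add_box("C", row=5, col=11)
g.add_arrow("A", "B", "right")
g.add_arrow("B", "C", "down")
print(g.render())
```

In this sketch every column is derived from box coordinates. The model only has to pick `row`/`col` values, and `verify()` rejects any that do not connect before a single character is drawn.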
The Verifier Catches What Models Can’t See
Here’s what the verification logic checks:
def _can_connect(self, from_box, to_box, direction):
    """Verify that an arrow can physically connect two boxes."""

    # Find the connection points
    from_edge = self._get_edge_position(from_box, direction)
    to_edge = self._get_entry_position(to_box, direction)

    # Check 1: Is there clear space for the arrow shaft?
    if not self._path_clear(from_edge, to_edge, direction):
        return False

    # Check 2: Are boxes overlapping?
    if self._boxes_overlap(from_box, to_box):
        return False

    # Check 3: Does the arrow actually connect to both boxes?
    if not self._valid_connection_point(from_edge, from_box):
        return False
    if not self._valid_connection_point(to_edge, to_box):
        return False

    return True

The verifier catches problems the model literally cannot perceive:
- Corners with missing connections
- Arrow heads with no shaft leading to them
- Gaps in arrow runs
- Boxes that overlap
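Some of these checks can also run on text that has already been rendered. Below is a minimal sketch of the first one. It assumes boxes use `+` corners, `-` horizontal edges and `|` vertical edges. `dangling_corners` is a hypothetical helper that flags any `+` lacking either a horizontal or a vertical neighbor to connect to.

```python
def dangling_corners(diagram: str) -> list[tuple[int, int]]:
    """Return (row, col) of every '+' missing a horizontal or a vertical connection."""
    lines = diagram.splitlines()
    width = max((len(line) for line in lines), default=0)
    grid = [line.ljust(width) for line in lines]

    def at(r, c):
        # Cells outside the diagram read as blank space
        return grid[r][c] if 0 <= r < len(grid) and 0 <= c < width else " "

    bad = []
    for r, line in enumerate(grid):
        for c, ch in enumerate(line):
            if ch != "+":
                continue
            horizontal = at(r, c - 1) in "-+" or at(r, c + 1) in "-+"
            vertical = at(r - 1, c) in "|+" or at(r + 1, c) in "|+"
            if not (horizontal and vertical):
                bad.append((r, c))
    return bad


print(dangling_corners("+----+\n| C |\n+----+"))  # [(0, 5), (2, 5)]
```

In the example, the top and bottom borders are one character wider than the `| C |` row. That leaves both right-hand corners with nothing underneath or above them.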
Test Results
I ran 31 test cases through this system:
Test Case                         | Result
----------------------------------+--------
Simple two-box horizontal arrow   | PASS
Three-box vertical flow           | PASS
Bidirectional arrows              | PASS
Crossing arrows                   | PASS
Overlapping boxes (should fail)   | FAIL (correctly rejected)
Arrow with no target              | FAIL (correctly rejected)
Arrow to non-existent box         | FAIL (correctly rejected)
...
----------------------------------+--------
Total: 31 tests
False positives: 0
False negatives: 0

Zero false positives and zero false negatives across all 31 cases. The verifier catches what the model cannot see.
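Both error counts fall out directly from expected versus actual outcomes. Below is a minimal sketch. It assumes a false positive means a valid diagram the verifier rejected and a false negative means an invalid diagram it accepted; that reading of the terms is an assumption. `score` is hypothetical, and the sample rows reuse only outcomes listed in the table.

```python
def score(cases):
    """Count verifier mistakes.

    cases: iterable of (is_valid, accepted) pairs, where is_valid is the ground truth
    and accepted says whether the verifier let the diagram through.
    """
    cases = list(cases)  # allow generators, which can only be iterated once
    false_positives = sum(1 for is_valid, accepted in cases if is_valid and not accepted)
    false_negatives = sum(1 for is_valid, accepted in cases if not is_valid and accepted)
    return false_positives, false_negatives


rows = {
    "Simple two-box horizontal arrow": (True, True),
    "Overlapping boxes (should fail)": (False, False),
    "Arrow to non-existent box": (False, False),
}
print(score(rows.values()))  # (0, 0)
```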
The Broader Pattern
This isn’t just about ASCII diagrams. The same principle applies to:
Task                    | Model Struggles With
------------------------+--------------------------------
ASCII diagrams          | Spatial alignment
Tables                  | Column width consistency
Code formatting         | Indentation depth
Structured output       | Nested bracket matching
JSON generation         | Brace balance

All of these are visual or structural tasks where the model lacks awareness of the structure of its own output.
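Nested bracket matching is the easiest row of that table to check in code: a stack handles it in one pass. Below is a minimal sketch that assumes JSON-style double-quoted strings, whose contents are skipped. `brackets_balanced` is a hypothetical helper.

```python
def brackets_balanced(text: str) -> bool:
    """Stack-based check that (), [] and {} nest correctly, ignoring JSON-style strings."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    in_string = escaped = False
    for ch in text:
        if in_string:
            # Inside a string: only track escapes and the closing quote
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack and not in_string


print(brackets_balanced('{"a": [1, 2, {"b": "}"}]}'))  # True
print(brackets_balanced('{"a": [1, 2}'))               # False
```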
The solution pattern is always the same:
1. Identify what the model cannot verify (visual output)
2. Replace it with what the model CAN do reliably (API calls, coordinates)
3. Add programmatic verification
4. Only render after verification passes

Why This Works
Models are excellent at:
- Following API specifications
- Generating structured data (coordinates, parameters)
- Producing valid function calls
Models are terrible at:
- Verifying visual output they just created
- Maintaining spatial awareness across tokens
- “Seeing” the result of their generation
The coordinate-based approach plays to the model’s strengths. Instead of fighting the architecture, it works with it.
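The "generate structured data" strength is easy to enforce in code: parse what the model returns and reject it before it reaches the grid engine. Below is a minimal sketch. It assumes the model is asked to return a JSON list of boxes with `label`, `row` and `col` fields. That schema and the `parse_boxes` name are hypothetical.

```python
import json

# Hypothetical schema for one box in the model's reply
REQUIRED_BOX_FIELDS = {"label": str, "row": int, "col": int}


def parse_boxes(model_output: str) -> list[dict]:
    """Parse a JSON list of boxes and reject anything malformed."""
    data = json.loads(model_output)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of boxes")
    for i, box in enumerate(data):
        if not isinstance(box, dict):
            raise ValueError(f"box {i} is not an object")
        for field, kind in REQUIRED_BOX_FIELDS.items():
            value = box.get(field)
            # bool is a subclass of int in Python, so rule it out explicitly
            if not isinstance(value, kind) or isinstance(value, bool):
                raise ValueError(f"box {i}: field {field!r} must be {kind.__name__}")
    return data


print(parse_boxes('[{"label": "A", "row": 0, "col": 0}]'))  # [{'label': 'A', 'row': 0, 'col': 0}]
```

Every failure raises `ValueError`. A caller can catch that one exception and send the message back to the model as feedback.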
A Warning About the Obvious “Solution”
You might think: “Just use Mermaid or Excalidraw instead.”
That’s not solving the problem. That’s switching tools. It’s valid if you just need a diagram. But if you’re building a system that needs to generate visual output, you still need to understand why models fail.
The same failure mode will show up whenever you ask a model to produce something it cannot verify:
"Format this JSON nicely" -> Model can't verify brace balance
"Create a markdown table" -> Model can't verify column alignment
"Draw a box around this text" -> Model can't verify box closure
Each of these can be solved with the same pattern: replace the visual task with a verifiable API task.
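For the markdown table case, the replacement is to let the model supply only the cell contents and have code compute the column widths. Below is a minimal sketch. `markdown_table` is a hypothetical helper, and it assumes cells contain no `|` characters or newlines.

```python
def markdown_table(header: list[str], rows: list[list[str]]) -> str:
    """Render a Markdown table whose columns are padded to a computed width."""
    # zip(header, *rows) yields one tuple per column
    widths = [max(len(cell) for cell in column) for column in zip(header, *rows)]

    def fmt(cells):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([fmt(header), separator] + [fmt(r) for r in rows])


print(markdown_table(
    ["Task", "Model Struggles With"],
    [["Tables", "Column width consistency"], ["JSON generation", "Brace balance"]],
))
```

The model's output here is just lists of strings, which it produces reliably. Alignment is never its job.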
The Three-Question Test
Before asking any model to produce visual output, I now ask:
1. Can the model verify what it just produced?
   If no -> The task is structurally problematic
2. Can I replace the visual output with API calls?
   If yes -> Do that instead
3. Can I add programmatic verification?
   If yes -> Verify before showing the result

If the answer to all three is “no,” I’ve found a task models simply cannot do reliably.
Summary
LLMs cannot “see” visual output they generate. This is a structural limitation, not a prompting failure.
The solution: don’t ask models to do visual tasks. Ask them to generate coordinates and parameters. Then use code to verify and render.
The pattern is simple but powerful:
Bad:  Model -> Visual Output -> Hope it's right
Good: Model -> Coordinates -> Verify -> Render

If you find yourself writing prompt after prompt trying to fix broken visual output, stop. You’re fighting architecture. Build a verification layer instead.
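The "Good" pipeline, including the regenerate-on-failure step, can be sketched as a generic loop. Everything here is hypothetical. `propose` stands in for a model call that receives the previous errors as feedback. The toy `verify`/`render` pair places box B on the same row as box A, which occupies columns 0 to 4.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def generate_verified(
    propose: Callable[[list[str]], T],
    verify: Callable[[T], list[str]],
    render: Callable[[T], str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Model -> parameters -> verify -> render; returns None if no attempt passes."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        candidate = propose(feedback)   # model emits parameters, seeing the last errors
        feedback = verify(candidate)    # code checks them; an empty list means valid
        if not feedback:
            return render(candidate)    # render only after verification passes
    return None


# Stand-in for a model: B's left column on successive attempts
answers = iter([3, 11])


def propose(feedback):
    return next(answers)


def verify(col):
    return [] if col >= 6 else [f"B at column {col} leaves no room for an arrow after A"]


def render(col):
    return "| A |" + "-" * (col - 6) + ">" + "| B |"


print(generate_verified(propose, verify, render))  # | A |----->| B |
```

The first answer (column 3) fails verification, and its error is passed back on the next call. The second answer (column 11) passes and is the only one ever rendered.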
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email: Email me
Here are the most important links from this article, along with some further resources on this topic:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!