Why LLMs Fail at Visual Tasks Like ASCII Diagrams (And How to Fix It)

Mar 27, 2026

Cowrie

Dev @ Bswen

I asked Claude to draw an ASCII diagram. It produced broken arrows.

So I asked again. Different prompt. Still broken.

Then I tried GPT-4. Same problem. Different broken arrows, but still broken.

I spent two hours tweaking prompts. “Be more careful.” “Check alignment.” “Use fixed-width font.” Nothing worked.

That’s when I realized: I was trying to prompt my way out of a structural limitation.

The Problem: Models Can’t See What They Draw

LLMs generate text token by token. Left to right. Top to bottom. They don’t have spatial awareness of what they’ve already created.

When a model draws:

+---+      +---+
| A |----->| B |
+---+      +---+
     |
     v
    +----+
    | C  |
    +----+

It doesn’t know the arrow from B to C is misaligned until after it’s committed to the output. By then, it’s too late.

The model literally cannot verify its own visual work. It’s blind to the structure it just created.
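The misalignment in the diagram above is easy to detect once the text is treated as a grid. Here is a minimal sketch of that check. The helper names `box_spans` and `arrow_is_under_a_box` are made up for illustration, and the code assumes the boxes of interest sit on the first line and that the only `v` in the diagram is the arrowhead.

```python
import re

DIAGRAM = """\
+---+      +---+
| A |----->| B |
+---+      +---+
     |
     v
    +----+
    | C  |
    +----+"""


def box_spans(border_line):
    """Return (first_col, last_col) for each +---+ run on a border line."""
    return [(m.start(), m.end() - 1) for m in re.finditer(r"\+-+\+", border_line)]


def arrow_is_under_a_box(diagram):
    """True if the 'v' arrowhead sits in a column covered by a top-row box."""
    lines = diagram.splitlines()
    spans = box_spans(lines[0])
    # Assumption: the first line containing 'v' holds the arrowhead.
    arrow_col = next(line.index("v") for line in lines if "v" in line)
    return any(first <= arrow_col <= last for first, last in spans)


print(box_spans(DIAGRAM.splitlines()[0]))  # [(0, 4), (11, 15)]
print(arrow_is_under_a_box(DIAGRAM))       # False: column 5 is between A and B
```

A dozen lines of code can see what the model could not: the arrow hangs in the gap between the two boxes.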

Why All the “Fixes” Failed

I tried everything I could think of:

[x] "Be more careful with alignment"        -> Still broken
[x] "Check your output before finishing"    -> Model can't check what it can't see
[x] "Use this template"                     -> Works for template, fails on variations
[x] "Draw in a grid"                        -> Model loses track of position
[x] Screenshot + vision model verification  -> Too slow, still unreliable

Every workaround shared the same flaw: I was asking the model to do something it structurally cannot do.

Then I saw a thread on r/ClaudeAI. Alex Ellis (OpenFaaS founder) had posted about the same problem. 2.8K views, tons of replies. Every single response suggested fixing the output: manual vim fixes, Python validators, switching to Mermaid, using Excalidraw.

Nobody asked why the model fails at this task.

The Structural Explanation

The issue isn’t prompting. It’s architecture.

Model generates: +---+
                  | A |   <- Has no idea where the right edge is
                  +---+     <- Just placing characters

Model generates:       +---+
                      | B |   <- Where did A end? Can't check.
                      +---+

Model generates: ----->    <- Arrow starts... somewhere? Hope it lines up.

The model has zero spatial memory. It generates each token based on probability, not visual coherence. When it places the arrow shaft, it’s guessing where it should connect based on training data, not actual position.

This is why “check your output” prompts don’t work. The model can reread its text, but it reads a one-dimensional sequence of tokens, not a two-dimensional picture, so a misaligned column does not stand out the way it does to a human eye.
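To get a feel for the problem, consider what "which column am I in?" means on a flat string. The sketch below works on characters rather than real model tokens, so treat it as an analogy and not a description of model internals. The function name `column_of` is hypothetical.

```python
def column_of(text, index):
    """Column of text[index], found by counting back to the previous newline."""
    return index - (text.rfind("\n", 0, index) + 1)


# The first four lines of a diagram, as the single flat string it really is.
flat = "+---+      +---+\n| A |----->| B |\n+---+      +---+\n     |"

shaft = flat.rindex("|")       # position of the last character in the stream
print(shaft)                   # 56
print(column_of(flat, shaft))  # 5
```

The position in the stream (56) says nothing obvious about alignment. The column (5) only appears after explicit bookkeeping, which is exactly the kind of work that is trivial for code.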

The Real Fix: Replace Visual with Code

I stopped asking the model to draw. I started asking it to generate coordinates.

Instead of:

Prompt: "Draw an ASCII diagram with three connected boxes"

I built a grid engine with a coordinate API:

# Model generates coordinates, not ASCII
class GridDiagram:
    def __init__(self, width=80, height=40):
        self.boxes = []
        self.arrows = []
        self.grid = [[' ' for _ in range(width)] for _ in range(height)]

    def add_box(self, label, row, col, width=3, height=2):
        self.boxes.append({
            'label': label,
            'row': row,
            'col': col,
            'width': width,
            'height': height
        })

    def add_arrow(self, from_label, to_label, direction='right'):
        self.arrows.append({
            'from': from_label,
            'to': to_label,
            'direction': direction
        })

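    def _find_box(self, label):
        """Return the box with this label, or None if it was never added."""
        return next((b for b in self.boxes if b['label'] == label), None)

    def verify(self):
        """Check all connections are valid before rendering."""
        errors = []
        for arrow in self.arrows:
            from_box = self._find_box(arrow['from'])
            to_box = self._find_box(arrow['to'])
            # An arrow that names a missing box is an error, not a crash
            if from_box is None or to_box is None:
                errors.append(f"Unknown box in arrow {arrow['from']} -> {arrow['to']}")
                continue
            if not self._can_connect(from_box, to_box, arrow['direction']):
                errors.append(f"Cannot connect {arrow['from']} to {arrow['to']}")
        return len(errors) == 0, errors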

    def render(self):
        """Only called after verification passes."""
        for box in self.boxes:
            self._draw_box(box)
        for arrow in self.arrows:
            self._draw_arrow(arrow)
        return '\n'.join(''.join(row) for row in self.grid)

The model’s job becomes:

diagram = GridDiagram()
diagram.add_box("A", row=0, col=0)
diagram.add_box("B", row=0, col=10)
diagram.add_box("C", row=3, col=7)
diagram.add_arrow("A", "B", direction="right")
diagram.add_arrow("B", "C", direction="down")

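# verify() returns (ok, errors); unpack it, because a non-empty tuple is always truthy
ok, errors = diagram.verify()
if ok:
    # Only render if verification passes
    print(diagram.render())
else:
    # Ask model to regenerate coordinates, passing the errors back
    print(errors)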

The model never has to “see” alignment. It follows an API. Code verifies the result.
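The class above leaves `_draw_box` and `_draw_arrow` undefined. As a rough idea of what such helpers might do, here is a minimal, self-contained sketch. The function names and the box layout (a one-line label padded by one space on each side) are assumptions for illustration, not the actual engine.

```python
def make_grid(width, height):
    """A blank character grid, indexed as grid[row][col]."""
    return [[" "] * width for _ in range(height)]


def draw_box(grid, row, col, label):
    """Draw a 3-row box with its top-left corner at (row, col).

    Returns the column of the box's right edge so callers can attach arrows.
    """
    inner = f" {label} "
    border = "+" + "-" * len(inner) + "+"
    for offset, text in enumerate((border, "|" + inner + "|", border)):
        for i, ch in enumerate(text):
            grid[row + offset][col + i] = ch
    return col + len(border) - 1


def draw_right_arrow(grid, row, start_col, end_col):
    """Draw a horizontal arrow over columns start_col..end_col inclusive."""
    for c in range(start_col, end_col):
        grid[row][c] = "-"
    grid[row][end_col] = ">"


def render(grid):
    """Join the grid into text, trimming trailing blanks."""
    return "\n".join("".join(r).rstrip() for r in grid).rstrip()


grid = make_grid(20, 3)
a_right = draw_box(grid, 0, 0, "A")         # right edge lands at column 4
draw_box(grid, 0, 11, "B")
draw_right_arrow(grid, 1, a_right + 1, 10)  # arrow fills the gap exactly
print(render(grid))
```

Because the arrow’s start column is computed from the box’s right edge, it cannot drift. Nothing here depends on anyone "seeing" the result.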

The Verifier Catches What Models Can’t See

Here’s what the verification logic checks:

def _can_connect(self, from_box, to_box, direction):
    """Verify that an arrow can physically connect two boxes."""

    # Find the connection points
    from_edge = self._get_edge_position(from_box, direction)
    to_edge = self._get_entry_position(to_box, direction)

    # Check 1: Is there clear space for the arrow shaft?
    if not self._path_clear(from_edge, to_edge, direction):
        return False

    # Check 2: Are boxes overlapping?
    if self._boxes_overlap(from_box, to_box):
        return False

    # Check 3: Does the arrow actually connect to both boxes?
    if not self._valid_connection_point(from_edge, from_box):
        return False
    if not self._valid_connection_point(to_edge, to_box):
        return False

    return True

The verifier catches problems the model literally cannot perceive:

Corners with missing connections
Arrow heads with no shaft leading to them
Gaps in arrow runs
Boxes that overlap
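Two of these checks are short enough to sketch in full. The versions below are hypothetical stand-ins for `_boxes_overlap` and `_path_clear`. They assume each box is a dict whose `width` and `height` count grid cells including the border, and that arrows run horizontally.

```python
def boxes_overlap(a, b):
    """True if two boxes share at least one grid cell.

    (row, col) is the top-left cell; width and height count cells.
    """
    return (
        a["col"] < b["col"] + b["width"]
        and b["col"] < a["col"] + a["width"]
        and a["row"] < b["row"] + b["height"]
        and b["row"] < a["row"] + a["height"]
    )


def path_clear(grid, row, start_col, end_col):
    """True if every cell on a horizontal run is still blank."""
    return all(grid[row][c] == " " for c in range(start_col, end_col + 1))


a = {"row": 0, "col": 0, "width": 5, "height": 3}
b = {"row": 0, "col": 11, "width": 5, "height": 3}
c = {"row": 1, "col": 3, "width": 5, "height": 3}
print(boxes_overlap(a, b))  # False: six blank columns between them
print(boxes_overlap(a, c))  # True: c starts inside a

middle_row = [list("| A |      | B |")]
print(path_clear(middle_row, 0, 5, 10))  # True: the gap is empty
print(path_clear(middle_row, 0, 4, 10))  # False: column 4 is A's border
```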

Test Results

I ran 31 test cases through this system:

Test Case                         | Result
----------------------------------+--------------------------
Simple two-box horizontal arrow   | PASS
Three-box vertical flow           | PASS
Bidirectional arrows              | PASS
Crossing arrows                   | PASS
Overlapping boxes (should fail)   | FAIL (correctly rejected)
Arrow with no target              | FAIL (correctly rejected)
Arrow to non-existent box         | FAIL (correctly rejected)
...
----------------------------------+--------------------------
Total: 31 tests
False positives: 0
False negatives: 0

Across all 31 cases, no valid diagram was rejected and no broken one was accepted. The verifier catches what the model cannot see.
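For anyone who wants to run a similar tally, a scoring loop might look like the sketch below. It is not the harness behind the numbers above. The name `score`, the toy cases, and the toy verifier are all hypothetical, and it assumes a "false positive" means a valid diagram that gets rejected.

```python
def score(cases, verifier):
    """Count disagreements between a verifier and hand-labelled cases.

    cases: iterable of (diagram, is_valid) pairs.
    Returns (false_positives, false_negatives), where a false positive is a
    valid diagram the verifier rejects and a false negative is a broken
    diagram it accepts.
    """
    false_pos = false_neg = 0
    for diagram, is_valid in cases:
        accepted = verifier(diagram)
        if is_valid and not accepted:
            false_pos += 1
        elif not is_valid and accepted:
            false_neg += 1
    return false_pos, false_neg


# Toy data: each "diagram" is just the column span of two boxes on one row.
toy_cases = [
    ({"a": (0, 4), "b": (11, 15)}, True),   # boxes apart: valid
    ({"a": (0, 4), "b": (3, 7)}, False),    # boxes overlap: invalid
]


def toy_verifier(diagram):
    """Accept only if box a ends before box b begins."""
    return diagram["a"][1] < diagram["b"][0]


print(score(toy_cases, toy_verifier))    # (0, 0)
print(score(toy_cases, lambda d: True))  # (0, 1): accepts the overlap
```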

The Broader Pattern

This isn’t just about ASCII diagrams. The same principle applies to:

Task                    | Model Struggles With
------------------------+--------------------------------
ASCII diagrams          | Spatial alignment
Tables                  | Column width consistency
Code formatting         | Indentation depth
Structured output       | Nested bracket matching
JSON generation         | Brace balance

All of these are visual or structural tasks where the model lacks awareness of its own output structure.
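For JSON, the verifier already exists: the parser. A minimal sketch using Python’s standard `json` module follows. The wrapper name `check_json` is an assumption for illustration.

```python
import json


def check_json(text):
    """Return (ok, message) for a JSON string, using the parser as verifier."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return False, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return True, "ok"


balanced = '{"boxes": [{"label": "A"}]}'
unbalanced = '{"boxes": [{"label": "A"}]'  # missing the closing brace

print(check_json(balanced))
print(check_json(unbalanced))
```

The error message includes a line and column, which makes it a useful thing to feed back to the model on a retry.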

The solution pattern is always the same:

1. Identify what the model cannot verify (visual output)
2. Replace with what the model CAN verify (API calls, coordinates)
3. Add programmatic verification
4. Only render after verification passes
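The four steps above can be sketched as one small loop. Everything here is hypothetical scaffolding: `generate` stands in for a model call, and `fake_generate` replays canned answers so the example runs without one.

```python
def generate_verified(generate, verify, render, max_attempts=3):
    """Call generate() until verify() accepts the result, then render it.

    generate(errors) -> candidate
    verify(candidate) -> (ok, errors)
    render(candidate) -> str
    Raises ValueError if no attempt passes verification.
    """
    errors = []
    for _ in range(max_attempts):
        candidate = generate(errors)  # previous errors are fed back in
        ok, errors = verify(candidate)
        if ok:
            return render(candidate)
    raise ValueError(f"no valid output after {max_attempts} attempts: {errors}")


# Stand-in for a model: the first answer is too cramped, the second is fine.
canned = iter([{"a_col": 0, "b_col": 2}, {"a_col": 0, "b_col": 11}])


def fake_generate(errors):
    return next(canned)


def verify_gap(candidate):
    """Require at least six columns between the two box origins."""
    ok = candidate["b_col"] - candidate["a_col"] >= 6
    return ok, ([] if ok else ["boxes too close"])


result = generate_verified(
    fake_generate, verify_gap, lambda c: f"A@{c['a_col']} B@{c['b_col']}"
)
print(result)  # A@0 B@11
```

Capping the number of attempts matters: without it, a task the model cannot satisfy turns into an endless retry loop.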

Why This Works

Models are excellent at:

Following API specifications
Generating structured data (coordinates, parameters)
Producing valid function calls

Models are terrible at:

Verifying visual output they just created
Maintaining spatial awareness across tokens
“Seeing” the result of their generation

The coordinate-based approach plays to the model’s strengths. Instead of fighting the architecture, it works with it.

A Warning About the Obvious “Solution”

You might think: “Just use Mermaid or Excalidraw instead.”

That’s not solving the problem. That’s switching tools. It’s valid if you just need a diagram. But if you’re building a system that needs to generate visual output, you still need to understand why models fail.

The same failure mode will show up whenever you ask a model to produce something it cannot verify:

"Format this JSON nicely"     -> Model can't verify brace balance
"Create a markdown table"     -> Model can't verify column alignment
"Draw a box around this text" -> Model can't verify box closure

Each of these can be solved with the same pattern: replace the visual task with a verifiable API task.
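The table case shows the pattern well. Rather than asking for an aligned table, ask for rows of cells and let code do the padding. This is a minimal sketch; the function name `render_table` is hypothetical.

```python
def render_table(headers, rows):
    """Render a pipe table with every column padded to its widest cell."""
    table = [list(map(str, headers))] + [list(map(str, r)) for r in rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]

    def fmt(cells):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    rule = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([fmt(table[0]), rule] + [fmt(r) for r in table[1:]])


# The model only has to supply these values, never the spacing.
print(render_table(
    ["Task", "Issue"],
    [["Tables", "Column width"], ["JSON generation", "Brace balance"]],
))
```

Column widths are computed from the data, so every line comes out the same length no matter what the model puts in the cells.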

The Three-Question Test

Before asking any model to produce visual output, I now ask:

1. Can the model verify what it just produced?
   If no -> The task is structurally problematic

2. Can I replace the visual output with API calls?
   If yes -> Do that instead

3. Can I add programmatic verification?
   If yes -> Verify before showing result

If the answer to all three is “no,” I’ve found a task models simply cannot do reliably.

Summary

LLMs cannot “see” visual output they generate. This is a structural limitation, not a prompting failure.

The solution: don’t ask models to do visual tasks. Ask them to generate coordinates and parameters. Then use code to verify and render.

The pattern is simple but powerful:

Bad:  Model -> Visual Output -> Hope it's right
Good: Model -> Coordinates -> Verify -> Render

If you find yourself writing prompt after prompt trying to fix broken visual output, stop. You’re fighting architecture. Build a verification layer instead.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Here are the most important links from this article, along with some further resources on this topic:

👨‍💻 Reddit Discussion: ClaudeAI

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!