AI Writes Working Code But Fails at Architecture: What the 26-Point Gap Means
The Problem
I asked an AI to write a network connection handler. It produced working code—connections opened, data flowed, responses returned. But when I looked at the architecture, I saw a mess.
The connection layer embedded the HTTP parser directly. Each fetch() call created its own epoll instance (a Linux I/O notification mechanism). Three different timer implementations coexisted in the same codebase. Request types were split into two separate hierarchies with no clear relationship.
The code worked. But I couldn’t maintain it.
Then I ran an experiment. I asked an NGINX engineer to score two versions of the code, the AI-only output and the version refactored under human architectural guidance, on six dimensions. The results:
| Dimension | AI-Only Score | After Human Guidance |
|---|---|---|
| Functionality | 8/10 | 8/10 (unchanged) |
| Readability | 8/10 | 8/10 (unchanged) |
| Async Design | 7/14 | 14/14 (+7) |
| Abstraction Quality | 5/9 | 9/9 (+4) |
| Maintainability | 4/7 | 7/7 (+3) |
| Code Organization | 5/8 | 8/8 (+3) |
| Total | 59/105 | 85/105 |
The 26-point gap came entirely from architecture. AI scored well on functionality and readability—things that don’t need strategic judgment. It failed where design decisions matter.
Why AI Writes “Piled Together” Code
AI writes code like building a house without blueprints. Each wall is well-built, but the overall layout makes no sense.
Here’s what the AI generated for a connection structure:

    // AI's approach: Everything in one layer
    typedef struct connection_s {
        // Connection state
        int fd;
        int state;

        // HTTP parser embedded here
        http_parser parser;
        char buffer[4096];

        // Timer mixed in
        rbtree_node_t timer;

        // Request state scattered
        int request_type;
        void *request_data;
    } connection_t;

See the problem? The connection struct knows about HTTP parsing, buffering, timers, and request state. These are separate concerns that should be separate concepts.
When the HTTP parser changes, the connection struct changes. When timer implementation changes, the connection struct changes. When I need to reuse the connection logic for a different protocol, I can’t—I’m locked into HTTP.
This is the “piled together” pattern. Modules are stacked without clear boundaries. AI generates code that works locally but creates chaos globally.
What Proper Architecture Looks Like
I gave the AI architectural constraints instead of specific code changes:
“Separate concerns into distinct concepts. The connection layer shouldn’t know about HTTP. The timer mechanism shouldn’t be tied to connections.”
From these constraints, the AI derived the refactoring:

    // Proper layering with clear abstractions
    typedef struct connection_s {
        int fd;
        int state;
    } connection_t;

    typedef struct http_peer_s {
        connection_t *conn;
        http_parser *parser;
        buffer_t *in;
        buffer_t *out;
    } http_peer_t;

    typedef struct timer_s {
        rbtree_node_t node;
        void (*handler)(void *data);
        void *data;
    } timer_t;

Now each concept is isolated. The connection layer handles only TCP state. The HTTP peer layer composes connection with parsing and buffering. The timer is a standalone utility.
I can test each component independently. I can reuse the connection layer for WebSockets or custom protocols. When the HTTP spec changes, I modify http_peer_t without touching connection_t.
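The reuse claim can be made concrete with a small sketch. It assumes the connection exposes a data callback that the protocol layer installs. The `on_data` and `owner` fields, `connection_feed`, and both peer types are hypothetical names for illustration, not part of the original code. The line-counting peer stands in for a real HTTP parser.

```c
#include <stddef.h>

/* Protocol-agnostic connection: it only knows about bytes and a callback. */
typedef struct connection_s connection_t;
typedef void (*conn_data_cb)(connection_t *conn, const char *data, size_t len);

struct connection_s {
    int fd;
    int state;
    conn_data_cb on_data;  /* hypothetical hook, installed by the protocol layer */
    void *owner;           /* opaque pointer back to the protocol object */
};

/* Called by the I/O layer when bytes arrive; forwards them upward. */
void connection_feed(connection_t *conn, const char *data, size_t len) {
    if (conn->on_data != NULL) {
        conn->on_data(conn, data, len);
    }
}

/* First protocol: counts newline-terminated lines (stand-in for HTTP parsing). */
typedef struct {
    connection_t *conn;
    int lines_seen;
} line_peer_t;

static void line_peer_on_data(connection_t *conn, const char *data, size_t len) {
    line_peer_t *peer = conn->owner;
    for (size_t i = 0; i < len; i++) {
        if (data[i] == '\n') {
            peer->lines_seen++;
        }
    }
}

void line_peer_attach(line_peer_t *peer, connection_t *conn) {
    peer->conn = conn;
    peer->lines_seen = 0;
    conn->owner = peer;
    conn->on_data = line_peer_on_data;
}

/* Second protocol on the same connection type: counts raw bytes. */
typedef struct {
    connection_t *conn;
    size_t bytes_seen;
} byte_peer_t;

static void byte_peer_on_data(connection_t *conn, const char *data, size_t len) {
    (void)data;  /* this peer only cares about the length */
    byte_peer_t *peer = conn->owner;
    peer->bytes_seen += len;
}

void byte_peer_attach(byte_peer_t *peer, connection_t *conn) {
    peer->conn = conn;
    peer->bytes_seen = 0;
    conn->owner = peer;
    conn->on_data = byte_peer_on_data;
}
```

Neither peer required a change to `connection_t`, and each can be tested by calling `connection_feed` with a byte string, with no socket involved.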
The Constraint-Not-Solution Pattern
The key insight: I didn’t tell the AI to “split this struct into three.” I gave it architectural direction.
This pattern works because:
- Constraints propagate. One accurate constraint (“separate concerns”) generates dozens of specific changes across files.
- AI handles derivation well. Given a direction, AI produces correct implementation details.
- Humans provide judgment. I know that connection and HTTP parsing should be separate. AI doesn’t; it just knows syntax.
The mistake I see often: engineers give AI specific implementation instructions instead of architectural guidance.

    WRONG: "Split the connection struct into three structs"
    RIGHT: "The connection layer shouldn't know about HTTP parsing"

The first tells AI what to do. The second tells AI the design principle. The second produces better architecture because AI applies the principle consistently across the codebase.
Why Architecture Matters More as AI Improves
Here’s the paradox: as code-writing costs approach zero, architecture becomes MORE important, not less.
Before AI, writing code was expensive. You’d think carefully about design because implementation took time. Bad architecture hurt, but you could only write so much technical debt.
With AI, implementation is cheap. You can generate thousands of lines in minutes. Without architectural guidance, you generate technical debt at scale: code that is fast to produce and impossible to maintain.
I’ve seen teams produce more unmaintainable code in one month with AI than they could produce in a year manually. The bottleneck shifts from “how fast can we write code?” to “how fast can we understand what we wrote?”
Architecture is the blueprint. Code is the bricks. AI lays bricks faster than any human, but cannot draw blueprints.
Common Architectural Mistakes When Using AI
I’ve made these mistakes myself. Learn from them:
Reviewing only functionality and naming. AI does these well. The problems hide in module boundaries and dependency directions.
Ignoring module boundaries. Check which modules depend on which. Circular dependencies, wrong-layer dependencies—these are where AI-generated architecture fails.
Not expressing concepts as proper types. AI often represents domain concepts as primitives (int, string) instead of types. This scatters logic across the codebase.
Leaving complexity unisolated. Complex logic (async state machines, protocol parsing) should be isolated. AI tends to spread complexity around.
Here’s an example of the type problem:

    // AI's approach: Primitive types
    void handle_request(int type, char* data, int len);

    // Better: Express concepts as types
    typedef struct {
        request_type_t type;
        buffer_t data;
    } request_t;

    void handle_request(request_t* req);

The typed version catches errors at compile time. The primitive version requires runtime checks everywhere.
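A fuller sketch of the typed version follows. The article does not define `request_type_t` or `buffer_t`, so the enumerators and fields here are assumptions. In C the compile-time guarantee is weaker than in languages with stricter enums, since integers convert to enum types implicitly. Two benefits still hold: the length always travels with its data, and GCC and Clang can warn about an unhandled enumerator in a `switch` (`-Wswitch`, which `-Wall` enables).

```c
#include <stddef.h>

/* Hypothetical request kinds; the article does not list them. */
typedef enum { REQUEST_GET, REQUEST_POST } request_type_t;

/* Pointer and length kept together so they cannot drift apart. */
typedef struct {
    const char *ptr;
    size_t len;
} buffer_t;

typedef struct {
    request_type_t type;
    buffer_t data;
} request_t;

/* Returns the number of body bytes the handler would process. */
size_t handle_request(const request_t *req) {
    switch (req->type) {
    case REQUEST_GET:
        return 0;              /* GET carries no body in this sketch */
    case REQUEST_POST:
        return req->data.len;
    }
    return 0;                  /* unreachable for valid enumerators */
}
```

Adding a third enumerator without updating the `switch` produces a compiler warning, which is the kind of check the primitive signature cannot offer.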
How to Guide AI Toward Better Architecture
I use a checklist when reviewing AI-generated code:
- Check layer boundaries. Does lower-level code know about higher-level concepts?
- Check abstraction quality. Are concepts expressed as types or scattered as primitives?
- Check dependency direction. Do dependencies point inward (toward core) or outward?
- Check cohesion. Does each module do one thing, or multiple unrelated things?
- Check coupling. Can I change one module without changing others?
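The dependency-direction check can be partly mechanized. The sketch below assumes the module graph has already been extracted by hand or by a script into a small adjacency matrix. `dep_graph_t` and `has_dependency_cycle` are illustrative names, and the depth-first search is a standard cycle check rather than anything specific to this project.

```c
#include <stdbool.h>

#define MAX_MODULES 8

/* deps[a][b] is true when module a depends on module b. */
typedef struct {
    int count;
    bool deps[MAX_MODULES][MAX_MODULES];
} dep_graph_t;

enum { UNVISITED, IN_PROGRESS, DONE };

/* Depth-first search; reaching an IN_PROGRESS node means a back edge. */
static bool visit(const dep_graph_t *g, int node, int *color) {
    color[node] = IN_PROGRESS;
    for (int next = 0; next < g->count; next++) {
        if (!g->deps[node][next]) {
            continue;
        }
        if (color[next] == IN_PROGRESS) {
            return true;  /* cycle found */
        }
        if (color[next] == UNVISITED && visit(g, next, color)) {
            return true;
        }
    }
    color[node] = DONE;
    return false;
}

/* Returns true when any module transitively depends on itself. */
bool has_dependency_cycle(const dep_graph_t *g) {
    int color[MAX_MODULES] = { UNVISITED };
    for (int i = 0; i < g->count; i++) {
        if (color[i] == UNVISITED && visit(g, i, color)) {
            return true;
        }
    }
    return false;
}
```

With modules numbered 0 (timer), 1 (connection), and 2 (HTTP peer), the edges 2→1 and 1→0 pass the check. Adding 0→2 makes the timer depend on HTTP and the function reports a cycle.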
When I find problems, I don’t fix the code. I give the AI the architectural constraint:
“This module has two responsibilities. Split them.”
“This type knows about its caller. Invert the dependency.”
“This concept is represented as three parameters. Make it a type.”
AI then produces the refactoring. I review the architecture again. The cycle continues until the design is sound.
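As one illustration of "invert the dependency", here is a minimal sketch of a timer that knows only a deadline and a callback. It makes several assumptions. A sorted linked list stands in for the red-black tree. The type is named `app_timer_t` because POSIX already defines `timer_t`. `timer_queue_t`, `connection_on_timeout`, and the simplified `connection_t` are hypothetical.

```c
#include <stddef.h>

/* Standalone timer: it never names the type that owns it. */
typedef struct app_timer_s {
    long deadline_ms;
    void (*handler)(void *data);
    void *data;
    struct app_timer_s *next;  /* sorted list; stand-in for an rbtree node */
} app_timer_t;

typedef struct {
    app_timer_t *head;
} timer_queue_t;

/* Insert while keeping the list ordered by deadline, earliest first. */
void timer_queue_add(timer_queue_t *q, app_timer_t *t) {
    app_timer_t **slot = &q->head;
    while (*slot != NULL && (*slot)->deadline_ms <= t->deadline_ms) {
        slot = &(*slot)->next;
    }
    t->next = *slot;
    *slot = t;
}

/* Fire every timer whose deadline has passed; returns how many fired. */
int timer_queue_expire(timer_queue_t *q, long now_ms) {
    int fired = 0;
    while (q->head != NULL && q->head->deadline_ms <= now_ms) {
        app_timer_t *t = q->head;
        q->head = t->next;
        t->next = NULL;
        t->handler(t->data);
        fired++;
    }
    return fired;
}

/* Caller side: the connection layer plugs itself in from the outside. */
typedef struct {
    int fd;
    int timed_out;
} connection_t;  /* simplified for this sketch */

void connection_on_timeout(void *data) {
    connection_t *conn = data;
    conn->timed_out = 1;
}
```

The dependency now points one way: the connection code includes the timer's header, and the timer compiles without any knowledge of connections.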
The 26-Point Gap in Practice
Let me show you a real example. The AI generated this async fetch implementation:

    // AI's approach: Each fetch creates its own event loop
    int fetch(const char* url, response_t* out) {
        int epfd = epoll_create1(0);  // New epoll instance per fetch

        int sockfd = connect_nonblock(url);
        struct epoll_event ev = {.events = EPOLLIN | EPOLLOUT, .data.fd = sockfd};
        epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev);

        while (1) {
            struct epoll_event events[1];
            int n = epoll_wait(epfd, events, 1, timeout);

            if (n > 0) {
                // Handle I/O... (break out of the loop once the response is complete)
            }
        }

        close(epfd);
        return 0;
    }

This works for a single fetch. But what if I need 100 concurrent fetches? I’d have 100 epoll instances, 100 event loops, 100 separate wait calls.
The architectural constraint:
“Use a shared event loop. The fetch function should register callbacks, not run its own event loop.”
AI’s refactored version:

    // Shared event loop architecture
    typedef struct {
        const char* url;
        response_t* out;
        callback_t on_complete;
        void* user_data;
    } fetch_request_t;

    void fetch_async(fetch_request_t* req) {
        int sockfd = connect_nonblock(req->url);

        // Register with shared event loop
        event_loop_register(
            global_loop,
            sockfd,
            EPOLLIN | EPOLLOUT,
            on_socket_ready,
            req
        );
    }

    // Shared event loop runs elsewhere
    void event_loop_run(event_loop_t* loop) {
        while (1) {
            struct epoll_event events[MAX_EVENTS];
            int n = epoll_wait(loop->epfd, events, MAX_EVENTS, -1);

            for (int i = 0; i < n; i++) {
                // epoll_data is a union (ptr, fd, u32, u64), so this assumes
                // event_loop_register() stored a handler record in data.ptr.
                // event_handler_t is a hypothetical {cb, user_data} struct.
                event_handler_t* h = events[i].data.ptr;
                h->cb(h->user_data);
            }
        }
    }

Now I can handle thousands of concurrent fetches with one event loop. The architecture scales because the design is right.
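The dispatch step deserves a closer look, because `epoll_data` in `<sys/epoll.h>` is a union (`ptr`, `fd`, `u32`, `u64`) and can carry only one value per registration. One way to attach both a callback and its argument is to keep a small handler record per descriptor and store its address in `data.ptr`. The sketch below is Linux-only. The names `event_handler_t`, `event_loop_init`, `event_loop_run_once`, `event_loop_close`, and `count_ready` are assumptions for illustration. The register function takes a handler record rather than the five arguments used above.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

typedef void (*callback_t)(void *user_data);

/* One record per registered fd; epoll hands this pointer back on readiness. */
typedef struct {
    int fd;
    callback_t cb;
    void *user_data;
} event_handler_t;

typedef struct {
    int epfd;
} event_loop_t;

/* Create the single epoll instance shared by every registration. */
int event_loop_init(event_loop_t *loop) {
    loop->epfd = epoll_create1(0);
    return loop->epfd >= 0 ? 0 : -1;
}

/* The handler record must stay alive for as long as the fd is registered. */
int event_loop_register(event_loop_t *loop, event_handler_t *h, uint32_t events) {
    struct epoll_event ev = { .events = events, .data.ptr = h };
    return epoll_ctl(loop->epfd, EPOLL_CTL_ADD, h->fd, &ev);
}

/* Wait once and dispatch ready handlers; returns the count, or -1 on error. */
int event_loop_run_once(event_loop_t *loop, int timeout_ms) {
    struct epoll_event events[16];
    int n = epoll_wait(loop->epfd, events, 16, timeout_ms);
    for (int i = 0; i < n; i++) {
        event_handler_t *h = events[i].data.ptr;
        h->cb(h->user_data);
    }
    return n;
}

void event_loop_close(event_loop_t *loop) {
    close(loop->epfd);
}

/* Example callback: increments the counter passed as user_data. */
void count_ready(void *user_data) {
    int *hits = user_data;
    (*hits)++;
}
```

Registering the read end of a pipe with `EPOLLIN` and writing one byte to the other end is enough to see `count_ready` fire from `event_loop_run_once`. A long-running loop would call that function repeatedly.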
Summary
In this post, I showed why AI-generated code works but lacks coherent design. The 26-point gap between working code and maintainable code comes entirely from architectural dimensions: async design, abstraction quality, maintainability, and code organization.
The solution is giving constraints, not solutions. AI handles implementation details well when given clear architectural direction. Without that direction, AI generates “piled together” code—modules stacked without clear boundaries.
As AI makes code-writing cheaper, architecture becomes more valuable. The bottleneck shifts from implementation to understanding. Architecture is the blueprint that makes AI-generated code maintainable.