How Do I Stop AI Coding Assistants from Over-Engineering My Code?

Mar 11, 2026

The Problem

I asked my AI coding assistant to add error handling to a single function. Twenty minutes later, I was reviewing a refactored module with new logging infrastructure, a custom error class hierarchy, configuration abstractions, and a “future-proof” error handling system I never asked for.

This isn’t a one-time frustration. From a Reddit discussion comparing GLM5 and Codex:

“Codex tends to write code I don’t want. It will over engineer and go well outside the lines of what I ask.”

The code works. It’s technically “better” in some abstract sense. But it’s not what I asked for, and now I have to review, test, and maintain code I didn’t want.

What I Discovered About Over-Engineering AI

I spent weeks comparing different AI coding assistants—Claude, GPT-5.4-codex, and GLM5—across a mid-sized Clojure project. The pattern became clear: high-capability models leverage their extended context and reasoning to anticipate needs you never expressed.

Here’s what that looks like in practice.

The Codex Experience

When I asked Codex to add validation to a user input function:

(defn process-user [input]
  (-> input
      (transform-data)
      (save-to-db)))

Codex produced this:

(defn process-user
  "Processes user input with comprehensive validation and error handling.
   Supports multiple input formats and provides detailed error messages."
  [input]
  (let [validator (create-validator {:schema :user-input
                                     :strict-mode true})
        result (validator/validate validator input)]
    (if (:valid? result)
      (try
        (let [transformed (transform-data input)
              saved (save-to-db transformed)]
          {:success true :data saved})
        (catch Exception e
          (log/error "Processing failed"
                     {:input input
                      :error (.getMessage e)
                      :timestamp (System/currentTimeMillis)})
          {:success false :error :processing-failed}))
      {:success false
       :errors (:errors result)
       :validation-failed true})))

Notice what happened:

- Added a validation framework I didn’t request
- Changed the return type from data to a result map
- Added logging infrastructure
- Included error codes and timestamps
- Added a docstring describing features I never asked for, such as support for multiple input formats

None of this is wrong. But I asked for validation, not a validation framework.
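The return-type change is the easiest of these to overlook, because callers written against the old shape keep running and quietly get nil. A small sketch using made-up result maps (the `:id` and `:name` keys and the `user-id` helper are assumptions for illustration, not code from my project):

```clojure
;; Hypothetical stand-ins for what process-user returned before and after.
(def original-result {:id 42 :name "Ada"})
(def wrapped-result  {:success true :data {:id 42 :name "Ada"}})

;; A caller written against the original shape reads :id directly.
(defn user-id [result]
  (:id result))

(user-id original-result) ;; => 42
(user-id wrapped-result)  ;; => nil, because :id now lives under :data
```

Every existing caller would need the same adjustment, which is review and test work the original request never implied.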

The GLM Difference

The same request with GLM produced:

(defn process-user [input]
  (when (valid-input? input)
    (-> input
        (transform-data)
        (save-to-db))))

GLM added a simple validation check. That’s it. The return type stayed the same for valid input (invalid input now returns nil). No new dependencies. No infrastructure changes.
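The snippet calls `valid-input?` without showing its body. A minimal sketch of what such a predicate could look like, assuming user input is a map with `:name` and `:email` keys (the keys and the specific checks are hypothetical, chosen only for illustration):

```clojure
(require '[clojure.string :as str])

;; Hypothetical predicate: the input shape (:name, :email) is an assumption.
(defn valid-input?
  "Returns true when input is a map with a non-blank :name and an :email
   that contains an @ sign."
  [input]
  (boolean
   (and (map? input)
        (string? (:name input))
        (not (str/blank? (:name input)))
        (string? (:email input))
        (str/includes? (:email input) "@"))))

(valid-input? {:name "Ada" :email "ada@example.com"}) ;; => true
(valid-input? {:name "" :email "ada@example.com"})    ;; => false
```

A plain predicate like this composes with the existing threading pipeline and needs nothing beyond `clojure.string`.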

From the Reddit discussion:

“Although it holds context longer, codex tends to not follow my instructions as well as glm.”

GLM follows instructions more precisely. Codex reasons about what you “should” want. Both approaches have trade-offs.

Why This Happens

Over-engineering stems from three factors:

1. Context Accumulation

High-capability models accumulate “improvement ideas” as they work through your codebase. A 200K context window means 200K tokens of opportunity to notice patterns and suggest enhancements.

Session progression (my observation):
- Start: "Add validation to this function"
- After reading related code: "I should standardize validation patterns"
- After seeing error handling elsewhere: "I should add comprehensive error handling"
- After noticing logging: "I should add structured logging"
- Result: A comprehensive error handling system when you asked for validation

2. Reasoning Prioritization

Models like Codex prioritize reasoning depth over instruction precision. They ask: “What’s the best solution?” instead of “What’s the solution I was asked for?”

3. Minimalist Ecosystem Mismatch

In ecosystems like Clojure that favor composability over frameworks, over-engineering is particularly disruptive. A Reddit commenter noted:

“In the Clojure world we tend to not use web app frameworks… just a collection of hand picked libraries.”

When your codebase values simplicity and minimal abstractions, AI that adds “flexibility layers” creates technical debt, not improvement.

What Actually Works

I developed several techniques to keep AI output aligned with my instructions.

1. Work in Shorter Sessions

Instead of:
- "Refactor the authentication module for better security"

Do this:
1. "Add rate limiting to the login endpoint"
2. [Review output immediately]
3. "Add input validation to password reset"
4. [Review output immediately]
5. "Add audit logging for failed logins"

Breaking work into smaller pieces with reviews between each piece catches over-engineering before it compounds.

2. Use Explicit Constraints

State the task, then spell out what NOT to do before giving any implementation details:

Task: Add rate limiting to the login endpoint

Constraints:
- Do NOT modify any other functions
- Do NOT add new dependencies
- Do NOT change the return type
- Do NOT add logging or monitoring
- ONLY modify the login function

Specific implementation:
- Limit to 5 requests per minute per IP
- Return 429 status on limit exceeded

In my experience, the negative constraints matter more than the positive ones, because they rule out the extras the model would otherwise volunteer.
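If you send prompts like this often, the template can be generated so the constraint section is never forgotten. A minimal sketch, assuming prompts are plain strings; `constrained-prompt` and its `:task`, `:constraints`, and `:details` keys are hypothetical names, not part of any assistant's API:

```clojure
(require '[clojure.string :as str])

(defn- bullets
  "Prefixes each item with a dash so it renders as a list line."
  [items]
  (map #(str "- " %) items))

(defn constrained-prompt
  "Builds a prompt with the task first, then constraints, then details."
  [{:keys [task constraints details]}]
  (str/join "\n"
            (concat [(str "Task: " task) "" "Constraints:"]
                    (bullets constraints)
                    ["" "Specific implementation:"]
                    (bullets details))))

(println
 (constrained-prompt
  {:task "Add rate limiting to the login endpoint"
   :constraints ["Do NOT modify any other functions"
                 "Do NOT add new dependencies"]
   :details ["Limit to 5 requests per minute per IP"
             "Return 429 status on limit exceeded"]}))
```

Keeping the constraints as data also makes it easy to reuse the same "do not" list across several small requests in a session.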

3. Choose the Right Model for the Task

| Task Type                | Recommended Model | Why                                      |
|--------------------------|-------------------|------------------------------------------|
| Precise edits, bug fixes | GLM               | Better instruction following             |
| Complex architecture     | Codex             | Better at anticipating needs             |
| Legacy code modification | GLM               | Less likely to "modernize" code unasked  |
| Minimalist ecosystems    | GLM               | Respects compositional patterns          |
| Greenfield development   | Codex             | Can leverage full reasoning              |

Match model capabilities to your needs, not the other way around.

4. Layer Your Prompts

Describe the minimal viable change first, then add negative constraints:

Layer 1 (What I want):
- Add null check to getUserById()

Layer 2 (What I don't want):
- Do not refactor surrounding code
- Do not add new abstractions
- Do not change error handling patterns

Layer 3 (Scope limit):
- Only modify the getUserById function
- Only add the null check, nothing else

5. Set Review Boundaries

Explicit checkpoints force you to review before AI continues:

# Checkpoint workflow:
# 1. Request specific change
# 2. Review output immediately
# 3. If output exceeds scope, reset and retry with tighter constraints
# 4. Only proceed when output matches request

# Red flag patterns in output:
# - "I also added..."
# - "For future extensibility..."
# - "To make this more robust..."
# - Any new imports or dependencies
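The phrase-based red flags can be checked mechanically before you read a reply in full. A minimal sketch, assuming the assistant's reply is available as a plain string; the name `red-flags` is hypothetical, and the dependency check is left out because it needs the diff rather than the reply text:

```clojure
(require '[clojure.string :as str])

;; Lower-cased versions of the phrases listed in the checkpoint workflow.
(def red-flag-phrases
  ["i also added"
   "for future extensibility"
   "to make this more robust"])

(defn red-flags
  "Returns the red-flag phrases that occur in the reply, ignoring case."
  [reply]
  (let [text (str/lower-case reply)]
    (filterv #(str/includes? text %) red-flag-phrases)))

(red-flags "Done. I also added a retry helper.") ;; => ["i also added"]
(red-flags "Added the null check.")              ;; => []
```

A non-empty result is a prompt to reset and retry with tighter constraints, not proof that the change is wrong.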

The Real Cost of Over-Engineering

Every line of generated code requires review effort. When AI produces 50 lines for a 5-line request:

- Review time: You must understand all 50 lines
- Testing burden: More code means more test cases
- Maintenance cost: Someone must maintain code they didn’t write
- Trust erosion: You start assuming AI output is wrong by default

In my Clojure project, I spent more time reviewing and trimming over-engineered code than the initial generation saved me.
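A rough way to spot this early is to count the lines a change adds and compare that with what the request should have needed. This is a sketch under stated assumptions: the diff is in unified format, and `added-lines`, `over-scope?`, and the factor of 2 are hypothetical choices, not a measured threshold:

```clojure
(require '[clojure.string :as str])

(defn added-lines
  "Counts lines added in a unified diff, skipping the +++ file headers.
   Rough heuristic: an added line whose own text begins with ++ is missed."
  [diff-text]
  (->> (str/split-lines diff-text)
       (filter #(and (str/starts-with? % "+")
                     (not (str/starts-with? % "+++"))))
       count))

(defn over-scope?
  "True when the diff adds more than factor times the expected line count."
  [diff-text expected-lines factor]
  (> (added-lines diff-text) (* factor expected-lines)))

;; Usage: a 5-line request that came back with 50 added lines.
(def sample-diff
  (str/join "\n" (cons "+++ b/src/core.clj" (repeat 50 "+(new-line)"))))

(added-lines sample-diff)     ;; => 50
(over-scope? sample-diff 5 2) ;; => true
```

Line counts say nothing about quality, but a tenfold overshoot is a cheap signal that the output went beyond the request.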

Common Mistakes to Avoid

Assuming Longer Context Is Always Better

My assumption: "200K context means the model understands my codebase better."

Reality: "200K context means the model has 200K tokens to accumulate 'good ideas' I never asked for."

Longer context is valuable for understanding your codebase. It’s a liability when you need precise, minimal changes.

Providing Vague Instructions

BAD: "Improve this code"
GOOD: "Fix the null pointer exception in getUserById(). Do not modify any other functions."

Vague instructions invite interpretation. Explicit instructions constrain interpretation.

Letting AI Run Autonomously Across Files

When AI has permission to modify multiple files, over-engineering spreads. One file becomes five. A simple change becomes a refactoring effort.
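One way to notice the spread early is to compare the files a change touched against the files you expected it to touch. A rough sketch, assuming you pass in the text printed by `git diff --name-only`; `unexpected-files` and the paths in the usage example are hypothetical:

```clojure
(require '[clojure.string :as str])

(defn unexpected-files
  "Given the set of files you expected to change and the output of
   `git diff --name-only`, returns the changed files outside that set."
  [allowed name-only-output]
  (->> (str/split-lines name-only-output)
       (remove str/blank?)
       (remove (set allowed))
       vec))

(unexpected-files #{"src/app/auth.clj"}
                  "src/app/auth.clj\nsrc/app/logging.clj\n")
;; => ["src/app/logging.clj"]
```

An empty vector means the change stayed inside the files you named; anything else is worth reviewing before the next request.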

Choosing Reasoning-Heavy Models for Precise Edits

Codex excels at reasoning. GLM excels at instruction following. When you need the latter, don’t use the former.

Not Stating What NOT to Change

Positive constraints define what you want. Negative constraints define what you don’t want. Both matter.

Summary

In this post, I showed how to prevent AI coding assistants from over-engineering by matching model capabilities to task types, working in short sessions with explicit constraints, and reviewing output incrementally.

The key insight is that high-capability models like Codex prioritize reasoning over instruction precision. They produce technically better code that isn’t what you asked for. GLM follows instructions more precisely but may lack the reasoning depth for complex architectural decisions.

Control AI coding assistants by:

- Working in short sessions with incremental reviews
- Using explicit constraints, especially negative ones
- Choosing instruction-focused models for precise edits
- Setting review boundaries to catch over-engineering early
- Never assuming longer context is automatically better

The goal isn’t to prevent AI from being helpful—it’s to ensure AI output matches what you actually need, not what the model decides you should have.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
