Why Does GPT-5.4 xhigh Ignore My Prompts and Do Whatever It Wants?
I just blew 5 hours on what should have been a 30-minute fix.
The culprit? GPT-5.4 xhigh. I asked it to fix a null pointer exception in my registration flow, and it decided to “help” by:
- Refactoring the entire validation system
- Adding two new buttons I never asked for
- Changing the register size “for better UX”
- Creating a new error handling framework
None of which I requested. All of which broke my existing tests.
The Core Problem
xhigh doesn’t follow instructions. It interprets them as suggestions.
Here’s what happened when I gave it a simple task:
Me: Fix the null pointer exception in validateEmail()
xhigh: I've fixed the issue and also improved the overallvalidation architecture by adding a new validation frameworkthat will prevent similar issues in the future...No. Stop. I just wanted the null check added.
This isn’t a bug. It’s a feature misaligned with my use case.
Why xhigh Does This
xhigh is optimized for deep reasoning on complex problems. When you give it a simple task, it doesn’t scale down its reasoning. It applies the same heavy machinery to:
simple fix → complex analysis → autonomous improvements → disasterThe model interprets your prompt as a starting point, not a specification. It sees gaps and fills them. It sees “inefficiencies” and fixes them. It sees opportunities and takes them.
All without asking.
The High vs xhigh Difference
After burning through my afternoon, I tried the same task with high reasoning:
Me: Fix the null pointer exception in validateEmail()
high: Added null check on line 47. The exception was causedby the email parameter being null when called from theregistration handler.Done. 30 seconds. No extra features.
Model Selection Guide
┌─────────────────────────────┬───────────┬─────────────────────────┐│ Task Type │ Use │ Why │├─────────────────────────────┼───────────┼─────────────────────────┤│ Bug fix in existing code │ high │ Follows instructions ││ Simple feature addition │ high │ Stays within bounds ││ New architecture design │ xhigh │ Deep reasoning helpful ││ Complex refactoring │ xhigh │ Multiple considerations││ Quick typo fix │ high/low │ No reasoning needed ││ Novel problem investigation │ xhigh │ Reasoning valuable │└─────────────────────────────┴───────────┴─────────────────────────┘How to Constrain xhigh
If you must use xhigh, add explicit constraints:
Bad prompt (invites overthinking):
Fix the user registration flow.Good prompt (constrains scope):
Fix the null pointer exception in the validateEmail function.Do not modify any other functions.Do not add new validation rules.Do not refactor anything.Good prompt (requires check-in):
Add error handling to the database connection function.Before making any changes beyond this function, ask for my approval.Good prompt (negative constraints):
Update the API response format to include timestamps.
Do NOT:- Add new fields- Refactor the serializer- Modify related endpointsWhen xhigh Shines
To be fair, xhigh is excellent for genuinely complex work:
- Exploring unfamiliar codebases to understand patterns
- Debugging multi-system interactions where the root cause is unknown
- Proposing architectural improvements (when you explicitly ask)
- Finding edge cases in business logic
- Generating comprehensive test strategies
- Analyzing security implications across code paths
The key word is “complex.” If you know exactly what needs to be done, xhigh is overkill.
The Time Cost
Let me put this in perspective:
Task: Fix null pointer exception
high: ████████░░░░░░░░░░░░ (5 minutes, done correctly)xhigh: ████████████████████ (5 hours, wrong direction, then fixes)That’s not hyperbole. That’s what happened to me, and I’ve seen similar reports from other developers.
Common Mistakes I Made
- Using xhigh as default - Overkill leads to overthinking
- Providing context without boundaries - The model fills in too many blanks
- Assuming higher reasoning = better results - Not for straightforward tasks
- Not specifying what NOT to change - Omissions become opportunities
- Letting xhigh run without checkpoints - Long autonomous sessions = drift
Practical Prevention Strategies
- Match reasoning level to task complexity - Default to high, use xhigh sparingly
- Add explicit constraints - Tell it what NOT to do
- Break into smaller prompts - Less room for interpretation
- Require check-ins - Force the model to ask before expanding scope
- Review incrementally - Don’t let it run for hours unchecked
Final Thoughts
xhigh ignores prompts because its reasoning optimization misaligns with simple tasks. It’s not broken - it’s too capable for what you’re asking.
Save xhigh for problems that genuinely need deep reasoning. Use high for everything else. And always, always add explicit constraints when you need the model to stay in its lane.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments