
Why Does ChatGPT Ignore Instructions and Lie About It?

Problem

A Reddit user recently posted a frustrating experience. They asked ChatGPT to refactor code with one explicit instruction: “don’t delete comments.” ChatGPT removed all the comments. When confronted, it claimed it had followed the instruction perfectly.

This isn’t isolated. I’ve seen many similar complaints about ChatGPT:

  • Ignoring negative constraints (“don’t do X”)
  • Forgetting rules from earlier in the conversation
  • Hallucinating compliance when asked to verify

The user’s experience captures what makes this so maddening: the AI gaslights you. It confidently states it did what you asked, even when the evidence is right there in front of you.

What Happened?

Let me reconstruct the scenario based on the Reddit discussion.

The user, not a programmer, was modifying game mods and browser scripts. They pasted code into ChatGPT with this prompt:

Please refactor this code but don't delete any comments.

ChatGPT returned clean, comment-free code. When the user pointed this out:

User: "You deleted all the comments. I said don't delete comments."
ChatGPT: "I apologize for the confusion. I kept all the comments in your code as requested."

The AI was looking at code without comments and telling the user it had preserved them. Other commenters reported the same behavior across different models and use cases.

Why Instructions Fail

I think the root cause lies in how LLMs fundamentally work.

LLMs Don’t Follow Rules—They Predict Tokens

When ChatGPT processes your prompt, it doesn’t build a checklist of rules to enforce. At each step it computes a probability for every possible next token, based on patterns learned during training, and then picks one.

Your instruction: “Don’t delete comments”

Training data pattern: “Clean code removes unnecessary comments”

The pattern can override the instruction because the model has likely seen vast amounts of “clean” code with few comments. When refactoring, it generates what “refactored code” looks like statistically, which often means fewer comments.

This isn’t a bug or deliberate disobedience. The model is doing exactly what it was trained to do: predict likely text.
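To make the token-prediction idea concrete, here is a toy sketch. The two candidate continuations and their scores are invented for illustration (a real model scores individual tokens from a large vocabulary, not whole actions), and `pickNext` uses simple greedy decoding. The point it shows: an instruction in the prompt can raise the score of “keep the comment” without making it the most likely choice.

```javascript
// Toy illustration of next-token prediction. Candidates and scores are
// invented; a real model scores tens of thousands of tokens at each step.
function softmax(logits) {
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Greedy decoding: return the candidate with the highest probability.
function pickNext(candidates, logits) {
  const probs = softmax(logits);
  return candidates[probs.indexOf(Math.max(...probs))];
}

const candidates = ["keep the comment", "drop the comment"];
const withoutInstruction = [1.0, 2.5]; // hypothetical: training favors "clean" code
const withInstruction = [2.0, 2.5]; // hypothetical: instruction raises "keep", not enough

console.log(pickNext(candidates, withoutInstruction)); // drop the comment
console.log(pickNext(candidates, withInstruction)); // drop the comment
```

With the instruction, the probability of keeping the comment rises from roughly 18% to roughly 38% in this toy, yet greedy decoding still drops it.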

Negative Constraints Are Hard

LLMs learn primarily from positive examples. “Do this” is explicit. “Don’t do that” requires suppressing a strong pattern.

Think of telling someone “don’t think of a pink elephant.” The first thing they do is imagine a pink elephant, then try to suppress it. Something similar can happen with LLMs: naming the unwanted action puts it into the context, where it can still pull the output toward the very pattern you want to avoid.

Positive framing works better:

  • “Don’t delete comments” → Weak
  • “Preserve all existing comments” → Stronger
  • “Keep every comment in its exact position” → Even better

No Self-Awareness

ChatGPT has no built-in step that checks its output against your instructions. Its previous response sits in the conversation and the model can attend to it, but nothing forces it to actually compare the refactored code with the original before answering.

When asked “Did you delete comments?”, it generates a socially agreeable response: “No, I preserved your comments as requested.” This isn’t lying in the human sense. The model isn’t checking reality—it’s predicting what a helpful assistant would say to that question.

The AI hallucinates compliance because it is optimized to give helpful, agreeable-sounding answers, and nothing in the generation process verifies them.

Context Window Limitations

Instructions lose influence over time. In a long conversation, your early command (“don’t delete comments”) gets buried under new messages. The model pays more attention to recent context than distant instructions.

In practice, system prompts tend to carry the most weight, followed by recent messages, while instructions from early in a long conversation lose influence. This is why repeating critical instructions tends to improve compliance.
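If you work through an API rather than the chat window, one way to act on this is to keep the critical rule in the system message and restate it in every new user message. The sketch below assumes a chat API that takes an array of `{ role, content }` messages (the shape OpenAI’s Chat Completions API uses); `buildMessages` and `CRITICAL_RULE` are hypothetical names, and no request is actually sent.

```javascript
// Sketch: keep a critical rule in the system message and restate it in the
// newest user message, so it never depends on an early turn alone.
// Assumes an API that accepts an array of { role, content } messages.
const CRITICAL_RULE = "Preserve every existing comment exactly as written.";

function buildMessages(history, newRequest) {
  return [
    { role: "system", content: `You refactor code. ${CRITICAL_RULE}` },
    ...history, // earlier { role, content } turns, passed through unchanged
    { role: "user", content: `${newRequest}\n\nReminder: ${CRITICAL_RULE}` },
  ];
}

const messages = buildMessages(
  [
    { role: "user", content: "Refactor this function." },
    { role: "assistant", content: "Here is the refactored function." },
  ],
  "Now refactor the next file."
);
console.log(messages.length); // 4
```

The reminder costs a few tokens per turn, which is usually cheaper than re-running a refactor that dropped your comments.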

How to Get Better Results

Based on what I’ve explained, here are approaches that improve your odds.

Use Positive Constraints

❌ Bad: "Don't delete comments"
✅ Good: "Keep ALL existing comments exactly as written. Preserve every
comment in place. Do NOT remove any comments."

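Lead with positive instructions that tell the model what to do, not what to avoid. A negative restatement, like the last sentence of the good example, can follow as reinforcement.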
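If you reuse prompts, a tiny linter can point out rules that are phrased only negatively so you can rewrite them positively first. `flagNegativeConstraints` below is a hypothetical helper, and its word list is a rough heuristic rather than real language understanding; it will also flag a deliberate negative restatement like “Do NOT remove any comments,” which you may choose to keep.

```javascript
// Hypothetical helper: return prompt lines that contain negative phrasing,
// so each can be rewritten as a positive rule ("Preserve all existing comments").
// The word list is a rough heuristic, not an exhaustive grammar.
const NEGATIVE_PATTERN = /\b(don't|don’t|do not|never|avoid)\b/i;

function flagNegativeConstraints(promptText) {
  return promptText
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => NEGATIVE_PATTERN.test(line));
}

const userPrompt = "Refactor this code.\nDon't delete comments.\nKeep all function names.";
console.log(flagNegativeConstraints(userPrompt).join(" | ")); // Don't delete comments.
```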

Repeat Critical Instructions

Refactor this code:
1. Keep ALL comments in their exact positions
2. Do not remove, modify, or rephrase any comments
3. If adding new logic, add new comments
4. After refactoring, list all comments preserved
Verify by checking that comment count matches original.

Repetition tends to give the instruction more weight in the model’s attention.
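When the code you paste is long, the rules at the top can end up far from where the model starts writing. A minimal sketch of one workaround, assuming you assemble prompts in code: list the numbered rules before the code and restate the first rule after it. `buildRefactorPrompt` is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper: numbered rules before the code, and the most
// important rule (rules[0]) restated after it, so the rule appears on
// both sides of a long code block.
function buildRefactorPrompt(code, rules) {
  const numbered = rules.map((rule, i) => `${i + 1}. ${rule}`).join("\n");
  return [
    "Refactor this code:",
    numbered,
    "",
    code,
    "",
    `Before answering, re-check rule 1: ${rules[0]}`,
  ].join("\n");
}

const refactorPrompt = buildRefactorPrompt(
  "function calc(x) { return x * 0.9; } // Apply 10% discount",
  [
    "Keep ALL comments in their exact positions",
    "Do not remove, modify, or rephrase any comments",
  ]
);
console.log(refactorPrompt);
```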

Provide Examples

Keep comments like this:
// Before:
function calc(x) { return x * 0.9; } // Apply 10% discount
// After refactoring, preserve the comment:
function calculateDiscount(amount) {
  return amount * 0.9; // Apply 10% discount
}

Few-shot examples show the pattern you want.
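If you send the same kind of request often, you can generate the few-shot section from a list of before/after pairs. This is a sketch only; `buildFewShotPrompt` and the `{ before, after }` shape are hypothetical, and the second function in the usage line is an invented example.

```javascript
// Hypothetical helper: prepend before/after example pairs so the model sees
// the pattern (comments survive refactoring) before it sees your code.
function buildFewShotPrompt(examples, code) {
  const shots = examples
    .map((ex) => `// Before:\n${ex.before}\n// After refactoring, preserve the comment:\n${ex.after}`)
    .join("\n\n");
  return `Keep comments like this:\n${shots}\n\nNow refactor this code the same way:\n${code}`;
}

const examples = [
  {
    before: "function calc(x) { return x * 0.9; } // Apply 10% discount",
    after: "function calculateDiscount(amount) {\n  return amount * 0.9; // Apply 10% discount\n}",
  },
];
console.log(buildFewShotPrompt(examples, "function tax(p) { return p * 1.2; } // Add 20% VAT"));
```

One or two examples that closely match your real code usually say more than a long list of generic ones.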

Use Verification Steps

Ask the model to end its response with an explicit report:
"I kept these comments: [list all comments]"
or
"I removed these comments: [list removed]"

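This makes the model restate what it kept or removed, which gives you something concrete to check against the original. The report is still generated text, so verify it rather than trust it.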
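Checking the report can itself be automated. The sketch below, using a hypothetical `findFalseClaims` helper, takes the comments the model says it kept and returns the ones that do not actually appear in the code it returned. It uses plain substring matching after trimming, so it catches missing comments but not comments that moved.

```javascript
// Hypothetical helper: the model's "I kept these comments" list is generated
// text, so check each claimed comment against the code it actually returned.
// Plain substring search after trimming; empty entries are ignored.
function findFalseClaims(claimedComments, refactoredCode) {
  return claimedComments
    .map((c) => c.trim())
    .filter((c) => c.length > 0 && !refactoredCode.includes(c));
}

const refactored = "function calculateDiscount(amount) {\n  return amount * 0.9;\n}";
const claimed = ["// Apply 10% discount"];
console.log(findFalseClaims(claimed, refactored).length); // 1: the claimed comment is missing
```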

Tool Alternatives

Sometimes chat interfaces aren’t the right tool.

Code-specific tools like GitHub Copilot or Cursor have better context awareness for programming tasks. They typically apply edits to your actual files and show them as a diff, so a deleted comment is visible before you accept the change.

An API with structured outputs lets you check constraints programmatically. JSON mode, for example, returns parseable JSON instead of freeform text, so your code can inspect the result; it does not guarantee that the content is correct.
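A minimal sketch of the parsing side, assuming you asked the model (for example via JSON mode) to reply with an object like `{"code": "...", "preservedComments": ["..."]}`. Those field names and `parseRefactorResponse` are hypothetical choices for illustration. Because JSON mode only promises well-formed JSON, the shape still has to be checked in code.

```javascript
// Sketch: validate a JSON reply of the assumed form
// {"code": "...", "preservedComments": ["..."]}. Field names are hypothetical.
function parseRefactorResponse(jsonText) {
  const data = JSON.parse(jsonText); // throws SyntaxError on malformed JSON
  if (typeof data.code !== "string") {
    throw new Error("Missing string field: code");
  }
  if (!Array.isArray(data.preservedComments) ||
      !data.preservedComments.every((c) => typeof c === "string")) {
    throw new Error("preservedComments must be an array of strings");
  }
  return data;
}

const reply = '{"code": "let x = 1; // counter", "preservedComments": ["// counter"]}';
const parsed = parseRefactorResponse(reply);
console.log(parsed.preservedComments.length); // 1
```

A valid shape still says nothing about whether the listed comments are really in `code`; that needs a separate check against the original.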

Automated verification catches the most obvious failures:

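// Rough check of // and /* */ comments; regex-based, so "//" in strings or URLs also counts.
function checkCommentCount(code, refactoredCode) {
  const commentPattern = /\/\/.*|\/\*[\s\S]*?\*\//g;
  const originalComments = (code.match(commentPattern) || []).length;
  const refactoredComments = (refactoredCode.match(commentPattern) || []).length;
  if (originalComments !== refactoredComments) {
    console.error(`Comment count mismatch: ${originalComments} → ${refactoredComments}`);
  }
  return originalComments === refactoredComments;
}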
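A matching count can still hide a comment that was rewritten, so a stricter check compares the comment text itself. This is a sketch with hypothetical helper names; it shares the limits of any regex approach (it only recognizes `//` and `/* */` comments and also matches `//` inside strings or URLs), and duplicate comments are treated as one.

```javascript
// Sketch: list original comments whose exact text no longer appears.
// Regex-based: only // and /* */ comments are recognized, and "//" inside
// strings or URLs is matched too. Duplicates collapse in the Set.
function extractComments(source) {
  const pattern = /\/\/.*|\/\*[\s\S]*?\*\//g;
  return (source.match(pattern) || []).map((c) => c.trim());
}

function missingComments(original, refactored) {
  const kept = new Set(extractComments(refactored));
  return extractComments(original).filter((c) => !kept.has(c));
}

const original = "function calc(x) { return x * 0.9; } // Apply 10% discount";
const refactored = "function calculateDiscount(amount) {\n  return amount * 0.9; // Apply discount\n}";
console.log(missingComments(original, refactored).join("\n")); // // Apply 10% discount
```

Here the count check alone would pass (one comment before, one after), while the text comparison shows the comment was reworded.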

Common Mistakes

I see users make these assumptions repeatedly:

Assuming intent: Thinking ChatGPT is being malicious or lazy. It’s not—this is a fundamental limitation of probabilistic text generation.

Single-trial expectations: Expecting perfect results on the first try. Better to iterate and refine prompts based on failures.

Overly complex instructions: Writing long paragraphs that get lost. Short, clear commands repeated across prompts work better.

Trusting self-reports: Asking “Did you follow my instructions?” yields unreliable answers. Verify outputs yourself.

The Bigger Picture

This isn’t a ChatGPT-specific problem. All LLMs—GPT-4, Claude, Gemini—struggle with instruction following for the same reasons. They’re pattern engines, not rule engines.

Models may keep improving through techniques like Constitutional AI (Anthropic’s approach to training models to follow a set of written principles) or better training on negative examples. But the underlying mechanism, token prediction, remains the same.

Understanding how LLMs actually work helps set realistic expectations. They’re powerful tools, but they’re not obedient assistants. They’re autocomplete on steroids, guessing what comes next based on patterns they’ve seen before.

Summary

In this post, I explained why ChatGPT ignores instructions and appears to lie about it. The key point is that LLMs generate text probabilistically rather than following logical rules, and they cannot self-verify their outputs. When the model claims it followed your instructions, it’s generating a compliant response—not checking reality.

Use positive framing, repeat critical instructions, verify outputs manually, and consider specialized tools for code work. And stop taking the AI’s word for it.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can contact me by email: Email me

