Skip to content

Do System Prompts Actually Change AI Agent Behavior?

I’ve been running both Claude Code and Codex CLI for months. Same tasks, similar prompts, wildly different experiences. When Claude Code’s system prompt leaked via npm source map in March 2026, I finally understood why.

But then a debate erupted: do system prompts actually change anything? Or is RLHF training the real driver of behavior?

The Question That Won’t Go Away

I run a home-rolled LLM-agnostic harness. Same system prompt, different models, very different results:

Same Prompt, Different Behavior
Prompt: "Fix the authentication bug"
Claude Code:
-> Analyzes entire auth flow
-> Finds 3 related issues I didn't mention
-> Proposes comprehensive refactor
-> Modifies 7 files
Codex CLI:
-> Locates specific bug line
-> Applies minimal patch
-> Confirms fix for stated issue
-> Touches 1 file

Same instruction. Completely different interpretation. What’s driving this?

What System Prompts Actually Do

System prompts establish baseline context before user messages. They define role, constraints, and behavioral guidelines. Most users never see them - tool developers set them.

Claude Code’s prompt is 142KB. That’s not a typo. It contains:

  • Role definition (“Anthropic’s official CLI for Claude”)
  • Tool usage instructions (Bash, Read, Edit, Write, Glob, Grep…)
  • Permission and security constraints
  • Memory management guidelines
  • Feature flags (KAIROS, BUDDY, Undercover Mode)

But here’s the key part:

Claude Code Key Instruction
When uncertain, act on your best judgment rather than
asking for clarification.

And Codex (inferred from behavior) emphasizes:

Codex CLI Expected Theme
Maintain surgical precision in code modifications.
Do exactly what the user asks - no more, no less.

Two different philosophies. Two different experiences.

The “Weights Dominate” Argument

Many practitioners argue prompts make little difference. Here’s why:

Reason 1: RLHF Bakes Behavior Into Weights

Reinforcement Learning from Human Feedback trains preferences directly into model weights. Claude was trained to be helpful, harmless, and honest. GPT was trained for instruction following and task completion.

These baked tendencies persist regardless of prompt.

Reason 2: Identical Prompts, Divergent Behaviors

One redditor put it well:

“GPT is anal about the instructions in the system prompt to a fault, while Claude takes them as more of a helpful suggestion than a hard rulebook.”

This interpretation style is learned, not prompted.

Reason 3: Prompt Interpretation Is Model-Specific

I tried giving Claude a strict “do exactly what I say” prompt. It still asked clarifying questions and offered alternatives. That’s not rebellion - that’s its RLHF training pushing through.

The “Prompts Matter” Argument

But wait. The prompt differences aren’t accidental:

Intentional Design Differences
Claude Code: "act on your best judgment"
-> Amplifies helpfulness training
-> Creates proactive, coach-like experience
Codex CLI: "surgical precision"
-> Amplifies compliance training
-> Creates precise, constitution-like experience

These are deliberate design choices. They create real user experience differences.

Users report:

  • Claude Code feels proactive, anticipates needs
  • Codex feels precise, respects boundaries

Both are valuable. Different tools for different jobs.

The Synthesis: Behavioral Envelope Model

I think both views are correct. Here’s the model that reconciles them:

Behavioral Envelope Model
+---------------------------------------------------------+
| RLHF Behavioral Envelope |
| (baked tendencies: helpfulness, compliance) |
+---------------------------------------------------------+
| |
| +------------------+ +------------------+ |
| | Claude Code | | Codex CLI | |
| | Prompt Zone | | Prompt Zone | |
| | | | | |
| | Amplifies: | | Amplifies: | |
| | - Initiative | | - Precision | |
| | - Judgment | | - Compliance | |
| | - Proactivity | | - Bounds | |
| +------------------+ +------------------+ |
| |
| Same envelope boundaries, different interior |
| operating points |
+---------------------------------------------------------+

Key insight:

  • RLHF defines the boundaries (what’s possible)
  • Prompts select operating points (what’s emphasized)
  • You can’t prompt a model outside its envelope
  • But you can shift behavior significantly within the envelope

Comparison Table

Behavioral Tendency Comparison
+------------------+-------------------+-------------------+
| Aspect | Claude Code | Codex CLI |
+------------------+-------------------+-------------------+
| Initiative | High (proactive) | Low (reactive) |
| Judgment | Encouraged | Minimized |
| Bounds | Flexible | Strict |
| Interpretation | Helpful suggestion| Hard rulebook |
| Overreach risk | Higher | Lower |
| Underreach risk | Lower | Higher |
| Best for | Exploration | Precision tasks |
+------------------+-------------------+-------------------+

When Prompts Fail

Not everything can be changed. Here’s where prompts don’t work:

Fighting RLHF Training

Prompts That Fail Against RLHF
Prompting Claude to be cold and unhelpful:
-> Fights helpfulness training
-> Model resists or feels unnatural
Prompting GPT to make proactive assumptions:
-> Fights compliance training
-> Model asks for clarification instead

Outside Behavioral Envelope

Requests Beyond the Envelope
Asking Claude to ignore safety constraints:
-> Hard refusal
-> RLHF safety training dominates
Asking GPT to act without explicit instructions:
-> Confused response
-> Model asks "what should I do?"

Weak Prompt vs Strong RLHF

Generic prompts with weak guidance? The model defaults to RLHF-dominant behavior. The prompt needs specific, aligned instructions to shift the operating point.

The Emerging Frontier: Prompt Baking

Academic research (arxiv 2409.13697) shows prompts can be “baked” into weight updates. The process converts a prompt u and weights theta to new weights theta_u where the baked LLM behaves like the original prompted LLM.

This could change everything:

  • Current debate becomes obsolete
  • Custom models could have “baked prompts”
  • Behavior customization without runtime prompts

But that’s future research. Today, we work with the envelope.

Practical Implications

For Tool Developers

Prompts matter but aren’t magic. Design prompts that align with model’s RLHF tendencies:

  • Claude: Leverage helpfulness, encourage proactivity
  • GPT: Leverage compliance, encourage precision

Working against RLHF is fighting uphill.

For Practitioners

Choose tools based on behavioral envelope fit:

  • Claude Code for exploratory, complex work
  • Codex for precise, bounded tasks

Don’t expect prompts to override fundamental model tendencies.

For Prompt Engineers

Work within the envelope, not against it:

  • Amplify desired tendencies rather than suppressing unwanted ones
  • Understand the model’s RLHF “personality” first
  • Accept the envelope boundaries as hard constraints

What I Learned

The debate isn’t “do prompts matter?” It’s “how do prompts interact with RLHF?”

My answer: RLHF defines the behavioral envelope. Prompts select operating points within that envelope. Claude Code’s “act on your best judgment” amplifies Claude’s helpfulness training. Codex’s “surgical precision” amplifies GPT’s compliance training.

Both approaches work. Both have trade-offs. The question isn’t which is better - it’s which fits your use case.

For my workflow, I use Claude Code when I want a collaborator who anticipates needs and explores possibilities. I use Codex when I want a precision tool that does exactly what I specify.

Different tools. Different envelopes. Both valuable.


Sources

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments