Do System Prompts Actually Change AI Agent Behavior?

Apr 2, 2026

I’ve been running both Claude Code and Codex CLI for months. Same tasks, similar prompts, wildly different experiences. When Claude Code’s system prompt leaked via npm source map in March 2026, I finally understood why.

But then a debate erupted: do system prompts actually change anything? Or is RLHF training the real driver of behavior?

The Question That Won’t Go Away

I run a home-rolled LLM-agnostic harness. Same system prompt, different models, very different results:

Prompt: "Fix the authentication bug"

Claude Code:
  -> Analyzes entire auth flow
  -> Finds 3 related issues I didn't mention
  -> Proposes comprehensive refactor
  -> Modifies 7 files

Codex CLI:
  -> Locates specific bug line
  -> Applies minimal patch
  -> Confirms fix for stated issue
  -> Touches 1 file

Same instruction. Completely different interpretation. What’s driving this?

What System Prompts Actually Do

System prompts establish baseline context before user messages. They define role, constraints, and behavioral guidelines. Most users never see them - tool developers set them.

Claude Code’s prompt is 142KB. That’s not a typo. It contains:

Role definition (“Anthropic’s official CLI for Claude”)
Tool usage instructions (Bash, Read, Edit, Write, Glob, Grep…)
Permission and security constraints
Memory management guidelines
Feature flags (KAIROS, BUDDY, Undercover Mode)

But here’s the key part:

When uncertain, act on your best judgment rather than
asking for clarification.

And Codex (inferred from behavior) emphasizes:

Maintain surgical precision in code modifications.
Do exactly what the user asks - no more, no less.

Two different philosophies. Two different experiences.

The “Weights Dominate” Argument

Many practitioners argue prompts make little difference. Here’s why:

Reason 1: RLHF Bakes Behavior Into Weights

Reinforcement Learning from Human Feedback trains preferences directly into model weights. Claude was trained to be helpful, harmless, and honest. GPT was trained for instruction following and task completion.

These baked tendencies persist regardless of prompt.

Reason 2: Identical Prompts, Divergent Behaviors

One redditor put it well:

“GPT is anal about the instructions in the system prompt to a fault, while Claude takes them as more of a helpful suggestion than a hard rulebook.”

This interpretation style is learned, not prompted.

Reason 3: Prompt Interpretation Is Model-Specific

I tried giving Claude a strict “do exactly what I say” prompt. It still asked clarifying questions and offered alternatives. That’s not rebellion - that’s its RLHF training pushing through.

The “Prompts Matter” Argument

But wait. The prompt differences aren’t accidental:

Claude Code: "act on your best judgment"
           -> Amplifies helpfulness training
           -> Creates proactive, coach-like experience

Codex CLI:  "surgical precision"
           -> Amplifies compliance training
           -> Creates precise, constitution-like experience

These are deliberate design choices. They create real user experience differences.

Users report:

Claude Code feels proactive, anticipates needs
Codex feels precise, respects boundaries

Both are valuable. Different tools for different jobs.

The Synthesis: Behavioral Envelope Model

I think both views are correct. Here’s the model that reconciles them:

+---------------------------------------------------------+
|                RLHF Behavioral Envelope                 |
|       (baked tendencies: helpfulness, compliance)      |
+---------------------------------------------------------+
|                                                         |
|    +------------------+        +------------------+     |
|    | Claude Code      |        | Codex CLI        |     |
|    | Prompt Zone      |        | Prompt Zone      |     |
|    |                  |        |                  |     |
|    | Amplifies:       |        | Amplifies:       |     |
|    | - Initiative     |        | - Precision     |     |
|    | - Judgment       |        | - Compliance    |     |
|    | - Proactivity    |        | - Bounds        |     |
|    +------------------+        +------------------+     |
|                                                         |
|    Same envelope boundaries, different interior        |
|    operating points                                     |
+---------------------------------------------------------+

Key insight:

RLHF defines the boundaries (what’s possible)
Prompts select operating points (what’s emphasized)
You can’t prompt a model outside its envelope
But you can shift behavior significantly within the envelope

Comparison Table

+------------------+-------------------+-------------------+
| Aspect           | Claude Code       | Codex CLI         |
+------------------+-------------------+-------------------+
| Initiative       | High (proactive)  | Low (reactive)    |
| Judgment         | Encouraged        | Minimized         |
| Bounds           | Flexible          | Strict            |
| Interpretation   | Helpful suggestion| Hard rulebook     |
| Overreach risk   | Higher            | Lower             |
| Underreach risk  | Lower             | Higher            |
| Best for         | Exploration       | Precision tasks   |
+------------------+-------------------+-------------------+

When Prompts Fail

Not everything can be changed. Here’s where prompts don’t work:

Fighting RLHF Training

Prompting Claude to be cold and unhelpful:
  -> Fights helpfulness training
  -> Model resists or feels unnatural

Prompting GPT to make proactive assumptions:
  -> Fights compliance training
  -> Model asks for clarification instead

Outside Behavioral Envelope

Asking Claude to ignore safety constraints:
  -> Hard refusal
  -> RLHF safety training dominates

Asking GPT to act without explicit instructions:
  -> Confused response
  -> Model asks "what should I do?"

Weak Prompt vs Strong RLHF

Generic prompts with weak guidance? The model defaults to RLHF-dominant behavior. The prompt needs specific, aligned instructions to shift the operating point.

The Emerging Frontier: Prompt Baking

Academic research (arxiv 2409.13697) shows prompts can be “baked” into weight updates. The process converts a prompt u and weights theta to new weights theta_u where the baked LLM behaves like the original prompted LLM.

This could change everything:

Current debate becomes obsolete
Custom models could have “baked prompts”
Behavior customization without runtime prompts

But that’s future research. Today, we work with the envelope.

Practical Implications

For Tool Developers

Prompts matter but aren’t magic. Design prompts that align with model’s RLHF tendencies:

Claude: Leverage helpfulness, encourage proactivity
GPT: Leverage compliance, encourage precision

Working against RLHF is fighting uphill.

For Practitioners

Choose tools based on behavioral envelope fit:

Claude Code for exploratory, complex work
Codex for precise, bounded tasks

Don’t expect prompts to override fundamental model tendencies.

For Prompt Engineers

Work within the envelope, not against it:

Amplify desired tendencies rather than suppressing unwanted ones
Understand the model’s RLHF “personality” first
Accept the envelope boundaries as hard constraints

What I Learned

The debate isn’t “do prompts matter?” It’s “how do prompts interact with RLHF?”

My answer: RLHF defines the behavioral envelope. Prompts select operating points within that envelope. Claude Code’s “act on your best judgment” amplifies Claude’s helpfulness training. Codex’s “surgical precision” amplifies GPT’s compliance training.

Both approaches work. Both have trade-offs. The question isn’t which is better - it’s which fits your use case.

For my workflow, I use Claude Code when I want a collaborator who anticipates needs and explores possibilities. I use Codex when I want a precision tool that does exactly what I specify.

Different tools. Different envelopes. Both valuable.

Sources

Claude Code System Prompt Leak - Archived system prompt from npm source map
Reddit Discussion on Claude Code vs Codex - Community analysis of behavioral differences
Prompt Baking Paper - Academic research on converting prompts to weight updates

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Claude Code System Prompt Leak
👨‍💻 Reddit: Claude Code vs Codex Discussion
👨‍💻 Prompt Baking: Training LLMs with Prompt-Driven Data Generation

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!