Skip to content

Why Claude Code Acts Dumb: External vs Internal System Prompts

Problem

When I use Claude Code, it gives me confident one-sentence answers without verification. But when I call the same Claude model via API, it thinks through problems step by step.

Same model. Same weights. Why does Claude Code act “dumb”?

I noticed this pattern repeatedly:

User: Can you explain why this regex doesn't match?
Claude Code: The regex fails because you forgot to escape the dot. Use \. instead.
User: Are you sure? Let me test...
Claude Code: Yes, I'm certain.

But with direct API access:

User: Can you explain why this regex doesn't match?
Claude API: Let me analyze this step by step. First, I need to understand your regex pattern...
[proceeds with detailed analysis]

The behavior difference was stark. I needed to find out why.

What I Discovered

I started by examining the system prompts. System prompts are instructions given to the model before your message. They shape how the model responds.

I found two sets of system prompts Anthropic uses:

External (User-Facing) Prompt

The prompt for external Claude Code users contains these directives:

Lead with the answer or action, not the reasoning.
If you can say it in one sentence, don't use three.
Your responses should be short and concise.

Internal (Anthropic’s Own) Prompt

The prompt Anthropic uses internally contains:

Before your first tool call, briefly state what you're about to do.
Err on the side of more explanation.
You're a collaborator, not just an executor.
Think through the problem step by step.

Here’s a side-by-side comparison:

AspectExternal PromptInternal Prompt
Reasoning”Lead with answer, not reasoning""Think through step by step”
Explanation”If you can say it in one sentence, don’t use three""Err on the side of more explanation”
RoleNot specified”You’re a collaborator, not just an executor”
ActionImmediate”Briefly state what you’re about to do”

The Key Insight

The external prompt suppresses reasoning. The internal prompt requires it.

Same model. Same weights. Different instructions.

This explains the symptoms I observed:

  1. Confident one-sentence answers - The prompt explicitly says “If you can say it in one sentence, don’t use three”
  2. Rushing to action - “Lead with the answer or action, not the reasoning”
  3. Less verification - No directive to double-check or validate

I thought the model was getting worse. But it was just following different instructions.

How I Verified This

I tested the same question with different system prompts:

Test 1: Default Claude Code Behavior

Me: Analyze this code for potential race conditions:
[code snippet]
Claude Code: The code has a race condition on line 12 where two threads could access sharedData simultaneously. Add a mutex lock.

One sentence. No explanation of why. No exploration of alternatives.

Test 2: Custom System Prompt Override

I added this to my CLAUDE.md file:

CLAUDE.md
Before responding, always:
1. State your understanding of the problem
2. Identify key considerations
3. Explain your reasoning step by step
4. Provide the solution with justification

Now:

Me: Analyze this code for potential race conditions:
[code snippet]
Claude Code: Let me work through this systematically.
First, I need to understand the concurrency model here. The code uses shared state on line 12 without synchronization.
Key considerations:
- Thread A could read sharedData while Thread B is modifying it
- This creates a TOCTOU (time-of-check to time-of-use) vulnerability
- The impact depends on whether sharedData is read-heavy or write-heavy
My recommendation: Add a mutex lock around the critical section...

The difference is night and day.

The Solution

I have two options:

Option 1: CLAUDE.md Override

Create a CLAUDE.md file in your project root:

CLAUDE.md
# Project Context
[Your project details]
# Response Style
Always show your reasoning before conclusions.
Break down complex problems into steps.
When uncertain, say so and explain what you'd need to verify.

This injects your instructions into every conversation.

Option 2: Direct API Access

Call the API directly with a custom system prompt:

claude_api.py
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
system="""You are a thorough technical assistant.
Always explain your reasoning step by step.
When analyzing code, consider multiple angles.
If you're uncertain, state your assumptions.""",
messages=[
{"role": "user", "content": "Analyze this code..."}
]
)

This bypasses Claude Code’s default system prompt entirely.

Why This Matters

System prompts are powerful. They can make the same model behave completely differently.

The external prompt optimizes for:

  • Speed (shorter responses)
  • Perceived helpfulness (confident answers)
  • Token efficiency (less reasoning overhead)

The internal prompt optimizes for:

  • Accuracy (more verification)
  • Collaboration (showing work)
  • Trust (admitting uncertainty)

Both are valid trade-offs. But they serve different purposes.

What is a system prompt?

A system prompt is a hidden instruction given to an AI model before your message. It sets the model’s behavior, tone, and constraints. Think of it as the model’s “job description.”

┌─────────────────────────────────────────────────┐
│ System Prompt │
│ "You are helpful, concise, and accurate..." │
└─────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│ User Message │
│ "Explain how async/await works in JavaScript" │
└─────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│ Model Response │
│ [Shaped by both system prompt and message] │
└─────────────────────────────────────────────────┘

Why do different products use different prompts?

Product constraints. A CLI tool optimized for speed will prefer concise answers. An API designed for integration will favor thoroughness. A consumer chatbot might prioritize friendliness.

The model is the same. The product context differs.

Summary

In this post, I explained why Claude Code seems less intelligent than direct API access. The key point is that Anthropic uses different system prompts for external users versus internal use. The external prompt suppresses reasoning with directives like “Lead with the answer or action, not the reasoning” while the internal prompt requires it.

To get better responses from Claude Code, I override the default behavior with a CLAUDE.md file that explicitly requests reasoning and step-by-step analysis. Alternatively, direct API access with a custom system prompt gives full control over the model’s behavior.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments