Why “Be Concise” Fails with Claude: Use Hard Constraints Instead

Mar 16, 2026

Problem

I typed a simple prompt:

Explain quantum computing. Be concise.

Claude responded with a 47-sentence wall of text. I asked it to be concise, and it gave me a lecture.

This kept happening. “Brief summary” produced three paragraphs. “Quick answer” gave me a full essay. “Keep it short” resulted in something that was definitely not short.

Every “concise” prompt produced verbose output. I was wasting tokens and time.

What Happened?

I found a Reddit discussion about Claude prompting tricks. One tip stood out:

Stop saying “be concise” — use hard limits instead. “Answer in 3 sentences max” or “5 bullet points only.” Vague instructions = verbose answers. Hard constraints work every time.

The comment section confirmed this pattern. Users reported consistent success with numeric limits and consistent failure with vague adjectives.

Let me test this.

Test 1: Vague instruction

User: Explain quantum computing. Be concise.

Claude: Quantum computing is a revolutionary approach to computation that harnesses
the principles of quantum mechanics to process information in fundamentally
different ways than classical computers. Unlike classical bits that can only
exist in states of 0 or 1, quantum computers use quantum bits (qubits) that
can exist in superposition—simultaneously representing both 0 and 1. This,
combined with quantum entanglement and interference, allows quantum computers
to perform certain calculations exponentially faster than classical systems.
Applications include cryptography, drug discovery, optimization problems, and
machine learning. Major tech companies like IBM, Google, and others are racing
to build practical quantum computers, though significant challenges remain in
error correction and qubit stability. The field has seen remarkable progress...
[Response continues for 5 more paragraphs]

The output kept going. “Be concise” meant nothing.

Test 2: Hard constraint

User: Explain quantum computing in exactly 3 sentences.

Claude: Quantum computers use qubits that can exist in multiple states
simultaneously, unlike classical bits limited to 0 or 1. This superposition,
combined with entanglement, enables exponential speedups for specific problems
like cryptography and optimization. Current challenges include maintaining qubit
stability and error correction for practical applications.

Three sentences. Exactly three. No more, no less.

Why does “exactly 3 sentences” work when “be concise” fails?

Why Vague Instructions Fail

“Be concise” fails because it is subjective. LLMs cannot measure “concise.”

No Measurable Definition

What does “concise” mean to an LLM?

One sentence?
One paragraph?
Half a page?
Under 100 words?

The model has seen “concise” used in many contexts. A research paper might call 500 words concise. A tweet is concise at 20 words. A textbook might consider a 5-page summary concise.

Without a measurable definition, the model guesses based on training patterns. And training patterns favor thoroughness. The model is trained to be helpful, and helpful often means complete.

The Helpfulness Trap

LLMs are trained to be helpful. When in doubt, they err on the side of more information.

User: Explain quantum computing. Be concise.

Model thinking: "The user wants to understand quantum computing. A complete
explanation would cover qubits, superposition, entanglement, applications,
and current challenges. I should provide a thorough overview to be helpful."

Result: Long response because "helpful" overrides "concise."

The instruction “be concise” competes with the training objective “be helpful.” Helpful usually wins.

No Stop Condition

Vague instructions give no signal for when to stop:

"Be concise" → When do I stop? After 1 paragraph? 2? 3?
"Keep it short" → How short? What's the boundary?
"Don't be verbose" → Defines what NOT to do, not what to do

The model keeps generating until it feels “done,” which for an LLM often means covering all relevant points.

Why Hard Constraints Work

Numeric constraints work because they are measurable and verifiable.

Measurable

"3 sentences" → Count: 1, 2, 3. Stop.
"5 bullet points" → Count: 1, 2, 3, 4, 5. Stop.
"100 words max" → Count words. Stop at 100.

A numeric limit gives the model a concrete target to track during generation, and it gives you something you can verify afterward.
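Because a sentence limit is countable, it can also be checked after the fact. Here is a minimal sketch, assuming a naive splitter that treats `.`, `!`, or `?` followed by whitespace as a sentence boundary, so abbreviations such as "e.g." will be miscounted. `count_sentences` is a hypothetical helper of my own, not part of any SDK:

```python
import re


def count_sentences(text: str) -> int:
    """Naively count sentences by splitting after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


# The Test 2 response from earlier in this article.
response = (
    "Quantum computers use qubits that can exist in multiple states "
    "simultaneously, unlike classical bits limited to 0 or 1. This superposition, "
    "combined with entanglement, enables exponential speedups for specific problems "
    "like cryptography and optimization. Current challenges include maintaining qubit "
    "stability and error correction for practical applications."
)

print(count_sentences(response))  # 3
```

A check like this is only as good as the splitter, so treat the count as a rough signal rather than ground truth.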

Consistent Results

In my tests, the same input with the same constraint produced the same number of items on every run:

User: List the benefits of exercise in exactly 5 bullet points.

Claude: Here are 5 benefits of exercise:
1. Improved cardiovascular health
2. Better weight management
3. Enhanced mood and mental health
4. Stronger muscles and bones
5. Better sleep quality

Exactly 5. Not 4, not 6. Every time I ran it.
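If you want to confirm the item count yourself instead of eyeballing it, a small counter is enough. This is a sketch under my own assumptions: a list item is any line that starts with a number followed by `.` or `)`, or with `-`, `*`, or `•`. `ITEM_PATTERN` and `count_list_items` are hypothetical names:

```python
import re

# A line counts as an item if it starts with "1.", "2)", "-", "*" or "•" plus text.
ITEM_PATTERN = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+\S")


def count_list_items(text: str) -> int:
    """Count lines that look like numbered or bulleted list items."""
    return sum(1 for line in text.splitlines() if ITEM_PATTERN.match(line))


response = """Here are 5 benefits of exercise:
1. Improved cardiovascular health
2. Better weight management
3. Enhanced mood and mental health
4. Stronger muscles and bones
5. Better sleep quality"""

print(count_list_items(response))  # 5
```

The introductory line ("Here are 5 benefits of exercise:") is not counted because it does not start with a list marker.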

Constraint Types Ranked by Reliability

I tested different constraint types. Here is what works best:

+-------------------+-------------+-------------------------------------------+
| Constraint Type   | Reliability | Notes                                     |
+-------------------+-------------+-------------------------------------------+
| Bullet/number     | Highest     | "5 bullet points" - model counts items    |
| limits            |             | during generation                         |
+-------------------+-------------+-------------------------------------------+
| Sentence limits   | High        | "3 sentences" - model tracks periods      |
+-------------------+-------------+-------------------------------------------+
| Paragraph limits  | Medium      | "2 paragraphs" - paragraphs are flexible  |
+-------------------+-------------+-------------------------------------------+
| Word limits       | Low         | "100 words" - models struggle to count    |
|                   |             | words accurately during generation        |
+-------------------+-------------+-------------------------------------------+

Word limits are least reliable because models generate tokens, not words, and counting during generation is imprecise. A “100 word” limit often produces 85-120 words.
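Since word limits drift, it can help to accept a tolerance band instead of demanding an exact count. A minimal sketch, assuming whitespace-separated words and a 20% overshoot allowance (chosen to match the 85-120 range I saw for a 100-word limit). `check_word_limit` and `tolerance_pct` are hypothetical names:

```python
def check_word_limit(text: str, limit: int, tolerance_pct: int = 20) -> tuple[int, bool]:
    """Return (word_count, ok); ok is True if the count stays within limit plus tolerance."""
    count = len(text.split())  # whitespace-separated words, not tokens
    max_allowed = limit + limit * tolerance_pct // 100
    return count, count <= max_allowed


count, ok = check_word_limit("word " * 118, limit=100)
print(count, ok)  # 118 True
```

Integer arithmetic is used for the allowance so the boundary (120 words for a 100-word limit) is not affected by floating-point rounding.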

Common Mistakes

I made these mistakes. Maybe you will too.

Mistake 1: Using Relative Terms

❌ "Be brief"
   → Brief compared to what? A tweet? A chapter? A sentence?

❌ "Keep it short"
   → How short is short? 50 words? 100 words?

❌ "Don't write too much"
   → Defines what NOT to do, gives no target

Mistake 2: Mixing Vague and Specific

❌ "Be concise and answer in 3 sentences."
   → Redundant and confusing
   → Model may try to satisfy both, creating conflict

✅ "Answer in 3 sentences."
   → One clear constraint, no ambiguity

Mistake 3: Wrong Constraint Type for the Task

❌ "Explain in 50 words or less."
   → Word counting is unreliable during generation

✅ "Explain in 3 sentences."
   → Sentence counting is reliable

✅ "Explain in 5 bullet points."
   → Bullet counting is most reliable

Mistake 4: Leaving Format Unspecified

❌ "Answer in 3 sentences."
   → Might give 3 very long sentences

✅ "Answer in 3 short sentences, each under 15 words."
   → Constrains both count and length
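A combined constraint like this one can be checked in two steps: count the sentences, then count the words in each. The sketch below assumes the same naive sentence splitting on `.`, `!`, or `?` followed by whitespace, and reads "under 15 words" as strictly fewer than 15. `check_sentences` is a hypothetical helper:

```python
import re


def check_sentences(text: str, max_sentences: int, max_words: int) -> bool:
    """True if text has at most max_sentences sentences, each with fewer than max_words words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if len(sentences) > max_sentences:
        return False
    return all(len(s.split()) < max_words for s in sentences)


reply = "Qubits can be 0 and 1 at once. This enables some fast algorithms. Hardware is still noisy."
print(check_sentences(reply, max_sentences=3, max_words=15))  # True
```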

Practical Prompting Patterns

I now use these patterns consistently:

PATTERNS = {
    # Quick answers
    "quick_fact": "Answer in 1-2 sentences.",

    # Summaries
    "summary": "Summarize in exactly 3 bullet points.",

    # Comparisons
    "comparison": "Compare X and Y in exactly 4 bullet points.",

    # Explanations
    "explanation": "Explain in 3 short sentences maximum.",

    # Lists
    "list": "List exactly 5 key points, no more.",

    # Deep dive (when you actually want detail)
    "detailed": "Explain thoroughly with 5-7 paragraphs.",
}

Notice even “detailed” uses numbers. When I want length, I specify how much length.
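One way to apply these patterns in code is to keep them in a lookup table and append the chosen constraint to the task. This is a minimal sketch: `build_prompt` is a hypothetical helper, and only two of the patterns are repeated here to keep the block self-contained:

```python
PATTERNS = {
    "quick_fact": "Answer in 1-2 sentences.",
    "summary": "Summarize in exactly 3 bullet points.",
}


def build_prompt(task: str, pattern: str) -> str:
    """Append the chosen length constraint to a task description."""
    if pattern not in PATTERNS:
        raise KeyError(f"Unknown pattern: {pattern!r}")
    return f"{task.strip()}\n\n{PATTERNS[pattern]}"


print(build_prompt("What is a qubit?", "quick_fact"))
```

Keeping the constraint text in one place means you can tune a limit once instead of hunting through every prompt string.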

Why This Matters Beyond Chat

Token costs add up. If every response is 2x longer than needed because of vague prompts:

Output token costs roughly double
Response times increase
Context windows fill faster
Downstream parsing becomes harder
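To make the cost point concrete, here is the arithmetic as a small sketch. The price of $15 per million output tokens, the token counts, and the request volume are made-up illustration values, not quoted rates, and `output_cost` is a hypothetical helper. Only the output side is modeled, since input tokens do not change with response length:

```python
def output_cost(tokens_per_response: int, responses: int, usd_per_million_tokens: float) -> float:
    """Estimate output-token spend in USD."""
    return tokens_per_response * responses * usd_per_million_tokens / 1_000_000


# Hypothetical workload: 10,000 responses at an assumed $15 per million output tokens.
needed = output_cost(200, 10_000, 15.0)   # responses at the length you actually need
verbose = output_cost(400, 10_000, 15.0)  # responses that run 2x longer

print(f"needed: ${needed:.2f}, verbose: ${verbose:.2f}")  # needed: $30.00, verbose: $60.00
```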

For automated workflows, predictable output length matters:

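import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# `article` is assumed to hold the text you want summarized.

# BAD: Unpredictable length
response = client.messages.create(
    model="claude-3-5-sonnet",  # substitute a model ID available to your account
    max_tokens=1024,  # required by the Messages API
    messages=[{"role": "user", "content": f"Summarize this article. Be concise.\n\n{article}"}]
)
# Response could be 100 words or 500 words

# GOOD: Predictable length
response = client.messages.create(
    model="claude-3-5-sonnet",
    max_tokens=1024,
    messages=[{"role": "user", "content": f"Summarize this article in exactly 3 bullet points.\n\n{article}"}]
)
# Response should have exactly 3 bullet points; check the count before parsing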
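Even with a hard constraint, an automated pipeline should probably verify the output before relying on it. The sketch below retries until the reply has the expected number of list items. It is written against a generic `generate` callable (a hypothetical stand-in for whatever function wraps your API call and returns the reply text), so it can be tried without network access. The list-item pattern and the retry count are my own assumptions:

```python
import re
from typing import Callable

# A line counts as an item if it starts with "1.", "2)", "-", "*" or "•" plus text.
ITEM = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+\S")


def generate_with_check(generate: Callable[[str], str], prompt: str,
                        expected_items: int, max_attempts: int = 3) -> str:
    """Call generate(prompt) until the reply has exactly expected_items list items."""
    for _ in range(max_attempts):
        reply = generate(prompt)
        items = sum(1 for line in reply.splitlines() if ITEM.match(line))
        if items == expected_items:
            return reply
    raise ValueError(f"No reply with {expected_items} items after {max_attempts} attempts")


# Stand-in generator: the first reply is too short, the second is correct.
fake_replies = iter(["- a\n- b", "- a\n- b\n- c"])
result = generate_with_check(lambda prompt: next(fake_replies), "Summarize in exactly 3 bullet points.", 3)
print(result)
```

In real use, `generate` would send the prompt to the model and return the text of the first content block. Each retry costs another request, so a low `max_attempts` keeps the worst case bounded.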

Summary

Vague instructions like “be concise” fail because they lack measurable boundaries. LLMs cannot objectively measure “conciseness,” but they can count “3 sentences.”

The Reddit commenter who said all prompt tips boil down to “give more detailed prompts” was right. Specificity beats ambiguity. When you want shorter responses, specify exactly how short using numbers.

The pattern is simple:

❌ Vague: "Be [adjective]"
✅ Specific: "[Action] in [number] [unit]"

Replace subjective words with countable constraints. Your prompts will behave far more predictably, especially with bullet and sentence limits.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Here are the most important links from this article, along with some further resources on this topic:

👨‍💻 Reddit: 6 Claude Prompting Tricks I Wish I Knew on Day One
👨‍💻 Anthropic: Prompt Engineering Guide
👨‍💻 OpenAI: Prompt Engineering Best Practices

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!