Single Agent vs Multi-Agent Orchestration: When to Choose?

Mar 24, 2026

Problem

I built a multi-agent content creation system with three agents: a researcher, a writer, and a reviewer. It seemed sophisticated. It looked like what the papers described. It cost me $150 in API calls in the first week and produced worse output than my old single-agent prompt.

Here’s what I saw:

Researcher Agent: Found 15 sources on the topic...

Writer Agent: Based on research, here's the article...

Reviewer Agent: The article needs more detail on points 3, 7, and 12...

Writer Agent: I've added more detail...

Reviewer Agent: Still missing context on point 7...

[15 more back-and-forth messages]

Final output: Generic article with hallucinated facts
Total cost: $4.32 for one article
Total time: 45 seconds

My old single-agent approach produced better articles in 5 seconds for $0.15.

What happened?

I fell into the classic trap: I assumed complex tasks require complex systems.

The multi-agent architecture added:

3x more LLM calls
Context loss between agents
Information bleeding (researcher found facts that writer forgot)
Hallucinations from the reviewer making up “missing” details
Higher latency and cost

Here’s my original multi-agent setup:

from langgraph import StateGraph, MessagesState
from langchain_openai import ChatOpenAI

class ContentState(MessagesState):
    research: str
    draft: str
    review_comments: list

def researcher(state: ContentState):
    llm = ChatOpenAI(model="gpt-4")
    response = llm.invoke([
        {"role": "system", "content": "You are a research agent. Find relevant information."},
        {"role": "user", "content": f"Research: {state['topic']}"}
    ])
    return {"research": response.content}

def writer(state: ContentState):
    llm = ChatOpenAI(model="gpt-4")
    response = llm.invoke([
        {"role": "system", "content": "You are a writer. Create content based on research."},
        {"role": "user", "content": f"Research: {state['research']}\n\nWrite article."}
    ])
    return {"draft": response.content}

def reviewer(state: ContentState):
    llm = ChatOpenAI(model="gpt-4")
    response = llm.invoke([
        {"role": "system", "content": "You are a reviewer. Check for quality."},
        {"role": "user", "content": f"Review: {state['draft']}"}
    ])
    return {"review_comments": [response.content]}

workflow = StateGraph(ContentState)
workflow.add_node("researcher", researcher)
workflow.add_node("writer", writer)
workflow.add_node("reviewer", reviewer)

workflow.set_entry_point("researcher")
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", "reviewer")
# Reviewer can loop back to writer for revisions...

app = workflow.compile()

This looks professional. It follows the patterns I read about. But it’s over-engineered for my use case.

The uncomfortable truth

A Reddit thread with 25+ production agent deployments revealed something striking:

“Every agent you add is a new failure point. Every handoff is where context dies.”

The profitable examples all used single-agent architecture:

System	Revenue	Architecture
Email-to-CRM updater	$200/month	Single agent
Resume parser	$50/seat	Single agent
Invoice extractor	$500/month	Single agent

These systems share a common pattern: OpenAI API + n8n (or similar), one tight prompt with examples.

Meanwhile, multi-agent systems often produce:

- Researcher finds facts
- Writer forgets them (context loss)
- Reviewer hallucinates missing details
- Final output: worse than single agent
- Cost: 3-5x higher
- Latency: 3-10x slower

When to use single agent

After testing both approaches extensively, I found clear patterns.

Single agent works when:

Task has a clear, focused objective
Input/output format is well-defined
Complete task fits in one context window
Examples can demonstrate expected behavior
Task decomposes into sequential steps within one prompt

Here’s my production-proven single-agent pattern:

from openai import OpenAI
from pydantic import BaseModel

class CRMEntry(BaseModel):
    contact_name: str
    company: str
    email: str
    action_required: bool
    notes: str

def email_to_crm(email_body: str) -> CRMEntry:
    """Single agent handles complete transformation."""
    client = OpenAI()

    # One tight prompt with examples
    prompt = f"""
    Extract CRM entry from email. Format as JSON.

    Examples:
    Email: "Hi, I'm John from TechCorp. Need pricing for 100 seats."
    Output: {{"contact_name": "John", "company": "TechCorp", "email": null, "action_required": true, "notes": "Pricing request for 100 seats"}}

    Email: {email_body}
    Output:
    """

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )

    return CRMEntry.model_validate_json(response.choices[0].message.content)

# Usage: One function call, one LLM call, complete task
entry = email_to_crm(incoming_email)

Cost comparison:

Single agent:
- 1 LLM call
- ~1,500 tokens
- $0.015 per request
- 2-3 seconds latency

Multi-agent (researcher/writer/reviewer):
- 3+ LLM calls
- ~8,000 tokens
- $0.08 per request
- 10-15 seconds latency

When multi-agent is justified

Multi-agent isn’t wrong. It’s overused. Here’s when it actually makes sense:

1. Separate context windows needed

When different expertise domains require their own context, multiple agents prevent context bloat:

from langgraph import StateGraph, MessagesState
from langchain_openai import ChatOpenAI

class CodeAnalysisState(MessagesState):
    codebase: dict
    security_issues: list
    performance_issues: list
    style_issues: list

def security_analyzer(state: CodeAnalysisState):
    """Specialized agent - security domain context."""
    llm = ChatOpenAI(model="gpt-4")
    # Full context for security reasoning
    # No dilution from other concerns
    ...

def performance_analyzer(state: CodeAnalysisState):
    """Specialized agent - performance domain context."""
    llm = ChatOpenAI(model="gpt-4")
    # Focused on performance patterns
    # Separate context avoids noise
    ...

def style_checker(state: CodeAnalysisState):
    """Specialized agent - style domain."""
    llm = ChatOpenAI(model="gpt-4")
    # Only cares about conventions
    ...

workflow = StateGraph(CodeAnalysisState)
workflow.add_node("security", security_analyzer)
workflow.add_node("performance", performance_analyzer)
workflow.add_node("style", style_checker)

# Parallel execution for speed
workflow.add_edge("security", "aggregate")
workflow.add_edge("performance", "aggregate")
workflow.add_edge("style", "aggregate")

Why this works:

Each agent has distinct expertise
Parallel execution provides speedup
Context windows aren’t wasted on irrelevant details

2. Parallel processing provides speedup

When tasks can run independently, parallel agents save time:

Sequential (single agent):
Task A: 3s -> Task B: 3s -> Task C: 3s = 9s total

Parallel (multi-agent):
Task A: 3s
Task B: 3s  } = 3s total (all complete together)
Task C: 3s

3. Context window limits force separation

For large codebase analysis, one agent can’t fit everything:

Single agent analyzing 50 files:
- Total tokens: 200,000
- Context limit: 128,000
- Result: Truncation, missed issues

Multi-agent with file distribution:
- Agent 1: Files 1-10 (40,000 tokens)
- Agent 2: Files 11-20 (40,000 tokens)
- Agent 3: Files 21-30 (40,000 tokens)
- Result: Complete coverage

4. Complex workflows with branching logic

When business processes have conditional paths:

def route_request(state):
    """Route to different agents based on request type."""
    if state["request_type"] == "refund":
        return "refund_agent"
    elif state["request_type"] == "technical":
        return "support_agent"
    else:
        return "general_agent"

workflow.add_conditional_edges("router", route_request)

Decision framework

I created this decision tree:

START: What's your task complexity?
|
+-- Single, focused objective?
|   |
|   +-- YES --> Does it fit in one context window?
|   |   |
|   |   +-- YES --> SINGLE AGENT
|   |   |
|   |   +-- NO --> Can you chunk the input?
|   |       |
|   |       +-- YES --> SINGLE AGENT (chunked)
|   |       +-- NO --> MULTI-AGENT (by necessity)
|   |
|   +-- NO --> Do tasks need different expertise?
|       |
|       +-- YES --> Will agents run in parallel?
|       |   |
|       |   +-- YES --> MULTI-AGENT (justified)
|       |   +-- NO --> Try single agent with tool use first
|       |
|       +-- NO --> SINGLE AGENT (with better prompt)

Common mistakes I made

Mistake 1: Starting with multi-agent

My first instinct was “this is complex, I need multiple agents.” Wrong. I should have started simple:

# WRONG: Start with multi-agent
workflow = StateGraph(ComplexState)
workflow.add_node("researcher", ...)
workflow.add_node("writer", ...)
workflow.add_node("reviewer", ...)

# RIGHT: Start with single agent, add complexity only if needed
def single_agent_task(input):
    return llm.invoke(f"""
    You are an expert content creator.
    Research and write about: {input}

    Step 1: Identify key points
    Step 2: Write comprehensive content
    Step 3: Self-review for gaps
    Output the final article.
    """)

Mistake 2: Copying research paper architectures

Academic papers showcase multi-agent research because that’s what gets published. Production systems need reliability:

Academic paper:
- Novel architecture
- Complex agent interactions
- Published for contribution

Production system:
- Reliable execution
- Minimal failure points
- Built for users, not reviewers

Mistake 3: Framework-driven design

LangGraph and CrewAI make orchestration easy. But easy doesn’t mean appropriate:

# CrewAI makes this easy to write:
crew = Crew(
    agents=[researcher, writer, editor, reviewer, publisher],
    tasks=[...]
)

# But that's 5 failure points, 5x the cost, and 5x the latency
# A single well-prompted agent might do better

Mistake 4: Ignoring context loss

Every agent handoff loses information:

Agent 1 receives: "Write about Python async programming, focus on FastAPI, include code examples, target senior developers"

Agent 1 output to Agent 2: "Here's the research on async programming..."

Agent 2 receives: Lost "focus on FastAPI" and "target senior developers"
Agent 2 output: Generic async article

Result: User intent degraded with each handoff

Reliability impact

The math is brutal:

Assume each agent has 95% success rate:

Single agent: 95% success

3-agent pipeline: 0.95 x 0.95 x 0.95 = 85.7% success
5-agent pipeline: 0.95^5 = 77.4% success

Add retry logic?
3-agent with 2 retries each: Much more complex, higher cost

Cost impact in production

I tracked costs over a month:

Content creation system (10 articles/day):

Single agent:
- 10 articles x 30 days = 300 articles
- ~$0.15 per article
- Total: ~$45/month

Multi-agent (researcher/writer/reviewer):
- 300 articles
- ~$0.50 per article (with retries)
- Total: ~$150/month

Difference: $105/month = $1,260/year

Plus: Multi-agent had more errors requiring human fixes

Summary

I started this journey thinking multi-agent was “more advanced.” I was wrong. Single agents with well-crafted prompts outperform multi-agent systems for most production workloads.

Key takeaways:

Start with a single agent and a tight prompt
Add multi-agent complexity only when forced by context limits, parallel processing needs, or distinct domain expertise
Every agent is a failure point; every handoff loses context
“Boring AI” scales better than sophisticated architecture

Audit your current systems:

Can you consolidate your agents into a single, better-prompted agent? Try it. Measure the impact on cost, latency, and reliability. You might be surprised.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Reddit: Why single agent wins in production
👨‍💻 LangGraph Documentation
👨‍💻 CrewAI Framework
👨‍💻 Goodhart's Law

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!