
Single Agent vs Multi-Agent Orchestration: When to Choose Which?

Problem

I built a multi-agent content creation system with three agents: a researcher, a writer, and a reviewer. It seemed sophisticated. It looked like what the papers described. It cost me $150 in API calls in the first week and produced worse output than my old single-agent prompt.

Here’s what I saw:

multi-agent-output.txt
Researcher Agent: Found 15 sources on the topic...
Writer Agent: Based on research, here's the article...
Reviewer Agent: The article needs more detail on points 3, 7, and 12...
Writer Agent: I've added more detail...
Reviewer Agent: Still missing context on point 7...
[15 more back-and-forth messages]
Final output: Generic article with hallucinated facts
Total cost: $4.32 for one article
Total time: 45 seconds

My old single-agent approach produced better articles in 5 seconds for $0.15.

What happened?

I fell into the classic trap: I assumed complex tasks require complex systems.

The multi-agent architecture added:

  • 3x more LLM calls
  • Context loss between agents
  • Information bleeding (researcher found facts that writer forgot)
  • Hallucinations from the reviewer making up “missing” details
  • Higher latency and cost

Here’s my original multi-agent setup:

multi_agent_v1.py
from langgraph.graph import StateGraph, MessagesState
from langchain_openai import ChatOpenAI

class ContentState(MessagesState):
    topic: str  # read by researcher() below
    research: str
    draft: str
    review_comments: list

def researcher(state: ContentState):
    llm = ChatOpenAI(model="gpt-4")
    response = llm.invoke([
        {"role": "system", "content": "You are a research agent. Find relevant information."},
        {"role": "user", "content": f"Research: {state['topic']}"}
    ])
    return {"research": response.content}

def writer(state: ContentState):
    llm = ChatOpenAI(model="gpt-4")
    # The writer only sees the research text, not the original request
    response = llm.invoke([
        {"role": "system", "content": "You are a writer. Create content based on research."},
        {"role": "user", "content": f"Research: {state['research']}\n\nWrite article."}
    ])
    return {"draft": response.content}

def reviewer(state: ContentState):
    llm = ChatOpenAI(model="gpt-4")
    # The reviewer only sees the draft, not the research it was based on
    response = llm.invoke([
        {"role": "system", "content": "You are a reviewer. Check for quality."},
        {"role": "user", "content": f"Review: {state['draft']}"}
    ])
    return {"review_comments": [response.content]}

workflow = StateGraph(ContentState)
workflow.add_node("researcher", researcher)
workflow.add_node("writer", writer)
workflow.add_node("reviewer", reviewer)
workflow.set_entry_point("researcher")
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", "reviewer")
# Reviewer can loop back to writer for revisions...
app = workflow.compile()

This looks professional. It follows the patterns I read about. But it’s over-engineered for my use case.

The uncomfortable truth

A Reddit thread covering 25+ production agent deployments revealed something striking:

“Every agent you add is a new failure point. Every handoff is where context dies.”

The profitable examples all used single-agent architecture:

System               | Revenue    | Architecture
Email-to-CRM updater | $200/month | Single agent
Resume parser        | $50/seat   | Single agent
Invoice extractor    | $500/month | Single agent

These systems share a common pattern: OpenAI API + n8n (or similar), one tight prompt with examples.

Meanwhile, multi-agent systems often produce:

multi-agent-failures.txt
- Researcher finds facts
- Writer forgets them (context loss)
- Reviewer hallucinates missing details
- Final output: worse than single agent
- Cost: 3-5x higher
- Latency: 3-10x slower

When to use single agent

After testing both approaches extensively, I found clear patterns.

Single agent works when:

  1. Task has a clear, focused objective
  2. Input/output format is well-defined
  3. Complete task fits in one context window
  4. Examples can demonstrate expected behavior
  5. Task decomposes into sequential steps within one prompt

Here’s my production-proven single-agent pattern:

single_agent.py
from typing import Optional

from openai import OpenAI
from pydantic import BaseModel

class CRMEntry(BaseModel):
    contact_name: str
    company: str
    email: Optional[str] = None  # the example below returns null when no address is given
    action_required: bool
    notes: str

def email_to_crm(email_body: str) -> CRMEntry:
    """Single agent handles complete transformation."""
    client = OpenAI()
    # One tight prompt with examples
    prompt = f"""
    Extract CRM entry from email. Format as JSON.
    Examples:
    Email: "Hi, I'm John from TechCorp. Need pricing for 100 seats."
    Output: {{"contact_name": "John", "company": "TechCorp", "email": null, "action_required": true, "notes": "Pricing request for 100 seats"}}
    Email: {email_body}
    Output:
    """
    response = client.chat.completions.create(
        model="gpt-4",  # JSON mode needs a model version that supports response_format
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )
    # Validate the JSON against the schema; raises if a required field is missing
    return CRMEntry.model_validate_json(response.choices[0].message.content)

# Usage: One function call, one LLM call, complete task
# (incoming_email is the raw email text from your mail integration)
entry = email_to_crm(incoming_email)

Cost comparison:

cost-comparison.txt
Single agent:
- 1 LLM call
- ~1,500 tokens
- $0.015 per request
- 2-3 seconds latency
Multi-agent (researcher/writer/reviewer):
- 3+ LLM calls
- ~8,000 tokens
- $0.08 per request
- 10-15 seconds latency

When multi-agent is justified

Multi-agent isn’t wrong. It’s overused. Here’s when it actually makes sense:

1. Separate context windows needed

When different expertise domains require their own context, multiple agents prevent context bloat:

multi_agent_justified.py
from langgraph.graph import StateGraph, MessagesState, START
from langchain_openai import ChatOpenAI

class CodeAnalysisState(MessagesState):
    codebase: dict
    security_issues: list
    performance_issues: list
    style_issues: list

def security_analyzer(state: CodeAnalysisState):
    """Specialized agent - security domain context."""
    llm = ChatOpenAI(model="gpt-4")
    # Full context for security reasoning
    # No dilution from other concerns
    ...

def performance_analyzer(state: CodeAnalysisState):
    """Specialized agent - performance domain context."""
    llm = ChatOpenAI(model="gpt-4")
    # Focused on performance patterns
    # Separate context avoids noise
    ...

def style_checker(state: CodeAnalysisState):
    """Specialized agent - style domain."""
    llm = ChatOpenAI(model="gpt-4")
    # Only cares about conventions
    ...

workflow = StateGraph(CodeAnalysisState)
workflow.add_node("security", security_analyzer)
workflow.add_node("performance", performance_analyzer)
workflow.add_node("style", style_checker)
# Parallel execution for speed: fan out from START so all three run in the same step
workflow.add_edge(START, "security")
workflow.add_edge(START, "performance")
workflow.add_edge(START, "style")
# Fan in: the "aggregate" node that merges the three issue lists is not shown here
workflow.add_edge("security", "aggregate")
workflow.add_edge("performance", "aggregate")
workflow.add_edge("style", "aggregate")

Why this works:

  • Each agent has distinct expertise
  • Parallel execution provides speedup
  • Context windows aren’t wasted on irrelevant details

2. Parallel processing provides speedup

When tasks can run independently, parallel agents save time:

parallel-vs-sequential.txt
Sequential (single agent):
  Task A: 3s -> Task B: 3s -> Task C: 3s = 9s total
Parallel (multi-agent):
  Task A: 3s  \
  Task B: 3s   } = 3s total (all complete together)
  Task C: 3s  /
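
The timing difference above can be reproduced with a small asyncio sketch. This is only an illustration: `run_task` is a hypothetical stand-in that sleeps instead of calling an LLM, and the durations are scaled down from 3s to 0.3s so it finishes quickly.

```python
import asyncio
import time


async def run_task(name: str, seconds: float) -> str:
    """Stand-in for one agent call; sleeps instead of calling an LLM."""
    await asyncio.sleep(seconds)
    return f"{name} done"


async def run_sequential(tasks: list[tuple[str, float]]) -> list[str]:
    """Await each task in turn, so total time is the sum of the durations."""
    return [await run_task(name, seconds) for name, seconds in tasks]


async def run_parallel(tasks: list[tuple[str, float]]) -> list[str]:
    """Start all tasks at once, so total time is roughly the longest one."""
    return list(await asyncio.gather(*(run_task(n, s) for n, s in tasks)))


def timed(coro) -> tuple[list[str], float]:
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


# Durations scaled down from 3s to 0.3s (an assumption for the demo).
TASKS = [("Task A", 0.3), ("Task B", 0.3), ("Task C", 0.3)]

if __name__ == "__main__":
    _, seq_time = timed(run_sequential(TASKS))
    _, par_time = timed(run_parallel(TASKS))
    print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

The speedup only appears when the tasks are truly independent. If Task B needs the output of Task A, the calls have to be awaited in order and the parallel version gains nothing.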

3. Context window limits force separation

For large codebase analysis, one agent can’t fit everything:

context-window-issue.txt
Single agent analyzing 50 files:
- Total tokens: 200,000
- Context limit: 128,000
- Result: Truncation, missed issues
Multi-agent with file distribution:
- Agent 1: Files 1-10 (40,000 tokens)
- Agent 2: Files 11-20 (40,000 tokens)
- Agent 3: Files 21-30 (40,000 tokens)
- Agent 4: Files 31-40 (40,000 tokens)
- Agent 5: Files 41-50 (40,000 tokens)
- Result: Complete coverage
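
One way to distribute files is to pack them greedily into batches that stay under a per-agent token budget. The sketch below assumes you already have a token count per file; the file names, the uniform 4,000 tokens per file, and the `batch_files` helper are all hypothetical and chosen to match the 200,000-token total above.

```python
def batch_files(token_counts: dict[str, int], budget: int) -> list[list[str]]:
    """Greedily pack files into batches whose token totals stay within budget."""
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for path, tokens in token_counts.items():
        if tokens > budget:
            raise ValueError(f"{path} alone exceeds the budget of {budget} tokens")
        # Close the current batch if this file would push it over the budget
        if current and used + tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(path)
        used += tokens
    if current:
        batches.append(current)
    return batches


# Hypothetical input: 50 files at 4,000 tokens each (200,000 tokens total).
files = {f"file_{i:02d}.py": 4_000 for i in range(1, 51)}
batches = batch_files(files, budget=40_000)
print(len(batches), [len(b) for b in batches])
```

With these numbers the 50 files split into five batches of ten, one batch per agent. In a real codebase you would probably also want to keep related files in the same batch, which this sketch does not attempt.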

4. Complex workflows with branching logic

When business processes have conditional paths:

branching_workflow.py
def route_request(state):
    """Route to different agents based on request type."""
    if state["request_type"] == "refund":
        return "refund_agent"
    elif state["request_type"] == "technical":
        return "support_agent"
    else:
        return "general_agent"

workflow.add_conditional_edges("router", route_request)

Decision framework

I created this decision tree:

decision-framework.txt
START: What's your task complexity?
|
+-- Single, focused objective?
    |
    +-- YES --> Does it fit in one context window?
    |           |
    |           +-- YES --> SINGLE AGENT
    |           |
    |           +-- NO --> Can you chunk the input?
    |                      |
    |                      +-- YES --> SINGLE AGENT (chunked)
    |                      +-- NO --> MULTI-AGENT (by necessity)
    |
    +-- NO --> Do tasks need different expertise?
               |
               +-- YES --> Will agents run in parallel?
               |           |
               |           +-- YES --> MULTI-AGENT (justified)
               |           +-- NO --> Try single agent with tool use first
               |
               +-- NO --> SINGLE AGENT (with better prompt)
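
The same tree can be written as a small function, which makes it easy to run against a list of planned tasks. This is a sketch of the tree as drawn above; the `Task` fields and the `choose_architecture` name are hypothetical, one boolean per question in the tree.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Answers to the questions in the decision tree (field names are illustrative)."""
    single_objective: bool
    fits_context_window: bool = True
    can_chunk_input: bool = False
    needs_different_expertise: bool = False
    runs_in_parallel: bool = False


def choose_architecture(task: Task) -> str:
    """Walk the decision tree top to bottom and return its recommendation."""
    if task.single_objective:
        if task.fits_context_window:
            return "single agent"
        if task.can_chunk_input:
            return "single agent (chunked)"
        return "multi-agent (by necessity)"
    if task.needs_different_expertise:
        if task.runs_in_parallel:
            return "multi-agent (justified)"
        return "single agent with tool use first"
    return "single agent (with better prompt)"


# Example: the email-to-CRM task has one objective and fits in one prompt.
print(choose_architecture(Task(single_objective=True)))
```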

Common mistakes I made

Mistake 1: Starting with multi-agent

My first instinct was “this is complex, I need multiple agents.” Wrong. I should have started simple:

better_approach.py
# WRONG: Start with multi-agent
workflow = StateGraph(ComplexState)
workflow.add_node("researcher", ...)
workflow.add_node("writer", ...)
workflow.add_node("reviewer", ...)

# RIGHT: Start with single agent, add complexity only if needed
def single_agent_task(topic):
    # llm is any chat model client, e.g. ChatOpenAI(model="gpt-4")
    return llm.invoke(f"""
    You are an expert content creator.
    Research and write about: {topic}
    Step 1: Identify key points
    Step 2: Write comprehensive content
    Step 3: Self-review for gaps
    Output the final article.
    """)

Mistake 2: Copying research paper architectures

Academic papers showcase multi-agent research because that’s what gets published. Production systems need reliability:

academic-vs-production.txt
Academic paper:
- Novel architecture
- Complex agent interactions
- Published for contribution
Production system:
- Reliable execution
- Minimal failure points
- Built for users, not reviewers

Mistake 3: Framework-driven design

LangGraph and CrewAI make orchestration easy. But easy doesn’t mean appropriate:

framework_trap.py
# CrewAI makes this easy to write:
crew = Crew(
    agents=[researcher, writer, editor, reviewer, publisher],
    tasks=[...]
)
# But that's 5 failure points, 5x the cost, and 5x the latency
# A single well-prompted agent might do better

Mistake 4: Ignoring context loss

Every agent handoff loses information:

context-loss.txt
Agent 1 receives: "Write about Python async programming, focus on FastAPI, include code examples, target senior developers"
Agent 1 output to Agent 2: "Here's the research on async programming..."
Agent 2 receives: Lost "focus on FastAPI" and "target senior developers"
Agent 2 output: Generic async article
Result: User intent degraded with each handoff
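
A rough way to spot this in your own pipeline is to check which requirements from the original request still appear in each handoff message. The sketch below uses naive substring matching, so it is only a first-pass signal; the `lost_constraints` helper and the keyword list are hypothetical, and the handoff text is the truncated example from above.

```python
def lost_constraints(constraints: list[str], handoff_text: str) -> list[str]:
    """Return the constraints that no longer appear in the handoff text."""
    lowered = handoff_text.lower()
    return [c for c in constraints if c.lower() not in lowered]


# Keywords pulled by hand from the original request (an assumption for the demo).
constraints = ["async", "FastAPI", "senior developers"]
handoff = "Here's the research on async programming..."

print(lost_constraints(constraints, handoff))
```

Here the topic ("async") survives the handoff, while the framework and the target audience do not, which is the degradation described above.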

Reliability impact

The math is brutal:

failure-math.txt
Assume each agent has 95% success rate:
Single agent: 95% success
3-agent pipeline: 0.95 x 0.95 x 0.95 = 85.7% success
5-agent pipeline: 0.95^5 = 77.4% success
Add retry logic?
3-agent with 2 retries each: Much more complex, higher cost
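
The compounding can be checked in a few lines. This sketch assumes agent failures are independent, and for the retry case that a failure can be detected at all. Neither assumption reliably holds for hallucinations, which often look like successes. The function names are illustrative.

```python
def pipeline_success(per_agent: float, agents: int) -> float:
    """Probability that every agent in a sequential pipeline succeeds."""
    return per_agent ** agents


def step_success_with_retries(per_agent: float, retries: int) -> float:
    """Probability that one step succeeds within 1 + retries independent attempts."""
    return 1 - (1 - per_agent) ** (retries + 1)


def worst_case_calls(agents: int, retries: int) -> int:
    """Upper bound on LLM calls if every step uses all of its attempts."""
    return agents * (retries + 1)


for n in (1, 3, 5):
    print(f"{n}-agent pipeline: {pipeline_success(0.95, n):.1%}")

# 3 agents with 2 retries each, under the independence assumption above
per_step = step_success_with_retries(0.95, retries=2)
print(f"with retries: {pipeline_success(per_step, 3):.2%}, "
      f"up to {worst_case_calls(3, 2)} calls")
```

Retries can buy the reliability back on paper, but only for failures you can detect, and each one adds calls, latency, and control flow to maintain.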

Cost impact in production

I tracked costs over a month:

monthly-costs.txt
Content creation system (10 articles/day):
Single agent:
- 10 articles x 30 days = 300 articles
- ~$0.15 per article
- Total: ~$45/month
Multi-agent (researcher/writer/reviewer):
- 300 articles
- ~$0.50 per article (with retries)
- Total: ~$150/month
Difference: $105/month = $1,260/year
Plus: Multi-agent had more errors requiring human fixes
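
The projection is simple enough to keep as a helper and rerun with your own volumes. The per-article costs below are the figures from this section; the `monthly_cost` function name and the 30-day month are assumptions.

```python
def monthly_cost(articles_per_day: int, cost_per_article: float, days: int = 30) -> float:
    """Projected spend for a month at a fixed volume and per-article cost."""
    return round(articles_per_day * days * cost_per_article, 2)


single = monthly_cost(10, 0.15)  # single agent
multi = monthly_cost(10, 0.50)   # multi-agent, including retries
difference = round(multi - single, 2)

print(f"single: ${single}/month, multi: ${multi}/month")
print(f"difference: ${difference}/month = ${round(difference * 12, 2)}/year")
```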

Summary

I started this journey thinking multi-agent was “more advanced.” I was wrong. Single agents with well-crafted prompts outperform multi-agent systems for most production workloads.

Key takeaways:

  1. Start with a single agent and a tight prompt
  2. Add multi-agent complexity only when forced by context limits, parallel processing needs, or distinct domain expertise
  3. Every agent is a failure point; every handoff loses context
  4. “Boring AI” scales better than sophisticated architecture

Audit your current systems:

Can you consolidate your agents into a single, better-prompted agent? Try it. Measure the impact on cost, latency, and reliability. You might be surprised.
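
A minimal harness for that comparison might look like the sketch below. It measures latency and success rate only; cost would have to come from the token usage your provider reports. The `benchmark` function, the `is_ok` check, and the stub agent are all hypothetical placeholders for your own pipeline and quality criteria.

```python
import time
from typing import Callable


def benchmark(run: Callable[[str], str], inputs: list[str],
              is_ok: Callable[[str], bool]) -> dict[str, float]:
    """Run one pipeline over the inputs and report success rate and latency."""
    if not inputs:
        raise ValueError("need at least one input")
    latencies: list[float] = []
    successes = 0
    for item in inputs:
        start = time.perf_counter()
        try:
            if is_ok(run(item)):
                successes += 1
        except Exception:
            pass  # an exception counts as a failed run
        latencies.append(time.perf_counter() - start)
    return {
        "success_rate": successes / len(inputs),
        "avg_latency_s": sum(latencies) / len(latencies),
    }


def stub_agent(text: str) -> str:
    """Stand-in for a real pipeline; fails on empty input."""
    if not text:
        raise ValueError("empty input")
    return text.upper()


report = benchmark(stub_agent, ["a", "b", "", "d"], is_ok=lambda out: out.isupper())
print(report)
```

Run the same inputs through both the consolidated agent and the existing multi-agent pipeline, then compare the two reports side by side.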

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.

