Single Agent vs Multi-Agent Orchestration: When to Choose?
Problem
I built a multi-agent content creation system with three agents: a researcher, a writer, and a reviewer. It seemed sophisticated. It looked like what the papers described. It cost me $150 in API calls in the first week and produced worse output than my old single-agent prompt.
Here’s what I saw:
Researcher Agent: Found 15 sources on the topic...
Writer Agent: Based on research, here's the article...
Reviewer Agent: The article needs more detail on points 3, 7, and 12...
Writer Agent: I've added more detail...
Reviewer Agent: Still missing context on point 7...
[15 more back-and-forth messages]
Final output: Generic article with hallucinated facts
Total cost: $4.32 for one article
Total time: 45 seconds

My old single-agent approach produced better articles in 5 seconds for $0.15.
What happened?
I fell into the classic trap: I assumed complex tasks require complex systems.
The multi-agent architecture added:
- 3x more LLM calls
- Context loss between agents
- Information bleeding (researcher found facts that writer forgot)
- Hallucinations from the reviewer making up “missing” details
- Higher latency and cost
Here’s my original multi-agent setup:
    from langchain_openai import ChatOpenAI
    from langgraph.graph import StateGraph, MessagesState  # both live in langgraph.graph

    class ContentState(MessagesState):
        topic: str  # read by researcher()
        research: str
        draft: str
        review_comments: list

    def researcher(state: ContentState):
        llm = ChatOpenAI(model="gpt-4")
        response = llm.invoke([
            {"role": "system", "content": "You are a research agent. Find relevant information."},
            {"role": "user", "content": f"Research: {state['topic']}"},
        ])
        return {"research": response.content}

    def writer(state: ContentState):
        llm = ChatOpenAI(model="gpt-4")
        response = llm.invoke([
            {"role": "system", "content": "You are a writer. Create content based on research."},
            {"role": "user", "content": f"Research: {state['research']}\n\nWrite article."},
        ])
        return {"draft": response.content}

    def reviewer(state: ContentState):
        llm = ChatOpenAI(model="gpt-4")
        response = llm.invoke([
            {"role": "system", "content": "You are a reviewer. Check for quality."},
            {"role": "user", "content": f"Review: {state['draft']}"},
        ])
        return {"review_comments": [response.content]}

    workflow = StateGraph(ContentState)
    workflow.add_node("researcher", researcher)
    workflow.add_node("writer", writer)
    workflow.add_node("reviewer", reviewer)

    workflow.set_entry_point("researcher")
    workflow.add_edge("researcher", "writer")
    workflow.add_edge("writer", "reviewer")
    # Reviewer can loop back to writer for revisions...

    app = workflow.compile()

This looks professional. It follows the patterns I read about. But it’s over-engineered for my use case.
The uncomfortable truth
A Reddit thread covering 25+ production agent deployments revealed something striking:
“Every agent you add is a new failure point. Every handoff is where context dies.”
The profitable examples all used single-agent architecture:
| System | Revenue | Architecture |
|---|---|---|
| Email-to-CRM updater | $200/month | Single agent |
| Resume parser | $50/seat | Single agent |
| Invoice extractor | $500/month | Single agent |
These systems share a common pattern: OpenAI API + n8n (or similar), one tight prompt with examples.
Meanwhile, multi-agent systems often produce:
- Researcher finds facts
- Writer forgets them (context loss)
- Reviewer hallucinates missing details
- Final output: worse than single agent
- Cost: 3-5x higher
- Latency: 3-10x slower

When to use single agent
After testing both approaches extensively, I found clear patterns.
Single agent works when:
- Task has a clear, focused objective
- Input/output format is well-defined
- Complete task fits in one context window
- Examples can demonstrate expected behavior
- Task decomposes into sequential steps within one prompt
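The "fits in one context window" condition can be checked before you pick an architecture. The sketch below is a rough estimate only: the 4-characters-per-token ratio is a common rule of thumb rather than a real tokenizer, and `estimate_tokens`, `fits_in_context`, and `reserve_for_output` are hypothetical names introduced for illustration.

```python
# Assumption: ~4 characters per token is a crude heuristic for English text.
# Use your provider's tokenizer when you need real numbers.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (heuristic, not exact)."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits_in_context(parts: list[str], context_limit: int,
                    reserve_for_output: int = 4_000) -> bool:
    """True if prompt, examples, and input together leave room for the reply."""
    used = sum(estimate_tokens(part) for part in parts)
    return used + reserve_for_output <= context_limit


if __name__ == "__main__":
    prompt = "Extract CRM entry from email. Format as JSON."
    email = "Hi, I'm John from TechCorp. Need pricing for 100 seats."
    print(fits_in_context([prompt, email], context_limit=128_000))
```

If the check fails, try chunking the input for the same single agent before reaching for more agents.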
Here’s my production-proven single-agent pattern:
    from typing import Optional

    from openai import OpenAI
    from pydantic import BaseModel

    class CRMEntry(BaseModel):
        contact_name: str
        company: str
        email: Optional[str] = None  # the example below returns null when no address is given
        action_required: bool
        notes: str

    def email_to_crm(email_body: str) -> CRMEntry:
        """Single agent handles complete transformation."""
        client = OpenAI()

        # One tight prompt with examples
        prompt = f"""
        Extract CRM entry from email. Format as JSON.

        Examples:
        Email: "Hi, I'm John from TechCorp. Need pricing for 100 seats."
        Output: {{"contact_name": "John", "company": "TechCorp", "email": null, "action_required": true, "notes": "Pricing request for 100 seats"}}

        Email: {email_body}
        Output:
        """

        # Note: JSON mode needs a model that supports response_format;
        # older gpt-4 snapshots may reject this parameter.
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},
        )

        return CRMEntry.model_validate_json(response.choices[0].message.content)

    # Usage: One function call, one LLM call, complete task
    entry = email_to_crm(incoming_email)

Cost comparison:
Single agent:
- 1 LLM call
- ~1,500 tokens
- $0.015 per request
- 2-3 seconds latency

Multi-agent (researcher/writer/reviewer):
- 3+ LLM calls
- ~8,000 tokens
- $0.08 per request
- 10-15 seconds latency

When multi-agent is justified
Multi-agent isn’t wrong. It’s overused. Here’s when it actually makes sense:
1. Separate context windows needed
When different expertise domains require their own context, multiple agents prevent context bloat:
    from langchain_openai import ChatOpenAI
    from langgraph.graph import StateGraph, MessagesState, START, END

    class CodeAnalysisState(MessagesState):
        codebase: dict
        security_issues: list
        performance_issues: list
        style_issues: list

    def security_analyzer(state: CodeAnalysisState):
        """Specialized agent - security domain context."""
        llm = ChatOpenAI(model="gpt-4")
        # Full context for security reasoning
        # No dilution from other concerns
        ...

    def performance_analyzer(state: CodeAnalysisState):
        """Specialized agent - performance domain context."""
        llm = ChatOpenAI(model="gpt-4")
        # Focused on performance patterns
        # Separate context avoids noise
        ...

    def style_checker(state: CodeAnalysisState):
        """Specialized agent - style domain."""
        llm = ChatOpenAI(model="gpt-4")
        # Only cares about conventions
        ...

    def aggregate(state: CodeAnalysisState):
        """Placeholder fan-in node; the edges below need it registered."""
        ...

    workflow = StateGraph(CodeAnalysisState)
    workflow.add_node("security", security_analyzer)
    workflow.add_node("performance", performance_analyzer)
    workflow.add_node("style", style_checker)
    workflow.add_node("aggregate", aggregate)

    # Parallel execution for speed: fan out from START, fan in at "aggregate"
    for name in ("security", "performance", "style"):
        workflow.add_edge(START, name)
        workflow.add_edge(name, "aggregate")
    workflow.add_edge("aggregate", END)

Why this works:
- Each agent has distinct expertise
- Parallel execution provides speedup
- Context windows aren’t wasted on irrelevant details
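The speedup from running independent agents side by side can be demonstrated without any LLM at all. In this minimal sketch, `fake_agent` is a hypothetical stand-in that sleeps instead of calling an API, and plain `asyncio.gather` replaces the orchestration framework; it only illustrates the timing argument.

```python
import asyncio
import time

AGENTS = ("security", "performance", "style")


async def fake_agent(name: str, seconds: float) -> str:
    """Stand-in for an independent LLM call; sleeps instead of calling an API."""
    await asyncio.sleep(seconds)
    return f"{name} done"


async def run_sequential(delay: float) -> float:
    """Run the agents one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for name in AGENTS:
        await fake_agent(name, delay)
    return time.perf_counter() - start


async def run_parallel(delay: float) -> float:
    """Run the agents concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_agent(name, delay) for name in AGENTS))
    return time.perf_counter() - start


def compare(delay: float = 0.1) -> tuple[float, float]:
    """Return (sequential_seconds, parallel_seconds) for three fake agents."""
    return asyncio.run(run_sequential(delay)), asyncio.run(run_parallel(delay))


if __name__ == "__main__":
    sequential, parallel = compare()
    print(f"sequential: {sequential:.2f}s, parallel: {parallel:.2f}s")
```

The parallel run takes roughly as long as the slowest agent, which only helps when the agents do not depend on each other's output.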
2. Parallel processing provides speedup
When tasks can run independently, parallel agents save time:
Sequential (single agent):
Task A: 3s -> Task B: 3s -> Task C: 3s = 9s total

Parallel (multi-agent):
Task A: 3s
Task B: 3s   } = 3s total (all complete together)
Task C: 3s

3. Context window limits force separation
For large codebase analysis, one agent can’t fit everything:
Single agent analyzing 50 files:
- Total tokens: 200,000
- Context limit: 128,000
- Result: Truncation, missed issues

Multi-agent with file distribution:
- Agent 1: Files 1-10 (40,000 tokens)
- Agent 2: Files 11-20 (40,000 tokens)
- Agent 3: Files 21-30 (40,000 tokens)
- Agents 4-5: Files 31-50 (40,000 tokens each)
- Result: Complete coverage

4. Complex workflows with branching logic
When business processes have conditional paths:
    def route_request(state):
        """Route to different agents based on request type."""
        if state["request_type"] == "refund":
            return "refund_agent"
        elif state["request_type"] == "technical":
            return "support_agent"
        else:
            return "general_agent"

    workflow.add_conditional_edges("router", route_request)

Decision framework
I created this decision tree:
    START: What's your task complexity?
    |
    +-- Single, focused objective?
        |
        +-- YES --> Does it fit in one context window?
        |           |
        |           +-- YES --> SINGLE AGENT
        |           |
        |           +-- NO --> Can you chunk the input?
        |                      |
        |                      +-- YES --> SINGLE AGENT (chunked)
        |                      +-- NO --> MULTI-AGENT (by necessity)
        |
        +-- NO --> Do tasks need different expertise?
                   |
                   +-- YES --> Will agents run in parallel?
                   |           |
                   |           +-- YES --> MULTI-AGENT (justified)
                   |           +-- NO --> Try single agent with tool use first
                   |
                   +-- NO --> SINGLE AGENT (with better prompt)

Common mistakes I made
Mistake 1: Starting with multi-agent
My first instinct was “this is complex, I need multiple agents.” Wrong. I should have started simple:
    # WRONG: Start with multi-agent
    workflow = StateGraph(ComplexState)
    workflow.add_node("researcher", ...)
    workflow.add_node("writer", ...)
    workflow.add_node("reviewer", ...)

    # RIGHT: Start with single agent, add complexity only if needed
    # (llm is any chat model client, e.g. ChatOpenAI; the parameter is named
    # topic to avoid shadowing the built-in input())
    def single_agent_task(topic):
        return llm.invoke(f"""
        You are an expert content creator.
        Research and write about: {topic}

        Step 1: Identify key points
        Step 2: Write comprehensive content
        Step 3: Self-review for gaps
        Output the final article.
        """)

Mistake 2: Copying research paper architectures
Academic papers showcase multi-agent research because that’s what gets published. Production systems need reliability:
Academic paper:
- Novel architecture
- Complex agent interactions
- Published for contribution

Production system:
- Reliable execution
- Minimal failure points
- Built for users, not reviewers

Mistake 3: Framework-driven design
LangGraph and CrewAI make orchestration easy. But easy doesn’t mean appropriate:
    # CrewAI makes this easy to write:
    crew = Crew(
        agents=[researcher, writer, editor, reviewer, publisher],
        tasks=[...]
    )

    # But that's 5 failure points, 5x the cost, and 5x the latency
    # A single well-prompted agent might do better

Mistake 4: Ignoring context loss
Every agent handoff loses information:
Agent 1 receives: "Write about Python async programming, focus on FastAPI, include code examples, target senior developers"
Agent 1 output to Agent 2: "Here's the research on async programming..."
Agent 2 receives: Lost "focus on FastAPI" and "target senior developers"
Agent 2 output: Generic async article

Result: User intent degraded with each handoff

Reliability impact
The math is brutal:
Assume each agent has 95% success rate:
Single agent: 95% success
3-agent pipeline: 0.95 x 0.95 x 0.95 = 85.7% success
5-agent pipeline: 0.95^5 = 77.4% success
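The figures above can be reproduced in a few lines. This sketch assumes every agent succeeds independently with the same probability, which is a simplification; `pipeline_success` is a hypothetical helper name.

```python
def pipeline_success(per_agent: float, agents: int) -> float:
    """Probability that every agent in a sequential pipeline succeeds.

    Assumes failures are independent and each agent has the same success rate.
    """
    if not 0.0 <= per_agent <= 1.0:
        raise ValueError("per_agent must be a probability between 0 and 1")
    if agents < 1:
        raise ValueError("agents must be at least 1")
    return per_agent ** agents


if __name__ == "__main__":
    for count in (1, 3, 5):
        rate = pipeline_success(0.95, count)
        print(f"{count}-agent pipeline: {rate:.1%} success")
```

Because the success rates multiply, each added agent lowers the ceiling on end-to-end reliability even when every agent is individually good.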
Add retry logic?
3-agent with 2 retries each: Much more complex, higher cost

Cost impact in production
I tracked costs over a month:
Content creation system (10 articles/day):
Single agent:
- 10 articles x 30 days = 300 articles
- ~$0.15 per article
- Total: ~$45/month

Multi-agent (researcher/writer/reviewer):
- 300 articles
- ~$0.50 per article (with retries)
- Total: ~$150/month
Difference: $105/month = $1,260/year
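To rerun this comparison with your own volumes, the arithmetic fits in a small helper. This is a sketch using the per-article costs quoted above as inputs; `monthly_cost` is a hypothetical name, and the 30-day month is an assumption.

```python
def monthly_cost(articles_per_day: int, cost_per_article: float,
                 days: int = 30) -> float:
    """Total API spend in dollars for one month, rounded to cents."""
    return round(articles_per_day * days * cost_per_article, 2)


if __name__ == "__main__":
    single = monthly_cost(10, 0.15)   # single-agent cost per article
    multi = monthly_cost(10, 0.50)    # multi-agent cost per article, with retries
    difference = round(multi - single, 2)
    print(f"single: ${single:.2f}/month, multi: ${multi:.2f}/month")
    print(f"difference: ${difference:.2f}/month, ${difference * 12:.2f}/year")
```

Swap in your own per-request costs from your provider's usage dashboard; the numbers here are only the ones from my tracking.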
Plus: Multi-agent had more errors requiring human fixes

Summary
I started this journey thinking multi-agent was “more advanced.” I was wrong. Single agents with well-crafted prompts outperform multi-agent systems for most production workloads.
Key takeaways:
- Start with a single agent and a tight prompt
- Add multi-agent complexity only when forced by context limits, parallel processing needs, or distinct domain expertise
- Every agent is a failure point; every handoff loses context
- “Boring AI” scales better than sophisticated architecture
Audit your current systems:
Can you consolidate your agents into a single, better-prompted agent? Try it. Measure the impact on cost, latency, and reliability. You might be surprised.
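One way to make that audit repeatable is to encode the decision framework from earlier as a function and run each of your systems through it. This is a minimal sketch: the `Task` fields and the `recommend` function are hypothetical names, and the yes/no answers still have to come from your own judgment.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Yes/no answers to the questions in the decision tree."""
    single_objective: bool
    fits_context: bool = True
    chunkable: bool = True
    needs_distinct_expertise: bool = False
    parallelizable: bool = False


def recommend(task: Task) -> str:
    """Walk the decision tree and return the suggested architecture."""
    if task.single_objective:
        if task.fits_context:
            return "single agent"
        if task.chunkable:
            return "single agent (chunked)"
        return "multi-agent (by necessity)"
    if not task.needs_distinct_expertise:
        return "single agent (with better prompt)"
    if task.parallelizable:
        return "multi-agent (justified)"
    return "single agent with tool use first"


if __name__ == "__main__":
    email_to_crm = Task(single_objective=True)
    code_review = Task(single_objective=False,
                       needs_distinct_expertise=True, parallelizable=True)
    print(recommend(email_to_crm))
    print(recommend(code_review))
```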
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can contact me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- Reddit: Why single agent wins in production
- LangGraph Documentation
- CrewAI Framework
- Goodhart's Law