How to Prevent Context Bleeding in Multi-Agent AI Systems?
Problem
I built a multi-agent research pipeline with LangGraph: a researcher agent gathers information, a writer agent drafts content, and a reviewer agent checks quality. The architecture looked perfect on paper.
Then I ran it. The writer started hallucinating facts the researcher never found. The reviewer praised completely made-up citations. And the final output included weird artifacts from an earlier debugging session I thought I’d removed.
    The quantum computing market will reach $65 billion by 2025(Source: definitely real study from MIT)
    Note: DEBUG MODE ENABLED - skipping cache validation

Neither the researcher nor MIT ever said that. My agent chain was suffering from context bleeding.
What happened?
Context bleeding occurs when information leaks across agent boundaries during handoffs. Each agent was inheriting remnants of previous agent conversations, leftover debugging context, and accumulated noise that corrupted its reasoning.
Here’s what my initial architecture looked like:
    from langgraph.graph import StateGraph, END
    from typing import TypedDict

    # NOTE: `llm` is assumed to be a model client created elsewhere
    # whose invoke() returns the reply as plain text.

    class AgentState(TypedDict):
        messages: list  # EVERYTHING accumulates here
        research_notes: str
        draft: str
        final: str

    def researcher(state: AgentState) -> dict:
        # Agent sees ALL previous messages
        prompt = f"Research this topic. Context: {state['messages']}"
        result = llm.invoke(prompt)
        return {"research_notes": result, "messages": state["messages"] + [result]}

    def writer(state: AgentState) -> dict:
        # Agent sees research notes PLUS all previous messages
        prompt = f"Write article. Research: {state['research_notes']}. History: {state['messages']}"
        result = llm.invoke(prompt)
        return {"draft": result, "messages": state["messages"] + [result]}

    def reviewer(state: AgentState) -> dict:
        # Agent sees draft PLUS everything that came before
        prompt = f"Review: {state['draft']}. Full context: {state['messages']}"
        result = llm.invoke(prompt)
        return {"final": result, "messages": state["messages"] + [result]}

    # Every handoff adds to the growing message pile
    graph = StateGraph(AgentState)
    graph.add_node("researcher", researcher)
    graph.add_node("writer", writer)
    graph.add_node("reviewer", reviewer)
    graph.set_entry_point("researcher")  # the graph needs a starting node
    graph.add_edge("researcher", "writer")
    graph.add_edge("writer", "reviewer")
    graph.add_edge("reviewer", END)  # and an explicit end

The problem: every agent sees the entire conversation history. That’s a feature, not a bug, in chat applications. But in a pipeline where agents have distinct responsibilities, it’s a disaster.
    Agent 1 output: 2,000 tokens
    Agent 2 receives: 2,000 + its own context = 3,000 tokens
    Agent 3 receives: 3,000 + its own context = 4,500 tokens
    ...
    Final agent: 15,000+ tokens of accumulated context

The writer saw debugging notes from development. The reviewer saw half-formed thoughts from earlier iterations. Critical details got lost in the noise.
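The growth pattern above can be reproduced with a few lines of arithmetic. This is only a sketch: the per-agent token counts and the 300-token handoff size are made-up illustrative numbers, not measurements, and both helper names are hypothetical.

```python
from itertools import accumulate


def context_seen_by_each_agent(own_tokens: list[int]) -> list[int]:
    """Prompt size per agent when history is shared and never pruned.

    own_tokens[i] is what agent i contributes itself. With one growing
    message list, agent i sees everything contributed by agents 0..i.
    """
    return list(accumulate(own_tokens))


def isolated_context(own_tokens: list[int], handoff_tokens: int) -> list[int]:
    """Prompt size per agent when each one receives only a fixed-size
    structured handoff from its predecessor instead of the full history."""
    return [
        own if i == 0 else own + handoff_tokens
        for i, own in enumerate(own_tokens)
    ]


# Illustrative numbers only: the first three reproduce the tally above.
contributions = [2000, 1000, 1500, 4000, 3000]
shared = context_seen_by_each_agent(contributions)
isolated = isolated_context(contributions, handoff_tokens=300)

for i, (s, iso) in enumerate(zip(shared, isolated), start=1):
    print(f"Agent {i}: shared history = {s} tokens, isolated = {iso} tokens")
```

With shared history the prompt size is a running total, so it can only grow. With isolated contexts each agent's prompt stays close to its own contribution, regardless of how long the chain is.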
Why does this happen?
Four mechanisms cause context bleeding:
1. Context Accumulation
Each agent adds to the prompt, bloating the context window with irrelevant data. The signal-to-noise ratio drops with every handoff.
2. Information Decay
Critical details get buried. Agents hallucinate because they can’t find the relevant information in the noise.
3. Implicit Assumptions
Agents make assumptions based on leftover context from previous tasks. The writer assumed facts were verified because the reviewer’s prompt template mentioned “quality checks.”
4. Error Propagation
One agent’s mistake contaminates all downstream agents. A fabricated citation in the research phase appears in the final output because nobody can trace its origin.
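One way to make a claim's origin traceable is to record which agent introduced it and whether it came with a source. The sketch below is an illustration under assumptions, not part of the pipeline above: `Claim` and `ClaimLedger` are hypothetical names, and matching claims by exact text is a simplification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """A single factual statement plus where it came from."""
    text: str
    produced_by: str            # agent that first emitted the claim
    source_url: Optional[str]   # None means no source was supplied


@dataclass
class ClaimLedger:
    """Append-only record so any claim can be traced to its origin."""
    claims: list = field(default_factory=list)

    def add(self, text: str, produced_by: str,
            source_url: Optional[str] = None) -> Claim:
        claim = Claim(text, produced_by, source_url)
        self.claims.append(claim)
        return claim

    def unsourced(self) -> list:
        """Claims that no agent backed with a source."""
        return [c for c in self.claims if c.source_url is None]

    def origin_of(self, text: str) -> Optional[str]:
        """Name of the agent that introduced a claim, or None if unknown."""
        for c in self.claims:
            if c.text == text:
                return c.produced_by
        return None


ledger = ClaimLedger()
ledger.add("IBM leads with 127-qubit processor", "researcher",
           "https://ibm.com/quantum")
ledger.add("The quantum computing market will reach $65 billion by 2025",
           "writer")

for claim in ledger.unsourced():
    print(f"Unsourced claim from {claim.produced_by}: {claim.text}")
```

Here the fabricated market figure shows up as an unsourced claim introduced by the writer, which is exactly the information that was missing when the bad citation reached the final output.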
The trap: building researcher-writer-reviewer chains feels like “proper software engineering” but creates fragile systems where, as one Reddit practitioner put it, “context dies” at every handoff.
How to fix it?
I tried three approaches, each with different tradeoffs.
Attempt 1: Subagent Isolation
The most effective fix: give each subagent its own isolated context window, returning only structured results to the orchestrator.
    Orchestrator Agent (main context)
    ├── Subagent A (isolated context) → Returns: TypedResult
    ├── Subagent B (isolated context) → Returns: TypedResult
    └── Subagent C (isolated context) → Returns: TypedResult

Here’s the implementation:
    from langgraph.graph import StateGraph, END
    from typing import TypedDict, List
    from pydantic import BaseModel

    # NOTE: `llm` is assumed to be defined elsewhere and to return the
    # model's reply from invoke() as a JSON string.

    # Define typed outputs - NO raw conversation history
    class ResearchOutput(BaseModel):
        findings: List[str]
        sources: List[str]
        confidence: float

    class DraftOutput(BaseModel):
        content: str
        word_count: int

    class ReviewOutput(BaseModel):
        approved: bool
        issues: List[str]
        suggestions: List[str]

    # State contains ONLY structured outputs
    class WorkflowState(TypedDict):
        research: ResearchOutput
        draft: DraftOutput
        review: ReviewOutput
        final: str

    def research_agent(state: WorkflowState) -> dict:
        # Fresh context for this agent - no inherited baggage
        prompt = "Research topic X. Return JSON with findings, sources, confidence."
        result = llm.invoke(prompt)

        # Parse and validate - ONLY structured data passes through
        output = ResearchOutput.model_validate_json(result)
        return {"research": output}

    def draft_agent(state: WorkflowState) -> dict:
        # Agent receives ONLY the structured research output
        prompt = f"""Draft article based on:
        Findings: {state['research'].findings}
        Sources: {state['research'].sources}
        Confidence: {state['research'].confidence}

        Return JSON with content, word_count.
        """
        result = llm.invoke(prompt)
        output = DraftOutput.model_validate_json(result)
        return {"draft": output}

    def review_agent(state: WorkflowState) -> dict:
        # Agent sees ONLY the draft, not the research notes
        prompt = f"""Review this draft:
        Content: {state['draft'].content}

        Check for hallucinations, verify citations, assess quality.
        Return JSON with approved, issues, suggestions.
        """
        result = llm.invoke(prompt)
        output = ReviewOutput.model_validate_json(result)
        return {"review": output}

    # Build graph with clean state transitions
    graph = StateGraph(WorkflowState)
    graph.add_node("research", research_agent)
    graph.add_node("draft", draft_agent)
    graph.add_node("review", review_agent)
    graph.set_entry_point("research")  # the graph needs a starting node
    graph.add_edge("research", "draft")
    graph.add_edge("draft", "review")
    graph.add_edge("review", END)  # and an explicit end

The key change: state contains typed outputs, not raw messages. Each agent starts fresh, sees only what it needs, and returns structured data.
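Validation at the boundary also needs a plan for replies that fail it. The sketch below shows one possible approach: reject the reply, retry a bounded number of times, then fail loudly instead of passing garbage downstream. To stay dependency-free it uses a standard-library dataclass as a stand-in for the Pydantic model. `parse_research`, `call_with_validation`, and the fake `ask` callable are hypothetical names, and the retry wording is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class ResearchResult:
    findings: list
    sources: list
    confidence: float


def _is_str_list(value) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def parse_research(raw: str) -> ResearchResult:
    """Parse and validate a model reply; raise ValueError if it is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    if not _is_str_list(data.get("findings")):
        raise ValueError("'findings' must be a list of strings")
    if not _is_str_list(data.get("sources")):
        raise ValueError("'sources' must be a list of strings")
    conf = data.get("confidence")
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        raise ValueError("'confidence' must be a number")
    if not 0.0 <= conf <= 1.0:
        raise ValueError("'confidence' must be between 0 and 1")
    return ResearchResult(data["findings"], data["sources"], float(conf))


def call_with_validation(ask: Callable[[str], str], base_prompt: str,
                         max_attempts: int = 3) -> ResearchResult:
    """Call the model until a reply validates, up to max_attempts times."""
    prompt = base_prompt
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_research(ask(prompt))
        except ValueError as exc:
            last_error = exc
            # Rebuild from the base prompt so retries do not pile up context.
            prompt = (f"{base_prompt}\n\nYour previous reply was rejected: "
                      f"{exc}. Return only valid JSON.")
    raise RuntimeError(
        f"no valid reply after {max_attempts} attempts: {last_error}")


# Fake model for illustration: first reply is chatty, second is valid JSON.
replies = iter([
    "Sure! Here are my findings...",
    '{"findings": ["IBM leads with 127-qubit processor"], '
    '"sources": ["https://ibm.com/quantum"], "confidence": 0.85}',
])
result = call_with_validation(lambda _prompt: next(replies),
                              "Research topic X.")
print(result)
```

Rebuilding the retry prompt from the base prompt each time is deliberate: appending every rejected reply would reintroduce the accumulation problem inside a single agent.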
Attempt 2: Structured Handoffs
When subagent isolation isn’t practical, use explicit handoff structures instead of prompt injection.
    from dataclasses import dataclass
    from datetime import datetime
    import json
    from pathlib import Path

    @dataclass
    class AgentHandoff:
        """Explicit handoff structure - no ambiguity"""
        agent_id: str
        task_type: str
        output: dict
        confidence: float
        timestamp: str

    class HandoffCollection:
        """Typed collection for agent communication"""

        def __init__(self, path: str):
            self.path = Path(path)
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self._load()

        def _load(self):
            if self.path.exists():
                with open(self.path) as f:
                    self.data = json.load(f)
            else:
                self.data = []

        def write(self, handoff: AgentHandoff):
            self.data.append({
                "agent_id": handoff.agent_id,
                "task_type": handoff.task_type,
                "output": handoff.output,
                "confidence": handoff.confidence,
                "timestamp": handoff.timestamp
            })
            with open(self.path, 'w') as f:
                json.dump(self.data, f, indent=2)

        def read_last(self, agent_id: str) -> AgentHandoff:
            for entry in reversed(self.data):
                if entry["agent_id"] == agent_id:
                    return AgentHandoff(**entry)
            raise ValueError(f"No handoff from {agent_id}")

    # Agent A writes structured output
    collection = HandoffCollection(".agent_state/handoff.json")
    collection.write(AgentHandoff(
        agent_id="researcher",
        task_type="web_search",
        output={
            "findings": [
                "Quantum market projected at $65B by 2030",
                "IBM leads with 127-qubit processor"
            ],
            "sources": ["https://example.com/report", "https://ibm.com/quantum"]
        },
        confidence=0.85,
        timestamp=datetime.now().isoformat()
    ))

    # Agent B reads - no prompt injection, no context bleeding
    last_handoff = collection.read_last("researcher")
    prompt = f"""Write article based on verified findings:
    {json.dumps(last_handoff.output, indent=2)}

    Confidence: {last_handoff.confidence}"""

The difference: instead of passing the full conversation history, each handoff is a persisted, typed collection. The next agent reads exactly what it needs, nothing more.
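Because the handoff lists its sources explicitly, a downstream step can check a draft's citations against them mechanically. The following is a minimal sketch under assumptions: `find_unsupported_citations` is a hypothetical helper, citations are assumed to appear as plain URLs in the draft, and the example draft text is invented for illustration. It only detects URLs that were never handed over; it does not verify that a source actually supports a claim.

```python
import re

# Matches http(s) URLs up to whitespace or a closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def _normalize(url: str) -> str:
    """Strip trailing punctuation and slashes so trivial variants compare equal."""
    return url.rstrip(".,;:").rstrip("/")


def find_unsupported_citations(draft: str, allowed_sources: list) -> list:
    """Return URLs cited in the draft that are absent from the handoff's sources."""
    allowed = {_normalize(s) for s in allowed_sources}
    unsupported = []
    for url in URL_PATTERN.findall(draft):
        url = _normalize(url)
        if url not in allowed and url not in unsupported:
            unsupported.append(url)
    return unsupported


handoff_sources = ["https://example.com/report", "https://ibm.com/quantum"]
draft = (
    "IBM leads with a 127-qubit processor (https://ibm.com/quantum). "
    "The market will reach $65B by 2025 "
    "(https://mit.example/definitely-real-study)."
)

unsupported = find_unsupported_citations(draft, handoff_sources)
print(unsupported)
```

A check like this runs without a model call, so it can gate the reviewer step cheaply: if the list is non-empty, the draft goes back to the writer before any review prompt is built.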
Attempt 3: Single Agent with Detailed Spec
For 80% of use cases, the multi-agent complexity isn’t justified. A single well-prompted agent often works better.
    from pathlib import Path

    # NOTE: `llm` is assumed to be a model client defined elsewhere.

    def run_single_agent(task: str, spec_path: str = "spec.md"):
        """
        Simplest solution for most tasks.
        No handoffs, no bleeding, one context.
        """
        spec = Path(spec_path).read_text()
        prompt = f"""Follow this specification:

    {spec}

    Task: {task}"""
        return llm.invoke(prompt)

    # Contents of specs/writer_spec.md
    spec_content = """# Research and Writing Specification

    ## Task
    Research the given topic and write a comprehensive article.

    ## Constraints
    - Use only verified, citable sources
    - Maximum 800 words
    - Include inline citations in [Source] format
    - No speculation without explicit "we believe" framing
    - Fact-check all statistics

    ## Output Format
    ## [Title]
    [Lead paragraph]
    [Body with citations]
    ## Sources
    - [1] URL - description
    """

    # Save the spec so the agent can read it from disk
    spec_file = Path("specs/writer_spec.md")
    spec_file.parent.mkdir(parents=True, exist_ok=True)
    spec_file.write_text(spec_content)

    # No handoffs, no bleeding
    result = run_single_agent(
        "Research and write about quantum computing market trends",
        "specs/writer_spec.md"
    )

One agent with a detailed spec eliminates handoff complexity entirely. The spec becomes the “context” that would otherwise bleed between agents.
Comparison
| Approach | When to Use | Pros | Cons |
|---|---|---|---|
| Subagent Isolation | Complex pipelines, strict separation | Clean contexts, parallelizable | More infrastructure, orchestrator overhead |
| Structured Handoffs | Sequential pipelines, audit requirements | Explicit data flow, debuggable | Still accumulates at orchestrator |
| Single Agent | Most tasks, simpler requirements | No handoffs, fastest to implement | Limited parallelization, single point of failure |
Common Mistakes
I made all of these:
| Mistake | Consequence | Fix |
|---|---|---|
| Passing full outputs between agents | Context bloat, hallucination | Return typed summaries |
| Implicit handoffs via prompts | Information leakage | Use explicit collections |
| Over-engineering agent chains | Fragile systems | Start simple, add agents only when needed |
| Sharing memory between agents | State contamination | Isolate or use lock files |
| No output validation | Garbage propagates | Pydantic models at every boundary |
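For the "use lock files" fix in the table, one possible shape is an exclusive lock file created atomically before an agent touches shared state. This is a sketch under assumptions, not a production lock: `file_lock` is a hypothetical helper, it relies on `os.O_EXCL` making file creation atomic on the local filesystem, and it does not clean up stale locks left behind by a crashed process.

```python
import os
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(lock_path, timeout: float = 5.0, poll_interval: float = 0.05):
    """Hold an exclusive lock file for the duration of the with-block."""
    lock = Path(lock_path)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"could not acquire {lock} within {timeout}s")
            time.sleep(poll_interval)
    try:
        # Record the owner's PID to help with manual debugging.
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        yield
    finally:
        lock.unlink(missing_ok=True)


# Usage: only one agent at a time may rewrite the shared state file.
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "handoff.json"
    lock_file = Path(tmp) / "handoff.json.lock"

    with file_lock(lock_file):
        state_file.write_text('{"agent_id": "researcher"}')
        held = lock_file.exists()

    released = not lock_file.exists()
    print(f"held during write: {held}, released afterwards: {released}")
```

Isolation remains the simpler option where it is available; a lock only serializes access to shared state and does nothing to stop one agent from reading another agent's leftovers.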
Summary
In this post, I showed how context bleeding silently corrupts multi-agent AI systems. The problem isn’t the agents themselves - it’s the handoffs. Every agent boundary is where “context dies” and hallucinations are born.
The solutions:
- Subagent Isolation - Each agent gets a fresh context, returns only typed results. Best for complex pipelines.
- Structured Handoffs - Use persisted collections instead of prompt injection. Good for sequential workflows with audit needs.
- Single Agent with Spec - Eliminate handoffs entirely. The right choice for most tasks.
The uncomfortable truth from practitioners who’ve built 25+ agents: “Every agent you add is a new failure point. Every handoff is where context dies.” Start with one agent. Add more only when the complexity pays for itself.
Final Words + More Resources
My intention with this article was to share my knowledge and experience in a way that helps others. If you want to get in touch, you can contact me by email.
Here are the most important links from this article, along with some further resources on this topic:
- Reddit: 25+ agents built. Here's the uncomfortable truth
- LangGraph Documentation
- Circuit Breaker Pattern