
How to Prevent Context Bleeding in Multi-Agent AI Systems?

Problem

I built a multi-agent research pipeline with LangGraph: a researcher agent gathers information, a writer agent drafts content, and a reviewer agent checks quality. The architecture looked perfect on paper.

Then I ran it. The writer started hallucinating facts the researcher never found. The reviewer praised completely made-up citations. And the final output included weird artifacts from an earlier debugging session I thought I’d removed.

output.txt
The quantum computing market will reach $65 billion by 2025
(Source: definitely real study from MIT)
Note: DEBUG MODE ENABLED - skipping cache validation

Neither the researcher agent nor MIT ever said that. My agent chain was suffering from context bleeding.

What happened?

Context bleeding occurs when information leaks across agent boundaries during handoffs. Each agent was inheriting remnants of previous agent conversations, leftover debugging context, and accumulated noise that corrupted its reasoning.

Here’s what my initial architecture looked like:

bleeding-pipeline.py
from langgraph.graph import StateGraph, END
from typing import TypedDict

# `llm` is assumed to be an already-configured model whose invoke() returns text

class AgentState(TypedDict):
    messages: list  # EVERYTHING accumulates here
    research_notes: str
    draft: str
    final: str

def researcher(state: AgentState) -> dict:
    # Agent sees ALL previous messages
    prompt = f"Research this topic. Context: {state['messages']}"
    result = llm.invoke(prompt)
    return {"research_notes": result, "messages": state["messages"] + [result]}

def writer(state: AgentState) -> dict:
    # Agent sees research notes PLUS all previous messages
    prompt = f"Write article. Research: {state['research_notes']}. History: {state['messages']}"
    result = llm.invoke(prompt)
    return {"draft": result, "messages": state["messages"] + [result]}

def reviewer(state: AgentState) -> dict:
    # Agent sees draft PLUS everything that came before
    prompt = f"Review: {state['draft']}. Full context: {state['messages']}"
    result = llm.invoke(prompt)
    return {"final": result, "messages": state["messages"] + [result]}

# Every handoff adds to the growing message pile
graph = StateGraph(AgentState)
graph.add_node("researcher", researcher)
graph.add_node("writer", writer)
graph.add_node("reviewer", reviewer)
graph.set_entry_point("researcher")  # the graph needs an explicit start node
graph.add_edge("researcher", "writer")
graph.add_edge("writer", "reviewer")
graph.add_edge("reviewer", END)  # and an explicit end

The problem: every agent sees the entire conversation history. That’s a feature, not a bug, in chat applications. But in a pipeline where agents have distinct responsibilities, it’s a disaster.

context-bloat.txt
Agent 1 output: 2,000 tokens
Agent 2 receives: 2,000 + its own context = 3,000 tokens
Agent 3 receives: 3,000 + its own context = 4,500 tokens
...
Final agent: 15,000+ tokens of accumulated context

The writer saw debugging notes from development. The reviewer saw half-formed thoughts from earlier iterations. Critical details got lost in the noise.
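To make the growth in context-bloat.txt concrete, here is a minimal sketch of the arithmetic. The per-agent token counts and the 200-token handoff size are hypothetical numbers chosen to match the figures above, and both function names are made up for illustration.

```python
from itertools import accumulate


def shared_history_sizes(own_tokens):
    """Context size per agent when every agent inherits the full history."""
    return list(accumulate(own_tokens))


def isolated_sizes(own_tokens, handoff_tokens):
    """Context size per agent when each one receives only a fixed-size handoff."""
    return [
        own + (handoff_tokens if index > 0 else 0)
        for index, own in enumerate(own_tokens)
    ]


contributions = [2000, 1000, 1500]  # hypothetical tokens each agent adds
print(shared_history_sizes(contributions))  # [2000, 3000, 4500]
print(isolated_sizes(contributions, 200))   # [2000, 1200, 1700]
```

With shared history the total can only grow. With a fixed-size handoff, each agent's context stays close to its own working set no matter how long the chain gets.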

Why does this happen?

Four mechanisms cause context bleeding:

1. Context Accumulation

Each agent adds to the prompt, bloating the context window with irrelevant data. The signal-to-noise ratio drops with every handoff.

2. Information Decay

Critical details get buried. Agents hallucinate because they can’t find the relevant information in the noise.

3. Implicit Assumptions

Agents make assumptions based on leftover context from previous tasks. The writer assumed facts were verified because the reviewer’s prompt template mentioned “quality checks.”

4. Error Propagation

One agent’s mistake contaminates all downstream agents. A fabricated citation in the research phase appears in the final output because nobody can trace its origin.
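One way to make error propagation visible is to compare what the draft cites against what the research step actually returned. The sketch below is only an illustration: `untraceable_citations` is a hypothetical helper, and matching sources by normalized string equality is an assumption that real citation formats would likely break.

```python
def untraceable_citations(cited, researched):
    """Return citations from the draft that the research step never produced."""
    known = {source.strip().lower() for source in researched}
    return [c for c in cited if c.strip().lower() not in known]


research_sources = ["https://example.com/report"]
draft_citations = [
    "https://example.com/report",
    "definitely real study from MIT",
]
print(untraceable_citations(draft_citations, research_sources))
# ['definitely real study from MIT']
```

A check like this does not stop an agent from fabricating a source, but it gives each citation a traceable origin, so a fabricated one stands out before it reaches the final output.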

The trap: building researcher-writer-reviewer chains feels like “proper software engineering” but creates fragile systems where, as one Reddit practitioner put it, “context dies” at every handoff.

How to fix it?

I tried three approaches, each with different tradeoffs.

Attempt 1: Subagent Isolation

The most effective fix: give each subagent its own isolated context window, returning only structured results to the orchestrator.

architecture-diagram.txt
Orchestrator Agent (main context)
├── Subagent A (isolated context) → Returns: TypedResult
├── Subagent B (isolated context) → Returns: TypedResult
└── Subagent C (isolated context) → Returns: TypedResult

Here’s the implementation:

isolated-subagents.py
from langgraph.graph import StateGraph, END
from typing import TypedDict, List
from pydantic import BaseModel

# `llm` is assumed to be an already-configured model whose invoke() returns JSON text

# Define typed outputs - NO raw conversation history
class ResearchOutput(BaseModel):
    findings: List[str]
    sources: List[str]
    confidence: float

class DraftOutput(BaseModel):
    content: str
    word_count: int

class ReviewOutput(BaseModel):
    approved: bool
    issues: List[str]
    suggestions: List[str]

# State contains ONLY structured outputs
class WorkflowState(TypedDict):
    research: ResearchOutput
    draft: DraftOutput
    review: ReviewOutput
    final: str

def research_agent(state: WorkflowState) -> dict:
    # Fresh context for this agent - no inherited baggage
    prompt = "Research topic X. Return JSON with findings, sources, confidence."
    result = llm.invoke(prompt)
    # Parse and validate - ONLY structured data passes through
    output = ResearchOutput.model_validate_json(result)
    return {"research": output}

def draft_agent(state: WorkflowState) -> dict:
    # Agent receives ONLY the structured research output
    prompt = f"""Draft article based on:
Findings: {state['research'].findings}
Sources: {state['research'].sources}
Confidence: {state['research'].confidence}
Return JSON with content, word_count.
"""
    result = llm.invoke(prompt)
    output = DraftOutput.model_validate_json(result)
    return {"draft": output}

def review_agent(state: WorkflowState) -> dict:
    # Agent sees ONLY the draft, not the research notes
    prompt = f"""Review this draft:
Content: {state['draft'].content}
Check for hallucinations, verify citations, assess quality.
Return JSON with approved, issues, suggestions.
"""
    result = llm.invoke(prompt)
    output = ReviewOutput.model_validate_json(result)
    return {"review": output}

# Build graph with clean state transitions
graph = StateGraph(WorkflowState)
graph.add_node("research", research_agent)
graph.add_node("draft", draft_agent)
graph.add_node("review", review_agent)
graph.set_entry_point("research")  # the graph needs an explicit start node
graph.add_edge("research", "draft")
graph.add_edge("draft", "review")
graph.add_edge("review", END)  # and an explicit end

The key change: state contains typed outputs, not raw messages. Each agent starts fresh, sees only what it needs, and returns structured data.
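The boundary check is what keeps stray text out of the state. As a rough illustration of what validation at a handoff does, here is a standard-library stand-in for the Pydantic parsing step. `ResearchResult` and `parse_research` are hypothetical names, and the rule that confidence must lie between 0 and 1 is an assumption, not something the pipeline above enforces.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchResult:
    findings: list
    sources: list
    confidence: float


def _is_str_list(value):
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def parse_research(raw: str) -> ResearchResult:
    """Validate raw model text at the agent boundary; raise ValueError if malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not _is_str_list(data.get("findings")):
        raise ValueError("findings must be a list of strings")
    if not _is_str_list(data.get("sources")):
        raise ValueError("sources must be a list of strings")
    confidence = data.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:  # assumed range, see lead-in
        raise ValueError("confidence must be between 0 and 1")
    return ResearchResult(data["findings"], data["sources"], float(confidence))
```

Anything that is not the agreed structure, such as a leftover debug line, fails here instead of flowing into the next agent's prompt.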

Attempt 2: Structured Handoffs

When subagent isolation isn’t practical, use explicit handoff structures instead of prompt injection.

structured-handoffs.py
from dataclasses import dataclass, asdict
from datetime import datetime
import json
from pathlib import Path

@dataclass
class AgentHandoff:
    """Explicit handoff structure - no ambiguity"""
    agent_id: str
    task_type: str
    output: dict
    confidence: float
    timestamp: str

class HandoffCollection:
    """Typed collection for agent communication"""

    def __init__(self, path: str):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self._load()

    def _load(self):
        # Start from an empty list when no handoff file exists yet
        if self.path.exists():
            with open(self.path) as f:
                self.data = json.load(f)
        else:
            self.data = []

    def write(self, handoff: AgentHandoff):
        # asdict() serializes every dataclass field, so none can be forgotten
        self.data.append(asdict(handoff))
        with open(self.path, "w") as f:
            json.dump(self.data, f, indent=2)

    def read_last(self, agent_id: str) -> AgentHandoff:
        # Walk backwards to find the most recent handoff from this agent
        for entry in reversed(self.data):
            if entry["agent_id"] == agent_id:
                return AgentHandoff(**entry)
        raise ValueError(f"No handoff from {agent_id}")

# Agent A writes structured output
collection = HandoffCollection(".agent_state/handoff.json")
collection.write(AgentHandoff(
    agent_id="researcher",
    task_type="web_search",
    output={
        "findings": [
            "Quantum market projected at $65B by 2030",
            "IBM leads with 127-qubit processor"
        ],
        "sources": ["https://example.com/report", "https://ibm.com/quantum"]
    },
    confidence=0.85,
    timestamp=datetime.now().isoformat()
))

# Agent B reads - no prompt injection, no context bleeding
last_handoff = collection.read_last("researcher")
prompt = f"""Write article based on verified findings:
{json.dumps(last_handoff.output, indent=2)}
Confidence: {last_handoff.confidence}
"""

The difference: instead of passing the full conversation history, each handoff is a persisted, typed collection. The next agent reads exactly what it needs, nothing more.
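"Exactly what it needs" can be enforced rather than assumed. The sketch below filters a handoff's output through an allow-list before it reaches the prompt. `WRITER_FIELDS`, `build_writer_prompt`, and the `debug_log` key are hypothetical, added only to show a stray field being dropped.

```python
import json

WRITER_FIELDS = ("findings", "sources")  # hypothetical allow-list for the writer


def build_writer_prompt(handoff_output: dict, allowed=WRITER_FIELDS) -> str:
    """Build the writer prompt from allow-listed handoff fields only."""
    visible = {key: handoff_output[key] for key in allowed if key in handoff_output}
    return "Write article based on verified findings:\n" + json.dumps(visible, indent=2)


output = {
    "findings": ["Quantum market projected at $65B by 2030"],
    "sources": ["https://example.com/report"],
    "debug_log": "DEBUG MODE ENABLED - skipping cache validation",
}
prompt = build_writer_prompt(output)
print("DEBUG" in prompt)  # False
```

The design choice is to name what each consumer may read, so a field added upstream stays invisible downstream until someone deliberately adds it to the list.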

Attempt 3: Single Agent with Detailed Spec

For 80% of use cases, the multi-agent complexity isn’t justified. A single well-prompted agent often works better.

single-agent-spec.py
from pathlib import Path

# `llm` is assumed to be an already-configured model

def run_single_agent(task: str, spec_path: str = "spec.md"):
    """
    Simplest solution for most tasks.
    No handoffs, no bleeding, one context.
    """
    spec = Path(spec_path).read_text()
    prompt = f"""Follow this specification:
{spec}
Task: {task}
"""
    return llm.invoke(prompt)

# Contents of the spec file
spec_content = """
# Research and Writing Specification
## Task
Research the given topic and write a comprehensive article.
## Constraints
- Use only verified, citable sources
- Maximum 800 words
- Include inline citations in [Source] format
- No speculation without explicit "we believe" framing
- Fact-check all statistics
## Output Format
## [Title]
[Lead paragraph]
[Body with citations]
## Sources
- [1] URL - description
"""

# Write the spec to disk so the agent can load it
spec_file = Path("specs/writer_spec.md")
spec_file.parent.mkdir(parents=True, exist_ok=True)
spec_file.write_text(spec_content)

# No handoffs, no bleeding
result = run_single_agent(
    "Research and write about quantum computing market trends",
    str(spec_file)
)

One agent with a detailed spec eliminates handoff complexity entirely. The spec becomes the “context” that would otherwise bleed between agents.
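Because the spec is explicit, parts of it can be checked mechanically after the agent returns. The following is a minimal sketch assuming the three constraints that are easy to test from the spec above: the 800-word limit, a Sources section, and at least one bracketed citation. `spec_violations` is a hypothetical helper, not part of any library.

```python
import re

MAX_WORDS = 800  # from the spec's "Maximum 800 words" constraint


def spec_violations(article: str) -> list:
    """Return the spec constraints this article visibly breaks."""
    problems = []
    if len(article.split()) > MAX_WORDS:
        problems.append("exceeds 800 words")
    if "## Sources" not in article:
        problems.append("missing Sources section")
    if not re.search(r"\[[^\]]+\]", article):
        problems.append("no inline citations")
    return problems


article = (
    "## Quantum Trends\n"
    "The market is growing [1].\n"
    "## Sources\n"
    "- [1] https://example.com/report - market report\n"
)
print(spec_violations(article))  # []
```

This only catches structural violations. Whether a cited statistic is true still needs a human or a separate verification step.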

Comparison

| Approach | When to Use | Pros | Cons |
| --- | --- | --- | --- |
| Subagent Isolation | Complex pipelines, strict separation | Clean contexts, parallelizable | More infrastructure, orchestrator overhead |
| Structured Handoffs | Sequential pipelines, audit requirements | Explicit data flow, debuggable | Still accumulates at orchestrator |
| Single Agent | Most tasks, simpler requirements | No handoffs, fastest to implement | Limited parallelization, single point of failure |

Common Mistakes

I made all of these:

| Mistake | Consequence | Fix |
| --- | --- | --- |
| Passing full outputs between agents | Context bloat, hallucination | Return typed summaries |
| Implicit handoffs via prompts | Information leakage | Use explicit collections |
| Over-engineering agent chains | Fragile systems | Start simple, add agents only when needed |
| Sharing memory between agents | State contamination | Isolate or use lock files |
| No output validation | Garbage propagates | Pydantic models at every boundary |
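The shared-memory mistake is easy to reproduce: if two agents hold a reference to the same dict, an in-place edit by one silently changes what the other sees. A minimal sketch of the "isolate" fix, assuming state is a plain nested dict; `run_agent_isolated` and `sloppy_writer` are hypothetical names.

```python
from copy import deepcopy


def run_agent_isolated(agent, state):
    """Call the agent on a deep copy so in-place edits cannot leak into shared state."""
    return agent(deepcopy(state))


def sloppy_writer(state):
    # Mutates its input, which would contaminate shared state without the copy
    state["research"]["findings"].append("unverified claim")
    return {"draft": "..."}


shared = {"research": {"findings": ["verified finding"]}}
run_agent_isolated(sloppy_writer, shared)
print(shared["research"]["findings"])  # ['verified finding']
```

Deep copies cost memory on large states, so this is a trade-off. The point is that an agent's returned dict becomes the only channel through which it can change the pipeline.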

Summary

In this post, I showed how context bleeding silently corrupts multi-agent AI systems. The problem isn’t the agents themselves - it’s the handoffs. Every agent boundary is where “context dies” and hallucinations are born.

The solutions:

  1. Subagent Isolation - Each agent gets a fresh context, returns only typed results. Best for complex pipelines.

  2. Structured Handoffs - Use persisted collections instead of prompt injection. Good for sequential workflows with audit needs.

  3. Single Agent with Spec - Eliminate handoffs entirely. The right choice for most tasks.

The uncomfortable truth from practitioners who’ve built 25+ agents: “Every agent you add is a new failure point. Every handoff is where context dies.” Start with one agent. Add more only when the complexity pays for itself.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
