How to Prevent Context Bleeding in Multi-Agent AI Systems?

Mar 24, 2026

Problem

I built a multi-agent research pipeline with LangGraph: a researcher agent gathers information, a writer agent drafts content, and a reviewer agent checks quality. The architecture looked perfect on paper.

Then I ran it. The writer started hallucinating facts the researcher never found. The reviewer praised completely made-up citations. And the final output included weird artifacts from an earlier debugging session I thought I’d removed.

The quantum computing market will reach $65 billion by 2025
(Source: definitely real study from MIT)

Note: DEBUG MODE ENABLED - skipping cache validation

Neither the researcher nor MIT ever said that. My agent chain was suffering from context bleeding.

What happened?

Context bleeding occurs when information leaks across agent boundaries during handoffs. Each agent was inheriting remnants of previous agent conversations, leftover debugging context, and accumulated noise that corrupted its reasoning.

Here’s what my initial architecture looked like:

from langgraph.graph import StateGraph, END
from typing import TypedDict

# Assumption: `llm` is a model client configured elsewhere whose
# invoke() returns plain text

class AgentState(TypedDict):
    messages: list  # EVERYTHING accumulates here
    research_notes: str
    draft: str
    final: str

def researcher(state: AgentState) -> dict:
    # Agent sees ALL previous messages
    prompt = f"Research this topic. Context: {state['messages']}"
    result = llm.invoke(prompt)
    return {"research_notes": result, "messages": state["messages"] + [result]}

def writer(state: AgentState) -> dict:
    # Agent sees research notes PLUS all previous messages
    prompt = f"Write article. Research: {state['research_notes']}. History: {state['messages']}"
    result = llm.invoke(prompt)
    return {"draft": result, "messages": state["messages"] + [result]}

def reviewer(state: AgentState) -> dict:
    # Agent sees draft PLUS everything that came before
    prompt = f"Review: {state['draft']}. Full context: {state['messages']}"
    result = llm.invoke(prompt)
    return {"final": result, "messages": state["messages"] + [result]}

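# Every handoff adds to the growing message pile
graph = StateGraph(AgentState)
graph.add_node("researcher", researcher)
graph.add_node("writer", writer)
graph.add_node("reviewer", reviewer)
graph.set_entry_point("researcher")  # the graph needs a starting node
graph.add_edge("researcher", "writer")
graph.add_edge("writer", "reviewer")
graph.add_edge("reviewer", END)  # and an explicit end
app = graph.compile()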

The problem: every agent sees the entire conversation history. That’s a feature, not a bug, in chat applications. But in a pipeline where agents have distinct responsibilities, it’s a disaster.

Agent 1 output: 2,000 tokens
Agent 2 receives: 2,000 + its own context = 3,000 tokens
Agent 3 receives: 3,000 + its own context = 4,500 tokens
...
Final agent: 15,000+ tokens of accumulated context

The writer saw debugging notes from development. The reviewer saw half-formed thoughts from earlier iterations. Critical details got lost in the noise.
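To make the growth pattern concrete, here is a minimal sketch that computes how much inherited history each agent receives when every output is appended to one shared message list. The per-agent output sizes are hypothetical numbers chosen for illustration, not measurements from my pipeline.

```python
from itertools import accumulate


def context_received(output_tokens: list[int]) -> list[int]:
    """Tokens of prior output each agent inherits when all history is shared.

    Agent i sees the combined output of every agent that ran before it.
    """
    running = list(accumulate(output_tokens))
    return [0] + running[:-1]


# Hypothetical per-agent output sizes, in tokens.
outputs = [2000, 1000, 1500]
inherited = context_received(outputs)

for i, tokens in enumerate(inherited, start=1):
    print(f"Agent {i} inherits {tokens} tokens of history")
```

The inherited amount only ever grows, and none of it is filtered for relevance to the agent that receives it.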

Why does this happen?

Four mechanisms cause context bleeding:

1. Context Accumulation

Each agent adds to the prompt, bloating the context window with irrelevant data. The signal-to-noise ratio drops with every handoff.

2. Information Decay

Critical details get buried. Agents hallucinate because they can’t find the relevant information in the noise.

3. Implicit Assumptions

Agents make assumptions based on leftover context from previous tasks. The writer assumed facts were verified because the reviewer’s prompt template mentioned “quality checks.”

4. Error Propagation

One agent’s mistake contaminates all downstream agents. A fabricated citation in the research phase appears in the final output because nobody can trace its origin.
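One cheap way to catch the error-propagation case is to compare the sources a draft cites against the sources the research step actually reported. The sketch below assumes both are available as plain lists of strings; the function name and the sample data are hypothetical.

```python
def unsupported_citations(
    draft_sources: list[str], research_sources: list[str]
) -> list[str]:
    """Return sources cited in the draft that the researcher never reported."""
    known = {s.strip().lower() for s in research_sources}
    return [s for s in draft_sources if s.strip().lower() not in known]


# Hypothetical data: the second citation has no origin in the research step.
research_sources = ["https://example.com/report"]
draft_sources = ["https://example.com/report", "definitely real study from MIT"]

missing = unsupported_citations(draft_sources, research_sources)
if missing:
    print("Citations with no origin in research:", missing)
```

This only proves a citation was reported upstream, not that it is true, but it does make a fabricated source traceable to the agent that introduced it.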

The trap: building researcher-writer-reviewer chains feels like “proper software engineering” but creates fragile systems where, as one Reddit practitioner put it, “context dies” at every handoff.

How to fix it?

I tried three approaches, each with different tradeoffs.

Attempt 1: Subagent Isolation

The most effective fix: give each subagent its own isolated context window, returning only structured results to the orchestrator.

Orchestrator Agent (main context)
├── Subagent A (isolated context) → Returns: TypedResult
├── Subagent B (isolated context) → Returns: TypedResult
└── Subagent C (isolated context) → Returns: TypedResult

Here’s the implementation:

from langgraph.graph import StateGraph, END
from typing import TypedDict, List
from pydantic import BaseModel

# Define typed outputs - NO raw conversation history
class ResearchOutput(BaseModel):
    findings: List[str]
    sources: List[str]
    confidence: float

class DraftOutput(BaseModel):
    content: str
    word_count: int

class ReviewOutput(BaseModel):
    approved: bool
    issues: List[str]
    suggestions: List[str]

# State contains ONLY structured outputs
class WorkflowState(TypedDict):
    research: ResearchOutput
    draft: DraftOutput
    review: ReviewOutput
    final: str

def research_agent(state: WorkflowState) -> dict:
    # Fresh context for this agent - no inherited baggage
    prompt = "Research topic X. Return JSON with findings, sources, confidence."
    result = llm.invoke(prompt)

    # Parse and validate - ONLY structured data passes through
    output = ResearchOutput.model_validate_json(result)
    return {"research": output}

def draft_agent(state: WorkflowState) -> dict:
    # Agent receives ONLY the structured research output
    prompt = f"""Draft article based on:
    Findings: {state['research'].findings}
    Sources: {state['research'].sources}
    Confidence: {state['research'].confidence}

    Return JSON with content, word_count.
    """
    result = llm.invoke(prompt)
    # The prompt must request JSON, otherwise validation below fails
    output = DraftOutput.model_validate_json(result)
    return {"draft": output}

def review_agent(state: WorkflowState) -> dict:
    # Agent sees ONLY the draft, not the research notes
    prompt = f"""Review this draft:
    Content: {state['draft'].content}

    Check for hallucinations, verify citations, assess quality.
    Return JSON with approved, issues, suggestions.
    """
    result = llm.invoke(prompt)
    output = ReviewOutput.model_validate_json(result)
    return {"review": output}

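# Build graph with clean state transitions
graph = StateGraph(WorkflowState)
graph.add_node("research", research_agent)
graph.add_node("draft", draft_agent)
graph.add_node("review", review_agent)
graph.set_entry_point("research")  # the graph needs a starting node
graph.add_edge("research", "draft")
graph.add_edge("draft", "review")
graph.add_edge("review", END)  # and an explicit end
app = graph.compile()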

The key change: state contains typed outputs, not raw messages. Each agent starts fresh, sees only what it needs, and returns structured data.
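The isolation property can be checked without a real model. The following sketch uses only the standard library and a stand-in `fake_llm` function (hypothetical, for illustration) to record every prompt an agent receives, then confirms that scratch text from the research step never reaches a downstream prompt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ResearchResult:
    findings: tuple[str, ...]
    sources: tuple[str, ...]


def run_isolated(prompt: str, llm: Callable[[str], str], log: list[str]) -> str:
    """Call the model with exactly one prompt and record what it saw."""
    log.append(prompt)
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call (hypothetical).
    return f"response to {len(prompt)} chars"


seen: list[str] = []
research = ResearchResult(
    findings=("finding A",),
    sources=("https://example.com/report",),
)
# Scratch text produced during research; it is never placed in shared state.
scratch = "DEBUG MODE ENABLED"

draft = run_isolated(f"Draft from findings: {list(research.findings)}", fake_llm, seen)
review = run_isolated(f"Review this draft: {draft}", fake_llm, seen)

# No downstream prompt contains the researcher's scratch text.
leaked = any(scratch in p for p in seen)
print("leaked:", leaked)
```

Logging the exact prompt per agent is also a handy debugging aid: when something odd shows up in the output, you can see which agent was shown it.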

Attempt 2: Structured Handoffs

When subagent isolation isn’t practical, use explicit handoff structures instead of pasting raw conversation history into the next agent’s prompt.

from dataclasses import dataclass
from datetime import datetime
import json
from pathlib import Path

@dataclass
class AgentHandoff:
    """Explicit handoff structure - no ambiguity"""
    agent_id: str
    task_type: str
    output: dict
    confidence: float
    timestamp: str

class HandoffCollection:
    """Typed collection for agent communication"""

    def __init__(self, path: str):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self._load()

    def _load(self):
        if self.path.exists():
            with open(self.path) as f:
                self.data = json.load(f)
        else:
            self.data = []

    def write(self, handoff: AgentHandoff):
        self.data.append({
            "agent_id": handoff.agent_id,
            "task_type": handoff.task_type,
            "output": handoff.output,
            "confidence": handoff.confidence,
            "timestamp": handoff.timestamp
        })
        with open(self.path, 'w') as f:
            json.dump(self.data, f, indent=2)

    def read_last(self, agent_id: str) -> AgentHandoff:
        for entry in reversed(self.data):
            if entry["agent_id"] == agent_id:
                return AgentHandoff(**entry)
        raise ValueError(f"No handoff from {agent_id}")

# Agent A writes structured output
collection = HandoffCollection(".agent_state/handoff.json")
collection.write(AgentHandoff(
    agent_id="researcher",
    task_type="web_search",
    output={
        "findings": [
            "Quantum market projected at $65B by 2030",
            "IBM leads with 127-qubit processor"
        ],
        "sources": ["https://example.com/report", "https://ibm.com/quantum"]
    },
    confidence=0.85,
    timestamp=datetime.now().isoformat()
))

# Agent B reads only the structured handoff - no raw history, no context bleeding
last_handoff = collection.read_last("researcher")
prompt = f"""Write article based on verified findings:
{json.dumps(last_handoff.output, indent=2)}

Confidence: {last_handoff.confidence}
"""

The difference: instead of passing the full conversation history, each handoff is a typed record persisted in a collection. The next agent reads exactly what it needs, nothing more.
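A persisted handoff is only useful if the reader checks it before trusting it. Here is a minimal validation sketch, assuming the entry is the plain dict stored in the JSON file; the `validate_handoff` helper, the required keys, and the 0.5 threshold are illustrative assumptions, not part of any library.

```python
def validate_handoff(
    entry: dict, required_output_keys: set[str], min_confidence: float = 0.5
) -> list[str]:
    """Return a list of problems; an empty list means the handoff is usable."""
    problems = []
    for field in ("agent_id", "task_type", "output", "confidence", "timestamp"):
        if field not in entry:
            problems.append(f"missing field: {field}")

    output = entry.get("output", {})
    if not isinstance(output, dict):
        problems.append("output is not a dict")
    else:
        for key in sorted(required_output_keys - set(output)):
            problems.append(f"missing output key: {key}")

    confidence = entry.get("confidence")
    if isinstance(confidence, (int, float)) and confidence < min_confidence:
        problems.append(f"confidence {confidence} below {min_confidence}")
    return problems


# Hypothetical entries for illustration.
good = {
    "agent_id": "researcher",
    "task_type": "web_search",
    "output": {"findings": ["finding A"], "sources": ["https://example.com/report"]},
    "confidence": 0.85,
    "timestamp": "2026-03-24T10:00:00",
}
weak = {**good, "output": {"findings": []}, "confidence": 0.2}

print(validate_handoff(good, {"findings", "sources"}))
print(validate_handoff(weak, {"findings", "sources"}))
```

Rejecting a weak handoff at the boundary is cheaper than letting the writer build on it.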

Attempt 3: Single Agent with Detailed Spec

For most use cases, multi-agent complexity isn’t justified. A single well-prompted agent often works better.

from pathlib import Path

def run_single_agent(task: str, spec_path: str = "spec.md"):
    """
    Simplest solution for most tasks.
    No handoffs, no bleeding, one context.
    """
    spec = Path(spec_path).read_text()
    prompt = f"""Follow this specification:

{spec}

Task: {task}
"""
    return llm.invoke(prompt)

# Contents of the spec file
spec_content = """
# Research and Writing Specification

## Task
Research the given topic and write a comprehensive article.

## Constraints
- Use only verified, citable sources
- Maximum 800 words
- Include inline citations in [Source] format
- No speculation without explicit "we believe" framing
- Fact-check all statistics

## Output Format
## [Title]
[Lead paragraph]
[Body with citations]
## Sources
- [1] URL - description
"""

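# Save the spec where the agent expects it, then run - no handoffs, no bleeding
spec_file = Path("specs/writer_spec.md")
spec_file.parent.mkdir(parents=True, exist_ok=True)
spec_file.write_text(spec_content)

result = run_single_agent(
    "Research and write about quantum computing market trends",
    str(spec_file)
)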

One agent with a detailed spec eliminates handoff complexity entirely. The spec becomes the “context” that would otherwise bleed between agents.
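A spec is easier to enforce when the mechanical constraints are checked in code rather than left to the model. This is a rough sketch of such a check for the constraints listed in the spec (word limit, inline citations, a sources section); the `check_against_spec` helper is hypothetical, and the citation regex is a loose assumption about what a bracketed citation looks like.

```python
import re


def check_against_spec(article: str, max_words: int = 800) -> list[str]:
    """Return a list of spec violations; an empty list means the checks passed."""
    problems = []
    word_count = len(article.split())
    if word_count > max_words:
        problems.append(f"too long: {word_count} words (max {max_words})")
    # Any bracketed token such as [1] or [Source] counts as a citation here.
    if not re.search(r"\[[^\]]+\]", article):
        problems.append("no inline citations found")
    if "## Sources" not in article:
        problems.append("missing '## Sources' section")
    return problems


# Hypothetical article text for illustration.
article = (
    "## Title\n"
    "Quantum computing is growing [1].\n"
    "## Sources\n"
    "- [1] https://example.com/report - market report"
)
print(check_against_spec(article))
print(check_against_spec("plain text with no structure"))
```

Checks like these cannot verify facts, but they catch format drift before a human has to read the output.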

Comparison

Approach	When to Use	Pros	Cons
Subagent Isolation	Complex pipelines, strict separation	Clean contexts, parallelizable	More infrastructure, orchestrator overhead
Structured Handoffs	Sequential pipelines, audit requirements	Explicit data flow, debuggable	Still accumulates at orchestrator
Single Agent	Most tasks, simpler requirements	No handoffs, fastest to implement	Limited parallelization, single point of failure

Common Mistakes

I made all of these:

Mistake	Consequence	Fix
Passing full outputs between agents	Context bloat, hallucination	Return typed summaries
Implicit handoffs via prompts	Information leakage	Use explicit collections
Over-engineering agent chains	Fragile systems	Start simple, add agents only when needed
Sharing memory between agents	State contamination	Isolate or use lock files
No output validation	Garbage propagates	Pydantic models at every boundary
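For the shared-memory row, a lock file can be sketched with the standard library alone. The version below relies on `os.open` with `O_CREAT | O_EXCL`, which fails if the file already exists; the `file_lock` helper and the lock path are hypothetical, and a real setup would also need a plan for stale locks left behind by a crashed process.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path: str):
    """Hold an exclusive lock by creating lock_path; fail if it already exists."""
    # O_CREAT | O_EXCL makes creation atomic: only one caller can succeed.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


lock_path = os.path.join(tempfile.mkdtemp(), "handoff.lock")

with file_lock(lock_path):
    # A second attempt while the lock is held raises FileExistsError.
    try:
        with file_lock(lock_path):
            second_acquired = True
    except FileExistsError:
        second_acquired = False

print("second writer blocked:", not second_acquired)
```

Wrapping each write to the shared handoff file in such a lock keeps two agents from overwriting each other's state.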

Summary

In this post, I showed how context bleeding silently corrupts multi-agent AI systems. The problem isn’t the agents themselves - it’s the handoffs. Every agent boundary is where “context dies” and hallucinations are born.

The solutions:

Subagent Isolation - Each agent gets a fresh context, returns only typed results. Best for complex pipelines.
Structured Handoffs - Use persisted collections instead of pasting raw history into prompts. Good for sequential workflows with audit needs.
Single Agent with Spec - Eliminate handoffs entirely. The right choice for most tasks.

The uncomfortable truth from practitioners who’ve built 25+ agents: “Every agent you add is a new failure point. Every handoff is where context dies.” Start with one agent. Add more only when the complexity pays for itself.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Reddit: 25+ agents built. Here's the uncomfortable truth
👨‍💻 LangGraph Documentation
👨‍💻 Circuit Breaker Pattern

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!