
How to Build Multi-Agent Systems with Persistent Memory for Business Automation

Purpose

This post shows how to build multi-agent systems with persistent memory for business automation.

Problem

I built a single AI agent to handle my business operations. It worked for simple tasks. When I asked it to handle complex workflows across multiple days, everything broke.

Single Agent Failure
Day 1: "Set up meeting with John" - Agent creates calendar event
Day 2: "What did I schedule with John?" - Agent: "I don't know who John is"
# No memory across sessions
# No coordination between tasks
# Using expensive model for simple operations

The agent forgot context between sessions. It used Claude Sonnet for every task, even simple ones. My monthly AI cost hit $200 for basic operations.

Environment

  • Python 3.12
  • LangGraph 0.2 for orchestration
  • Mem0 for persistent memory
  • ChromaDB as vector database backend
  • DeepSeek V3.2, MiniMax, Claude Haiku, Claude Sonnet for different agent roles

Solution

A multi-agent system with persistent memory needs three components:

  1. Orchestration Layer (LangGraph) - Routes tasks to correct agents
  2. Memory Layer (Mem0) - Stores facts, decisions, and preferences
  3. Storage Layer (ChromaDB) - Vector embeddings for memory search

Architecture Overview

I designed the system with a router agent that dispatches tasks to specialized agents:

System Architecture
        ┌─────────────────────────────────────────────┐
        │           ROUTER AGENT (DeepSeek)           │
        │  Classifies task, dispatches to specialist  │
        └──────────────────────┬──────────────────────┘
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
         ▼                     ▼                     ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│ OPS ENGINE       │  │ OUTREACH QA      │  │ WEBMASTER MGR    │
│ (MiniMax)        │  │ (Claude Haiku)   │  │ (Claude Sonnet)  │
│                  │  │                  │  │                  │
│ Bulk operations  │  │ Draft emails     │  │ Complex decisions│
│ Batch processing │  │ Simple content   │  │ Negotiations     │
└──────────────────┘  └──────────────────┘  └──────────────────┘
         │                     │                     │
         └─────────────────────┼─────────────────────┘
        ┌──────────────────────┴──────────────────────┐
        │               Mem0 + ChromaDB               │
        │    Persistent Memory Storage & Retrieval    │
        │                                             │
        │  Facts: "User prefers email over Slack"     │
        │  Decisions: "Approved budget of $500"       │
        │  Preferences: "Meeting times: 9am-11am"     │
        └─────────────────────────────────────────────┘

Memory Layers

Mem0 provides four memory layers with different lifetimes:

Memory Layer Hierarchy
┌─────────────────┬───────────────────┬───────────────────────────────┐
│ Layer           │ Lifetime          │ Best For                      │
├─────────────────┼───────────────────┼───────────────────────────────┤
│ Conversation    │ Single response   │ Tool execution details        │
│ Session         │ Minutes to hours  │ Multi-step workflow context   │
│ User            │ Weeks to forever  │ Personal preferences          │
│ Organizational  │ Global            │ Shared FAQs, policies         │
└─────────────────┴───────────────────┴───────────────────────────────┘
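
To make the lifetimes concrete, here is a minimal in-memory sketch of layer-scoped expiry. It does not use Mem0: `LayeredStore`, the `TTL_SECONDS` values, and the explicit `now` argument are all assumptions chosen for illustration, not part of any library.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical lifetimes in seconds; None means "never expires".
TTL_SECONDS = {
    "conversation": 0,       # gone as soon as the clock moves on
    "session": 2 * 60 * 60,  # two hours
    "user": None,
    "org": None,
}


@dataclass
class Entry:
    fact: str
    layer: str
    stored_at: float


@dataclass
class LayeredStore:
    entries: List[Entry] = field(default_factory=list)

    def add(self, fact: str, layer: str, now: float) -> None:
        if layer not in TTL_SECONDS:
            raise ValueError(f"unknown layer: {layer}")
        self.entries.append(Entry(fact, layer, now))

    def live_facts(self, now: float) -> List[str]:
        """Return facts whose layer lifetime has not elapsed at time `now`."""
        live = []
        for entry in self.entries:
            ttl: Optional[float] = TTL_SECONDS[entry.layer]
            if ttl is None or now - entry.stored_at <= ttl:
                live.append(entry.fact)
        return live


store = LayeredStore()
store.add("tool returned 3 rows", "conversation", now=0)
store.add("step 2 of onboarding done", "session", now=0)
store.add("prefers email over Slack", "user", now=0)
print(store.live_facts(now=60))
```

Passing `now` explicitly keeps the sketch deterministic; a real system would read the clock and would likely prune expired entries instead of filtering on every read.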

I implemented this hierarchy to match business needs:

memory_config.py
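import asyncio
from datetime import datetime
from enum import Enum

from mem0 import Memory


class MemoryLayer(Enum):
    CONVERSATION = "conversation"  # Dies after response
    SESSION = "session"            # Lives during workflow
    USER = "user"                  # Persists per user
    ORGANIZATIONAL = "org"         # Shared across all


class MemoryManager:
    def __init__(self, chroma_client=None, path: str = "./chroma_db"):
        # Mem0 opens the Chroma collection itself from this config, so
        # chroma_client is accepted only for call-site compatibility.
        self.memory = Memory.from_config({
            "vector_store": {
                "provider": "chroma",
                "config": {
                    "collection_name": "agent_memory",
                    "path": path,
                },
            }
        })

    async def store_fact(self, user_id: str, fact: str, layer: MemoryLayer):
        """Store a fact in the appropriate memory layer"""
        # Memory is synchronous, so run the call in a worker thread.
        await asyncio.to_thread(
            self.memory.add,
            [{"role": "system", "content": fact}],
            user_id=user_id,
            metadata={
                "layer": layer.value,
                "timestamp": datetime.now().isoformat(),
            },
        )

    async def get_relevant_context(self, user_id: str, query: str) -> list:
        """Retrieve relevant facts for current task"""
        results = await asyncio.to_thread(
            self.memory.search, query, user_id=user_id, limit=10
        )
        # Depending on the Mem0 version, search returns a list or
        # {"results": [...]} with the text under "memory"; normalize it.
        hits = results.get("results", []) if isinstance(results, dict) else results
        return [{"content": h.get("memory", ""), "score": h.get("score")} for h in hits]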

When I store a user preference:

# Store preference (memory is a MemoryManager; run this inside an async function)
await memory.store_fact(
    user_id="user_123",
    fact="User prefers email communication over Slack for important updates",
    layer=MemoryLayer.USER,
)
# Later, retrieve it
context = await memory.get_relevant_context(
    user_id="user_123",
    query="How should I notify about the budget approval?",
)
# Output: [{"content": "User prefers email...", "score": 0.92}]

Agent Router with LangGraph

I built a router that classifies tasks and dispatches to specialized agents:

agent_router.py
from enum import Enum
from typing import TypedDict

from langgraph.graph import StateGraph, END

from memory_config import MemoryLayer


class TaskType(Enum):
    SIMPLE_OPERATION = "simple"   # MiniMax handles
    CONTENT_DRAFT = "draft"       # Claude Haiku handles
    COMPLEX_DECISION = "complex"  # Claude Sonnet handles


class AgentState(TypedDict):
    input: str
    task_type: TaskType
    user_id: str
    context: list
    result: str
    cost: float


class AgentRouter:
    def __init__(self, memory_manager, agents):
        self.memory = memory_manager
        self.agents = agents
        self.graph = self._build_graph()

    def _build_graph(self):
        """Build and compile the routing graph"""
        workflow = StateGraph(AgentState)
        # Add nodes
        workflow.add_node("classify", self.classify_task)
        workflow.add_node("retrieve_context", self.retrieve_context)
        workflow.add_node("dispatch", self.dispatch_to_agent)
        workflow.add_node("store_result", self.store_result)
        # Add edges
        workflow.set_entry_point("classify")
        workflow.add_edge("classify", "retrieve_context")
        workflow.add_edge("retrieve_context", "dispatch")
        workflow.add_edge("dispatch", "store_result")
        workflow.add_edge("store_result", END)
        return workflow.compile()

    async def invoke(self, state: dict) -> AgentState:
        """Run one task through the graph (the nodes are async, so use ainvoke)"""
        return await self.graph.ainvoke(state)

    async def classify_task(self, state: AgentState) -> AgentState:
        """Classify task complexity using DeepSeek (cheap)"""
        classification_prompt = f"""
        Classify this task complexity:
        Task: {state['input']}
        Options:
        - SIMPLE_OPERATION: Bulk tasks, batch processing, data retrieval
        - CONTENT_DRAFT: Writing emails, drafting content, simple summaries
        - COMPLEX_DECISION: Negotiations, strategic decisions, technical architecture
        Return only the classification type.
        """
        response = await self.agents["classifier"].generate(classification_prompt)
        # The model answers with a member name such as CONTENT_DRAFT, so look it
        # up by name; TaskType(...) would expect the values "simple"/"draft"/"complex".
        label = response.strip().upper()
        try:
            state["task_type"] = TaskType[label]
        except KeyError:
            # Unrecognized label: fall back to the most capable agent
            state["task_type"] = TaskType.COMPLEX_DECISION
        return state

    async def retrieve_context(self, state: AgentState) -> AgentState:
        """Get relevant memory context"""
        context = await self.memory.get_relevant_context(
            user_id=state["user_id"],
            query=state["input"],
        )
        state["context"] = context
        return state

    async def dispatch_to_agent(self, state: AgentState) -> AgentState:
        """Route to appropriate specialist agent"""
        agent_map = {
            TaskType.SIMPLE_OPERATION: "ops_engine",
            TaskType.CONTENT_DRAFT: "outreach_qa",
            TaskType.COMPLEX_DECISION: "webmaster_mgr",
        }
        agent_name = agent_map[state["task_type"]]
        agent = self.agents[agent_name]
        # Build prompt with context
        context_str = "\n".join([c["content"] for c in state["context"]])
        prompt = f"""
        Context from memory:
        {context_str}
        Task: {state['input']}
        """
        result = await agent.generate(prompt)
        state["result"] = result
        state["cost"] = agent.get_cost()
        return state

    async def store_result(self, state: AgentState) -> AgentState:
        """Store decision in memory"""
        await self.memory.store_fact(
            user_id=state["user_id"],
            fact=f"Decision made: {state['result']}",
            layer=MemoryLayer.SESSION,
        )
        return state
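
The classifier returns free text, so the step that maps its reply to a `TaskType` is worth making defensive. The sketch below is one possible approach, not something LangGraph provides: `parse_task_type` is a hypothetical helper, and defaulting to `COMPLEX_DECISION` is an assumption that favors answer quality over cost when the label is unreadable.

```python
from enum import Enum


class TaskType(Enum):
    SIMPLE_OPERATION = "simple"
    CONTENT_DRAFT = "draft"
    COMPLEX_DECISION = "complex"


def parse_task_type(raw: str, default: TaskType = TaskType.COMPLEX_DECISION) -> TaskType:
    """Map a model reply to a TaskType, tolerating case, whitespace, and extra words."""
    text = raw.strip().upper()
    # First look for a member name anywhere in the reply, e.g. "Answer: CONTENT_DRAFT."
    for member in TaskType:
        if member.name in text:
            return member
    # Then accept the short values ("simple", "draft", "complex") as exact replies.
    for member in TaskType:
        if member.value.upper() == text:
            return member
    return default


print(parse_task_type(" content_draft\n"))
print(parse_task_type("Classification: SIMPLE_OPERATION."))
print(parse_task_type("not sure"))
```

Swapping the default to `SIMPLE_OPERATION` would minimize cost instead; which fallback is right depends on how expensive a wrong answer is for the workflow.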

When I run the router:

# Initialize router
router = AgentRouter(memory_manager, agents)
# Process task (the graph nodes are async, so run the compiled graph with
# ainvoke from inside an async function)
result = await router.graph.ainvoke({
    "input": "Draft an email to John about the meeting schedule",
    "user_id": "user_123",
})
# Classification flow:
# 1. classify -> CONTENT_DRAFT (using DeepSeek, $0.001)
# 2. retrieve_context -> ["User prefers email...", "Meeting times: 9am"]
# 3. dispatch -> outreach_qa (Claude Haiku, $0.01)
# 4. store_result -> Memory saved

Multi-Agent Configuration

I configured each agent with a model matched to its task complexity:

agents_config.py
import asyncio
import os

import requests
from anthropic import AsyncAnthropic


class AgentConfig:
    """Agent configuration with cost tracking"""
    # Rates are in USD per 1k tokens; check each provider's current price list.
    agents = {
        "classifier": {
            "model": "deepseek-chat",
            "provider": "deepseek",
            "cost_per_1k_tokens": 0.0014,  # = $1.40 per million
            "role": "Task classification only",
        },
        "ops_engine": {
            "model": "MiniMax-m2.1",
            "provider": "minimax",
            "cost_per_1k_tokens": 0.002,  # = $2 per million
            "role": "Bulk operations, batch processing",
        },
        "outreach_qa": {
            "model": "claude-3-5-haiku-latest",  # the API needs a full model ID or alias
            "provider": "anthropic",
            "cost_per_1k_tokens": 0.008,  # = $8 per million
            "role": "Draft emails, simple content",
        },
        "webmaster_mgr": {
            "model": "claude-3-5-sonnet-latest",
            "provider": "anthropic",
            "cost_per_1k_tokens": 0.03,  # = $30 per million
            "role": "Complex decisions, negotiations",
        },
    }


class AgentPool:
    def __init__(self):
        self.agents = {}
        self._initialize_agents()

    def _initialize_agents(self):
        """Initialize all agents with their models"""
        for name, config in AgentConfig.agents.items():
            if config["provider"] == "anthropic":
                self.agents[name] = AnthropicAgent(config)
            elif config["provider"] == "deepseek":
                self.agents[name] = DeepSeekAgent(config)
            elif config["provider"] == "minimax":
                self.agents[name] = MiniMaxAgent(config)

    def get_agent(self, name: str):
        return self.agents[name]


class AnthropicAgent:
    def __init__(self, config):
        # Async client, so generate() does not block the event loop
        self.client = AsyncAnthropic()
        self.model = config["model"]
        self.cost_per_1k = config["cost_per_1k_tokens"]
        self.tokens_used = 0

    async def generate(self, prompt: str) -> str:
        response = await self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        self.tokens_used += response.usage.input_tokens + response.usage.output_tokens
        return response.content[0].text

    def get_cost(self) -> float:
        # Cumulative cost of every call made through this agent
        return (self.tokens_used / 1000) * self.cost_per_1k


class DeepSeekAgent:
    def __init__(self, config):
        self.api_key = os.environ.get("DEEPSEEK_API_KEY")
        self.model = config["model"]
        self.cost_per_1k = config["cost_per_1k_tokens"]
        self.tokens_used = 0

    async def generate(self, prompt: str) -> str:
        # requests is blocking, so run the HTTP call in a worker thread
        response = await asyncio.to_thread(
            requests.post,
            "https://api.deepseek.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={
                "model": self.model,
                "messages": [{"role": "user", "content": prompt}],
            },
            timeout=60,
        )
        response.raise_for_status()
        data = response.json()
        self.tokens_used += data["usage"]["total_tokens"]
        return data["choices"][0]["message"]["content"]

    def get_cost(self) -> float:
        return (self.tokens_used / 1000) * self.cost_per_1k


class MiniMaxAgent:
    """Placeholder: the MiniMax client is not shown in this post.
    Implement generate() with the same interface as the agents above."""

    def __init__(self, config):
        self.model = config["model"]
        self.cost_per_1k = config["cost_per_1k_tokens"]
        self.tokens_used = 0

    async def generate(self, prompt: str) -> str:
        raise NotImplementedError("MiniMax client not shown in this post")

    def get_cost(self) -> float:
        return (self.tokens_used / 1000) * self.cost_per_1k
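
One detail of the cost tracking above: `get_cost()` is cumulative per agent, so the value written into `state["cost"]` grows with every call. If a per-task figure is wanted, a small meter that tracks both numbers is one option. `UsageMeter` below is a hypothetical helper sketched for illustration, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass
class UsageMeter:
    """Tracks token usage for one agent, both per call and in total."""
    cost_per_1k: float
    total_tokens: int = 0
    last_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # Remember the most recent call separately from the running total
        self.last_tokens = input_tokens + output_tokens
        self.total_tokens += self.last_tokens

    def last_cost(self) -> float:
        return self.last_tokens / 1000 * self.cost_per_1k

    def total_cost(self) -> float:
        return self.total_tokens / 1000 * self.cost_per_1k


meter = UsageMeter(cost_per_1k=0.008)
meter.record(input_tokens=400, output_tokens=100)
meter.record(input_tokens=1500, output_tokens=500)
print(f"last call: ${meter.last_cost():.4f}, total: ${meter.total_cost():.4f}")
```

An agent would call `record()` with the usage numbers from each API response and report `last_cost()` for the task at hand.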

When I compare costs for 1000 tasks:

Cost Comparison
Single Claude Sonnet for all tasks:
  - 1000 tasks * 2k tokens average * $0.03/1k = $60/month
Multi-agent with model matching (same 2k tokens average per task):
  - 600 simple tasks * 2k tokens * MiniMax at $0.002/1k = $2.40
  - 300 draft tasks * 2k tokens * Haiku at $0.008/1k = $4.80
  - 100 complex tasks * 2k tokens * Sonnet at $0.03/1k = $6.00
  - Classification overhead: $0.50
  - Total: $13.70/month
Savings: about 77% cost reduction
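
The comparison is easy to recompute when rates or the task mix change. The sketch below uses the per-1k rates and the 600/300/100 split quoted in this post, and assumes that every task, in both scenarios, averages 2k tokens; the $0.50 classification overhead is taken as given rather than derived.

```python
# Rates in USD per 1k tokens, as quoted in this post
RATES_PER_1K = {"minimax": 0.002, "haiku": 0.008, "sonnet": 0.03}
TOKENS_PER_TASK = 2000          # assumed average for every task
CLASSIFICATION_OVERHEAD = 0.50  # taken from the post, not derived


def cost(tasks: int, rate_per_1k: float, tokens_per_task: int = TOKENS_PER_TASK) -> float:
    """Cost in USD of running `tasks` tasks at the given per-1k-token rate."""
    return tasks * tokens_per_task / 1000 * rate_per_1k


single = cost(1000, RATES_PER_1K["sonnet"])
task_mix = {"minimax": 600, "haiku": 300, "sonnet": 100}
multi = sum(cost(n, RATES_PER_1K[model]) for model, n in task_mix.items())
multi += CLASSIFICATION_OVERHEAD
savings = 1 - multi / single

print(f"single: ${single:.2f}, multi: ${multi:.2f}, savings: {savings:.0%}")
# prints: single: $60.00, multi: $13.70, savings: 77%
```

Because the token average applies to both sides, the savings percentage depends only on the rates, the task mix, and the overhead.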

ChromaDB Vector Store Setup

I set up ChromaDB as the vector storage backend:

chroma_setup.py
import asyncio

import chromadb
from chromadb.config import Settings


class VectorStore:
    def __init__(self, path: str = "./chroma_db"):
        self.client = chromadb.PersistentClient(
            path=path,
            settings=Settings(
                anonymized_telemetry=False,
                allow_reset=True,
            ),
        )
        self.collection = self.client.get_or_create_collection(
            name="agent_memory",
            metadata={"description": "Multi-agent persistent memory"},
        )

    async def add_memory(self, id: str, content: str, metadata: dict):
        """Add memory entry with embeddings"""
        self.collection.add(
            documents=[content],
            metadatas=[metadata],
            ids=[id],
        )

    async def search_memory(self, query: str, n_results: int = 10) -> dict:
        """Semantic search for relevant memories"""
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results,
        )
        return results

    async def get_by_user(self, user_id: str) -> dict:
        """Get all memories for a user"""
        results = self.collection.get(
            where={"user_id": user_id},
        )
        return results


async def demo():
    # Initialize
    vector_store = VectorStore()
    # Add organizational memory (shared rules)
    await vector_store.add_memory(
        id="org_rule_1",
        content="Always confirm budget approvals with finance team before execution",
        metadata={"layer": "org", "type": "policy"},
    )
    # Semantic search
    results = await vector_store.search_memory(
        query="How do I handle budget requests?",
        n_results=5,
    )
    return results


if __name__ == "__main__":
    # Module-level await is a syntax error in a script, so wrap the demo
    print(asyncio.run(demo()))
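
ChromaDB handles embedding and ranking internally. The toy below only illustrates the underlying idea: rank stored memories by similarity to the query and optionally filter on metadata. Everything in it is an assumption for illustration: `ToyStore` is hypothetical, and the bag-of-words `embed` is a crude stand-in for a real embedding model, so it only matches on shared words.

```python
import math
import re
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real store uses a neural model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyStore:
    def __init__(self):
        self.rows = []  # (id, content, metadata, vector)

    def add(self, id: str, content: str, metadata: dict) -> None:
        self.rows.append((id, content, metadata, embed(content)))

    def query(self, text: str, n_results: int = 3, where: Optional[dict] = None) -> list:
        """Return up to n_results (score, id, content) tuples, best match first."""
        q = embed(text)
        scored = []
        for id_, content, meta, vec in self.rows:
            # Skip rows whose metadata does not match every key in `where`
            if where and any(meta.get(k) != v for k, v in where.items()):
                continue
            scored.append((cosine(q, vec), id_, content))
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:n_results]


store = ToyStore()
store.add("org_rule_1", "Always confirm budget approvals with finance team before execution", {"layer": "org"})
store.add("user_pref_1", "User prefers email communication over Slack", {"layer": "user"})
top = store.query("How do I handle budget approvals?", n_results=1)
print(top[0][1])
```

With a real embedding model the query would also match paraphrases ("spending sign-off") that share no words with the stored rule, which is the main reason to use a vector store here.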

Complete System Integration

I integrated all components into a complete system:

multi_agent_system.py
import asyncio

from agent_router import AgentRouter
from memory_config import MemoryManager, MemoryLayer
from agents_config import AgentPool
from chroma_setup import VectorStore


class MultiAgentSystem:
    def __init__(self):
        # Initialize components
        self.vector_store = VectorStore()
        self.memory = MemoryManager(self.vector_store.client)
        self.agent_pool = AgentPool()
        self.router = AgentRouter(self.memory, self.agent_pool.agents)

    async def initialize(self):
        """Load organizational rules. __init__ cannot await, so call this once after construction."""
        await self._load_org_memory()

    async def _load_org_memory(self):
        """Load shared organizational memory"""
        org_rules = [
            "Budget approvals require finance team confirmation",
            "All client communications must be logged",
            "Meeting times preferred: 9am-11am, 2pm-4pm",
            "Response SLA: 24 hours for emails, 2 hours for Slack",
        ]
        # These go straight to Chroma without a user_id, so a per-user Mem0
        # search may not return them; query them with vector_store.search_memory().
        for i, rule in enumerate(org_rules):
            await self.vector_store.add_memory(
                id=f"org_rule_{i}",
                content=rule,
                metadata={"layer": "org", "type": "policy"},
            )

    async def process_task(self, user_id: str, task: str) -> dict:
        """Process a task through the multi-agent system"""
        # The graph nodes are async, so run the compiled graph with ainvoke
        result = await self.router.graph.ainvoke({
            "input": task,
            "user_id": user_id,
        })
        return {
            "result": result["result"],
            "task_type": result["task_type"].value,
            "cost": result["cost"],
            "context_used": len(result["context"]),
        }

    async def add_user_preference(self, user_id: str, preference: str):
        """Store user preference in memory"""
        await self.memory.store_fact(
            user_id=user_id,
            fact=preference,
            layer=MemoryLayer.USER,
        )


# Usage
async def main():
    system = MultiAgentSystem()
    await system.initialize()
    # Add user preference
    await system.add_user_preference(
        user_id="john_123",
        preference="John prefers morning meetings between 9am and 11am",
    )
    # Process task
    result = await system.process_task(
        user_id="john_123",
        task="Schedule a meeting with the client team about the new project",
    )
    print(result)


if __name__ == "__main__":
    asyncio.run(main())
# Output:
# {
#     "result": "Meeting scheduled for 9:30am tomorrow with client team...",
#     "task_type": "simple",
#     "cost": 0.002,
#     "context_used": 3
# }

Real-World Example: SEO Agency

I found a Reddit post about an SEO agency running 5 agents with this architecture:

SEO Agency Agent Setup
Agent Roles:
  - Steve (DeepSeek V3.2): Main agent, handles WhatsApp & Slack daily ops
  - ops-engine (MiniMax): Bulk tasks, batch SEO operations
  - outreach-qa (Claude Haiku): Draft outreach emails, review content
  - webmaster-mgr (Claude Sonnet): Negotiate with webmasters, technical decisions
Memory Facts Stored:
  - "Client X prefers weekly reports"
  - "Budget limit for outreach: $500/month"
  - "Webmaster Y negotiated $200 for link placement"
  - "Previous successful outreach template: [stored]"
Monthly Cost: ~$20-30 (vs $200+ for single premium model)

The agency reported: “We use Mem0 + ChromaDB - every decision, rule, and preference gets stored as a searchable fact.”

Common Mistakes I Made

Mistakes to Avoid
1. Using one model for everything
   - Mistake: Claude Sonnet for simple classification
   - Fix: Use DeepSeek for routing, Haiku for drafts
2. Ignoring memory architecture
   - Mistake: All memories in one bucket
   - Fix: Layer by lifetime (conversation, session, user, org)
3. No agent routing logic
   - Mistake: Random agent assignment
   - Fix: Classification step before dispatch
4. Over-engineering first iteration
   - Mistake: 10 agents on day one
   - Fix: Start with 3 agents, add as needed
5. Storing secrets in memory
   - Mistake: API keys stored in vector DB
   - Fix: Only store business facts, use env vars for secrets
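
For mistake 5, a cheap guard in front of the memory write catches the most obvious leaks. The sketch below is a heuristic under stated assumptions: the patterns, `looks_like_secret`, and `safe_fact` are illustrative names and rules, they will miss some credentials and flag some harmless text, and they are no substitute for a proper secret scanner.

```python
import re

# Illustrative patterns only; extend them for the credentials you actually use.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{16,}"),                                   # API-key-like token
    re.compile(r"(?i)(api[_-]?key|password|secret|token)\s*[:=]\s*\S+"),    # key=value style
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),                      # PEM private key header
]


def looks_like_secret(text: str) -> bool:
    """Return True if any pattern suggests the text contains a credential."""
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)


def safe_fact(text: str) -> str:
    """Return the fact unchanged, or raise if it looks like a credential."""
    if looks_like_secret(text):
        raise ValueError("refusing to store a fact that looks like a credential")
    return text


print(safe_fact("Budget limit for outreach: $500/month"))
print(looks_like_secret("DEEPSEEK_API_KEY=abc123"))
```

Calling `safe_fact()` on every string before it reaches the memory layer keeps the check in one place.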

Summary

In this post, I showed how to build multi-agent systems with persistent memory for business automation. The key point is combining LangGraph for orchestration, Mem0 for memory management, and matching models to task complexity.

The architecture uses three layers: Orchestration (LangGraph router), Memory (Mem0 with ChromaDB backend), and Storage (ChromaDB vector embeddings). I configured four memory layers: Conversation (single response), Session (workflow context), User (preferences), and Organizational (shared rules).

For model matching, I use DeepSeek for classification ($0.0014/1k), MiniMax for operations ($0.002/1k), Claude Haiku for drafts ($0.008/1k), and Claude Sonnet only for complex decisions ($0.03/1k). This reduced my monthly AI costs from $200 to ~$20-30.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
