How DeerFlow Sandbox Execution Enables Safe Code Running
Purpose
I wanted my AI agents to actually execute code, not just talk about it. Most agent frameworks stop at API calls—they can generate Python scripts but can’t run them. They can suggest file changes but can’t apply them.
DeerFlow solved this with its sandbox execution system. This post explains how it works and why it matters for building agents that do real work.
The Problem: Agents Without Execution
I spent weeks building an agent that could analyze data. The workflow should have been simple:
1. User uploads a CSV file
2. Agent writes a Python script to analyze it
3. Agent runs the script
4. Agent saves the results
Steps 1 and 2 were easy. Steps 3 and 4 were impossible. My agent could only generate text responses. It had no way to:
- Execute the Python script it wrote
- Save files to disk
- Read files the user uploaded
- Persist anything between messages
I tried giving it shell access. That was a disaster:
```
# What could go wrong?
> agent.execute("rm -rf /")
# ...everything
```

I needed isolation. I needed sandboxing.
What DeerFlow’s Sandbox Provides
DeerFlow creates an isolated execution environment per conversation thread. Each thread gets:
- Its own filesystem - workspace, uploads, and outputs directories
- Virtual path translation - agents see `/mnt/user-data/...`, not real paths
- Execution tools - bash, read_file, write_file, str_replace
- Isolation modes - local or Docker-based
Thread-Isolated Directory Structure
When I started a new conversation thread, DeerFlow created this structure:
```
backend/.deer-flow/threads/
├── thread-abc123/
│   └── user-data/
│       ├── workspace/   # Agent's working directory
│       ├── uploads/     # User-uploaded files
│       └── outputs/     # Generated deliverables
├── thread-def456/
│   └── user-data/
│       └── ...
└── thread-xyz789/
    └── user-data/
        └── ...
```

Each thread is completely isolated. Files from one conversation never leak to another.
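A layout like this is cheap to create on demand. The following is a minimal sketch, not DeerFlow's actual code: `ensure_thread_dirs` and its parameters are hypothetical names, and only the directory names are taken from the structure above.

```python
from pathlib import Path

# Subdirectory names taken from the layout shown above.
SUBDIRS = ("workspace", "uploads", "outputs")


def ensure_thread_dirs(base_dir: Path, thread_id: str) -> dict[str, Path]:
    """Create the per-thread user-data directories if missing (hypothetical helper)."""
    user_data = base_dir / "threads" / thread_id / "user-data"
    created = {}
    for name in SUBDIRS:
        path = user_data / name
        # parents=True builds the whole chain; exist_ok=True makes repeat calls harmless.
        path.mkdir(parents=True, exist_ok=True)
        created[name] = path
    return created
```

Because the call is idempotent, it can run at the start of every message in a thread without checking whether the thread is new.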
Virtual Path System
This is the clever part. Agents never see real filesystem paths. They operate through virtual paths:
| Virtual Path (Agent Sees) | Physical Path (Actual Location) |
|---|---|
| `/mnt/user-data/workspace` | `threads/{thread_id}/user-data/workspace` |
| `/mnt/user-data/uploads` | `threads/{thread_id}/user-data/uploads` |
| `/mnt/user-data/outputs` | `threads/{thread_id}/user-data/outputs` |
| `/mnt/skills` | `deer-flow/skills/` |
I tested this by asking an agent to list files:

```
"List all files in my workspace directory"
```

The agent ran:

```bash
ls -la /mnt/user-data/workspace
```

But behind the scenes, DeerFlow translated that to:

```bash
ls -la backend/.deer-flow/threads/abc123/user-data/workspace
```

The agent never knows the real path. This prevents path traversal attacks and keeps the host system safe.
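To make the traversal point concrete, here is a small sketch of prefix-based translation that normalizes the path before matching. It is my own illustration under stated assumptions, not DeerFlow's code: `MOUNTS` and `to_physical` are hypothetical names, the mappings are copied from the table above, and `PermissionError` stands in for whatever error the real system raises.

```python
import posixpath

# Virtual prefixes and physical templates, copied from the table above.
MOUNTS = {
    "/mnt/user-data/workspace": "threads/{thread_id}/user-data/workspace",
    "/mnt/user-data/uploads": "threads/{thread_id}/user-data/uploads",
    "/mnt/user-data/outputs": "threads/{thread_id}/user-data/outputs",
    "/mnt/skills": "deer-flow/skills",
}


def to_physical(virtual_path: str, thread_id: str) -> str:
    """Map a virtual path to its physical location, or raise PermissionError."""
    # Collapse "." and ".." first so "/mnt/user-data/workspace/../../etc"
    # is judged by where it really points.
    normalized = posixpath.normpath(virtual_path)
    for prefix, template in MOUNTS.items():
        # Accept the mount point itself or anything beneath it,
        # but not look-alikes such as "/mnt/user-data/workspace-evil".
        if normalized == prefix or normalized.startswith(prefix + "/"):
            physical_root = template.format(thread_id=thread_id)
            return physical_root + normalized[len(prefix):]
    raise PermissionError(f"{virtual_path} is outside the sandbox")
```

The sketch only handles lexical tricks. Symlinks inside the workspace and untrusted `thread_id` values would need separate checks, which is one reason container isolation is still worth having.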
Sandbox Tools Available
I explored the tools DeerFlow provides to agents inside the sandbox.
Bash Execution
The most powerful tool is bash execution:
Step 1: Create a Python script

```bash
cat > /mnt/user-data/workspace/analyze.py << 'EOF'
import pandas as pd
import json

# Read uploaded data
df = pd.read_csv('/mnt/user-data/uploads/sales.csv')

# Calculate statistics (cast to float: json cannot serialize NumPy integers)
stats = {
    'total_sales': float(df['amount'].sum()),
    'avg_sale': float(df['amount'].mean()),
    'count': len(df)
}

# Save results
with open('/mnt/user-data/outputs/results.json', 'w') as f:
    json.dump(stats, f)

print("Analysis complete!")
EOF
```

Step 2: Run the script

```bash
python3 /mnt/user-data/workspace/analyze.py
```

The bash tool runs commands inside the sandbox. The agent can install packages, run tests, or execute any command.
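For intuition, a command-running tool can be as small as a subprocess call with a working directory and a timeout. This is a hedged sketch of that idea, not DeerFlow's bash tool: `run_command` and its return shape are hypothetical, and on its own it provides no isolation at all.

```python
import subprocess
import sys


def run_command(argv: list[str], cwd: str, timeout_s: float = 30.0) -> dict:
    """Run one command and return its exit code and captured output (hypothetical helper)."""
    try:
        proc = subprocess.run(
            argv,
            cwd=cwd,               # relative paths resolve against this directory
            capture_output=True,   # collect stdout/stderr instead of inheriting them
            text=True,
            timeout=timeout_s,     # stop runaway commands
        )
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "stdout": "", "stderr": f"timed out after {timeout_s}s"}
    return {"exit_code": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}


if __name__ == "__main__":
    print(run_command([sys.executable, "-c", "print('hi')"], cwd="."))
```

Returning the exit code and stderr to the agent matters as much as running the command: it is what lets the agent notice a failed script and try again.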
### File Operations
DeerFlow provides dedicated file tools:
Read a file

```bash
read_file /mnt/user-data/uploads/data.csv
```

Write a file

```bash
write_file /mnt/user-data/outputs/report.md "# Sales Report\n\n..."
```

Edit by string replacement

```bash
str_replace /mnt/user-data/workspace/config.yaml "debug: false" "debug: true"
```

The `str_replace` tool is particularly useful. It lets agents edit files without rewriting the entire contents.
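The idea behind a string-replacement edit fits in a few lines. The sketch below is an assumption about reasonable behaviour, not the real tool's implementation: in particular, requiring exactly one match is my choice, made so that an ambiguous edit fails loudly instead of changing the wrong line.

```python
from pathlib import Path


def str_replace(path: str, old: str, new: str) -> None:
    """Replace exactly one occurrence of `old` with `new` in the file at `path`."""
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    count = text.count(old)
    if count != 1:
        # Refuse missing or ambiguous matches rather than guessing (assumed behaviour).
        raise ValueError(f"expected exactly 1 match for {old!r}, found {count}")
    file.write_text(text.replace(old, new, 1), encoding="utf-8")
```

Compared with rewriting the whole file, the agent only has to reproduce the snippet it wants to change, which keeps tool calls short and reduces accidental edits elsewhere.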
## Sandbox Modes: Local vs Docker
DeerFlow supports two sandbox modes. I tested both.
### Local Execution Mode
Local mode runs commands directly on the host filesystem:
```yaml title="config.yaml"
sandbox:
  use: deerflow.sandbox.local:LocalSandboxProvider
```

I found this mode:
- Faster startup (no container overhead)
- Simpler setup (no Docker required)
- Less isolation (files are real files on the host)
Good for development, risky for production.
Docker Execution Mode
Docker mode runs each thread in its own container:
```yaml
sandbox:
  use: deerflow.community.aio_sandbox:AioSandboxProvider
  image: ghcr.io/bytedance/deer-flow-sandbox:latest
```

I tested the Docker mode:

```bash
# Start a thread with Docker sandbox
# DeerFlow spins up a container
docker ps

# Output
CONTAINER ID   IMAGE                      STATUS
abc123def456   deer-flow-sandbox:latest   Up 2 minutes
```

Each thread gets its own container. When the thread ends, the container is destroyed.
Kubernetes via Provisioner
For multi-user deployments, DeerFlow supports Kubernetes:
```yaml
sandbox:
  use: deerflow.community.aio_sandbox:AioSandboxProvider
  provisioner_url: http://provisioner:8002
```

The provisioner manages pod lifecycle. This scales to thousands of concurrent users.
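The `use:` values in these configs follow the common `package.module:ClassName` convention. I have not checked how DeerFlow resolves them, so treat the following as a generic sketch of that convention; `load_provider` is a hypothetical name.

```python
import importlib


def load_provider(spec: str):
    """Resolve a 'package.module:ClassName' string to the object it names."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:Class', got {spec!r}")
    # Import the module lazily, so unused providers cost nothing at startup.
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

With this pattern, switching from local to Docker mode is a one-line config change, because nothing in the calling code names a concrete provider class.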
How Path Translation Works
I dug into the implementation to understand the translation layer.
The Translation Function
```python
class SecurityError(Exception):
    """Raised when a path falls outside the sandbox."""


def replace_virtual_path(virtual_path: str, thread_id: str) -> str:
    """Translate virtual path to physical path"""

    # Virtual paths the agent sees
    path_mappings = {
        '/mnt/user-data/workspace': f'threads/{thread_id}/user-data/workspace',
        '/mnt/user-data/uploads': f'threads/{thread_id}/user-data/uploads',
        '/mnt/user-data/outputs': f'threads/{thread_id}/user-data/outputs',
        '/mnt/skills': 'deer-flow/skills/'
    }

    for virtual, physical in path_mappings.items():
        if virtual_path.startswith(virtual):
            # Replace only the leading prefix, not later occurrences
            return virtual_path.replace(virtual, physical, 1)

    # Block access to paths outside sandbox
    raise SecurityError(f"Path {virtual_path} outside sandbox boundaries")
```

Middleware Integration
The SandboxMiddleware hooks into the agent lifecycle:
```python
class SandboxMiddleware:
    async def before_agent_run(self, state: dict):
        # 1. Acquire sandbox
        sandbox_id = await self.sandbox_provider.acquire()
        state['sandbox_id'] = sandbox_id

        # 2. Store path mappings
        state['path_mappings'] = self.get_path_mappings(sandbox_id)

    async def on_tool_call(self, tool_name: str, args: dict, state: dict):
        # 3. Translate paths in tool arguments
        if 'path' in args:
            args['path'] = self.translate_path(
                args['path'],
                state['sandbox_id']
            )

    async def after_agent_run(self, state: dict):
        # 4. Optional cleanup
        if self.auto_cleanup:
            await self.sandbox_provider.release(state['sandbox_id'])
```

This ensures every file operation goes through translation.
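To see the acquire, translate, release order in action without DeerFlow installed, here is a self-contained toy version. Everything in it is a stand-in I made up for illustration: `FakeProvider`, `DemoMiddleware`, and `run_once` are hypothetical, and the path rewrite is deliberately simplistic.

```python
import asyncio


class FakeProvider:
    """Stand-in sandbox provider that only records acquire/release calls."""

    def __init__(self):
        self.events = []

    async def acquire(self) -> str:
        self.events.append("acquire")
        return "sandbox-1"

    async def release(self, sandbox_id: str) -> None:
        self.events.append(f"release:{sandbox_id}")


class DemoMiddleware:
    """Toy middleware with the same three hooks as the class above."""

    def __init__(self, provider, root: str):
        self.provider = provider
        self.root = root

    async def before_agent_run(self, state: dict) -> None:
        state["sandbox_id"] = await self.provider.acquire()

    async def on_tool_call(self, tool_name: str, args: dict, state: dict) -> None:
        # Rewrite the virtual prefix; only a "path" argument is handled here.
        path = args.get("path", "")
        if path.startswith("/mnt/user-data/"):
            args["path"] = self.root + path[len("/mnt/user-data"):]

    async def after_agent_run(self, state: dict) -> None:
        await self.provider.release(state["sandbox_id"])


async def run_once() -> tuple[dict, list]:
    provider = FakeProvider()
    middleware = DemoMiddleware(provider, root="threads/abc123/user-data")
    state, args = {}, {"path": "/mnt/user-data/uploads/sales.csv"}
    await middleware.before_agent_run(state)
    try:
        await middleware.on_tool_call("read_file", args, state)
    finally:
        # Release even if the tool call raises.
        await middleware.after_agent_run(state)
    return args, provider.events


if __name__ == "__main__":
    print(asyncio.run(run_once()))
```

The `try`/`finally` is the part worth copying: a sandbox that is acquired but never released is how container leaks start.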
Detecting Sandbox Mode in Skills
Sometimes skills need to know which mode they’re running in:
```python
import subprocess

from deerflow.sandbox import is_local_sandbox

async def my_skill(state: dict):
    if is_local_sandbox(state):
        # Local mode - can access host filesystem directly
        result = subprocess.run(['ls', '/home/user/data'])
    else:
        # Docker mode - use sandbox tools
        # (`bash` here is the sandbox's bash tool, assumed to be in scope)
        result = await bash("ls /mnt/user-data/workspace")
```

This allows skills to work in both development and production.
Real Example: Data Analysis Pipeline
I built a complete data analysis pipeline using the sandbox.
Step 1: User Uploads Data
The user uploads sales.csv to the thread’s uploads directory.
Step 2: Agent Analyzes Data
I prompted the agent:

```
"Analyze the sales.csv file. Calculate monthly revenue, identify top products, and generate a markdown report."
```

The agent executed this sequence:

```bash
# Step 1: Check the data
head -20 /mnt/user-data/uploads/sales.csv

# Step 2: Create analysis script
cat > /mnt/user-data/workspace/analyze.py << 'EOF'
import pandas as pd
from datetime import datetime

# Load data
df = pd.read_csv('/mnt/user-data/uploads/sales.csv')
df['date'] = pd.to_datetime(df['date'])
df['month'] = df['date'].dt.to_period('M')

# Monthly revenue
monthly = df.groupby('month')['amount'].sum()

# Top products
products = df.groupby('product')['amount'].sum().sort_values(ascending=False).head(10)

# Generate report
report = f"""# Sales Analysis Report

## Monthly Revenue
{monthly.to_markdown()}

## Top 10 Products
{products.to_markdown()}
"""

with open('/mnt/user-data/outputs/report.md', 'w') as f:
    f.write(report)
EOF

# Step 3: Run analysis
python3 /mnt/user-data/workspace/analyze.py

# Step 4: Show results
cat /mnt/user-data/outputs/report.md
```

Step 3: Results Persist
The report is saved in the thread’s outputs directory. The user can download it or continue refining it in subsequent messages.
Comparison: With and Without Sandbox
I compared the experience:
| Feature | Without Sandbox | With DeerFlow Sandbox |
|---|---|---|
| Code execution | API calls only, no actual running | Full bash access in isolation |
| File persistence | None between messages | Per-thread storage |
| Isolation | Risk to host system | Thread-specific containers |
| Safety | Must trust agent completely | Virtual paths prevent escape |
| Debugging | Black box | Inspect files anytime |
The sandbox transforms the agent from a text generator into an actual worker.
Issues I Encountered
Not everything was smooth.
Docker Resource Usage
The Docker mode consumes significant resources. Each container needs:
- ~500MB base image
- ~256MB RAM minimum
- CPU time for execution
I recommend at least 8GB RAM for running multiple threads.
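As a rough sanity check on that figure, the arithmetic can be written down directly. This is a back-of-the-envelope sketch: the 2GB host reserve is my assumption, and 256MB is the minimum per container, so real workloads such as pandas on a large CSV will fit fewer.

```python
def max_containers(total_ram_mb: int, per_container_mb: int = 256, host_reserve_mb: int = 2048) -> int:
    """Upper bound on concurrent sandbox containers, judged by RAM alone."""
    # Memory left after the OS, Docker daemon, and DeerFlow backend (assumed reserve).
    usable = max(total_ram_mb - host_reserve_mb, 0)
    return usable // per_container_mb


if __name__ == "__main__":
    print(max_containers(8192))  # 8GB host: (8192 - 2048) // 256 = 24
```

So 8GB gives headroom for a couple of dozen idle threads at best, and considerably fewer once containers start doing real work.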
Cold Start Latency
First execution in a new thread takes 5-10 seconds to spin up the container. Subsequent commands are fast.
Skill Debugging
When a skill fails inside the sandbox, error messages can be cryptic. I had to check container logs:
```bash
# Check sandbox container logs
docker logs deer-flow-sandbox-abc123
```

Path Confusion
Sometimes I forgot to use virtual paths and used real paths by mistake. The sandbox correctly blocked these, but debugging took time.
When to Use Each Mode
Based on my testing:
Use Local Mode when:
- Developing and debugging skills
- Running single-threaded locally
- You trust the code being executed
- Speed matters more than isolation
Use Docker Mode when:
- Running in production
- Multiple users share the system
- Executing untrusted code
- Compliance requires isolation
Summary
DeerFlow’s sandbox system solves the execution gap that most agent frameworks ignore. By providing isolated execution environments with virtual filesystems, agents can create, modify, and execute files without risking the host system.
The key insight is the virtual path translation layer. Agents see /mnt/user-data/... paths, but the system translates these to thread-specific directories. This enables isolation without requiring agents to understand filesystem details.
For building agents that need to do real work—data analysis, code generation, content creation—the sandbox is essential infrastructure. It’s the difference between an agent that talks about doing things and an agent that actually does them.
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can contact me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!