
How DeerFlow Sandbox Execution Enables Safe Code Running

Purpose

I wanted my AI agents to actually execute code, not just talk about it. Most agent frameworks stop at API calls—they can generate Python scripts but can’t run them. They can suggest file changes but can’t apply them.

DeerFlow solved this with its sandbox execution system. This post explains how it works and why it matters for building agents that do real work.

The Problem: Agents Without Execution

I spent weeks building an agent that could analyze data. The workflow should have been simple:

  1. User uploads a CSV file
  2. Agent writes a Python script to analyze it
  3. Agent runs the script
  4. Agent saves the results

Steps 1 and 2 were easy. Steps 3 and 4 were impossible. My agent could only generate text responses. It had no way to:

  • Execute the Python script it wrote
  • Save files to disk
  • Read files the user uploaded
  • Persist anything between messages

I tried giving it shell access. That was a disaster:

Terminal
# What could go wrong?
> agent.execute("rm -rf /")
# ...everything

I needed isolation. I needed sandboxing.

What DeerFlow’s Sandbox Provides

DeerFlow creates an isolated execution environment per conversation thread. Each thread gets:

  1. Its own filesystem - workspace, uploads, and outputs directories
  2. Virtual path translation - agents see /mnt/user-data/... not real paths
  3. Execution tools - bash, read_file, write_file, str_replace
  4. Isolation modes - local or Docker-based

Thread-Isolated Directory Structure

When I started a new conversation thread, DeerFlow created this structure:

Sandbox Directory Layout
backend/.deer-flow/threads/
├── thread-abc123/
│   └── user-data/
│       ├── workspace/   # Agent's working directory
│       ├── uploads/     # User-uploaded files
│       └── outputs/     # Generated deliverables
├── thread-def456/
│   └── user-data/
│       └── ...
└── thread-xyz789/
    └── user-data/
        └── ...

Each thread gets its own directory tree, so files from one conversation never leak into another.
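
To make the layout concrete, here is a minimal sketch of how a per-thread tree like this could be created. The helper name create_thread_dirs and its base_dir argument are my own illustration, not DeerFlow's API; only the directory names come from the layout above.

```python
import tempfile
from pathlib import Path

# Subdirectory names taken from the layout above
SUBDIRS = ("workspace", "uploads", "outputs")


def create_thread_dirs(base_dir: Path, thread_id: str) -> Path:
    """Create <base_dir>/threads/<thread_id>/user-data/{workspace,uploads,outputs}.

    Hypothetical helper for illustration; not part of DeerFlow.
    """
    user_data = base_dir / "threads" / thread_id / "user-data"
    for name in SUBDIRS:
        # parents=True builds the whole chain; exist_ok makes repeat calls safe
        (user_data / name).mkdir(parents=True, exist_ok=True)
    return user_data


# Use a temporary directory so the sketch never touches a real install
with tempfile.TemporaryDirectory() as tmp:
    a = create_thread_dirs(Path(tmp), "thread-abc123")
    b = create_thread_dirs(Path(tmp), "thread-def456")
    (a / "workspace" / "notes.txt").write_text("only in thread abc123")
    print(sorted(p.name for p in a.iterdir()))
    # The second thread's workspace stays empty
    print(list((b / "workspace").iterdir()))
```

Because every thread hangs off its own thread ID, two conversations can never collide on a file name.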

Virtual Path System

This is the clever part. Agents never see real filesystem paths. They operate through virtual paths:

| Virtual Path (Agent Sees) | Physical Path (Actual Location) |
| --- | --- |
| /mnt/user-data/workspace | threads/{thread_id}/user-data/workspace |
| /mnt/user-data/uploads | threads/{thread_id}/user-data/uploads |
| /mnt/user-data/outputs | threads/{thread_id}/user-data/outputs |
| /mnt/skills | deer-flow/skills/ |

I tested this by asking an agent to list files:

Agent Request
"List all files in my workspace directory"

The agent ran:

Agent executes this command
ls -la /mnt/user-data/workspace

But behind the scenes, DeerFlow translated that to:

Actual execution
ls -la backend/.deer-flow/threads/abc123/user-data/workspace

The agent never sees the real path. Because only the mapped prefixes are translated and everything else is rejected, this limits path traversal and helps keep the host system safe.
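
The ls example can be reproduced with a small rewrite step. This is a sketch of how such command rewriting could work, not DeerFlow's actual code: the function name translate_command and the regex are my assumptions, and the base path is the one from the directory layout above.

```python
import re

BASE = "backend/.deer-flow/threads"  # physical root shown in the layout above

# Match the virtual root only when it is followed by "/", whitespace, or the
# end of the string, so a look-alike such as /mnt/user-data-old is left alone
VIRTUAL_ROOT = re.compile(r"/mnt/user-data(?=/|\s|$)")


def translate_command(command: str, thread_id: str) -> str:
    """Rewrite /mnt/user-data occurrences in a shell command.

    Illustrative only: the name and the matching rule are assumptions.
    """
    physical_root = f"{BASE}/{thread_id}/user-data"
    # A callable replacement avoids any backslash handling in the new text
    return VIRTUAL_ROOT.sub(lambda _match: physical_root, command)


print(translate_command("ls -la /mnt/user-data/workspace", "abc123"))
```

For the thread ID abc123 this prints the same physical command as shown above.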

Sandbox Tools Available

I explored the tools DeerFlow provides to agents inside the sandbox.

Bash Execution

The most powerful tool is bash execution:

Skill using bash

Step 1: Create a Python script

Terminal window
cat > /mnt/user-data/workspace/analyze.py << 'EOF'
import pandas as pd
import json
# Read uploaded data
df = pd.read_csv('/mnt/user-data/uploads/sales.csv')
# Calculate statistics (cast NumPy scalars to built-in types so json can serialize them)
stats = {
    'total_sales': float(df['amount'].sum()),
    'avg_sale': float(df['amount'].mean()),
    'count': len(df)
}
# Save results
with open('/mnt/user-data/outputs/results.json', 'w') as f:
    json.dump(stats, f)
print("Analysis complete!")
EOF

Step 2: Run the script

Terminal window
python3 /mnt/user-data/workspace/analyze.py

The bash tool runs commands inside the sandbox. The agent can install packages, run tests, or execute any command.

File Operations

DeerFlow provides dedicated file tools:

File operation tools

Read a file

Terminal window
read_file /mnt/user-data/uploads/data.csv

Write a file

Terminal window
write_file /mnt/user-data/outputs/report.md "# Sales Report\n\n..."

Edit by string replacement

Terminal window
str_replace /mnt/user-data/workspace/config.yaml "debug: false" "debug: true"

The str_replace tool is particularly useful. It lets agents edit files without rewriting the entire contents.

Sandbox Modes: Local vs Docker

DeerFlow supports two sandbox modes. I tested both.

Local Execution Mode

Local mode runs commands directly on the host filesystem:

config.yaml
sandbox:
  use: deerflow.sandbox.local:LocalSandboxProvider

I found that this mode offers:

  • Faster startup (no container overhead)
  • Simpler setup (no Docker required)
  • Less isolation (files are real files on the host)

Good for development, risky for production.

Docker Execution Mode

Docker mode runs each thread in its own container:

config.yaml
sandbox:
  use: deerflow.community.aio_sandbox:AioSandboxProvider
  image: ghcr.io/bytedance/deer-flow-sandbox:latest

I tested the Docker mode:

Terminal
# Start a thread with Docker sandbox
# DeerFlow spins up a container
docker ps
# Output
CONTAINER ID   IMAGE                      STATUS
abc123def456   deer-flow-sandbox:latest   Up 2 minutes

Each thread gets its own container. When the thread ends, the container is destroyed.
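
One way to picture "one container per thread" is as a docker run invocation with the thread's directory bind-mounted at the virtual path. The sketch below only builds the argument list and does not start anything. The container naming scheme, the memory cap, and the mount layout are my assumptions for illustration, not what AioSandboxProvider does internally.

```python
import shlex
from pathlib import Path

IMAGE = "ghcr.io/bytedance/deer-flow-sandbox:latest"  # image from the config above


def docker_run_args(thread_id: str, threads_root: Path, memory: str = "256m") -> list[str]:
    """Build a `docker run` argv for one thread's sandbox (hypothetical layout)."""
    host_dir = (threads_root / thread_id / "user-data").resolve()
    return [
        "docker", "run",
        "--detach",                                # run in the background
        "--rm",                                    # remove the container when it stops
        "--name", f"deer-flow-sandbox-{thread_id}",
        "--memory", memory,                        # cap RAM per container
        "--volume", f"{host_dir}:/mnt/user-data",  # thread data appears at the virtual path
        IMAGE,
    ]


args = docker_run_args("abc123", Path("backend/.deer-flow/threads"))
print(shlex.join(args))
```

With a bind mount like this, the virtual path is simply the real path inside the container, so no string rewriting is needed for commands that run there.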

Kubernetes via Provisioner

For multi-user deployments, DeerFlow supports Kubernetes:

config.yaml
sandbox:
  use: deerflow.community.aio_sandbox:AioSandboxProvider
  provisioner_url: http://provisioner:8002

The provisioner manages pod lifecycle. This scales to thousands of concurrent users.

How Path Translation Works

I dug into the implementation to understand the translation layer.

The Translation Function

path_translation.py
class SecurityError(Exception):
    """Raised when a path falls outside the sandbox."""


def replace_virtual_path(virtual_path: str, thread_id: str) -> str:
    """Translate a virtual path to its physical path."""
    # Virtual prefixes the agent sees, mapped to physical locations
    path_mappings = {
        '/mnt/user-data/workspace': f'threads/{thread_id}/user-data/workspace',
        '/mnt/user-data/uploads': f'threads/{thread_id}/user-data/uploads',
        '/mnt/user-data/outputs': f'threads/{thread_id}/user-data/outputs',
        '/mnt/skills': 'deer-flow/skills',
    }
    for virtual, physical in path_mappings.items():
        # Match only at a path boundary, so /mnt/user-data/workspace-evil is rejected
        if virtual_path == virtual or virtual_path.startswith(virtual + '/'):
            # Swap only the leading prefix, not later occurrences in the path
            return physical + virtual_path[len(virtual):]
    # Block access to paths outside sandbox
    raise SecurityError(f"Path {virtual_path} outside sandbox boundaries")
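
A prefix check on its own does not stop ".." segments: /mnt/user-data/workspace/../../../etc/passwd still starts with an allowed prefix. A common hardening step is to normalize the path before matching. I am showing it as a sketch, not as DeerFlow's confirmed behavior. The function name normalize_virtual_path and the choice of PermissionError are mine.

```python
import posixpath

VIRTUAL_ROOTS = (
    "/mnt/user-data/workspace",
    "/mnt/user-data/uploads",
    "/mnt/user-data/outputs",
    "/mnt/skills",
)


def normalize_virtual_path(virtual_path: str) -> str:
    """Collapse '.' and '..' segments, then require an allowed root."""
    # normpath resolves ".." lexically, without touching the filesystem
    clean = posixpath.normpath(virtual_path)
    for root in VIRTUAL_ROOTS:
        if clean == root or clean.startswith(root + "/"):
            return clean
    raise PermissionError(f"Path {virtual_path} escapes the sandbox")


print(normalize_virtual_path("/mnt/user-data/workspace/./a/../b.txt"))

try:
    normalize_virtual_path("/mnt/user-data/workspace/../../../etc/passwd")
except PermissionError as exc:
    print(exc)
```

Lexical normalization does not follow symlinks, so a symlink inside the workspace that points outside it would need a separate check against the resolved physical path.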

Middleware Integration

The SandboxMiddleware hooks into the agent lifecycle:

sandbox_middleware.py
class SandboxMiddleware:
    async def before_agent_run(self, state: dict):
        # 1. Acquire sandbox
        sandbox_id = await self.sandbox_provider.acquire()
        state['sandbox_id'] = sandbox_id
        # 2. Store path mappings
        state['path_mappings'] = self.get_path_mappings(sandbox_id)

    async def on_tool_call(self, tool_name: str, args: dict, state: dict):
        # 3. Translate paths in tool arguments
        if 'path' in args:
            args['path'] = self.translate_path(
                args['path'],
                state['sandbox_id']
            )

    async def after_agent_run(self, state: dict):
        # 4. Optional cleanup
        if self.auto_cleanup:
            await self.sandbox_provider.release(state['sandbox_id'])

This routes every tool call that carries a path argument through translation.
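
To watch the three hooks fire in order, I put together a self-contained simulation. FakeProvider, DemoMiddleware, and the single hard-coded prefix are stand-ins I made up for the demo. Only the hook names mirror the snippet above.

```python
import asyncio


class FakeProvider:
    """Stand-in for a sandbox provider; tracks which sandboxes are active."""

    def __init__(self) -> None:
        self.active: set[str] = set()

    async def acquire(self) -> str:
        sandbox_id = f"sandbox-{len(self.active) + 1}"
        self.active.add(sandbox_id)
        return sandbox_id

    async def release(self, sandbox_id: str) -> None:
        self.active.discard(sandbox_id)


class DemoMiddleware:
    """Simplified version of the hooks shown above (not DeerFlow's class)."""

    PREFIX = "/mnt/user-data"

    def __init__(self, provider: FakeProvider, thread_id: str) -> None:
        self.provider = provider
        self.thread_id = thread_id

    async def before_agent_run(self, state: dict) -> None:
        state["sandbox_id"] = await self.provider.acquire()

    async def on_tool_call(self, args: dict, state: dict) -> dict:
        # Only an argument named "path" is rewritten in this sketch
        if "path" in args:
            if not args["path"].startswith(self.PREFIX + "/"):
                raise PermissionError(f"{args['path']} is outside the sandbox")
            tail = args["path"][len(self.PREFIX):]
            args["path"] = f"threads/{self.thread_id}/user-data{tail}"
        return args

    async def after_agent_run(self, state: dict) -> None:
        await self.provider.release(state.pop("sandbox_id"))


async def main() -> dict:
    provider = FakeProvider()
    mw = DemoMiddleware(provider, thread_id="abc123")
    state: dict = {}
    await mw.before_agent_run(state)
    args = await mw.on_tool_call({"path": "/mnt/user-data/uploads/sales.csv"}, state)
    await mw.after_agent_run(state)
    return {"args": args, "still_active": sorted(provider.active)}


result = asyncio.run(main())
print(result)
```

The tool receives the translated path, and no sandbox is left active once the run finishes.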

Detecting Sandbox Mode in Skills

Sometimes skills need to know which mode they’re running in:

detect_mode.py
import subprocess
from deerflow.sandbox import is_local_sandbox

async def my_skill(state: dict):
    if is_local_sandbox(state):
        # Local mode - can access host filesystem directly
        result = subprocess.run(['ls', '/home/user/data'])
    else:
        # Docker mode - use sandbox tools
        # (bash is the sandbox bash tool; its import is not shown here)
        result = await bash("ls /mnt/user-data/workspace")
    return result

This allows skills to work in both development and production.

Real Example: Data Analysis Pipeline

I built a complete data analysis pipeline using the sandbox.

Step 1: User Uploads Data

The user uploads sales.csv to the thread’s uploads directory.

Step 2: Agent Analyzes Data

I prompted the agent:

User Request
"Analyze the sales.csv file. Calculate monthly revenue, identify top products, and generate a markdown report."

The agent executed this sequence:

Terminal
# Step 1: Check the data
head -20 /mnt/user-data/uploads/sales.csv
# Step 2: Create analysis script
cat > /mnt/user-data/workspace/analyze.py << 'EOF'
import pandas as pd
# Load data
df = pd.read_csv('/mnt/user-data/uploads/sales.csv')
df['date'] = pd.to_datetime(df['date'])
df['month'] = df['date'].dt.to_period('M')
# Monthly revenue
monthly = df.groupby('month')['amount'].sum()
# Top products
products = df.groupby('product')['amount'].sum().sort_values(ascending=False).head(10)
# Generate report (to_markdown requires the tabulate package)
report = f"""# Sales Analysis Report
## Monthly Revenue
{monthly.to_markdown()}
## Top 10 Products
{products.to_markdown()}
"""
with open('/mnt/user-data/outputs/report.md', 'w') as f:
    f.write(report)
EOF
# Step 3: Run analysis
python3 /mnt/user-data/workspace/analyze.py
# Step 4: Show results
cat /mnt/user-data/outputs/report.md

Step 3: Results Persist

The report is saved in the thread’s outputs directory. The user can download it or continue refining it in subsequent messages.

Comparison: With and Without Sandbox

I compared the experience:

| Feature | Without Sandbox | With DeerFlow Sandbox |
| --- | --- | --- |
| Code execution | API calls only, no actual running | Full bash access in isolation |
| File persistence | None between messages | Per-thread storage |
| Isolation | Risk to host system | Thread-specific containers |
| Safety | Must trust agent completely | Virtual paths prevent escape |
| Debugging | Black box | Inspect files anytime |

The sandbox transforms the agent from a text generator into an actual worker.

Issues I Encountered

Not everything was smooth.

Docker Resource Usage

The Docker mode consumes significant resources. Each container needs:

  • ~500MB base image
  • ~256MB RAM minimum
  • CPU time for execution

I recommend at least 8GB RAM for running multiple threads.
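
The 8GB figure can be sanity-checked with a rough calculation. This sketch uses the ~256MB per-container minimum from above. The 2GB reserved for the operating system and the DeerFlow backend is my own assumption, so adjust it for your host.

```python
def max_containers(total_ram_mb: int, per_container_mb: int = 256, reserved_mb: int = 2048) -> int:
    """Rough upper bound on concurrent sandbox containers, by RAM alone."""
    usable = max(total_ram_mb - reserved_mb, 0)
    return usable // per_container_mb


for ram_gb in (4, 8, 16):
    print(f"{ram_gb} GB host: up to {max_containers(ram_gb * 1024)} containers")
```

This is a ceiling at the minimum footprint. A thread that loads a large CSV into pandas will use far more than 256MB, so the practical number is lower.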

Cold Start Latency

First execution in a new thread takes 5-10 seconds to spin up the container. Subsequent commands are fast.

Skill Debugging

When a skill fails inside the sandbox, error messages can be cryptic. I had to check container logs:

Terminal
# Check sandbox container logs
docker logs deer-flow-sandbox-abc123

Path Confusion

Sometimes I forgot to use virtual paths and used real paths by mistake. The sandbox correctly blocked these, but debugging took time.

When to Use Each Mode

Based on my testing:

Use Local Mode when:

  • Developing and debugging skills
  • Running single-threaded locally
  • You trust the code being executed
  • Speed matters more than isolation

Use Docker Mode when:

  • Running in production
  • Multiple users share the system
  • Executing untrusted code
  • Compliance requires isolation

Summary

DeerFlow’s sandbox system solves the execution gap that most agent frameworks ignore. By providing isolated execution environments with virtual filesystems, agents can create, modify, and execute files without risking the host system.

The key insight is the virtual path translation layer. Agents see /mnt/user-data/... paths, but the system translates these to thread-specific directories. This enables isolation without requiring agents to understand filesystem details.

For building agents that need to do real work—data analysis, code generation, content creation—the sandbox is essential infrastructure. It’s the difference between an agent that talks about doing things and an agent that actually does them.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
