How DeerFlow Sandbox Execution Enables Safe Code Running

Mar 16, 2026

Purpose

I wanted my AI agents to actually execute code, not just talk about it. Most agent frameworks stop at API calls—they can generate Python scripts but can’t run them. They can suggest file changes but can’t apply them.

DeerFlow solved this with its sandbox execution system. This post explains how it works and why it matters for building agents that do real work.

The Problem: Agents Without Execution

I spent weeks building an agent that could analyze data. The workflow should have been simple:

1. User uploads a CSV file
2. Agent writes a Python script to analyze it
3. Agent runs the script
4. Agent saves the results

Steps 1 and 2 were easy. Steps 3 and 4 were impossible. My agent could only generate text responses. It had no way to:

Execute the Python script it wrote
Save files to disk
Read files the user uploaded
Persist anything between messages

I tried giving it shell access. That was a disaster:

# What could go wrong?
> agent.execute("rm -rf /")
# ...everything

I needed isolation. I needed sandboxing.

What DeerFlow’s Sandbox Provides

DeerFlow creates an isolated execution environment per conversation thread. Each thread gets:

Its own filesystem - workspace, uploads, and outputs directories
Virtual path translation - agents see /mnt/user-data/... not real paths
Execution tools - bash, read_file, write_file, str_replace
Isolation modes - local or Docker-based

Thread-Isolated Directory Structure

When I started a new conversation thread, DeerFlow created this structure:

backend/.deer-flow/threads/
├── thread-abc123/
│   └── user-data/
│       ├── workspace/    # Agent's working directory
│       ├── uploads/      # User-uploaded files
│       └── outputs/      # Generated deliverables
├── thread-def456/
│   └── user-data/
│       └── ...
└── thread-xyz789/
    └── user-data/
        └── ...

Each thread is completely isolated. Files from one conversation never leak to another.
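To make the layout concrete, here is a minimal sketch of how such a per-thread tree could be created. The helper name `create_thread_dirs` and the `base_dir` argument are hypothetical and only for illustration; this is not DeerFlow's own code.

```python
from pathlib import Path

# Subdirectories every thread receives, matching the tree above.
THREAD_SUBDIRS = ("workspace", "uploads", "outputs")


def create_thread_dirs(base_dir: Path, thread_id: str) -> Path:
    """Create <base_dir>/<thread_id>/user-data/{workspace,uploads,outputs}.

    Hypothetical helper for illustration; not part of DeerFlow's API.
    """
    user_data = base_dir / thread_id / "user-data"
    for name in THREAD_SUBDIRS:
        # exist_ok keeps the call idempotent when a thread is resumed.
        (user_data / name).mkdir(parents=True, exist_ok=True)
    return user_data
```

Calling it twice with different thread IDs yields two disjoint trees, which is the property the isolation depends on.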

Virtual Path System

This is the clever part. Agents never see real filesystem paths. They operate through virtual paths:

Virtual Path (Agent Sees)	Physical Path (Actual Location)
`/mnt/user-data/workspace`	`threads/{thread_id}/user-data/workspace`
`/mnt/user-data/uploads`	`threads/{thread_id}/user-data/uploads`
`/mnt/user-data/outputs`	`threads/{thread_id}/user-data/outputs`
`/mnt/skills`	`deer-flow/skills/`

I tested this by asking an agent to list files:

"List all files in my workspace directory"

The agent ran:

ls -la /mnt/user-data/workspace

But behind the scenes, DeerFlow translated that to:

ls -la backend/.deer-flow/threads/thread-abc123/user-data/workspace

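The agent never sees the real path. As long as the translator normalizes paths and rejects anything outside the mapped prefixes, this blocks path traversal and keeps the rest of the host filesystem out of reach.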
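The rewrite from the virtual `ls` command to the physical one can be sketched as a prefix substitution over the command string. This is an illustrative assumption about how such a rewrite might look, not DeerFlow's code; `translate_command` and its `base_dir` default are hypothetical names.

```python
import re

# Virtual root the agent sees (from the table above).
VIRTUAL_ROOT = "/mnt/user-data"


def translate_command(command: str, thread_id: str,
                      base_dir: str = "backend/.deer-flow/threads") -> str:
    """Rewrite every /mnt/user-data path in a shell command to the thread's real path.

    Illustrative only: a production version would also have to normalize
    paths and reject ".." segments before running anything.
    """
    physical_root = f"{base_dir}/{thread_id}/user-data"
    # Match the virtual root only at the start of a path token and only when it
    # is followed by "/", whitespace, a quote, or end of string, so a sibling
    # such as "/mnt/user-database" is left untouched.
    pattern = re.compile(
        r"(?<![\w/.-])" + re.escape(VIRTUAL_ROOT) + r"(?=/|\s|$|['\"])"
    )
    # A function replacement avoids any backslash processing of physical_root.
    return pattern.sub(lambda _match: physical_root, command)
```

For example, `translate_command("ls -la /mnt/user-data/workspace", "thread-abc123")` returns `ls -la backend/.deer-flow/threads/thread-abc123/user-data/workspace`.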

Sandbox Tools Available

I explored the tools DeerFlow provides to agents inside the sandbox.

Bash Execution

The most powerful tool is bash execution:

```bash
# Step 1: Create a Python script
cat > /mnt/user-data/workspace/analyze.py << 'EOF'
import pandas as pd
import json

# Read uploaded data
df = pd.read_csv('/mnt/user-data/uploads/sales.csv')

# Calculate statistics (cast NumPy scalars to built-in floats so json can serialize them)
stats = {
    'total_sales': float(df['amount'].sum()),
    'avg_sale': float(df['amount'].mean()),
    'count': len(df)
}

# Save results
with open('/mnt/user-data/outputs/results.json', 'w') as f:
    json.dump(stats, f)

print("Analysis complete!")
EOF

# Step 2: Run the script
python3 /mnt/user-data/workspace/analyze.py
```

The bash tool runs commands inside the sandbox. The agent can install packages, run tests, or execute any command.

File Operations

DeerFlow provides dedicated file tools:

```bash
# Read a file
read_file /mnt/user-data/uploads/data.csv

# Write a file
write_file /mnt/user-data/outputs/report.md "# Sales Report\n\n..."

# Edit by string replacement
str_replace /mnt/user-data/workspace/config.yaml "debug: false" "debug: true"
```

The `str_replace` tool is particularly useful. It lets agents edit files without rewriting the entire contents.
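A minimal sketch of what a string-replacement edit tool can look like in Python. The exactly-one-match rule is an assumption made for this sketch (it prevents an ambiguous edit from silently hitting the wrong spot); DeerFlow's actual `str_replace` tool may behave differently.

```python
from pathlib import Path


def str_replace(path: Path, old: str, new: str) -> None:
    """Replace one exact occurrence of `old` with `new` in the file at `path`.

    Assumption for this sketch: the edit is rejected unless `old` occurs
    exactly once in the file.
    """
    if not old:
        raise ValueError("the string to replace must not be empty")
    text = path.read_text(encoding="utf-8")
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly 1 match for {old!r}, found {count}")
    # Only the matched span changes; the rest of the file is written back as-is.
    path.write_text(text.replace(old, new, 1), encoding="utf-8")
```

Because the agent sends only the old and new snippets, the tool call stays small even when the file is large.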

Sandbox Modes: Local vs Docker

DeerFlow supports two sandbox modes. I tested both.

Local Execution Mode

Local mode runs commands directly on the host filesystem:

```yaml title="config.yaml"
sandbox:
  use: deerflow.sandbox.local:LocalSandboxProvider
```

I found this mode:

Faster startup (no container overhead)
Simpler setup (no Docker required)
Less isolation (files are real files on the host)

Good for development, risky for production.
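The `use:` value in the config follows the common `module:ClassName` convention. How DeerFlow resolves it internally is not shown here, so the following is only a sketch of how such a string is typically turned into a class, with `load_provider` as a hypothetical name.

```python
import importlib


def load_provider(spec: str):
    """Resolve a "package.module:ClassName" string to the object it names."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    # Importing by name is what lets the config swap providers without code changes.
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

The same function works for any importable name, for example `load_provider("collections:OrderedDict")` returns the standard library's `OrderedDict` class.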

Docker Execution Mode

Docker mode runs each thread in its own container:

sandbox:
  use: deerflow.community.aio_sandbox:AioSandboxProvider
  image: ghcr.io/bytedance/deer-flow-sandbox:latest

I tested the Docker mode:

# Start a thread with Docker sandbox
# DeerFlow spins up a container
docker ps

# Output
CONTAINER ID   IMAGE                              STATUS
abc123def456   deer-flow-sandbox:latest           Up 2 minutes

Each thread gets its own container. When the thread ends, the container is destroyed.
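To picture what "one container per thread" means in practice, here is a sketch that builds a `docker run` argument list for a thread. It is an assumption about how such a launch could look, not how `AioSandboxProvider` actually starts containers; the function name, the container name pattern, and the 256 MB limit are illustrative.

```python
def docker_run_args(thread_id: str, host_user_data: str,
                    image: str = "ghcr.io/bytedance/deer-flow-sandbox:latest",
                    memory: str = "256m") -> list[str]:
    """Build (but do not run) a docker command line for one thread's sandbox."""
    # Docker treats a non-absolute source as a named volume, not a bind mount.
    if not host_user_data.startswith("/"):
        raise ValueError("host_user_data must be an absolute path")
    return [
        "docker", "run",
        "--detach",                                   # run in the background
        "--rm",                                       # remove the container when it stops
        "--name", f"deer-flow-sandbox-{thread_id}",   # hypothetical naming scheme
        "--memory", memory,                           # cap RAM per thread
        "--volume", f"{host_user_data}:/mnt/user-data",
        image,
    ]
```

Mounting only the thread's `user-data` directory at `/mnt/user-data` is what makes the virtual paths real inside the container, and `--rm` matches the destroy-on-exit behavior described above.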

Kubernetes via Provisioner

For multi-user deployments, DeerFlow supports Kubernetes:

sandbox:
  use: deerflow.community.aio_sandbox:AioSandboxProvider
  provisioner_url: http://provisioner:8002

The provisioner manages pod lifecycle, so capacity is bounded by the cluster rather than by a single Docker host.

How Path Translation Works

I dug into the implementation to understand the translation layer.

The Translation Function

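import posixpath


class SecurityError(Exception):
    """Raised when a path falls outside the sandbox."""


def replace_virtual_path(virtual_path: str, thread_id: str) -> str:
    """Translate a virtual path to its physical path."""

    # Virtual prefixes the agent sees, mapped to physical locations
    path_mappings = {
        '/mnt/user-data/workspace': f'threads/{thread_id}/user-data/workspace',
        '/mnt/user-data/uploads': f'threads/{thread_id}/user-data/uploads',
        '/mnt/user-data/outputs': f'threads/{thread_id}/user-data/outputs',
        '/mnt/skills': 'deer-flow/skills'
    }

    # Collapse "." and ".." first so "/mnt/skills/../../etc" cannot pass the prefix check
    normalized = posixpath.normpath(virtual_path)

    for virtual, physical in path_mappings.items():
        # Match whole path components only, so "/mnt/skills-other" is rejected
        if normalized == virtual or normalized.startswith(virtual + '/'):
            return physical + normalized[len(virtual):]

    # Block access to paths outside sandbox
    raise SecurityError(f"Path {virtual_path} outside sandbox boundaries")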
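String-level translation is one layer of defense. A symlink inside the workspace could still point outside it, so a second check on the resolved physical path is a common complement. The sketch below is an assumption about what such a check could look like, not code taken from DeerFlow; `ensure_inside` is a hypothetical helper.

```python
from pathlib import Path


def ensure_inside(root: Path, candidate: Path) -> Path:
    """Return the fully resolved candidate, or raise if it escapes root."""
    resolved_root = root.resolve()
    # resolve() follows symlinks and collapses ".." segments.
    resolved = candidate.resolve()
    # is_relative_to compares whole path components (Python 3.9+).
    if not resolved.is_relative_to(resolved_root):
        raise PermissionError(f"{candidate} resolves outside {root}")
    return resolved
```

Running this after translation means a path is accepted only if both the virtual prefix and the final on-disk location fall inside the thread's directory.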

Middleware Integration

The SandboxMiddleware hooks into the agent lifecycle:

class SandboxMiddleware:
    async def before_agent_run(self, state: dict):
        # 1. Acquire sandbox
        sandbox_id = await self.sandbox_provider.acquire()
        state['sandbox_id'] = sandbox_id

        # 2. Store path mappings
        state['path_mappings'] = self.get_path_mappings(sandbox_id)

    async def on_tool_call(self, tool_name: str, args: dict, state: dict):
        # 3. Translate paths in tool arguments
        if 'path' in args:
            args['path'] = self.translate_path(
                args['path'],
                state['sandbox_id']
            )

    async def after_agent_run(self, state: dict):
        # 4. Optional cleanup
        if self.auto_cleanup:
            await self.sandbox_provider.release(state['sandbox_id'])

This way, every tool call that carries a path argument goes through translation before it touches the filesystem.
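One thing a hook-based lifecycle leaves open is what happens when a tool call raises partway through a run. A minimal, runnable sketch of the acquire/use/release pattern with a guaranteed release follows; `FakeSandboxProvider` and `run_with_sandbox` are hypothetical stand-ins written for this example, not DeerFlow classes.

```python
import asyncio


class FakeSandboxProvider:
    """In-memory stand-in for a sandbox provider (hypothetical, for the demo)."""

    def __init__(self) -> None:
        self.active: set[str] = set()
        self._counter = 0

    async def acquire(self) -> str:
        self._counter += 1
        sandbox_id = f"sandbox-{self._counter}"
        self.active.add(sandbox_id)
        return sandbox_id

    async def release(self, sandbox_id: str) -> None:
        self.active.discard(sandbox_id)


async def run_with_sandbox(provider, tool_calls):
    """Acquire a sandbox, tag each tool call with it, and always release it."""
    sandbox_id = await provider.acquire()
    handled = []
    try:
        for name, args in tool_calls:
            # A real middleware would translate paths and dispatch the tool here.
            handled.append((sandbox_id, name, dict(args)))
    finally:
        # Release even if a tool call raises, so sandboxes do not leak.
        await provider.release(sandbox_id)
    return handled
```

Running `asyncio.run(run_with_sandbox(FakeSandboxProvider(), [("bash", {"command": "ls"})]))` tags the call with `sandbox-1`, and the provider's `active` set is empty again afterwards.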

Detecting Sandbox Mode in Skills

Sometimes skills need to know which mode they’re running in:

import subprocess

from deerflow.sandbox import is_local_sandbox

async def my_skill(state: dict):
    if is_local_sandbox(state):
        # Local mode - can access host filesystem directly
        result = subprocess.run(['ls', '/home/user/data'], capture_output=True, text=True)
    else:
        # Docker mode - use the sandbox bash tool (assumed to be in scope here)
        result = await bash("ls /mnt/user-data/workspace")
    return result

This allows skills to work in both development and production.

Real Example: Data Analysis Pipeline

I built a complete data analysis pipeline using the sandbox.

Step 1: User Uploads Data

The user uploads sales.csv to the thread’s uploads directory.

Step 2: Agent Analyzes Data

I prompted the agent:

"Analyze the sales.csv file. Calculate monthly revenue, identify top products, and generate a markdown report."

The agent executed this sequence:

# Step 1: Check the data
head -20 /mnt/user-data/uploads/sales.csv

# Step 2: Create analysis script
cat > /mnt/user-data/workspace/analyze.py << 'EOF'
import pandas as pd  # to_markdown() below also needs the tabulate package installed

# Load data
df = pd.read_csv('/mnt/user-data/uploads/sales.csv')
df['date'] = pd.to_datetime(df['date'])
df['month'] = df['date'].dt.to_period('M')

# Monthly revenue
monthly = df.groupby('month')['amount'].sum()

# Top products
products = df.groupby('product')['amount'].sum().sort_values(ascending=False).head(10)

# Generate report
report = f"""# Sales Analysis Report

## Monthly Revenue
{monthly.to_markdown()}

## Top 10 Products
{products.to_markdown()}
"""

with open('/mnt/user-data/outputs/report.md', 'w') as f:
    f.write(report)
EOF

# Step 3: Run analysis
python3 /mnt/user-data/workspace/analyze.py

# Step 4: Show results
cat /mnt/user-data/outputs/report.md

Step 3: Results Persist

The report is saved in the thread’s outputs directory. The user can download it or continue refining it in subsequent messages.

Comparison: With and Without Sandbox

I compared the experience:

Feature	Without Sandbox	With DeerFlow Sandbox
Code execution	API calls only, no actual running	Full bash access in isolation
File persistence	None between messages	Per-thread storage
Isolation	Risk to host system	Thread-specific directories, plus per-thread containers in Docker mode
Safety	Must trust agent completely	Virtual paths prevent escape
Debugging	Black box	Inspect files anytime

The sandbox transforms the agent from a text generator into an actual worker.

Issues I Encountered

Not everything was smooth.

Docker Resource Usage

The Docker mode consumes significant resources. Each container needs:

~500MB base image
~256MB RAM minimum
CPU time for execution

I recommend at least 8GB RAM for running multiple threads.
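A rough way to sanity-check that recommendation is to divide the RAM left after host overhead by the per-container footprint. The 2 GB reserved for the host and DeerFlow itself is an assumption made for this sketch, and 256 MB is only the minimum quoted above, so real workloads such as pandas jobs will fit fewer threads.

```python
def max_concurrent_threads(host_ram_mb: int, per_container_mb: int = 256,
                           reserved_mb: int = 2048) -> int:
    """Estimate how many sandbox containers fit in RAM at once.

    reserved_mb is headroom for the OS and the backend (assumed value).
    """
    if per_container_mb <= 0:
        raise ValueError("per_container_mb must be positive")
    usable = max(host_ram_mb - reserved_mb, 0)
    return usable // per_container_mb
```

With 8 GB (8192 MB) this gives `max_concurrent_threads(8192)` = 24 containers at the 256 MB minimum, and 12 if each container is budgeted 512 MB.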

Cold Start Latency

First execution in a new thread takes 5-10 seconds to spin up the container. Subsequent commands are fast.

Skill Debugging

When a skill fails inside the sandbox, error messages can be cryptic. I had to check container logs:

# Check sandbox container logs
docker logs deer-flow-sandbox-abc123

Path Confusion

Sometimes I forgot to use virtual paths and used real paths by mistake. The sandbox correctly blocked these, but debugging took time.

When to Use Each Mode

Based on my testing:

Use Local Mode when:

Developing and debugging skills
Running single-threaded locally
You trust the code being executed
Speed matters more than isolation

Use Docker Mode when:

Running in production
Multiple users share the system
Executing untrusted code
Compliance requires isolation

Summary

DeerFlow’s sandbox system solves the execution gap that most agent frameworks ignore. By providing isolated execution environments with virtual filesystems, agents can create, modify, and execute files without risking the host system.

The key insight is the virtual path translation layer. Agents see /mnt/user-data/... paths, but the system translates these to thread-specific directories. This enables isolation without requiring agents to understand filesystem details.

For building agents that need to do real work—data analysis, code generation, content creation—the sandbox is essential infrastructure. It’s the difference between an agent that talks about doing things and an agent that actually does them.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
