
How Can You Run Code Safely with AI Agents Using Sandboxed Execution?

I watched my AI agent delete a production database. Not on purpose - it was testing a connection string parsing function and accidentally executed DROP DATABASE production; on my local PostgreSQL instance. I had backups, but the hour of downtime taught me a hard lesson: never let AI agents run code on your actual system.

The Reddit thread “10 MCP servers that together give your AI agent an actual brain” highlighted E2B Code Interpreter as the solution: “sandboxed code execution. Agent can write and run code in isolation. Great for data analysis, testing snippets, anything you don’t want touching your actual system.” This is exactly what I needed.

The Problem: Why Unrestricted Code Execution Is Dangerous

When you give AI agents the ability to execute code, you open yourself to several critical risks:

System Access: Unrestricted code execution lets agents read, modify, or delete any file the user can access. An agent testing a file parsing function could accidentally corrupt your project files.

Data Exposure: Environment variables often contain API keys, database credentials, and secrets. An agent running os.environ or reading config files can expose sensitive data.

Network Risks: Code can make HTTP requests to any endpoint. An agent debugging an API call might accidentally hit a production endpoint with destructive operations.

Resource Abuse: Infinite loops, memory leaks, or CPU-intensive operations can crash your system or freeze your IDE.

Malicious or Buggy Code: Code does not have to be malicious to be destructive; even unintentional bugs can cause data corruption. The agent I mentioned earlier was supposed to test parsing, not execute raw SQL.
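To make the data exposure risk concrete, here is a minimal sketch of running a snippet in a child process that only sees an allowlisted set of environment variables. The names `SAFE_ENV_KEYS`, `run_with_clean_env`, and `DEMO_API_KEY` are hypothetical, and this addresses environment variables only: the child process can still read your files, so it is an illustration of the problem, not a replacement for a sandbox.

```python
import os
import subprocess
import sys

# Hypothetical allowlist: only these variables are passed to the child process.
SAFE_ENV_KEYS = ("PATH", "LANG", "SYSTEMROOT")


def run_with_clean_env(code: str) -> str:
    """Run a Python snippet in a child process that sees only allowlisted variables."""
    clean_env = {k: os.environ[k] for k in SAFE_ENV_KEYS if k in os.environ}
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=clean_env,          # replaces the inherited environment entirely
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


if __name__ == "__main__":
    os.environ["DEMO_API_KEY"] = "not-a-real-secret"  # stand-in secret for the demo
    probe = "import os; print('DEMO_API_KEY' in os.environ)"
    print(run_with_clean_env(probe))
```

Without the `env=` argument the child inherits every variable of the parent, which is exactly what happens when an agent runs code directly in your shell.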

I initially tried Docker containers with volume mounts:

Terminal
docker run -v $(pwd):/workspace python:3.11 python -c "
import os
print(os.listdir('/workspace'))
# Agent can now read and modify all project files
"

This was a mistake. The volume mount gave the container full access to my project directory. When the agent ran a cleanup script that deleted “temporary” files, it wiped my uncommitted changes.

The Solution: E2B Code Interpreter

E2B Code Interpreter provides purpose-built sandboxed execution for AI agents. Each code execution runs in an isolated, ephemeral environment with no access to your host filesystem or environment variables. Network access is a separate, configurable setting, so confirm the default for your SDK version instead of assuming the sandbox starts offline.

Basic Setup

First, install the E2B package:

Terminal
pip install e2b-code-interpreter

Then create a sandboxed session:

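basic_sandbox.py
from e2b_code_interpreter import Sandbox
# Create an isolated sandbox (the SDK reads E2B_API_KEY from the environment)
sandbox = Sandbox()
# Run code in the sandbox. print() output is captured in logs.stdout;
# execution.text holds the value of the last expression, if there is one
execution = sandbox.run_code("print('Hello from sandbox!')")
print("".join(execution.logs.stdout))
# Output: Hello from sandbox!
# The sandbox is isolated - no access to host system
listing = sandbox.run_code("import os; os.listdir('/')")
print(listing.text)
# Output: Lists sandbox filesystem, NOT your host
# Shut the sandbox down (kill() in current SDK releases; older ones used close())
sandbox.kill()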

The sandbox runs in a secure cloud environment. Your agent can execute any Python code, but it cannot touch your local files or environment.

MCP Integration for AI Agents

The real power comes from integrating E2B as an MCP server. Your AI agent can then call it as a tool:

~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "e2b-code-interpreter": {
      "command": "npx",
      "args": ["-y", "@e2b/code-interpreter-mcp"],
      "env": {
        "E2B_API_KEY": "your-api-key-here"
      }
    }
  }
}

After restarting Claude Desktop, your agent can execute code in the sandbox:

User: Analyze this CSV file and create a bar chart of sales by region.
Agent: I'll use the code interpreter to analyze the data.
[Agent writes and executes Python code in the sandbox]
[Agent returns the chart without ever touching your local files]

Data Analysis Example

Here’s how I use E2B for data analysis without security concerns:

data_analysis.py
from e2b_code_interpreter import Sandbox
sandbox = Sandbox()
# Upload data to sandbox (not your local system)
csv_content = """product,sales,region
Widget A,1500,North
Widget B,2300,South
Widget C,1800,East
Widget D,1200,West"""
# files.write() in current SDK releases; older ones exposed this as filesystem.write()
sandbox.files.write("/data/sales.csv", csv_content)
# Agent can analyze without risking local data.
# Note the escaped \\n: a bare \n would put a real line break inside the f-string
analysis_code = """
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('/data/sales.csv')
summary = df.groupby('region')['sales'].sum().sort_values(ascending=False)
print(summary)
print(f"\\nTotal sales: ${df['sales'].sum():,}")
print(f"Best region: {summary.index[0]} (${summary.iloc[0]:,})")
"""
result = sandbox.run_code(analysis_code)
# print() output from the sandbox is captured in logs.stdout
print("".join(result.logs.stdout))
sandbox.kill()

Output
region
South    2300
East     1800
North    1500
West     1200
Name: sales, dtype: int64

Total sales: $6,800
Best region: South ($2,300)

The agent processed the data and generated insights without ever accessing my local filesystem or network.

Multi-Language Support

E2B supports multiple languages, not just Python:

multi_language.py
from e2b_code_interpreter import Sandbox
sandbox = Sandbox()
# JavaScript: console.log output is captured in logs.stdout
js_result = sandbox.run_code("console.log(2 + 2)", language="javascript")
print(f"JS result: {''.join(js_result.logs.stdout)}")
# R: the value of the last expression is available as .text
r_result = sandbox.run_code("mean(c(1, 2, 3, 4, 5))", language="r")
print(f"R result: {r_result.text}")
sandbox.kill()

File Operations Within Sandbox

The sandbox has its own filesystem. Agents can create, read, and modify files within the sandbox:

sandbox_files.py
from e2b_code_interpreter import Sandbox
sandbox = Sandbox()
# Create a file in the sandbox
sandbox.run_code("""
with open('/tmp/notes.txt', 'w') as f:
    f.write('Agent working notes\\n')
    f.write('Task: Analyze quarterly data\\n')
""")
# Read it back
result = sandbox.run_code("""
with open('/tmp/notes.txt', 'r') as f:
    print(f.read())
""")
print("".join(result.logs.stdout))
# Download results from sandbox: read the file through the SDK,
# then write it to the host yourself
content = sandbox.files.read("/tmp/notes.txt")
with open("./downloaded_notes.txt", "w") as f:
    f.write(content)
# Now the file is in your local directory, but only because
# you explicitly downloaded it
sandbox.kill()

This is key: the sandbox has its own filesystem, and you explicitly choose what to bring back to your host system.
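Because the sandbox chooses the file names, it helps to decide on the host side where a downloaded file may land. The following is a minimal sketch, not part of the E2B SDK: `local_destination` is a hypothetical helper that keeps only the file name from the sandbox path and refuses anything that would resolve outside the chosen download directory.

```python
from pathlib import Path, PurePosixPath


def local_destination(download_dir: str, sandbox_path: str) -> Path:
    """Map a sandbox file path to a path inside download_dir, keeping only the file name."""
    # Sandbox paths are POSIX paths regardless of the host OS.
    name = PurePosixPath(sandbox_path).name
    if name in ("", ".", ".."):
        raise ValueError(f"no usable file name in {sandbox_path!r}")
    base = Path(download_dir).resolve()
    target = (base / name).resolve()
    # Defence in depth: refuse anything that resolves outside the download directory
    # (for example via a symlink that already exists there).
    if base not in target.parents:
        raise ValueError(f"{sandbox_path!r} escapes {download_dir!r}")
    return target


if __name__ == "__main__":
    print(local_destination("downloads", "/tmp/notes.txt"))
    print(local_destination("downloads", "/tmp/../../etc/passwd"))
```

Both calls return a path directly under `downloads`, so a path such as `/tmp/../../etc/passwd` ends up as `downloads/passwd` rather than overwriting anything elsewhere on the host.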

Why This Matters

Enables Powerful Use Cases: Data analysis on user-uploaded files, code generation and testing in real-time, multi-step computational workflows, scientific computing and simulations, automated report generation.

Security by Design: No host system access means sandboxed code cannot reach your files or credentials, or escalate privileges on your machine. Ephemeral environments prevent persistent threats. Resource limits prevent denial-of-service. Network isolation, when enabled, limits data exfiltration.

Peace of Mind: I can let agents experiment with code without worrying about accidents. When an agent wants to test a database migration script, it runs in the sandbox first.

Common Mistakes

I made several mistakes before getting this right:

Using Docker Without Proper Isolation: My initial approach with volume mounts defeated the purpose. The correct approach is to use Docker with no volume mounts, or better yet, use E2B which handles this properly.

Terminal
# WRONG: Volume mounts expose host filesystem
docker run -v $(pwd):/workspace python:3.11 python script.py
# BETTER: No volume mounts and no network (script.py must already be inside the image)
docker run --rm --network none python:3.11 python script.py
# BEST: Use an E2B sandbox (this part is Python, not shell)
#   sandbox = Sandbox()
#   sandbox.run_code("your code here")
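If you do stay with Docker, dropping the volume mount is only the first step. As a rough sketch, the helper below builds (but does not run) a `docker run` command that also disables networking and caps resources. The function name `locked_down_docker_cmd` and the limit values are illustrative assumptions; the flags themselves are standard `docker run` options.

```python
import shlex
from typing import List


def locked_down_docker_cmd(image: str, script: str) -> List[str]:
    """Build (but do not run) a docker command with no mounts and tight limits."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                     # no network access
        "--read-only",                           # root filesystem is read-only
        "--tmpfs", "/tmp",                       # scratch space that vanishes with the container
        "--memory", "256m",                      # memory cap (illustrative value)
        "--cpus", "1",                           # CPU cap (illustrative value)
        "--pids-limit", "64",                    # bounds runaway process creation (illustrative value)
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege escalation via setuid binaries
        image, "python", "-c", script,
    ]


if __name__ == "__main__":
    cmd = locked_down_docker_cmd("python:3.11", "print('hello')")
    print(shlex.join(cmd))
```

Passing the script with `python -c` avoids needing a mount to get the code into the container, which is the part the volume-mount approach got wrong.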

Granting Persistent Storage to Sandboxes: Some solutions create persistent containers that accumulate state. This is risky because buggy or malicious code from one session can affect future sessions. E2B’s ephemeral approach destroys all state when the session ends.
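Ephemeral only helps if the sandbox is actually torn down, including when the code in between raises. One way to guarantee that is a small context manager. This sketch assumes nothing about E2B beyond a teardown method named `kill()` or `close()`; `ephemeral` and `FakeSandbox` are hypothetical names, and the fake class exists only so the example runs without an API key.

```python
from contextlib import contextmanager


@contextmanager
def ephemeral(factory):
    """Create a sandbox-like object and always tear it down afterwards."""
    sandbox = factory()
    try:
        yield sandbox
    finally:
        # SDK versions differ on the teardown method name, so try both.
        teardown = getattr(sandbox, "kill", None) or getattr(sandbox, "close", None)
        if teardown is not None:
            teardown()


class FakeSandbox:
    """Stand-in for a real sandbox so the sketch runs without an API key."""

    def __init__(self):
        self.alive = True

    def kill(self):
        self.alive = False


if __name__ == "__main__":
    with ephemeral(FakeSandbox) as sb:
        print("inside:", sb.alive)
    print("after:", sb.alive)
```

With a real SDK you would pass the sandbox constructor as `factory`; the teardown then runs even if the agent's code fails halfway through a session.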

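Over-Permissive Network Access: Even in a sandbox, network access can be dangerous, because code that can reach the internet can also send data out. E2B lets you control network access, but the flag name and its default vary by SDK version, so check before relying on it:

network_control.py
from e2b_code_interpreter import Sandbox
# Assumption: recent SDK releases name this flag allow_internet_access and
# leave internet access ON by default; verify both against your version
sandbox = Sandbox(allow_internet_access=False)
# If you need network access, enable it explicitly
sandbox_networked = Sandbox(allow_internet_access=True)
# Now the agent can make HTTP requests
sandbox_networked.run_code("""
import requests
response = requests.get('https://api.example.com/data')
print(response.json())
""")
sandbox.kill()
sandbox_networked.kill()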

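Ignoring Resource Limits: Without limits, an agent can consume all available memory or CPU. E2B lets you cap both the lifetime of the sandbox and the duration of a single execution:

resource_limits.py
from e2b_code_interpreter import Sandbox
# timeout here caps the sandbox's lifetime in seconds
sandbox = Sandbox(timeout=30)
try:
    # Per-call limit; assumption: run_code accepts timeout= in your SDK version
    result = sandbox.run_code("""
import time
while True:
    time.sleep(1)
""", timeout=10)
except Exception as exc:
    # The SDK raises its own timeout exception rather than the built-in
    # TimeoutError, so catch broadly here and inspect the type
    print(f"Execution stopped: {type(exc).__name__}")
finally:
    sandbox.kill()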
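The mechanism behind a timeout can be seen without any sandbox SDK. The sketch below is a local illustration only: it uses the standard library's `subprocess` to kill a child process that exceeds a wall-clock deadline. `run_with_deadline` is a hypothetical helper, and a child process on your own machine is not isolated, so treat this as a picture of how a deadline works rather than a safe way to run agent code.

```python
import subprocess
import sys


def run_with_deadline(code: str, seconds: float) -> str:
    """Run a Python snippet in a child process and kill it if it exceeds the deadline."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=seconds,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return "timed out"
    return completed.stdout.strip()


if __name__ == "__main__":
    print(run_with_deadline("print(1 + 1)", seconds=10))
    print(run_with_deadline("while True: pass", seconds=1))
```

A wall-clock deadline handles infinite loops; memory and CPU caps need a separate mechanism, which is one reason to prefer an environment that enforces all three.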

Not Validating Outputs: Even sandboxed code can produce unexpected results. Always validate:

validate_output.py
from e2b_code_interpreter import Sandbox
sandbox = Sandbox()
result = sandbox.run_code("x = 1/0")
if result.error:
    # result.error carries the exception name, value, and traceback
    print(f"Code failed: {result.error.name}: {result.error.value}")
    # Handle error appropriately
else:
    # Validate result before using
    if result.text and len(result.text) < 10000:
        print(result.text)
    else:
        print("Missing or oversized output")
sandbox.kill()
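When the sandboxed code is expected to return structured data, the validation step can go further than a size check. Here is a minimal sketch of a host-side validator that works on the raw output string, so it needs no SDK. `parse_sandbox_output`, `MAX_OUTPUT_CHARS`, and the required keys are hypothetical choices for illustration.

```python
import json
from typing import Any, Dict, Optional, Set

MAX_OUTPUT_CHARS = 10_000  # illustrative cap, matching the size check above


def parse_sandbox_output(text: Optional[str], required_keys: Set[str]) -> Dict[str, Any]:
    """Return the parsed JSON object, or a dict with an 'error' key describing the problem."""
    if not text:
        return {"error": "empty output"}
    if len(text) > MAX_OUTPUT_CHARS:
        return {"error": "output too large"}
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return {"error": f"invalid JSON: {exc.msg}"}
    if not isinstance(data, dict):
        return {"error": "expected a JSON object"}
    missing = required_keys - set(data)
    if missing:
        return {"error": f"missing keys: {sorted(missing)}"}
    return data


if __name__ == "__main__":
    print(parse_sandbox_output('{"rows": 3, "columns": ["a"]}', {"rows", "columns"}))
    print(parse_sandbox_output('{"rows": 3}', {"rows", "columns"}))
    print(parse_sandbox_output("not json", {"rows"}))
```

Checking the shape before use means a confused agent produces a clear error on the host instead of a `KeyError` three functions later.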

Putting It All Together

Here’s my current workflow for safe AI agent code execution:

safe_workflow.py
from e2b_code_interpreter import Sandbox
import json
def analyze_data_safely(csv_data: str, analysis_prompt: str):
    """Let an AI agent analyze data in a sandboxed environment."""
    sandbox = Sandbox(timeout=60)
    try:
        # Upload data to sandbox
        sandbox.files.write("/data/input.csv", csv_data)
        # Agent writes analysis code based on prompt; !r keeps the prompt
        # on one comment line so it cannot inject extra statements
        analysis_code = f"""
import pandas as pd
import json
df = pd.read_csv('/data/input.csv')
# Analysis based on prompt: {analysis_prompt!r}
result = {{
    'rows': len(df),
    'columns': list(df.columns),
    'summary': df.describe().to_dict()
}}
print(json.dumps(result, indent=2, default=str))
"""
        result = sandbox.run_code(analysis_code)
        if result.error:
            return {'error': f"{result.error.name}: {result.error.value}"}
        # print() output lands in logs.stdout; parse and validate it
        raw = "".join(result.logs.stdout)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            return {'error': 'Invalid JSON output', 'raw': raw}
    finally:
        # Always clean up (kill() in current SDK releases; older ones used close())
        sandbox.kill()
# Usage
csv_text = """name,score,department
Alice,85,Engineering
Bob,92,Sales
Carol,78,Engineering"""
result = analyze_data_safely(csv_text, "Show basic statistics")
print(result)

This pattern ensures that:

  1. All code runs in isolation
  2. Resources are limited
  3. Outputs are validated
  4. Sandboxes are always cleaned up
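One detail in this pattern deserves care: the prompt is interpolated into generated code. If it lands in a comment verbatim, a prompt containing a newline can end the comment and smuggle in statements. The sandbox contains the damage, but it is cheap to avoid. The sketch below embeds the prompt as a string literal with `repr()`; `build_analysis_code` is a hypothetical helper, and `ast.parse` is used only to inspect the generated code without executing it.

```python
import ast


def build_analysis_code(analysis_prompt: str) -> str:
    """Embed the prompt as a string literal so it stays data, not code."""
    return (
        "import pandas as pd\n"
        f"PROMPT = {analysis_prompt!r}\n"
        "df = pd.read_csv('/data/input.csv')\n"
        "print(len(df))\n"
    )


if __name__ == "__main__":
    hostile = "stats\nimport os; os.system('echo pwned')"
    # Naive interpolation into a comment: the newline ends the comment early,
    # so the rest of the prompt is parsed as real statements.
    naive = f"# prompt: {hostile}\n"
    print("statements injected by naive version:", len(ast.parse(naive).body))
    # Literal embedding: the prompt survives intact as a single string constant.
    tree = ast.parse(build_analysis_code(hostile))
    print("statements in safe version:", len(tree.body))
    print("prompt preserved:", tree.body[1].value.value == hostile)
```

The safe version always contains the same four statements regardless of what the prompt says, which makes the generated code predictable enough to review.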

The Reddit thread was right: combining “memory + reasoning + code execution + web access” creates powerful AI agents. But code execution without sandboxing is a security disaster waiting to happen. E2B gives you the best of both worlds - powerful execution capabilities with proper isolation.

Final Words + More Resources

My intention with this article was to share my knowledge and experience in the hope that it helps others. If you want to get in touch, you can reach me by email.

