How Can You Run Code Safely with AI Agents Using Sandboxed Execution?
I watched my AI agent delete a production database. Not on purpose - it was testing a connection string parsing function and accidentally executed DROP DATABASE production; on my local PostgreSQL instance. I had backups, but the hour of downtime taught me a hard lesson: never let AI agents run code on your actual system.
The Reddit thread “10 MCP servers that together give your AI agent an actual brain” highlighted E2B Code Interpreter as the solution: “sandboxed code execution. Agent can write and run code in isolation. Great for data analysis, testing snippets, anything you don’t want touching your actual system.” This is exactly what I needed.
The Problem: Why Unrestricted Code Execution Is Dangerous
When you give AI agents the ability to execute code, you open yourself to several critical risks:
System Access: Unrestricted code execution lets agents read, modify, or delete any file the user can access. An agent testing a file parsing function could accidentally corrupt your project files.
Data Exposure: Environment variables often contain API keys, database credentials, and secrets. An agent running os.environ or reading config files can expose sensitive data.
Network Risks: Code can make HTTP requests to any endpoint. An agent debugging an API call might accidentally hit a production endpoint with destructive operations.
Resource Abuse: Infinite loops, memory leaks, or CPU-intensive operations can crash your system or freeze your IDE.
Malicious or Buggy Code: Generated code does not need to be malicious to do damage; an unintentional bug can corrupt data just as effectively. The agent I mentioned earlier was supposed to test parsing, not execute raw SQL.
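To make the data exposure risk concrete, here is a minimal sketch using only the Python standard library. It shows that a child process inherits the parent's environment by default, and that passing an explicit allowlist hides the secret. The variable name DEMO_API_KEY, its value, and the helper run_snippet are made up for illustration; this is not a sandbox, only a demonstration of the leak.

```python
import os
import subprocess
import sys

# Hypothetical secret, set only to demonstrate the leak.
os.environ["DEMO_API_KEY"] = "sk-demo-123"

# The kind of one-liner an agent might run while "debugging".
SNIPPET = "import os; print(os.environ.get('DEMO_API_KEY', 'not visible'))"


def run_snippet(env=None):
    """Run SNIPPET in a child interpreter and return its stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,  # None means "inherit the parent's environment"
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


# Default behaviour: the child inherits every variable, secrets included.
leaked = run_snippet()

# Allowlist approach: pass only the variables the child really needs.
ALLOWED = ("PATH", "SYSTEMROOT")  # SYSTEMROOT matters on Windows
minimal_env = {k: os.environ[k] for k in ALLOWED if k in os.environ}
scrubbed = run_snippet(env=minimal_env)

print(leaked)    # sk-demo-123
print(scrubbed)  # not visible
```

An allowlist is safer than a blocklist here, because you rarely know the name of every secret in your shell.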
I initially tried Docker containers with volume mounts:
docker run -v $(pwd):/workspace python:3.11 python -c "
import os
print(os.listdir('/workspace'))
# Agent can now read and modify all project files
"

This was a mistake. The volume mount gave the container full access to my project directory. When the agent ran a cleanup script that deleted “temporary” files, it wiped my uncommitted changes.
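For contrast, a more locked-down Docker invocation avoids host mounts entirely and pipes the script in over stdin. The sketch below only builds the argument list; the helper name locked_down_docker_cmd and the specific limits are my own illustrative choices, and this is a starting point rather than a hardened configuration.

```python
import shlex


def locked_down_docker_cmd(image, memory="256m", cpus="1", pids=64):
    """Build a `docker run` argument list with no host mounts or network."""
    return [
        "docker", "run",
        "--rm",                     # remove the container when it exits
        "-i",                       # keep stdin open so a script can be piped in
        "--network", "none",        # no network access
        "--read-only",              # container root filesystem is read-only
        "--cap-drop", "ALL",        # drop all Linux capabilities
        "--memory", memory,         # memory cap
        "--cpus", cpus,             # CPU cap
        "--pids-limit", str(pids),  # cap on the number of processes
        image,
        "python", "-",              # read the script from stdin
    ]


cmd = locked_down_docker_cmd("python:3.11")
print(shlex.join(cmd))
```

You would then run it with something like subprocess.run(cmd, input=script_text, text=True, timeout=30), so the script reaches the container without any part of the host filesystem being mounted.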
The Solution: E2B Code Interpreter
E2B Code Interpreter provides purpose-built sandboxed execution for AI agents. Each code execution runs in an isolated, ephemeral environment with no access to your host filesystem, environment variables, or network (unless explicitly granted).
Basic Setup
First, install the E2B package:
pip install e2b-code-interpreter

Then create a sandboxed session:

from e2b_code_interpreter import Sandbox

# Create an isolated sandbox
sandbox = Sandbox()

# Run code in the sandbox
execution = sandbox.run_code("print('Hello from sandbox!')")
print(execution.text)
# Output: Hello from sandbox!

# The sandbox is isolated - no access to host system
sandbox.run_code("import os; print(os.listdir('/'))")
# Output: Lists sandbox filesystem, NOT your host

sandbox.close()

The sandbox runs in a secure cloud environment. Your agent can execute any Python code, but it cannot touch your local files or environment.
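One small refinement: if the code between creating the sandbox and closing it raises, the close call is skipped. Assuming the sandbox object exposes close() as in the examples in this article, the standard library's contextlib.closing guarantees cleanup. FakeSandbox below is a hypothetical stand-in so the sketch runs without an API key; it is not part of E2B.

```python
from contextlib import closing


class FakeSandbox:
    """Hypothetical stand-in for a sandbox client; it only mimics close()."""

    def __init__(self):
        self.closed = False

    def run_code(self, code):
        # Simulate sandboxed code that blows up mid-session.
        raise RuntimeError("simulated failure inside the sandbox")

    def close(self):
        self.closed = True


sandbox = FakeSandbox()
try:
    with closing(sandbox) as sb:
        sb.run_code("1/0")
except RuntimeError as exc:
    print(f"caught: {exc}")

print(sandbox.closed)  # True: close() ran even though run_code raised
```

With a real client the pattern would be `with closing(Sandbox()) as sandbox:`, which is equivalent to the try/finally used later in this article.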
MCP Integration for AI Agents
The real power comes from integrating E2B as an MCP server. Your AI agent can then call it as a tool:
{
  "mcpServers": {
    "e2b-code-interpreter": {
      "command": "npx",
      "args": ["-y", "@e2b/code-interpreter-mcp"],
      "env": {
        "E2B_API_KEY": "your-api-key-here"
      }
    }
  }
}

After restarting Claude Desktop, your agent can execute code in the sandbox:

User: Analyze this CSV file and create a bar chart of sales by region.

Agent: I'll use the code interpreter to analyze the data.
[Agent writes and executes Python code in the sandbox]
[Agent returns the chart without ever touching your local files]

Data Analysis Example
Here’s how I use E2B for data analysis without security concerns:
from e2b_code_interpreter import Sandbox

sandbox = Sandbox()

# Upload data to sandbox (not your local system)
csv_content = """product,sales,region
Widget A,1500,North
Widget B,2300,South
Widget C,1800,East
Widget D,1200,West"""

sandbox.filesystem.write("/data/sales.csv", csv_content)

# Agent can analyze without risking local data.
# Note the escaped \\n: a bare \n here would become a real line break
# inside the f-string and cause a SyntaxError in the sandbox.
analysis_code = """
import pandas as pd

df = pd.read_csv('/data/sales.csv')
summary = df.groupby('region')['sales'].sum().sort_values(ascending=False)
print(summary)
print(f"\\nTotal sales: ${df['sales'].sum():,}")
print(f"Best region: {summary.index[0]} (${summary.iloc[0]:,})")
"""

result = sandbox.run_code(analysis_code)
print(result.text)

sandbox.close()

Output:

region
South    2300
East     1800
North    1500
West     1200

Total sales: $6,800
Best region: South ($2,300)

The agent processed the data and generated insights without ever accessing my local filesystem or network.
Multi-Language Support
E2B supports multiple languages, not just Python:
from e2b_code_interpreter import Sandbox
sandbox = Sandbox()
# JavaScript
js_result = sandbox.run_code("console.log(2 + 2)", language="javascript")
print(f"JS result: {js_result.text}")

# R
r_result = sandbox.run_code("mean(c(1, 2, 3, 4, 5))", language="r")
print(f"R result: {r_result.text}")

sandbox.close()

File Operations Within Sandbox
The sandbox has its own filesystem. Agents can create, read, and modify files within the sandbox:
from e2b_code_interpreter import Sandbox
sandbox = Sandbox()
# Create a file in the sandbox
sandbox.run_code("""
with open('/tmp/notes.txt', 'w') as f:
    f.write('Agent working notes\\n')
    f.write('Task: Analyze quarterly data\\n')
""")

# Read it back
result = sandbox.run_code("""
with open('/tmp/notes.txt', 'r') as f:
    print(f.read())
""")
print(result.text)

# Download results from sandbox
sandbox.filesystem.download("/tmp/notes.txt", "./downloaded_notes.txt")
# Now the file is in your local directory, but only because
# you explicitly downloaded it

sandbox.close()

This is key: the sandbox has its own filesystem, and you explicitly choose what to bring back to your host system.
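Since the download step is the one place where sandbox content crosses back onto your machine, it is worth checking where the file will land. The sketch below is one way to do that with pathlib: it keeps only the file name from the sandbox path and confirms the result stays inside a chosen output directory. The helper name safe_destination and the directory ./sandbox_outputs are hypothetical.

```python
from pathlib import Path


def safe_destination(output_dir, sandbox_path):
    """Map a sandbox file path to a local path inside output_dir.

    Only the file name is kept, so a path such as '/tmp/../../etc/passwd'
    cannot steer the write outside the chosen directory.
    """
    base = Path(output_dir).resolve()
    name = Path(sandbox_path).name
    if not name or name in (".", ".."):
        raise ValueError(f"no usable file name in {sandbox_path!r}")
    target = (base / name).resolve()
    # Defensive check: the resolved target must still live under base.
    if base not in target.parents:
        raise ValueError(f"{sandbox_path!r} would escape {base}")
    return target


dest = safe_destination("./sandbox_outputs", "/tmp/notes.txt")
print(dest.name)  # notes.txt
```

The returned path can then be passed as the local destination of whatever download call your SDK version provides.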
Why This Matters
Enables Powerful Use Cases: Data analysis on user-uploaded files, code generation and testing in real-time, multi-step computational workflows, scientific computing and simulations, automated report generation.
Security by Design: No host system access means no privilege escalation. Ephemeral environments prevent persistent threats. Resource limits prevent denial-of-service. Network isolation prevents data exfiltration.
Peace of Mind: I can let agents experiment with code without worrying about accidents. When an agent wants to test a database migration script, it runs in the sandbox first.
Common Mistakes
I made several mistakes before getting this right:
Using Docker Without Proper Isolation: My initial approach with volume mounts defeated the purpose. The correct approach is to use Docker with no volume mounts, or better yet, use E2B which handles this properly.
# WRONG: Volume mounts expose host filesystem
docker run -v $(pwd):/workspace python:3.11 python script.py

# BETTER: No volume mounts and no network; the script is piped in over stdin
docker run -i --rm --network none python:3.11 python - < script.py

# BEST: Use E2B sandbox
sandbox = Sandbox()
sandbox.run_code("your code here")

Granting Persistent Storage to Sandboxes: Some solutions create persistent containers that accumulate state. This is risky because buggy or malicious code from one session can affect future sessions. E2B’s ephemeral approach destroys all state when the session ends.
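The ephemeral idea itself is easy to see in miniature. The following standard-library sketch runs a snippet in a fresh temporary directory that is deleted as soon as the run finishes, so nothing it writes there survives into the next run. It illustrates ephemerality only: the child process is not isolated and could still reach the rest of your system, which is exactly why a real sandbox is needed. The helper name run_in_throwaway_dir is my own.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_throwaway_dir(code, timeout=10):
    """Run code in a fresh temp directory that is deleted afterwards.

    Returns (stdout, workdir) so the caller can confirm the directory is gone.
    """
    with tempfile.TemporaryDirectory() as workdir:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            cwd=workdir,      # relative writes land in the temp directory
            capture_output=True,
            text=True,
            timeout=timeout,  # stop runaway code
            check=True,
        )
    return completed.stdout.strip(), Path(workdir)


out, workdir = run_in_throwaway_dir(
    "open('state.txt', 'w').write('leftover'); print('done')"
)
print(out)               # done
print(workdir.exists())  # False: nothing persists into the next run
```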
Over-Permissive Network Access: Even in a sandbox, network access can be dangerous. E2B allows you to control network access:
from e2b_code_interpreter import Sandbox
# Default: no network access
sandbox = Sandbox()

# If you need network access, enable it explicitly
sandbox_networked = Sandbox(network_access=True)

# Now the agent can make HTTP requests
sandbox_networked.run_code("""
import requests
response = requests.get('https://api.example.com/data')
print(response.json())
""")

sandbox.close()
sandbox_networked.close()

Ignoring Resource Limits: Without limits, an agent can consume all available memory or CPU. E2B enforces automatic resource limits:
from e2b_code_interpreter import Sandbox
sandbox = Sandbox(timeout=30)  # 30 second timeout

# This will be terminated after 30 seconds
try:
    result = sandbox.run_code("""
import time
while True:
    time.sleep(1)
""")
except TimeoutError:
    print("Execution timed out - resource limit worked!")

Not Validating Outputs: Even sandboxed code can produce unexpected results. Always validate:
from e2b_code_interpreter import Sandbox
sandbox = Sandbox()
result = sandbox.run_code("x = 1/0")

if result.error:
    print(f"Code failed: {result.error}")
    # Handle error appropriately
else:
    # Validate result before using
    if result.text and len(result.text) < 10000:
        print(result.text)
    else:
        print("Unexpected output size")

Putting It All Together
Here’s my current workflow for safe AI agent code execution:
from e2b_code_interpreter import Sandbox
import json

def analyze_data_safely(csv_data: str, analysis_prompt: str):
    """Let AI agent analyze data in a sandboxed environment."""
    sandbox = Sandbox(timeout=60)

    try:
        # Upload data to sandbox
        sandbox.filesystem.write("/data/input.csv", csv_data)

        # Agent writes analysis code based on prompt.
        # The prompt is embedded with !r so a newline in it cannot
        # break out of the comment and inject extra code.
        analysis_code = f"""
import pandas as pd
import json

df = pd.read_csv('/data/input.csv')

# Analysis based on prompt: {analysis_prompt!r}
result = {{
    'rows': len(df),
    'columns': list(df.columns),
    'summary': df.describe().to_dict()
}}

print(json.dumps(result, indent=2, default=str))
"""

        result = sandbox.run_code(analysis_code)

        if result.error:
            return {'error': result.error}

        # Parse and validate output (result.text may be None)
        try:
            output = json.loads(result.text)
            return output
        except (json.JSONDecodeError, TypeError):
            return {'error': 'Invalid JSON output', 'raw': result.text}

    finally:
        # Always clean up
        sandbox.close()

# Usage
csv = """name,score,department
Alice,85,Engineering
Bob,92,Sales
Carol,78,Engineering"""

result = analyze_data_safely(csv, "Show basic statistics")
print(result)

This pattern ensures that:
- All code runs in isolation
- Resources are limited
- Outputs are validated
- Sandboxes are always cleaned up
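The "outputs are validated" step can be pulled into a small reusable function. Here is a minimal sketch, assuming the sandboxed code prints a JSON object with the rows, columns, and summary keys used in the workflow above. The function name validate_output is hypothetical, and the 10,000-character cap simply mirrors the limit used earlier in this article.

```python
import json

MAX_OUTPUT_CHARS = 10_000  # mirrors the size check used earlier


def validate_output(raw, required_keys=("rows", "columns", "summary")):
    """Return (ok, payload_or_reason) for text produced by sandboxed code."""
    if not isinstance(raw, str) or not raw.strip():
        return False, "empty output"
    if len(raw) > MAX_OUTPUT_CHARS:
        return False, "output too large"
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(data, dict):
        return False, "expected a JSON object"
    missing = [key for key in required_keys if key not in data]
    if missing:
        return False, f"missing keys: {missing}"
    return True, data


print(validate_output('{"rows": 3, "columns": ["name"], "summary": {}}'))
print(validate_output("Traceback (most recent call last): ..."))
print(validate_output(None))
```

Returning a reason instead of raising keeps the caller in control of what to tell the agent when its output is rejected.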
The Reddit thread was right: combining “memory + reasoning + code execution + web access” creates powerful AI agents. But code execution without sandboxing is a security disaster waiting to happen. E2B gives you the best of both worlds - powerful execution capabilities with proper isolation.
Final Words + More Resources
My intention with this article was to share my knowledge and experience so that it can help others. If you want to get in touch, you can email me.
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!