How to Sandbox AI Agents Safely: Prevent Autonomous Agent Disasters
Problem
When I ran an autonomous AI agent to “organize my files,” I got this disaster:
$ ls -la ~/Photos/
# Before: 400+ family photos with descriptive names
IMG_20240101_vacation_beach.jpg
IMG_20240102_family_dinner.jpg
IMG_20240103_kids_playing.jpg
...

# After: Agent renamed everything to UUIDs
550e8400-e29b-41d4-a716-446655440000.jpg
7b9d6e8f-2a3c-4d5e-8f9b-1a2b3c4d5e6f.jpg
...

# Agent also created this mess
~/Photos/
├── workspace/
│   ├── task_001/
│   │   ├── processed/
│   │   │   ├── batch_a/
│   │   │   │   ├── batch_a1/
│   │   │   │   │   └── ... (12 levels deep!)
│   │   │   └── batch_b/
│   │   └── original/
│   └── task_002/
└── optimized/
    └── high_contrast/
        └── *.jpg (all converted to high-contrast JPEGs)
Environment
- Ubuntu 24.04 LTS
- Python 3.11
- OpenAI API (GPT-4)
- Custom autonomous agent script
What happened?
I wanted to automate organizing my photo collection. I wrote a simple AI agent that uses file system access to “organize files efficiently.”
Here’s my initial agent setup:
import os
import shutil
import uuid

from openai import OpenAI


class AutonomousAgent:
    def __init__(self, base_path):
        self.base_path = base_path
        self.client = OpenAI()

    def organize_files(self, instruction):
        """Let AI decide how to organize files"""
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": f"Instruction: {instruction}\n"
                           f"Current files: {os.listdir(self.base_path)}\n"
                           f"Propose a file organization strategy."
            }]
        )

        strategy = response.choices[0].message.content
        print(f"AI Strategy: {strategy}")

        # Execute strategy (DANGEROUS!)
        self._execute_strategy(strategy)

    def _execute_strategy(self, strategy):
        """Parse and execute file operations"""
        # Agent decided to:
        # 1. Rename files to UUIDs for "uniqueness"
        # 2. Create deep folder hierarchy for "organization"
        # 3. Convert to high-contrast JPEGs for "optimization"

        files = os.listdir(self.base_path)

        for file in files:
            if file.endswith('.jpg'):
                # Generate UUID
                new_name = f"{uuid.uuid4()}.jpg"

                # Rename
                old_path = os.path.join(self.base_path, file)
                new_path = os.path.join(self.base_path, new_name)
                os.rename(old_path, new_path)

        # Create nested directories
        current = self.base_path
        for i in range(12):
            current = os.path.join(current, f"level_{i}")
            os.makedirs(current, exist_ok=True)

        # Convert to high-contrast
        # (image processing code here)


# Run agent
agent = AutonomousAgent("/home/user/Photos")
agent.organize_files("Organize these photos efficiently")
Here are the key parts:
- organize_files(): Sends file list to GPT-4 for strategy
- _execute_strategy(): Parses AI response and performs file operations
- The AI decided UUIDs were “better” than descriptive names
- It created 12-level deep folders thinking it was “organizing”
When I ran this:
$ python agent.py
AI Strategy: I will organize the photos by:
1. Renaming to UUIDs for uniqueness
2. Creating a hierarchical structure
3. Optimizing image quality

[Agent starts executing...]
^C (I tried to stop it but too late)
The damage was done in seconds. 400+ photos renamed, nested folders everywhere, and images converted to high-contrast JPEGs.
How to solve it?
I tried to restore from backup, but I realized the fundamental problem: I gave an AI agent unrestricted file system access.
So I started researching sandboxing strategies. Here’s what I learned.
Solution 1: Strict Folder Containment
First, I created a dedicated workspace directory:
#!/bin/bash
# Create isolated workspace for agent

WORKSPACE="/tmp/agent-workspace"
SOURCE_DIR="/home/user/Documents/tasks"

# Clean workspace
rm -rf "$WORKSPACE"
mkdir -p "$WORKSPACE"

# Copy only what you need the agent to work on
cp -r "$SOURCE_DIR" "$WORKSPACE/"

# Verify isolation
echo "Workspace created at: $WORKSPACE"
echo "Contents:"
ls -la "$WORKSPACE"
Then I modified the agent to restrict operations:
import os
import pathlib


class SandboxedAgent:
    def __init__(self, workspace_path):
        self.workspace = pathlib.Path(workspace_path).resolve()
        self._validate_workspace()

    def _validate_workspace(self):
        """Ensure workspace is in /tmp"""
        if not str(self.workspace).startswith("/tmp/"):
            raise ValueError(
                f"Workspace must be in /tmp, got: {self.workspace}"
            )

    def _safe_path(self, path):
        """Ensure path doesn't escape workspace"""
        resolved = pathlib.Path(path).resolve()

        # Compare path components, not raw strings: a plain startswith()
        # would also accept siblings such as /tmp/agent-workspace-evil
        if not resolved.is_relative_to(self.workspace):
            raise PermissionError(
                f"Path escape attempted: {path}"
            )

        return resolved

    def rename_file(self, src, dst):
        """Safe rename with path validation"""
        safe_src = self._safe_path(src)
        safe_dst = self._safe_path(dst)

        os.rename(safe_src, safe_dst)
        print(f"Renamed: {safe_src} -> {safe_dst}")

    def create_directory(self, path):
        """Safe directory creation"""
        safe_path = self._safe_path(path)

        # Prevent excessive nesting
        depth = len(safe_path.relative_to(self.workspace).parts)
        if depth > 5:
            raise PermissionError(
                f"Directory depth exceeds limit: {depth}"
            )

        os.makedirs(safe_path, exist_ok=True)
        print(f"Created: {safe_path}")


# Usage
agent = SandboxedAgent("/tmp/agent-workspace")
agent.create_directory("/tmp/agent-workspace/organized")  # OK
agent.create_directory("/etc/passwd")  # PermissionError!
Now when I run the agent:
$ bash setup_workspace.sh
Workspace created at: /tmp/agent-workspace

$ python sandboxed_agent.py
Created: /tmp/agent-workspace/organized
Created: /tmp/agent-workspace/organized/by_date

# Try to escape
Traceback (most recent call last):
  ...
PermissionError: Path escape attempted: ../../../etc/passwd
The agent is now contained to /tmp/agent-workspace, as long as every file operation goes through these methods. Even if it goes rogue, it can’t touch my home directory.
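One subtlety with path checks like this is that comparing resolved paths as strings is weaker than comparing them component by component. The sketch below illustrates the difference, assuming Python 3.9+ for Path.is_relative_to. The function names is_inside_prefix and is_inside, and the sibling directory name, are hypothetical and only there for illustration.

```python
from pathlib import Path


def is_inside_prefix(workspace: str, candidate: str) -> bool:
    """String-prefix check: looks reasonable but also accepts sibling directories."""
    root = str(Path(workspace).resolve())
    return str(Path(candidate).resolve()).startswith(root)


def is_inside(workspace: str, candidate: str) -> bool:
    """Component-wise check: True only for the workspace itself or paths below it."""
    root = Path(workspace).resolve()
    return Path(candidate).resolve().is_relative_to(root)


if __name__ == "__main__":
    ws = "/tmp/agent-workspace"
    candidates = (
        f"{ws}/organized",            # legitimate path inside the workspace
        f"{ws}/../../etc/passwd",     # traversal attempt
        f"{ws}-evil/file.txt",        # sibling directory sharing the prefix
    )
    for path in candidates:
        print(path, is_inside_prefix(ws, path), is_inside(ws, path))
```

Both checks reject the traversal attempt because resolve() collapses the ".." parts first. Only the component-wise check rejects the sibling directory.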
Solution 2: Docker Container Isolation
Folder containment helps, but Docker provides stronger isolation. Here’s my Docker setup:
FROM python:3.11-slim

# Create unprivileged user
RUN useradd -m -s /bin/bash agentuser

# Create workspace directory
RUN mkdir /workspace && chown agentuser:agentuser /workspace

# Install dependencies
COPY --chown=agentuser:agentuser requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy agent code outside /workspace, because the bind mount at
# /workspace hides anything the image placed there
COPY --chown=agentuser:agentuser agent.py /app/agent.py

# Switch to non-root user
USER agentuser
WORKDIR /workspace

# Default to sandboxed mode
CMD ["python", "/app/agent.py", "--sandbox", "/workspace"]
Build and run with strict limits:
#!/bin/bash
# Build container
docker build -t sandboxed-agent .

# Run with minimal permissions
docker run -it \
  --rm \
  --name agent-session \
  --mount type=bind,source=/home/user/tasks,target=/workspace \
  --read-only \
  --memory=2g \
  --cpus=2 \
  --security-opt=no-new-privileges \
  --cap-drop=ALL \
  --network=none \
  sandboxed-agent \
  python /app/agent.py --task "organize files"
The key options:
- --read-only: Base filesystem is read-only (can’t modify system files)
- --memory=2g --cpus=2: Prevent runaway resource consumption
- --security-opt=no-new-privileges: Can’t escalate privileges
- --cap-drop=ALL: Drop all Linux capabilities
- --network=none: No internet access
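If you launch the container from Python rather than a shell script, the same flags can be assembled as an argument list. This is only a sketch: build_docker_command and its parameters are hypothetical names, while the flags, image name, and mount target are the ones used in this post.

```python
import shlex
from typing import List, Sequence


def build_docker_command(
    task_dir: str,
    image: str = "sandboxed-agent",
    agent_args: Sequence[str] = (),
) -> List[str]:
    """Assemble a `docker run` argv with the restrictive flags listed above."""
    return [
        "docker", "run", "--rm",
        "--mount", f"type=bind,source={task_dir},target=/workspace",
        "--read-only",                        # read-only root filesystem
        "--memory=2g",                        # cap memory
        "--cpus=2",                           # cap CPU
        "--security-opt=no-new-privileges",   # block privilege escalation
        "--cap-drop=ALL",                     # drop all Linux capabilities
        "--network=none",                     # no network inside the container
        image,
        *agent_args,                          # optional override of the image's CMD
    ]


if __name__ == "__main__":
    cmd = build_docker_command("/home/user/tasks")
    print(shlex.join(cmd))
    # To actually launch it: subprocess.run(cmd, check=True)
```

Passing the command as a list instead of a shell string means a task directory with spaces or shell metacharacters is handed to Docker as a single argument.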
Now test the agent:
$ ./run_sandboxed.sh
[Agent output]
Workspace: /workspace
Files: file1.txt, file2.txt, file3.txt

AI Strategy: Organize by date...

[Agent operates only on /workspace]
Renamed: file1.txt -> 2024-01-01_file1.txt
Created: /workspace/by_date/

# Try to access host filesystem
$ docker exec agent-session ls /home/user
ls: cannot access '/home/user': No such file or directory

# Try network access
$ docker exec agent-session ping google.com
ping: bad address 'google.com'
The agent can now only modify the bind-mounted task directory and has no network access. Even if it tries something malicious, it’s trapped in the container.
Solution 3: Dry-Run Mode with Operation Logging
I added a dry-run mode to preview operations before execution:
import argparse
import logging
from pathlib import Path


class DryRunAgent:
    def __init__(self, workspace, dry_run=True, log_file="operations.log"):
        self.workspace = Path(workspace).resolve()
        self.dry_run = dry_run
        self.log_file = Path(log_file)
        self.operations = []

        # Setup logging
        logging.basicConfig(
            filename=self.log_file,
            level=logging.INFO,
            format='%(asctime)s - %(message)s',
            datefmt='%Y-%m-%d %H:%M:%S'
        )

    def _log(self, operation):
        """Log operation for audit trail"""
        self.operations.append(operation)
        logging.info(operation)

    def rename_file(self, src, dst):
        """Rename with dry-run support"""
        operation = f"rename: {src} -> {dst}"

        if self.dry_run:
            print(f"[DRY-RUN] Would {operation}")
        else:
            src_path = self._safe_path(src)
            dst_path = self._safe_path(dst)
            src_path.rename(dst_path)
            print(f"[EXECUTE] {operation}")

        self._log(operation)

    def batch_rename(self, file_mapping):
        """Batch rename with preview"""
        print(f"\n{'='*60}")
        print(f"Preview: {len(file_mapping)} operations")
        print(f"{'='*60}")

        for src, dst in file_mapping.items():
            print(f"  {src} -> {dst}")

        print(f"{'='*60}\n")

        if self.dry_run:
            print("DRY-RUN MODE - No changes will be made")
            for src, dst in file_mapping.items():
                self.rename_file(src, dst)
            return

        # Require confirmation
        confirm = input("Execute these operations? (yes/no): ")
        if confirm.lower() != 'yes':
            print("Aborted by user")
            return

        for src, dst in file_mapping.items():
            self.rename_file(src, dst)

        print(f"\nCompleted. Log saved to: {self.log_file}")

    def _safe_path(self, path):
        """Ensure path is within workspace"""
        # Relative names are resolved against the workspace, not the cwd
        resolved = (self.workspace / path).resolve()
        if not resolved.is_relative_to(self.workspace):
            raise PermissionError(f"Path escape: {path}")
        return resolved


# Usage
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--workspace", required=True)
    # No default=True here: with store_true that would make every run a dry run
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--log-file", default="operations.log")
    args = parser.parse_args()

    agent = DryRunAgent(
        workspace=args.workspace,
        dry_run=args.dry_run,
        log_file=args.log_file
    )

    # Simulate AI agent decision
    file_mapping = {
        "photo1.jpg": "550e8400-e29b-41d4-a716-446655440000.jpg",
        "photo2.jpg": "7b9d6e8f-2a3c-4d5e-8f9b-1a2b3c4d5e6f.jpg",
    }

    agent.batch_rename(file_mapping)
Now I can preview before executing:
# First run: Dry-run to see what will happen
$ python dryrun_agent.py --workspace /tmp/agent-workspace --dry-run

============================================================
Preview: 2 operations
============================================================
  photo1.jpg -> 550e8400-e29b-41d4-a716-446655440000.jpg
  photo2.jpg -> 7b9d6e8f-2a3c-4d5e-8f9b-1a2b3c4d5e6f.jpg
============================================================

DRY-RUN MODE - No changes will be made
[DRY-RUN] Would rename: photo1.jpg -> 550e8400-e29b-41d4-a716-446655440000.jpg
[DRY-RUN] Would rename: photo2.jpg -> 7b9d6e8f-2a3c-4d5e-8f9b-1a2b3c4d5e6f.jpg

# Check the log
$ cat operations.log
2025-02-04 14:30:15 - rename: photo1.jpg -> 550e8400-e29b-41d4-a716-446655440000.jpg
2025-02-04 14:30:15 - rename: photo2.jpg -> 7b9d6e8f-2a3c-4d5e-8f9b-1a2b3c4d5e6f.jpg

# Real execution (without --dry-run flag)
$ python dryrun_agent.py --workspace /tmp/agent-workspace

============================================================
Preview: 2 operations
============================================================
  photo1.jpg -> 550e8400-e29b-41d4-a716-446655440000.jpg
  photo2.jpg -> 7b9d6e8f-2a3c-4d5e-8f9b-1a2b3c4d5e6f.jpg
============================================================

Execute these operations? (yes/no): no
Aborted by user
The dry-run mode saved me from another disaster. When I saw the UUID renaming strategy, I immediately said “no” and modified the AI prompt.
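The operation log can also double as an undo journal. The sketch below is one possible approach and not part of the agent above: it records each executed rename as a JSON line and replays the journal in reverse to restore the original names. RenameJournal, its methods, and the journal file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path


class RenameJournal:
    """Append-only record of renames that can be replayed backwards."""

    def __init__(self, journal_file):
        self.journal_file = Path(journal_file)

    def rename(self, src, dst) -> None:
        """Rename src to dst, refuse to overwrite, and record the move."""
        src, dst = Path(src), Path(dst)
        if dst.exists():
            raise FileExistsError(f"Refusing to overwrite: {dst}")
        src.rename(dst)
        with self.journal_file.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"src": str(src), "dst": str(dst)}) + "\n")

    def undo_all(self) -> int:
        """Reverse every recorded rename, newest first. Returns the count undone."""
        if not self.journal_file.exists():
            return 0
        lines = self.journal_file.read_text(encoding="utf-8").splitlines()
        for line in reversed(lines):
            entry = json.loads(line)
            Path(entry["dst"]).rename(entry["src"])
        self.journal_file.unlink()  # journal is consumed once everything is undone
        return len(lines)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        photo = workspace / "IMG_20240101_vacation_beach.jpg"
        photo.touch()

        journal = RenameJournal(workspace / "renames.jsonl")
        journal.rename(photo, workspace / "550e8400.jpg")
        print("after rename:", sorted(p.name for p in workspace.iterdir()))

        journal.undo_all()
        print("after undo:", sorted(p.name for p in workspace.iterdir()))
```

A journal like this only covers renames. Destructive steps such as image conversion still need a copy of the originals.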
Solution 4: Firejail OS-Level Sandbox
For even stronger isolation without Docker, I use Firejail:
# Firejail profile for AI agents
# Save as /etc/firejail/ai-agent.profile

# Private filesystem
private-bin python3,agent
private-dev
private-etc none

# Disable hardware access
nodvd
nogroups
noinput
nou2f
novideo

# Security features
seccomp
net none
protocol unix,inet

# Namespaces
ipc-namespace

# Resource limits
rlimit-nofile 1024
Run the agent with Firejail:
# First, create the profile
$ sudo cp ai-agent.profile /etc/firejail/ai-agent.profile

# Run agent in sandbox
$ firejail \
    --profile=ai-agent.profile \
    --private=/tmp/agent-workspace \
    python3 agent.py --task "organize files"

# Firejail output
Parent pid 12345, child pid 12346
Private /tmp installed in /tmp/agent-workspace
Blacklist violations are logged to syslog
Command started in /tmp/agent-workspace

[Agent runs in isolated environment]
Now the agent:
- Has no network access (net none)
- Can’t access system files (private-etc none)
- Runs in separate filesystem namespace
- Logs all blacklist violations to syslog
Test the isolation:
# Try to access home directory
$ firejail --profile=ai-agent.profile ls ~/
ls: cannot access '/home/user': No such file or directory

# Try to access network
$ firejail --profile=ai-agent.profile ping google.com
ping: socket: Operation not permitted

# Check what the agent can see
$ firejail --profile=ai-agent.profile ls /
bin dev etc lib proc run sys tmp usr var
# (only minimal filesystem, no /home, no /root)
The reason
I think the key reason for the disaster is:
Autonomous AI agents follow instructions literally, not intuitively
When I told the agent to “organize files efficiently,” it interpreted this as:
- UUIDs are “unique” therefore “efficient”
- Deep folder hierarchies are “organized”
- High-contrast JPEGs are “optimized”
The AI didn’t understand these were bad decisions. It just followed the instruction literally.
The sandboxing prevents damage, but doesn’t solve the core problem. I need to:
- Give more specific instructions
- Always use dry-run mode first
- Review the AI’s strategy before execution
- Provide examples of “good” organization
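One way to make “review the AI’s strategy before execution” concrete is to ask the model for a structured plan and check it against explicit rules before anything runs. The sketch below assumes a hypothetical JSON plan format (a list of objects with "op", "src", and "dst" keys). validate_plan, ALLOWED_OPS, and MAX_DEPTH are illustrative names, and the rules are examples rather than a complete policy.

```python
import json
from pathlib import PurePosixPath

ALLOWED_OPS = {"rename", "mkdir"}  # hypothetical allowlist: no delete, no convert
MAX_DEPTH = 5                      # same nesting limit as the sandboxed agent


def validate_plan(plan_json: str) -> list:
    """Return a list of problems found in the plan; an empty list means it passes."""
    try:
        plan = json.loads(plan_json)
    except json.JSONDecodeError as exc:
        return [f"plan is not valid JSON: {exc}"]
    if not isinstance(plan, list):
        return ["plan must be a JSON list of operations"]

    problems = []
    for i, step in enumerate(plan):
        if not isinstance(step, dict):
            problems.append(f"step {i}: not an object")
            continue
        op = step.get("op")
        if op not in ALLOWED_OPS:
            problems.append(f"step {i}: operation {op!r} is not allowed")
            continue

        # mkdir needs only "dst"; rename needs both "src" and "dst"
        keys = ("src", "dst") if op == "rename" else ("dst",)
        paths = {}
        for key in keys:
            raw = step.get(key)
            if not isinstance(raw, str) or not raw:
                problems.append(f"step {i}: missing {key!r}")
                continue
            path = PurePosixPath(raw)
            if path.is_absolute() or ".." in path.parts:
                problems.append(f"step {i}: {raw!r} leaves the workspace")
            elif len(path.parts) > MAX_DEPTH:
                problems.append(f"step {i}: {raw!r} is nested too deeply")
            paths[key] = path

        # A rename that changes the extension is probably a conversion in disguise
        if op == "rename" and len(paths) == 2:
            if paths["src"].suffix.lower() != paths["dst"].suffix.lower():
                problems.append(f"step {i}: rename changes the file extension")
    return problems


if __name__ == "__main__":
    plan = json.dumps([
        {"op": "mkdir", "dst": "2024/vacation"},
        {"op": "rename", "src": "IMG_1.jpg", "dst": "2024/vacation/beach.jpg"},
        {"op": "delete", "dst": "IMG_2.jpg"},
    ])
    for problem in validate_plan(plan):
        print("REJECTED:", problem)
```

A rule set like this would not have caught the UUID renaming by itself, which is why a human review of the dry-run preview still matters.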
Summary
In this post, I showed how to sandbox AI agents safely to prevent file system disasters. The key point is using multi-layer containment: folder isolation, permission scoping, containerization, and dry-run modes.
I covered four solutions:
- Strict folder containment with path validation
- Docker containers with read-only filesystems and dropped capabilities
- Dry-run mode with operation logging and confirmation
- Firejail sandbox with OS-level isolation
The combination of these strategies prevents autonomous agents from causing damage, even when they make bad decisions. Always sandbox before you run, not after disaster strikes.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can email me.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- Firejail Documentation
- Docker Security Best Practices
- Linux Capabilities Guide
- Reddit Discussion on AI Agent Disasters
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!