
How to Sandbox AI Agents Safely: Prevent Autonomous Agent Disasters

Problem

When I ran an autonomous AI agent to “organize my files,” this is what I got:

Terminal window
$ ls -la ~/Photos/
# Before: 400+ family photos with descriptive names
IMG_20240101_vacation_beach.jpg
IMG_20240102_family_dinner.jpg
IMG_20240103_kids_playing.jpg
...
# After: Agent renamed everything to UUIDs
550e8400-e29b-41d4-a716-446655440000.jpg
7b9d6e8f-2a3c-4d5e-8f9b-1a2b3c4d5e6f.jpg
...
# Agent also created this mess
~/Photos/
├── workspace/
│   ├── task_001/
│   │   ├── processed/
│   │   │   ├── batch_a/
│   │   │   │   ├── batch_a1/
│   │   │   │   └── ... (12 levels deep!)
│   │   │   └── batch_b/
│   │   └── original/
│   └── task_002/
│       └── optimized/
│           └── high_contrast/
│               └── *.jpg (all converted to high-contrast JPEGs)

Environment

  • Ubuntu 24.04 LTS
  • Python 3.11
  • OpenAI API (GPT-4)
  • Custom autonomous agent script

What happened?

I wanted to automate organizing my photo collection. I wrote a simple AI agent that uses file system access to “organize files efficiently.”

Here’s my initial agent setup:

agent.py
import os
import shutil
import uuid
from openai import OpenAI


class AutonomousAgent:
    def __init__(self, base_path):
        self.base_path = base_path
        self.client = OpenAI()

    def organize_files(self, instruction):
        """Let AI decide how to organize files"""
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": f"Instruction: {instruction}\n"
                           f"Current files: {os.listdir(self.base_path)}\n"
                           f"Propose a file organization strategy."
            }]
        )
        strategy = response.choices[0].message.content
        print(f"AI Strategy: {strategy}")
        # Execute strategy (DANGEROUS!)
        self._execute_strategy(strategy)

    def _execute_strategy(self, strategy):
        """Parse and execute file operations"""
        # Agent decided to:
        # 1. Rename files to UUIDs for "uniqueness"
        # 2. Create deep folder hierarchy for "organization"
        # 3. Convert to high-contrast JPEGs for "optimization"
        files = os.listdir(self.base_path)
        for file in files:
            if file.endswith('.jpg'):
                # Generate UUID
                new_name = f"{uuid.uuid4()}.jpg"
                # Rename
                old_path = os.path.join(self.base_path, file)
                new_path = os.path.join(self.base_path, new_name)
                os.rename(old_path, new_path)
        # Create nested directories
        current = self.base_path
        for i in range(12):
            current = os.path.join(current, f"level_{i}")
            os.makedirs(current, exist_ok=True)
        # Convert to high-contrast
        # (image processing code here)


# Run agent
agent = AutonomousAgent("/home/user/Photos")
agent.organize_files("Organize these photos efficiently")

The key parts:

  • organize_files(): Sends the file list to GPT-4 and asks for a strategy
  • _execute_strategy(): Performs the file operations; shown here with the agent’s choices hard-coded instead of parsed from the response
  • The AI decided UUIDs were “better” than descriptive names
  • It created 12-level deep folders thinking it was “organizing”

When I ran this:

Terminal window
$ python agent.py
AI Strategy: I will organize the photos by:
1. Renaming to UUIDs for uniqueness
2. Creating a hierarchical structure
3. Optimizing image quality
[Agent starts executing...]
^C (I tried to stop it but too late)

The damage was done in seconds. 400+ photos renamed, nested folders everywhere, and images converted to high-contrast JPEGs.

How to solve it?

I tried to restore from backup, but I realized the fundamental problem: I gave an AI agent unrestricted file system access.

So I started researching sandboxing strategies. Here’s what I learned.

Solution 1: Strict Folder Containment

First, I created a dedicated workspace directory:

setup_workspace.sh
#!/bin/bash
# Create isolated workspace for agent
WORKSPACE="/tmp/agent-workspace"
SOURCE_DIR="/home/user/Documents/tasks"
# Clean workspace
rm -rf "$WORKSPACE"
mkdir -p "$WORKSPACE"
# Copy only what you need the agent to work on
cp -r "$SOURCE_DIR" "$WORKSPACE/"
# Verify isolation
echo "Workspace created at: $WORKSPACE"
echo "Contents:"
ls -la "$WORKSPACE"
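
The same "work on a copy" idea can be sketched in Python, in case the agent script should prepare its own workspace. This is a minimal sketch, not part of my original setup: the function name make_workspace and the directory prefix are made up for illustration, and it uses a uniquely named temporary directory instead of the fixed /tmp/agent-workspace path.

```python
import shutil
import tempfile
from pathlib import Path


def make_workspace(source_dir: str, prefix: str = "agent-workspace-") -> Path:
    """Copy source_dir into a fresh temporary directory and return the copy's path."""
    source = Path(source_dir).resolve()
    if not source.is_dir():
        raise NotADirectoryError(f"Source is not a directory: {source}")
    # mkdtemp creates a new, uniquely named directory owned by the current user.
    workspace = Path(tempfile.mkdtemp(prefix=prefix))
    # The agent only ever sees this copy; the originals stay untouched.
    target = workspace / source.name
    shutil.copytree(source, target)
    return target


if __name__ == "__main__":
    # Demo with throwaway data, so nothing real is touched.
    demo_source = Path(tempfile.mkdtemp()) / "tasks"
    demo_source.mkdir()
    (demo_source / "notes.txt").write_text("hello")
    copy = make_workspace(str(demo_source))
    print(f"Workspace created at: {copy}")
    print("Contents:", sorted(p.name for p in copy.iterdir()))
```

Whatever the agent does to the copy, the source directory can be compared against it afterwards before anything is copied back.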

Then I modified the agent to restrict operations:

sandboxed_agent.py
import os
import pathlib


class SandboxedAgent:
    def __init__(self, workspace_path):
        self.workspace = pathlib.Path(workspace_path).resolve()
        self._validate_workspace()

    def _validate_workspace(self):
        """Ensure workspace is in /tmp"""
        if not str(self.workspace).startswith("/tmp/"):
            raise ValueError(
                f"Workspace must be in /tmp, got: {self.workspace}"
            )

    def _safe_path(self, path):
        """Ensure path doesn't escape workspace"""
        # Relative paths are resolved against the workspace, not the current directory
        resolved = (self.workspace / path).resolve()
        # Compare path components (Python 3.9+); a plain string prefix check
        # would also accept siblings such as /tmp/agent-workspace-old
        if not resolved.is_relative_to(self.workspace):
            raise PermissionError(
                f"Path escape attempted: {path}"
            )
        return resolved

    def rename_file(self, src, dst):
        """Safe rename with path validation"""
        safe_src = self._safe_path(src)
        safe_dst = self._safe_path(dst)
        os.rename(safe_src, safe_dst)
        print(f"Renamed: {safe_src} -> {safe_dst}")

    def create_directory(self, path):
        """Safe directory creation"""
        safe_path = self._safe_path(path)
        # Prevent excessive nesting
        depth = len(safe_path.relative_to(self.workspace).parts)
        if depth > 5:
            raise PermissionError(
                f"Directory depth exceeds limit: {depth}"
            )
        os.makedirs(safe_path, exist_ok=True)
        print(f"Created: {safe_path}")


# Usage
agent = SandboxedAgent("/tmp/agent-workspace")
agent.create_directory("/tmp/agent-workspace/organized")  # OK
agent.create_directory("/etc/passwd")  # PermissionError!

Now when I run the agent:

Terminal window
$ bash setup_workspace.sh
Workspace created at: /tmp/agent-workspace
$ python sandboxed_agent.py
Created: /tmp/agent-workspace/organized
Created: /tmp/agent-workspace/organized/by_date
# Try to escape
Traceback (most recent call last):
...
PermissionError: Path escape attempted: /etc/passwd

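Every operation that goes through these methods is now confined to /tmp/agent-workspace. Even if the agent goes rogue, it can’t touch my home directory, as long as it has no other way to reach the file system.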
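
A containment check like this is easy to get subtly wrong: comparing paths as plain strings would treat /tmp/agent-workspace-old as if it were inside /tmp/agent-workspace. Here is a standalone sketch of a component-wise check, assuming Python 3.9 or newer for Path.is_relative_to. The helper name is_inside is made up for illustration.

```python
from pathlib import Path


def is_inside(workspace: Path, candidate: str) -> bool:
    """Return True if candidate resolves to the workspace itself or a path beneath it."""
    root = workspace.resolve()
    # Relative paths are interpreted relative to the workspace, not the current directory.
    # An absolute candidate replaces the root entirely, which is what we want to detect.
    resolved = (root / candidate).resolve()
    # Compare whole path components, so "/tmp/agent-workspace-old"
    # is not mistaken for a child of "/tmp/agent-workspace".
    return resolved.is_relative_to(root)


if __name__ == "__main__":
    ws = Path("/tmp/agent-workspace")
    for candidate in ["organized/by_date", "../../etc/passwd", "/tmp/agent-workspace-old/x"]:
        print(candidate, is_inside(ws, candidate))
```

Because resolve() follows symlinks, a link inside the workspace that points elsewhere is judged by its target. The check only holds at the moment it runs, though, so it complements OS-level isolation and does not replace it.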

Solution 2: Docker Container Isolation

Folder containment helps, but Docker provides stronger isolation. Here’s my Docker setup:

Dockerfile
FROM python:3.11-slim
# Create unprivileged user
RUN useradd -m -s /bin/bash agentuser
# Create the workspace mount point (the host directory is bind-mounted here at run time)
RUN mkdir /workspace && chown agentuser:agentuser /workspace
# Keep the agent code in /app so the bind mount on /workspace does not hide it
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy agent code
COPY --chown=agentuser:agentuser agent.py .
# Switch to non-root user
USER agentuser
# Default to sandboxed mode
CMD ["python", "agent.py", "--sandbox", "/workspace"]

Build and run with strict limits:

run_sandboxed.sh
#!/bin/bash
# Build container
docker build -t sandboxed-agent .
# Run with minimal permissions
docker run -it \
--rm \
--name agent-session \
--mount type=bind,source=/home/user/tasks,target=/workspace \
--read-only \
--memory=2g \
--cpus=2 \
--security-opt=no-new-privileges \
--cap-drop=ALL \
--network=none \
sandboxed-agent \
python agent.py --task "organize files"

The key options:

  • --read-only: Base filesystem is read-only (can’t modify system files)
  • --memory=2g --cpus=2: Prevent runaway resource consumption
  • --security-opt=no-new-privileges: Can’t escalate privileges
  • --cap-drop=ALL: Drop all Linux capabilities
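  • --network=none: No network access. An agent that calls a hosted API such as OpenAI’s cannot reach it with this flag, so it fits runs where the model’s plan is produced outside the container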
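
If the agent is launched from another Python script, the same options can be assembled as an argument list. This is a sketch under the assumption that the image name, container name, and task command match the shell script above. The helper build_docker_command is hypothetical, and it prints the command for review instead of running it.

```python
import shlex
from pathlib import Path


def build_docker_command(host_dir: str, image: str = "sandboxed-agent") -> list[str]:
    """Build the `docker run` argument list with the restrictions listed above."""
    source = Path(host_dir).resolve()
    return [
        "docker", "run", "--rm",
        "--name", "agent-session",
        # Only this one host directory is visible inside the container.
        "--mount", f"type=bind,source={source},target=/workspace",
        "--read-only",
        "--memory=2g",
        "--cpus=2",
        "--security-opt=no-new-privileges",
        "--cap-drop=ALL",
        "--network=none",
        image,
        "python", "agent.py", "--task", "organize files",
    ]


if __name__ == "__main__":
    command = build_docker_command("/home/user/tasks")
    # Print the command for review instead of running it.
    print(shlex.join(command))
    # To launch it: subprocess.run(command, check=True)
```

Passing a list instead of a shell string means the host path is never interpreted by a shell, which matters if the path ever comes from user input.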

Now test the agent:

Terminal window
$ ./run_sandboxed.sh
[Agent output]
Workspace: /workspace
Files: file1.txt, file2.txt, file3.txt
AI Strategy: Organize by date...
[Agent operates only on /workspace]
Renamed: file1.txt -> 2024-01-01_file1.txt
Created: /workspace/by_date/
# Try to access host filesystem
$ docker exec agent-session ls /home/user
ls: cannot access '/home/user': No such file or directory
# Try network access
$ docker exec agent-session ping google.com
ping: bad address 'google.com'

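The agent is now isolated from the rest of the host. Even if it tries something malicious, the only host files it can change are the ones in the bind-mounted directory, so mount a copy instead of the originals.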

Solution 3: Dry-Run Mode with Operation Logging

I added a dry-run mode to preview operations before execution:

dryrun_agent.py
import argparse
import logging
from pathlib import Path


class DryRunAgent:
    def __init__(self, workspace, dry_run=True, log_file="operations.log"):
        self.workspace = Path(workspace).resolve()
        self.dry_run = dry_run
        self.log_file = Path(log_file)
        self.operations = []
        # Setup logging
        logging.basicConfig(
            filename=self.log_file,
            level=logging.INFO,
            format='%(asctime)s - %(message)s'
        )

    def _log(self, operation):
        """Log operation for audit trail"""
        self.operations.append(operation)
        logging.info(operation)

    def rename_file(self, src, dst):
        """Rename with dry-run support"""
        operation = f"rename: {src} -> {dst}"
        if self.dry_run:
            print(f"[DRY-RUN] Would {operation}")
        else:
            src_path = self._safe_path(src)
            dst_path = self._safe_path(dst)
            src_path.rename(dst_path)
            print(f"[EXECUTE] {operation}")
        self._log(operation)

    def batch_rename(self, file_mapping):
        """Batch rename with preview"""
        print(f"\n{'='*60}")
        print(f"Preview: {len(file_mapping)} operations")
        print(f"{'='*60}")
        for src, dst in file_mapping.items():
            print(f" {src} -> {dst}")
        print(f"{'='*60}\n")
        if self.dry_run:
            print("DRY-RUN MODE - No changes will be made")
            for src, dst in file_mapping.items():
                self.rename_file(src, dst)
            return
        # Require confirmation
        confirm = input("Execute these operations? (yes/no): ")
        if confirm.lower() != 'yes':
            print("Aborted by user")
            return
        for src, dst in file_mapping.items():
            self.rename_file(src, dst)
        print(f"\nCompleted. Log saved to: {self.log_file}")

    def _safe_path(self, path):
        """Ensure path is within workspace"""
        # Resolve relative names against the workspace, then compare path components
        resolved = (self.workspace / path).resolve()
        if not resolved.is_relative_to(self.workspace):
            raise PermissionError(f"Path escape: {path}")
        return resolved


# Usage
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--workspace", required=True)
    # Off unless passed; without it the agent asks for confirmation before executing
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--log-file", default="operations.log")
    args = parser.parse_args()
    agent = DryRunAgent(
        workspace=args.workspace,
        dry_run=args.dry_run,
        log_file=args.log_file
    )
    # Simulate AI agent decision
    file_mapping = {
        "photo1.jpg": "550e8400-e29b-41d4-a716-446655440000.jpg",
        "photo2.jpg": "7b9d6e8f-2a3c-4d5e-8f9b-1a2b3c4d5e6f.jpg",
    }
    agent.batch_rename(file_mapping)

Now I can preview before executing:

Terminal window
# First run: Dry-run to see what will happen
$ python dryrun_agent.py --workspace /tmp/agent-workspace --dry-run
============================================================
Preview: 2 operations
============================================================
photo1.jpg -> 550e8400-e29b-41d4-a716-446655440000.jpg
photo2.jpg -> 7b9d6e8f-2a3c-4d5e-8f9b-1a2b3c4d5e6f.jpg
============================================================
DRY-RUN MODE - No changes will be made
[DRY-RUN] Would rename: photo1.jpg -> 550e8400-e29b-41d4-a716-446655440000.jpg
[DRY-RUN] Would rename: photo2.jpg -> 7b9d6e8f-2a3c-4d5e-8f9b-1a2b3c4d5e6f.jpg
# Check the log
$ cat operations.log
2025-02-04 14:30:15 - rename: photo1.jpg -> 550e8400-e29b-41d4-a716-446655440000.jpg
2025-02-04 14:30:15 - rename: photo2.jpg -> 7b9d6e8f-2a3c-4d5e-8f9b-1a2b3c4d5e6f.jpg
# Real execution (without --dry-run flag)
$ python dryrun_agent.py --workspace /tmp/agent-workspace
============================================================
Preview: 2 operations
============================================================
photo1.jpg -> 550e8400-e29b-41d4-a716-446655440000.jpg
photo2.jpg -> 7b9d6e8f-2a3c-4d5e-8f9b-1a2b3c4d5e6f.jpg
============================================================
Execute these operations? (yes/no): no
Aborted by user

The dry-run mode saved me from another disaster. When I saw the UUID renaming strategy, I immediately said “no” and modified the AI prompt.
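
A log is most useful when it can also be replayed backwards. Here is a minimal sketch of a rename journal that would have let me undo the UUID renames. It is an extension of the idea, not part of my agent: the class name RenameJournal and the JSON-lines file format are assumptions made for illustration.

```python
import json
from pathlib import Path


class RenameJournal:
    """Record each rename in a JSON-lines file so a batch can be reversed."""

    def __init__(self, journal_file: str):
        self.journal_file = Path(journal_file)

    def rename(self, src: Path, dst: Path) -> None:
        """Rename src to dst, refusing to overwrite, and record the operation."""
        if dst.exists():
            raise FileExistsError(f"Refusing to overwrite: {dst}")
        # Write the entry before renaming, so a crash in between still leaves a record.
        with self.journal_file.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"src": str(src), "dst": str(dst)}) + "\n")
        src.rename(dst)

    def undo_all(self) -> int:
        """Reverse recorded renames, newest first; return how many were undone."""
        if not self.journal_file.exists():
            return 0
        lines = self.journal_file.read_text(encoding="utf-8").splitlines()
        undone = 0
        for line in reversed(lines):
            entry = json.loads(line)
            src, dst = Path(entry["src"]), Path(entry["dst"])
            # Skip entries whose rename never happened or was already reverted.
            if dst.exists() and not src.exists():
                dst.rename(src)
                undone += 1
        # The journal is consumed once everything has been replayed.
        self.journal_file.unlink()
        return undone
```

Routing every rename through journal.rename(src, dst) and keeping the journal file outside the agent’s workspace means a bad batch can be rolled back with a single undo_all() call.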

Solution 4: Firejail OS-Level Sandbox

For OS-level isolation without Docker, I use Firejail:

ai-agent.profile
# Firejail profile for AI agents
# Save as /etc/firejail/ai-agent.profile
# Private filesystem
private-bin python3,agent
private-dev
private-etc none
# Disable hardware access
nodvd
nogroups
noinput
nou2f
novideo
# Security features
seccomp
net none
protocol unix,inet
# Namespaces
ipc-namespace
# Resource limits
rlimit-nofile 1024

Run the agent with Firejail:

Terminal window
# First, create the profile
$ sudo cp ai-agent.profile /etc/firejail/ai-agent.profile
# Run agent in sandbox
$ firejail \
--profile=ai-agent.profile \
--private=/tmp/agent-workspace \
python3 agent.py --task "organize files"
# Firejail output
Parent pid 12345, child pid 12346
Private /tmp installed in /tmp/agent-workspace
Blacklist violations are logged to syslog
Command started in /tmp/agent-workspace
[Agent runs in isolated environment]

Now the agent:

  • Has no network access (net none)
  • Can’t access system files (private-etc none)
  • Runs in separate filesystem namespace
  • Logs all blacklist violations to syslog
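
When the agent is started from a launcher script, it helps to fail closed: if Firejail is missing, refuse to run instead of silently running unsandboxed. Here is a sketch using the same profile name and options as the command above. The helper names build_firejail_command and run_sandboxed are hypothetical.

```python
import shlex
import shutil
import subprocess


def build_firejail_command(workspace: str, profile: str = "ai-agent.profile") -> list[str]:
    """Assemble the firejail invocation used above as an argument list."""
    return [
        "firejail",
        f"--profile={profile}",
        f"--private={workspace}",
        "python3", "agent.py", "--task", "organize files",
    ]


def run_sandboxed(workspace: str) -> int:
    """Run the agent under firejail, or refuse to run it at all."""
    if shutil.which("firejail") is None:
        # Fail closed: never fall back to an unsandboxed run.
        raise RuntimeError("firejail not found; refusing to run the agent unsandboxed")
    return subprocess.run(build_firejail_command(workspace), check=False).returncode


if __name__ == "__main__":
    # Print the command for review; call run_sandboxed() to actually launch it.
    print(shlex.join(build_firejail_command("/tmp/agent-workspace")))
```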

Test the isolation:

Terminal window
# Try to access home directory
$ firejail --profile=ai-agent.profile ls ~/
ls: cannot access '/home/user': No such file or directory
# Try to access network
$ firejail --profile=ai-agent.profile ping google.com
ping: socket: Operation not permitted
# Check what the agent can see
$ firejail --profile=ai-agent.profile ls /
bin dev etc lib proc run sys tmp usr var
# (only minimal filesystem, no /home, no /root)

The reason

I think the key reason for the disaster is:

Autonomous AI agents follow instructions literally, not intuitively

When I told the agent to “organize files efficiently,” it interpreted this as:

  • UUIDs are “unique” therefore “efficient”
  • Deep folder hierarchies are “organized”
  • High-contrast JPEGs are “optimized”

The AI didn’t understand these were bad decisions. It just followed the instruction literally.

The sandboxing prevents damage, but doesn’t solve the core problem. I need to:

  1. Give more specific instructions
  2. Always use dry-run mode first
  3. Review the AI’s strategy before execution
  4. Provide examples of “good” organization
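
The steps above can be sketched as a prompt that states its constraints explicitly, plus a validator that rejects any plan breaking them before a single file is touched. Everything specific here is an assumption for illustration: the JSON plan format, the allowed actions, the depth limit, and the function names build_prompt and validate_plan. No API call is made.

```python
import json
from pathlib import PurePosixPath

ALLOWED_ACTIONS = {"move", "mkdir"}  # hypothetical allow-list: no convert, no delete
MAX_DEPTH = 3                        # hypothetical limit on folder nesting


def build_prompt(instruction: str, filenames: list[str]) -> str:
    """Compose a prompt that spells out constraints and one good example."""
    return (
        f"Instruction: {instruction}\n"
        f"Current files: {filenames}\n"
        "Constraints:\n"
        "- Keep every original filename; only move files into folders.\n"
        f"- Use at most {MAX_DEPTH} folder levels.\n"
        "- Do not convert, re-encode, or delete any file.\n"
        "Example of good organization: 2024/01/IMG_20240101_vacation_beach.jpg\n"
        'Reply with JSON only: {"operations": [{"action": "move", "src": "...", "dst": "..."}]}'
    )


def validate_plan(plan_json: str, filenames: list[str]) -> list[dict]:
    """Parse the model's reply and reject anything outside the stated constraints."""
    operations = json.loads(plan_json)["operations"]
    known = set(filenames)
    for op in operations:
        if op["action"] not in ALLOWED_ACTIONS:
            raise ValueError(f"Action not allowed: {op['action']}")
        dst = PurePosixPath(op["dst"])
        if dst.is_absolute() or ".." in dst.parts:
            raise ValueError(f"Destination leaves the workspace: {dst}")
        if op["action"] == "mkdir":
            depth = len(dst.parts)
        else:
            # For a move, the last component is the filename, not a folder.
            depth = len(dst.parts) - 1
            if op["src"] not in known:
                raise ValueError(f"Unknown source file: {op['src']}")
            if dst.name != PurePosixPath(op["src"]).name:
                raise ValueError(f"Filename changed: {op['src']} -> {dst}")
        if depth > MAX_DEPTH:
            raise ValueError(f"Too deeply nested: {dst}")
    return operations
```

With a check like this, a plan that renames IMG_20240101_vacation_beach.jpg to a UUID, nests folders 12 levels deep, or asks for a "convert" action raises ValueError instead of running. Only the operations returned by validate_plan would be handed to the dry-run step.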

Summary

In this post, I showed how to sandbox AI agents safely to prevent file system disasters. The key point is using multi-layer containment: folder isolation, permission scoping, containerization, and dry-run modes.

I covered four solutions:

  1. Strict folder containment with path validation
  2. Docker containers with read-only filesystems and dropped capabilities
  3. Dry-run mode with operation logging and confirmation
  4. Firejail sandbox with OS-level isolation

Combining these strategies limits the damage an autonomous agent can cause, even when it makes bad decisions. Always sandbox before you run, not after disaster strikes.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.
