How to Add Guardrails to Autonomous AI Agents: A Complete Safety Framework
Problem
When I let my AI agent autonomously reorganize my codebase, I got this disaster:
Error: Cannot find module './utils/Logger'
    at require (internal/modules/cjs/loader.js:1023:15)
    at Object.<anonymous> (/app/src/index.js:5:12)
    at Module._compile (internal/modules/cjs/loader.js:1201:30)

The agent had moved files into new directories but didn’t update the import paths. I checked my project structure:

user@host:~/project$ tree -L 2
.
├── src/
│   ├── utils/
│   │   └── logger.js    # File was moved here
│   └── index.js         # Still importing from old location
└── package.json

Environment
- Node.js v20.11.0
- Custom AI agent built with LangChain
- Express.js application
- Development environment (thankfully not production)
What happened?
I was building an autonomous AI agent to help with code maintenance. I gave it filesystem access and asked it to “reorganize the utils folder for better structure.” Here’s my initial agent setup:
import { FileSystemTools } from './FileSystemTools';

export class AIAgent {
  private fsTools: FileSystemTools;

  constructor() {
    this.fsTools = new FileSystemTools({
      rootDir: process.cwd(), // No restrictions - DANGEROUS!
    });
  }

  async executeTask(task: string): Promise<void> {
    const plan = await this.planExecution(task);

    // Execute directly without preview
    for (const action of plan.actions) {
      await this.fsTools.execute(action);
    }
  }

  private async planExecution(task: string) {
    // Generate action plan
    return { actions: [] };
  }
}

The file system tools had unrestricted access:
import fs from 'fs/promises';
import path from 'path';

export class FileSystemTools {
  private rootDir: string;

  constructor(config: { rootDir: string }) {
    this.rootDir = config.rootDir;
  }

  // No validation at all: any path under rootDir can be moved, deleted, or overwritten
  async execute(action: FileAction): Promise<void> {
    switch (action.type) {
      case 'move':
        await fs.rename(
          path.join(this.rootDir, action.from!),
          path.join(this.rootDir, action.to!)
        );
        break;
      case 'delete':
        await fs.unlink(path.join(this.rootDir, action.path!));
        break;
      case 'write':
        await fs.writeFile(
          path.join(this.rootDir, action.path!),
          action.content!
        );
        break;
    }
  }
}

interface FileAction {
  type: 'move' | 'delete' | 'write';
  from?: string;
  to?: string;
  path?: string;
  content?: string;
}

When I ran the agent:
user@host:~/project$ npm run agent -- "Reorganize utils folder"
🤖 Agent: Reorganizing utils folder...
✓ Moved src/utils/logger.js to src/utils/logging/logger.js
✓ Moved src/utils/parser.js to src/utils/parsing/parser.js
✓ Moved src/utils/validator.js to src/utils/validation/validator.js
✓ Task complete!
user@host:~/project$ npm start
Error: Cannot find module './utils/Logger'
    at require (internal/modules/cjs/loader.js:1023:15)

The agent moved files but didn’t update import statements in files that depended on them. I had no way to preview what would change before execution, and no rollback mechanism.
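To make the missing step concrete, here is a minimal sketch of what "updating the imports" for one moved file could look like. It is not the agent's code: `toSpecifier` and `updateImports` are hypothetical helpers, paths are assumed to be project-relative with forward slashes, and the regex only covers simple `require('...')` and `from '...'` forms with relative specifiers. A real tool would more likely parse the files than pattern-match them.

```typescript
import * as path from 'node:path';

// Hypothetical helpers for illustration. Extensions are dropped from the
// rewritten specifier, which suits CommonJS-style resolution.
const stripExt = (p: string): string => p.replace(/\.(js|ts)$/, '');

/** Relative specifier that `importer` should use to reach `target`. */
export function toSpecifier(importer: string, target: string): string {
  const rel = stripExt(
    path.posix.relative(path.posix.dirname(importer), target)
  );
  return rel.startsWith('../') ? rel : `./${rel}`;
}

/** Rewrite each relative import/require in `source` that points at `oldTarget`. */
export function updateImports(
  source: string,
  importer: string,
  oldTarget: string,
  newTarget: string
): string {
  const importerDir = path.posix.dirname(importer);
  // Matches: from './x'  |  require('./x')  (single or double quotes)
  const pattern = /(from\s+|require\(\s*)(['"])(\.{1,2}\/[^'"]+)\2/g;

  return source.replace(
    pattern,
    (match: string, prefix: string, quote: string, spec: string) => {
      // Resolve the specifier relative to the importing file
      const resolved = path.posix.join(importerDir, spec);
      if (stripExt(resolved) !== stripExt(oldTarget)) {
        return match; // import points somewhere else, leave it alone
      }
      return `${prefix}${quote}${toSpecifier(importer, newTarget)}${quote}`;
    }
  );
}

console.log(
  updateImports(
    "const logger = require('./utils/logger');",
    'src/index.js',
    'src/utils/logger.js',
    'src/utils/logging/logger.js'
  )
);
// → const logger = require('./utils/logging/logger');
```

The point of the sketch is that a move is really two operations: the rename itself, plus a rewrite of every file that resolves to the old location.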
How to solve it?
I tried adding a simple preview system first:
import readline from 'readline';
import { FileSystemTools } from './FileSystemTools';

// Action type accepted by the existing tools
type PlannedAction = Parameters<FileSystemTools['execute']>[0];

export class AIAgentWithPreview {
  // Same unrestricted tools as before; only the preview step is new
  private fsTools = new FileSystemTools({ rootDir: process.cwd() });

  async executeTask(task: string): Promise<void> {
    const plan = await this.planExecution(task);

    // Show preview before executing
    console.log('\n=== PREVIEW ===');
    console.log('Planned actions:');
    plan.actions.forEach((action, i) => {
      console.log(`${i + 1}. ${action.type}: ${action.from || action.path}`);
    });

    const approved = await this.promptUser('\nExecute these actions? (yes/no) ');
    if (!approved) {
      console.log('Operation cancelled');
      return;
    }

    // Execute actions
    for (const action of plan.actions) {
      await this.fsTools.execute(action);
    }
  }

  private async planExecution(task: string): Promise<{ actions: PlannedAction[] }> {
    // Generate action plan (same stub as in AIAgent)
    return { actions: [] };
  }

  private async promptUser(question: string): Promise<boolean> {
    const rl = readline.createInterface({
      input: process.stdin,
      output: process.stdout
    });

    return new Promise((resolve) => {
      rl.question(question, (answer: string) => {
        rl.close();
        // Only an explicit "yes" counts as approval
        resolve(answer.trim().toLowerCase() === 'yes');
      });
    });
  }
}

But this only showed which files would move; it didn’t show the actual diff or update imports. I needed a better dry-run system.
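For a dry run to be trustworthy, the diff itself has to stay readable when lines are inserted or removed in the middle of a file. Below is a minimal sketch of a line diff built on a longest-common-subsequence table. It is an illustration under my own assumptions, not the diff used by the agent in this post, and `lineDiff` is a hypothetical name. It is quadratic in file length, so for anything large a dedicated diff library or `git diff` is the better choice.

```typescript
/**
 * Line-based diff using a longest-common-subsequence (LCS) table.
 * Returns one entry per output line, prefixed with "  ", "- " or "+ ".
 */
export function lineDiff(oldText: string, newText: string): string[] {
  const a = oldText.split('\n');
  const b = newText.split('\n');

  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk the table from the top, emitting kept, removed, and added lines
  const out: string[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(`  ${a[i]}`);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push(`- ${a[i]}`);
      i++;
    } else {
      out.push(`+ ${b[j]}`);
      j++;
    }
  }
  while (i < a.length) out.push(`- ${a[i++]}`);
  while (j < b.length) out.push(`+ ${b[j++]}`);
  return out;
}

console.log(lineDiff('x\ny', 'new\nx\ny').join('\n'));
// + new
//   x
//   y
```

A diff that compares line 1 to line 1, line 2 to line 2, and so on would report every line after an insertion as changed; the LCS walk reports only the inserted line.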
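Then I implemented a more complete guardrail system. It combines five safeguards, presented here as four layers because the confirmation checkpoints live inside the Layer 1 agent code: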
Layer 1: Dry-run with diff previews
import readline from 'readline';
import { GitOperations } from './GitOperations';
import { DiffGenerator } from './DiffGenerator';
import { ScopedFileSystemTools } from './ScopedFileSystemTools';
import { AgentMonitor } from './AgentMonitor';

export class GuardedAgent {
  // Collaborators are injected through the setters below before executeTask() runs
  private git!: GitOperations;
  private diffGenerator!: DiffGenerator;
  private fsTools!: ScopedFileSystemTools;
  private monitor?: AgentMonitor; // stored only; the logAction() call site is not shown in this post
  private operationsSinceCheckpoint: FileOperation[] = [];
  private readonly CONFIRMATION_INTERVAL = 3;

  setGitOperations(git: GitOperations): void { this.git = git; }
  setDiffGenerator(diffGenerator: DiffGenerator): void { this.diffGenerator = diffGenerator; }
  setFileSystemTools(fsTools: ScopedFileSystemTools): void { this.fsTools = fsTools; }
  setMonitor(monitor: AgentMonitor): void { this.monitor = monitor; }

  async executeTask(task: string): Promise<void> {
    // Create git checkpoint before starting
    const checkpointHash = await this.git.createCommit(
      `checkpoint: before task "${task}"`
    );
    console.log(`✓ Created checkpoint: ${checkpointHash.slice(0, 7)}`);

    // Phase 1: Generate plan with previews
    const plan = await this.planExecution(task);

    // Phase 2: Generate diffs for each operation
    const operationsWithPreviews: FileOperation[] = [];
    for (const action of plan.actions) {
      const preview = await this.diffGenerator.generatePreview(action);
      operationsWithPreviews.push({ ...action, preview });
    }

    // Phase 3: Show detailed preview
    await this.showPreview(operationsWithPreviews);

    // Phase 4: Get approval (promptUser returns the raw answer, so compare it explicitly)
    const answer = await this.promptUser('\nExecute these actions? (yes/no) ');
    if (answer.trim().toLowerCase() !== 'yes') {
      console.log('Operation cancelled');
      return;
    }

    // Phase 5: Execute with checkpoints
    try {
      for (const operation of operationsWithPreviews) {
        await this.executeWithCheckpoint(operation);
      }
      console.log('✓ Task completed successfully');
    } catch (error) {
      console.error('Task failed, rolling back...');
      await this.git.resetToCommit(checkpointHash);
      throw error;
    }
  }

  private async planExecution(task: string): Promise<{ actions: FileOperation[] }> {
    // Generate action plan (same stub as in the first agent)
    return { actions: [] };
  }

  private async executeWithCheckpoint(operation: FileOperation): Promise<void> {
    // Execute the operation
    await this.fsTools.execute(operation);

    // Track for checkpointing
    this.operationsSinceCheckpoint.push(operation);

    // Check if we need human confirmation
    if (this.operationsSinceCheckpoint.length >= this.CONFIRMATION_INTERVAL) {
      await this.requestCheckpointConfirmation();
    }
  }

  private async requestCheckpointConfirmation(): Promise<void> {
    console.log('\n=== CHECKPOINT REACHED ===');
    console.log(`Operations since last checkpoint: ${this.operationsSinceCheckpoint.length}`);
    console.log('\nOperations summary:');
    this.operationsSinceCheckpoint.forEach((op, i) => {
      console.log(`${i + 1}. ${op.type}: ${op.path || op.to}`);
    });

    const choice = await this.promptUser(
      '\nOptions: (c)ontinue | (r)ollback | (a)bort: '
    );

    switch (choice.trim().toLowerCase()) {
      case 'c': {
        // Create checkpoint and continue
        const commitHash = await this.git.createCommit(
          `checkpoint: ${this.operationsSinceCheckpoint.length} operations`
        );
        console.log(`✓ Checkpoint created: ${commitHash.slice(0, 7)}`);
        this.operationsSinceCheckpoint = [];
        break;
      }
      case 'r':
        // Rollback one checkpoint
        await this.git.rollbackOneCommit();
        this.operationsSinceCheckpoint = [];
        throw new Error('Rolled back by user request');
      case 'a':
        // Abort entirely
        await this.git.rollbackToInitial();
        throw new Error('Aborted by user');
      default:
        // Anything else is not a decision, so ask again instead of silently continuing
        console.log('Unrecognized choice, please answer c, r, or a.');
        await this.requestCheckpointConfirmation();
    }
  }

  private async showPreview(operations: FileOperation[]): Promise<void> {
    console.log('\n=== DETAILED PREVIEW ===');
    console.log(`Total operations: ${operations.length}\n`);

    for (const op of operations) {
      console.log(`\n${op.type.toUpperCase()}: ${op.path || op.to}`);
      console.log('─'.repeat(60));

      if (op.preview) {
        // Show diff
        console.log(op.preview);
      } else {
        console.log(`No preview available for ${op.type}`);
      }

      console.log('─'.repeat(60));
    }
  }

  private async promptUser(question: string): Promise<string> {
    const rl = readline.createInterface({
      input: process.stdin,
      output: process.stdout
    });

    return new Promise((resolve) => {
      rl.question(question, (answer: string) => {
        rl.close();
        resolve(answer);
      });
    });
  }
}

interface FileOperation {
  type: 'move' | 'delete' | 'write' | 'update-imports';
  from?: string;
  to?: string;
  path?: string;
  content?: string;
  preview?: string;
}

The diff generator shows exactly what will change:
import { execFileSync } from 'child_process';
import fs from 'fs/promises';

export class DiffGenerator {
  async generatePreview(operation: FileOperation): Promise<string> {
    switch (operation.type) {
      case 'move':
        return this.generateMovePreview(operation.from!, operation.to!);
      case 'write':
        return this.generateWritePreview(operation.path!, operation.content!);
      case 'delete':
        return this.generateDeletePreview(operation.path!);
      case 'update-imports':
        return this.generateImportUpdatePreview(operation.path!, operation.content!);
      default:
        return 'Unknown operation type';
    }
  }

  private async generateMovePreview(from: string, to: string): Promise<string> {
    let preview = `Moving: ${from}\n`;
    preview += `To: ${to}\n\n`;

    if (await this.fileExists(from)) {
      // Get git diff if file is tracked
      try {
        // execFileSync passes the path as an argument, so it needs no shell quoting
        const diff = execFileSync('git', ['diff', '--cached', '--', from], {
          encoding: 'utf-8',
          cwd: process.cwd()
        });
        if (diff) {
          preview += 'Changes in moved file:\n';
          preview += diff;
        }
      } catch {
        // File might not be tracked yet
      }
    }

    return preview;
  }

  private async generateWritePreview(path: string, content: string): Promise<string> {
    if (await this.fileExists(path)) {
      // Show diff between existing and new content
      const currentContent = await fs.readFile(path, 'utf-8');
      const diff = this.generateUnifiedDiff(path, currentContent, content);
      return `Modifying: ${path}\n\n${diff}`;
    }

    // Show new file preview
    return `Creating: ${path}\n\nNew content:\n${this.firstLines(content)}`;
  }

  private async generateDeletePreview(path: string): Promise<string> {
    if (!(await this.fileExists(path))) {
      return `File does not exist: ${path}`;
    }

    const content = await fs.readFile(path, 'utf-8');
    return `Deleting: ${path}\n\nContent to be deleted:\n${this.firstLines(content)}`;
  }

  private async generateImportUpdatePreview(path: string, newContent: string): Promise<string> {
    // Diff the file as it is on disk against the content with rewritten imports
    const currentContent = (await this.fileExists(path))
      ? await fs.readFile(path, 'utf-8')
      : '';
    const diff = this.generateUnifiedDiff(path, currentContent, newContent);
    return `Updating imports in: ${path}\n\n${diff}`;
  }

  // First `limit` lines of content, with a note about how many lines were cut
  private firstLines(content: string, limit = 20): string {
    const lines = content.split('\n');
    const head = lines.slice(0, limit).join('\n');
    return lines.length > limit
      ? `${head}\n... (${lines.length - limit} more lines)`
      : head;
  }

  // `new` is a reserved word in TypeScript, so the parameters are oldText/newText
  private generateUnifiedDiff(filename: string, oldText: string, newText: string): string {
    // Simplified diff generation: compares lines position by position
    const oldLines = oldText.split('\n');
    const newLines = newText.split('\n');

    let diff = '';
    let i = 0, j = 0;

    while (i < oldLines.length || j < newLines.length) {
      if (i < oldLines.length && j < newLines.length && oldLines[i] === newLines[j]) {
        // Line unchanged
        diff += `  ${oldLines[i]}\n`;
        i++;
        j++;
      } else {
        // Show differences
        if (i < oldLines.length) {
          diff += `- ${oldLines[i]}\n`;
          i++;
        }
        if (j < newLines.length) {
          diff += `+ ${newLines[j]}\n`;
          j++;
        }
      }
    }

    return diff;
  }

  private async fileExists(path: string): Promise<boolean> {
    try {
      await fs.access(path);
      return true;
    } catch {
      return false;
    }
  }
}

Layer 2: Scoped permissions with sandbox
import path from 'path';
import fs from 'fs/promises';

export class ScopedFileSystemTools {
  private allowedWritePaths: string[];
  private readonly rootDir: string;
  private sandboxDir: string;

  constructor(config: {
    rootDir: string;
    allowedWritePaths: string[];
    sandboxDir?: string;
  }) {
    this.rootDir = config.rootDir;
    this.allowedWritePaths = config.allowedWritePaths.map(p => path.resolve(config.rootDir, p));
    this.sandboxDir = config.sandboxDir || path.join(config.rootDir, '.agent-sandbox');
  }

  async execute(operation: FileOperation): Promise<void> {
    // Validate permissions for write operations
    if (this.isWriteOperation(operation.type)) {
      // A move changes both ends, so its source is checked as well as its destination
      const targets = [operation.path, operation.from, operation.to]
        .filter((p): p is string => Boolean(p));

      if (targets.length === 0) {
        throw new Error('Write operation missing target path');
      }

      for (const targetPath of targets) {
        const absolutePath = path.resolve(this.rootDir, targetPath);

        if (!this.isPathAllowed(absolutePath)) {
          throw new PermissionError(
            `Agent attempted to write outside allowed scope: ${targetPath}\n` +
            `Allowed paths: ${this.allowedWritePaths.join(', ')}`
          );
        }
      }
    }

    // Execute operation
    switch (operation.type) {
      case 'move':
        await this.moveFile(operation.from!, operation.to!);
        break;
      case 'delete':
        await this.deleteFile(operation.path!);
        break;
      case 'write':
      case 'update-imports':
        // An import update is a write of the rewritten file content
        await this.writeFile(operation.path!, operation.content!);
        break;
    }
  }

  private isWriteOperation(type: string): boolean {
    return ['move', 'delete', 'write', 'update-imports'].includes(type);
  }

  private isPathAllowed(targetPath: string): boolean {
    const resolvedTarget = path.resolve(targetPath);

    return this.allowedWritePaths.some(allowedPath => {
      // Compare by path segments. A plain startsWith() would also accept
      // siblings such as "src/utils-old" when only "src/utils" is allowed.
      const relative = path.relative(path.resolve(allowedPath), resolvedTarget);
      const escapes = relative === '..' || relative.startsWith('..' + path.sep);
      return !escapes && !path.isAbsolute(relative);
    });
  }

  private async moveFile(from: string, to: string): Promise<void> {
    const fromPath = path.resolve(this.rootDir, from);
    const toPath = path.resolve(this.rootDir, to);

    // fs.rename fails if the destination directory does not exist yet
    await fs.mkdir(path.dirname(toPath), { recursive: true });
    await fs.rename(fromPath, toPath);
  }

  private async deleteFile(filePath: string): Promise<void> {
    const fullPath = path.resolve(this.rootDir, filePath);
    await fs.unlink(fullPath);
  }

  private async writeFile(filePath: string, content: string): Promise<void> {
    const fullPath = path.resolve(this.rootDir, filePath);

    // Ensure directory exists
    const dir = path.dirname(fullPath);
    await fs.mkdir(dir, { recursive: true });

    await fs.writeFile(fullPath, content, 'utf-8');
  }
}

class PermissionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'PermissionError';
  }
}

Usage with scoped permissions:
// Configuration example
const agentConfig = {
  rootDir: process.cwd(),
  allowedWritePaths: [
    'src/utils',      // Can write to utils
    'src/components', // Can write to components
    'tests'           // Can write to tests
  ],
  // Cannot write to:
  // - package.json (root level)
  // - src/index.js (entry point)
  // - .env files
  // - CI/CD configs
};

const scopedTools = new ScopedFileSystemTools(agentConfig);

Layer 3: Git-based rollback
import { execFileSync } from 'child_process';

export class GitOperations {
  private readonly repoPath: string;
  private initialCommit: string | null = null;

  constructor(repoPath: string) {
    this.repoPath = repoPath;
  }

  // Run git without a shell, so commit messages and refs need no quoting
  private run(args: string[]): string {
    return execFileSync('git', args, {
      cwd: this.repoPath,
      encoding: 'utf-8'
    }).trim();
  }

  async initialize(): Promise<void> {
    // Check if we're in a git repo
    try {
      const branch = this.run(['rev-parse', '--abbrev-ref', 'HEAD']);

      // Store initial commit for rollback
      this.initialCommit = this.run(['rev-parse', 'HEAD']);

      console.log(`✓ Git repository detected (branch: ${branch})`);
    } catch (error) {
      throw new Error(
        'Guarded agent requires a git repository with at least one commit. ' +
        'Please initialize one first: git init'
      );
    }
  }

  async createCommit(message: string): Promise<string> {
    // Stage all changes
    try {
      this.run(['add', '-A']);
    } catch (error) {
      console.warn('Warning: Could not stage changes');
    }

    // --allow-empty keeps the checkpoint working when the tree is already clean
    this.run(['commit', '-m', message, '--no-verify', '--allow-empty']);

    // `git commit` prints a summary rather than the hash, so read the hash from HEAD
    return this.run(['rev-parse', 'HEAD']);
  }

  async resetToCommit(commitHash: string): Promise<void> {
    this.run(['reset', '--hard', commitHash]);
    console.log(`✓ Rolled back to ${commitHash.slice(0, 7)}`);
  }

  async rollbackOneCommit(): Promise<void> {
    this.run(['reset', '--hard', 'HEAD~1']);
    console.log('✓ Rolled back one commit');
  }

  async rollbackToInitial(): Promise<void> {
    if (!this.initialCommit) {
      throw new Error('No initial commit stored');
    }
    await this.resetToCommit(this.initialCommit);
  }

  async getCurrentCommit(): Promise<string> {
    return this.run(['rev-parse', 'HEAD']);
  }

  async getCommitDiff(commitHash: string): Promise<string> {
    return this.run(['show', commitHash, '--stat']);
  }
}

Layer 4: Monitoring and anomaly detection
import path from 'path';

interface AgentAction {
  timestamp: Date;
  operation: string;
  target: string;
  agentId: string;
}

interface MonitorConfig {
  maxActionsPerMinute: number;
  maxDeletionsPerSession: number;
  suspiciousPatterns: Array<(action: AgentAction, history: AgentAction[]) => boolean>;
}

export class AgentMonitor {
  private actionHistory: AgentAction[] = [];
  private deletionCount = 0;
  private config: MonitorConfig;

  constructor(config: Partial<MonitorConfig> = {}) {
    // ?? keeps an explicit 0 instead of replacing it with the default
    this.config = {
      maxActionsPerMinute: config.maxActionsPerMinute ?? 20,
      maxDeletionsPerSession: config.maxDeletionsPerSession ?? 10,
      suspiciousPatterns: config.suspiciousPatterns ?? []
    };
  }

  logAction(action: AgentAction): void {
    // Check rate limiting
    if (this.isRateLimited(action)) {
      throw new Error(
        `Agent exceeded rate limit: ` +
        `${this.getRecentActionCount(action.agentId)} actions in last minute`
      );
    }

    // Check deletion threshold
    if (action.operation === 'delete') {
      this.deletionCount++;
      if (this.deletionCount > this.config.maxDeletionsPerSession) {
        throw new Error(
          `Agent exceeded deletion threshold: ` +
          `${this.deletionCount} deletions in session`
        );
      }
    }

    // Check for suspicious patterns
    if (this.detectSuspiciousPattern(action)) {
      console.warn(
        `⚠️ Suspicious activity detected: ${action.operation} on ${action.target}`
      );
      throw new Error(
        'Suspicious activity detected. Agent paused for human review.'
      );
    }

    this.actionHistory.push(action);
  }

  private isRateLimited(action: AgentAction): boolean {
    const oneMinuteAgo = new Date(Date.now() - 60 * 1000);
    const recentActions = this.actionHistory.filter(
      a => a.timestamp > oneMinuteAgo && a.agentId === action.agentId
    );

    return recentActions.length >= this.config.maxActionsPerMinute;
  }

  private getRecentActionCount(agentId: string): number {
    const oneMinuteAgo = new Date(Date.now() - 60 * 1000);
    return this.actionHistory.filter(
      a => a.timestamp > oneMinuteAgo && a.agentId === agentId
    ).length;
  }

  private detectSuspiciousPattern(action: AgentAction): boolean {
    // Pattern 1: Deleting critical system files
    if (this.isCriticalFile(action.target)) {
      return true;
    }

    // Pattern 2: Rapid file operations across unrelated directories
    const recentTargets = this.actionHistory
      .slice(-5)
      .filter(a => a.agentId === action.agentId)
      .map(a => a.target);

    const directories = new Set(
      recentTargets.map(t => path.dirname(t))
    );

    if (directories.size > 3) {
      console.warn('Agent operating in multiple unrelated directories');
      return true;
    }

    // Run custom suspicious pattern checks
    for (const patternCheck of this.config.suspiciousPatterns) {
      if (patternCheck(action, this.actionHistory)) {
        return true;
      }
    }

    return false;
  }

  private isCriticalFile(filePath: string): boolean {
    const criticalPatterns = [
      'package-lock.json',
      'yarn.lock',
      'package.json',
      '.env',
      '.git',
      'node_modules',
      'docker-compose.yml',
      'Dockerfile'
    ];

    const normalizedPath = filePath.toLowerCase();
    return criticalPatterns.some(pattern =>
      normalizedPath.includes(pattern.toLowerCase())
    );
  }

  getStats(): { totalActions: number; deletions: number; agents: number } {
    const uniqueAgents = new Set(this.actionHistory.map(a => a.agentId));
    return {
      totalActions: this.actionHistory.length,
      deletions: this.deletionCount,
      agents: uniqueAgents.size
    };
  }
}

Using the complete guardrail system
import { GuardedAgent } from './GuardedAgent';
import { ScopedFileSystemTools } from './ScopedFileSystemTools';
import { GitOperations } from './GitOperations';
import { DiffGenerator } from './DiffGenerator';
import { AgentMonitor } from './AgentMonitor';

async function main() {
  const agentConfig = {
    rootDir: process.cwd(),
    allowedWritePaths: [
      'src/utils',
      'src/components',
      'tests'
    ]
  };

  const git = new GitOperations(process.cwd());
  await git.initialize();

  const fsTools = new ScopedFileSystemTools(agentConfig);
  const diffGenerator = new DiffGenerator();
  const monitor = new AgentMonitor({
    maxActionsPerMinute: 20,
    maxDeletionsPerSession: 10
  });

  const agent = new GuardedAgent();
  agent.setFileSystemTools(fsTools);
  agent.setGitOperations(git);
  agent.setDiffGenerator(diffGenerator);
  agent.setMonitor(monitor);

  try {
    await agent.executeTask('Reorganize utils folder for better structure');
  } catch (error) {
    // A caught value is `unknown` in strict TypeScript, so narrow it before reading .message
    console.error('Task failed:', error instanceof Error ? error.message : error);
    process.exit(1);
  }
}

main();

Now when I run the agent, I get:
user@host:~/project$ npm start
✓ Git repository detected (branch: main)
✓ Created checkpoint: a1b2c3d

=== DETAILED PREVIEW ===
Total operations: 6

MOVE: src/utils/logger.js
To: src/utils/logging/logger.js
------------------------------------------------------------
Will move file and update imports in:
  - src/index.js
  - src/middleware/logger.js

MOVE: src/utils/parser.js
To: src/utils/parsing/parser.js
------------------------------------------------------------
Will move file and update imports in:
  - src/routes/api.js

MOVE: src/utils/validator.js
To: src/utils/validation/validator.js
------------------------------------------------------------
Will move file and update imports in:
  - src/controllers/user.js
  - src/controllers/auth.js

Execute these actions? (yes/no) > yes

✓ Moved src/utils/logger.js → src/utils/logging/logger.js
✓ Updated imports in src/index.js
✓ Updated imports in src/middleware/logger.js

=== CHECKPOINT REACHED ===
Operations since last checkpoint: 3

Operations summary:
1. move: src/utils/logging/logger.js
2. update-imports: src/index.js
3. update-imports: src/middleware/logger.js

Options: (c)ontinue | (r)ollback | (a)bort: > c
✓ Checkpoint created: d4e5f6g

✓ Moved src/utils/parser.js → src/utils/parsing/parser.js
✓ Moved src/utils/validator.js → src/utils/validation/validator.js

✓ Task completed successfully

The reason
I think the root cause of the original disaster was the “genie wish problem”: the AI agent interpreted my instruction literally, without understanding the project context. When I asked it to “reorganize the utils folder,” it:
- Moved files without awareness of dependencies
- Didn’t update imports because it only thought about the file system structure
- Had no safety mechanisms to preview changes or roll back
The guardrail system solves this by adding human validation at critical decision points. Instead of giving the agent unchecked autonomy, I created a “sandbox with human oversight” model where:
- Dry-run previews show exactly what will change before execution
- Scoped permissions restrict write access to safe directories
- Confirmation checkpoints prevent runaway automation
- Git-based rollback makes mistakes easily reversible
- Monitoring detects and blocks suspicious behavior patterns
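One caveat about the rollback layer is worth spelling out: `git reset --hard` restores tracked files, but it leaves untracked files in place, so files the agent created after the last checkpoint would survive a reset. The sketch below pairs the reset with `git clean -fd`. It is an illustration under my own assumptions rather than part of the `GitOperations` class above: `rollbackTo`, `GitRunner`, and `makeGitRunner` are hypothetical names, and the runner is injectable so the command sequence can be checked without a repository.

```typescript
import { execFileSync } from 'node:child_process';

/** Runs one git command; injectable so the sequence can be tested without a repo. */
export type GitRunner = (args: string[]) => void;

/** Default runner. execFileSync passes arguments directly, without a shell. */
export function makeGitRunner(repoPath: string): GitRunner {
  return (args) => {
    execFileSync('git', args, { cwd: repoPath, stdio: 'inherit' });
  };
}

/**
 * Restore the working tree to `commitHash`:
 *  1. reset --hard  puts tracked files back and drops later commits from the branch
 *  2. clean -fd     removes untracked files and directories created since then
 *                   (ignored files are left alone because -x is not passed)
 */
export function rollbackTo(commitHash: string, run: GitRunner): void {
  // Accept only something that looks like a (possibly abbreviated) commit hash
  if (!/^[0-9a-f]{7,40}$/i.test(commitHash)) {
    throw new Error(`Refusing to reset to non-hash ref: ${commitHash}`);
  }
  run(['reset', '--hard', commitHash]);
  run(['clean', '-fd']);
}

// Usage (assumes the current directory is the repository root):
// rollbackTo('a1b2c3d', makeGitRunner(process.cwd()));
```

Because the checkpoint commit is created with `git add -A`, anything untracked at rollback time was created after the checkpoint, which is what makes the `clean` step reasonable here. In a repository where untracked work might predate the checkpoint, that step would need more care.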
This is the same principle behind safety features in self-driving cars or medical devices: technical capability needs to be paired with safeguards before it is useful in the real world.
Summary
In this post, I showed how to add guardrails to autonomous AI agents to prevent catastrophic damage. The key point is implementing a multi-layered safety framework with:
- Pre-action dry-runs that show exactly what will change with diff previews
- Scoped permissions that restrict write access to specific directories
- Human checkpoints every few operations (every 3 in the code above) requiring confirmation
- Git-based rollback for one-click recovery from mistakes
- Continuous monitoring that detects and blocks suspicious patterns
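On the monitoring point, one detail can matter in practice: matching critical files by substring also flags harmless paths. For example, `src/utils/dockerfile-parser.js` contains `dockerfile`. A minimal sketch of a stricter check that compares whole path segments instead is shown below. It is a hedged alternative, not the monitor's actual implementation: `isCriticalPath` and `CRITICAL_NAMES` are hypothetical names, and treating `.env.*` variants as critical is my own assumption.

```typescript
// Names that are critical when they appear as a whole path segment
// (same list as the monitor's patterns, lowercased).
const CRITICAL_NAMES = new Set([
  'package-lock.json',
  'yarn.lock',
  'package.json',
  '.env',
  '.git',
  'node_modules',
  'docker-compose.yml',
  'dockerfile',
]);

/** True if any segment of the path is a critical file or directory name. */
export function isCriticalPath(filePath: string): boolean {
  // Split on both separators so Windows-style paths are handled too
  const segments = filePath.split(/[\\/]+/).filter(Boolean);
  return segments.some((segment) => {
    const name = segment.toLowerCase();
    // Assumption: .env.local, .env.production, etc. count as critical too
    return CRITICAL_NAMES.has(name) || name.startsWith('.env.');
  });
}

console.log(isCriticalPath('node_modules/express/index.js'));  // true
console.log(isCriticalPath('src/utils/dockerfile-parser.js')); // false
```

The trade-off is that segment matching is stricter in both directions: it stops flagging lookalike names, but it also stops catching files such as `package.json.bak`, so the name list may need to grow.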
The Reddit community consensus was clear: without these safety controls, “AI autonomy” is dangerous. With proper guardrails, developers can confidently delegate repetitive tasks to agents while maintaining human oversight for critical decisions.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨‍💻 Reddit discussion on AI agent guardrails
- 👨‍💻 OpenAI safety guidelines for AI agents
- 👨‍💻 Anthropic's constitutional AI principles
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!