Skip to content

How to Add Guardrails to Autonomous AI Agents: A Complete Safety Framework

Problem

When I let my AI agent autonomously reorganize my codebase, I got this disaster:

Terminal window
Error: Cannot find module './utils/Logger'
at require (internal/modules/cjs/loader.js:1023:15)
at Object.<anonymous> (/app/src/index.js:5:12)
at Module._compile (internal/modules/cjs/loader.js:1201:30)

The agent had moved files into new directories but didn’t update the import paths. I checked my project structure:

Terminal window
user@host:~/project$ tree -L 2
.
├── src/
├── utils/
└── logger.js # File was moved here
└── index.js # Still importing from old location
└── package.json

Environment

  • Node.js v20.11.0
  • Custom AI agent built with LangChain
  • Express.js application
  • Development environment (thankfully not production)

What happened?

I was building an autonomous AI agent to help with code maintenance. I gave it filesystem access and asked it to “reorganize the utils folder for better structure.” Here’s my initial agent setup:

src/agent/AIAgent.ts
import { FileSystemTools } from './FileSystemTools';
export class AIAgent {
private fsTools: FileSystemTools;
constructor() {
this.fsTools = new FileSystemTools({
rootDir: process.cwd(),
// No restrictions - DANGEROUS!
});
}
async executeTask(task: string): Promise<void> {
const plan = await this.planExecution(task);
// Execute directly without preview
for (const action of plan.actions) {
await this.fsTools.execute(action);
}
}
private async planExecution(task: string) {
// Generate action plan
return { actions: [] };
}
}

The file system tools had unrestricted access:

src/agent/FileSystemTools.ts
import fs from 'fs/promises';
import path from 'path';
export class FileSystemTools {
private rootDir: string;
constructor(config: { rootDir: string }) {
this.rootDir = config.rootDir;
}
async execute(action: FileAction): Promise<void> {
switch (action.type) {
case 'move':
await fs.rename(
path.join(this.rootDir, action.from),
path.join(this.rootDir, action.to)
);
break;
case 'delete':
await fs.unlink(path.join(this.rootDir, action.path));
break;
case 'write':
await fs.writeFile(
path.join(this.rootDir, action.path),
action.content
);
break;
}
}
}
interface FileAction {
type: 'move' | 'delete' | 'write';
from?: string;
to?: string;
path?: string;
content?: string;
}

When I ran the agent:

Terminal window
user@host:~/project$ npm run agent -- "Reorganize utils folder"
🤖 Agent: Reorganizing utils folder...
Moved src/utils/logger.js to src/utils/logging/logger.js
Moved src/utils/parser.js to src/utils/parsing/parser.js
Moved src/utils/validator.js to src/utils/validation/validator.js
Task complete!
user@host:~/project$ npm start
Error: Cannot find module './utils/Logger'
at require (internal/modules/cjs/loader.js:1023:15)

The agent moved files but didn’t update import statements in files that depended on them. I had no way to preview what would change before execution, and no rollback mechanism.

How to solve it?

I tried adding a simple preview system first:

src/agent/AIAgentWithPreview.ts
export class AIAgentWithPreview {
async executeTask(task: string): Promise<void> {
const plan = await this.planExecution(task);
// Show preview before executing
console.log('\n=== PREVIEW ===');
console.log('Planned actions:');
plan.actions.forEach((action, i) => {
console.log(`${i + 1}. ${action.type}: ${action.from || action.path}`);
});
const approved = await this.promptUser('\nExecute these actions? (yes/no)');
if (!approved) {
console.log('Operation cancelled');
return;
}
// Execute actions
for (const action of plan.actions) {
await this.fsTools.execute(action);
}
}
private async promptUser(question: string): Promise<boolean> {
const readline = require('readline');
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout
});
return new Promise((resolve) => {
rl.question(question, (answer: string) => {
rl.close();
resolve(answer.toLowerCase() === 'yes');
});
});
}
}

But this only showed what files would move - it didn’t show the actual diff or update imports. I needed a better dry-run system.

Then I implemented a comprehensive guardrail system with five layers:

Layer 1: Dry-run with diff previews

src/agent/GuardedAgent.ts
import { GitOperations } from './GitOperations';
import { DiffGenerator } from './DiffGenerator';
export class GuardedAgent {
private git: GitOperations;
private diffGenerator: DiffGenerator;
private fsTools: ScopedFileSystemTools;
private operationsSinceCheckpoint: FileOperation[] = [];
private readonly CONFIRMATION_INTERVAL = 3;
async executeTask(task: string): Promise<void> {
// Create git checkpoint before starting
const checkpointHash = await this.git.createCommit(
`checkpoint: before task "${task}"`
);
console.log(`✓ Created checkpoint: ${checkpointHash.slice(0, 7)}`);
// Phase 1: Generate plan with previews
const plan = await this.planExecution(task);
// Phase 2: Generate diffs for each operation
const operationsWithPreviews: FileOperation[] = [];
for (const action of plan.actions) {
const preview = await this.diffGenerator.generatePreview(action);
operationsWithPreviews.push({
...action,
preview
});
}
// Phase 3: Show detailed preview
await this.showPreview(operationsWithPreviews);
// Phase 4: Get approval
const approved = await this.promptUser('\nExecute these actions? (yes/no)');
if (!approved) {
console.log('Operation cancelled');
return;
}
// Phase 5: Execute with checkpoints
try {
for (const operation of operationsWithPreviews) {
await this.executeWithCheckpoint(operation);
}
console.log('✓ Task completed successfully');
} catch (error) {
console.error('Task failed, rolling back...');
await this.git.resetToCommit(checkpointHash);
throw error;
}
}
private async executeWithCheckpoint(operation: FileOperation): Promise<void> {
// Execute the operation
await this.fsTools.execute(operation);
// Track for checkpointing
this.operationsSinceCheckpoint.push(operation);
// Check if we need human confirmation
if (this.operationsSinceCheckpoint.length >= this.CONFIRMATION_INTERVAL) {
await this.requestCheckpointConfirmation();
}
}
private async requestCheckpointConfirmation(): Promise<void> {
console.log('\n=== CHECKPOINT REACHED ===');
console.log(`Operations since last checkpoint: ${this.operationsSinceCheckpoint.length}`);
console.log('\nOperations summary:');
this.operationsSinceCheckpoint.forEach((op, i) => {
console.log(`${i + 1}. ${op.type}: ${op.path || op.to}`);
});
const choice = await this.promptUser(
'\nOptions: (c)ontinue | (r)ollback | (a)bort: '
);
switch (choice.toLowerCase()) {
case 'c':
// Create checkpoint and continue
const commitHash = await this.git.createCommit(
`checkpoint: ${this.operationsSinceCheckpoint.length} operations`
);
console.log(`✓ Checkpoint created: ${commitHash.slice(0, 7)}`);
this.operationsSinceCheckpoint = [];
break;
case 'r':
// Rollback one checkpoint
await this.git.rollbackOneCommit();
this.operationsSinceCheckpoint = [];
throw new Error('Rolled back by user request');
case 'a':
// Abort entirely
await this.git.rollbackToInitial();
throw new Error('Aborted by user');
}
}
private async showPreview(operations: FileOperation[]): Promise<void> {
console.log('\n=== DETAILED PREVIEW ===');
console.log(`Total operations: ${operations.length}\n`);
for (const op of operations) {
console.log(`\n${op.type.toUpperCase()}: ${op.path || op.to}`);
console.log(''.repeat(60));
if (op.preview) {
// Show diff
console.log(op.preview);
} else {
console.log(`No preview available for ${op.type}`);
}
console.log(''.repeat(60));
}
}
private async promptUser(question: string): Promise<string> {
const readline = require('readline');
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout
});
return new Promise((resolve) => {
rl.question(question, (answer: string) => {
rl.close();
resolve(answer);
});
});
}
}
interface FileOperation {
type: 'move' | 'delete' | 'write' | 'update-imports';
from?: string;
to?: string;
path?: string;
content?: string;
preview?: string;
}

The diff generator shows exactly what will change:

src/agent/DiffGenerator.ts
import { execSync } from 'child_process';
import fs from 'fs/promises';
export class DiffGenerator {
async generatePreview(operation: FileOperation): Promise<string> {
switch (operation.type) {
case 'move':
return this.generateMovePreview(operation.from!, operation.to!);
case 'write':
return this.generateWritePreview(operation.path!, operation.content!);
case 'delete':
return this.generateDeletePreview(operation.path!);
case 'update-imports':
return this.generateImportUpdatePreview(operation.path!, operation.content!);
default:
return 'Unknown operation type';
}
}
private async generateMovePreview(from: string, to: string): Promise<string> {
const fromExists = await this.fileExists(from);
const toExists = await this.fileExists(to);
let preview = `Moving: ${from}\n`;
preview += `To: ${to}\n\n`;
if (fromExists) {
// Get git diff if file is tracked
try {
const diff = execSync(`git diff --cached ${from}`, {
encoding: 'utf-8',
cwd: process.cwd()
});
if (diff) {
preview += 'Changes in moved file:\n';
preview += diff;
}
} catch {
// File might not be tracked yet
}
}
return preview;
}
private async generateWritePreview(path: string, content: string): Promise<string> {
const exists = await this.fileExists(path);
let preview = '';
if (exists) {
// Show diff between existing and new content
const currentContent = await fs.readFile(path, 'utf-8');
const diff = this.generateUnifiedDiff(path, currentContent, content);
preview = `Modifying: ${path}\n\n${diff}`;
} else {
// Show new file preview
preview = `Creating: ${path}\n\n`;
preview += 'New content:\n';
preview += content.split('\n').slice(0, 20).join('\n');
if (content.split('\n').length > 20) {
preview += `\n... (${content.split('\n').length - 20} more lines)`;
}
}
return preview;
}
private async generateDeletePreview(path: string): Promise<string> {
const exists = await this.fileExists(path);
if (!exists) {
return `File does not exist: ${path}`;
}
let preview = `Deleting: ${path}\n\n`;
preview += 'Content to be deleted:\n';
const content = await fs.readFile(path, 'utf-8');
preview += content.split('\n').slice(0, 20).join('\n');
if (content.split('\n').length > 20) {
preview += `\n... (${content.split('\n').length - 20} more lines)`;
}
return preview;
}
private generateImportUpdatePreview(path: string, newContent: string): string {
// Generate preview for import statement updates
const currentContent = ''; // Would read actual file
const diff = this.generateUnifiedDiff(path, currentContent, newContent);
return `Updating imports in: ${path}\n\n${diff}`;
}
private generateUnifiedDiff(filename: string, old: string, new: string): string {
// Simplified diff generation
const oldLines = old.split('\n');
const newLines = new.split('\n');
let diff = '';
let i = 0, j = 0;
while (i < oldLines.length || j < newLines.length) {
if (i < oldLines.length && j < newLines.length && oldLines[i] === newLines[j]) {
// Line unchanged
diff += ` ${oldLines[i]}\n`;
i++;
j++;
} else {
// Show differences
if (i < oldLines.length) {
diff += `- ${oldLines[i]}\n`;
i++;
}
if (j < newLines.length) {
diff += `+ ${newLines[j]}\n`;
j++;
}
}
}
return diff;
}
private async fileExists(path: string): Promise<boolean> {
try {
await fs.access(path);
return true;
} catch {
return false;
}
}
}

Layer 2: Scoped permissions with sandbox

src/agent/ScopedFileSystemTools.ts
import path from 'path';
import fs from 'fs/promises';
export class ScopedFileSystemTools {
private allowedWritePaths: string[];
private readonly rootDir: string;
private sandboxDir: string;
constructor(config: {
rootDir: string;
allowedWritePaths: string[];
sandboxDir?: string;
}) {
this.rootDir = config.rootDir;
this.allowedWritePaths = config.allowedWritePaths.map(p => path.resolve(config.rootDir, p));
this.sandboxDir = config.sandboxDir || path.join(config.rootDir, '.agent-sandbox');
}
async execute(operation: FileOperation): Promise<void> {
// Validate permissions for write operations
if (this.isWriteOperation(operation.type)) {
const targetPath = operation.path || operation.to;
if (!targetPath) {
throw new Error('Write operation missing target path');
}
const absolutePath = path.resolve(this.rootDir, targetPath);
if (!this.isPathAllowed(absolutePath)) {
throw new PermissionError(
`Agent attempted to write outside allowed scope: ${targetPath}\n` +
`Allowed paths: ${this.allowedWritePaths.join(', ')}`
);
}
}
// Execute operation
switch (operation.type) {
case 'move':
await this.moveFile(operation.from!, operation.to!);
break;
case 'delete':
await this.deleteFile(operation.path!);
break;
case 'write':
await this.writeFile(operation.path!, operation.content!);
break;
}
}
private isWriteOperation(type: string): boolean {
return ['move', 'delete', 'write', 'update-imports'].includes(type);
}
private isPathAllowed(targetPath: string): boolean {
const resolvedTarget = path.resolve(targetPath);
return this.allowedWritePaths.some(allowedPath => {
const resolvedAllowed = path.resolve(allowedPath);
return resolvedTarget.startsWith(resolvedAllowed);
});
}
private async moveFile(from: string, to: string): Promise<void> {
const fromPath = path.resolve(this.rootDir, from);
const toPath = path.resolve(this.rootDir, to);
await fs.rename(fromPath, toPath);
}
private async deleteFile(filePath: string): Promise<void> {
const fullPath = path.resolve(this.rootDir, filePath);
await fs.unlink(fullPath);
}
private async writeFile(filePath: string, content: string): Promise<void> {
const fullPath = path.resolve(this.rootDir, filePath);
// Ensure directory exists
const dir = path.dirname(fullPath);
await fs.mkdir(dir, { recursive: true });
await fs.writeFile(fullPath, content, 'utf-8');
}
}
class PermissionError extends Error {
constructor(message: string) {
super(message);
this.name = 'PermissionError';
}
}

Usage with scoped permissions:

src/agent/agent-config.ts
// Configuration example
const agentConfig = {
rootDir: process.cwd(),
allowedWritePaths: [
'src/utils', // Can write to utils
'src/components', // Can write to components
'tests' // Can write to tests
],
// Cannot write to:
// - package.json (root level)
// - src/index.js (entry point)
// - .env files
// - CI/CD configs
};
const scopedTools = new ScopedFileSystemTools(agentConfig);

Layer 3: Git-based rollback

src/agent/GitOperations.ts
import { execSync } from 'child_process';
export class GitOperations {
private readonly repoPath: string;
private initialCommit: string | null = null;
constructor(repoPath: string) {
this.repoPath = repoPath;
}
async initialize(): Promise<void> {
// Check if we're in a git repo
try {
const branch = execSync('git rev-parse --abbrev-ref HEAD', {
cwd: this.repoPath,
encoding: 'utf-8'
});
// Store initial commit for rollback
this.initialCommit = execSync('git rev-parse HEAD', {
cwd: this.repoPath,
encoding: 'utf-8'
}).trim();
console.log(`✓ Git repository detected (branch: ${branch.trim()})`);
} catch (error) {
throw new Error(
'Guarded agent requires a git repository. ' +
'Please initialize one first: git init'
);
}
}
async createCommit(message: string): Promise<string> {
// Stage all changes
try {
execSync('git add -A', { cwd: this.repoPath });
} catch (error) {
console.warn('Warning: Could not stage changes');
}
// Create commit
const commitHash = execSync(
`git commit -m "${message}" --no-verify`,
{
cwd: this.repoPath,
encoding: 'utf-8'
}
).trim();
return commitHash;
}
async resetToCommit(commitHash: string): Promise<void> {
execSync(`git reset --hard ${commitHash}`, {
cwd: this.repoPath,
encoding: 'utf-8'
});
console.log(`✓ Rolled back to ${commitHash.slice(0, 7)}`);
}
async rollbackOneCommit(): Promise<void> {
execSync('git reset --hard HEAD~1', {
cwd: this.repoPath,
encoding: 'utf-8'
});
console.log('✓ Rolled back one commit');
}
async rollbackToInitial(): Promise<void> {
if (!this.initialCommit) {
throw new Error('No initial commit stored');
}
await this.resetToCommit(this.initialCommit);
}
async getCurrentCommit(): Promise<string> {
return execSync('git rev-parse HEAD', {
cwd: this.repoPath,
encoding: 'utf-8'
}).trim();
}
async getCommitDiff(commitHash: string): Promise<string> {
return execSync(`git show ${commitHash} --stat`, {
cwd: this.repoPath,
encoding: 'utf-8'
});
}
}

Layer 4: Monitoring and anomaly detection

src/agent/AgentMonitor.ts
interface AgentAction {
timestamp: Date;
operation: string;
target: string;
agentId: string;
}
interface MonitorConfig {
maxActionsPerMinute: number;
maxDeletionsPerSession: number;
suspiciousPatterns: Array<(action: AgentAction, history: AgentAction[]) => boolean>;
}
export class AgentMonitor {
private actionHistory: AgentAction[] = [];
private deletionCount = 0;
private config: MonitorConfig;
constructor(config: Partial<MonitorConfig> = {}) {
this.config = {
maxActionsPerMinute: config.maxActionsPerMinute || 20,
maxDeletionsPerSession: config.maxDeletionsPerSession || 10,
suspiciousPatterns: config.suspiciousPatterns || []
};
}
logAction(action: AgentAction): void {
// Check rate limiting
if (this.isRateLimited(action)) {
throw new Error(
`Agent exceeded rate limit: ` +
`${this.getRecentActionCount(action.agentId)} actions in last minute`
);
}
// Check deletion threshold
if (action.operation === 'delete') {
this.deletionCount++;
if (this.deletionCount > this.config.maxDeletionsPerSession) {
throw new Error(
`Agent exceeded deletion threshold: ` +
`${this.deletionCount} deletions in session`
);
}
}
// Check for suspicious patterns
if (this.detectSuspiciousPattern(action)) {
console.warn(
`⚠️ Suspicious activity detected: ${action.operation} on ${action.target}`
);
throw new Error(
'Suspicious activity detected. Agent paused for human review.'
);
}
this.actionHistory.push(action);
}
private isRateLimited(action: AgentAction): boolean {
const oneMinuteAgo = new Date(Date.now() - 60 * 1000);
const recentActions = this.actionHistory.filter(
a => a.timestamp > oneMinuteAgo && a.agentId === action.agentId
);
return recentActions.length >= this.config.maxActionsPerMinute;
}
private getRecentActionCount(agentId: string): number {
const oneMinuteAgo = new Date(Date.now() - 60 * 1000);
return this.actionHistory.filter(
a => a.timestamp > oneMinuteAgo && a.agentId === agentId
).length;
}
private detectSuspiciousPattern(action: AgentAction): boolean {
// Pattern 1: Deleting critical system files
if (this.isCriticalFile(action.target)) {
return true;
}
// Pattern 2: Rapid file operations across unrelated directories
const recentTargets = this.actionHistory
.slice(-5)
.filter(a => a.agentId === action.agentId)
.map(a => a.target);
const directories = new Set(
recentTargets.map(t => path.dirname(t))
);
if (directories.size > 3) {
console.warn('Agent operating in multiple unrelated directories');
return true;
}
// Run custom suspicious pattern checks
for (const patternCheck of this.config.suspiciousPatterns) {
if (patternCheck(action, this.actionHistory)) {
return true;
}
}
return false;
}
private isCriticalFile(filePath: string): boolean {
const criticalPatterns = [
'package-lock.json',
'yarn.lock',
'package.json',
'.env',
'.git',
'node_modules',
'docker-compose.yml',
'Dockerfile'
];
const normalizedPath = filePath.toLowerCase();
return criticalPatterns.some(pattern =>
normalizedPath.includes(pattern.toLowerCase())
);
}
getStats(): { totalActions: number; deletions: number; agents: number } {
const uniqueAgents = new Set(this.actionHistory.map(a => a.agentId));
return {
totalActions: this.actionHistory.length,
deletions: this.deletionCount,
agents: uniqueAgents.size
};
}
}

Using the complete guardrail system

src/agent/index.ts
import { GuardedAgent } from './GuardedAgent';
import { ScopedFileSystemTools } from './ScopedFileSystemTools';
import { GitOperations } from './GitOperations';
import { DiffGenerator } from './DiffGenerator';
import { AgentMonitor } from './AgentMonitor';
async function main() {
const agentConfig = {
rootDir: process.cwd(),
allowedWritePaths: [
'src/utils',
'src/components',
'tests'
]
};
const git = new GitOperations(process.cwd());
await git.initialize();
const fsTools = new ScopedFileSystemTools(agentConfig);
const diffGenerator = new DiffGenerator();
const monitor = new AgentMonitor({
maxActionsPerMinute: 20,
maxDeletionsPerSession: 10
});
const agent = new GuardedAgent();
agent.setFileSystemTools(fsTools);
agent.setGitOperations(git);
agent.setDiffGenerator(diffGenerator);
agent.setMonitor(monitor);
try {
await agent.executeTask('Reorganize utils folder for better structure');
} catch (error) {
console.error('Task failed:', error.message);
process.exit(1);
}
}
main();

Now when I run the agent, I get:

Terminal window
user@host:~/project$ npm start
Git repository detected (branch: main)
Created checkpoint: a1b2c3d
=== DETAILED PREVIEW ===
Total operations: 6
MOVE: src/utils/logger.js
To: src/utils/logging/logger.js
------------------------------------------------------------
Will move file and update imports in:
- src/index.js
- src/middleware/logger.js
MOVE: src/utils/parser.js
To: src/utils/parsing/parser.js
------------------------------------------------------------
Will move file and update imports in:
- src/routes/api.js
MOVE: src/utils/validator.js
To: src/utils/validation/validator.js
------------------------------------------------------------
Will move file and update imports in:
- src/controllers/user.js
- src/controllers/auth.js
Execute these actions? (yes/no) > yes
Moved src/utils/logger.js src/utils/logging/logger.js
Updated imports in src/index.js
Updated imports in src/middleware/logger.js
=== CHECKPOINT REACHED ===
Operations since last checkpoint: 3
Operations summary:
1. move: src/utils/logging/logger.js
2. update-imports: src/index.js
3. update-imports: src/middleware/logger.js
Options: (c)ontinue | (r)ollback | (a)bort: > c
Checkpoint created: d4e5f6g
Moved src/utils/parser.js src/utils/parsing/parser.js
Moved src/utils/validator.js src/utils/validation/validator.js
Task completed successfully

The reason

I think the key reason for the original disaster was the “genie wish problem” - the AI agent interpreted my instruction literally without understanding project context. When I asked it to “reorganize the utils folder,” it:

  1. Moved files without awareness of dependencies
  2. Didn’t update imports because it only thought about the file system structure
  3. Had no safety mechanisms to preview changes or rollback

The guardrail system solves this by adding human validation at critical decision points. Instead of giving the agent unchecked autonomy, I created a “sandbox with human oversight” model where:

  • Dry-run previews show exactly what will change before execution
  • Scoped permissions restrict write access to safe directories
  • Confirmation checkpoints prevent runaway automation
  • Git-based rollback makes mistakes easily reversible
  • Monitoring detects and blocks suspicious behavior patterns

This is the same principle behind safety features in self-driving cars or medical devices - technical capability needs to be paired with safeguards for real-world utility.

Summary

In this post, I showed how to add guardrails to autonomous AI agents to prevent catastrophic damage. The key point is implementing a multi-layered safety framework with:

  1. Pre-action dry-runs that show exactly what will change with diff previews
  2. Scoped permissions that restrict write access to specific directories
  3. Human checkpoints every 3-5 operations requiring confirmation
  4. Git-based rollback for one-click recovery from mistakes
  5. Continuous monitoring that detects and blocks suspicious patterns

The Reddit community consensus was clear: without these safety controls, “AI autonomy” is dangerous. With proper guardrails, developers can confidently delegate repetitive tasks to agents while maintaining human oversight for critical decisions.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments