AI Agent Deleted Production Data: 5 Safeguards I Wish I Had Implemented
Problem
A Reddit user shared a disturbing story:
“Given an AI in production recently lost some data while helping with a migration…”
I felt that in my gut. Because I had been there.
Last month, my AI agent ran a database migration script. It worked perfectly in staging. In production? It deleted 2,000 rows before I could stop it. The agent had “optimized” a query by removing what it thought was redundant logic.
That was not a bug in the AI. That was a missing safeguard in my system.
Another comment hit harder:
“What happens when AI can’t solve issues it creates?”
Exactly. An AI agent that deletes data cannot undo that deletion. Humans must be in the loop before damage occurs, not after.
Environment
- LangGraph 0.2 (interrupts, checkpointing)
- Python 3.11
- OpenAI Moderation API
- PostgreSQL (production database)
What I learned the hard way
After the incident, I realized my approach to AI agents was fundamentally flawed. I treated them like autonomous workers. But they are not workers; they are powerful tools that need guardrails.
I call this the “faith-driven development” problem. I trusted the AI would do the right thing. I was wrong.
Here are the 5 layers of safeguards I implemented afterward.
Layer 1: Human-in-the-Loop Interrupts
The most critical safeguard is requiring human approval before destructive operations. LangGraph’s interrupt() function pauses execution and surfaces the pending action to a human reviewer.
When to use interrupts:
- Database schema changes (DROP, ALTER, TRUNCATE)
- Production deployments
- File deletions
- API calls with irreversible effects
- High-value transactions
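```python
from langchain.tools import tool
from langgraph.types import interrupt


@tool
def execute_migration(sql: str, environment: str) -> str:
    """Execute a database migration."""
    if environment == "production":
        # Pause the graph until a human responds (the graph needs a checkpointer)
        response = interrupt({
            "action": "migration",
            "sql": sql,
            "environment": environment,
            "message": "Approve production migration?",
            "risk": "HIGH - irreversible data changes",
        })

        # Fail closed: anything other than an explicit approval cancels
        if not isinstance(response, dict) or response.get("action") != "approve":
            return "Migration cancelled by human"

    # run_migration is the application's own helper (not shown)
    return run_migration(sql)
```

The key insight: interrupt() is not a polite request the agent can ignore. It halts the graph at that point and persists its state through the checkpointer, which is why the graph must be compiled with one. Nothing runs until a human resumes the thread, for example with graph.invoke(Command(resume={"action": "approve"}), config), so the agent cannot proceed without an explicit decision. One caveat: on resume, LangGraph re-runs the node from its start, so side effects belong after the interrupt() call, as they are here.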
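The payload above tells the reviewer which SQL will run, but not what it will do. One way to make approvals better informed, sketched here with Python's built-in sqlite3 standing in for PostgreSQL, is a dry run: execute the statement inside a transaction, record how many rows it changed, and roll back before asking. The helpers preview_affected_rows and build_approval_request are hypothetical, not part of LangGraph; the dict they return is the kind of thing you could pass to interrupt() as its payload.

```python
import sqlite3


def preview_affected_rows(conn: sqlite3.Connection, sql: str) -> int:
    """Execute one statement inside a transaction, count changed rows, roll back.

    Assumes conn was opened with isolation_level=None, so this function
    controls BEGIN/ROLLBACK itself instead of sqlite3's implicit transactions.
    """
    conn.execute("BEGIN")
    try:
        # rowcount is -1 for statements other than INSERT/UPDATE/DELETE/REPLACE
        return conn.execute(sql).rowcount
    finally:
        conn.execute("ROLLBACK")  # nothing from the dry run is kept


def build_approval_request(conn: sqlite3.Connection, sql: str, environment: str) -> dict:
    """Hypothetical interrupt() payload enriched with a dry-run row count."""
    affected = preview_affected_rows(conn, sql)
    impact = f"{affected} row(s)" if affected >= 0 else "an unknown number of rows"
    return {
        "action": "migration",
        "sql": sql,
        "environment": environment,
        "affected_rows": affected,
        "message": f"Approve production migration? It would change {impact}.",
    }
```

A reviewer who sees "It would change 2000 row(s)" on what was supposed to be a small cleanup has a concrete reason to reject. The idea carries over to PostgreSQL, where most DDL is transactional too, but a dry run still executes the statement and takes the same locks, so it is safer against a restored copy of production than on the live primary.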
Layer 2: Content and Action Guardrails
Guardrails act as a filter between user input, agent reasoning, and final output. They help block:
- Malicious prompts (injections, jailbreaks)
- Harmful content generation
- Off-topic requests
- Unauthorized command execution
I implemented a middleware layer that blocks dangerous SQL commands:
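```python
from typing import Any

from langchain.agents.middleware import AgentMiddleware, AgentState, hook_config
from langgraph.runtime import Runtime


class SQLSafetyMiddleware(AgentMiddleware):
    """Block dangerous SQL operations before the agent runs."""

    DANGEROUS_COMMANDS = [
        "DROP TABLE", "DROP DATABASE", "DELETE FROM",
        "TRUNCATE", "ALTER TABLE", "GRANT ALL",
    ]

    # Declare that this hook may end the run early via "jump_to"
    @hook_config(can_jump_to=["end"])
    def before_agent(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        for msg in state.get("messages", []):
            content = str(msg.content).upper()
            for cmd in self.DANGEROUS_COMMANDS:
                if cmd in content:
                    return {
                        "messages": [{
                            "role": "assistant",
                            "content": f"BLOCKED: '{cmd}' requires human approval.",
                        }],
                        # Without this, the agent would simply continue
                        "jump_to": "end",
                    }
        return None
```

before_agent runs once per invocation, before the first model call, so it screens the incoming request. SQL that the model writes later arrives as tool-call arguments and needs its own check (for example in a wrap_tool_call hook), with the Layer 1 interrupt as the final backstop.

NVIDIA NeMo Guardrails provides specialized microservices: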
- ContentSafety: Filters harmful/biased outputs
- TopicControl: Keeps conversations on approved topics
- JailbreakDetect: Prevents prompt injection attacks
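The middleware's plain substring test is easy to slip past: "DROP  TABLE" with two spaces, a newline between the keywords, or an inline comment such as "DROP/**/TABLE" all evade it. A small hardening step, sketched below as hypothetical helpers (nothing here comes from LangChain or NeMo Guardrails), is to collapse whitespace and strip SQL comments before matching on word boundaries.

```python
import re

DANGEROUS_COMMANDS = [
    "DROP TABLE", "DROP DATABASE", "DELETE FROM",
    "TRUNCATE", "ALTER TABLE", "GRANT ALL",
]

# Block comments (/* ... */) and line comments (-- to end of line)
_COMMENT_RE = re.compile(r"/\*.*?\*/|--[^\n]*", re.DOTALL)


def normalize_sql(text: str, strip_comments: bool = True) -> str:
    """Uppercase and collapse whitespace, optionally replacing comments with spaces."""
    if strip_comments:
        text = _COMMENT_RE.sub(" ", text)
    return " ".join(text.upper().split())


def find_dangerous(text: str) -> list[str]:
    """Return each dangerous command found on whole-word boundaries."""
    # Scan both with and without comment stripping, so a "--" inside a
    # string literal cannot hide the statement that follows it
    variants = [normalize_sql(text, True), normalize_sql(text, False)]
    return [
        cmd for cmd in DANGEROUS_COMMANDS
        if any(re.search(rf"\b{re.escape(cmd)}\b", v) for v in variants)
    ]
```

For example, find_dangerous("drop/**/table users") returns ["DROP TABLE"], while a harmless "SELECT * FROM dropdown_tables" returns an empty list. It is still a denylist, and denylists leak (SQL assembled from string fragments will not match), so the Layer 1 interrupt remains the real backstop.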
Layer 3: Durable Execution and Checkpointing
When agents perform long-running tasks, a failure partway through can leave work in an inconsistent state. Durable execution ensures:
- Completed steps are never re-executed
- Failed workflows resume from the last checkpoint
- Human reviewers can inspect state at any point
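```python
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.func import entrypoint, task


# Placeholder task bodies; each @task result is saved in the checkpoint
@task
def create_backup(migration_id: str) -> str:
    return f"backup-{migration_id}"  # e.g. pg_dump the affected tables

@task
def validate_backup(migration_id: str) -> bool:
    return True  # e.g. restore into a scratch database and compare counts

@task
def run_migration(migration_id: str) -> str:
    return f"migrated-{migration_id}"  # apply the migration SQL


@entrypoint(checkpointer=InMemorySaver())
def migration_pipeline(migration_id: str) -> dict:
    """Migration pipeline with automatic recovery."""
    backup_result = create_backup(migration_id).result()
    validate_result = validate_backup(migration_id).result()

    # On resume the function re-runs, but completed tasks return their
    # checkpointed results, so the backup is not executed again
    migrate_result = run_migration(migration_id).result()

    return {
        "backup": backup_result,
        "validate": validate_result,
        "migrate": migrate_result,
    }
```

The checkpoint pattern has saved me twice already. When a migration failed halfway, I could inspect the checkpoint to see exactly what had completed, then resume without re-running the backup. Every run needs a thread_id in its config, and resuming means invoking the pipeline again with None on the same thread, for example migration_pipeline.invoke(None, config={"configurable": {"thread_id": "migration-42"}}). Keep in mind that InMemorySaver lives only in process memory, so a crashed process takes its checkpoints with it; for real migrations a persistent checkpointer such as PostgresSaver is the safer choice.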
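To make the "completed steps are never re-executed" guarantee concrete, here is a stdlib-only sketch of the idea behind it, not LangGraph's implementation: each step's result is written to a JSON checkpoint file as soon as it finishes, and a rerun returns the saved result instead of calling the step again. StepCheckpointer and run_step are hypothetical names.

```python
import json
from pathlib import Path
from typing import Any, Callable


class StepCheckpointer:
    """Persist each completed step's result so a rerun can skip that step."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load results saved by earlier runs, if any
        self.done: dict[str, Any] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    def run_step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name in self.done:
            return self.done[name]  # finished in an earlier run: reuse result
        result = fn()  # if this raises, nothing is recorded for the step
        self.done[name] = result
        # Rewrite the checkpoint after every step (results must be JSON-serializable)
        self.path.write_text(json.dumps(self.done), encoding="utf-8")
        return result
```

If a run dies after "backup" but before "migrate", a fresh StepCheckpointer on the same file returns the stored backup result and only executes the migration. A real engine also scopes checkpoints per run (LangGraph uses the thread_id for this) and writes them atomically; this sketch does neither.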
Layer 4: Permission Scoping and Tool Restrictions
Not all agents need all permissions. I now limit agent capabilities by role:
```text
code_reviewer             devops_assistant
┌────────────────┐        ┌────────────────┐
│ read_file      │        │ deploy_staging │
│ post_comment   │        │ rollback       │
│ create_pr      │        │ check_logs     │
│                │        │                │
│ NO WRITE       │        │ NO PROD        │
│ NO DEPLOY      │        │ NEEDS APPROVAL │
└────────────────┘        └────────────────┘
```

```python
AGENT_PERMISSIONS = {
    "code_reviewer": {
        "tools": ["read_file", "post_comment", "create_pr"],
        "restrictions": ["no_write_access", "no_deploy"],
    },
    "devops_assistant": {
        "tools": ["deploy_staging", "rollback", "check_logs"],
        "restrictions": ["production_requires_approval"],
    },
}


def check_permission(agent_role: str, tool: str, params: dict) -> bool:
    # Deny by default: unknown roles and unlisted tools are rejected
    perms = AGENT_PERMISSIONS.get(agent_role, {})
    if tool not in perms.get("tools", []):
        return False

    # violates_restriction is an app-specific check per restriction (not shown)
    for restriction in perms.get("restrictions", []):
        if violates_restriction(restriction, params):
            return False

    return True
```

Layer 5: Audit Logging and Observability
Every agent action must be logged. After my incident, I spent 4 hours trying to reconstruct what happened. Now every action leaves an audit trail.
Key events to log:
- Tool invocations with parameters
- Decision reasoning chains
- Human intervention points
- Errors and recovery actions
```python
import json
from datetime import datetime, timezone


def log_agent_action(
    agent_id: str,
    action: str,
    params: dict,
    result: str,
    human_approved: bool = False,
) -> None:
    log_entry = {
        # Timezone-aware UTC timestamp (datetime.utcnow() is naive and deprecated)
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "action": action,
        "params": params,
        "result": result,
        "human_approved": human_approved,
    }
    # Append one JSON record per line. Mode "a" only appends from this
    # process; it does not make the file tamper-proof.
    with open("agent_audit.log", "a", encoding="utf-8") as f:
        f.write(json.dumps(log_entry, default=str) + "\n")
```

This has already helped me debug two “why did the agent do that?” situations.
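Calling log_agent_action by hand at every call site is easy to forget. A hypothetical decorator, sketched below with the standard library only, wraps a tool function so every invocation is logged with its arguments and either its result or the exception it raised, covering the "tool invocations" and "errors" items from the list above. The audited name and the record layout are assumptions, not an existing API.

```python
import functools
import json
from datetime import datetime, timezone
from typing import Any, Callable


def audited(agent_id: str, log_path: str) -> Callable:
    """Decorator factory: log every call of the wrapped tool as one JSON line."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            entry = {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "agent_id": agent_id,
                "action": fn.__name__,
                "params": {"args": args, "kwargs": kwargs},
            }
            try:
                result = fn(*args, **kwargs)
                entry.update(status="ok", result=result)
                return result
            except Exception as exc:
                entry.update(status="error", error=repr(exc))
                raise  # record the failure, but never swallow it
            finally:
                # Runs on both success and failure, so no call goes unlogged
                with open(log_path, "a", encoding="utf-8") as f:
                    f.write(json.dumps(entry, default=str) + "\n")

        return wrapper

    return decorator
```

Decorating each tool once (for example @audited("devops-agent", "agent_audit.log") above def rollback(...)) means a new tool cannot quietly skip the audit trail.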
Putting it all together
Here is the mental model I now use:
```text
User Input
    ↓
[1] Content Guardrails ──── Block malicious input
    ↓
[2] Permission Check ────── Block unauthorized tools
    ↓
Agent Processing
    ↓
[3] HITL Interrupt ──────── Pause for human approval
    ↓
[4] Durable Execution ───── Checkpoint for recovery
    ↓
[5] Audit Log ───────────── Record everything
    ↓
Output
```

What I would do differently
If I could go back before that migration incident:
- Start with interrupts: add HITL approval for any operation that touches production
- Log everything: you cannot debug what you cannot see
- Scope permissions tightly: an agent that cannot delete data cannot accidentally delete data
- Test failure scenarios: what happens when the agent fails mid-task? What gets rolled back?
- Create rollback procedures: every agent-initiated change needs a documented undo path
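The last two points go together: if every change the agent makes is registered with its undo step, a failure mid-task can unwind whatever already ran. Below is a minimal stdlib sketch of that idea, usually called compensating actions (as in the saga pattern). UndoLog and the example steps are hypothetical; for a real database an undo step might be restoring from the backup taken in Layer 3.

```python
from typing import Any, Callable


class UndoLog:
    """Record an undo action for each completed change; unwind on failure."""

    def __init__(self) -> None:
        self._undos: list[tuple[str, Callable[[], None]]] = []

    def apply(self, name: str, do: Callable[[], Any], undo: Callable[[], None]) -> Any:
        result = do()
        # Register the undo only after the change actually succeeded
        self._undos.append((name, undo))
        return result

    def rollback(self) -> list[str]:
        """Run undo actions newest-first and return the names undone."""
        undone = []
        while self._undos:
            name, undo = self._undos.pop()
            undo()
            undone.append(name)
        return undone
```

Undoing newest-first matters because later changes often depend on earlier ones (a backfill on a column that was just added). Undo steps can fail too, so in practice they should be idempotent and their outcomes logged.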
Summary
In this post, I explained how to reduce the risk of AI agents causing production incidents. The key insight is that AI agents need defense in depth: multiple safeguards that catch failures at different stages.
Five layers work together:
- Human-in-the-loop interrupts for destructive operations
- Content guardrails to filter dangerous commands
- Durable execution to recover from failures
- Permission scoping to limit blast radius
- Audit logging to understand what happened
The goal is not eliminating agent autonomy. The goal is ensuring that when things go wrong (and they will), the blast radius is contained and recovery is possible.
I learned this the hard way. Do not make my mistake.
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to reach me, you can contact me by email: Email me
Here are the most important links from this article, along with some further resources on this topic:
- 👨💻 LangGraph Human-in-the-Loop Patterns
- 👨💻 NVIDIA NeMo Guardrails Documentation
- 👨💻 OpenAI Moderation API Guide
- 👨💻 Reddit: AI Employees in my company
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!