Skip to content

What Is Execution Trust in AI Systems? Why Models Fail in Production (And How to Fix It)

Problem

I deployed an AI agent that worked perfectly in demos. In production, it started deleting customer records. The logs showed:

Terminal window
# Production incident
ERROR: AI agent deleted production records
# Model output looked correct: "Clean up old test records"
# But context was wrong: agent was in production, not test environment
# Nobody could prove what changed or why

The model wasn’t wrong - it did what it was told. But we had no execution trust. We trusted the model to behave correctly instead of constraining what actually happens.

What Is Execution Trust?

Execution trust is your confidence that an AI system’s actions are:

  • Authorized: Should this action even run?
  • Traceable: Can you prove what happened and why?
  • Verifiable: Did the action produce the expected state?

Enterprise AI has an 80% failure rate. The models themselves are often fine - they generate plausible outputs, follow instructions, and pass unit tests. But when deployed to production, systems break down because nobody can prove what changed, validate that actions should run, or debug failures.

The Trap: Trusting Model Behavior

I used to think the solution was better prompts:

naive-approach.py
# WRONG: Trusting the model to follow instructions
class NaiveAgent:
SYSTEM_PROMPT = """
You are a helpful assistant.
IMPORTANT: Never delete production data.
Always check the environment before acting.
Follow the runbook carefully.
"""
async def process(self, user_input: str):
response = await self.llm.generate(self.SYSTEM_PROMPT + user_input)
# Execute whatever the model outputs
return await self.execute(response.action)

This approach has a fatal flaw: I’m still trusting the model to interpret instructions correctly. Prompts are suggestions, not guarantees.

The Solution: Propose-Enforce-Verify Pattern

The pattern that works shifts from “trust the model to follow rules” to “constrain and verify what actually happens”:

Step 1: Propose - Let the AI Suggest Actions

propose-enforce-verify.py
from dataclasses import dataclass
from typing import Optional
from enum import Enum
class ActionType(Enum):
READ = "read"
WRITE = "write"
DELETE = "delete"
CALL_API = "call_api"
@dataclass
class ActionProposal:
action_type: ActionType
target: str
parameters: dict
expected_outcome: str
reasoning: str
class ExecutionTrustEngine:
"""Core engine implementing propose-enforce-verify pattern."""
def __init__(self, action_policy, state_verifier, audit_logger):
self.policy = action_policy
self.verifier = state_verifier
self.audit = audit_logger
def propose(self, model, context: dict) -> ActionProposal:
"""Step 1: Let model propose an action - no execution yet."""
proposal = model.suggest_action(context)
self.audit.log_proposal(proposal)
return proposal

The model outputs structured action proposals with parameters, targets, and expected outcomes. No execution yet - this is just planning.

Step 2: Enforce - Validate Before Execution

enforce.py
@dataclass
class EnforcementResult:
allowed: bool
reason: str
constraints_applied: list[str]
class DatabaseActionPolicy:
ALLOWED_ACTIONS = {
"read_only_user": [ActionType.READ],
"read_write_user": [ActionType.READ, ActionType.WRITE],
"admin_user": [ActionType.READ, ActionType.WRITE, ActionType.DELETE],
}
PROTECTED_TABLES = ["users", "audit_logs", "system_config"]
def is_action_allowed(self, action_type: ActionType, context: dict) -> bool:
user_role = context.get("user_role")
return action_type in self.ALLOWED_ACTIONS.get(user_role, [])
def check_violations(self, proposal: ActionProposal, constraints: dict) -> list:
violations = []
if proposal.target in self.PROTECTED_TABLES:
if proposal.action_type == ActionType.DELETE:
violations.append("Cannot delete from protected table")
if constraints.get("requires_approval") and not constraints.get("approval_token"):
violations.append("Action requires approval token")
return violations
def enforce(self, proposal: ActionProposal, context: dict) -> EnforcementResult:
"""Step 2: Validate action before execution."""
# Check if action type is allowed for this context
if not self.policy.is_action_allowed(proposal.action_type, context):
return EnforcementResult(
allowed=False,
reason=f"Action {proposal.action_type} not permitted",
constraints_applied=[]
)
# Check resource-level constraints
constraints = self.policy.get_constraints(proposal.target)
violations = self.policy.check_violations(proposal, constraints)
if violations:
return EnforcementResult(
allowed=False,
reason=f"Constraint violations: {violations}",
constraints_applied=constraints
)
return EnforcementResult(
allowed=True,
reason="All checks passed",
constraints_applied=constraints
)

Now when the AI proposes a dangerous action:

# AI proposes: DELETE from users table
proposal = ActionProposal(
action_type=ActionType.DELETE,
target="users",
parameters={"where": "id = 1"},
expected_outcome="User deleted"
)
# Enforcement blocks it automatically
result = enforce(proposal, context)
# EnforcementResult(
# allowed=False,
# reason="Cannot delete from protected table",
# constraints_applied=["protected_tables", "role_permissions"]
# )

Step 3: Verify - Confirm the Outcome

verify.py
@dataclass
class VerificationResult:
success: bool
actual_outcome: str
state_before: dict
state_after: dict
delta: dict
class APIStateVerifier:
"""Verify state changes for API actions."""
def capture_state(self, target: str) -> dict:
return {
"external_systems": self._fetch_external_state(),
"internal_state": self._fetch_internal_state(target),
"timestamp": datetime.utcnow().isoformat()
}
def matches_expectation(self, expected: str, actual: str) -> bool:
# Use semantic matching, not exact string match
return self._semantic_similarity(expected, actual) > 0.85
def compute_delta(self, before: dict, after: dict) -> dict:
return {
"external_changes": self._diff_external(before, after),
"internal_changes": self._diff_internal(before, after),
"latency_ms": self._compute_latency(before, after)
}
def verify(self, proposal: ActionProposal, pre_state: dict, post_state: dict) -> VerificationResult:
"""Step 3: Confirm outcome matches expectation."""
actual_outcome = self.verifier.describe_outcome(pre_state, post_state)
delta = self.verifier.compute_delta(pre_state, post_state)
success = self.verifier.matches_expectation(
expected=proposal.expected_outcome,
actual=actual_outcome
)
return VerificationResult(
success=success,
actual_outcome=actual_outcome,
state_before=pre_state,
state_after=post_state,
delta=delta
)

Complete Execution with Trust

execution-trust-engine.py
class ExecutionTrustEngine:
"""Full execution with trust guarantees."""
def execute_with_trust(self, model, context: dict, executor) -> Optional[dict]:
# Step 1: Propose
proposal = self.propose(model, context)
# Step 2: Enforce
enforcement = self.enforce(proposal, context)
if not enforcement.allowed:
self.audit.log_denied(proposal, enforcement)
return {"status": "denied", "reason": enforcement.reason}
# Capture pre-state
pre_state = self.verifier.capture_state(proposal.target)
# Execute
try:
execution_result = executor.execute(proposal)
self.audit.log_execution(proposal, execution_result)
except Exception as e:
self.audit.log_failure(proposal, str(e))
return {"status": "failed", "error": str(e)}
# Step 3: Verify
post_state = self.verifier.capture_state(proposal.target)
verification = self.verify(proposal, pre_state, post_state)
if not verification.success:
self.audit.log_mismatch(proposal, verification)
return {
"status": "mismatch",
"expected": proposal.expected_outcome,
"actual": verification.actual_outcome
}
self.audit.log_success(proposal, verification)
return {
"status": "success",
"outcome": verification.actual_outcome,
"trace_id": self.audit.get_trace_id()
}

Why This Matters

Observability becomes debugging: Instead of asking “why did the model output this?”, I ask “what action was proposed, what was enforced, what was verified?” - answerable questions with audit trails.

Failures become actionable:

  • “Action failed enforcement check: attempted to delete production database” is debuggable
  • “AI did something weird” is not

Compliance becomes provable: I can demonstrate exactly what my AI did, when, and why - not just what it claimed to do.

Scaling becomes safe: Adding more agents doesn’t multiply chaos because each action goes through the same enforcement layer.

Common Mistakes I Made

Mistake 1: Relying on Prompt Engineering Alone

mistake-prompt.py
# WRONG: Prompts are suggestions, not guarantees
SYSTEM_PROMPT = """
NEVER delete data from the users table.
ALWAYS check environment before acting.
"""

“We told the AI not to delete data” is not execution trust. Prompts are suggestions, constraints are guarantees.

Mistake 2: Logging Outputs Instead of Actions

mistake-logging.py
# WRONG: This captures conversation, not execution
logger.info(f"AI suggested: {response.text}")
# CORRECT: This captures what actually happened
logger.info({
"trace_id": trace_id,
"action_type": action.type,
"target": action.target,
"parameters": action.parameters,
"state_before": pre_state,
"state_after": post_state,
"timestamp": datetime.utcnow()
})

Mistake 3: Trusting Documentation Reading

“The AI reads our runbook” still trusts the model to interpret and follow correctly. Enforcement must be code, not documentation.

Mistake 4: Skipping the Verify Step

If I only propose and enforce, I know what should happen, not what did happen. State verification catches race conditions, partial failures, and unexpected side effects.

Mistake 5: Testing Model Behavior Instead of System Behavior

test-approach.py
# WRONG: Tests that model outputs correct action
def test_model_outputs_delete():
response = model.generate("clean up old records")
assert response.action == "delete"
# CORRECT: Tests that action executed correctly in actual system
def test_delete_action_enforced():
proposal = ActionProposal(
action_type=ActionType.DELETE,
target="production.users",
parameters={"where": "id = 1"}
)
result = engine.enforce(proposal, {"environment": "production"})
assert result.allowed == False
assert "protected table" in result.reason

Summary

Execution trust is the missing layer between AI models and production systems. I learned this the hard way when my demo agent started deleting production records. Now I implement the propose-enforce-verify pattern:

  1. Propose: Let the model suggest actions, but don’t execute them yet
  2. Enforce: Validate authorization, constraints, and context before execution
  3. Verify: Capture state before and after, confirm the outcome matches expectations

This transforms “AI did something weird” into “action X was blocked at enforcement” or “action Y produced unexpected state Z” - debuggable failures with audit trails.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments