How to Build AI Agents with Governance and Modular Architecture

Purpose

This post shows how to use governance and a modular architecture to build AI agents that stay on track in production.

Problem

I’ve been working on AI agents for production use, and I noticed a problem in the community. Many “agent” frameworks are just automation wrapped in AI buzzwords. They run tasks or handle webhooks but don’t have actual governance. When these agents make decisions in production, there’s no oversight.

Here’s a basic agent without governance:

agent-without-governance.ts
class BasicAgent {
  async onMessage(input: string) {
    // Direct action without validation or oversight
    const response = await this.llm.generate(input);
    await this.executeAction(response.action);
    // No logging, no constraints, no oversight
  }
}

This works in demos, but it is dangerous in production. The agent can:

  • Take harmful actions
  • Exceed cost limits
  • Make decisions that violate business rules
  • Fail without any record of what happened

Environment

  • TypeScript 5.3
  • Node.js 20
  • PostgreSQL for audit logs
  • Redis for circuit breaker state

Solution

I implemented a modular governance architecture with four layers:

  1. Policy Engine - Validates actions before execution
  2. State Management - Immutable audit trail
  3. Constraint System - Resource limits and safety checks
  4. Monitoring & Control - Circuit breakers and manual overrides

Here’s the governed agent:

governed-agent.ts
interface GovernancePolicy {
  validate(action: Action): Promise<PolicyResult>;
}

interface AuditLog {
  logDecision(decision: Decision): Promise<void>;
}

class GovernedAgent {
  constructor(
    // llm and actionExecutor are used below, so they are injected here;
    // their structural types are assumptions
    private llm: { generate(input: string): Promise<Action> },
    private actionExecutor: { execute(action: Action): Promise<unknown> },
    private policyEngine: GovernancePolicy,
    private auditLog: AuditLog,
    private circuitBreaker: CircuitBreaker
  ) {}

  async onMessage(input: string) {
    // Step 1: Generate candidate action
    const candidate = await this.llm.generate(input);

    // Step 2: Policy validation (before execution)
    const policyCheck = await this.policyEngine.validate(candidate);
    if (!policyCheck.allowed) {
      await this.auditLog.logDecision({
        action: candidate,
        allowed: false,
        blocked: true,
        reason: policyCheck.reason,
        timestamp: new Date()
      });
      // fallbackResponse() is not shown in this post
      return this.fallbackResponse();
    }

    // Step 3: Constraint check
    if (!this.circuitBreaker.canExecute()) {
      throw new Error("Circuit breaker open - too many failures");
    }

    // Step 4: Execute with monitoring
    const result = await this.executeAction(candidate);

    // Step 5: Audit trail
    await this.auditLog.logDecision({
      action: candidate,
      result: result,
      timestamp: new Date(),
      allowed: true
    });
    return result;
  }

  private async executeAction(action: Action) {
    try {
      const result = await this.actionExecutor.execute(action);
      // Reset the breaker after a successful call
      this.circuitBreaker.recordSuccess();
      return result;
    } catch (error) {
      this.circuitBreaker.recordFailure();
      throw error;
    }
  }
}

Policy Engine

The policy engine checks every action before execution:

policy-engine.ts
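class RuleBasedPolicy implements GovernancePolicy {
  constructor(
    private rules: PolicyRule[],
    // The fields below are used by validate(), so they are declared here;
    // the validator's structural type is an assumption
    private maxCostPerAction: number,
    private allowedActions: string[],
    private safetyValidator: {
      validate(action: Action): Promise<{ safe: boolean; reason?: string }>;
    }
  ) {}

  async validate(action: Action): Promise<PolicyResult> {
    // Check resource limits
    if (action.estimatedCost > this.maxCostPerAction) {
      return { allowed: false, reason: "Cost limit exceeded" };
    }

    // Check allowed actions
    if (!this.allowedActions.includes(action.type)) {
      return { allowed: false, reason: "Action type not allowed" };
    }

    // Check safety constraints (async, so it runs after the cheap checks)
    const safetyCheck = await this.safetyValidator.validate(action);
    if (!safetyCheck.safe) {
      return { allowed: false, reason: safetyCheck.reason };
    }

    return { allowed: true };
  }
}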

The key parts are:

  • Cost limits - Prevent runaway API bills
  • Allowed actions - Whitelist of actions the agent can take
  • Safety checks - Validate parameters against business rules
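
The `PolicyRule` type passed to the constructor is not defined in this post. One way to picture it, as a minimal sketch under assumed names, is a function that inspects an action and returns either a block reason or nothing. Everything below (`Action`'s fields, `PolicyRule`, `maxCost`, `allowList`, `evaluateRules`, and the sample action types) is hypothetical and only illustrates the first-rule-that-blocks-wins idea.

```typescript
// Hypothetical shapes for illustration; the post does not define Action or PolicyRule.
interface Action {
  type: string;
  estimatedCost: number;
}

interface PolicyResult {
  allowed: boolean;
  reason?: string;
}

// A rule returns a block reason, or null when it has no objection.
type PolicyRule = (action: Action) => string | null;

// Rule factory: block actions whose estimated cost exceeds a per-action cap.
const maxCost = (limit: number): PolicyRule => (action) =>
  action.estimatedCost > limit ? "Cost limit exceeded" : null;

// Rule factory: block action types that are not on the allowlist.
const allowList = (types: string[]): PolicyRule => (action) =>
  types.includes(action.type) ? null : "Action type not allowed";

// Evaluate rules in order and stop at the first one that blocks.
function evaluateRules(ruleSet: PolicyRule[], action: Action): PolicyResult {
  for (const rule of ruleSet) {
    const reason = rule(action);
    if (reason !== null) {
      return { allowed: false, reason };
    }
  }
  return { allowed: true };
}

const ruleSet: PolicyRule[] = [
  maxCost(0.5),
  allowList(["send_email", "create_ticket"]),
];

const cheapEmail = evaluateRules(ruleSet, { type: "send_email", estimatedCost: 0.1 });
const pricyEmail = evaluateRules(ruleSet, { type: "send_email", estimatedCost: 2 });
const dropTable = evaluateRules(ruleSet, { type: "drop_table", estimatedCost: 0.1 });
```

Here `cheapEmail` passes both rules, `pricyEmail` is stopped by the cost rule, and `dropTable` is stopped by the allowlist rule. Keeping each rule a small pure function makes it easy to test rules one at a time.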

Audit Log

Every decision gets logged to an immutable audit trail:

audit-log.ts
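interface Decision {
  action: Action;
  allowed: boolean;
  blocked?: boolean;
  reason?: string;
  result?: any;
  timestamp: Date;
}

class PostgresAuditLog implements AuditLog {
  // db and agentId are used below, so they are injected here;
  // db is whatever query-builder client the project uses (type left open)
  constructor(private db: any, private agentId: string) {}

  async logDecision(decision: Decision): Promise<void> {
    // Insert only: existing rows are never updated or deleted
    await this.db.insert('agent_decisions', {
      agent_id: this.agentId,
      action_type: decision.action.type,
      allowed: decision.allowed,
      blocked: decision.blocked,
      reason: decision.reason,
      result: decision.result ? JSON.stringify(decision.result) : null,
      timestamp: decision.timestamp
    });
  }

  async getHistory(agentId: string, limit: number): Promise<Decision[]> {
    // Newest first. Rows come back in table shape (action_type, serialized
    // result), so callers may need to map them to Decision objects.
    return await this.db
      .select('*')
      .from('agent_decisions')
      .where('agent_id', agentId)
      .orderBy('timestamp', 'desc')
      .limit(limit);
  }
}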

This audit trail is crucial for debugging. When an agent does something unexpected, I can query the history to understand why.
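
As a sketch of what such a query can feed into, the helper below groups blocked decisions by reason so the most common rejection stands out. The `DecisionRecord` shape, the `summarizeBlocked` name, and the sample data are assumptions for illustration; the in-memory array stands in for rows that would come from `getHistory`.

```typescript
// Hypothetical, trimmed-down record; real rows would come from getHistory().
interface DecisionRecord {
  actionType: string;
  allowed: boolean;
  reason?: string;
  timestamp: Date;
}

// Count blocked decisions per reason; allowed decisions are skipped.
function summarizeBlocked(records: DecisionRecord[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const record of records) {
    if (record.allowed) continue;
    const reason = record.reason ?? "unknown";
    counts.set(reason, (counts.get(reason) ?? 0) + 1);
  }
  return counts;
}

// Made-up sample data, not real agent output.
const sampleRecords: DecisionRecord[] = [
  { actionType: "send_email", allowed: true, timestamp: new Date(0) },
  { actionType: "bulk_export", allowed: false, reason: "Cost limit exceeded", timestamp: new Date(1000) },
  { actionType: "drop_table", allowed: false, reason: "Action type not allowed", timestamp: new Date(2000) },
  { actionType: "bulk_export", allowed: false, reason: "Cost limit exceeded", timestamp: new Date(3000) },
];

const blockedSummary = summarizeBlocked(sampleRecords);
```

With this sample data, `blockedSummary` holds two entries: "Cost limit exceeded" with a count of 2 and "Action type not allowed" with a count of 1.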

Circuit Breaker

The circuit breaker prevents cascading failures:

circuit-breaker.ts
class CircuitBreaker {
  private failureCount = 0;
  private lastFailureTime: Date | null = null;
  private state: 'CLOSED' | 'OPEN' | 'HALF_OPEN' = 'CLOSED';

  constructor(
    // These fields are used below, so they are injected here.
    // The alert system's type is an assumption; the default threshold of 5
    // matches the test run described after this block.
    private alertSystem: { send(alert: { severity: string; message: string }): void },
    private cooldownMs: number,
    private failureThreshold = 5
  ) {}

  canExecute(): boolean {
    if (this.state === 'OPEN') {
      // Check if cooldown period passed
      if (Date.now() - this.lastFailureTime!.getTime() > this.cooldownMs) {
        // Let a trial call through; a failure reopens the breaker
        this.state = 'HALF_OPEN';
        return true;
      }
      return false;
    }
    return true;
  }

  recordFailure(): void {
    this.failureCount++;
    this.lastFailureTime = new Date();
    if (this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
      this.notifyAlerting();
    }
  }

  recordSuccess(): void {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }

  private notifyAlerting(): void {
    // Send alert to monitoring system
    this.alertSystem.send({
      severity: 'CRITICAL',
      message: `Agent circuit breaker opened: ${this.failureCount} failures`
    });
  }
}

When I test this with intentional failures:

Terminal window
# Test circuit breaker
curl -X POST http://localhost:3000/agent/message \
-H "Content-Type: application/json" \
-d '{"message": "test"}'
# Simulate failures
# After 5 failures, circuit breaker opens
{"status": "blocked", "reason": "Circuit breaker open - too many failures"}
# Wait for cooldown, then it retries
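
The same state transitions can also be checked without an HTTP server. The sketch below is a simplified stand-in for the class above, not the same code: it takes the failure threshold, the cooldown, and a clock function as constructor arguments so that a test can move time forward by hand. The class name `TestableBreaker`, the injected `now` clock, and the 1000 ms cooldown are assumptions for illustration.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class TestableBreaker {
  private failureCount = 0;
  private openedAt = 0;
  state: BreakerState = "CLOSED";

  constructor(
    private failureThreshold: number,
    private cooldownMs: number,
    private now: () => number // injected clock, in milliseconds
  ) {}

  canExecute(): boolean {
    if (this.state === "OPEN") {
      if (this.now() - this.openedAt > this.cooldownMs) {
        this.state = "HALF_OPEN"; // allow a trial call after the cooldown
        return true;
      }
      return false;
    }
    return true;
  }

  recordFailure(): void {
    this.failureCount++;
    if (this.failureCount >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = this.now();
    }
  }

  recordSuccess(): void {
    this.failureCount = 0;
    this.state = "CLOSED";
  }
}

let fakeTime = 0;
const breaker = new TestableBreaker(5, 1000, () => fakeTime);

// Five failures reach the threshold and open the breaker.
for (let i = 0; i < 5; i++) breaker.recordFailure();
const blockedWhileOpen = !breaker.canExecute();

// Move the fake clock past the cooldown; the next check lets a trial call through.
fakeTime += 1001;
const allowedAfterCooldown = breaker.canExecute();
const stateAfterCooldown: BreakerState = breaker.state;

// A successful trial call closes the breaker again.
breaker.recordSuccess();
const finalState: BreakerState = breaker.state;
```

Injecting the clock keeps the test deterministic: there is no need to sleep through a real cooldown, and the OPEN, HALF_OPEN, and CLOSED steps can each be asserted in order.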

Why This Matters

Without governance, agents are “a car with no brakes.” They might work in demos, but in production they can:

  • Violate compliance requirements (no audit trail)
  • Cause cost overruns (no resource limits)
  • Make harmful decisions (no policy validation)
  • Fail catastrophically (no circuit breakers)

The modular architecture separates concerns:

  • Policy layer - What actions are allowed?
  • Audit layer - What did the agent do and why?
  • Control layer - How do we stop failures?

Each layer can be tested independently and swapped out as needed.
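
To illustrate that claim, here is a minimal sketch in which the policy and audit layers are replaced by in-memory stand-ins and driven through a small pipeline function. The names (`InMemoryAuditLog`, `denyAll`, `allowAll`, `runGoverned`) and the simplified synchronous interfaces are assumptions made for the example; the interfaces shown earlier in the post are asynchronous.

```typescript
// Simplified, synchronous versions of the layer interfaces (assumed for this sketch).
interface Action {
  type: string;
}

interface PolicyResult {
  allowed: boolean;
  reason?: string;
}

interface Policy {
  validate(action: Action): PolicyResult;
}

interface AuditEntry {
  action: Action;
  allowed: boolean;
  reason?: string;
}

interface AuditSink {
  logDecision(entry: AuditEntry): void;
}

// Audit layer stand-in: keeps entries in an array instead of a database.
class InMemoryAuditLog implements AuditSink {
  readonly entries: AuditEntry[] = [];

  logDecision(entry: AuditEntry): void {
    this.entries.push(entry);
  }
}

// Policy layer stand-ins: one blocks everything, one allows everything.
const denyAll: Policy = {
  validate: () => ({ allowed: false, reason: "Denied by test policy" }),
};
const allowAll: Policy = {
  validate: () => ({ allowed: true }),
};

// Pipeline: policy check, then execution, with an audit entry on both paths.
function runGoverned(
  action: Action,
  policy: Policy,
  audit: AuditSink,
  execute: (action: Action) => string
): string | null {
  const check = policy.validate(action);
  if (!check.allowed) {
    audit.logDecision({ action, allowed: false, reason: check.reason });
    return null;
  }
  const output = execute(action);
  audit.logDecision({ action, allowed: true });
  return output;
}

const testAudit = new InMemoryAuditLog();
let executions = 0;
const fakeExecute = (action: Action): string => {
  executions++;
  return `ran ${action.type}`;
};

const blockedOutput = runGoverned({ type: "send_email" }, denyAll, testAudit, fakeExecute);
const allowedOutput = runGoverned({ type: "send_email" }, allowAll, testAudit, fakeExecute);
```

Because each layer sits behind a small interface, the blocked path can be verified without a database or a model call: the executor runs once, for the allowed action only, while both decisions end up in the audit entries.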

Summary

In this post, I showed how to build AI agents with governance using modular architecture. The key point is that production-ready agents need policy validation, audit trails, and constraint systems. The complexity is worth it for real-world deployments where trust, safety, and compliance matter.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
