
How to Build AI Coding Workflows That Survive Model Changes

Five hours of work. Gone. My AI coding assistant suddenly couldn’t figure out how to create a zip file. On retry, it provided nothing but an archive of the original code plus a description of changes it “would have made.”

But here’s what caught my attention: another developer reported having a governance discipline system that let them use Codex 5.1 with performance “quite close to 5.4”—they didn’t suffer from the model degradation that broke my workflow.

The difference? I had ad-hoc prompts. They had structure.

The Problem

When AI models change or degrade, workflows break. This happens silently—no API version change, no deprecation warning. The endpoint stays the same, but the model behind it behaves differently.

I saw this firsthand:

  • Prompts that worked for months suddenly failed
  • Instructions were understood but not executed
  • Quality dropped while token usage increased

The developer with the governance system had a different experience. As they put it: “Development is slower and burns more tokens indeed, but at least I don’t suffer backlashes from model degradation.”

That trade-off—slower but stable—became my goal.

What is AI Workflow Governance?

Governance is a structured approach to using AI coding assistants. Instead of throwing prompts at a model and hoping for the best, you create systems that:

  1. Validate outputs before accepting them
  2. Provide explicit constraints and verification
  3. Maintain consistency across model versions
  4. Reduce impact of model degradation

Think of it like building with guardrails. You still get the productivity benefits of AI, but you catch errors before they cascade.

The Four Components

1. Structured Prompt Templates

The most impactful change I made was standardizing prompts. Here’s the template I now use:

prompt-template.md
```markdown
## Task
{specific_objective}

## Constraints (MUST follow)
- DO NOT modify: {protected_files}
- DO NOT change: {protected_logic}
- MUST preserve: {required_elements}

## Verification Required
Before making any changes:
1. List ALL files you intend to modify
2. Show EXACT changes you plan to make
3. Explain how each constraint is satisfied
4. WAIT for my approval before applying changes

## Output Format
Provide your response in JSON:
{
  "plan": "description of what you will do",
  "files_to_modify": ["file1", "file2"],
  "changes": [...],
  "constraint_compliance": [...]
}
```

This structure forces the AI to plan before acting and makes failures visible.
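To make those failures visible automatically, the model’s reply can be checked against the output format before anything else happens. This is a minimal sketch, assuming the reply arrives as a bare JSON string; `validate_plan` and `REQUIRED_KEYS` are illustrative names, not part of any library.

```python
import json
from typing import List

# The keys and types the prompt template asks the model to return.
REQUIRED_KEYS = {
    "plan": str,
    "files_to_modify": list,
    "changes": list,
    "constraint_compliance": list,
}


def validate_plan(raw: str) -> List[str]:
    """Return every problem found in a model's plan; an empty list means it passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(f"{key} should be a {expected_type.__name__}")
    # Every planned file should be named explicitly.
    files = data.get("files_to_modify")
    if isinstance(files, list) and not all(isinstance(f, str) and f for f in files):
        problems.append("files_to_modify must contain non-empty strings")
    return problems


# A reply that skipped the planning fields is rejected before anything runs.
print(validate_plan('{"plan": "rename helper"}'))
```

A non-empty list is a signal to re-prompt or fall back rather than to proceed.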

2. Multi-Stage Validation

I split every interaction into four stages:

Plan → Review → Execute → Verify

At each stage, there’s a gate:

  • Plan: AI describes what it will do in structured format
  • Review: Automated checks + optional human approval
  • Execute: AI makes the actual changes
  • Verify: Changes are validated against requirements

This catches issues at the planning stage instead of after broken code is committed.
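The gate sequence can be sketched as a list of named checks run in order, stopping at the first one that fails. This is a minimal illustration rather than the exact code behind my setup; the `(passed, message)` return shape and the three example gates are assumptions.

```python
from typing import Callable, Dict, List, Tuple

# A gate inspects the shared context and returns (passed, message).
Gate = Callable[[Dict], Tuple[bool, str]]


def run_stages(stages: List[Tuple[str, Gate]], context: Dict) -> Dict:
    """Run gates in order; stop at the first one that fails."""
    for name, gate in stages:
        passed, message = gate(context)
        # Keep a trace of every gate that ran, for the audit log.
        context.setdefault("trace", []).append((name, passed, message))
        if not passed:
            return {"success": False, "failed_stage": name, "reason": message}
    return {"success": True, "failed_stage": None, "reason": ""}


# Hypothetical gates: the plan touches a file outside src/, so review rejects it.
def plan_gate(ctx):
    ctx["files"] = ["src/app.py", "docs/readme.md"]
    return True, "plan parsed"


def review_gate(ctx):
    outside = [f for f in ctx["files"] if not f.startswith("src/")]
    return (not outside), (f"outside src/: {outside}" if outside else "ok")


def execute_gate(ctx):
    return True, "applied"


result = run_stages(
    [("plan", plan_gate), ("review", review_gate), ("execute", execute_gate)], {}
)
print(result["failed_stage"])  # review
```

Because the review gate fails, the execute gate never runs, which is the whole point of gating before execution.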

3. Model Fallback Chains

When your primary model fails quality checks, you need alternatives:

fallback_config.py
```python
MODEL_CHAIN = {
    "primary": {
        "model": "codex-5.4",
        "quality_threshold": 0.85,
    },
    "secondary": {
        "model": "codex-5.2",
        "quality_threshold": 0.80,
    },
    "tertiary": {
        "model": "claude-sonnet",
        "quality_threshold": 0.75,
    },
}
```

When codex-5.4 started failing, my system automatically fell back to 5.2. I didn’t lose productivity—I just paid slightly higher token costs while debugging the root cause.
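Here is a minimal sketch of how such a chain might be walked. It assumes some `score` function that returns a quality value between 0 and 1 for a model (for example, the pass rate on a small set of known tasks); that function and the scores in the example are hypothetical.

```python
from typing import Callable, Optional

MODEL_CHAIN = {
    "primary": {"model": "codex-5.4", "quality_threshold": 0.85},
    "secondary": {"model": "codex-5.2", "quality_threshold": 0.80},
    "tertiary": {"model": "claude-sonnet", "quality_threshold": 0.75},
}


def pick_model(chain: dict, score: Callable[[str], float]) -> Optional[str]:
    """Return the first model, in chain order, whose score meets its threshold."""
    # Dicts keep insertion order, so primary is tried before secondary and tertiary.
    for entry in chain.values():
        if score(entry["model"]) >= entry["quality_threshold"]:
            return entry["model"]
    return None  # nothing qualifies: stop and investigate instead of shipping output


# Made-up scores for illustration: the primary has degraded below its threshold.
scores = {"codex-5.4": 0.62, "codex-5.2": 0.83, "claude-sonnet": 0.90}
print(pick_model(MODEL_CHAIN, lambda m: scores[m]))  # codex-5.2
```

Giving each tier its own threshold lets you accept a slightly lower bar from a fallback while still refusing outright bad output.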

4. Change Logging and Rollback

Every AI change is logged. When something breaks, I can see exactly what changed and revert:

change_logger.py
```python
import json
import uuid
from datetime import datetime
from pathlib import Path


class ChangeLogger:
    def __init__(self, log_dir="ai_changes"):
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)
        # One ID per logger instance, so entries from one session can be grouped.
        self.session_id = uuid.uuid4().hex

    def log_change(self, action, files_modified, diff, model_version):
        now = datetime.now()
        entry = {
            "timestamp": now.isoformat(),
            "action": action,
            "files": files_modified,
            "diff": diff,
            "model": model_version,
            "session_id": self.session_id,
        }
        # One JSON Lines file per day; each change is appended as one line.
        log_file = self.log_dir / f"{now.strftime('%Y-%m-%d')}.jsonl"
        with open(log_file, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
        return entry

    def rollback(self, timestamp):
        """Find and revert changes from a specific time."""
        # Implementation depends on your version control
        raise NotImplementedError("rollback depends on your version control")
```

This gives me visibility into what the AI actually did versus what I asked for.
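The `rollback` stub leaves the actual revert to version control, but the lookup half can be done from the log alone. A minimal sketch, assuming the daily one-JSON-object-per-line files that `ChangeLogger` writes; `changes_since` and `files_touched` are hypothetical helpers.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Dict, List


def changes_since(log_dir: str, since: str) -> List[Dict]:
    """Return logged changes made at or after `since` (ISO 8601), oldest first."""
    cutoff = datetime.fromisoformat(since)
    entries = []
    # Daily files are named YYYY-MM-DD.jsonl, one JSON object per line.
    for log_file in sorted(Path(log_dir).glob("*.jsonl")):
        with open(log_file, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    entry = json.loads(line)
                    if datetime.fromisoformat(entry["timestamp"]) >= cutoff:
                        entries.append(entry)
    return sorted(entries, key=lambda e: datetime.fromisoformat(e["timestamp"]))


def files_touched(entries: List[Dict]) -> List[str]:
    """Unique files across the entries: the set you would ask your VCS to restore."""
    return sorted({path for entry in entries for path in entry.get("files", [])})
```

Timestamps are parsed rather than compared as strings, since `isoformat()` omits the microseconds field when it is zero.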

Implementation

Here’s the governance system I built:

governance_system.py
```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import List
import json
import subprocess
import textwrap


class Stage(Enum):
    PLAN = "plan"
    REVIEW = "review"
    EXECUTE = "execute"
    VERIFY = "verify"


@dataclass
class AIAction:
    description: str
    files_to_modify: List[str]
    changes_preview: str
    constraints: List[str]
    approved: bool = False


class GovernanceSystem:
    def __init__(self, model_client, config: dict):
        # Model clients are assumed to expose generate(prompt, temperature=...).
        self.model = model_client
        self.config = config
        # Fallbacks must be client objects with the same interface, built from
        # the "fallback_models" descriptors in the config file.
        self.fallback_models = config.get("fallback_models", [])
        self.change_log = []

    def structured_prompt(self, task: str, constraints: dict) -> str:
        """Generate a structured prompt with governance constraints."""
        # Doubled braces become literal braces after str.format().
        template = textwrap.dedent("""\
            ## Task
            {task}
            ## Constraints (MUST follow)
            {constraints_text}
            ## Verification Required
            Before making any changes:
            1. List ALL files you intend to modify
            2. Show EXACT changes you plan to make
            3. Explain how each constraint is satisfied
            4. WAIT for my approval before applying changes
            ## Output Format
            Provide your response in this JSON structure:
            {{
              "plan": "description of what you will do",
              "files_to_modify": ["file1", "file2"],
              "changes": [
                {{"file": "path/to/file", "change": "description"}}
              ],
              "constraint_compliance": [
                {{"constraint": "X", "how_satisfied": "Y"}}
              ]
            }}
            """)
        # Label prohibitions explicitly and keep one constraint per line.
        lines = [f"- MUST NOT: {c}" for c in constraints.get("must_not", [])]
        lines += [f"- MUST: {c}" for c in constraints.get("must", [])]
        return template.format(task=task, constraints_text="\n".join(lines))

    def execute_with_governance(self, task: str, constraints: dict, model=None) -> dict:
        """Full governance workflow: Plan -> Review -> Execute -> Verify."""
        if model is None:
            model = self.model

        # Stage 1: PLAN
        prompt = self.structured_prompt(task, constraints)
        plan_response = model.generate(prompt, temperature=0.0)
        try:
            action = self.parse_plan(plan_response)
        except ValueError:  # includes json.JSONDecodeError
            return {"success": False, "error": "Failed to parse plan", "stage": Stage.PLAN}

        # Stage 2: REVIEW against the constraints we asked for, not the list the
        # model reports back (a degraded model can silently drop one).
        requested = constraints.get("must_not", []) + constraints.get("must", [])
        if not self.review_action(action, requested):
            return {"success": False, "error": "Action not approved", "stage": Stage.REVIEW}
        action.approved = True

        # Stage 3: EXECUTE
        results = self.execute_action(action, model)
        self.change_log.append({
            "action": action,
            "results": results,
            "timestamp": datetime.now().isoformat(),
        })

        # Stage 4: VERIFY
        verification = self.verify_changes(action, results)
        return {
            "success": verification["passed"],
            "stage": Stage.VERIFY,
            "action": action,
            "results": results,
            "verification": verification,
        }

    def review_action(self, action: AIAction, constraints: List[str]) -> bool:
        """Review the proposed action against the requested constraints."""
        # Automated checks
        for constraint in constraints:
            if not self.check_constraint(action, constraint):
                print(f"Constraint violation: {constraint}")
                return False

        # Optional human-in-the-loop
        if self.config.get("require_human_approval", False):
            print(f"\nProposed action: {action.description}")
            print(f"Files to modify: {action.files_to_modify}")
            print(f"Preview:\n{action.changes_preview}")
            response = input("Approve? (y/n): ")
            return response.strip().lower() == "y"
        return True

    def execute_action(self, action: AIAction, model) -> dict:
        """Ask the model to apply the approved plan.

        How edits actually reach disk depends on your tooling (agent file
        tools, patches, ...); this only sends the approved plan back.
        """
        prompt = (
            "Apply exactly this approved plan and nothing else.\n"
            f"Plan: {action.description}\n"
            f"Files: {', '.join(action.files_to_modify)}\n"
            f"Changes:\n{action.changes_preview}"
        )
        return {"response": model.generate(prompt, temperature=0.0)}

    def verify_changes(self, action: AIAction, results: dict) -> dict:
        """Verify the changes were applied correctly."""
        checks = {
            "files_modified": True,         # placeholder: compare against your VCS diff
            "constraints_satisfied": True,  # placeholder: re-check constraints on the result
            "tests_pass": True,
            "errors": [],
        }

        # Run tests if configured
        if self.config.get("run_tests_on_change", True):
            test_result = subprocess.run(
                ["pytest", "--tb=short"],
                capture_output=True,
                text=True,
            )
            checks["tests_pass"] = test_result.returncode == 0
            if not checks["tests_pass"]:
                checks["errors"].append(test_result.stdout)

        return {
            "passed": all([checks["files_modified"],
                           checks["constraints_satisfied"],
                           checks["tests_pass"]]),
            "checks": checks,
        }

    def fallback_execute(self, task: str, constraints: dict) -> dict:
        """Try the primary model, then each fallback, until one passes verification."""
        for model in [self.model] + self.fallback_models:
            try:
                result = self.execute_with_governance(task, constraints, model=model)
                if result["success"]:
                    return result
            except Exception as e:
                print(f"Model failed: {e}")
        return {"success": False, "error": "All models failed"}

    def parse_plan(self, response) -> AIAction:
        """Parse model response into structured action."""
        content = response.content if hasattr(response, "content") else str(response)
        # Models often wrap JSON in prose or code fences: keep the outermost object.
        start, end = content.find("{"), content.rfind("}")
        if start == -1 or end < start:
            raise ValueError("No JSON object found in model response")
        data = json.loads(content[start:end + 1])
        return AIAction(
            description=data.get("plan", ""),
            files_to_modify=data.get("files_to_modify", []),
            changes_preview=json.dumps(data.get("changes", []), indent=2),
            constraints=[c.get("constraint") for c in data.get("constraint_compliance", [])
                         if isinstance(c, dict)],
        )

    def check_constraint(self, action: AIAction, constraint: str) -> bool:
        """Check if action satisfies a specific constraint."""
        # Implement constraint checking logic
        # This is domain-specific
        return True
```
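`check_constraint` is left as a domain-specific stub. As one concrete example, the “Modify files outside of src/” constraint used in my configuration can be checked mechanically from the plan’s file list. This sketch is an assumption about how to wire it up: the `CHECKS` table keyed by constraint text and the `files_outside` helper are hypothetical, and you would call it from `check_constraint` with `action.files_to_modify`.

```python
import posixpath
from typing import Callable, Dict, List


def files_outside(files: List[str], allowed_root: str = "src") -> List[str]:
    """Return the planned files that are not inside allowed_root."""
    root = posixpath.normpath(allowed_root)
    outside = []
    for f in files:
        # normpath folds "src/../secrets.py" into "secrets.py", so it cannot sneak past.
        path = posixpath.normpath(f)
        if posixpath.isabs(path) or not (path == root or path.startswith(root + "/")):
            outside.append(f)
    return outside


# Hypothetical table: constraint text -> check on the plan's file list.
CHECKS: Dict[str, Callable[[List[str]], bool]] = {
    "Modify files outside of src/": lambda files: not files_outside(files),
}


def check_constraint(files_to_modify: List[str], constraint: str) -> bool:
    """True if the plan satisfies the constraint (or no automated check exists)."""
    check = CHECKS.get(constraint)
    # Constraints without a mechanical check pass here and are left to human review.
    return True if check is None else check(files_to_modify)
```

Checking the plan rather than the result means a bad target is rejected before any file is written.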

Configuration

Here’s how I configure the system:

governance_config.json
```json
{
  "require_human_approval": false,
  "run_tests_on_change": true,
  "fallback_models": [
    {"provider": "openai", "model": "codex-5.2"},
    {"provider": "anthropic", "model": "claude-sonnet"}
  ],
  "constraints": {
    "must_not": [
      "Modify files outside of src/",
      "Delete existing functionality",
      "Change API signatures without explicit approval"
    ],
    "must": [
      "Add tests for new functionality",
      "Update documentation",
      "Follow existing code style"
    ]
  },
  "verification": {
    "run_linter": true,
    "run_tests": true,
    "check_types": true
  }
}
```
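The `verification` flags need something that turns them into commands. A minimal sketch of that mapping follows; the specific tools (`ruff`, `pytest`, `mypy`) and their arguments are my assumption, so substitute your project’s own linter, test runner and type checker.

```python
import subprocess
from typing import Dict, List

# Assumed tools; replace with whatever your project already uses.
VERIFICATION_COMMANDS = {
    "run_linter": ["ruff", "check", "."],
    "run_tests": ["pytest", "--tb=short"],
    "check_types": ["mypy", "src"],
}


def verification_commands(config: Dict) -> List[List[str]]:
    """Commands enabled in the config's "verification" section, in a fixed order."""
    enabled = config.get("verification", {})
    return [cmd for flag, cmd in VERIFICATION_COMMANDS.items() if enabled.get(flag, False)]


def run_verification(config: Dict, cwd: str = ".") -> List[str]:
    """Run each enabled check and return a description of every failure."""
    failures = []
    for cmd in verification_commands(config):
        try:
            result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
        except FileNotFoundError:
            failures.append(f"{cmd[0]}: not installed")
            continue
        if result.returncode != 0:
            failures.append(f"{' '.join(cmd)}: exit code {result.returncode}")
    return failures
```

Treating a missing tool as a failure, rather than skipping it, keeps a misconfigured machine from silently passing verification.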

Why This Works

I tested this system during the Codex 5.4 degradation incident. While others reported:

  • “5 hours of work lost”
  • “It replaced content instead of creating new page”
  • “Suddenly couldn’t figure out how to create a zip file”

My governance system caught failures at the plan stage. When the model proposed wrong targets or shortcuts, the constraint checks flagged them. When quality dropped below thresholds, fallback models took over.

The trade-off is real—development is slower. Each change goes through four stages instead of one. But I haven’t lost work to model degradation since implementing this.

The Cost-Benefit Analysis

Before governance:

  • Fast development when model works well
  • Catastrophic failures when model degrades
  • Lost hours debugging “fixes” that introduced bugs
  • No visibility into what changed

After governance:

  • Slower development (maybe 20-30% more tokens)
  • Consistent quality regardless of model changes
  • Clear audit trail of all changes
  • Automatic fallback to stable models

The math works out: spending 20-30% more tokens is better than losing 5 hours of work.
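To make that comparison concrete, here is the break-even check as a small function. Every number in the example call is hypothetical, chosen only to show the arithmetic: a task costing $0.50 in tokens with 30% governance overhead adds $0.15, while a 1-in-100 chance of losing 5 hours valued at $60/hour is an expected loss of $3.00 per task.

```python
def governance_pays_off(token_cost_per_task: float, overhead: float,
                        failure_prob: float, hours_lost: float,
                        hourly_value: float, catch_rate: float = 1.0) -> bool:
    """True if governance's extra token spend is below the expected loss it prevents."""
    extra_spend = token_cost_per_task * overhead
    # catch_rate: fraction of failures governance actually stops (1.0 is optimistic).
    loss_avoided = failure_prob * catch_rate * hours_lost * hourly_value
    return extra_spend < loss_avoided


# Hypothetical inputs: $0.50/task, 30% overhead, 1-in-100 failure, 5 h lost at $60/h.
print(governance_pays_off(0.50, 0.30, 0.01, 5, 60))  # True
```

Plug in your own failure rate and hourly value; the conclusion only flips when failures become rare enough or cheap enough.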

Common Patterns

I’ve found these patterns consistently useful:

| Pattern | Description | Benefit |
|---|---|---|
| Plan-Review-Execute | AI proposes, human/system approves, then executes | Catches errors early |
| Sandbox Execution | Test changes in an isolated environment | Safe experimentation |
| Incremental Changes | Small, atomic modifications | Easy to debug and roll back |
| Constraint Templates | Reusable prompt structures | Consistency across tasks |
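Sandbox execution can be as simple as running a change in a throwaway copy of the project. A minimal standard-library sketch follows; `run_in_sandbox` and the ignored directory names are my assumptions. It isolates the file tree, not the process, so it is not a security boundary.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import List


def run_in_sandbox(project_dir: str, command: List[str]) -> subprocess.CompletedProcess:
    """Copy the project to a temporary directory and run `command` inside the copy.

    The original tree is never modified, and the copy is deleted afterwards.
    This isolates files only; the command still runs with your permissions.
    """
    with tempfile.TemporaryDirectory() as tmp:
        sandbox = Path(tmp) / "project"
        # Skip VCS metadata, caches and virtualenvs to keep the copy small.
        shutil.copytree(project_dir, sandbox,
                        ignore=shutil.ignore_patterns(".git", "__pycache__", ".venv"))
        # Output is captured, so the result stays usable after cleanup.
        return subprocess.run(command, cwd=sandbox, capture_output=True, text=True)
```

In practice `command` would apply the AI’s patch and then run the tests; only if that succeeds do you apply the same change to the real tree.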

What I Do Now

My production workflow looks like this:

  1. Every task goes through the governance system
  2. Automated tests run after each change
  3. If tests fail or constraints are violated, changes are rejected
  4. Primary model fails quality check? Fallback to previous version
  5. All changes logged for audit and rollback

This system caught the Codex 5.4 degradation within the first day. I didn’t lose work—I just saw my fallback chain activate and started investigating.
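A fallback chain that activates quietly is easy to miss, so it helps to track pass rates per model explicitly. This is a hypothetical sketch: `DegradationMonitor`, the window size, the 80% threshold and the minimum sample count are illustrative, and `passed` would come from something like the verification stage.

```python
from collections import deque
from typing import Deque, Dict, Optional


class DegradationMonitor:
    """Track recent pass/fail results per model and flag a drop in pass rate."""

    def __init__(self, window: int = 20, min_pass_rate: float = 0.8, min_samples: int = 5):
        self.window = window
        self.min_pass_rate = min_pass_rate
        self.min_samples = min_samples
        self.history: Dict[str, Deque[bool]] = {}

    def record(self, model: str, passed: bool) -> None:
        # A bounded deque keeps only the most recent `window` results.
        self.history.setdefault(model, deque(maxlen=self.window)).append(passed)

    def pass_rate(self, model: str) -> Optional[float]:
        results = self.history.get(model)
        if not results:
            return None
        return sum(results) / len(results)

    def is_degraded(self, model: str) -> bool:
        results = self.history.get(model, ())
        if len(results) < self.min_samples:
            return False  # too few results to judge yet
        return sum(results) / len(results) < self.min_pass_rate


monitor = DegradationMonitor()
for passed in [True, False, False, True, False, False]:
    monitor.record("codex-5.4", passed)
if monitor.is_degraded("codex-5.4"):
    print(f"codex-5.4 pass rate {monitor.pass_rate('codex-5.4'):.0%}, check fallbacks")
```

The rolling window means a model that recovers stops being flagged once its recent results improve.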

This governance approach connects to several related problems: detecting model degradation, getting models to follow instructions, and benchmarking model quality. The system I built addresses each of them: detection through monitoring, instruction following through constraints, and benchmarking through quality thresholds.

Governance systems protect your AI coding workflows from model degradation and changes. Structure your prompts, validate outputs, maintain fallbacks, and log all changes. Slower development is better than lost work.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Here are the most important links from this article, along with some further resources on this topic:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
