How to Build AI Coding Workflows That Survive Model Changes

Mar 22, 2026

Five hours of work. Gone. My AI coding assistant suddenly couldn’t figure out how to create a zip file. On retry, it provided nothing but an archive of the original code plus a description of changes it “would have made.”

But here’s what caught my attention: another developer reported having a governance discipline system that let them use Codex 5.1 with performance “quite close to 5.4”—they didn’t suffer from the model degradation that broke my workflow.

The difference? I had ad-hoc prompts. They had structure.

The Problem

When AI models change or degrade, workflows break. This happens silently—no API version change, no deprecation warning. The endpoint stays the same, but the model behind it behaves differently.

I saw this firsthand:

Prompts that worked for months suddenly failed
Instructions were understood but not executed
Quality dropped while token usage increased

The developer with the governance system had a different experience. As they put it: “Development is slower and burns more tokens indeed, but at least I don’t suffer backlashes from model degradation.”

That trade-off—slower but stable—became my goal.

What is AI Workflow Governance?

Governance is a structured approach to using AI coding assistants. Instead of throwing prompts at a model and hoping for the best, you create systems that:

Validate outputs before accepting them
Provide explicit constraints and verification
Maintain consistency across model versions
Reduce the impact of model degradation

Think of it like building with guardrails. You still get the productivity benefits of AI, but you catch errors before they cascade.

The Four Components

1. Structured Prompt Templates

The most impactful change I made was standardizing prompts. Here’s the template I now use:

## Task
{specific_objective}

## Constraints (MUST follow)
- DO NOT modify: {protected_files}
- DO NOT change: {protected_logic}
- MUST preserve: {required_elements}

## Verification Required
Before making any changes:
1. List ALL files you intend to modify
2. Show EXACT changes you plan to make
3. Explain how each constraint is satisfied
4. WAIT for my approval before applying changes

## Output Format
Provide your response in JSON:
{
  "plan": "description of what you will do",
  "files_to_modify": ["file1", "file2"],
  "changes": [...],
  "constraint_compliance": [...]
}

This structure forces the AI to plan before acting and makes failures visible.
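To make "failures visible" concrete, here is a minimal sketch of a check that the model's reply actually matches the JSON shape requested in the template. The function name `validate_plan` and the `REQUIRED_KEYS` table are illustrative assumptions; only the four key names come from the template above.

```python
import json

# Keys requested in the "Output Format" section of the template,
# with the JSON type each one is expected to hold.
REQUIRED_KEYS = {
    "plan": str,
    "files_to_modify": list,
    "changes": list,
    "constraint_compliance": list,
}

def validate_plan(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the plan is well-formed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    return problems
```

A reply that is prose instead of JSON, or that silently drops `constraint_compliance`, then shows up as a non-empty problem list rather than as a surprise later in the workflow.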

2. Multi-Stage Validation

I split every interaction into four stages:

Plan → Review → Execute → Verify

At each stage, there’s a gate:

Plan: AI describes what it will do in structured format
Review: Automated checks + optional human approval
Execute: AI makes the actual changes
Verify: Changes are validated against requirements

This catches issues at the planning stage instead of after broken code is committed.

3. Model Fallback Chains

When your primary model fails quality checks, you need alternatives:

MODEL_CHAIN = {
    "primary": {
        "model": "codex-5.4",
        "quality_threshold": 0.85
    },
    "secondary": {
        "model": "codex-5.2",
        "quality_threshold": 0.80
    },
    "tertiary": {
        "model": "claude-sonnet",
        "quality_threshold": 0.75
    }
}

When codex-5.4 started failing, my system automatically fell back to 5.2. I didn’t lose productivity—I just paid slightly higher token costs while debugging the root cause.
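One way to walk such a chain is sketched below. It assumes you already have some way to score a model between 0.0 and 1.0 (for example a small benchmark suite); `score_model` is a hypothetical callable standing in for that, and `pick_model` is an illustrative name. The model names and thresholds are the ones from the chain above, written as an ordered list.

```python
from typing import Callable, Optional

# Ordered version of the chain above: (model name, quality threshold).
CHAIN = [
    ("codex-5.4", 0.85),
    ("codex-5.2", 0.80),
    ("claude-sonnet", 0.75),
]

def pick_model(score_model: Callable[[str], float]) -> Optional[str]:
    """Return the first model whose score meets its threshold, else None.

    score_model is a hypothetical callable that runs your own quality
    checks against the named model and returns a value from 0.0 to 1.0.
    """
    for name, threshold in CHAIN:
        if score_model(name) >= threshold:
            return name
    return None
```

Returning `None` when nothing qualifies keeps the "no acceptable model" case explicit, so the caller has to decide whether to stop or to ask a human.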

4. Change Logging and Rollback

Every AI change is logged. When something breaks, I can see exactly what changed and revert:

import json
import uuid
from datetime import datetime
from pathlib import Path

class ChangeLogger:
    def __init__(self, log_dir="ai_changes"):
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)
        # One random ID per logger instance, used to group entries from a single run
        self.session_id = uuid.uuid4().hex

    def log_change(self, action, files_modified, diff, model_version):
        entry = {
            "timestamp": datetime.now().isoformat(),
            "action": action,
            "files": files_modified,
            "diff": diff,
            "model": model_version,
            "session_id": self.session_id
        }

        log_file = self.log_dir / f"{datetime.now().strftime('%Y-%m-%d')}.jsonl"
        with open(log_file, "a") as f:
            f.write(json.dumps(entry) + "\n")

        return entry

    def rollback(self, timestamp):
        """Find and revert changes from a specific time."""
        # Implementation depends on your version control
        raise NotImplementedError("rollback depends on your version control setup")

This gives me visibility into what the AI actually did versus what I asked for.
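Before reverting anything, the first step is finding which logged changes are candidates. A minimal sketch, assuming the JSONL layout written by `log_change` above (one JSON object per line, with a naive ISO `timestamp` field); the function name `changes_since` is an illustrative assumption and the actual revert is still left to your version control.

```python
import json
from datetime import datetime
from pathlib import Path

def changes_since(log_dir: str, since: datetime) -> list[dict]:
    """Return logged entries at or after `since`, newest first.

    Assumes timestamps are naive ISO strings, as produced by
    datetime.now().isoformat(); `since` must be naive as well.
    """
    entries = []
    for log_file in sorted(Path(log_dir).glob("*.jsonl")):
        with open(log_file) as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                entry = json.loads(line)
                if datetime.fromisoformat(entry["timestamp"]) >= since:
                    entries.append(entry)
    # Newest first: the order in which changes would be undone.
    entries.sort(key=lambda e: datetime.fromisoformat(e["timestamp"]), reverse=True)
    return entries
```

Each returned entry carries the `files` and `diff` fields, which is enough to decide what to revert by hand or to feed into a version control command of your choice.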

Implementation

Here’s the governance system I built:

from dataclasses import dataclass
from datetime import datetime  # used when logging changes in execute_with_governance
from typing import List, Optional, Callable
from enum import Enum
import subprocess
import json

class Stage(Enum):
    PLAN = "plan"
    REVIEW = "review"
    EXECUTE = "execute"
    VERIFY = "verify"

@dataclass
class AIAction:
    description: str
    files_to_modify: List[str]
    changes_preview: str
    constraints: List[str]
    approved: bool = False

class GovernanceSystem:
    def __init__(self, model_client, config: dict):
        self.model = model_client
        self.config = config
        self.fallback_models = config.get("fallback_models", [])
        self.change_log = []

    def structured_prompt(self, task: str, constraints: dict) -> str:
        """Generate a structured prompt with governance constraints."""
        template = """
## Task
{task}

## Constraints (MUST follow)
{constraints_text}

## Verification Required
Before making any changes:
1. List ALL files you intend to modify
2. Show EXACT changes you plan to make
3. Explain how each constraint is satisfied
4. WAIT for my approval before applying changes

## Output Format
Provide your response in this JSON structure:
{{
    "plan": "description of what you will do",
    "files_to_modify": ["file1", "file2"],
    "changes": [
        {{"file": "path/to/file", "change": "description"}}
    ],
    "constraint_compliance": [
        {{"constraint": "X", "how_satisfied": "Y"}}
    ]
}}
"""
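        # One bullet per constraint. Prohibitions get an explicit DO NOT prefix so they
        # are not read as instructions, and a single join keeps every bullet on its own line.
        lines = [f"- DO NOT: {c}" for c in constraints.get("must_not", [])]
        lines += [f"- MUST: {c}" for c in constraints.get("must", [])]
        constraints_text = "\n".join(lines)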

        return template.format(task=task, constraints_text=constraints_text)

    def execute_with_governance(self, task: str, constraints: dict) -> dict:
        """Full governance workflow: Plan -> Review -> Execute -> Verify."""

        # Stage 1: PLAN
        prompt = self.structured_prompt(task, constraints)
        plan_response = self.model.generate(prompt, temperature=0.0)

        try:
            action = self.parse_plan(plan_response)
        except json.JSONDecodeError:
            return {"success": False, "error": "Failed to parse plan", "stage": Stage.PLAN}

        # Stage 2: REVIEW
        if not self.review_action(action):
            return {"success": False, "error": "Action not approved", "stage": Stage.REVIEW}

        # Stage 3: EXECUTE
        results = self.execute_action(action)
        self.change_log.append({
            "action": action,
            "results": results,
            "timestamp": datetime.now().isoformat()
        })

        # Stage 4: VERIFY
        verification = self.verify_changes(action, results)

        return {
            "success": verification["passed"],
            "stage": Stage.VERIFY,
            "action": action,
            "results": results,
            "verification": verification
        }

    def review_action(self, action: AIAction) -> bool:
        """Review the proposed action against constraints."""
        # Automated checks
        for constraint in action.constraints:
            if not self.check_constraint(action, constraint):
                print(f"Constraint violation: {constraint}")
                return False

        # Optional human-in-the-loop
        if self.config.get("require_human_approval", False):
            print(f"\nProposed action: {action.description}")
            print(f"Files to modify: {action.files_to_modify}")
            print(f"Preview:\n{action.changes_preview}")

            response = input("Approve? (y/n): ")
            return response.lower() == 'y'

        return True

    def verify_changes(self, action: AIAction, results: dict) -> dict:
        """Verify the changes were applied correctly."""
        checks = {
            "files_modified": True,
            "constraints_satisfied": True,
            "tests_pass": True,
            "errors": []
        }

        # Run tests if configured
        if self.config.get("run_tests_on_change", True):
            test_result = subprocess.run(
                ["pytest", "--tb=short"],
                capture_output=True
            )
            checks["tests_pass"] = test_result.returncode == 0
            if not checks["tests_pass"]:
                checks["errors"].append(test_result.stdout.decode())

        return {
            "passed": all([checks["files_modified"],
                          checks["constraints_satisfied"],
                          checks["tests_pass"]]),
            "checks": checks
        }

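    def fallback_execute(self, task: str, constraints: dict) -> dict:
        """Try the primary model, then each fallback in order."""
        primary = self.model
        # Assumes each fallback entry is a client object exposing generate();
        # provider/model dicts from the config must be resolved to clients first.
        try:
            for candidate in [primary] + self.fallback_models:
                # Swap the active client so execute_with_governance uses this candidate
                self.model = candidate
                try:
                    result = self.execute_with_governance(task, constraints)
                    if result["success"]:
                        return result
                except Exception as e:
                    print(f"Model failed: {e}")
                    continue
        finally:
            # Always restore the primary client for the next task
            self.model = primary

        return {"success": False, "error": "All models failed"}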

    def parse_plan(self, response) -> AIAction:
        """Parse model response into structured action."""
        # Extract JSON from response
        content = response.content if hasattr(response, 'content') else str(response)
        data = json.loads(content)

        return AIAction(
            description=data.get("plan", ""),
            files_to_modify=data.get("files_to_modify", []),
            changes_preview=json.dumps(data.get("changes", []), indent=2),
            constraints=[c.get("constraint") for c in data.get("constraint_compliance", [])]
        )

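    def check_constraint(self, action: AIAction, constraint: str) -> bool:
        """Check if action satisfies a specific constraint."""
        # Implement constraint checking logic
        # This is domain-specific
        return True

    def execute_action(self, action: AIAction) -> dict:
        """Apply the approved action and return the results (used in Stage 3)."""
        # Placeholder: how changes get applied depends on your tooling
        raise NotImplementedError("execute_action depends on how you apply changes")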
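Since `check_constraint` is left as a stub, here is a minimal sketch of what one concrete check could look like: flagging planned files that fall outside an allowed directory, for a rule such as "do not modify files outside of src/". The function name `outside_allowed_root` and its parameters are illustrative assumptions, and the check works on the paths the model declared in its plan, not on what it later writes to disk.

```python
from pathlib import PurePosixPath

def outside_allowed_root(files_to_modify: list[str], allowed_root: str = "src") -> list[str]:
    """Return the planned paths that fall outside allowed_root."""
    root = PurePosixPath(allowed_root)
    violations = []
    for name in files_to_modify:
        path = PurePosixPath(name)
        # Reject absolute paths and any ".." segment rather than trying to resolve them.
        if path.is_absolute() or ".." in path.parts:
            violations.append(name)
        # The allowed root must be one of the path's ancestors.
        elif root not in path.parents:
            violations.append(name)
    return violations
```

An empty return value means the plan stays inside the allowed directory; anything else can be printed as the reason for rejecting the plan at the review stage.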

Configuration

Here’s how I configure the system:

{
  "require_human_approval": false,
  "run_tests_on_change": true,
  "fallback_models": [
    {"provider": "openai", "model": "codex-5.2"},
    {"provider": "anthropic", "model": "claude-sonnet"}
  ],
  "constraints": {
    "must_not": [
      "Modify files outside of src/",
      "Delete existing functionality",
      "Change API signatures without explicit approval"
    ],
    "must": [
      "Add tests for new functionality",
      "Update documentation",
      "Follow existing code style"
    ]
  },
  "verification": {
    "run_linter": true,
    "run_tests": true,
    "check_types": true
  }
}
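A small loader can catch a malformed config before any model is called. The sketch below assumes the JSON layout shown above; `load_config` and `DEFAULTS` are illustrative names, and the default values mirror the ones the `GovernanceSystem` code falls back to when a key is missing.

```python
import json

# Same fallbacks the governance code uses via config.get(...)
DEFAULTS = {
    "require_human_approval": False,
    "run_tests_on_change": True,
    "fallback_models": [],
}

def load_config(text: str) -> dict:
    """Parse the JSON config and fill in defaults for missing top-level keys."""
    config = {**DEFAULTS, **json.loads(text)}
    for entry in config["fallback_models"]:
        # Each fallback needs both fields to be resolvable to a client later.
        if not {"provider", "model"}.issubset(entry):
            raise ValueError(f"fallback model needs provider and model: {entry}")
    return config
```

Failing at load time is cheaper than discovering a misspelled fallback entry in the middle of a degraded session.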

Why This Works

I tested this system during the Codex 5.4 degradation incident. While others reported:

“5 hours of work lost”
“It replaced content instead of creating new page”
“Suddenly couldn’t figure out how to create a zip file”

My governance system caught failures at the plan stage. When the model proposed wrong targets or shortcuts, the constraint checks flagged them. When quality dropped below thresholds, fallback models took over.

The trade-off is real—development is slower. Each change goes through four stages instead of one. But I haven’t lost work to model degradation since implementing this.

The Cost-Benefit Analysis

Before governance:

Fast development when model works well
Catastrophic failures when model degrades
Lost hours debugging “fixes” that introduced bugs
No visibility into what changed

After governance:

Slower development (maybe 20-30% more tokens)
Consistent quality regardless of model changes
Clear audit trail of all changes
Automatic fallback to stable models

The math works out: spending 20-30% more tokens is better than losing 5 hours of work.

Common Patterns

I’ve found these patterns consistently useful:

Pattern	Description	Benefit
Plan-Review-Execute	AI proposes, human/system approves, then executes	Catches errors early
Sandbox Execution	Test changes in isolated environment	Safe experimentation
Incremental Changes	Small, atomic modifications	Easy to debug and rollback
Constraint Templates	Reusable prompt structures	Consistency across tasks
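
The sandbox execution pattern from the table can be approximated with a throwaway copy of the project. This is a rough sketch under stated assumptions: `run_in_sandbox`, `apply_change`, and `check_cmd` are hypothetical names, and copying into a temporary directory only isolates the file system of the working tree. It is not a security sandbox.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable

def run_in_sandbox(
    project_dir: str,
    apply_change: Callable[[Path], None],
    check_cmd: list[str],
) -> bool:
    """Copy the project to a temp dir, apply the change there, run the check.

    Returns True when check_cmd exits with status 0. The original
    project_dir is never written to.
    """
    with tempfile.TemporaryDirectory() as tmp:
        sandbox = Path(tmp) / "project"
        shutil.copytree(project_dir, sandbox)
        apply_change(sandbox)  # hypothetical callback that edits files in the copy
        result = subprocess.run(check_cmd, cwd=sandbox, capture_output=True)
        return result.returncode == 0
```

With `check_cmd` set to something like `["pytest", "--tb=short"]`, a change is only applied to the real working tree after it has passed in the copy.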

What I Do Now

My production workflow looks like this:

Every task goes through the governance system
Automated tests run after each change
If tests fail or constraints are violated, changes are rejected
Primary model fails quality check? Fall back to the previous version
All changes logged for audit and rollback

This system caught the Codex 5.4 degradation within the first day. I didn’t lose work—I just saw my fallback chain activate and started investigating.

This governance approach connects to several related problems:

Model degradation detection: I wrote about how to detect AI model degradation symptoms before they break production
Instruction following: The core issue of models ignoring constraints is covered in my post on why AI coding assistants ignore instructions
Benchmarking: Building benchmarking systems for AI models gives you the data to detect degradation early

The governance system I built addresses all of these: detection through monitoring, instruction following through constraints, and benchmarking through quality thresholds.

Governance systems protect your AI coding workflows from model degradation and changes. Structure your prompts, validate outputs, maintain fallbacks, and log all changes. Slower development is better than lost work.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.

Here are the most important links from this article, along with some further resources on this topic:

👨‍💻 Reddit: Codex 5.4 degradation reports
👨‍💻 OpenAI Model Versioning Documentation
👨‍💻 Anthropic: Claude Best Practices

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!