Configure Codex for Rigorous Code Review: A Practical Setup Guide

Apr 2, 2026

Problem

I asked Claude Code to review my authentication module. The response started with “Great job!” and ended with “Looks ready to merge.” Two days later, a SQL injection vulnerability made it to production.

The issue wasn’t Claude’s capability. It was Claude’s personality. Claude wants to be helpful and supportive. It avoids confrontation. When asked to review code, it defaults to positive reinforcement.

Then I tried Codex CLI with a properly configured review skill. Same code, completely different result:

{
  "summary": "Authentication module has 2 critical security vulnerabilities",
  "issues": [
    {
      "severity": "critical",
      "file": "src/auth/login.ts",
      "line": 45,
      "issue": "SQL injection in username parameter",
      "fix": "Use parameterized query"
    }
  ],
  "verdict": "blocked"
}

No pleasantries. No hedging. Just actionable findings with an explicit “blocked” verdict.

Why Codex Works Better for Reviews

The difference comes from Codex’s governance architecture. Codex treats AGENTS.md as a constitutional document - hierarchical, explicitly law-like rules that the agent must follow.

Claude’s CLAUDE.md is more coaching-oriented. It provides guidance, but Claude still defaults to its supportive personality. When conflicts arise between “be helpful” and “find bugs,” helpful wins.

Codex has three structural advantages:

1. Structured Output Schema

Reviews must output JSON. Not prose. Not markdown. A parseable schema with:

Severity levels (critical/important/minor)
Exact file and line numbers
Specific fix recommendations
Explicit verdict (blocked/needs-fixes/approved)

2. Tone Enforcement

The review skill explicitly bans:

“Great job” and similar pleasantries
“Looks good” without verification
“Maybe consider” hedging language
“Everything seems fine” assumptions

Only discrete, provable bugs allowed. No opinions.

3. Explicit Verdict Requirement

Every review must end with a verdict. No “overall this looks nice” summaries. Three options:

blocked - Critical issues, cannot merge
needs-fixes - Important issues, should fix before merge
approved - No blocking issues found

Setting Up AGENTS.md for Review Governance

Create an AGENTS.md file in your project root. This becomes the “constitution” that Codex follows.

# Code Review Governance

## Review Protocol

When reviewing code changes:

### 1. Output Format (MANDATORY)

```json
{
  "summary": "Brief assessment",
  "issues": [
    {
      "severity": "critical|important|minor",
      "file": "path/to/file",
      "line": 42,
      "issue": "Description",
      "fix": "Recommended fix"
    }
  ],
  "verdict": "blocked|needs-fixes|approved"
}
```

### 2. Severity Criteria

**Critical**: Security vulnerabilities, data loss risk, broken functionality
**Important**: Missing requirements, poor error handling, test gaps
**Minor**: Style issues, optimization opportunities

### 3. Tone Rules

- No pleasantries ("Great job", "Looks good")
- No hedging ("might be", "could consider")
- Only provable, discrete bugs
- Actionable fix recommendations

### 4. Forbidden Responses

- "Everything looks good"
- "No issues found" (without verification)
- "Ready to merge" (without explicit checks)

The key is the (MANDATORY) marker. Codex treats these as law, not suggestions.

Creating the Review Skill

Skills go in ~/.codex/skills/. Create a rigorous-review directory with a SKILL.md file.

---
name: rigorous-review
description: Use when code changes need formal review - produces structured JSON output with severity-coded issues
---

# Rigorous Code Review

You are reviewing code for production readiness.

**CRITICAL: Follow AGENTS.md governance rules.**

## Review Checklist

1. Security vulnerabilities (injection, auth bypass, data leak)
2. Logic errors and edge cases
3. Error handling completeness
4. Test coverage gaps
5. Performance implications
6. Breaking changes detection

## Output: JSON ONLY

```json
{
  "summary": "...",
  "issues": [...],
  "verdict": "..."
}
```

No prose. No pleasantries. Just structured findings.

The frontmatter description field tells Codex when to use this skill. The name field becomes the command: /rigorous-review.

Running Codex Review

After setup, invoke the review:

codex exec "review src/auth/ changes between HEAD~5 and HEAD" --full-auto

The --full-auto flag lets Codex complete without interaction. Output goes to stdout by default.

For CI/CD integration, capture output to a file:

codex exec "review staged changes" --full-auto --output .review/result.json

CI/CD Integration with GitHub Actions

The JSON output enables automated gates. Here’s a workflow that blocks merges on critical issues:

name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  codex-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Run Codex Review
        run: |
          codex exec "review changes for security and logic errors" \
            --full-auto \
            --output review.json

      - name: Check Review Verdict
        run: |
          VERDICT=$(jq -r '.verdict' review.json)
          CRITICAL=$(jq '.issues | map(select(.severity=="critical")) | length' review.json)

          if [ "$VERDICT" = "blocked" ] || [ "$CRITICAL" -gt 0 ]; then
            echo "Critical issues found. Review blocked."
            jq '.issues | map(select(.severity=="critical"))' review.json
            exit 1
          fi

      - name: Post Review Comment
        uses: actions/github-script@v6
        with:
          script: |
            const review = require('./review.json');
            const body = `## AI Code Review Results\n\n` +
              `**Verdict**: ${review.verdict}\n\n` +
              `**Issues Found**:\n` +
              review.issues.map(i =>
                `- **${i.severity}**: ${i.file}:${i.line} - ${i.issue}`
              ).join('\n');

            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: body
            });

This workflow:

Runs Codex review on every PR
Parses JSON output with jq
Blocks merge if verdict is “blocked” or critical issues exist
Posts findings as a PR comment

The Codex-Supervising-Claude Pattern

The best workflow uses both agents. Claude Code for implementation, Codex for verification.

Step 1: Claude Code implements feature
        Output: Draft code changes

Step 2: Codex reviews Claude's output
        Input: Draft code + AGENTS.md rules
        Output: JSON with issues found

Step 3: Claude Code fixes issues
        Input: Codex review JSON
        Output: Revised code

Step 4: Codex re-reviews
        Input: Revised code
        Output: JSON (hopefully empty issues)

Step 5: If verdict="approved": Merge

This pattern catches 30-50% more bugs than Claude-only review. Codex finds what Claude glosses over.

Common Setup Mistakes

I made all of these mistakes before getting Codex review working correctly:

Copying Claude’s skill format - Codex uses different frontmatter fields. The name field becomes the command, not the filename.

Allowing prose output - Without explicit JSON requirement, Codex defaults to markdown. The output becomes unparseable.

Skipping severity classification - Without severity levels, everything becomes equally urgent. Critical SQL injection gets the same weight as missing documentation.

No explicit verdict - Without a required verdict, reviews end with summaries. “Overall looks reasonable” isn’t actionable.

Allowing “Great job” - One pleasantry and the review tone shifts. Codex starts finding fewer issues to stay “helpful.”

Review Types Beyond Security

The same structure works for different review focuses:

## Performance Review Protocol

Check for:
- N+1 queries
- Unbounded loops
- Memory leaks
- Missing indexes
- Large payload transfers

Output format: perf-review.json

## Architecture Review Protocol

Check for:
- SOLID principle violations
- Circular dependencies
- Layer violations
- God objects/functions

Output format: arch-review.json

Create separate skills for each review type. Invoke with /perf-review or /arch-review.

Verification After Setup

Test with intentionally buggy code. Create a file with obvious issues:

function login(username, password) {
  // SQL injection vulnerability
  const query = "SELECT * FROM users WHERE username = '" + username + "'";
  // No rate limiting
  // No input validation
  return db.execute(query);
}

Run review:

codex exec "review test-review.js" --full-auto

Expected output should include:

Critical severity for SQL injection
Line 3 reference
Parameterized query fix recommendation
“blocked” verdict

If you get “Looks good!” or missing issues, check:

AGENTS.md is in project root
Skill is in ~/.codex/skills/rigorous-review/
JSON output format is explicit
Tone rules are in AGENTS.md

Comparison: Claude Code vs Codex Review

CLAUDE CODE OUTPUT                    CODEX OUTPUT
────────────────────────────────────────────────────────────────────
"Great job on implementing            {"summary": "2 critical issues",
 the login functionality!              "issues": [
 The code looks clean and               {"severity": "critical",
 well-structured.                        "file": "src/auth/login.ts",
                                         "line": 45,
 I noticed a few things                 "issue": "SQL injection...",
 that could be improved:                "fix": "Use parameterized query"}
 - Maybe consider adding               ],
   some validation?                    "verdict": "blocked"}
 - The query could
   potentially be optimized            Specific, actionable,
 - You might want to                   severity-coded, blocked.
 think about rate limiting"

Overall, ready to merge.
Nice work!"

Vague, positive, no urgency.

The difference matters for production code. Claude’s review might pass the SQL injection to production. Codex’s review blocks it explicitly.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 OpenAI Codex CLI Documentation
👨‍💻 Reddit Discussion: Claude Code vs Codex Differences
👨‍💻 Agent Skills Specification

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!