Configure Codex for Rigorous Code Review: A Practical Setup Guide

Problem

I asked Claude Code to review my authentication module. The response started with “Great job!” and ended with “Looks ready to merge.” Two days later, a SQL injection vulnerability made it to production.

The issue wasn’t Claude’s capability. It was Claude’s personality. Claude wants to be helpful and supportive. It avoids confrontation. When asked to review code, it defaults to positive reinforcement.

Then I tried Codex CLI with a properly configured review skill. Same code, completely different result:

codex-review-output.json
{
  "summary": "Authentication module has 2 critical security vulnerabilities",
  "issues": [
    {
      "severity": "critical",
      "file": "src/auth/login.ts",
      "line": 45,
      "issue": "SQL injection in username parameter",
      "fix": "Use parameterized query"
    }
  ],
  "verdict": "blocked"
}

No pleasantries. No hedging. Just actionable findings with an explicit “blocked” verdict.

Why Codex Works Better for Reviews

The difference comes from Codex’s governance architecture. Codex treats AGENTS.md as a constitutional document - hierarchical, explicitly law-like rules that the agent must follow.

Claude’s CLAUDE.md is more coaching-oriented. It provides guidance, but Claude still defaults to its supportive personality. When conflicts arise between “be helpful” and “find bugs,” helpful wins.

Codex has three structural advantages:

1. Structured Output Schema

Reviews must output JSON. Not prose. Not markdown. A parseable schema with:

  • Severity levels (critical/important/minor)
  • Exact file and line numbers
  • Specific fix recommendations
  • Explicit verdict (blocked/needs-fixes/approved)
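As a rough illustration of what "parseable schema" means in practice, here is a minimal sketch of a validator for the shape listed above. The function name `validateReview` and the error messages are my own assumptions, not part of Codex; the allowed severity and verdict values come straight from the lists in this article.

```javascript
// Hypothetical validator for the review JSON shape described in this article.
// Returns a list of human-readable problems; an empty list means the shape is valid.
const ALLOWED_SEVERITIES = ["critical", "important", "minor"];
const ALLOWED_VERDICTS = ["blocked", "needs-fixes", "approved"];

function isNonEmptyString(value) {
  return typeof value === "string" && value.trim() !== "";
}

function validateReview(review) {
  if (review === null || typeof review !== "object") {
    return ["review must be a JSON object"];
  }
  const errors = [];
  if (!isNonEmptyString(review.summary)) {
    errors.push("summary must be a non-empty string");
  }
  if (!ALLOWED_VERDICTS.includes(review.verdict)) {
    errors.push(`verdict must be one of: ${ALLOWED_VERDICTS.join(", ")}`);
  }
  if (!Array.isArray(review.issues)) {
    errors.push("issues must be an array");
    return errors;
  }
  review.issues.forEach((issue, i) => {
    if (issue === null || typeof issue !== "object") {
      errors.push(`issues[${i}] must be an object`);
      return;
    }
    if (!ALLOWED_SEVERITIES.includes(issue.severity)) {
      errors.push(`issues[${i}].severity is invalid`);
    }
    if (!isNonEmptyString(issue.file)) {
      errors.push(`issues[${i}].file must be a non-empty string`);
    }
    // Line numbers are 1-based positive integers.
    if (!Number.isInteger(issue.line) || issue.line < 1) {
      errors.push(`issues[${i}].line must be a positive integer`);
    }
    if (!isNonEmptyString(issue.issue)) {
      errors.push(`issues[${i}].issue must be a non-empty string`);
    }
    if (!isNonEmptyString(issue.fix)) {
      errors.push(`issues[${i}].fix must be a non-empty string`);
    }
  });
  return errors;
}
```

Running a check like this before acting on a review means a malformed response fails loudly instead of silently passing a gate.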

2. Tone Enforcement

The review skill explicitly bans:

  • “Great job” and similar pleasantries
  • “Looks good” without verification
  • “Maybe consider” hedging language
  • “Everything seems fine” assumptions

Only discrete, provable bugs allowed. No opinions.
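The tone rules live in the prompt, but they can also be spot-checked after the fact. The sketch below is an assumption on my part (the name `findBannedPhrases` and the pattern list are hypothetical): a blunt scan of review text for the four phrases banned above. It flags any occurrence and cannot tell whether "Looks good" was backed by verification.

```javascript
// Hypothetical post-hoc tone check: flags the phrases banned by the review skill.
// Case-insensitive and deliberately blunt; it does not understand context.
const BANNED_PATTERNS = [
  /great job/i,
  /looks good/i,
  /maybe consider/i,
  /everything seems fine/i,
];

function findBannedPhrases(text) {
  return BANNED_PATTERNS
    .map((pattern) => text.match(pattern))
    .filter((match) => match !== null)
    .map((match) => match[0]); // the phrase exactly as it appeared in the text
}
```

A non-empty result is a hint that the tone rules were ignored and the review deserves a second look.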

3. Explicit Verdict Requirement

Every review must end with a verdict. No “overall this looks nice” summaries. Three options:

  • blocked - Critical issues, cannot merge
  • needs-fixes - Important issues, should fix before merge
  • approved - No blocking issues found
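One way to read the three verdicts is as a function of the highest severity present. The sketch below encodes that reading; it is an assumption of mine, in particular the choice that minor-only reviews count as approved. The names `expectedVerdict` and `verdictIsConsistent` are hypothetical.

```javascript
// Hypothetical mapping from issue severities to the verdict this article describes.
// Assumption: a review containing only minor issues still counts as "approved".
function expectedVerdict(issues) {
  if (issues.some((issue) => issue.severity === "critical")) return "blocked";
  if (issues.some((issue) => issue.severity === "important")) return "needs-fixes";
  return "approved";
}

// True when the verdict the reviewer reported matches the severities it listed.
function verdictIsConsistent(review) {
  return review.verdict === expectedVerdict(review.issues);
}
```

A mismatch, such as an "approved" verdict next to a critical issue, is worth treating as a failed review rather than trusting either half.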

Setting Up AGENTS.md for Review Governance

Create an AGENTS.md file in your project root. This becomes the “constitution” that Codex follows.

AGENTS.md
# Code Review Governance
## Review Protocol
When reviewing code changes:
### 1. Output Format (MANDATORY)
```json
{
  "summary": "Brief assessment",
  "issues": [
    {
      "severity": "critical|important|minor",
      "file": "path/to/file",
      "line": 42,
      "issue": "Description",
      "fix": "Recommended fix"
    }
  ],
  "verdict": "blocked|needs-fixes|approved"
}
```
### 2. Severity Criteria
**Critical**: Security vulnerabilities, data loss risk, broken functionality
**Important**: Missing requirements, poor error handling, test gaps
**Minor**: Style issues, optimization opportunities
### 3. Tone Rules
- No pleasantries ("Great job", "Looks good")
- No hedging ("might be", "could consider")
- Only provable, discrete bugs
- Actionable fix recommendations
### 4. Forbidden Responses
- "Everything looks good"
- "No issues found" (without verification)
- "Ready to merge" (without explicit checks)

The key is the (MANDATORY) marker on the output format. Codex treats rules marked this way as law, not suggestions.

Creating the Review Skill

Skills go in ~/.codex/skills/. Create a rigorous-review directory with a SKILL.md file.

~/.codex/skills/rigorous-review/SKILL.md
---
name: rigorous-review
description: Use when code changes need formal review - produces structured JSON output with severity-coded issues
---
# Rigorous Code Review
You are reviewing code for production readiness.
**CRITICAL: Follow AGENTS.md governance rules.**
## Review Checklist
1. Security vulnerabilities (injection, auth bypass, data leak)
2. Logic errors and edge cases
3. Error handling completeness
4. Test coverage gaps
5. Performance implications
6. Breaking changes detection
## Output: JSON ONLY
```json
{
  "summary": "...",
  "issues": [...],
  "verdict": "..."
}
```
No prose. No pleasantries. Just structured findings.

The frontmatter description field tells Codex when to use this skill. The name field becomes the command: /rigorous-review.
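Since both frontmatter fields matter, a quick sanity check on SKILL.md can save a debugging session. The sketch below is a simplified assumption: it handles only flat `key: value` lines between the `---` markers and is not a full YAML parser. The names `parseFrontmatter` and `missingSkillFields` are hypothetical.

```javascript
// Hypothetical check that a SKILL.md file declares `name` and `description`.
// Only flat "key: value" frontmatter is supported; this is not a YAML parser.
function parseFrontmatter(markdown) {
  const match = markdown.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!match) return null;
  const fields = {};
  for (const line of match[1].split(/\r?\n/)) {
    const separator = line.indexOf(":");
    if (separator === -1) continue; // skip lines that are not key: value pairs
    const key = line.slice(0, separator).trim();
    fields[key] = line.slice(separator + 1).trim();
  }
  return fields;
}

function missingSkillFields(markdown) {
  const fields = parseFrontmatter(markdown) ?? {};
  return ["name", "description"].filter((key) => !fields[key]);
}
```

For the SKILL.md shown above, `missingSkillFields` should come back empty; a file without frontmatter reports both fields as missing.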

Running Codex Review

After setup, invoke the review:

Terminal
codex exec "review src/auth/ changes between HEAD~5 and HEAD" --full-auto

The --full-auto flag lets Codex complete without interaction. Output goes to stdout by default.

For CI/CD integration, capture output to a file:

Terminal
# -o (--output-last-message) writes the agent's final message to a file
codex exec "review staged changes" --full-auto -o .review/result.json

CI/CD Integration with GitHub Actions

The JSON output enables automated gates. Here’s a workflow that blocks merges on critical issues:

.github/workflows/ai-review.yml
name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  codex-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Codex Review
        run: |
          # -o (--output-last-message) writes the agent's final message to a file
          codex exec "review changes for security and logic errors" \
            --full-auto \
            -o review.json
      - name: Check Review Verdict
        run: |
          VERDICT=$(jq -r '.verdict' review.json)
          CRITICAL=$(jq '.issues | map(select(.severity=="critical")) | length' review.json)
          if [ "$VERDICT" = "blocked" ] || [ "$CRITICAL" -gt 0 ]; then
            echo "Critical issues found. Review blocked."
            jq '.issues | map(select(.severity=="critical"))' review.json
            exit 1
          fi
      - name: Post Review Comment
        # Run even when the verdict check fails, so blocked PRs still get the findings
        if: always()
        uses: actions/github-script@v6
        with:
          script: |
            const review = require('./review.json');
            const body = `## AI Code Review Results\n\n` +
              `**Verdict**: ${review.verdict}\n\n` +
              `**Issues Found**:\n` +
              review.issues.map(i =>
                `- **${i.severity}**: ${i.file}:${i.line} - ${i.issue}`
              ).join('\n');
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: body
            });

This workflow:

  1. Runs Codex review on every PR
  2. Parses JSON output with jq
  3. Blocks merge if verdict is “blocked” or critical issues exist
  4. Posts findings as a PR comment
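The gate and the comment formatting from the workflow can also be expressed as two small functions, which makes them easy to test outside CI. This is a sketch under my own naming (`shouldBlock` and `formatComment` are hypothetical); the logic mirrors the jq check and the comment template in the workflow.

```javascript
// Hypothetical helpers mirroring the workflow's gate and comment steps.
// Block when the verdict is "blocked" or any critical issue is present.
function shouldBlock(review) {
  const criticalCount = review.issues.filter(
    (issue) => issue.severity === "critical"
  ).length;
  return review.verdict === "blocked" || criticalCount > 0;
}

// Build the same markdown body the workflow posts as a PR comment.
function formatComment(review) {
  const lines = review.issues.map(
    (i) => `- **${i.severity}**: ${i.file}:${i.line} - ${i.issue}`
  );
  return (
    `## AI Code Review Results\n\n` +
    `**Verdict**: ${review.verdict}\n\n` +
    `**Issues Found**:\n` +
    lines.join("\n")
  );
}
```

Checking both conditions is intentional: a review that lists a critical issue but reports a softer verdict is still blocked.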

The Codex-Supervising-Claude Pattern

The best workflow uses both agents. Claude Code for implementation, Codex for verification.

Dual-Agent Review Workflow
Step 1: Claude Code implements feature
    Output: Draft code changes
Step 2: Codex reviews Claude's output
    Input: Draft code + AGENTS.md rules
    Output: JSON with issues found
Step 3: Claude Code fixes issues
    Input: Codex review JSON
    Output: Revised code
Step 4: Codex re-reviews
    Input: Revised code
    Output: JSON (hopefully empty issues)
Step 5: If verdict="approved": Merge

In my experience, this pattern catches roughly 30-50% more bugs than Claude-only review, because Codex flags what Claude glosses over.
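The five steps form a loop with a stopping condition, which can be sketched as follows. Everything here is an assumption for illustration: `implement` and `review` are hypothetical callbacks standing in for however you invoke Claude Code and Codex, they are synchronous to keep the sketch short, and the `maxRounds` cap is my addition so the loop cannot run forever.

```javascript
// Hypothetical driver for the implement -> review -> fix -> re-review loop.
// `implement(feedback)` returns code; `review(code)` returns the review JSON.
function reviewLoop({ implement, review, maxRounds = 3 }) {
  let code = implement(null); // Step 1: first draft, no review feedback yet
  for (let round = 1; round <= maxRounds; round++) {
    const result = review(code); // Steps 2 and 4: Codex reviews the current code
    if (result.verdict === "approved") {
      return { merged: true, rounds: round, code }; // Step 5: safe to merge
    }
    if (round < maxRounds) {
      code = implement(result); // Step 3: fix using the review JSON as input
    }
  }
  // Still not approved after maxRounds reviews: stop and hand off to a human.
  return { merged: false, rounds: maxRounds, code };
}
```

The cap matters in practice: if the two agents keep disagreeing, the loop ends with `merged: false` instead of burning tokens indefinitely.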

Common Setup Mistakes

I made all of these mistakes before getting Codex review working correctly:

Copying Claude’s skill format - Codex uses different frontmatter fields. The name field becomes the command, not the filename.

Allowing prose output - Without explicit JSON requirement, Codex defaults to markdown. The output becomes unparseable.

Skipping severity classification - Without severity levels, everything becomes equally urgent. Critical SQL injection gets the same weight as missing documentation.

No explicit verdict - Without a required verdict, reviews end with summaries. “Overall looks reasonable” isn’t actionable.

Allowing “Great job” - One pleasantry and the review tone shifts. Codex starts finding fewer issues to stay “helpful.”
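For the prose-output mistake in particular, it helps to parse defensively on the consuming side. The sketch below is an assumption of mine (the name `extractReviewJson` is hypothetical): it accepts either raw JSON or JSON wrapped in a markdown code fence, and returns `null` for anything else so the caller can fail the review explicitly.

```javascript
// Hypothetical defensive parser for review output.
// Accepts raw JSON, or JSON inside a markdown code fence; returns null otherwise.
const FENCE = "`".repeat(3); // three backticks, built here to avoid nesting fences
const FENCED_BLOCK = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE);

function extractReviewJson(output) {
  const fenced = output.match(FENCED_BLOCK);
  const candidate = fenced ? fenced[1] : output;
  try {
    return JSON.parse(candidate.trim());
  } catch {
    return null; // prose or malformed JSON: treat as an unusable review
  }
}
```

Treating `null` as a failed review keeps an unparseable response from being mistaken for a clean one.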

Review Types Beyond Security

The same structure works for different review focuses:

AGENTS.md - Performance Review Section
## Performance Review Protocol
Check for:
- N+1 queries
- Unbounded loops
- Memory leaks
- Missing indexes
- Large payload transfers
Output format: perf-review.json

AGENTS.md - Architecture Review Section
## Architecture Review Protocol
Check for:
- SOLID principle violations
- Circular dependencies
- Layer violations
- God objects/functions
Output format: arch-review.json

Create separate skills for each review type. Invoke with /perf-review or /arch-review.

Verification After Setup

Test with intentionally buggy code. Create a file with obvious issues:

test-review.js
function login(username, password) {
  // SQL injection vulnerability
  const query = "SELECT * FROM users WHERE username = '" + username + "'";
  // No rate limiting
  // No input validation
  return db.execute(query);
}

Run review:

Terminal
codex exec "review test-review.js" --full-auto

Expected output should include:

  • Critical severity for SQL injection
  • Line 3 reference
  • Parameterized query fix recommendation
  • “blocked” verdict
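For reference, the fix the review should recommend looks roughly like this. It is a sketch under assumptions: `db` is passed in as a hypothetical client whose `execute(sql, params)` method accepts bound parameters, and the `?` placeholder syntax varies by driver. Password verification and rate limiting are left out, as in the test file.

```javascript
// Hypothetical fixed version of the test function.
// Assumes `db.execute(sql, params)` binds parameters; placeholder syntax varies by driver.
function login(db, username, password) {
  // Basic input validation: reject anything that is not a non-empty string.
  if (typeof username !== "string" || username.length === 0) {
    throw new TypeError("username must be a non-empty string");
  }
  // The user-supplied value travels as a bound parameter, never inside the SQL text.
  const query = "SELECT * FROM users WHERE username = ?";
  return db.execute(query, [username]);
}
```

After swapping this in, re-running the review should no longer report the injection finding, which makes it a handy second test case for the setup.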

If you get “Looks good!” or missing issues, check:

  1. AGENTS.md is in project root
  2. Skill is in ~/.codex/skills/rigorous-review/
  3. JSON output format is explicit
  4. Tone rules are in AGENTS.md

Comparison: Claude Code vs Codex Review

Review Output Comparison
CLAUDE CODE OUTPUT
────────────────────────────────────────────────────────────────────
"Great job on implementing the login functionality!
The code looks clean and well-structured.

I noticed a few things that could be improved:
- Maybe consider adding some validation?
- The query could potentially be optimized
- You might want to think about rate limiting

Overall, ready to merge.
Nice work!"

Vague, positive, no urgency.

CODEX OUTPUT
────────────────────────────────────────────────────────────────────
{"summary": "2 critical issues",
 "issues": [
   {"severity": "critical",
    "file": "src/auth/login.ts",
    "line": 45,
    "issue": "SQL injection...",
    "fix": "Use parameterized query"}
 ],
 "verdict": "blocked"}

Specific, actionable, severity-coded, blocked.

The difference matters for production code. Claude’s review might pass the SQL injection to production. Codex’s review blocks it explicitly.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.
