Configure Codex for Rigorous Code Review: A Practical Setup Guide
Problem
I asked Claude Code to review my authentication module. The response started with “Great job!” and ended with “Looks ready to merge.” Two days later, a SQL injection vulnerability made it to production.
The issue wasn’t Claude’s capability. It was Claude’s personality. Claude wants to be helpful and supportive. It avoids confrontation. When asked to review code, it defaults to positive reinforcement.
Then I tried Codex CLI with a properly configured review skill. Same code, completely different result:
    {
      "summary": "Authentication module has 2 critical security vulnerabilities",
      "issues": [
        {
          "severity": "critical",
          "file": "src/auth/login.ts",
          "line": 45,
          "issue": "SQL injection in username parameter",
          "fix": "Use parameterized query"
        }
      ],
      "verdict": "blocked"
    }

No pleasantries. No hedging. Just actionable findings with an explicit “blocked” verdict.
Why Codex Works Better for Reviews
The difference comes from Codex’s governance architecture. Codex treats AGENTS.md as a constitutional document - hierarchical, explicitly law-like rules that the agent must follow.
Claude’s CLAUDE.md is more coaching-oriented. It provides guidance, but Claude still defaults to its supportive personality. When conflicts arise between “be helpful” and “find bugs,” helpful wins.
Codex has three structural advantages:
1. Structured Output Schema
Reviews must output JSON. Not prose. Not markdown. A parseable schema with:
- Severity levels (critical/important/minor)
- Exact file and line numbers
- Specific fix recommendations
- Explicit verdict (blocked/needs-fixes/approved)
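To make the schema concrete, here is a minimal sketch of a validator for a parsed review object. The function name validateReview and the error messages are illustrative and not part of Codex; the field names and allowed values are taken from the example output and the list above.

```javascript
// Illustrative validator for the review schema described above.
// Nothing here is a Codex API; it only inspects an already parsed object.
const SEVERITIES = ["critical", "important", "minor"];
const VERDICTS = ["blocked", "needs-fixes", "approved"];

function validateReview(review) {
  if (review === null || typeof review !== "object" || Array.isArray(review)) {
    return ["review must be a JSON object"];
  }
  const errors = [];
  if (typeof review.summary !== "string" || review.summary.trim() === "") {
    errors.push("summary must be a non-empty string");
  }
  if (!VERDICTS.includes(review.verdict)) {
    errors.push("verdict must be one of: " + VERDICTS.join(", "));
  }
  if (!Array.isArray(review.issues)) {
    errors.push("issues must be an array");
    return errors;
  }
  review.issues.forEach((issue, i) => {
    const at = `issues[${i}]`;
    if (issue === null || typeof issue !== "object") {
      errors.push(`${at} must be an object`);
      return;
    }
    if (!SEVERITIES.includes(issue.severity)) errors.push(`${at}.severity is invalid`);
    if (typeof issue.file !== "string" || issue.file === "") errors.push(`${at}.file is missing`);
    if (!Number.isInteger(issue.line) || issue.line < 1) errors.push(`${at}.line must be a positive integer`);
    if (typeof issue.issue !== "string" || issue.issue === "") errors.push(`${at}.issue is missing`);
    if (typeof issue.fix !== "string" || issue.fix === "") errors.push(`${at}.fix is missing`);
  });
  return errors;
}
```

An empty array means the object matches the schema. Anything else is a list of reasons to reject the review before acting on it.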
2. Tone Enforcement
The review skill explicitly bans:
- “Great job” and similar pleasantries
- “Looks good” without verification
- “Maybe consider” hedging language
- “Everything seems fine” assumptions
Only discrete, provable bugs allowed. No opinions.
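If you want to double-check the tone rules outside the agent, a simple phrase scan is one option. The sketch below is an assumption about how you might lint review text yourself; the phrase list mirrors the bans above, and findBannedPhrases is a hypothetical helper name. It cannot tell whether "looks good" was backed by verification, so it flags the phrase conservatively.

```javascript
// Illustrative lint for review text; not something Codex ships.
const BANNED_PHRASES = [
  "great job",
  "looks good",
  "maybe consider",
  "everything seems fine",
];

// Returns the banned phrases that appear in the text, case-insensitively.
function findBannedPhrases(text) {
  const lower = text.toLowerCase();
  return BANNED_PHRASES.filter((phrase) => lower.includes(phrase));
}
```

Running it over the summary and each issue description gives a quick signal that the review drifted back into supportive prose.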
3. Explicit Verdict Requirement
Every review must end with a verdict. No “overall this looks nice” summaries. Three options:
- blocked: Critical issues, cannot merge
- needs-fixes: Important issues, should fix before merge
- approved: No blocking issues found
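One way to read these three definitions is as a function of the worst severity in the issue list. The sketch below encodes that reading; the mapping (minor-only reviews count as approved) is my interpretation of the definitions, and deriveVerdict is a hypothetical helper, not a Codex feature.

```javascript
// Illustrative mapping from issue severities to a verdict.
// Assumption: minor issues alone do not block, so they yield "approved".
function deriveVerdict(issues) {
  const severities = new Set(issues.map((issue) => issue.severity));
  if (severities.has("critical")) return "blocked";
  if (severities.has("important")) return "needs-fixes";
  return "approved";
}
```

A useful cross-check is to compare this derived value with the verdict the agent reported and treat any more lenient reported verdict as a failed review.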
Setting Up AGENTS.md for Review Governance
Create an AGENTS.md file in your project root. This becomes the “constitution” that Codex follows.
    # Code Review Governance

    ## Review Protocol

    When reviewing code changes:

    ### 1. Output Format (MANDATORY)

    ```json
    {
      "summary": "Brief assessment",
      "issues": [
        {
          "severity": "critical|important|minor",
          "file": "path/to/file",
          "line": 42,
          "issue": "Description",
          "fix": "Recommended fix"
        }
      ],
      "verdict": "blocked|needs-fixes|approved"
    }
    ```

    ### 2. Severity Criteria

    **Critical**: Security vulnerabilities, data loss risk, broken functionality
    **Important**: Missing requirements, poor error handling, test gaps
    **Minor**: Style issues, optimization opportunities

    ### 3. Tone Rules

    - No pleasantries ("Great job", "Looks good")
    - No hedging ("might be", "could consider")
    - Only provable, discrete bugs
    - Actionable fix recommendations

    ### 4. Forbidden Responses

    - "Everything looks good"
    - "No issues found" (without verification)
    - "Ready to merge" (without explicit checks)

The key is the (MANDATORY) marker. Codex treats these as law, not suggestions.
Creating the Review Skill
Skills go in ~/.codex/skills/. Create a rigorous-review directory with a SKILL.md file.
    ---
    name: rigorous-review
    description: Use when code changes need formal review - produces structured JSON output with severity-coded issues
    ---

    # Rigorous Code Review

    You are reviewing code for production readiness.

    **CRITICAL: Follow AGENTS.md governance rules.**

    ## Review Checklist

    1. Security vulnerabilities (injection, auth bypass, data leak)
    2. Logic errors and edge cases
    3. Error handling completeness
    4. Test coverage gaps
    5. Performance implications
    6. Breaking changes detection

    ## Output: JSON ONLY

    ```json
    {
      "summary": "...",
      "issues": [...],
      "verdict": "..."
    }
    ```

    No prose. No pleasantries. Just structured findings.

The frontmatter description field tells Codex when to use this skill. The name field becomes the command: /rigorous-review.
Running Codex Review
After setup, invoke the review:
    codex exec "review src/auth/ changes between HEAD~5 and HEAD" --full-auto

The --full-auto flag lets Codex complete without interaction. Output goes to stdout by default.
For CI/CD integration, capture output to a file:
    codex exec "review staged changes" --full-auto --output .review/result.json

Check codex exec --help for the exact name of the output flag in your installed version.

CI/CD Integration with GitHub Actions
The JSON output enables automated gates. Here’s a workflow that blocks merges on critical issues:
    name: AI Code Review

    on:
      pull_request:
        types: [opened, synchronize]

    # Needed so the final step can comment on the pull request
    permissions:
      contents: read
      pull-requests: write

    jobs:
      codex-review:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3

          # Assumes the codex CLI is already installed and authenticated on the runner
          - name: Run Codex Review
            run: |
              codex exec "review changes for security and logic errors" \
                --full-auto \
                --output review.json

          - name: Check Review Verdict
            run: |
              VERDICT=$(jq -r '.verdict' review.json)
              CRITICAL=$(jq '.issues | map(select(.severity=="critical")) | length' review.json)

              if [ "$VERDICT" = "blocked" ] || [ "$CRITICAL" -gt 0 ]; then
                echo "Critical issues found. Review blocked."
                jq '.issues | map(select(.severity=="critical"))' review.json
                exit 1
              fi

          # always() keeps this step running when the verdict check above fails,
          # so blocked reviews still get their findings posted
          - name: Post Review Comment
            if: always()
            uses: actions/github-script@v6
            with:
              script: |
                const review = require('./review.json');
                const body = `## AI Code Review Results\n\n` +
                  `**Verdict**: ${review.verdict}\n\n` +
                  `**Issues Found**:\n` +
                  review.issues.map(i =>
                    `- **${i.severity}**: ${i.file}:${i.line} - ${i.issue}`
                  ).join('\n');

                await github.rest.issues.createComment({
                  issue_number: context.issue.number,
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  body: body
                });

This workflow:
- Runs Codex review on every PR
- Parses JSON output with jq
- Blocks merge if verdict is “blocked” or critical issues exist
- Posts findings as a PR comment
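The same gate can be written in JavaScript if you would rather reject malformed output or an unexpected verdict explicitly than rely on shell error behavior. This is a sketch under my own assumptions: gateExitCode is a hypothetical helper, and it keeps the workflow's rule that only "blocked" or a critical issue fails the build.

```javascript
// Illustrative fail-closed gate: returns the exit code a CI step could use.
function gateExitCode(rawOutput) {
  let review;
  try {
    review = JSON.parse(rawOutput);
  } catch (err) {
    return 1; // unparseable output is treated as a failed review
  }
  if (review === null || typeof review !== "object" || !Array.isArray(review.issues)) {
    return 1; // missing structure is treated as a failure too
  }
  const hasCritical = review.issues.some(
    (issue) => issue !== null && typeof issue === "object" && issue.severity === "critical"
  );
  if (review.verdict === "blocked" || hasCritical) return 1;
  // Any verdict other than the two non-blocking values is unexpected, so fail.
  return review.verdict === "approved" || review.verdict === "needs-fixes" ? 0 : 1;
}
```

In a Node-based step you could read review.json, pass its contents to this function, and call process.exit with the result.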
The Codex-Supervising-Claude Pattern
The best workflow uses both agents. Claude Code for implementation, Codex for verification.
    Step 1: Claude Code implements feature
            Output: Draft code changes

    Step 2: Codex reviews Claude's output
            Input: Draft code + AGENTS.md rules
            Output: JSON with issues found

    Step 3: Claude Code fixes issues
            Input: Codex review JSON
            Output: Revised code

    Step 4: Codex re-reviews
            Input: Revised code
            Output: JSON (hopefully empty issues)

    Step 5: If verdict="approved": Merge

This pattern catches 30-50% more bugs than Claude-only review. Codex finds what Claude glosses over.
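The five steps above can be sketched as a small loop. The implement, review, and fix callbacks are hypothetical placeholders that you would wire to Claude Code and Codex yourself; neither tool exposes these functions. The maxRounds cap is my own addition so the loop hands off to a human instead of cycling forever.

```javascript
// Illustrative orchestration of the implement / review / fix cycle.
// `implement`, `review`, and `fix` are hypothetical async callbacks.
async function reviewLoop({ implement, review, fix, maxRounds = 3 }) {
  let code = await implement();          // Step 1: draft changes
  let result = await review(code);       // Step 2: first review
  let rounds = 1;

  // Steps 3 and 4: fix and re-review until approved or out of rounds.
  while (result.verdict !== "approved" && rounds < maxRounds) {
    code = await fix(code, result);
    result = await review(code);
    rounds += 1;
  }

  // Step 5: merge only on an explicit "approved" verdict.
  return { merged: result.verdict === "approved", rounds, code, lastReview: result };
}
```

Here rounds counts reviews, so maxRounds = 3 allows at most two fix attempts before the loop gives up and returns the last review for a person to inspect.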
Common Setup Mistakes
I made all of these mistakes before getting Codex review working correctly:
Copying Claude’s skill format - Codex uses different frontmatter fields. The name field becomes the command, not the filename.
Allowing prose output - Without explicit JSON requirement, Codex defaults to markdown. The output becomes unparseable.
Skipping severity classification - Without severity levels, everything becomes equally urgent. Critical SQL injection gets the same weight as missing documentation.
No explicit verdict - Without a required verdict, reviews end with summaries. “Overall looks reasonable” isn’t actionable.
Allowing “Great job” - One pleasantry and the review tone shifts. Codex starts finding fewer issues to stay “helpful.”
Review Types Beyond Security
The same structure works for different review focuses:
    ## Performance Review Protocol

    Check for:
    - N+1 queries
    - Unbounded loops
    - Memory leaks
    - Missing indexes
    - Large payload transfers

    Output format: perf-review.json

    ## Architecture Review Protocol

    Check for:
    - SOLID principle violations
    - Circular dependencies
    - Layer violations
    - God objects/functions

    Output format: arch-review.json

Create separate skills for each review type. Invoke with /perf-review or /arch-review.
Verification After Setup
Test with intentionally buggy code. Create a file named test-review.js with obvious issues:
    function login(username, password) {
      // SQL injection vulnerability
      const query = "SELECT * FROM users WHERE username = '" + username + "'";
      // No rate limiting
      // No input validation
      return db.execute(query);
    }

Run review:

    codex exec "review test-review.js" --full-auto

Expected output should include:
- Critical severity for SQL injection
- Line 3 reference
- Parameterized query fix recommendation
- “blocked” verdict
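For reference, this is roughly what the recommended fix looks like. It is a sketch, not the output Codex will produce: db is passed in as a parameter to keep the example self-contained, db.execute(sql, params) is a hypothetical driver call, and the ? placeholder follows the MySQL/SQLite convention (PostgreSQL clients typically use $1).

```javascript
// Illustrative fix for test-review.js.
// `db.execute(sql, params)` is a hypothetical driver call; check how your
// database client binds parameters before copying this.
function login(db, username, password) {
  // Basic input validation, one of the gaps noted in the buggy version.
  if (typeof username !== "string" || username.length === 0) {
    throw new TypeError("username must be a non-empty string");
  }
  // The value travels separately from the SQL text, so it is never parsed as SQL.
  const query = "SELECT * FROM users WHERE username = ?";
  // Password checking is left out here, as in the original test file.
  return db.execute(query, [username]);
}
```

Rate limiting is a separate concern that belongs in the request layer, so it is not shown here.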
If you get “Looks good!” or missing issues, check:
- AGENTS.md is in project root
- Skill is in ~/.codex/skills/rigorous-review/
- JSON output format is explicit
- Tone rules are in AGENTS.md
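The checklist above can be partly automated. The following sketch only checks the file locations and section titles used in this guide; findSetupProblems is a hypothetical helper, and the string matches assume you kept the "Output Format" and "Tone Rules" headings from the AGENTS.md example.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Illustrative check for the setup described in this guide.
// Returns a list of human-readable problems; an empty list means all checks passed.
function findSetupProblems(projectRoot, homeDir = os.homedir()) {
  const problems = [];
  const agentsPath = path.join(projectRoot, "AGENTS.md");
  const skillPath = path.join(homeDir, ".codex", "skills", "rigorous-review", "SKILL.md");

  if (!fs.existsSync(agentsPath)) {
    problems.push(`missing ${agentsPath}`);
  } else {
    const agents = fs.readFileSync(agentsPath, "utf8");
    if (!agents.includes("Output Format")) problems.push("AGENTS.md has no Output Format section");
    if (!agents.includes("Tone Rules")) problems.push("AGENTS.md has no Tone Rules section");
  }
  if (!fs.existsSync(skillPath)) problems.push(`missing ${skillPath}`);
  return problems;
}
```

Calling findSetupProblems(process.cwd()) from the project root prints nothing alarming when both files are in place.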
Comparison: Claude Code vs Codex Review
CLAUDE CODE OUTPUT
──────────────────────────────────

    "Great job on implementing
    the login functionality!
    The code looks clean and
    well-structured.

    I noticed a few things
    that could be improved:
    - Maybe consider adding
      some validation?
    - The query could potentially
      be optimized
    - You might want to
      think about rate limiting

    Overall, ready to merge.
    Nice work!"

Vague, positive, no urgency.

CODEX OUTPUT
──────────────────────────────────

    {"summary": "2 critical issues",
     "issues": [
       {"severity": "critical",
        "file": "src/auth/login.ts",
        "line": 45,
        "issue": "SQL injection...",
        "fix": "Use parameterized query"}
     ],
     "verdict": "blocked"}

Specific, actionable, severity-coded, blocked.

The difference matters for production code. Claude’s review might pass the SQL injection to production. Codex’s review blocks it explicitly.
Final Words + More Resources
My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 OpenAI Codex CLI Documentation
- 👨💻 Reddit Discussion: Claude Code vs Codex Differences
- 👨💻 Agent Skills Specification
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!