Codex Reviewing Codex: Second-Pass Review with Independent Sessions

For / Key Points

For: Intermediate to advanced developers using Codex CLI as their primary coding agent and designing repeatable plan or implementation review steps

Key Points:

A separate Codex session can still be useful even when the implementation was also done by Codex
The value is not perfect objectivity; it is a second pass with less implementation-session context
In practice, call a review Skill first; raw codex exec is the implementation detail underneath

When Codex implements a change, the natural next question is whether another model should review it. That is a valid pattern. Claude, Gemini, or another agent may bring different failure modes and different instincts.

But model diversity is not the only source of review value. If Codex is already your main coding agent, you can still ask a separate Codex session to read only the diff, plan, acceptance criteria, and test output.

This article answers a narrow question: does Codex reviewing Codex have practical value?

The Answer: Not Objective, But More Independent

A same-model review is not fully objective. If the model, repo rules, and prompt style are similar, the review can still share the same blind spots.

It can still be more independent.

The implementation session carries design choices, abandoned alternatives, user discussion, and implicit justification for the path it took. A reviewer that only receives the diff and criteria sees something closer to a pull request. It has less conversational inertia.

Human review works the same way. The implementer rereading their own code is useful, but a separate reviewer reading the PR diff is better at challenging assumptions. The point is not to create a different personality. The point is to separate inputs and roles.

The causal chain is simple. The working session accumulates reasons for each decision, and those reasons make the diff easier to excuse. A fresh session receives only the plan or diff, so it reads the artifact without that history and is more likely to catch gaps in requirements, tests, and risk.

This is not a guarantee. A reviewer that runs the same model with a similar prompt style can miss the same things the implementer missed. Treat the independent session as a second pass over evidence, not as proof that the change is safe.
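The narrow input described above can be sketched as a small shell helper. This is a minimal illustration, not part of any Skill: the function names `append_section` and `build_review_bundle`, the section titles, and the temp-file prefix are all hypothetical. It writes the plan, diff, acceptance criteria, and test output into one uniquely named file so that parallel review runs do not collide.

```sh
#!/bin/sh
# Sketch: assemble a narrow review input from artifacts only.
# Function names, section titles, and the file prefix are illustrative.

# Append one labeled section to the bundle; skip artifacts that do not exist.
append_section() {
  title=$1
  file=$2
  bundle=$3
  if [ -f "$file" ]; then
    {
      printf '## %s\n' "$title"
      cat "$file"
      printf '\n'
    } >> "$bundle"
  fi
}

# Build a bundle from plan, diff, acceptance criteria, and test output.
# Prints the path of the bundle file on stdout.
build_review_bundle() {
  plan_file=$1
  diff_file=$2
  criteria_file=$3
  tests_file=$4
  out=$(mktemp "${TMPDIR:-/tmp}/codex-review-XXXXXX") || return 1
  append_section "Plan" "$plan_file" "$out"
  append_section "Diff" "$diff_file" "$out"
  append_section "Acceptance criteria" "$criteria_file" "$out"
  append_section "Test output" "$tests_file" "$out"
  printf '%s\n' "$out"
}
```

A call such as `bundle=$(build_review_bundle plan.md change.diff criteria.md test.log)` leaves the implementation conversation out of the reviewer's input entirely, which is the point of the separation.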

Three Review Patterns

Question for this section: what does each review pattern actually separate?

Codex review workflows are easier to reason about when split into three patterns.

Pattern	What it separates	Strong fit	Weak spot
Different-model review	Model behavior	Design tradeoffs and missed risks	Context must be reattached
Separate Codex session	Conversation history and role	Skill-driven plan review, diff review, and cool-down checks	Same-model habits remain
Codex subagents	Review perspective and parallelism	Security, tests, maintainability	Higher token and control cost

GitHub Agent HQ is closer to the first pattern. GitHub describes assigning Copilot, Claude, and Codex to the same issue or PR to compare results and surface tradeoffs or edge cases earlier¹.

This article is narrower. It focuses on the second and third patterns: keep Codex as the agent family, but separate the review context from the implementation context.

Start with Review Skills

Question for this section: what should the human-facing entry point be if users should not hand-write codex exec commands?

In day-to-day work, the entry point should not be a hand-written codex exec command. It should be a review Skill. That is the same shape as a Claude Code to Codex CLI review loop: the user asks for review, and the Skill standardizes diff capture, read-only execution, verdict format, and the maximum number of review rounds.

This repository separates two review Skills:

codex-review: reviews an implementation plan before coding
codex-impl-review: reviews the actual diff before commit

When Codex is the main coding agent, codex-review can review the plan before implementation in a separate session. After implementation, codex-impl-review can review the current git diff and, when a plan exists, check the diff against that plan. So the architecture is still "Codex calls Codex," but the user-facing unit is the Skill, not the raw command.

Under the hood, the flow looks like this. The unique ID and temporary files are not instructions for a human to type manually. They are implementation details the Skill uses to isolate review runs safely.

flowchart TD
    A[User asks for plan or implementation review] --> B{Which Skill?}
    B -->|Plan review| C[codex-review captures the plan]
    B -->|Diff review| D[codex-impl-review captures plan if available]
    D --> E[capture current git diff]
    C --> F[Generate unique review ID]
    E --> F
    F --> G[Start Codex CLI in read-only mode]
    G --> H[Codex reviews plan, diff, tests, and risks]
    H --> I{Verdict}
    I -->|APPROVED| J[Report approval and reviewed scope]
    I -->|REVISE| K[Revise plan or implementation]
    K --> L[Refresh plan or diff]
    L --> M[Resume review session for follow-up]
    M --> I
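The verdict loop in the flowchart can be sketched in a few lines of shell. This is an assumption-heavy illustration, not the Skill's actual code: `review_once` is a hypothetical stub that returns canned reviewer output, where a real Skill would call the reviewer session. `parse_verdict` and `run_review_loop` are illustrative names as well.

```sh
#!/bin/sh
# Sketch of the bounded verdict loop. review_once is a hypothetical stand-in
# for the real reviewer call; it returns canned output so the loop runs alone.

# Stub reviewer: asks for revisions on rounds 1 and 2, approves from round 3.
review_once() {
  if [ "$1" -ge 3 ]; then
    printf 'No blocking issues found.\nVERDICT: APPROVED\n'
  else
    printf 'Missing test for the error path.\nVERDICT: REVISE\n'
  fi
}

# Read reviewer output on stdin and print APPROVED, REVISE, or UNKNOWN.
# Only the last well-formed verdict line counts.
parse_verdict() {
  line=$(grep -E '^VERDICT: (APPROVED|REVISE)$' | tail -n 1)
  case "$line" in
    'VERDICT: APPROVED') echo APPROVED ;;
    'VERDICT: REVISE') echo REVISE ;;
    *) echo UNKNOWN ;;
  esac
}

# Run at most max_rounds review rounds. Returns 0 on approval, 1 otherwise.
run_review_loop() {
  max_rounds=$1
  round=1
  while [ "$round" -le "$max_rounds" ]; do
    verdict=$(review_once "$round" | parse_verdict)
    echo "round $round: $verdict"
    case "$verdict" in
      APPROVED) return 0 ;;
      UNKNOWN) return 1 ;;  # reviewer ignored the verdict format; stop early
    esac
    round=$((round + 1))
  done
  echo "no approval after $max_rounds rounds"
  return 1
}
```

The hard round limit matters as much as the verdict parsing: without it, a reviewer that keeps answering REVISE would loop indefinitely.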

OpenAI's non-interactive mode documentation describes codex exec as suitable for automation such as CI and pre-merge checks². It also documents codex exec resume for continuing a previous run². That means you can choose whether a review should continue prior context or start fresh.

That distinction matters.

Plan review: the Skill reviews the implementation plan before coding starts
Tracking review: the Skill continues the same review session to verify fixes
Independent review: the Skill starts a fresh review session to avoid carrying the working-session context
Final audit: after an iterative loop converges, run one fresh session over the whole plan or diff

Resume is useful when the goal is continuity. Fresh execution is useful when the goal is distance.
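One way to make that choice explicit is to map each review mode to the invocation it implies. The sketch below only prints a command prefix and never runs Codex. The mode names and the function `review_command` are hypothetical, and the `--sandbox read-only` and `--last` flags are assumptions about the installed CLI version that should be confirmed with `codex exec --help`.

```sh
#!/bin/sh
# Sketch: map a review mode to the Codex invocation it implies.
# The flags are assumptions about the local CLI; verify before relying on them.
review_command() {
  case "$1" in
    plan|independent|final-audit)
      # Fresh session: no prior conversation is carried over.
      echo 'codex exec --sandbox read-only' ;;
    tracking)
      # Continue the previous non-interactive run to verify fixes.
      echo 'codex exec resume --last' ;;
    *)
      echo "unknown review mode: $1" >&2
      return 1 ;;
  esac
}
```

Keeping the mapping in one place means a Skill cannot quietly resume a session when the round was meant to be a fresh final audit.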

Splitting Review with Subagents

Question for this section: if a fresh session is not enough, what smaller review lenses should be separated?

Codex also supports subagent workflows. The official docs describe spawning specialized agents in parallel and collecting their results into one response³.

For review, that maps directly:

Review this PR against main.
Spawn one subagent per perspective and summarize the results.

1. Correctness and behavior regressions
2. Security and permission boundaries
3. Missing tests and weak test design
4. Maintainability and future change cost

The Codex subagents docs include a PR review example that splits work across pr_explorer, reviewer, and docs_researcher custom agents³. Custom agents can also set fields such as model, model_reasoning_effort, and sandbox_mode³.

The value here is not just a different model. It is a different review lens. One prompt that asks for everything often produces a broad but shallow report. Separate reviewer agents can stay narrow: one can focus on security, another on tests, another on maintainability.
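If the list of lenses changes per repository, the prompt above can be generated rather than retyped. The helper below is a minimal sketch. `build_lens_prompt` is a hypothetical name, and it only reproduces the wording of the example prompt with a configurable base branch and lens list.

```sh
#!/bin/sh
# Sketch: build the subagent review prompt from a base branch and lens list.
build_lens_prompt() {
  base=$1
  shift
  printf 'Review this PR against %s.\n' "$base"
  printf 'Spawn one subagent per perspective and summarize the results.\n\n'
  n=1
  for lens in "$@"; do
    printf '%d. %s\n' "$n" "$lens"
    n=$((n + 1))
  done
}

# Example: two narrow lenses instead of one broad request.
build_lens_prompt main \
  "Security and permission boundaries" \
  "Missing tests and weak test design"
```

Passing fewer, narrower lenses keeps each subagent's report focused and also limits the token cost noted in the pattern table.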

OpenAI's Own Workflow Points the Same Way

This is not just a local trick. In OpenAI's Harness Engineering article, Ryan Lopopolo describes instructing Codex to review its own changes locally, request additional agent reviews, respond to feedback, and iterate until reviewers are satisfied⁴.

The same article says review effort moved heavily toward agent-to-agent review⁴. That cuts against the idea that a Codex-family reviewer is useless because the implementer was also Codex. The important part is system design: make work legible, make review criteria explicit, and connect feedback to verification.

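There is a caveat. OpenAI's example assumes a strong harness, with legible work, explicit review criteria, and feedback tied to verification. Where that harness is missing, the pattern delivers much less, and that matters when deciding where it belongs.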

How to Add the Review Skills

Question for this section: how do you translate the Claude Code setup path into the Codex setup path?

The setup here is about review Skills, not Codex CLI itself. Use a Skill when you want to standardize plan review, diff capture, read-only execution, review rounds, and verdict format across a team.

Use the same adoption shape as Level 1 in the Claude Code x Codex review-loop article. The only real difference is the destination path: Claude Code uses .claude/skills/..., while Codex repository Skills live under .agents/skills/...⁵.

The mapping is:

Environment	Location	Invocation
Claude Code calling Codex	`.claude/skills/codex-review/SKILL.md`	`/codex-review`
Codex calling a Codex review pass	`.agents/skills/codex-review/SKILL.md`	`/codex-review` or explicit Skill mention

The current Codex CLI does not expose a CLI subcommand named codex skill install .... The official docs do describe $skill-installer for curated local skills or skills from other repositories⁵. This article is about repo-scoped Skills checked into git, so the flow stays close to the existing article: create the directory, then place SKILL.md.

Checked on May 9, 2026

OpenAI documents .agents/skills as a repository Skill location for Codex⁵. It also documents codex exec resume for continuing non-interactive runs². The local Codex CLI in this environment includes a codex review subcommand, but the official CLI reference command overview does not list it as a stable primary command. This article therefore does not depend on codex review for the setup path⁷.

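# Install Codex CLI.
npm install -g @openai/codex
# Create the repo-scoped Skill directory.
mkdir -p .agents/skills/codex-review
# Download the Claude Code version of the Skill as a starting point.
# -f makes curl fail on an HTTP error instead of saving the error page.
curl -fL \
  https://gist.githubusercontent.com/LuD1161/84102959a9375961ad9252e4d16ed592/raw \
  -o /tmp/codex-review.claude.SKILL.md
cp /tmp/codex-review.claude.SKILL.md .agents/skills/codex-review/SKILL.md
# Adapt the wording and flags for Codex-to-Codex review.
$EDITOR .agents/skills/codex-review/SKILL.md
# Then, inside a Codex session rather than the shell, invoke the Skill:
#   /codex-review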

Aseem Shrey's Gist was originally written for Claude Code calling Codex CLI⁶. For Codex-to-Codex review, the point is not to treat that file as a magic universal installer. Use the same placement flow, then adapt the actor, review target, and stopping condition for Codex.

At minimum, check these items:

Replace wording that assumes Claude or Claude Code with working-Codex and reviewer-Codex roles
Rename temp files such as /tmp/claude-plan-... if you want Codex-specific names
Verify model flags such as -m gpt-5.3-codex, or omit them and use your config default
Separate resume rounds for follow-up verification from fresh-session rounds for final audit
Keep the exit condition explicit, such as VERDICT: APPROVED / VERDICT: REVISE
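Part of this checklist can be checked mechanically. The sketch below is an illustration under stated assumptions: `check_skill_file` is a hypothetical helper, and its patterns cover only the Claude-specific wording, the temp-file names, and the verdict lines mentioned above. Model flags and the resume-versus-fresh split still need a human read.

```sh
#!/bin/sh
# Sketch: flag leftovers in an adapted SKILL.md.
# The patterns follow the checklist above and are illustrative, not exhaustive.
check_skill_file() {
  skill=$1
  rc=0
  # Prints matching lines with line numbers, including /tmp/claude-... names.
  if grep -n -i 'claude' "$skill"; then
    echo "warning: Claude-specific wording or temp file names remain" >&2
    rc=1
  fi
  if ! grep -q 'VERDICT: APPROVED' "$skill" || ! grep -q 'VERDICT: REVISE' "$skill"; then
    echo "warning: explicit verdict lines not found" >&2
    rc=1
  fi
  return "$rc"
}
```

Running `check_skill_file .agents/skills/codex-review/SKILL.md` after editing gives a quick signal that the file was actually adapted rather than copied unchanged.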

If you only need plan review, codex-review is enough. If you also want pre-commit diff review, add codex-impl-review with the same placement pattern:

.agents/
└── skills/
    ├── codex-review/
    │   └── SKILL.md
    └── codex-impl-review/
        └── SKILL.md

mkdir -p .agents/skills/codex-impl-review
$EDITOR .agents/skills/codex-impl-review/SKILL.md

If your team has a template, place its SKILL.md there. Otherwise, duplicate codex-review and adjust the target to git diff HEAD.
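If no template exists yet, a starter file can be generated and then edited. The sketch below assumes only that SKILL.md begins with `name` and `description` front matter. The numbered body is a hypothetical outline drawn from this article, not the Gist's text or an official OpenAI template, and `write_impl_review_skill` is an illustrative name.

```sh
#!/bin/sh
# Sketch: write a starter SKILL.md for codex-impl-review.
# The body is a hypothetical outline; adapt it before committing.
write_impl_review_skill() {
  dir=${1:-.agents/skills/codex-impl-review}
  mkdir -p "$dir" || return 1
  printf '%s\n' \
    '---' \
    'name: codex-impl-review' \
    'description: Review the current git diff in a separate read-only Codex session before commit.' \
    '---' \
    '' \
    '1. Capture the output of `git diff HEAD` and the plan, if one exists.' \
    '2. Start a fresh reviewer session in read-only mode with only those inputs.' \
    '3. Report findings on correctness, tests, and risk, citing evidence for each.' \
    '4. End with exactly one line: `VERDICT: APPROVED` or `VERDICT: REVISE`.' \
    '5. Stop after a fixed maximum number of rounds and report the reviewed scope.' \
    > "$dir/SKILL.md"
}
```

Calling `write_impl_review_skill` with no argument writes to the repository path shown in the tree above. The description line deserves care, since it is what Codex matches against when deciding whether to invoke the Skill implicitly.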

From the user's side, call them like this:

Plan review before implementation: "Use /codex-review to review this plan."
Diff review after implementation: "Use /codex-impl-review to review the current diff."

Codex can invoke a Skill implicitly when the task matches its description. In CLI and IDE surfaces, users can also run /skills or type $ to explicitly mention a Skill⁵. If a new Skill does not appear, restart Codex.
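Before restarting, it can help to confirm that the files are where Codex expects them. The following is a minimal sketch with a hypothetical helper name, `list_repo_skills`. It only inspects the directory layout and does not ask Codex what it has loaded.

```sh
#!/bin/sh
# Sketch: list Skill directories under .agents/skills that contain SKILL.md.
list_repo_skills() {
  root=${1:-.agents/skills}
  if [ ! -d "$root" ]; then
    echo "no Skills directory at $root" >&2
    return 1
  fi
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue          # glob matched nothing
    if [ -f "${dir}SKILL.md" ]; then
      basename "$dir"
    else
      echo "warning: $(basename "$dir") has no SKILL.md" >&2
    fi
  done
}
```

A Skill name missing from the output usually points to a misplaced or misnamed file, which a restart will not fix.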

When to Use It

Question for this section: when does a second pass justify its waiting time and token cost?

Without evidence, review becomes commentary

A separate session is not enough by itself. Without tests, CI, logs, UI validation, or acceptance criteria, the review can collapse into plausible commentary. The value comes from passing a narrow artifact plus evidence, not from merely asking another session.

Codex-to-Codex review should not run on every edit. For typo fixes and small link changes, the wait time and token cost can exceed the cost of a missed issue.

Use it where the cost of missing something is higher than the cost of another pass.

Authentication, permissions, payments, deletion, or irreversible data changes
Refactors that change architecture or ownership boundaries
Large diffs produced by an agent across many files
Changes where tests pass but spec alignment is uncertain
Pull requests where a human reviewer needs a narrowed list of concerns
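A rough version of this triage can be automated from the list of changed paths. The sketch below is a heuristic under stated assumptions: the path patterns and the default threshold of 10 files are hypothetical and will need tuning per repository. It cannot detect spec-alignment doubts or reviewer needs, which remain a human call. `needs_second_pass` is an illustrative name.

```sh
#!/bin/sh
# Sketch: decide whether a change warrants a second pass.
# Reads changed file paths on stdin, for example from `git diff --name-only`.
# The path patterns and the file-count threshold are hypothetical defaults.
needs_second_pass() {
  max_files=${1:-10}
  paths=$(cat)
  [ -n "$paths" ] || return 1
  # Count non-empty lines; grep -c prints a bare number on every platform.
  count=$(printf '%s\n' "$paths" | grep -c .)
  if [ "$count" -gt "$max_files" ]; then
    echo "large diff: $count files"
    return 0
  fi
  if printf '%s\n' "$paths" | grep -E -i -q 'auth|permission|payment|migration|delete'; then
    echo "high-risk path touched"
    return 0
  fi
  return 1
}
```

For example, `git diff --name-only HEAD | needs_second_pass 20` exits with status 0 only when the diff is large or touches a sensitive path, so a hook can skip the review for a typo fix.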

Avoid sending a vague "review everything" prompt. Independence without scope produces scattered findings. A useful second pass pins down the review target, perspective, verdict format, and maximum number of rounds.

Context contamination is most likely when:

the working session repeatedly explained why a design was chosen
the user or agent accepted a direction early and the rest of the session reinforces it
a large diff looks coherent only if you already know the implementation intent
tests are thin enough that review drifts toward "probably works"

Fresh sessions can still share the same blind spot. Treat independent review as a second pass that narrows concerns before lint, type checks, tests, or human review, not as a replacement for them.

Relation to Cross-Model Review

Claude Code to Codex review uses model difference as part of the value. Codex to Codex review keeps the model family but separates context and role.

They are not competing patterns.

For high-risk changes, you might first run a fresh Codex second pass, then ask a different model or GitHub Agent HQ for a contrasting view¹. Review is not a single event. It is a set of layers scaled to risk.

That is the right frame: Codex-to-Codex review is not self-review. It is a second pass separated from the implementation session.

Summary

The value of Codex reviewing Codex is not model novelty. The value is reviewing the artifact with less implementation-session baggage.

That is not perfect objectivity. It is better independence.

In practice, use Skills such as codex-review and codex-impl-review to standardize the review procedure, then let the Skill choose between fresh execution, resume for follow-up verification, and subagents for focused review lenses. In all cases, make the reviewer read-only, pass a bounded diff or plan, require evidence, and define the exit condition. A second pass becomes useful when it is part of the harness, not just another prompt.

Automating the Claude Code x Codex Review Loop — Cross-model review patterns using Claude Code and Codex
Codex /goal: Stop Typing "Keep Going" — Treating long-running loops as state machines that can stop
What Should Harness Engineering Build? — How to sequence review, validation, and permission boundaries

1. GitHub Blog, Pick your agent: Use Claude and Codex on Agent HQ, February 4, 2026.
2. OpenAI Developers, Non-interactive mode (codex exec and codex exec resume).
3. OpenAI Developers, Subagents (parallel subagents, custom agents, and PR review example).
4. Ryan Lopopolo, OpenAI, Harness engineering: leveraging Codex in an agent-first world, February 11, 2026.
5. OpenAI Developers, Agent Skills (SKILL.md, .agents/skills, explicit and implicit invocation).
6. Aseem Shrey, I Made Claude and Codex Argue Until My Code Plan Was Actually Good, February 20, 2026; SKILL.md on GitHub Gist.
7. OpenAI Developers, Command line options – Codex CLI (official CLI command reference).