Safety settings to create before running Codex for long sessions

For / Key Points

For: Practitioners who want Codex to prepare larger changes or PRs without losing control of scope and review.

Key Points:

Narrow the work area before tuning the model
Treat sandbox and approval as separate controls
Use AGENTS.md and stop conditions to reduce repeated instructions

Long Codex sessions do not become reliable just because the model is strong. The useful question is more practical: where may Codex read, where may it edit, and when must it stop?

This guide focuses on the boundary settings to define before giving Codex work that may run for many minutes. OpenAI describes Codex CLI as a local coding agent that can read, modify, and run code in a selected directory¹. That makes scope, approval, and review rules part of the operating design, not an afterthought.

Narrow the work area first

The first setting is not the model. It is the set of files Codex may touch.

If the target is broad, Codex may find a path that looks efficient while still being hard to review. Start with one directory, one theme, and one completion condition. For article work, that may mean only the JP and EN pair under docs/blog/.

Do not bundle repository cleanup, article drafting, configuration edits, and PR creation into one request. The more mixed the request becomes, the harder it is to tell whether the result stayed inside the intended job.

This boundary makes review smaller. Before judging whether the prose or code is good, the reviewer can ask a simpler question: did the diff stay where it was allowed to stay? For long sessions, a narrow diff is part of the quality bar.
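The "did the diff stay where it was allowed to stay?" question can be checked mechanically. The following is a minimal sketch, not part of Codex: the function name `out_of_scope` and the example paths are hypothetical, and the list of changed files is assumed to come from something like `git diff --name-only`.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_paths, allowed_dirs):
    """Return the changed paths that fall outside every allowed directory."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    stray = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # is_relative_to compares whole path components, so
        # "docs/blog-old/x.md" does not count as inside "docs/blog".
        if not any(path.is_relative_to(d) for d in allowed):
            stray.append(raw)
    return stray


# Hypothetical diff: two article files plus one stray configuration edit.
changed = ["docs/blog/post.ja.md", "docs/blog/post.en.md", "mkdocs.yml"]
print(out_of_scope(changed, ["docs/blog"]))  # ['mkdocs.yml']
```

An empty result means the diff stayed inside the work area. Anything else is worth a look before the content itself is reviewed.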

Decide sandbox and approval separately

Sandbox is the technical boundary around what Codex can access. Approval is the operating rule for when Codex should ask before moving past that boundary². Treating them as one setting tends to create either too much access or too many interruptions.

OpenAI documents local CLI and IDE defaults that keep network access off and keep writes focused on the workspace³. For routine work, that default shape is a sensible starting point. Then add only the operations the task truly needs.

Decision	Initial shape	Review question
Work area	One target directory	Did the diff leave the intended area?
Sandbox	Workspace-centered edits	Did the run depend on external files or network?
Approval	Ask for external operations and broad deletion	Did anything proceed without the expected approval?
Stop condition	No PR when the quality gate fails	Was a failure recorded as success?

The key move is to create a stopping point before widening access. If approvals feel noisy, reduce the actions that need approval before removing the approval step.
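One way to keep the two controls apart is to model them as two separate questions. The sketch below is an illustration only and does not use any Codex API: `Action`, `inside_sandbox`, `needs_approval`, and the `max_deletes` threshold are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Action:
    kind: str            # "edit", "delete", or "network"
    path: str = ""       # repository-relative path; empty for network actions
    file_count: int = 1  # number of files the action touches


def inside_sandbox(action, workspace):
    """Sandbox question: is the action confined to the workspace, with no network?"""
    if action.kind == "network":
        return False
    return PurePosixPath(action.path).is_relative_to(PurePosixPath(workspace))


def needs_approval(action, workspace, max_deletes=1):
    """Approval question: ask before leaving the sandbox or deleting broadly."""
    if not inside_sandbox(action, workspace):
        return True
    return action.kind == "delete" and action.file_count > max_deletes


# Hypothetical actions from one run, scoped to docs/blog.
for action in [
    Action("edit", "docs/blog/post.en.md"),
    Action("delete", "docs/blog", file_count=40),
    Action("network"),
    Action("edit", "mkdocs.yml"),
]:
    verdict = "ask" if needs_approval(action, "docs/blog") else "proceed"
    print(action.kind, action.path or "-", verdict)
```

Only the first action proceeds. The broad delete, the network call, and the edit outside docs/blog all come back as "ask", which matches the Approval row of the table. Reducing noise then means shrinking the set of actions that reach `needs_approval`, not removing the check.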

Use AGENTS.md to reduce repeated promises

For long sessions, do not rely only on the first prompt. The longer the work runs, the more local decisions accumulate.

OpenAI explains that Codex reads AGENTS.md before work and applies instructions by repository scope⁴. That means repeated chat instructions can move into a durable repository rule.

AGENTS.md is a good place for rules such as:

directories Codex may edit
checks Codex may run
heavy commands Codex should avoid
date, language-pair, and publication rules
required harnesses for git or GitHub operations

With those rules in the repository, the prompt can stay short. "Run one candidate under these rules" is enough when the operating contract already lives beside the code.
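As a rough illustration, the rule topics listed above can be written as headed sections and checked for completeness. Everything in this sketch is an assumption: the section titles, the sample text, and the `missing_topics` helper are hypothetical, and AGENTS.md itself is free-form, so no particular heading scheme is required by Codex.

```python
# Hypothetical section titles, one per rule topic in the list above.
REQUIRED_TOPICS = [
    "Editable directories",
    "Allowed checks",
    "Commands to avoid",
    "Publication rules",
    "Git and GitHub",
]

# Hypothetical AGENTS.md content, kept in a string for the example.
SAMPLE_AGENTS_MD = """\
# AGENTS.md (hypothetical example)

## Editable directories
- docs/blog/ only

## Allowed checks
- the repository's documented lint and link checks

## Commands to avoid
- full-site rebuilds and other long-running jobs

## Publication rules
- every article ships as a JP and EN pair with a publication date

## Git and GitHub
- use the repository's required harness for commits and PRs
"""


def missing_topics(text, required=REQUIRED_TOPICS):
    """Return the required topics that have no matching heading in the text."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in text.splitlines()
        if line.startswith("#")
    }
    return [topic for topic in required if topic.lower() not in headings]


print(missing_topics(SAMPLE_AGENTS_MD))  # []
```

A check like this could run before a long session starts, so a missing rule is caught while the prompt is still short.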

Write the exit before the run

The weak point of a long run is that it becomes harder to stop after many small decisions have stacked up. So write the exit first.

Stop conditions should describe what happens when the result is not ready. For example: retry a failing test once and then stop, hold back an unverified claim, or skip PR creation when a quality gate fails.
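Written down as a decision function, those three example conditions might look like the sketch below. It is not a Codex feature: `RunState`, `next_step`, and the returned messages are hypothetical, and reading the test rule as "one retry, then stop" is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class RunState:
    consecutive_test_failures: int = 0  # 0 means the test currently passes
    unverified_claims: int = 0          # claims with no checked source
    quality_gate_passed: bool = False


def next_step(state, max_retries=1):
    """Return what the run should do next, with the stop conditions checked first."""
    if state.consecutive_test_failures > max_retries:
        return "stop: test still failing after the allowed retry"
    if state.consecutive_test_failures > 0:
        return "retry the failing test"
    if state.unverified_claims > 0:
        return "hold: unverified claim needs a source"
    if not state.quality_gate_passed:
        return "stop: quality gate failed, skip PR creation"
    return "open PR candidate"


print(next_step(RunState(consecutive_test_failures=2)))
print(next_step(RunState(quality_gate_passed=True)))
```

The order of the checks is the design choice here: every exit is evaluated before the success path, so a failure cannot be reported as a finished PR candidate.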

OpenAI's Codex best practices emphasize reviewable work units and using tests or checks to verify the result⁵. The same idea matters for longer sessions. The final output should be the diff that met the conditions, not a diary of effort.

At the end, keep four facts visible:

what was in scope
what was run
what was not run
what the human reviewer should inspect next

If those facts are present, review stays short even after a longer run. If they are missing, the result may look useful but will be difficult to repeat.
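The four facts can be captured in a small record so that a missing one is obvious. This is a sketch under stated assumptions: `RunReport`, its field names, and the sample values are hypothetical, and an empty list is treated as "not stated", so a run that skipped nothing should say so explicitly.

```python
from dataclasses import dataclass


@dataclass
class RunReport:
    in_scope: list      # what was in scope
    commands_run: list  # what was run
    not_run: list       # what was not run (state "nothing skipped" if so)
    review_next: list   # what the human reviewer should inspect next

    def missing_facts(self):
        """Return the names of facts that were left empty."""
        return [name for name, value in vars(self).items() if not value]

    def render(self):
        """Format the four facts as a short plain-text summary."""
        sections = [
            ("In scope", self.in_scope),
            ("Run", self.commands_run),
            ("Not run", self.not_run),
            ("Review next", self.review_next),
        ]
        lines = []
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"  - {item}" for item in items)
        return "\n".join(lines)


# Hypothetical report for one article candidate.
report = RunReport(
    in_scope=["docs/blog/ JP and EN pair"],
    commands_run=["link check"],
    not_run=["full site build"],
    review_next=["source for each cited claim"],
)
print(report.missing_facts())
print(report.render())
```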

Summary

The first thing to create before a long Codex run is not a larger prompt. It is a small operating contract: scope, sandbox, approval, AGENTS.md rules, and stop conditions.

With that contract, Codex is not an open-ended worker. It becomes an executor that prepares reviewable PR candidates inside a narrow area.

Run one candidate first and check that quality gates and review do not regress. What you reuse in later runs is not the article text. It is the operating contract and the set of checks used to evaluate the result.
