Safety settings to create before running Codex for long sessions
For / Key Points
For: Practitioners who want Codex to prepare larger changes or PRs without losing control of scope and review.
Key Points:
- Narrow the work area before tuning the model
- Treat sandbox and approval as separate controls
- Use AGENTS.md and stop conditions to reduce repeated instructions
Long Codex sessions do not become reliable just because the model is strong. The useful question is more practical: where may Codex read, where may it edit, and when must it stop?
This guide focuses on the boundary settings to define before giving Codex work that may run for many minutes. OpenAI describes Codex CLI as a local coding agent that can read, modify, and run code in a selected directory [1]. That makes scope, approval, and review rules part of the operating design, not an afterthought.
Narrow the work area first
The first setting is not the model. It is the set of files Codex may touch.
If the target is broad, Codex may find a path that looks efficient but is still hard to review. Start with one directory, one theme, and one completion condition. For article work, that may mean only the paired Japanese (JP) and English (EN) files under docs/blog/.
Do not bundle repository cleanup, article drafting, configuration edits, and PR creation into one request. The more mixed the request becomes, the harder it is to tell whether the result stayed inside the intended job.
This boundary makes review smaller. Before judging whether the prose or code is good, the reviewer can ask a simpler question: did the diff stay where it was allowed to stay? For long sessions, a narrow diff is part of the quality bar.
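The "did the diff stay where it was allowed to stay?" question is mechanical enough to script. The sketch below is a minimal, hypothetical helper (`paths_outside_scope` is not part of Codex or any library). It assumes you already have repository-relative changed paths, for example from `git diff --name-only main...HEAD`, where `main` stands in for your base branch.

```python
from pathlib import PurePosixPath


def paths_outside_scope(changed_paths: list[str], allowed_roots: list[str]) -> list[str]:
    """Return the changed paths that are not under any allowed root directory.

    Paths are treated as repository-relative POSIX paths, the form printed by
    `git diff --name-only`.
    """
    roots = [PurePosixPath(root) for root in allowed_roots]
    outside = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # A ".." segment could escape the root even if the prefix matches,
        # so such paths are flagged instead of trusted.
        escapes = ".." in path.parts
        # is_relative_to compares whole path components, not string prefixes.
        inside = any(path.is_relative_to(root) for root in roots)
        if escapes or not inside:
            outside.append(raw)
    return outside
```

Called with `["docs/blog/post.md", "docs/blog/post.en.md", "mkdocs.yml"]` and `["docs/blog"]` (hypothetical file names), it returns `["mkdocs.yml"]`. Comparing path components rather than string prefixes keeps a sibling directory such as `docs/blog-old/` from slipping through.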
Decide sandbox and approval separately
Sandbox is the technical boundary around what Codex can access. Approval is the operating rule for when Codex should ask before moving past that boundary [2]. Treating them as one setting tends to create either too much access or too many interruptions.
OpenAI documents local CLI and IDE defaults that keep network access off and keep writes focused on the workspace [3]. For routine work, that default shape is a sensible starting point. Then add only the operations the task truly needs.
| Decision | Initial shape | Review question |
|---|---|---|
| Work area | One target directory | Did the diff leave the intended area? |
| Sandbox | Workspace-centered edits | Did the run depend on external files or network? |
| Approval | Ask for external operations and broad deletion | Did anything proceed without the expected approval? |
| Stop condition | No PR when the quality gate fails | Was a failure recorded as success? |
The key move is to create a stopping point before widening access. If approvals feel noisy, reduce the actions that need approval before removing the approval step.
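To see why the two controls stay separate, it can help to model them as two independent inputs. The sketch below is a toy model under stated assumptions, not Codex's configuration or API: `Sandbox`, `ApprovalRule`, `Action`, and `decide` are invented names, and the "more than five deletions counts as broad" threshold is an arbitrary placeholder. The sandbox answers where an action lands; the approval rule answers what happens when it lands outside, or when it is risky even inside.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath


class Verdict(Enum):
    ALLOW = "allow"  # inside the sandbox and routine: proceed
    ASK = "ask"      # needs a human decision before proceeding
    DENY = "deny"    # outside the sandbox and escalation is not permitted


@dataclass(frozen=True)
class Sandbox:
    """Technical boundary: where writes may land and whether network is on."""
    workspace: PurePosixPath
    network: bool = False


@dataclass(frozen=True)
class ApprovalRule:
    """Operating rule: when to stop and ask a human (hypothetical fields)."""
    allow_escalation: bool = True  # may ask to act past the sandbox at all
    max_silent_deletes: int = 5    # more deletions than this count as "broad"


@dataclass(frozen=True)
class Action:
    writes: tuple[PurePosixPath, ...] = ()
    deletes: tuple[PurePosixPath, ...] = ()
    uses_network: bool = False


def decide(action: Action, sandbox: Sandbox, rule: ApprovalRule) -> Verdict:
    # Sandbox question: does the action stay inside the technical boundary?
    # Paths are assumed to be absolute and already normalized (no "..").
    leaves_workspace = any(
        not path.is_relative_to(sandbox.workspace)
        for path in action.writes + action.deletes
    )
    needs_network = action.uses_network and not sandbox.network
    if leaves_workspace or needs_network:
        # Approval question: may a human let it past the boundary?
        return Verdict.ASK if rule.allow_escalation else Verdict.DENY
    # Inside the boundary, broad deletion still goes to a human.
    if len(action.deletes) > rule.max_silent_deletes:
        return Verdict.ASK
    return Verdict.ALLOW
```

Keeping the two checks in separate types means you can tighten the approval threshold without widening the sandbox, and the reverse.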
Use AGENTS.md to reduce repeated instructions
For long sessions, do not rely only on the first prompt. The longer the work runs, the more local decisions accumulate.
OpenAI explains that Codex reads AGENTS.md before work and applies instructions by repository scope [4]. That means repeated chat instructions can move into a durable repository rule.
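If it helps to reason about which rule files apply to a given edit, the sketch below collects every AGENTS.md on the path from the repository root to the edited file. It assumes, as a simplification, that deeper files refine broader ones and win on conflicts; check OpenAI's documentation for the exact discovery and precedence rules. `applicable_agents_files` is a hypothetical helper, and it only looks inside the repository.

```python
from pathlib import Path


def applicable_agents_files(repo_root: Path, target: Path) -> list[Path]:
    """Return AGENTS.md files from the repo root down to the target's directory.

    Ordered broadest first, so under the assumed precedence, later entries
    override earlier ones on conflicts.
    """
    root = repo_root.resolve()
    target = target.resolve()
    # A target that is not an existing directory (e.g. a file, possibly new)
    # is scoped by its parent directory.
    start = target if target.is_dir() else target.parent
    # relative_to raises ValueError if the target is outside the repository.
    relative = start.relative_to(root)
    chain = [root]
    for part in relative.parts:
        chain.append(chain[-1] / part)
    return [d / "AGENTS.md" for d in chain if (d / "AGENTS.md").is_file()]
```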
AGENTS.md is a good place for rules such as:
- directories Codex may edit
- checks Codex may run
- heavy commands Codex should avoid
- date, language-pair, and publication rules
- required harnesses for git or GitHub operations
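Writing those rules in a predictable shape also lets a review script read the same list the agent reads. Below is a hypothetical AGENTS.md held in a Python string, with a small parser for its bullet sections. AGENTS.md itself is free-form Markdown; the section headings, the `mkdocs build --strict` check, and the "PR helper script" are assumptions made for illustration.

```python
# A hypothetical AGENTS.md. The headings are a convention invented for this
# sketch; Codex does not require any particular section names.
AGENTS_MD = """\
# Repository rules for Codex

## Editable directories
- docs/blog/

## Checks you may run
- mkdocs build --strict

## Avoid
- running the full test suite without asking

## Publication rules
- Every post ships as a JP/EN pair with the same date.

## Git and GitHub
- Use the repository's PR helper script; never push directly to main.
"""


def section_items(markdown: str, heading: str) -> list[str]:
    """Return the "- " bullet items listed under a "## heading" section."""
    items: list[str] = []
    inside = False
    for line in markdown.splitlines():
        if line.startswith("## "):
            # Entering a new second-level section; track whether it is ours.
            inside = line[3:].strip() == heading
        elif inside and line.startswith("- "):
            items.append(line[2:].strip())
    return items
```

For example, `section_items(AGENTS_MD, "Editable directories")` returns `["docs/blog/"]`.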
With those rules in the repository, the prompt can stay short. "Run one candidate under these rules" is enough when the operating contract already lives beside the code.
Write the exit before the run
The weak point of a long run is that it becomes harder to stop after many small decisions have stacked up. So write the exit first.
Stop conditions should describe what happens when the result is not ready. For example: make one attempt to fix a failing test and stop if it still fails, hold back any claim that has not been verified, or skip PR creation when a quality gate fails.
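Stop conditions are easiest to honor when they are decided before the run and evaluated in a fixed order. The sketch below encodes the examples as one decision function. The names are hypothetical, it reads "try a failing test once" as "allow one fix attempt, then stop", and the order of the checks is a judgment call rather than documented Codex behavior.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical limit: one fix attempt for a failing test, then stop.
MAX_FIX_ATTEMPTS = 1


class Exit(Enum):
    RETRY_ONCE = "retry_once"                  # a fix attempt is still allowed
    STOP_TESTS_FAILING = "stop_tests_failing"  # attempts used up; report and stop
    HOLD_UNVERIFIED = "hold_unverified"        # unverified claims go to a human; no PR
    SKIP_PR = "skip_pr"                        # quality gate failed; no PR
    OPEN_PR = "open_pr"                        # every condition met


@dataclass
class RunState:
    tests_passed: bool
    fix_attempts: int = 0
    unverified_claims: list[str] = field(default_factory=list)
    gate_passed: bool = False


def choose_exit(state: RunState) -> Exit:
    """Pick the exit in a fixed order; only the last branch opens a PR."""
    if not state.tests_passed:
        if state.fix_attempts < MAX_FIX_ATTEMPTS:
            return Exit.RETRY_ONCE
        return Exit.STOP_TESTS_FAILING
    if state.unverified_claims:
        return Exit.HOLD_UNVERIFIED
    if not state.gate_passed:
        return Exit.SKIP_PR
    return Exit.OPEN_PR
```

Only the final branch opens a PR, so a failed gate cannot be reported as success by falling through.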
OpenAI's Codex best practices emphasize reviewable work units and using tests or checks to verify the result [5]. The same idea matters for longer sessions. The final output should be the diff that met the conditions, not a diary of effort.
At the end, keep four facts visible:
- what was in scope
- what was run
- what was not run
- what the human reviewer should inspect next
If those facts are present, review stays short even after a longer run. If they are missing, the result may look useful but will be difficult to repeat.
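The four facts fit a fixed template, which makes a missing one easy to spot. The sketch below is a hypothetical `RunReport` that renders them as plain text and flags empty fields; the field names and the "(none recorded)" placeholder are choices made for this example, not a Codex output format.

```python
from dataclasses import dataclass


@dataclass
class RunReport:
    in_scope: list[str]
    ran: list[str]
    not_run: list[str]
    review_next: list[str]

    def _sections(self) -> list[tuple[str, list[str]]]:
        return [
            ("In scope", self.in_scope),
            ("Ran", self.ran),
            ("Not run", self.not_run),
            ("Review next", self.review_next),
        ]

    def missing(self) -> list[str]:
        """Titles of the facts that were left empty."""
        return [title for title, items in self._sections() if not items]

    def render(self) -> str:
        """Plain-text summary; empty facts are shown, not silently dropped."""
        lines: list[str] = []
        for title, items in self._sections():
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in (items or ["(none recorded)"]))
        return "\n".join(lines)
```

Treating an empty "Not run" as missing is deliberate: if nothing was skipped, saying so explicitly is more useful to a reviewer than silence.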
Summary
The first thing to create before a long Codex run is not a larger prompt. It is a small operating contract: scope, sandbox, approval, AGENTS.md rules, and stop conditions.
With that contract, Codex is not an open-ended worker. It becomes an executor that prepares reviewable PR candidates inside a narrow area.
Run one candidate first and check that quality gates and review do not regress. What you reuse later is not the article text but the operating contract and the evaluation set.