February 2026 Latest Update
GPT-5.3-Codex (latest) includes improved context window handling. This is an updated version with the latest workarounds for Codex CLI v0.101.0+.
Fix Codex Ran Out of Room in Context Window (2026)¶
When Codex CLI says "ran out of room in the model's context window", "start a new thread or clear earlier history before retrying", or "your input exceeds the context window", fix it in 30 seconds: start a fresh session, selectively restore with codex resume --last, then try /compact as fallback.
Target Audience
- Codex CLI beginners (less than 1 month experience)
- Anyone blocked by "ran out of room" or "context window exceeded" errors
- Users experiencing "stream disconnected" issues
Key Points¶
- Start a new Codex session first when the context window is full.
- Use `codex resume --last` only to recover the turns you still need.
- Treat `/compact` as a fallback, then split future work into smaller chunks.
Quick Fix: 3 Officially Aligned Steps¶
Follow the order recommended by the CLI message and documentation: reset the session first, then selectively restore history.
Step 1: Start a fresh session¶
# Exit the current session with Ctrl+C
# Then relaunch Codex
codex
ℹ️ Why this is first
- The error itself tells you to "Start a new conversation or clear earlier history before retrying." (openai/codex#4926)
- Clearing the buffer restores the full context window and eliminates most overflow retries.
Step 2: Restore only what you need with codex resume --last¶
# Resume the most recent session
codex resume --last
💡 Documented command: The official "Getting started" guide lists `codex resume --last` as the supported way to reopen your previous session (docs/getting-started.md). Use it after step 1 when you need context snippets without bringing back the entire transcript.
Step 3: Try /compact (if it still fails, go back to step 1)¶
# Inside the Codex prompt
/compact
⚠️ Known limitation
- `/compact` summarizes history in place but remains unreliable in current builds, as tracked in openai/codex#4868.
- If nothing changes, restart the session instead of retrying endlessly.
Understanding the Root Cause¶
Here's why these errors occur, explained for beginners.
What Is a Context Window?¶
Think of it as temporary "working memory" for AI conversations. It stores your chat history and file contents. When this memory fills up, errors occur.
Common Causes¶
- Conversations get too long: extended sessions accumulate chat history, filling memory.
- Too many files loaded: reading many project files at once quickly fills memory.
- Connection interruption (stream disconnected): when the AI thinks for a long time, the network assumes it's unresponsive and cuts the connection.
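To make the "too many files loaded" cause concrete, here is a minimal sketch that estimates how much of the window a set of files would occupy before you hand them to Codex. It assumes a rough heuristic of about 4 bytes per token and the 400k-token window cited later in this article; `BYTES_PER_TOKEN`, `CONTEXT_WINDOW`, `estimate_tokens`, and `window_share_pct` are illustrative names, not Codex settings or commands, and the real tokenizer will give different counts.

```shell
# Rough token estimate for a set of files.
# Assumption: ~4 bytes per token (a heuristic, not the real tokenizer).
BYTES_PER_TOKEN=4
# Assumption: 400k-token window, the figure this article cites for GPT-5-Codex-class models.
CONTEXT_WINDOW=400000

# Print the estimated token count for all regular files passed as arguments.
estimate_tokens() (
  total_bytes=0
  for f in "$@"; do
    [ -f "$f" ] || continue
    total_bytes=$((total_bytes + $(wc -c < "$f")))
  done
  echo $((total_bytes / BYTES_PER_TOKEN))
)

# Print the estimated share of the window (whole percent) the files would occupy.
window_share_pct() (
  tokens=$(estimate_tokens "$@")
  echo $((tokens * 100 / CONTEXT_WINDOW))
)
```

For example, `window_share_pct src/*.py` gives a quick sanity check on whether a file selection is worth trimming before the session starts.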
Prevention Tips¶
Build these habits to avoid future errors:
- Start fresh every 30 minutes: Reset regularly during long work sessions
- Minimize file selection: Load only necessary files
- Split large edits: Divide multi-file editing across sessions
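The "split large edits" habit can be sketched as a small batching helper: it groups a file list into fixed-size batches so each Codex session only sees a few files. The function name `batch_items` and the batch size are hypothetical choices for illustration, and the sketch assumes file names without spaces.

```shell
# Print the arguments after the first in batches of at most $1 items, one batch per line.
# Assumption: items contain no spaces (batches are space-separated).
batch_items() (
  size=$1
  shift
  count=0
  line=""
  for item in "$@"; do
    line="${line:+$line }$item"
    count=$((count + 1))
    if [ "$count" -eq "$size" ]; then
      echo "$line"
      line=""
      count=0
    fi
  done
  # Flush the final, partially filled batch.
  if [ -n "$line" ]; then
    echo "$line"
  fi
)
```

Running `batch_items 2 a.py b.py c.py d.py e.py` prints three lines, and each line can become the file selection for one session.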
🆕 Nov 2025: Auto-Approval-Friendly Playbook¶
ℹ️ Context window overflow is a frequently encountered issue, so we baked the recurring fixes into a reusable playbook for auto-approval runs.
| Step | What to do | Purpose |
|---|---|---|
| 1. Snapshot the baseline | codex --full-auto --transcript analysis/20251108_context.jsonl at session start | Capture before/after diffs |
| 2. Chunk via TodoWrite | Break the TodoWrite checklist into ≤3 bullets per Codex turn | Prevent prompt bloat |
| 3. Recovery macro | Keep `codex resume --last` in a shell alias, then type `/compact` at the Codex prompt once the session reopens (`/compact` is a slash command, not a shell command) | Shrink recovery time |
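Because `/compact` is typed inside the Codex prompt, a shell shortcut can only cover the resume half of the recovery macro. A minimal sketch, where `codex_recover` is a hypothetical helper name and the only Codex command used is the `codex resume --last` already shown in this guide:

```shell
# Hypothetical helper: reopen the most recent Codex session.
# Any extra arguments are passed through unchanged.
codex_recover() {
  codex resume --last "$@"
}

# After the session reopens, type /compact at the Codex prompt if you still need it.
```

Put the function in your shell profile so it is available in every terminal.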
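#!/usr/bin/env bash
# context-window-safe.sh
# Note: confirm with `codex --help` that your CLI version accepts --transcript.
mkdir -p analysis  # make sure the transcript directory exists
codex --full-auto --transcript "analysis/$(date +%Y%m%d)_context.jsonl" <<"TASKS"
1. TodoWrite: <paste task URL>
2. Execute step A (<= 3 checklist bullets)
3. Attach transcript path back to TodoWrite
TASKS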
Canary rule with TodoWrite
Run gh issue comment <ISSUE> --body "Progress: step N completed" after each chunk: it tells teammates which prompts already ran, avoiding duplicate submissions that inflate the context window.
Latest updates (as of 2025-10-23)¶
- Codex CLI 0.48.0 shipped on 2025-10-23 with a tokenizer-backed truncation path for `unified_exec`, which reduces surprises where the reported remaining tokens diverge from the enforced limit (release notes, PR #5514).
- The same release adds a local tokenizer and richer event stream updates, helping history management and disconnect recovery (PR #5508, PR #5470).
- Windows/WSL users should follow the refreshed instructions from PR #5307; if transport errors persist, upgrade first and then audit VPNs, antivirus filters, or HTTP compression on the network path.
Troubleshooting Guide¶
| Symptom | Immediate Action | Details |
|---|---|---|
| `/compact` doesn't work | Try Step 1 (fresh session) | Version-dependent functionality issues |
| stream disconnected | Try Step 1 (fresh session) | Resets the connection |
| Frequent errors | Split tasks + regular resets | Too much work in one session |
| ran out of room | Try Step 1 (fresh session) | Context is full |
Advanced: Technical Details¶
What changed by 2025-10-24
- Codex CLI 0.48.0 (released 2025-10-23) now trims `unified_exec` with the real tokenizer, reducing drift between reported and enforced token budgets (PR #5514).
- The same release adds local tokenizer support, richer event streams, and TUI polish, making it easier to recover from disconnects and review history (PR #5508, PR #5470).
- OpenAI's official docs emphasise the Responses API Background mode for long jobs, so clients don't need to hold SSE connections indefinitely (guide).
- Windows/WSL transport errors remain active on GitHub; upgrade to 0.48.0, apply the refreshed WSL instructions (PR #5307), and audit VPN/antivirus/HTTP compression along the path.
Current High-Frequency Triggers
- SSE idle timeouts (Cloudflare 100 s, ALB 60 s defaults)
  - Long reasoning yields no SSE events; middleboxes terminate the connection; CLI surfaces `stream disconnected`.
- Response-stream decode failures
  - `Transport error: error decoding response body` appears on Windows/WSL or extended sessions, often when compression/inspection intermediaries interfere.
- TPM/RPM rate limits masked as disconnects
  - Actual 429 responses feel like stream drops when retries fire.
- Context-window overflow
  - GPT-5-Codex-class models expose 400k-token windows, yet measurement drift or failed compression still push requests past their limits.
Default Idle Timeouts Across Middleboxes
| Device / Service | Default idle | Notes |
|---|---|---|
| Cloudflare (Free/Pro) | ~100 s | 524 after 100 s. Consider bypass, Workers, or higher tiers for SSE. |
| AWS Application Load Balancer | 60 s (configurable 1–4000 s) | Raise to 180–300 s or more for SSE workloads. |
| Azure Application Gateway for Containers | 300 s (5 min) | Pair with keep-alive comments. |
| Enterprise proxies / security gateways | Product-dependent | HTTP inspection/compression can corrupt SSE chunks. |
Error Fragments: Root Cause and Mitigations
idle timeout waiting for SSE / stream closed before response.completed¶
- Root cause: long silent phases plus middlebox idle policies.
- Mitigations:
  - Move long jobs to Responses API Background mode and avoid clinging to SSE streams (guide).
  - Increase idle limits to 180–300 s+ (ALB via `idle_timeout.timeout_seconds`).
  - For Cloudflare, bypass the edge, use Workers/higher plans, or send keep-alive comments using a dummy payload line that begins with `:`.
  - Update to the latest Codex CLI (e.g. 0.48.0) to leverage the tokenizer fixes and built-in `responses-api-proxy`.
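The keep-alive mitigation relies on a detail of the SSE format: a line starting with `:` is a comment that clients ignore, yet it still counts as traffic for idle timers. Here is a minimal sketch of a heartbeat emitter for an SSE endpoint or proxy you operate yourself. `sse_keepalive` is a hypothetical helper, not part of Codex CLI, and the interval you choose should stay below the shortest idle timeout on the path.

```shell
# Emit COUNT SSE comment lines, pausing INTERVAL seconds between them.
# Usage: sse_keepalive COUNT INTERVAL_SECONDS
sse_keepalive() (
  count=$1
  interval=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    # A comment line followed by a blank line; SSE clients discard it.
    printf ': keep-alive\n\n'
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
)
```

With a 100 s Cloudflare limit, for instance, `sse_keepalive 10 30` would write one comment every 30 seconds during a silent phase.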
Transport error: error decoding response body¶
- Root cause: chunk corruption, compression/inspection middleboxes, or long-lived connections.
- Mitigations: stay on the latest CLI, disable HTTP compression/inspection, route via `responses-api-proxy`, retry with exponential backoff.
Request too large for ... / This request is over the organization TPM/RPM¶
- Root cause: rate limits or token ceilings.
- Mitigations: shrink inputs (summaries, tighter file selection, staged workflows), constrain concurrency, pursue higher quotas, retry with backoff.
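"Retry with backoff" can be sketched as a generic wrapper that doubles the wait after each failure. `retry_with_backoff` is an illustrative helper, not a Codex CLI feature; the attempt count and base delay are assumptions to tune against your own rate limits, and production code would usually add random jitter.

```shell
# Retry a command with exponential backoff.
# Usage: retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS command [args...]
retry_with_backoff() (
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while true; do
    # Stop as soon as the command succeeds.
    "$@" && exit 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      exit 1
    fi
    sleep "$delay"
    delay=$((delay * 2))   # double the wait before the next attempt
    attempt=$((attempt + 1))
  done
)
```

For example, `retry_with_backoff 5 2 some_command` waits 2, 4, 8, and 16 seconds between its five attempts, where `some_command` stands for whatever call hit the limit.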
Your input exceeds the context window¶
- Root cause: actual overflow or mismatch between estimated and enforced limits.
- Mitigations: rely on the latest CLI's tokenizer-backed truncation plus `/status`, tighten `include`/`exclude`, and split sessions when needed.
Codex CLI 0.48.0 highlights (2025-10-23)
- Tokenizer-backed truncation for `unified_exec` keeps reported and enforced token budgets in sync (PR #5514).
- Local tokenizer & richer events improve history visibility and reconnect flows (PR #5508, PR #5470).
- WSL guidance refresh documents fixes for Windows transport issues (PR #5307).
- responses-api-proxy remains bundled for a stable, officially supported SSE tunnel.
Why Background Mode Matters
- Background mode hands long-running work to the Responses API asynchronously, while clients poll or accept webhooks.
- Without a persistent SSE connection, Cloudflare/ALB idle policies become largely irrelevant.
- Repeated heavy tasks no longer require the client to maintain a fragile live stream.
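The polling side of this pattern can be sketched without holding any stream open: ask for the job status at intervals until it reaches a terminal state. In the sketch below, `poll_until_done` is a hypothetical helper, the status command is whatever you use to fetch the job state, and the status names (`completed`, `failed`, `cancelled`) are assumptions to check against the Responses API documentation.

```shell
# Poll a status command until it reports a terminal state.
# Usage: poll_until_done INTERVAL_SECONDS MAX_POLLS status_command [args...]
# Assumption: status_command prints a single word such as in_progress or completed.
poll_until_done() (
  interval=$1
  max_polls=$2
  shift 2
  polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    job_status=$("$@")
    case "$job_status" in
      completed) echo "$job_status"; exit 0 ;;
      failed|cancelled) echo "$job_status"; exit 1 ;;
    esac
    polls=$((polls + 1))
    sleep "$interval"
  done
  # No terminal state within the allowed number of polls.
  echo "timeout"
  exit 2
)
```

Each poll is a short request, so a 60 s or 100 s idle limit on the network path never comes into play.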
Input and Rate-Limit Hygiene
- Use `include`/`exclude` to drop unnecessary files and compress prompts between stages.
- Break large workflows into stages and merge results later to conserve context.
- Lower reasoning levels or prompt for intermediate outputs to reduce long silence periods.
- Pair TPM/RPM monitoring with exponential backoff and deliberate job pacing.
Windows / WSL Transport Errors
- Public issues still report `Transport error: error decoding response body`. Network compression, antivirus scanning, or VPN hops are frequent contributors.
- Combine the latest CLI, neutralized middleboxes, and backoff strategies; avoid claiming a single definitive cause.
Immediate checklist (2025-10-24 edition)
- Upgrade Codex CLI to the latest (0.48.0 or newer) via `npm i -g @openai/codex@latest`, `brew upgrade codex`, or the release binaries.
- Shift long tasks to Background mode to minimize SSE exposure.
- Tune middlebox idle/keep-alive settings (Cloudflare 100 s constraint, ALB ≥180 s, Azure AGW 5 min).
- Adopt `responses-api-proxy` or disable compression/inspection on intermediate proxies.
- Enforce input compression and staged execution, and adjust reasoning levels when silence becomes dominant.
- Harden rate-limit hygiene: cap concurrency, record usage, apply exponential backoff.
- Monitor logs via `tail -F ~/.codex/log/codex-tui.log` and correlate with Request IDs.
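When the log is long, filtering it for the error fragments covered in this guide is quicker than reading the live tail. A minimal sketch, assuming the log path from the checklist above and assuming those fragments appear verbatim in the log; `scan_codex_log` is a hypothetical helper name.

```shell
# Print log lines (with line numbers) that contain error fragments quoted in this article.
# Usage: scan_codex_log [LOG_FILE]
# Assumption: the fragments appear verbatim in the log file.
scan_codex_log() (
  log_file=${1:-"$HOME/.codex/log/codex-tui.log"}
  grep -nE 'stream disconnected|error decoding response body|context window|idle timeout waiting for SSE' "$log_file"
)
```

Run `scan_codex_log` with no argument to check the default log, or pass a saved copy of the log as the first argument.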
Appendix: The 32–33% Context-Window Issue
- Edge cases where requests fail despite reported headroom still exist. CLI 0.48.0's tokenizer work reduces frequency but has not eliminated them entirely.
- Practical guidance: monitor `/status`, treat ~80% remaining context as a soft ceiling, use `/compact` when it behaves, restructure prompts, or start a fresh session.
FAQ¶
Why does Codex say "ran out of room in the model's context window"?¶
Your session history exceeded the available context. Exit with Ctrl+C, relaunch Codex, and pull back only the needed turns with codex resume --last.
How do I keep auto-approval runs from overloading the context window?¶
Split the TodoWrite checklist into 3 or fewer bullet chunks, send them one at a time, and attach each transcript file under analysis/ so Codex never sees the entire backlog at once.
What should I do when "codex re-connecting..." appears during long runs?¶
Capture the transcript, restart the session, and follow the reconnecting issue guide to ensure approvals stay on -a never.
Summary¶
Codex context-window errors are usually session-size problems, not a reason to keep retrying the same prompt. Restart fresh, restore only the useful context, and split the next run before the transcript grows again.
Related Articles¶
- OpenAI Codex 0.48.0 release notes (GitHub)
- GitHub issue: "Codex ran out of room in the model's context window"
- GitHub issue: `compact` does nothing
- OpenAI Codex 0.39.0 update summary
- OpenAI Codex CLI comprehensive guide
- Codex CLI auto-approval mode deep dive
- Complete Guide to Resolving Codex CLI Re-connecting Loop Issue
- Codex CLI Pricing Complete Guide
- Codex CLI Session Management Best Practices