OpenAI / ChatGPT Guide Hub

February 2026 Latest Update

GPT-5.3-Codex (latest) includes improved context window handling. This guide has been updated with the latest workarounds for Codex CLI v0.101.0+.

Fix Codex Ran Out of Room in Context Window (2026)

When Codex CLI says "ran out of room in the model's context window", "start a new thread or clear earlier history before retrying", or "your input exceeds the context window", fix it in 30 seconds: start a fresh session, selectively restore with codex resume --last, then try /compact as fallback.

Target Audience

  • Codex CLI beginners (less than 1 month of experience)
  • Anyone blocked by "ran out of room" or "context window exceeded" errors
  • Users experiencing "stream disconnected" issues

Key Points

  • Start a new Codex session first when the context window is full.
  • Use codex resume --last only to recover the turns you still need.
  • Treat /compact as a fallback, then split future work into smaller chunks.

Quick Fix: 3 Officially Aligned Steps

Follow the order recommended by the CLI message and documentation: reset the session first, then selectively restore history.

Step 1: Start a fresh session

# Exit the current session with Ctrl+C
# Then relaunch Codex
codex

ℹ️ Why this is first
  • The error itself tells you to "Start a new conversation or clear earlier history before retrying." (openai/codex#4926)
  • Clearing the buffer restores the full context window and eliminates most overflow retries.

Step 2: Restore only what you need with codex resume --last

# Resume the most recent session
codex resume --last

💡 Documented command
  • The official "Getting started" guide lists codex resume --last as the supported way to reopen your previous session (docs/getting-started.md).
  • Use it after step 1 when you need context snippets without bringing back the entire transcript.

Step 3: Try /compact (if it still fails, go back to step 1)

# Inside the Codex prompt
/compact

⚠️ Known limitation
  • /compact summarizes history in place but remains unreliable in current builds, as tracked in openai/codex#4868.
  • If nothing changes, restart the session instead of retrying endlessly.

Understanding the Root Cause

Here's why these errors occur, explained for beginners.

What Is a Context Window?

Think of it as temporary "working memory" for AI conversations. It stores your chat history and file contents. When this memory fills up, errors occur.

Common Causes

  1. Conversations get too long: Extended sessions accumulate chat history, filling memory.
  2. Too many files loaded: Reading many project files at once quickly fills memory.
  3. Connection interruption (stream disconnected): When the model reasons for a long time without sending output, a device on the network path treats the connection as idle and closes it.

Prevention Tips

Build these habits to avoid future errors:

  • Start fresh every 30 minutes: Reset regularly during long work sessions
  • Minimize file selection: Load only necessary files
  • Split large edits: Divide multi-file editing across sessions
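The "minimize file selection" habit is easier to follow with a rough size check before you load files. The sketch below assumes about 4 bytes per token, which is only a rule of thumb and not what the Codex tokenizer reports; estimate_tokens is a hypothetical helper, not a Codex CLI command.

```shell
# Rough token estimate for a set of files.
# Assumption: ~4 bytes per token (rule of thumb; the real tokenizer differs).
estimate_tokens() {
  total_bytes=0
  for f in "$@"; do
    # Skip anything that is not a regular file (directories, bad globs).
    [ -f "$f" ] || continue
    size=$(wc -c < "$f")
    total_bytes=$((total_bytes + size))
  done
  echo $((total_bytes / 4))
}
```

For example, estimate_tokens src/*.py prints one number you can compare against the model's window before deciding how many files to hand over in one turn.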

🆕 Nov 2025: Auto-Approval-Friendly Playbook

ℹ️ Context window overflow is a frequently encountered issue, so we baked the recurring fixes into a reusable playbook for auto-approval runs.

Step | What to do | Purpose
1. Snapshot the baseline | Run codex --full-auto --transcript analysis/20251108_context.jsonl at session start | Capture before/after diffs
2. Chunk via TodoWrite | Break the TodoWrite checklist into ≤3 bullets per Codex turn | Prevent prompt bloat
3. Recovery macro | Keep codex resume --last in a shell alias, then run /compact inside the resumed session (it is a prompt command, not a shell command) | Shrink recovery time

# context-window-safe.sh
# The quoted heredoc delimiter ("TASKS") stops the shell from expanding anything inside the task list.
codex --full-auto --transcript analysis/$(date +%Y%m%d)_context.jsonl <<"TASKS"
1. TodoWrite: <paste task URL>
2. Execute step A (<= 3 checklist bullets)
3. Attach transcript path back to TodoWrite
TASKS

Canary rule with TodoWrite

Run gh issue comment <ISSUE> --body "Progress: step N completed" after each chunk: it tells teammates which prompts already ran, avoiding duplicate submissions that inflate the context window.

Latest updates (as of 2025-10-23)

  • Codex CLI 0.48.0 shipped on 2025-10-23 with a tokenizer-backed truncation path for unified_exec, which reduces surprises where the reported remaining tokens diverge from the enforced limit (release notes, PR #5514).
  • The same release adds a local tokenizer and richer event stream updates, helping history management and disconnect recovery (PR #5508, PR #5470).
  • Windows/WSL users should follow the refreshed instructions from PR #5307; if transport errors persist, upgrade first and then audit VPNs, antivirus filters, or HTTP compression on the network path.

Troubleshooting Guide

Symptom | Immediate Action | Details
/compact doesn't work | Try Step 1 (fresh session) | Version-dependent functionality issues
stream disconnected | Try Step 1 (fresh session) | Resets the connection
Frequent errors | Split tasks + regular resets | Too much work in one session
ran out of room | Try Step 1 (fresh session) | Context is full

Advanced: Technical Details

What changed by 2025-10-24
  • Codex CLI 0.48.0 (released 2025-10-23) now trims unified_exec with the real tokenizer, reducing drift between reported and enforced token budgets (PR #5514).
  • The same release adds local tokenizer support, richer event streams, and TUI polish, making it easier to recover from disconnects and review history (PR #5508, PR #5470).
  • OpenAI's official docs emphasise the Responses API Background mode for long jobs, so clients don't need to hold SSE connections indefinitely (guide).
  • Windows/WSL transport errors remain active on GitHub; upgrade to 0.48.0, apply the refreshed WSL instructions (PR #5307), and audit VPN/antivirus/HTTP compression along the path.

Current High-Frequency Triggers
  1. SSE idle timeouts (Cloudflare 100 s, ALB 60 s defaults): Long reasoning yields no SSE events; middleboxes terminate the connection; CLI surfaces stream disconnected.
  2. Response-stream decode failures: Transport error: error decoding response body appears on Windows/WSL or extended sessions, often when compression/inspection intermediaries interfere.
  3. TPM/RPM rate limits masked as disconnects: Actual 429 responses feel like stream drops when retries fire.
  4. Context-window overflow: GPT-5-Codex-class models expose 400k-token windows, yet measurement drift or failed compression still push requests past their limits.

Default Idle Timeouts Across Middleboxes
Device / Service | Default idle | Notes
Cloudflare (Free/Pro) | ~100 s | 524 after 100 s. Consider bypass, Workers, or higher tiers for SSE.
AWS Application Load Balancer | 60 s (configurable 1–4000 s) | Raise to 180–300 s or more for SSE workloads.
Azure Application Gateway for Containers | 300 s (5 min) | Pair with keep-alive comments.
Enterprise proxies / security gateways | Product-dependent | HTTP inspection/compression can corrupt SSE chunks.
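
If you operate the ALB yourself, the idle timeout is changed through the idle_timeout.timeout_seconds load balancer attribute. The sketch below only prints the AWS CLI call so you can review it first; alb_idle_timeout_cmd is a hypothetical helper and the ARN is a placeholder you must supply.

```shell
# Print (do not run) the AWS CLI call that sets an ALB idle timeout.
# Arguments: load balancer ARN (placeholder), timeout in seconds (1-4000).
alb_idle_timeout_cmd() {
  arn=$1
  seconds=$2
  # Reject empty or non-numeric input before comparing ranges.
  case $seconds in
    ''|*[!0-9]*) echo "seconds must be a positive integer" >&2; return 2 ;;
  esac
  if [ "$seconds" -lt 1 ] || [ "$seconds" -gt 4000 ]; then
    echo "seconds must be between 1 and 4000" >&2
    return 2
  fi
  printf 'aws elbv2 modify-load-balancer-attributes --load-balancer-arn %s --attributes Key=idle_timeout.timeout_seconds,Value=%s\n' "$arn" "$seconds"
}
```

Printing the command instead of executing it keeps the helper safe to try without credentials; paste the output into a shell once the ARN and value look right.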

Error Fragments: Root Cause and Mitigations

idle timeout waiting for SSE / stream closed before response.completed

  • Root cause: long silent phases plus middlebox idle policies.
  • Mitigations:
      • Move long jobs to Responses API Background mode and avoid clinging to SSE streams (guide).
      • Increase idle limits to 180–300 s+ (ALB via idle_timeout.timeout_seconds).
      • For Cloudflare, bypass the edge, use Workers/higher plans, or send keep-alive comments. An SSE comment is any line that starts with a colon, so the minimal dummy payload is a lone colon followed by a blank line:
        :

      • Update to the latest Codex CLI (e.g. 0.48.0) to leverage the tokenizer fixes and built-in responses-api-proxy.
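
The keep-alive comment from the mitigations above can be sketched as a small emitter. This only applies if you control the streaming server or a proxy on the path; sse_keepalive_frame and sse_keepalive_loop are hypothetical helpers for exercising a proxy, not part of Codex CLI.

```shell
# Emit one SSE comment frame: a line starting with ":" is ignored by SSE
# clients, and the blank line ends the frame.
sse_keepalive_frame() {
  printf ': keep-alive\n\n'
}

# Emit COUNT frames, sleeping INTERVAL seconds between them.
sse_keepalive_loop() {
  interval=$1
  count=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    sse_keepalive_frame
    i=$((i + 1))
    # No sleep after the final frame.
    [ "$i" -lt "$count" ] && sleep "$interval"
  done
  return 0
}
```

Pick an interval comfortably below the shortest idle limit on the path, for example sse_keepalive_loop 30 10 when the limit is 60 s.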

Transport error: error decoding response body

  • Root cause: chunk corruption, compression/inspection middleboxes, or long-lived connections.
  • Mitigations: stay on the latest CLI, disable HTTP compression/inspection, route via responses-api-proxy, retry with exponential backoff.

Request too large for ... / This request is over the organization TPM/RPM

  • Root cause: rate limits or token ceilings.
  • Mitigations: shrink inputs (summaries, tighter file selection, staged workflows), constrain concurrency, pursue higher quotas, retry with backoff.
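
"Retry with backoff" can be wrapped around any command you rerun after a rate limit or transport error. A minimal sketch, assuming a doubling delay with no jitter; retry_with_backoff is a hypothetical helper and the attempt count and base delay are values you choose.

```shell
# Retry a command, doubling the delay after each failure.
# Usage: retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS command [args...]
retry_with_backoff() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    # Success ends the loop immediately.
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, retry_with_backoff 4 5 some_command waits 5 s, 10 s, and 20 s between its four attempts. Adding random jitter would be a sensible extension when several jobs retry at once.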

Your input exceeds the context window

  • Root cause: actual overflow or mismatch between estimated and enforced limits.
  • Mitigations: rely on the latest CLI's tokenizer-backed truncation plus /status, tighten include/exclude, and split sessions when needed.

Codex CLI 0.48.0 highlights (2025-10-23)
  • Tokenizer-backed truncation for unified_exec keeps reported and enforced token budgets in sync (PR #5514).
  • Local tokenizer & richer events improve history visibility and reconnect flows (PR #5508, PR #5470).
  • WSL guidance refresh documents fixes for Windows transport issues (PR #5307).
  • responses-api-proxy remains bundled for a stable, officially supported SSE tunnel.

Why Background Mode Matters
  • Background mode hands long-running work to the Responses API asynchronously, while clients poll or accept webhooks.
  • Without a persistent SSE connection, Cloudflare/ALB idle policies become largely irrelevant.
  • Repeated heavy tasks no longer require the client to maintain a fragile live stream.
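
The polling half of that pattern can be sketched without tying it to a specific client. The helper below takes any command that prints the job's current status and loops until the status is no longer queued or in_progress; poll_until_done is hypothetical, and the status names are assumed to match what the Responses API reports for background jobs.

```shell
# Poll a status command until the job leaves the queued/in_progress states.
# Usage: poll_until_done INTERVAL_SECONDS MAX_POLLS status_command [args...]
# status_command must print a single status word (assumed: queued,
# in_progress, or a terminal state such as completed or failed).
poll_until_done() {
  interval=$1
  max_polls=$2
  shift 2
  polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    state=$("$@") || return 1
    case $state in
      queued|in_progress) ;;               # still running, keep polling
      *) echo "$state"; return 0 ;;        # terminal state reached
    esac
    polls=$((polls + 1))
    sleep "$interval"
  done
  echo "timeout"
  return 1
}
```

In practice the status command would be whatever call retrieves the response by its ID and extracts the status field. Because each poll is a short request, no connection stays open long enough to hit an idle timeout.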

Input and Rate-Limit Hygiene
  • Use include/exclude to drop unnecessary files and compress prompts between stages.
  • Break large workflows into stages and merge results later to conserve context.
  • Lower reasoning levels or prompt for intermediate outputs to reduce long silence periods.
  • Pair TPM/RPM monitoring with exponential backoff and deliberate job pacing.

Windows / WSL Transport Errors
  • Public issues still report Transport error: error decoding response body. Network compression, antivirus scanning, or VPN hops are frequent contributors.
  • Combine the latest CLI, neutralized middleboxes, and backoff strategies; avoid claiming a single definitive cause.

Immediate checklist (2025-10-24 edition)
  1. Upgrade Codex CLI to the latest (0.48.0 or newer) via npm i -g @openai/codex@latest, brew upgrade codex, or the release binaries.
  2. Shift long tasks to Background mode to minimize SSE exposure.
  3. Tune middlebox idle/keep-alive settings (Cloudflare 100 s constraint, ALB ≥180 s, Azure AGW 5 min).
  4. Adopt responses-api-proxy or disable compression/inspection on intermediate proxies.
  5. Enforce input compression and staged execution, adjust reasoning levels when silence becomes dominant.
  6. Harden rate-limit hygiene: cap concurrency, record usage, apply exponential backoff.
  7. Monitor logs via tail -F ~/.codex/log/codex-tui.log and correlate with Request IDs.
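
For the log-monitoring step, a quick tally of the error fragments discussed in this guide shows which failure mode dominates. A minimal sketch, assuming the log contains these fragments verbatim (the actual log wording may differ); count_error_fragments is a hypothetical helper.

```shell
# Count case-insensitive occurrences of known error fragments in a log file.
# Output: one "COUNT<TAB>FRAGMENT" line per fragment.
count_error_fragments() {
  log_file=$1
  [ -r "$log_file" ] || { echo "cannot read $log_file" >&2; return 1; }
  for fragment in \
    "stream disconnected" \
    "error decoding response body" \
    "exceeds the context window" \
    "ran out of room"; do
    # grep -c exits non-zero on zero matches but still prints 0.
    n=$(grep -c -i -F -- "$fragment" "$log_file" || true)
    printf '%s\t%s\n' "$n" "$fragment"
  done
}
```

Run it as count_error_fragments ~/.codex/log/codex-tui.log; a high count for one fragment points you to the matching section under Error Fragments.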

Appendix: The 32–33% Context-Window Issue
  • Edge cases where requests fail despite reported headroom still exist. CLI 0.48.0's tokenizer work reduces frequency but has not eliminated them entirely.
  • Practical guidance: monitor /status, treat ~80% remaining context as a soft ceiling, use /compact when it behaves, restructure prompts, or start a fresh session.

FAQ

Why does Codex say "ran out of room in the model's context window"?

Your session history exceeded the available context. Exit with Ctrl+C, relaunch Codex, and pull back only the needed turns with codex resume --last.

How do I keep auto-approval runs from overloading the context window?

Split the TodoWrite checklist into 3 or fewer bullet chunks, send them one at a time, and attach each transcript file under analysis/ so Codex never sees the entire backlog at once.

What should I do when "codex re-connecting..." appears during long runs?

Capture the transcript, restart the session, and follow the reconnecting issue guide to ensure approvals stay on -a never.

Summary

Codex context-window errors are usually session-size problems, not a reason to keep retrying the same prompt. Restart fresh, restore only the useful context, and split the next run before the transcript grows again.