Why Claude Code Burns Through Tokens So Fast: 3 Causes and the Cache Bug Confirmed by a Source Code Leak
For / Key Points
For: Developers and team leads experiencing abnormal Claude Code usage drain, or those who want to understand the risk going forward
Key Points:
- Three factors converged simultaneously: end of a 2x promotion, intentional peak-hour throttling, and a prompt-cache bug
- The cache bug inflates costs 10-20x, exhausting even the Max $100/month plan in 1-2 hours
- The March 31 source code leak pointed to two candidate technical root causes: the attestation system and the anti-distillation mechanism
What Is Happening
Around March 23, 2026, reports began flooding Reddit, GitHub, and Discord that Claude Code was consuming tokens at an abnormal rate.
The symptoms are severe. Max 5x ($100/month) users who previously got 8 hours of work from a session are now hitting their limit in 1 hour. Max 20x ($200/month) users have reported usage jumping from 21% to 100% in a single prompt [1]. Pro plan ($20/month) users have seen their limit reached after just 3 prompts.
Anthropic has acknowledged the situation. They officially stated that "limits are being consumed far faster than expected" and that it is being investigated as a top priority [2].
However, this is not a single-cause problem. Three distinct factors converged in the same week, amplifying the perceived impact.
Timeline of Events
When did each factor emerge?
| Date | Event |
|---|---|
| 3/13 | Off-peak 2x promotion launched (all plans) |
| 3/23 | Reports of abnormal token consumption spike |
| 3/26 | Anthropic announces peak-hour limit tightening |
| 3/28 | 2x promotion ends |
| 3/31 | Source code leaks via npm. Cache bug existence confirmed |
Let's examine each factor in order.
Factor 1: End of the 2x Promotion
From March 13 to 28, Anthropic ran a "Spring Break" promotion that doubled usage allowances during off-peak hours (weekdays outside ET 8am-2pm, plus all weekends) for all plans [3].
There was strategic context behind this. In late February, the OpenAI-Pentagon contract controversy triggered a mass exodus from ChatGPT, and Claude reached #1 on the App Store's free apps chart for the first time in early March. The promotion aimed to retain this influx of new users while boosting off-peak GPU utilization.
The problem came after it ended. Users who had grown accustomed to 2x capacity over two weeks reverted to normal limits on March 29. Because the doubled allowance had become their baseline, the return to normal felt like a restriction being imposed [4].
This is not a bug — it was a scheduled change announced in advance. However, its timing overlapping perfectly with the next two factors deepened the confusion.
Factor 2: Intentional Peak-Hour Throttling
On March 26, Anthropic engineer Thariq Shihipar officially announced the following [5]:
During peak hours (PT 5am-11am / ET 8am-2pm / JST 9pm-3am on weekdays), the 5-hour session limit will be consumed faster. The weekly total limit remains unchanged.
In other words, the same prompt consumes more of the 5-hour session limit during US morning hours. Anthropic estimated that roughly 7% of users would be affected.
For users in Japan, PT 5am-11am corresponds to JST 9pm-3am (March uses PDT = UTC-7). Unless you work late at night, the direct impact is minimal. However, global teams and always-on automation pipelines should take note.
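The conversion above can be checked with a few lines of Python. This is a minimal sketch that assumes fixed offsets (PDT = UTC-7, JST = UTC+9), which hold for late March 2026 but not year-round; the helper names `is_peak` and `peak_window_in` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: an assumption that is valid while US daylight time is in effect.
PDT = timezone(timedelta(hours=-7), "PDT")
JST = timezone(timedelta(hours=9), "JST")

PEAK_START_HOUR_PT = 5   # 5am PT, inclusive
PEAK_END_HOUR_PT = 11    # 11am PT, exclusive


def is_peak(moment: datetime) -> bool:
    """Return True if a timezone-aware datetime falls in the weekday peak window."""
    local = moment.astimezone(PDT)
    is_weekday = local.weekday() < 5  # Monday=0 ... Friday=4, judged in PT
    return is_weekday and PEAK_START_HOUR_PT <= local.hour < PEAK_END_HOUR_PT


def peak_window_in(tz: timezone, day: datetime) -> tuple[datetime, datetime]:
    """Express the PT peak window of the given PT calendar day in another zone."""
    start = datetime(day.year, day.month, day.day, PEAK_START_HOUR_PT, tzinfo=PDT)
    end = datetime(day.year, day.month, day.day, PEAK_END_HOUR_PT, tzinfo=PDT)
    return start.astimezone(tz), end.astimezone(tz)


start_jst, end_jst = peak_window_in(JST, datetime(2026, 3, 26))
print(start_jst.isoformat(), "to", end_jst.isoformat())
```

An automation pipeline could call `is_peak(datetime.now(timezone.utc))` before starting a long job and defer it when the result is true.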
Factor 3: The Prompt Cache Bug (The Core Issue)
Why does the same workload consume 10-20x more tokens? Of the three factors, this one has the largest impact.
How Caching Works
Claude Code sends an API request with every conversation turn. The payload includes the system prompt, tool definitions, the full conversation history, and the current message. In a long session, this can exceed 200,000 tokens.
Processing 200,000 tokens from scratch every turn would be ruinously expensive. This is where prompt caching comes in. Content that matches the previous request is read from cache (at 1/10th the cost), and only the new portion is processed [6].
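The idea of "matching content is read from cache, only the new portion is processed" can be sketched as a longest-common-prefix check. This is a simplification for illustration only: real prompt caching operates on provider-defined block boundaries rather than token by token, and the function name `split_cached` and the sample payloads are hypothetical.

```python
def split_cached(previous: list[str], current: list[str]) -> tuple[int, int]:
    """Return (reusable_prefix_len, new_len) for two request payloads."""
    shared = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # the first mismatch ends the reusable prefix
        shared += 1
    return shared, len(current) - shared


turn1 = ["<system>", "<tools>", "user: hi", "assistant: hello"]
turn2 = turn1 + ["user: fix the bug"]

# Appending to the history keeps the whole previous request reusable.
print(split_cached(turn1, turn2))  # (4, 1)

# Changing anything near the start makes everything after it "new".
turn2_changed = ["<system v2>"] + turn2[1:]
print(split_cached(turn1, turn2_changed))  # (0, 5)
```

The second call shows why content that varies near the beginning of the payload is so costly: nothing after the first mismatch can be reused.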
The diagram below illustrates the difference between normal operation and the bug.
graph LR
subgraph normal["Normal - Cache working correctly"]
A1["Turn 1"] -->|"Full initial processing"| B1["cache_create large"]
B1 --> C1["$0.15"]
A2["Turn 2"] -->|"Previous content from cache"| B2["cache_read large"]
B2 --> C2["$0.05"]
A3["Turn 3"] -->|"Growing history uses cache"| B3["cache_read larger"]
B3 --> C3["$0.02"]
end
subgraph bug["Bug - Cache breaks every turn"]
X1["Turn 1"] -->|"Full initial processing"| Y1["cache_create large"]
Y1 --> Z1["$0.15"]
X2["Turn 2"] -->|"Cache invalid, full reprocessing"| Y2["cache_create large"]
Y2 --> Z2["$0.15"]
X3["Turn 3"] -->|"Longer history, full processing"| Y3["cache_create even larger"]
Y3 --> Z3["$0.40"]
end
Under normal conditions, costs are "high for the first turn, then increasingly cheaper." When the bug strikes, "every turn is treated as the first, and costs grow as the conversation lengthens." This is what drives the 10-20x cost inflation.
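A rough simulation makes the two cost curves concrete. It is a sketch under stated assumptions, not a reproduction of Anthropic's billing: the base price of $5 per million input tokens and the 1.25x cache-write multiplier are assumptions, the 0.1x read multiplier follows the "1/10th" figure above, and the 4,000 tokens added per turn is an arbitrary illustrative value. Output tokens are ignored.

```python
READ_MULT = 0.10            # cache reads at 1/10th of the base price
WRITE_MULT = 1.25           # assumed cache-write multiplier
BASE_PRICE_PER_MTOK = 5.00  # assumed base input price in USD


def session_cost(turns: int, system_tokens: int, tokens_per_turn: int,
                 cache_works: bool) -> float:
    """Estimate the cumulative input cost of a session in USD."""
    total = 0.0
    previous_context = 0  # size of the request sent on the previous turn
    for turn in range(1, turns + 1):
        context = system_tokens + tokens_per_turn * turn
        if cache_works:
            reusable = previous_context  # the whole previous request is reused
        else:
            # Buggy case: only the system prompt is ever read from cache.
            reusable = min(previous_context, system_tokens)
        fresh = context - reusable
        total += (reusable * READ_MULT + fresh * WRITE_MULT) * BASE_PRICE_PER_MTOK / 1_000_000
        previous_context = context
    return total


normal = session_cost(30, 14_500, 4_000, cache_works=True)
broken = session_cost(30, 14_500, 4_000, cache_works=False)
print(f"normal ${normal:.2f}, broken ${broken:.2f}, ratio {broken / normal:.1f}x")
```

The ratio depends heavily on session length and on how much each turn adds, so longer sessions widen the gap.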
The Claude Code team reportedly monitors cache hit rate as a SEV (critical incident) level metric. The product's economics are fundamentally built on caching.
Real Data Shows the Breakdown
GitHub Issue #34629 contains a quantitative reproduction report comparing the same workload across different versions [7].
v2.1.68 (working correctly):
| Turn | cache_read | cache_create | Cost |
|---|---|---|---|
| 1 | 13,997 | 22,946 | $0.15 |
| 2 | 32,849 | 4,636 | $0.05 |
| 3 | 36,846 | 879 | $0.03 |
| 4 | 37,295 | 802 | $0.02 |
cache_read (tokens read cheaply from cache) grows with each turn. Conversely, cache_create (tokens requiring new processing) drops sharply. By Turn 4, only 802 tokens needed fresh processing, bringing the cost down to $0.02.
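The figures in the table can be turned into a rough per-turn estimate. The sketch below assumes a base input price of $5 per million tokens and a 1.25x cache-write multiplier (both assumptions), and uses the 1/10th read rate mentioned earlier. It ignores output tokens and uncached input, so it approximates the Cost column rather than reproducing it.

```python
BASE_INPUT_PRICE_PER_MTOK = 5.00  # assumed base input price in USD
CACHE_READ_MULT = 0.10            # reads at 1/10th of the base price
CACHE_WRITE_MULT = 1.25           # assumed cache-write multiplier

# (cache_read, cache_create) per turn, copied from the v2.1.68 table.
TURNS_V2_1_68 = [
    (13_997, 22_946),
    (32_849, 4_636),
    (36_846, 879),
    (37_295, 802),
]


def estimate_turn_cost(cache_read: int, cache_create: int) -> float:
    """Estimate the cached-input cost of one turn in USD."""
    weighted_tokens = cache_read * CACHE_READ_MULT + cache_create * CACHE_WRITE_MULT
    return weighted_tokens * BASE_INPUT_PRICE_PER_MTOK / 1_000_000


for number, (read, create) in enumerate(TURNS_V2_1_68, start=1):
    print(f"turn {number}: ~${estimate_turn_cost(read, create):.3f}")
```

Because writes are weighted far more heavily than reads, the estimate falls sharply once `cache_create` shrinks, even though the total context keeps growing.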
v2.1.76 (buggy) completely breaks this pattern. cache_read stays fixed at roughly 14,500 tokens (the system prompt) and never grows. The conversation history is reprocessed as "new data" every turn. As conversations grow longer, cache_create keeps increasing, and costs balloon from $0.04 to $0.40.
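The buggy pattern described above (flat `cache_read`, growing `cache_create`) is simple enough to check for mechanically. The following is a heuristic sketch, not an official diagnostic: the function name, the 5% tolerance, and the "suspect" sample values are hypothetical, with only the roughly 14,500-token plateau taken from the report.

```python
def looks_like_cache_regression(turns: list[tuple[int, int]],
                                min_turns: int = 3,
                                tolerance: float = 0.05) -> bool:
    """Flag sessions where cache_read stays flat while cache_create keeps growing.

    `turns` holds one (cache_read, cache_create) pair per turn.
    """
    if len(turns) < min_turns:
        return False  # too little data to judge
    reads = [read for read, _ in turns]
    creates = [create for _, create in turns]
    first_read = reads[0]
    read_is_flat = all(abs(read - first_read) <= tolerance * first_read
                       for read in reads[1:])
    create_is_growing = all(later > earlier
                            for earlier, later in zip(creates, creates[1:]))
    return read_is_flat and create_is_growing


# Healthy session: values from the v2.1.68 table.
healthy = [(13_997, 22_946), (32_849, 4_636), (36_846, 879), (37_295, 802)]
# Hypothetical session shaped like the v2.1.76 description.
suspect = [(14_500, 3_000), (14_500, 9_000), (14_500, 20_000)]

print(looks_like_cache_regression(healthy))  # False
print(looks_like_cache_regression(suspect))  # True
```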
The regression is in the CLI, not the model. The same Opus 4.6 works fine on v2.1.68 but breaks on v2.1.76, so the behavior depends on the client version.
What the Source Code Leak Revealed
Why does the cache break? On March 31, a build configuration error in the npm package accidentally published Claude Code's entire source code (TypeScript, approximately 510,000 lines) [8]. Anthropic pulled it within hours, but it had already been widely mirrored.
Developers who analyzed the leaked code identified two mechanisms that could cause cache invalidation.
Candidate Cause 1: Attestation Data That Changes Every Request
Claude Code generates and attaches attestation data to every API request as proof that "this request was sent from the legitimate CLI" [9]. This is an anti-abuse mechanism.
The problem is that this attestation data contains values that differ with every request. Cache matching is based on a hash of the prefix (content from the beginning). When the attestation data changes, the hash changes too, causing the previous cache to be treated as "something different" and discarded.
A mechanism designed to protect legitimate usage may have been destroying the cache as a side effect.
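The effect of a per-request value on a prefix hash can be illustrated in a few lines. This sketch does not reflect the actual request format or hashing scheme, neither of which is described here; the payload layout, the `attestation` field, and the tool names are hypothetical.

```python
import hashlib
import json


def prefix_hash(blocks: list[dict]) -> str:
    """Hash a request prefix the way a cache key might be derived."""
    serialized = json.dumps(blocks, sort_keys=True).encode("utf-8")
    return hashlib.sha256(serialized).hexdigest()


def build_prefix(attestation: str) -> list[dict]:
    """Build a toy prefix whose first block carries the attestation value."""
    return [
        {"type": "attestation", "value": attestation},  # hypothetical field
        {"type": "system", "text": "You are a coding assistant."},
        {"type": "tools", "names": ["read_file", "run_command"]},
    ]


same_value_matches = prefix_hash(build_prefix("fixed")) == prefix_hash(build_prefix("fixed"))
new_value_matches = prefix_hash(build_prefix("nonce-1")) == prefix_hash(build_prefix("nonce-2"))
print(same_value_matches, new_value_matches)  # True False
```

Everything except the attestation value is byte-identical between the two requests, yet the hashes differ, which is the invalidation pattern the analysis describes.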
Candidate Cause 2: Silent Injection of Fake Tools
A flag called ANTI_DISTILLATION_CC was also discovered. When enabled, the API silently injects fake tool definitions into the system prompt. The purpose is to thwart "distillation attacks" where competitors intercept Claude Code's traffic and use it for model training [10].
Since tool definitions are part of the cache prefix, if the injected fake tools vary between requests, the same logic causes cache invalidation. This flag is controlled by a feature flag, so it is unclear whether it is always active for all users — but for users where it is enabled, it could be a direct cause of cache destruction.
A Bug That Stayed Hidden Because the Code Was Closed
These issues had been flagged as behavioral anomalies by external developer groups before the code was public. However, because the source was closed, they couldn't pinpoint exactly where the bug existed.
Once the leak made 510,000 lines of code readable, the candidate causes were identified within hours. Structural quality issues also became visible: zero tests for 64,464 lines of code, and a single function spanning 3,167 lines with 486 branch points [9].
Anthropic had previously taken legal action against a developer who reverse-engineered Claude Code. Now, a build configuration mistake exposed the entire codebase, and the result was accelerated discovery of bugs that were actively harming users. The intellectual property they wanted to protect by keeping the code closed was, precisely because it was closed, delaying the discovery of quality problems — a case study that adds a new dimension to the open-source debate.
What Users Can Do
The root fix requires a patch from Anthropic, but here are measures that have been reported as effective.
Immediate actions:
- Shift work to off-peak hours. In JST, peak hours are PT 5am-11am = JST 9pm-3am. Working during daytime in Japan avoids the throttling penalty
- Keep sessions short. Use `/clear` to start fresh sessions, minimizing the impact of cache breakage from context accumulation
- Compress conversations with `/compact`. Prevents context window bloat and reduces token consumption
- Default to Sonnet. Opus costs roughly 5x more per token. Switch via `/model` and reserve Opus for complex architectural decisions only
Medium-term actions:
- Consider pinning your version. Pinning to a version where caching works correctly (reports suggest v2.1.68 or earlier) has improved the situation for some users. Note that this is unsupported, so proceed at your own risk
- Avoid the `--resume` flag. Session resumption triggers full context reprocessing, maximizing exposure to the cache bug
- Consider switching to API key billing. Moving from the subscription's opaque quota to pay-as-you-go billing lets you track consumption precisely
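For those who do move to API key billing, consumption can be tallied from the usage data returned with each response. The sketch below is a plain accumulator with no network calls. The field names follow the usage object of the Anthropic Messages API as commonly documented, but treat them as an assumption and check them against the responses you actually receive; the class name and sample numbers are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class UsageTotals:
    """Running totals of token usage across a session."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_creation_input_tokens: int = 0
    cache_read_input_tokens: int = 0

    def add(self, usage: dict) -> None:
        """Add one response's usage; missing or null fields count as zero."""
        for field in fields(self):
            current = getattr(self, field.name)
            setattr(self, field.name, current + int(usage.get(field.name) or 0))

    def cache_hit_rate(self) -> float:
        """Share of cacheable input tokens that were served from cache."""
        cacheable = self.cache_creation_input_tokens + self.cache_read_input_tokens
        return self.cache_read_input_tokens / cacheable if cacheable else 0.0


totals = UsageTotals()
totals.add({"input_tokens": 12, "output_tokens": 300,
            "cache_creation_input_tokens": 22_946, "cache_read_input_tokens": 13_997})
totals.add({"input_tokens": 8, "output_tokens": 150,
            "cache_creation_input_tokens": 4_636, "cache_read_input_tokens": 32_849})
print(totals, f"hit rate {totals.cache_hit_rate():.0%}")
```

A hit rate that stays low as a session grows is an early signal that the session is paying for reprocessing on every turn.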
What This Problem Tells Us
This incident became severe because three different layers — a marketing campaign, infrastructure constraints, and a software bug — converged in the same week. Any single factor might have been tolerable, but their simultaneous occurrence created a situation where Max $200/month users were effectively getting free-tier levels of usage.
The unintentional code publication led to candidate root causes being identified within hours for a bug that would have remained hidden far longer under closed-source conditions. It also revealed a structural quality assurance gap: the cited analysis found zero tests for 64,464 lines of production code. The foundation model (Opus 4.6) is among the best in the industry, but the CLI wrapping it has become the bottleneck.
The takeaway is straightforward. Infrastructure reliability is a separate layer from model performance. Now that Claude Code has become an indispensable developer tool, its quality assurance mechanisms — testing, cache monitoring, release processes — are under scrutiny. A patch for the cache bug is likely coming soon, but as of March 31, no official timeline has been provided. Tracking GitHub Issue #34629 is recommended.
MacRumors – Claude Code Users Report Rapid Rate Limit Drain, Suspect Bug (March 26, 2026)
The Register – Anthropic admits Claude Code quotas running out too fast (March 31, 2026)
LaoZhang AI Blog – Claude Code Max Quota Consumption Abnormal? (March 31, 2026)
GitHub Issue #34629 – Prompt cache regression in --print --resume since v2.1.69 (March 2026)
VentureBeat – Claude Code's source code appears to have leaked (March 31, 2026)
DEV Community – We Reverse-Engineered 12 Versions of Claude Code (March 31, 2026)
Alex Kim's blog – The Claude Code Source Leak (March 31, 2026)