Claude Sonnet 4.6 Release — Opus-Level Performance at Nearly Half the Cost¶
Audience: Intermediate engineers tracking AI development tool trends
Key Points¶
- Near-Opus benchmarks SWE-bench 79.6%, OSWorld 72.5% — nearly matching the flagship model
- Same price as Sonnet 4.53/15 per MTok — nearly half the cost of Opus 4.6
- 1M context window First Sonnet-class model with 1 million token context (beta)
Claude Sonnet 4.6 Benchmarks — A Solid Step Up from Sonnet 4.5¶
Sonnet 4.6 outperforms Sonnet 4.5 across nearly every benchmark, and even surpasses Opus 4.6 in a few. In early Claude Code testing, roughly 70% of users preferred Sonnet 4.6 over Sonnet 4.5, and 59% chose it over Opus 4.5 (released Nov 2025).
The standout is ARC-AGI-2. It jumped from Sonnet 4.5's 13.6% to 60.4% — a dramatic leap that signals a step change in general reasoning capability.
| Benchmark | Sonnet 4.6 | Sonnet 4.5 | Opus 4.6 |
|---|---|---|---|
| SWE-bench Verified | 79.6% | 77.2% | 80.8% |
| OSWorld-Verified | 72.5% | 61.4% | 72.7% |
| Terminal-Bench 2.0 | 59.1% | 51.0% | 65.4% |
| GPQA Diamond | 89.9% | 83.4% | 91.3% |
| ARC-AGI-2 (high effort) | 60.4% | 13.6% | 68.8% |
| Finance Agent (max) | 63.3% | — | 60.1% |
On the Finance Agent benchmark (63.3% vs 60.1%), Sonnet 4.6 is the only benchmark where it outperforms Opus 4.6. Its agentic capabilities are breaking out of the Sonnet tier.
So what specifically got better?
Coding — Less Over-Engineering, More Consistency¶
The biggest improvement is coding reliability over long sessions. Early enterprise testers report "significantly less overengineering" and "better multi-step consistency."
GitHub confirmed scaling improvements in complex bug fixes across large codebases. Cognition improved bug detection scaling. Replit and Cursor both reported quality gains in long-horizon reasoning tasks.
The classic Sonnet weakness — quality degradation during extended coding sessions — has been meaningfully reduced. It's now a more trustworthy daily development partner.
Computer Use — Matching Opus on OSWorld¶
OSWorld 72.5% is virtually identical to Opus 4.6's 72.7%. The 11-point jump from Sonnet 4.5's 61.4% represents a dramatic improvement in computer use capability.
Specifically, complex spreadsheet navigation and multi-tab web form completion are approaching human-level accuracy. An insurance workflow automation test achieved 94% accuracy (Pace's evaluation).
Prompt injection resistance has also been raised to Opus 4.6 levels. You can choose Sonnet for cost reasons without compromising on security.
1M Context — A Sonnet First¶
The context window expanded from 200K to 1M tokens (beta). This is the first time a Sonnet-class model supports this. Use the context-1m-2025-08-07 beta header in API requests.
Entire codebases, lengthy contracts, and multiple research papers can be loaded and reasoned over simultaneously. In the Vending-Bench Arena business simulation, Sonnet 4.6 autonomously developed a strategy of early aggressive investment followed by a late-stage profitability pivot, outperforming competitor AIs.
Note that long-context pricing applies for requests exceeding 200K tokens.
Adaptive Thinking — Flexible Reasoning Cost Control¶
Sonnet now supports Adaptive Thinking for the first time. Previously, this was an Opus 4.6-only feature.
The effort parameter (low / medium / high / max) controls reasoning token consumption based on task difficulty. Use lightweight mode for simple tasks and max effort for complex analysis.
Pricing and Availability¶
Pricing is identical to Sonnet 4.5. About 60% of Opus 4.6's cost (5/25).
| Spec | Sonnet 4.6 | Opus 4.6 |
|---|---|---|
| Input | $3 / MTok | $5 / MTok |
| Output | $15 / MTok | $25 / MTok |
| Context | 200K (1M beta) | 200K (1M beta) |
| Max output | 64K tokens | 128K tokens |
| Knowledge cutoff | Aug 2025 | May 2025 |
| Training data | Jan 2026 | Aug 2025 |
Model ID: claude-sonnet-4-6. Prompt caching offers up to 90% savings; batch API provides 50% off.
Available on:
- claude.ai — Now the default for Free and Pro plans
- Claude Code — Select via
/model claude-sonnet-4-6 - API — Immediately available
- Amazon Bedrock / Google Vertex AI / Microsoft Foundry
Summary — Where Does Opus Still Win?¶
- SWE-bench 79.6% and OSWorld 72.5% put it neck-and-neck with Opus 4.6
- ARC-AGI-2 jumped from 13.6% to 60.4%, a dramatic leap in general reasoning
- Same pricing (3/15) with 1M context + Adaptive Thinking support
With Sonnet 4.6's rise, the range of tasks that "only Opus can handle" is narrowing fast. Opus 4.6's remaining value is concentrating in areas where depth of reasoning is paramount — large-scale refactoring, multi-agent coordination, and decisions that cannot afford to be wrong. For cost-conscious use cases, Sonnet 4.6 is becoming the clear first choice.