Claude Opus 4.6 Released - Terminal-Bench 65.4% Score

Claude Opus 4.6 Released: Anthropic's Smartest Model Yet Scores 65.4% on Terminal-Bench

Anthropic has released Claude Opus 4.6, its most capable model to date, featuring significant improvements in agentic coding, a 1M token context window, and state-of-the-art performance across multiple benchmarks.

Key Improvements

The new model builds on its predecessor's coding capabilities with several notable upgrades:

•Improved agentic planning: Opus 4.6 breaks complex tasks into independent subtasks, runs tools and subagents in parallel, and identifies blockers with "real precision," according to Anthropic's release notes.
•1M token context window: For the first time in an Opus-class model, users can access a 1M token context window in beta.
•Sustained agentic tasks: The model can operate reliably in larger codebases for longer periods without degradation.
•Better self-debugging: Opus 4.6 has improved code review and debugging skills to catch its own mistakes.

Benchmark Results

Opus 4.6 achieves top scores across multiple evaluations:

•Terminal-Bench 2.0: 65.4% - highest score among all models on this agentic coding evaluation
•Humanity's Last Exam: Leads all frontier models on this complex multidisciplinary reasoning test
•GDPval-AA: Outperforms the industry's next-best model (OpenAI's GPT-5.2) by approximately 144 Elo points on economically valuable knowledge work tasks
•BrowseComp: Best performance on locating hard-to-find information online

"Opus 4.6 is the strongest model Anthropic has shipped," the company stated. "It takes complicated requests and actually follows through, breaking them into concrete steps, executing, and producing polished work even when the task is ambitious."

New Developer Features

Anthropic also announced several new capabilities for developers:

•Claude Code agent teams: Developers can now assemble multiple Claude agents to work on tasks collaboratively
•Adaptive thinking: The model can pick up on contextual clues about how much extended thinking to apply
•Effort controls: New parameter lets developers dial effort between high, medium, and low to balance intelligence, speed, and cost
•Context compaction: API users can summarize context mid-task to perform longer-running operations

New Product Integrations

The release also includes expanded productivity integrations:

•Claude in Excel: Substantial upgrades to working with spreadsheets
•Claude in PowerPoint: New research preview for presentation creation
•Cowork: Claude can now multitask autonomously on your behalf

Pricing and Availability

Claude Opus 4.6 is available today on claude.ai, the Anthropic API, and all major cloud platforms. Pricing remains unchanged at $5 input / $25 output per million tokens.

Claude Opus 4.6 Released: Anthropic's Smartest Model Yet Scores 65.4% on Terminal-Bench

Claude Opus 4.6 Released: Anthropic's Smartest Model Yet Scores 65.4% on Terminal-Bench

Key Improvements

Benchmark Results

New Developer Features

New Product Integrations

Pricing and Availability

Sources

Share this article

About AI News Team

Related Articles

Claude Sonnet 4.6: Anthropic's Mid-Tier Model Now Matches Flagship Opus at One-Fifth the Cost

Claude Opus 4.7 Released: Benchmarks, Pricing, and What Changed From Opus 4.6

GLM-5.1 Review: Z.ai's Coding Model Hits 94.6% of Claude Opus 4.6 for $3/Month

OpenAI Codex Spark Breaks Speed Records With Cerebras Chip As AI Race Heats Up

Claude Opus 4.6 Launch: Agent Teams, 1M Context, and New Controls for Long-Horizon Work