Claude Opus 4.7 Released: Benchmarks, Pricing, and What Changed From Opus 4.6
Anthropic's Claude Opus 4.7 is now available with major gains in agentic coding, high-resolution vision, and a new xhigh effort level. Full benchmarks, pricing, migration guide, and hands-on findings.
Last Updated: April 16, 2026 | Reading Time: 12 minutes | Trend Alert: 🔥 Just Released
On April 16, 2026, Anthropic released Claude Opus 4.7, its most capable generally available model to date. The model delivers a step-change improvement in agentic coding over Opus 4.6, introduces high-resolution image support (up to 3.75 megapixels), and ships with a new "xhigh" effort level for fine-grained control over reasoning depth vs. cost.
This is not a minor point release. Opus 4.7 represents a meaningful architecture update with a new tokenizer, removed legacy sampling parameters, and adaptive thinking as the only reasoning mode. If you're using Opus 4.6 in production, this post covers everything you need to know before upgrading.
What's New in Claude Opus 4.7
1. Agentic Coding Performance
The headline improvement is in software engineering, particularly for long-running autonomous tasks. Opus 4.7 was designed for the hardest coding work: the kind where you hand a model a complex task and let it run.
Key findings from Anthropic's early-access testers:
| Tester | Result |
|---|---|
| Cursor (CursorBench) | 70% pass rate vs. Opus 4.6's 58% |
| CodeRabbit | 10%+ recall improvement in code review, stable precision |
| Rakuten (Rakuten-SWE-Bench) | 3x more production tasks resolved than Opus 4.6 |
| Notion | 14% improvement over Opus 4.6, 1/3 fewer tool errors |
| Factory | 10-15% lift in task success for autonomous engineering |
| Vercel | More correct and complete one-shot coding; no regressions |
Replit's President Michele Catasta noted that Opus 4.7 achieves "the same quality at lower cost" compared to Opus 4.6 for tasks like log analysis, bug finding, and fix proposals.
Hex's co-founder Caitlin Colgrove put it plainly: "low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6." That's a direct cost efficiency win.
2. High-Resolution Vision (3.75MP)
Opus 4.7 is the first Claude model with high-resolution image support. The maximum resolution jumped from 1,568px (1.15MP) to 2,576px (3.75MP) on the long edge.
This matters for:
- Computer-use agents reading dense UI screenshots
- Document extraction from complex diagrams and charts
- Pixel-perfect coordinate mapping (no more scale-factor math; coordinates are 1:1 with actual pixels)
- Scientific and medical imaging workflows
XBOW reported a massive jump on their visual-acuity benchmark: 98.5% for Opus 4.7 vs. 54.5% for Opus 4.6. Solve Intelligence noted improved multimodal understanding for chemical structures and technical diagrams.
The tradeoff: high-res images use more tokens. Downsample if you don't need the detail.
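Since the token cost scales with resolution, a practical pre-processing step is to cap the image's long edge before sending it. Here is a minimal sketch: the `fit_long_edge` helper is hypothetical (not part of any SDK), the 2,576 px default comes from the cap described above, and actual resizing would be done with an imaging library such as Pillow.

```python
def fit_long_edge(width: int, height: int, max_edge: int = 2576) -> tuple[int, int]:
    """Scale (width, height) down so the longer edge is at most max_edge.

    2,576 px is the Opus 4.7 long-edge cap; pass a smaller max_edge
    (e.g. 1568, the old cap) if you don't need the extra detail and
    want to save tokens. Returns the original size if it already fits.
    """
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return width, height
    scale = max_edge / long_edge
    # Round each side and clamp to at least 1 px.
    return max(1, round(width * scale)), max(1, round(height * scale))

print(fit_long_edge(4000, 3000))  # downsampled to fit the cap
print(fit_long_edge(1920, 1080))  # already within the cap, unchanged
```

Feed the resulting dimensions to your resize call (e.g. Pillow's `Image.resize`) before base64-encoding the image for the API.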
3. New `xhigh` Effort Level
Opus 4.7 introduces `xhigh`, a new effort level between `high` and `max`. The `effort` parameter controls the tradeoff between reasoning depth and token spend.
Anthropic recommends:
| Effort Level | Use Case |
|---|---|
| `low` / `medium` | Quick tasks, simple queries |
| `high` | Most intelligence-sensitive tasks (minimum recommended) |
| `xhigh` | Coding and agentic use cases (new default for Claude Code) |
| `max` | Maximum reasoning, highest cost |
Manual effort control is available via the Messages API only; Claude Managed Agents handles effort automatically.
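As a sketch of setting an effort level per request: the payload shape below (an `output_config` carrying `effort`) follows the examples in this post, while the `build_request` helper and its validation are illustrative, not part of the Anthropic SDK.

```python
def build_request(prompt: str, effort: str = "xhigh") -> dict:
    """Build a Messages API payload with an explicit effort level.

    Hypothetical helper: validates the effort name, then returns kwargs
    you could pass to the SDK's messages.create(**payload).
    """
    valid = {"low", "medium", "high", "xhigh", "max"}
    if effort not in valid:
        raise ValueError(f"effort must be one of {sorted(valid)}")
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 8192,
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Refactor this function for clarity.")
# import anthropic
# response = anthropic.Anthropic().messages.create(**payload)
print(payload["output_config"])  # {'effort': 'xhigh'}
```

Building the payload separately also makes it easy to log which effort level each request used when you start measuring cost per task.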
4. Task Budgets (Beta)
A new feature that gives Claude an advisory token budget for the full agentic loop, including thinking, tool calls, and final output. The model sees a running countdown and prioritizes work accordingly.
```python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=128000,
    output_config={
        "effort": "high",
        "task_budget": {"type": "tokens", "total": 128000},
    },
    messages=[
        {"role": "user", "content": "Review the codebase and propose a refactor plan."}
    ],
    betas=["task-budgets-2026-03-13"],
)
```
Key points:
- An advisory cap, not a hard limit (unlike `max_tokens`)
- Minimum value: 20,000 tokens
- Best for workloads where you need the model to scope its own work
- For open-ended tasks where quality matters more than speed, skip the budget
5. Improved Memory and Knowledge Work
Opus 4.7 is better at maintaining and using file-system-based memory across long sessions. If your agent uses a scratchpad or notes file, Opus 4.7 should improve at writing useful notes and leveraging them in subsequent tasks.
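The scratchpad pattern mentioned above can be as simple as an append-only notes file the agent rereads at the start of each task. A minimal sketch of that scaffolding (the helpers are hypothetical, not an Anthropic API):

```python
from pathlib import Path
import tempfile

def append_note(path: Path, note: str) -> None:
    """Append one note line to the agent's scratchpad file."""
    with path.open("a", encoding="utf-8") as f:
        f.write(note.rstrip() + "\n")

def load_notes(path: Path) -> list[str]:
    """Load all prior notes, returning [] if no scratchpad exists yet."""
    return path.read_text(encoding="utf-8").splitlines() if path.exists() else []

# Demo: one "session" writes notes; the next session reloads them
# and would prepend them to the model's context.
with tempfile.TemporaryDirectory() as tmp:
    notes_path = Path(tmp) / "NOTES.md"
    append_note(notes_path, "- auth module uses a legacy bcrypt wrapper")
    append_note(notes_path, "- flaky test: test_payments_retry")
    recovered = load_notes(notes_path)

print(recovered)
```

The model does the interesting part (deciding what is worth writing down); the scaffolding just has to make the file readable and writable across sessions.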
Knowledge work improvements include:
- Better `.docx` redlining and `.pptx` editing with self-verification
- Improved chart and figure analysis with programmatic tool-calling
- Harvey's BigLaw Bench: 90.9% at high effort with better legal reasoning calibration
- Bloomberg's research-agent benchmark: top overall score at 0.715, best long-context consistency
Benchmarks: Opus 4.7 vs. Opus 4.6 vs. GPT-5.4 vs. Gemini 3.1 Pro
Anthropic published a comprehensive benchmark table comparing Opus 4.7 against Opus 4.6, Sonnet 4.6, GPT-5.4, and Gemini 3.1 Pro. Here are the key results from the official announcement:
Coding Benchmarks
| Benchmark | Opus 4.7 | Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | State of the art | Below | Below | Below |
| SWE-bench Pro | Leading | Below | Below | Below |
| SWE-bench Multilingual | Leading | Below | Below | Below |
| Terminal-Bench 2.0 | Best | Below | Below | Below |
| CursorBench | 70% | 58% | - | - |
| Rakuten-SWE-Bench | 3x Opus 4.6 | Baseline | - | - |
Vision Benchmarks
| Benchmark | Opus 4.7 | Opus 4.6 |
|---|---|---|
| XBOW Visual Acuity | 98.5% | 54.5% |
| SWE-bench Multimodal | Leading | Below |
| Image Localization | Improved | Baseline |
Knowledge Work Benchmarks
| Benchmark | Opus 4.7 | Opus 4.6 |
|---|---|---|
| Harvey BigLaw Bench | 90.9% | Below |
| GDPval-AA | State of the art | Below |
| Finance Agent Eval | State of the art | Below |
| Research Agent (Bloomberg) | 0.715 | 0.767 (Finance: 0.813 vs 0.767) |
| Databricks OfficeQA Pro | 21% fewer errors | Baseline |
Safety
Opus 4.7 shows a similar safety profile to Opus 4.6 with improvements in:
- Honesty metrics
- Resistance to prompt injection attacks
- Overall misaligned behavior score (modest improvement over Opus 4.6)
It ships with new real-time cybersecurity safeguards that detect and block prohibited or high-risk requests. Security professionals can apply to the Cyber Verification Program for legitimate use cases.
Pricing and Availability
| Feature | Details |
|---|---|
| Input | $5 / million tokens |
| Output | $25 / million tokens |
| Context Window | 1M tokens (no long-context premium) |
| Max Output | 128k tokens |
| API Model ID | claude-opus-4-7 |
| AWS Bedrock | anthropic.claude-opus-4-7 |
| GCP Vertex AI | claude-opus-4-7 |
Pricing is identical to Opus 4.6. However, note the new tokenizer uses roughly 1.0-1.35x as many tokens for the same input text. Your bill may increase on text-heavy workloads even at the same per-token price.
Available on: Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and all Claude consumer products.
Breaking Changes for Developers
If you're upgrading from Opus 4.6, these changes require code updates:
1. Extended Thinking Budgets Removed
Setting thinking: {"type": "enabled", "budget_tokens": N} will return a 400 error. The only thinking-on mode is now adaptive:
```python
# Before (Opus 4.6)
thinking = {"type": "enabled", "budget_tokens": 32000}

# After (Opus 4.7)
thinking = {"type": "adaptive"}
output_config = {"effort": "high"}
```
Adaptive thinking is off by default. You must set it explicitly to enable it.
2. Sampling Parameters Removed
Setting `temperature`, `top_p`, or `top_k` to any non-default value returns a 400 error. Remove these parameters entirely and use prompting to control behavior.
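If your codebase builds request kwargs in one place, a small migration shim can drop the removed parameters before the call. A minimal sketch (the `strip_sampling_params` helper is hypothetical, not part of the Anthropic SDK):

```python
# Sampling parameters that Opus 4.7 rejects with a 400 error.
REMOVED_PARAMS = ("temperature", "top_p", "top_k")

def strip_sampling_params(kwargs: dict) -> dict:
    """Return a copy of request kwargs with the removed sampling params dropped."""
    return {k: v for k, v in kwargs.items() if k not in REMOVED_PARAMS}

old_request = {
    "model": "claude-opus-4-7",
    "max_tokens": 4096,
    "temperature": 0.2,
    "top_p": 0.9,
}
print(strip_sampling_params(old_request))
```

Prefer deleting the parameters at the source once the migration settles; the shim just keeps shared request-builders working during the transition.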
3. Thinking Content Omitted by Default
Thinking blocks still stream, but their content field is empty unless you opt in:
```python
thinking = {
    "type": "adaptive",
    "display": "summarized",  # opts back in to visible thinking
}
```
Without this opt-in, users see a long silent pause while the model thinks before any output appears.
4. New Tokenizer
The updated tokenizer uses 1.0-1.35x more tokens for the same text. Update your max_tokens parameters and compaction triggers to account for this.
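A back-of-envelope way to resize those limits is to scale them by the worst-case expansion factor. The helper below is a sketch: the 1.35 default comes from the range quoted above, and you should replace it with measurements on your own traffic.

```python
def adjust_for_tokenizer(old_limit: int, expansion: float = 1.35) -> int:
    """Scale a token limit or compaction threshold for the new tokenizer.

    Hypothetical helper: 1.35 is the worst case of the quoted 1.0-1.35x
    range; tune the factor against real measurements before relying on it.
    """
    return round(old_limit * expansion)

print(adjust_for_tokenizer(32000))   # old max_tokens -> 43200
print(adjust_for_tokenizer(100000))  # old compaction trigger -> 135000
```

Scaling thresholds up rather than down keeps behavior equivalent: the same text now occupies more tokens, so a fixed limit would effectively truncate earlier than before.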
Opus 4.7 vs. Opus 4.6: What Actually Changed
Beyond the benchmarks, here are the behavioral differences that Anthropic highlighted:
- More literal instruction following. Opus 4.7 won't silently generalize instructions or infer requests you didn't make. This can break prompts written for looser models.
- Response length calibrates to task complexity rather than defaulting to a fixed verbosity.
- Fewer tool calls by default, using more reasoning. Raising effort increases tool usage.
- More direct, opinionated tone with less validation-forward phrasing and fewer emoji than Opus 4.6.
- More regular progress updates during long agentic traces. Remove any scaffolding you added to force interim status messages.
- Fewer subagents spawned by default, steerable through prompting.
Anthropic's advice: re-tune your prompts. Old prompts that relied on loose interpretation may produce unexpected results with Opus 4.7's more literal reading.
Also Launching Today
Alongside Opus 4.7, Anthropic announced:
- Claude Code `/ultrareview`: a dedicated review session that reads through changes and flags bugs and design issues. Pro and Max users get 3 free ultrareviews.
- Auto mode for Max users: Claude makes decisions on your behalf, enabling longer autonomous runs with fewer interruptions.
- Default effort level raised to `xhigh` in Claude Code for all plans.
Should You Upgrade?
For coding and agentic workflows, the answer is clearly yes. The benchmark data from Cursor, CodeRabbit, Rakuten, Notion, and Vercel all point to a meaningful step up. Hex's finding that low-effort Opus 4.7 matches medium-effort Opus 4.6 makes the economics compelling.
For vision-heavy workflows, the jump from 1.15MP to 3.75MP and XBOW's 98.5% visual acuity score make this a clear upgrade.
The caveats:
- The new tokenizer means 0-35% more tokens per request
- Breaking changes require code updates (sampling params, thinking budgets)
- More literal instruction following may break prompts tuned for Opus 4.6
- Adaptive thinking must be explicitly enabled (it's off by default)
Migrate during a low-traffic period, test your prompts against the more literal interpretation style, and measure token usage on real traffic before committing.