Anthropic launches Claude Sonnet 5, focused on coding, tool use, and agentic work

Anthropic says Sonnet 5 scores above Sonnet 4.6 on coding, tool use, and knowledge work, and comes close to Opus 4.8 on some evaluations. This piece summarizes vendor materials and external benchmarks; hands-on impressions will follow in a separate piece.

Anthropic launched Claude Sonnet 5 on June 30 and made it available across the Claude apps, Claude Code, and the API on the same day.

This piece summarizes the material released with the launch: which claims Anthropic put forward, where its own tables and external benchmarks line up, and what API changes users need to check.

What launched

The API model ID is claude-sonnet-5. It succeeds Sonnet 4.6 as Anthropic's Sonnet model.

Introductory pricing is $2 per million input tokens and $10 per million output tokens through August 31. After that, the standard rate of $3 per million input tokens and $15 per million output tokens applies, the same per-token price as Sonnet 4.6.
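To make the two price tiers concrete, here is a minimal Python sketch of per-request cost using only the prices reported above. The token counts in the example are hypothetical, the function name is illustrative, and the sketch ignores any other billing factors.

```python
# Sonnet 5 prices in USD per million tokens, as reported in this article.
INTRO_PRICE = {"input": 2.00, "output": 10.00}     # through August 31
STANDARD_PRICE = {"input": 3.00, "output": 15.00}  # after August 31; same as Sonnet 4.6


def request_cost(input_tokens: int, output_tokens: int, price: dict) -> float:
    """Return the USD cost of one request at the given per-million-token prices."""
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens per request.
    intro = request_cost(10_000, 2_000, INTRO_PRICE)
    standard = request_cost(10_000, 2_000, STANDARD_PRICE)
    print(f"intro: ${intro:.4f}, standard: ${standard:.4f}")
```

Because both introductory prices are exactly two-thirds of the standard ones, any fixed workload costs 1.5x as much after August 31, whatever its input/output mix.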

Anthropic says Sonnet 5 is the default model for Free and Pro users, and is available to Max, Team, and Enterprise users. Anthropic also shipped it to Claude Code and the Claude Platform API on launch day.

What Anthropic says changed

Anthropic's launch post says Sonnet 5 improves on Sonnet 4.6 in reasoning, tool use, coding, and knowledge work, and compares it against Sonnet 4.6 and Opus 4.8.

The company's own launch table lists these figures:

Evaluation	Sonnet 4.6	Sonnet 5	Opus 4.8
SWE-bench Pro	58.1%	63.2%	69.2%
Terminal-Bench 2.1	67.0%	80.4%	82.7%
Humanity's Last Exam, no tools	34.6%	43.2%	49.8%
Humanity's Last Exam, with tools	46.8%	57.4%	57.9%
OSWorld-Verified	78.5%	81.2%	83.4%
GDPval-AA v2	1395	1618	1615

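This is Anthropic's own chart, not independent verification. In it, the largest gains over Sonnet 4.6 come on evaluations involving tool use or longer-running work: Terminal-Bench 2.1 (+13.4 points), HLE with tools (+10.6 points), and GDPval-AA v2 (1395 to 1618). Two of those rows are also where Sonnet 5 comes closest to Opus 4.8: it trails by 0.5 points on HLE with tools and edges ahead on GDPval-AA v2 (1618 vs. 1615).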

Anthropic also published cost-performance curves by effort level. Within a single model, the five effort settings (low, medium, high, xhigh, and max) change both capability and token use. Anthropic says Sonnet 5 offers a wider range of cost-performance trade-offs than Sonnet 4.6, and that at higher effort it can reach Opus 4.8-level results on some tasks.

What external benchmarks saw

Artificial Analysis said in its June 30 analysis that Sonnet 5 scored 53 on the Intelligence Index at max effort, six points above Sonnet 4.6.

The same analysis found that Sonnet 5 slightly led Opus 4.8 on agentic knowledge-work evaluations such as AA-Briefcase and GDPval-AA, while Opus-class models still lead on demanding, knowledge-intensive reasoning. On CritPt, a physics reasoning benchmark, Sonnet 5 improved sharply over Sonnet 4.6 but still trailed GLM-5.2 and the Opus, Fable, and GPT-5.5 tiers.

Artificial Analysis also measured cost. At standard pricing, Sonnet 5 cost about twice as much per task as Sonnet 4.6 and about 15% more than Opus 4.8 in its evaluation setup. The driver was higher token use rather than a higher per-token price. Because token use depends on the workload, these ratios apply to Artificial Analysis's test setup and may not carry over to other tasks.

API changes to check

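Sonnet 5 uses a new tokenizer. Anthropic's docs say the same input text produces about 30% more tokens on Sonnet 5 than on Sonnet 4.6, and a footnote in the launch post puts the range at roughly 1.0-1.35x depending on content type. Since the standard per-token price is unchanged, that increase shows up directly in per-request cost and in how much text fits within a given token budget.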
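For capacity planning, a token count measured on Sonnet 4.6 can be projected onto Sonnet 5 with the multipliers cited above. The sketch below is a rough planning aid under that assumption only, not a substitute for counting tokens against the new model; the function names and the token_limit argument are hypothetical.

```python
# Tokenizer multipliers as reported in this article (Sonnet 5 tokens per Sonnet 4.6 token):
# about 1.30x on average per Anthropic's docs, roughly 1.0x-1.35x by content type per the launch post.
TYPICAL_MULTIPLIER = 1.30
LOW_MULTIPLIER, HIGH_MULTIPLIER = 1.0, 1.35


def estimate_sonnet5_tokens(sonnet46_tokens: int) -> dict:
    """Project a Sonnet 4.6 token count onto the Sonnet 5 tokenizer (rough estimate)."""
    return {
        "low": round(sonnet46_tokens * LOW_MULTIPLIER),
        "typical": round(sonnet46_tokens * TYPICAL_MULTIPLIER),
        "high": round(sonnet46_tokens * HIGH_MULTIPLIER),
    }


def worst_case_fits(sonnet46_tokens: int, token_limit: int) -> bool:
    """Hypothetical helper: does the high-end estimate still fit under token_limit?"""
    return estimate_sonnet5_tokens(sonnet46_tokens)["high"] <= token_limit


if __name__ == "__main__":
    print(estimate_sonnet5_tokens(100_000))
    # {'low': 100000, 'typical': 130000, 'high': 135000}

    # A prompt that used 150,000 tokens on Sonnet 4.6, checked against a hypothetical 200,000-token budget.
    print(worst_case_fits(150_000, 200_000))
    # False
```

Using the high end of the range for limit checks is a deliberately conservative choice; the typical 1.30x figure is better suited to cost forecasts, where (at unchanged per-token prices) cost scales by the same factor as token count.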

Adaptive thinking is now on by default. On Sonnet 4.6, a request without a thinking field ran without thinking; on Sonnet 5, the same request runs with adaptive thinking. Workloads with tight max_tokens settings should be re-checked, since thinking may now take up part of that output budget.

Manual extended thinking has been removed. A request that sets thinking: {type: "enabled", budget_tokens: N} returns a 400 error on Sonnet 5; Anthropic points users to adaptive thinking and the effort setting instead.

Setting temperature, top_p, or top_k to a non-default value also returns a 400 error. Code that used those parameters for tone or variety needs to move that control into system prompts or examples.
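The three behavior changes above can be checked before a request is sent. The following is a minimal sketch that inspects a Messages API-style request dictionary (model, max_tokens, thinking, temperature, top_p, top_k) and returns warnings based only on the migration notes summarized here. The function name and the TIGHT_MAX_TOKENS threshold are hypothetical, and because the article does not list which sampling values count as default, the sketch flags any explicit temperature, top_p, or top_k for review.

```python
from typing import Any, Dict, List

SAMPLING_PARAMS = ("temperature", "top_p", "top_k")
# Hypothetical threshold for a "tight" max_tokens; the article gives no number.
TIGHT_MAX_TOKENS = 4_096


def preflight_sonnet5(request: Dict[str, Any]) -> List[str]:
    """Return migration warnings for a request dict aimed at Sonnet 5."""
    warnings: List[str] = []
    thinking = request.get("thinking")

    # Manual extended thinking ({"type": "enabled", "budget_tokens": N}) returns a 400 on Sonnet 5.
    if isinstance(thinking, dict) and thinking.get("type") == "enabled":
        warnings.append("manual extended thinking returns a 400 error; switch to adaptive thinking and effort")

    # With no thinking field, Sonnet 5 runs adaptive thinking by default,
    # so a small max_tokens value deserves a second look.
    max_tokens = request.get("max_tokens")
    if thinking is None and max_tokens is not None and max_tokens <= TIGHT_MAX_TOKENS:
        warnings.append(f"adaptive thinking is on by default; re-check max_tokens={max_tokens}")

    # Non-default sampling values return a 400; defaults are not listed here, so flag any explicit value.
    for name in SAMPLING_PARAMS:
        if name in request:
            warnings.append(f"{name} is set explicitly; non-default values return a 400 error")

    return warnings


if __name__ == "__main__":
    legacy = {
        "model": "claude-sonnet-5",
        "max_tokens": 2048,
        "temperature": 0.2,
        "thinking": {"type": "enabled", "budget_tokens": 1024},
    }
    for warning in preflight_sonnet5(legacy):
        print("-", warning)
```

Returning warnings rather than rewriting the request leaves the replacement of sampling-based control (system prompts or examples, per Anthropic) to the developer.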

One note in the effort guide is worth flagging for migration. Anthropic says Sonnet 5 at medium is comparable to Sonnet 4.6 at high effort. In other words, the same effort label means different things across the two models.
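Because the labels shift, carrying effort settings over by name can silently change behavior. A minimal sketch, assuming effort is configured by the level names listed earlier: it encodes only the single equivalence stated in the effort guide and returns None everywhere else, signaling that the level needs its own testing. The function and constant names are hypothetical.

```python
from typing import Optional

# Effort levels named in Anthropic's Sonnet 5 materials, lowest to highest.
SONNET5_EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# The only correspondence the effort guide states: Sonnet 4.6 "high" is comparable to Sonnet 5 "medium".
KNOWN_EQUIVALENTS = {"high": "medium"}


def suggest_sonnet5_effort(sonnet46_effort: str) -> Optional[str]:
    """Return a documented Sonnet 5 equivalent for a Sonnet 4.6 effort level, or None if none is documented."""
    suggestion = KNOWN_EQUIVALENTS.get(sonnet46_effort.strip().lower())
    # Sanity check: any suggestion must be a valid Sonnet 5 level.
    assert suggestion is None or suggestion in SONNET5_EFFORT_LEVELS
    return suggestion


if __name__ == "__main__":
    for level in ("high", "low"):
        print(level, "->", suggest_sonnet5_effort(level))
    # high -> medium
    # low -> None
```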

The bottom line

Anthropic is positioning Sonnet 5 around coding, tool use, and longer agentic work, and Artificial Analysis's independent results point the same way on some knowledge-work evaluations. Heavier reasoning still favors Opus-class models, though, and the new tokenizer, default adaptive thinking, and shifted effort levels mean cost estimates and output limits need to be recalculated.

We'll cover hands-on impressions in a separate piece.

Sources

Primary sources

Anthropic: Introducing Claude Sonnet 5
Anthropic developer docs: What's new in Claude Sonnet 5
Anthropic developer docs: Migration guide
Anthropic developer docs: Prompting Claude Sonnet 5
Anthropic developer docs: Effort

Independent benchmarks

Artificial Analysis: Claude Sonnet 5: strong agentic performance at a higher cost per task
Artificial Analysis: Claude Sonnet 5 model page

Companion pieces

OpenAI previews next-generation GPT-5.6: Sol, Terra, and Luna, with broad availability to follow within weeks
After Fable 5 went offline, the open model in hot pursuit: GLM-5.2