Claude Opus 4.8 Launch Guide: Benchmarks & Pricing 2026

Anthropic launched Claude Opus 4.8 on May 28, 2026: SWE-bench Pro 69.2%, GDPval Elo 1890 (+121 over GPT-5.5), Fast mode 3x cheaper than 4.7, dynamic workflows for hundreds of parallel subagents. Pricing unchanged at $5/$25 per 1M. Full launch breakdown.

Published 28 May 2026 • Updated 24 Jul 2026 • 12 min read

Quick answer. Claude Opus 4.8 launched May 28, 2026 — 41 days after Opus 4.7. It scores 88.6% on SWE-bench Verified, 69.2% on SWE-bench Pro, and 1890 on GDPval-AA (121 Elo points ahead of GPT-5.5). Standard pricing is unchanged at $5 input / $25 output per 1M tokens; the new Fast mode is 3x cheaper than Opus 4.7's at $10/$50. 1M-token context on Anthropic API, Bedrock, and Vertex AI.

Anthropic announced Claude Opus 4.8 on May 28, 2026 — its fastest version cadence yet, landing just 41 days after Opus 4.7. The headline is not a single big number; it is that Anthropic kept standard pricing flat while pushing agentic coding, computer use, and knowledge-work scores meaningfully higher — and made Fast mode three times cheaper than the 4.7 equivalent.

This launch guide walks through what shipped, the benchmark deltas vs Opus 4.7 and the rest of the May 2026 frontier (GPT-5.5, Gemini 3.1 Pro, DeepSeek V4-Pro, Grok 4.3, Qwen 3.7 Max), the new dynamic workflows and effort control surfaces, and what to migrate today.

Update — Claude Opus 5 is here (July 24, 2026). Opus 5 supersedes Opus 4.8 at the same $5/$25 pricing — it more than doubles 4.8 on the Frontier-Bench agentic-coding suite and lifts SWE-bench Pro from 69.2% to 79.2%, so upgrading is the clear move. See the Claude Opus 5 launch guide for full benchmarks and migration notes.

What shipped with Claude Opus 4.8?

Six things, all dated May 28, 2026:

The model itself — API id claude-opus-4-8, GA across Claude API, Amazon Bedrock, Google Vertex AI, and Microsoft Foundry. 1M-token context on the first three; 200K on Foundry at launch.
Fast mode as a research preview on the Claude API — speed: "fast" flag pushes throughput to ~2.5x standard at premium pricing ($10 / $50 per 1M, vs $30 / $150 on Opus 4.7).
Dynamic workflows in Claude Code — Claude plans a large task, fans out hundreds of parallel subagents, and verifies their outputs against your test suite. Designed for hundreds-of-thousands-of-LoC refactors.
Effort control across claude.ai, Cowork, and Claude Code: Low / Medium / High / xHigh / Max. Opus 4.8 defaults to High.
Alignment + system card — Anthropic reports misalignment incidence ~1.9 (vs 2.5 for 4.7), "effectively tied with Mythos Preview" — its best-aligned model. 4.8 ships under the same ASL-3 protections as 4.7; Anthropic states it does not advance the capability frontier beyond Mythos Preview.
GitHub Copilot GA — Opus 4.8 is selectable in Copilot on day one.

Three quieter platform changes also landed on day one and matter if you're already shipping with the Claude API:

Adaptive thinking is sharper. Opus 4.8 wastes fewer thinking tokens at the same effort level — it decides per turn whether to think. On simple lookups it responds directly; on multi-step problems it reasons before answering. Net effect: the same High-effort run usually finishes faster and cheaper than Opus 4.7 at High.
Better tool triggering. Opus 4.7 had a known regression where it sometimes skipped a tool call the task required (fixed inline in 4.8). If you saw "Claude described what it would do but didn't actually call the function" in 4.7 logs, that pattern should largely disappear.
Lower minimum cacheable prompt length — down to 1,024 tokens. Smaller prompts now qualify for prompt caching's 90% discount on cache reads, which matters for short-system-prompt agentic loops that were previously below the cache threshold.

Anthropic also teased a broader Mythos release in the coming weeks — Opus 4.8 is positioned as the bridge between the 4.5 family and that next generation.

How fast are the benchmark jumps over Opus 4.7?

Anthropic's framing is that Opus 4.8 is a practical upgrade — not a synthetic leaderboard reshuffle. The agentic and real-world scores tell the same story.

Benchmark	Opus 4.8	Opus 4.7	GPT-5.5	Gemini 3.1 Pro
SWE-bench Verified	88.6%	87.6%	~88%	—
SWE-bench Pro (agentic coding)	69.2%	64.3%	58.6%	54.2%
Terminal-Bench 2.1	74.6%	66.1%	78.2%	—
OSWorld-Verified (computer use)	83.4%	82.3%	78.7%	76.2%
HLE w/ tools	57.9%	54.7%	<57.9%	—
GDPval-AA (Elo)	1890	1753	1769	—

The biggest single jumps are on Terminal-Bench 2.1 (+8.5 points) and GDPval-AA (+137 Elo). SWE-bench Pro climbs +4.9 and SWE-bench Verified moves +1.0 — already saturated near the ceiling. Computer use moves a small +1.1 against the Anthropic-restated Opus 4.7 baseline.

One number worth understanding: GDPval-AA Elo 1890 implies a ~67% head-to-head win rate against GPT-5.5, according to Anthropic's released chart. That is a measurable, non-marginal lead on broad knowledge-work tasks for the first time in the Opus 4.x line. The honest caveat: Opus 4.8 still uses about 30% more turns than GPT-5.5 to complete the same task, so higher quality comes with a real inference-cost overhead.

Where 4.8 doesn't win: pure terminal-only agent loops. GPT-5.5 still beats Opus 4.8 on Terminal-Bench 2.1 (78.2 vs 74.6) — narrow but real. If your stack is a single agent in a shell, GPT-5.5 remains competitive.

Is the "4x fewer code flaws" claim real?

It's Anthropic's number, not a third-party measurement, and it's framed as: Opus 4.8 is four times less likely than Opus 4.7 to miss flaws in code it produces. Treat that as a self-reported alignment metric, not a benchmark. The downstream effect early testers describe — fewer hallucinated fixes, more honest "I'm not sure" mid-task, less back-and-forth — is consistent with what we saw with the Opus 4.7 honesty patch but a step further.

How much does Claude Opus 4.8 cost?

Standard pricing is unchanged from Opus 4.7. The big delta is Fast mode, which Anthropic dropped 3x.

Mode	Input ($/1M)	Output ($/1M)	Notes
Standard	$5	$25	Unchanged from 4.7
Fast (research preview)	$10	$50	~2.5x output tokens/sec, was $30/$150 on 4.7
Cached read	~$0.50	n/a	0.1x input rate (90% savings)
Batch API	$2.50	$12.50	50% off both sides, async only

Batch + cache stack. A cached batch read against an Opus 4.8 system prompt can drop effective cost to about 5% of standard — useful for high-volume agentic eval runs.

How does this compare to the rest of the frontier?

The May 2026 frontier is more fragmented than it has ever been. Quick rate-card comparison:

Model	Input ($/1M)	Output ($/1M)	SWE-Verified	Context
Claude Opus 4.8	$5	$25	88.6%	1M
GPT-5.5	~$3 est	~$15 est	~88%	~400K
Gemini 3.1 Pro	$2 (≤200K) / $4	$12 / $18	—	2M+
Qwen 3.7 Max	$2.50	$7.50	80.4%	1M
DeepSeek V4-Pro	$0.435	$0.87	80.6%	128K
Grok 4.3 (high)	$1.25	$2.50	—	1M
Mistral Medium 3.5	$1.50	$7.50	77.6%	256K
Cohere Command A+	TBD	TBD	—	128K

Opus 4.8 is now the most expensive frontier model by a margin — DeepSeek V4-Pro is roughly 12x cheaper on input and 29x cheaper on output. The question for every architect is whether 4.8's lead on agentic coding (+10.2 points on SWE-Pro vs V4-Pro), computer use, and knowledge work justifies the rate-card delta. For high-stakes greenfield engineering work, the answer is usually yes; for high-volume routine generation, a tiered architecture (Opus 4.8 as planner, V4-Pro or Grok 4.3 as executor) is the consensus 2026 pattern. We dig deeper into how to mix models in our AI coding agents guide.

What are dynamic workflows and effort control?

These are the two product-surface changes that ship with the model and are likely to matter more to your day-to-day than any single benchmark number.

Dynamic workflows

Currently a research preview in Claude Code. The pattern is: instead of one Opus 4.8 instance grinding through a long task, Opus 4.8 plans the work, spins up hundreds of parallel subagent calls (cheaper or same-tier), watches their outputs, and self-verifies before returning. Anthropic specifically calls out codebase-scale migrations across hundreds of thousands of lines of code, using your existing test suite as the success signal.

The trade-off is cost surface area. Hundreds of parallel calls can run up bills quickly, so dynamic workflows are gated and the default is to keep one supervisor on Opus 4.8 with subagents on cheaper tiers — but the API leaves that mix to you.

Effort control

A single knob that maps to extended-thinking budget plus a few internal heuristics:

Low — minimum thinking, fastest, cheapest. Good for chat-style help.
Medium — balanced.
High — Opus 4.8's default on claude.ai and Cowork. Best balance of quality and latency.
xHigh — Claude Code only, more thinking budget.
Max — "hardest tasks and long-running workflows". Most thinking, slowest, highest cost.

If you were tuning thinking.budget_tokens by hand on Opus 4.7, the effort enum gives you something closer to a Cursor-style preset.

How do I call Claude Opus 4.8 from the API?

Drop-in compatible with the Opus 4.7 messages API. The only change is the model id.

cURL

curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-opus-4-8",
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 16000},
    "messages": [
      {"role": "user", "content": "Refactor this function for clarity and add type hints..."}
    ]
  }'

Python SDK

import anthropic

client = anthropic.Anthropic()

resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=8192,
    thinking={"type": "enabled", "budget_tokens": 24000},
    # speed="fast",  # research preview — premium pricing
    messages=[
        {"role": "user", "content": "Plan and execute a migration from Express to Fastify on this repo."}
    ],
)
print(resp.content[0].text)

Batch API (50% off async)

curl https://api.anthropic.com/v1/messages/batches \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: message-batches-2024-09-24" \
  -H "content-type: application/json" \
  -d '{
    "requests": [
      {"custom_id": "review-001",
       "params": {"model": "claude-opus-4-8", "max_tokens": 2048,
                  "messages": [{"role": "user", "content": "Review PR #1234..."}]}}
    ]
  }'

Should I migrate from Opus 4.7 today?

For most production workloads, yes — same price, better agentic scores, better reliability. The migration is a model-id swap. Things to actually check before flipping a switch:

Prompt regression. Opus 4.8 is more conservative about claims and more likely to say "I'm not sure." If you parse free-text outputs for confident-sounding assertions, you'll see more hedging. Update your prompts to ask for explicit confidence labels rather than relying on tone.
Cost shape. Standard pricing is unchanged, but if you were paying for Opus 4.7 Fast mode at $30/$150, the move to 4.8 Fast at $10/$50 is a real 3x cost cut. Re-evaluate which calls actually need Fast mode.
Context strategy. 1M context is default on Anthropic API, Bedrock, and Vertex; if you're on Microsoft Foundry today, you're still capped at 200K — keep the existing chunking strategy.
Effort defaults. claude.ai and Cowork now default to High. If you previously over-tuned budget_tokens manually, simplify with the effort enum.
Dynamic workflows are gated. Don't design a production architecture on the research preview today; pilot first.

What changes for Claude Code users?

Claude Code is the primary surface Anthropic has pushed Opus 4.8 through, and the day-one updates land in three places.

Default model bump. New sessions land on claude-opus-4-8. The /claude-api skill in Claude Code ships with a 4.7 → 4.8 migration note covering the effort enum and the deprecated thinking.budget_tokens-only pattern.
MCP hardening. claude mcp list and claude mcp get now mark un-approved .mcp.json servers as "Pending approval" instead of silently auto-connecting. A long-standing bug where MCP servers with paginated tools/list responses dropped non-first-page tools is fixed in the same release. If your agent quietly lost access to tools after a server restart, that's the cause.
Effort presets in the TUI. Selecting Low / Med / High / xHigh / Max is now a top-level flag, not buried under thinking-budget tuning.

The official Claude Code releases page (github.com/anthropics/claude-code/releases) has the full changelog. If you're scripting Claude Code in CI, pin the version explicitly — dynamic workflows are gated behind a feature flag at launch and won't auto-enable in older binaries.

How does Opus 4.8 handle multimodal and vision?

Anthropic's launch materials don't lead with vision benchmarks for Opus 4.8 — and that absence is meaningful. The competitive frontier on multimodal is still Gemini 3.1 Pro for chart-heavy and image-grounded reasoning, and Anthropic appears to have prioritized agentic coding, computer use, and alignment over closing that gap in this release.

What you do get: the same vision input surface as Opus 4.7 (PDFs, images, screenshots passed via image content blocks), now riding on the more reliable judgment of the 4.8 base model. For agentic loops where Claude is reading screenshots as part of a computer-use task, the OSWorld-Verified bump to 83.4% is the relevant number — Opus 4.8 is the strongest computer-use model on the market. For pure chart reading or document QA, evaluate Gemini 3.1 Pro side by side.

How does Opus 4.8 fit into Anthropic's roadmap?

Opus 4.8 is the third minor bump in the Opus 4.5 family — 4.5 (Nov 2025) → 4.6 (Feb 2026) → 4.7 (Apr 2026) → 4.8 (May 2026). The HN reaction this morning is a mix of "capability deltas are getting fuzzy" and "the practical-reliability story is what matters." Anthropic's own framing makes the latter point: this is a quality + honesty release, not a scoreboard release.

The other signal is that Anthropic teased Mythos as the next generation "in coming weeks." Mythos Preview is already the model 4.8's alignment story is benchmarked against (and ties on the alignment dimension). If you're tracking Anthropic's release cadence, the 4.x line is likely closer to the end than the middle — see our Anthropic Mythos guide for what's been previewed.

If you're benchmarking across vendors, our side-by-side Mythos vs Opus 4.7 vs GPT-5.5 comparison is the framework to extend with 4.8's numbers.

FAQ

When did Claude Opus 4.8 launch?

May 28, 2026 — 41 days after Opus 4.7, Anthropic's fastest minor-version cadence yet. Announced at anthropic.com/news/claude-opus-4-8 and available immediately on claude.ai, the Claude API, Amazon Bedrock, Google Vertex AI, and Microsoft Foundry.

How much does Claude Opus 4.8 cost?

Standard pricing is $5 per million input tokens and $25 per million output tokens — unchanged from Opus 4.7. The new Fast mode is $10/$50 per million, three times cheaper than Opus 4.7's Fast mode ($30/$150). Cached reads are about $0.50 per million; batch API is 50% off on both sides.

What is the context window?

1 million tokens by default on the Anthropic API, Amazon Bedrock, and Google Vertex AI. Microsoft Foundry is capped at 200,000 tokens at launch. Max output is 128,000 tokens.

Is Claude Opus 4.8 better than GPT-5.5?

On most benchmarks, yes. Opus 4.8 wins on SWE-bench Pro (69.2 vs 58.6), OSWorld computer use (83.4 vs 78.7), and GDPval-AA Elo (1890 vs 1769, implying about 67% head-to-head win rate). GPT-5.5 still wins on Terminal-Bench 2.1 (78.2 vs 74.6) — pure command-line agent loops.

How does Opus 4.8 compare to DeepSeek V4-Pro on cost?

DeepSeek V4-Pro is about 12 times cheaper on input ($0.435 vs $5) and 29 times cheaper on output ($0.87 vs $25). Opus 4.8 wins on agentic coding (SWE-bench Pro 69.2 vs ~60), reasoning with tools, and computer use, but if your workload is high-volume routine generation, the V4-Pro cost story is compelling. A tiered architecture — Opus 4.8 as planner, V4-Pro as executor — is a common 2026 pattern.

What are dynamic workflows?

A research-preview feature in Claude Code where Opus 4.8 plans a large task, spawns hundreds of parallel subagents, and verifies their outputs against your test suite before returning. Designed for codebase-scale migrations across hundreds of thousands of lines of code. Gated at launch; expect availability changes.

What is effort control?

A new enum — Low, Medium, High, xHigh, Max — that sets how much extended thinking Opus 4.8 applies. Opus 4.8 defaults to High on claude.ai and Cowork. Higher settings produce deeper reasoning at higher cost and latency; lower settings are cheaper and faster. xHigh and Max are Claude Code only.

Should I migrate from Opus 4.7 today?

For most production workloads, yes — the model id is a drop-in swap, standard pricing is unchanged, and the agentic + reliability numbers are meaningfully better. Watch for prompt regression: Opus 4.8 is more honest about uncertainty, so outputs may read more hedged. Update parsers that depend on confident-sounding tone.

What safety tier is Opus 4.8 deployed under?

The same ASL-3 protections as the rest of the Opus 4.x line. Anthropic states 4.8 does not advance the capability frontier beyond Mythos Preview, so the deployment threshold is set by Mythos, not 4.8. The 4.8 system card reports misalignment incidence ~1.9, down from 2.5 on Opus 4.7 — effectively tied with Mythos Preview, Anthropic's best-aligned model.

Is there an open-source equivalent?

Not at Opus 4.8's quality tier yet. The closest open-weight alternatives in May 2026 are DeepSeek V4-Pro (best cost/quality on routine code), Qwen 3.7 Max (strong math reasoning, hosted-only), and Mistral Medium 3.5 (128B dense, sovereign deployments). All trail Opus 4.8 on SWE-bench Pro and computer use.

The bottom line

Claude Opus 4.8 is a quality release dressed as a minor version. The headline-grabbing numbers — SWE-bench Verified 88.6, SWE-bench Pro 69.2, GDPval Elo 1890, 4x fewer un-flagged code flaws — matter, but the under-the-hood story is more useful: standard pricing held, Fast mode dropped 3x, dynamic workflows ship the parallel-subagent pattern as a first-class feature, and the effort knob simplifies extended thinking. For teams already on Opus 4.7, the migration is a one-line change. For teams shopping the frontier, Opus 4.8 owns agentic coding and computer use, GPT-5.5 keeps a narrow terminal-agent edge, and DeepSeek V4-Pro / Grok 4.3 win on raw cost for routine work.

If Anthropic ships Mythos in the next few weeks as teased, Opus 4.8 may be the last Opus 4.x point release. It's a solid place to land.

What shipped with Claude Opus 4.8?

How fast are the benchmark jumps over Opus 4.7?

Is the "4x fewer code flaws" claim real?

How much does Claude Opus 4.8 cost?

How does this compare to the rest of the frontier?

What are dynamic workflows and effort control?

Dynamic workflows

Effort control

How do I call Claude Opus 4.8 from the API?

cURL

Python SDK

Batch API (50% off async)

Should I migrate from Opus 4.7 today?

What changes for Claude Code users?

How does Opus 4.8 handle multimodal and vision?

How does Opus 4.8 fit into Anthropic's roadmap?

FAQ

When did Claude Opus 4.8 launch?

How much does Claude Opus 4.8 cost?

What is the context window?

Is Claude Opus 4.8 better than GPT-5.5?

How does Opus 4.8 compare to DeepSeek V4-Pro on cost?

What are dynamic workflows?

What is effort control?

Should I migrate from Opus 4.7 today?

What safety tier is Opus 4.8 deployed under?

Is there an open-source equivalent?

The bottom line

Sign up for more like this.