AI Models

AI Models Released in May 2026 — Complete Roundup

Every major LLM and AI tooling release in May 2026 — Qwen 3.7-Max, DeepSeek V4-Pro permanent pricing, Gemini 3.5 Flash, Composer 2.5, Grok Build, Cherry Studio 1.9.6, Ollama 0.24, and what's still rumored.

Published 23 May 2026 • Updated 23 May 2026 • 9 min read

Quick answer. May 2026 was dominated by Alibaba (Qwen 3.7-Max preview, May 20), DeepSeek (V4-Pro 75% discount made permanent on May 22 at $0.435/$0.87 per 1M tokens), Google (Gemini 3.5 Flash, May 19), Cursor (Composer 2.5, May 18) and xAI (Grok Build CLI beta, May 14). Anthropic announced a major billing split for June 15 but shipped no new model. Claude Sonnet 4.8, GPT-5.6 and Llama 5 remain unannounced or unreleased.

This is the first entry in our monthly AI release roundup. The goal is simple: one place that tells you exactly what shipped, what was just announced, and what's still rumored — without the inflated launch copy and without pretending preview releases are GA.

Every claim below is anchored to the vendor's own announcement or a first-party launch artifact. Where something is preview-only, leaked, or speculative, we say so out loud.

TL;DR — Everything that shipped in May 2026

Date	Release	Type	Status	What's new
May 13	Cursor 3.4	Tooling	GA	Team-configurable agent environments, in-app PR review
May 14	Ollama 0.24	Tooling	GA	Codex App integration, reworked MLX sampler, Claude Desktop support
May 14	Grok Build CLI	Tooling	Early beta	xAI's first coding agent CLI; Grok 4.3 beta, 2M token context, up to 8 parallel subagents
May 14	Anthropic billing split announcement	Policy	Announced	Agent SDK credit pool separated from chat subscriptions, effective June 15
May 15	Cherry Studio 1.9.6	Tooling	GA	Knowledge-base URL preservation fix, GitCode sync improvements, multi-turn image editing
May 18	Composer 2.5	Model	GA	Cursor's in-house coding model; 79.8% SWE-Bench Multilingual, parity with Opus 4.7 and GPT-5.5
May 19	Gemini 3.5 Flash	Model	GA	Google I/O 2026 launch; outperforms 3.1 Pro on most agentic and coding benchmarks
May 20	Qwen 3.7-Max-Preview	Model	Preview	Apsara Summit; closed weights; 1M context; LM Arena #13 overall, #7 Math
May 22	DeepSeek V4-Pro permanent pricing	Pricing	GA	75% discount made permanent: $0.435 in / $0.87 out per 1M tokens

Two notable April releases still shaped the May conversation and are worth noting at the top of the timeline:

Mistral Medium 3.5 shipped April 30 — a 128B dense merged model that replaced Magistral, Pixtral and Devstral in one drop. It barely missed the May cut-off and pulled a lot of oxygen in early May coverage.
Kimi K2.6 dropped GA on April 20 — Moonshot's 1T-parameter MoE agentic model. K2.7 has not been announced.

Frontier model releases

Qwen 3.7-Max-Preview (Alibaba, May 20)

Alibaba announced Qwen 3.7-Max at the Apsara Cloud Summit in Hangzhou on May 20, 2026, formalizing what had already been live on LM Arena for a week as qwen3.7-max-preview. Key facts from the launch:

Status: preview only. The flagship Max variant is closed-weight; the smaller Plus variant is promised to ship under an open-weight license but has not yet been released.
Context window: 1 million tokens with an extended-thinking mode.
Agentic claim: Alibaba demoed a 35-hour autonomous run chaining over 1,000 tool calls without measurable degradation.
Benchmarks: Elo 1,475 on LM Arena text leaderboard (#13 overall), #7 on Math, #10 on Coding. Scored 56.6 on the Artificial Analysis Intelligence Index — the highest-ranked Chinese model on that index.
Pricing: $2.50 / $7.50 per 1M tokens (input / output) on OpenRouter at preview launch.

What this means in practice: Qwen 3.7-Max is now the strongest Chinese closed-weight model on public leaderboards, but the open-weight story everyone actually wants is still pending. Deep-dive: Qwen 3.7: Release Date, Status, and What's Real vs Rumored and Qwen 3.7 vs Qwen 3.6: What's actually different.

Gemini 3.5 Flash (Google, May 19)

Google announced Gemini 3.5 at I/O 2026 on May 19, with Gemini 3.5 Flash shipping immediately as the first member of the family. Highlights from DeepMind's announcement:

Positioning: agent-first, not chatbot-first — Google explicitly framed the launch around long-horizon tool-use and coding rather than chat quality.
Performance: Google claims 3.5 Flash beats the previous frontier model 3.1 Pro on "nearly all" benchmarks including coding, agentic tasks, and multimodal reasoning.
Availability: Gemini API in AI Studio and Android Studio, Google Antigravity for developers, Gemini Enterprise for businesses, and consumer Gemini app + AI Mode in Search.
What's next: Google confirmed Gemini 3.5 Pro is already in internal use and is expected to roll out in June.

Full status, pricing detail and what's confirmed vs rumored about the Pro tier: Gemini 3.5 Pro & Flash: Release Date, Status, and What's Real vs Rumored.

Composer 2.5 (Cursor, May 18)

Cursor shipped Composer 2.5 on May 18 as a frontier-grade in-house coding model — the first time the company has positioned its own model as competing head-to-head with Opus 4.7 and GPT-5.5 rather than as a faster fallback.

Benchmarks: 79.8% SWE-Bench Multilingual, 63.2% CursorBench v3.1 — Cursor claims parity with Claude Opus 4.7 and GPT-5.5 on coding tasks.
Pricing: standard tier $0.50 in / $2.50 out per 1M tokens; fast variant $3.00 / $15.00. Double usage credited for the first week as a launch promotion.
Status: rolled out to all Cursor users; selectable from the model picker.

For the practical "which model do I actually pick" question, our existing comparison still holds: Cursor Composer vs Claude Sonnet 4.6 — how to actually choose. If you're hitting the model-routing bug where Cursor silently downgrades to Composer 2 Fast: the fix is here.

Pricing & policy changes

DeepSeek V4-Pro permanent price cut (May 22)

This is the biggest pricing news of the month. DeepSeek had been running a 75% promotional discount on V4-Pro since April. On May 22, the company announced via its API docs that the discounted rate is now permanent — the original price will be officially adjusted to one quarter of its previous level when the promotional window ends on May 31.

The permanent rates:

Uncached input: $0.435 per 1M tokens
Output: $0.87 per 1M tokens
Cached input: $0.003625 per 1M tokens

For context: at these prices, DeepSeek V4-Pro is roughly 8x cheaper than Claude Opus 4.7 on input and 10x cheaper on output, while remaining one of the strongest open-weights coding models. If you've been waiting for a reason to switch a high-volume workload, this is it.

Background, architecture and benchmarks: DeepSeek V4 Pro Review. For when to use Pro vs Flash: DeepSeek V4 Pro vs V4 Flash. For Cursor integration: DeepSeek V4 in Cursor — complete setup.

Anthropic billing split (announced May 14, effective June 15)

On May 14, Anthropic announced a major restructuring of Claude subscriptions. Starting June 15, 2026, billing is split into two pools:

Chat / first-party tools: the existing Pro / Max 5x / Max 20x credit pool, unchanged.
Agent SDK credit: a separate monthly allowance for programmatic usage via Claude Agent SDK, claude -p, Claude Code GitHub Actions, and third-party agent frameworks. $20 for Pro, $100 for Max 5x, $200 for Max 20x.

The stated motivation is "subscription arbitrage" — third-party agent frameworks like OpenClaw running long-horizon tasks on $20/mo plans while consuming hundreds of dollars of tokens. The practical impact: for heavy programmatic users this is a meaningful price increase, and you need to decide before June 15 whether to enable overflow billing at full API rates.

Full pre-deadline checklist and cost math: Anthropic's June 15 Billing Change — what every Claude Code & Agent SDK user must do.

Tooling releases

Grok Build CLI beta (xAI, May 14)

xAI shipped its first agentic coding CLI on May 14, going head-to-head with Claude Code and Codex CLI. Key facts from the launch:

Status: early beta, gated to SuperGrok Heavy subscribers.
Model: Grok 4.3 beta with a 2M token context window.
Capabilities: terminal-based planning, clean diffs, parallel subagents (up to 8), worktree support, headless mode, ACP support.
Pricing: $299/mo SuperGrok Heavy; a SuperHeavy tier at $99/mo (67% off) for the first six months.
Platforms: macOS and Linux natively; Windows via WSL2.
Install: curl -fsSL https://x.ai/cli/install.sh | bash

How it actually compares against the competition: Grok Build vs Claude Code vs Codex CLI — which coding agent wins. Practical install guide: Grok Build — how to install and run.

Cursor 3.4 (May 13)

The release that quietly preceded Composer 2.5. Cursor 3.4 introduced team-configurable development environments for agents and a new in-app PR review experience where pull requests can be taken from creation to merge inside the editor. This is the infrastructure layer that Composer 2.5 builds on.

Ollama 0.24 (May 14)

Ollama's biggest release of the year so far. Three notable additions:

Codex App integration — Ollama now plays nicely with OpenAI's Codex App desktop experience, including parallel worktree support and built-in git functionality.
Reworked MLX sampler — improved generation quality specifically on Apple Silicon. If you're running local models on an M-series Mac, this is a quality bump.
Claude Desktop support — Claude Cowork and Claude Code now work inside the Claude Desktop App via Ollama Launch.

Cherry Studio 1.9.6 (May 15)

A maintenance release rather than a marquee one, but worth noting because Cherry Studio is becoming the default open-source frontend for hobbyist and prosumer Ollama users:

Preserves HTTP URLs in knowledge base documents (previously stripped).
Thread idle-timeout handling fix in the stream chunk adapter.
GitCode sync reliability improvements.
Multi-turn image editing — assistant image blocks now convert to base64 so they survive across turns.

If you're new to Cherry Studio, our platform-specific install guides cover the full setup: Windows, macOS, Linux / Ubuntu.

What's NOT released (despite what you may have read)

Three rumors got a lot of column inches in May. None of them shipped, and we think it's important to say so clearly:

Claude Sonnet 4.8 — referenced in a leaked Claude Code CLI source map in March, expected by some commentators in May. No Anthropic announcement, no API model id, no benchmarks, no date. Anthropic's May 6 developer event came and went without a Sonnet 4.8 announcement. See Claude Sonnet 4.8 — what's real vs rumored.
GPT-5.6 — surfaced briefly in OpenAI's internal Codex logs and prediction markets gave ~80–89% odds for a June 30 release. No OpenAI announcement. GPT-5.5 Instant on May 5 was the only actual OpenAI release this month. See GPT-5.6 — what's real vs rumored.
Llama 5 — some aggregator sites and a financial blog claimed an April 8 release. We can find no Meta announcement, no Hugging Face weights, no model card, no benchmark entries on LM Arena. Prediction markets remain split. Treat any "Llama 5 review" you see online as either premature or AI-generated noise.

None of these are guaranteed not to ship — but as of May 23, none of them have, and any benchmark numbers you see for them online are fabricated.

What to watch in June 2026

June 15: Anthropic's billing split takes effect. If you're a heavy Claude Code user, you have about three weeks to audit your usage and decide whether to switch some workloads to DeepSeek V4-Pro or Composer 2.5.
Gemini 3.5 Pro: Google confirmed it's already in internal use and expected to roll out in June.
GPT-5.6: prediction markets are clustered around June 30 with ~80% confidence. OpenAI typically does not pre-announce, so watch the API model list.
Qwen 3.7 open-weight variants: Alibaba promised that the Plus variant would ship under an open-weight license. Mid-to-late June is the most plausible window based on past Qwen cadences.
Kimi K3: r/LocalLLaMA chatter suggests Moonshot may be training a 3–4T parameter model. No official announcement.

Frequently asked questions

What's the single most important AI release in May 2026?

DeepSeek's permanent V4-Pro price cut on May 22. At $0.435 / $0.87 per 1M tokens, V4-Pro is now the price-performance leader for high-volume coding workloads by a meaningful margin, and the change is permanent rather than promotional.

Was Claude Sonnet 4.8 released in May 2026?

No. Sonnet 4.8 has not been announced or released by Anthropic. The name appeared in a leaked Claude Code CLI source map in March, but Anthropic has neither confirmed nor scheduled a release.

Is Qwen 3.7-Max open source?

No — the Max variant is closed-weight. Alibaba has indicated that the smaller Plus variant will ship under an open-weight license, but as of May 23 no Qwen 3.7 open weights have been published.

Did Llama 5 release in 2026?

No. There is no Meta announcement, no Hugging Face weights, and no model card for Llama 5 as of May 23, 2026. Articles claiming an April 2026 release appear to be either premature or AI-generated.

How much cheaper is DeepSeek V4-Pro than Claude Opus 4.7?

At the new permanent rates, DeepSeek V4-Pro is roughly 8x cheaper on input tokens and 10x cheaper on output tokens than Claude Opus 4.7, while remaining competitive on coding benchmarks.

What does Anthropic's June 15 billing change actually do?

It splits Claude subscriptions into two credit pools: one for chat and first-party tools (unchanged) and a new "Agent SDK credit" for programmatic usage via Claude Agent SDK, claude -p, and third-party agent frameworks. The Agent SDK pool is $20 / $100 / $200 per month for Pro / Max 5x / Max 20x.

Is Composer 2.5 a frontier model?

Cursor claims so. On SWE-Bench Multilingual (79.8%) and CursorBench v3.1 (63.2%) it benchmarks at parity with Claude Opus 4.7 and GPT-5.5 for coding. Independent reproductions are still landing, but Cursor's own numbers are credible.

When is GPT-5.6 coming out?

OpenAI has not announced GPT-5.6. Prediction markets clustered around a June 30, 2026 release with roughly 80–89% confidence as of mid-May, but no official date exists.

References

Alibaba Qwen 3.7-Max: MarkTechPost, BuildFastWithAI
DeepSeek V4-Pro permanent pricing: DeepSeek API docs, Hacker News discussion
Gemini 3.5 Flash: Google blog, MarkTechPost
Composer 2.5: Cursor blog
Anthropic billing split: The New Stack, Zed blog
Grok Build CLI: xAI announcement, install page
Cursor 3.4: Cursor changelog
Ollama 0.24: GitHub releases
Cherry Studio 1.9.6: GitHub release notes
Mistral Medium 3.5: Mistral docs

This is the first of a monthly series. The June 2026 roundup will land in the first week of July.