Qwen 3.7: Release Date, Status, and What's Real vs Rumored (2026)

Is Qwen 3.7 released? As of May 2026 it isn't — no weights, API, or benchmarks. Here's what's real, what's only rumored, and what to run today.

Updated May 23, 2026. Qwen3.7-Max was officially announced by Alibaba Cloud on May 20, 2026 at the Apsara Summit in Hangzhou. Two preview variants — Qwen3.7-Max-Preview (text) and Qwen3.7-Plus-Preview (vision) — are live free at chat.qwen.ai and lmarena.ai. OpenRouter listed qwen/qwen3.7-max on May 21, 2026 at $2.50 / 1M input, $7.50 / 1M output tokens; Alibaba Cloud Model Studio is the upstream. Open weights are still not on Hugging Face (verified huggingface.co/Qwen on May 23, 2026 — newest official upload remains a Qwen 3.5-based SAE artifact, no Qwen3.7-* repo yet).

This page is the working reference Codersera engineers use to track Qwen 3.7 in real time. It separates what Alibaba announced today from what is previewable, what is vendor-reported, and what is still rumored — because most coverage on the open web is conflating all four. If you landed here looking for a single honest snapshot of where Qwen 3.7 stands on May 20, 2026, this is it. We refresh in place as new artifacts ship.

The short version: Alibaba moved earlier than most analysts expected. Qwen3.7-Max is the announced flagship; previews of both the text and vision variants have been on chat.qwen.ai and lmarena.ai since approximately May 14 and are already producing neutral leaderboard data. Open-weight checkpoints, the parameter counts, the context window, and the API price sheet are not public yet, and we explicitly will not invent them.

What was announced today?

At the Apsara Summit 2026 in Hangzhou on May 20, 2026, Alibaba Cloud formally unveiled Qwen3.7-Max as the next generation of its flagship general-purpose model family. The keynote framed Qwen 3.7 less as a single model and more as a family rollout with three named SKUs at launch:

  • Qwen3.7-Max — the announced flagship general-purpose model (closed at launch; preview access through chat.qwen.ai and Model Studio API).
  • Qwen3.7-Max-Preview — the text variant with deep-thinking on by default, live for free testing since approximately May 14, 2026.
  • Qwen3.7-Plus-Preview — the multimodal/vision variant, also live for free testing on the same preview channels.

The official post on the Qwen blog (qwen.ai/blog?id=qwen3.7) and the @Alibaba_Qwen X account both confirmed availability on Model Studio "in the coming days" without committing to a public price sheet. Coverage from SCMP, Quartz, and Pandaily over the past 48 hours framed Qwen 3.7 as Alibaba's strongest agentic push to date — a positioning move against GPT-5.5, Claude 4.7, and Kimi K2.6.

Alibaba also co-launched the Zhenwu M890 AI chip (144 GB on-chip memory, 800 GB/s inter-chip bandwidth), the Panjiu AL128 supernode, and the ICN Switch 1.0 interconnect — signaling that the Qwen 3.7 generation is being co-designed with first-party silicon. We mention this for context; it doesn't change anything about how you'd consume Qwen 3.7 today, but it does signal a multi-year infra commitment.

What's actually released vs preview vs rumored?

This is the matrix the other ranking pages do not give you. Use it to decide what you can actually try, build on, or quote.

ArtifactStatus as of May 20, 2026How to access / what to watch
Qwen3.7-Max announcementReleased (Apsara Summit keynote, May 20)qwen.ai/blog?id=qwen3.7; @Alibaba_Qwen on X
Qwen3.7-Max-Preview (text)Preview live (since ~May 14)chat.qwen.ai (free); lmarena.ai (free, anonymous testing)
Qwen3.7-Plus-Preview (vision)Preview live (since ~May 14)chat.qwen.ai; lmarena.ai Vision Arena
Qwen3.7-Max stable APIRolling out on Alibaba Cloud Model Studio$2.50 in / $7.50 out per 1M on OpenRouter (May 21)
Hugging Face open weightsNot released. The Qwen HF org currently lists Qwen3.5 / Qwen3.6 onlyWatch huggingface.co/Qwen
Exact parameter countUnverified. Pandaily speculated "dual 72B" — treat as rumorWait for an official model card
Context window lengthUnverifiedWill appear on the Model Studio model card
License termsUnverified (pattern with 3.6-Max was closed flagship + open derivatives)Watch HF org and the Qwen GitHub
SWE-bench / GPQA / AIME / LiveCodeBench / Terminal-Bench scoresNot published by AlibabaNone — do not trust third-party scores until peer-confirmed
ArtificialAnalysis Intelligence IndexNot yet posted as of todayartificialanalysis.ai usually adds new flagships within ~1–2 weeks of preview

Any article you read today that quotes specific SWE-bench or GPQA numbers for Qwen 3.7 is either making them up or quoting a leaked internal slide. We treat both as unusable until Alibaba publishes a numbered, methodology-disclosed benchmark page.

What are the Qwen 3.7 variants?

Here is the variant breakdown you actually need to pick the right preview to test:

VariantModalityReasoning modeAccess pathStatus
Qwen3.7-MaxTextDefault flagship; deep thinking availableAlibaba Cloud Model Studio API (rolling out)Announced; $2.50/$7.50 per 1M (OpenRouter, May 21)
Qwen3.7-Max-PreviewTextDeep thinking on by defaultchat.qwen.ai, lmarena.ai — free, no login required at lmarena.aiLive for testing
Qwen3.7-Plus-PreviewVision + textMultimodal reasoningchat.qwen.ai (vision mode), lmarena.ai Vision ArenaLive for testing

Practical takeaway: if you're doing pure reasoning, math, or code probes, you want Max-Preview. If you're testing screenshot understanding, document parsing, UI agent flows, or image-grounded QA, you want Plus-Preview. There's no released "Mini" or "Flash" SKU at launch — if Alibaba follows the Qwen 3.6 pattern, distilled smaller variants typically arrive a few weeks after the flagship.

How do you try Qwen 3.7 today?

You don't need an API key or a credit card to test either preview right now. Here are the two channels that are working today, plus the evaluation prompts we've found genuinely separate signal from hype.

Option 1: chat.qwen.ai (official, free)

  1. Go to chat.qwen.ai.
  2. Sign in with an Alibaba Cloud / Qwen account (Google sign-in works; phone number optional).
  3. Open the model picker at the top of the conversation pane.
  4. Select Qwen3.7-Max-Preview for text reasoning, or switch to Qwen3.7-Plus-Preview for vision (the vision option appears when you attach an image).
  5. For agentic / tool-use evaluation, enable the deep-thinking toggle — it surfaces the chain-of-thought trace that's been one of the more talked-about preview behaviors.

Option 2: lmarena.ai (neutral, blind)

  1. Go to lmarena.ai.
  2. Use the Direct Chat tab and choose qwen3.7-max-preview or qwen3.7-plus-preview from the dropdown. Or use the Arena battle tab and vote blind — this is what's been feeding the public Elo numbers.
  3. For Plus-Preview specifically, switch to Vision Arena and attach an image.

Evaluation prompts worth running in the preview

If you have 20 minutes to actually pressure-test Qwen 3.7 rather than vibe-check it, run these classes of prompts in chat.qwen.ai with deep-thinking enabled, and compare side-by-side with whichever model you currently ship on:

  • Multi-step debugging probes. Paste a 400-line function with a subtle off-by-one and ask the model to find it without naming the bug class. Strong models follow control flow; weak models pattern-match to common bug names.
  • Long-horizon refactor specs. Give it a 6-step refactor (rename a type, propagate through three files, update tests, add a migration, write a CHANGELOG entry) and check whether it tracks state across all six steps in a single reply.
  • Tool-use plan probes. Describe a fake-but-plausible tool surface (3–5 tools with signatures) and an ambiguous user request. Ask the model to plan tool calls. Frontier models ask one clarifying question; weak ones either guess or refuse.
  • Vision document QA. Switch to Plus-Preview, upload a complex multi-column PDF page (a research paper or earnings filing), and ask for grounded extractions with citations to specific page regions.
  • Math reasoning under adversarial framing. Take a clean AIME-style problem, restate it with a misleading cover story, and check whether the model strips the cover story or gets dragged by it.

If you intend to actually wire Qwen 3.7 into a product, your only real path right now is the Model Studio API rollout. Pricing is not posted; treat any number you see in third-party blogs as speculation. For local-first developers, the play remains running Qwen 3.6 locally until 3.7 weights drop — the Qwen 3.6 family is genuinely strong and runs on consumer hardware with quantization.

What's new in Qwen 3.7 vs Qwen 3.6?

This is the section worth pinning. The 3.6 → 3.7 delta, as far as we can responsibly verify today, has four parts: math, vision, agentic depth, and sustained autonomy. We label every claim by who's making it.

AxisQwen 3.6 (today's baseline)Qwen 3.7 deltaSource label
Math reasoningStrong; competitive in open-weights tierMax-Preview ranks #7 in Math on LM ArenaLM Arena (neutral)
CodingStrong on long-context refactorsMax-Preview ranks #10 in Coding on LM ArenaLM Arena (neutral)
Software / IT tasksSolidMax-Preview ranks #9 Software/ITLM Arena (neutral)
Expert promptsSolidMax-Preview ranks #9 Expert PromptsLM Arena (neutral)
VisionQwen3.6-VL competitive but mid-tier on Vision ArenaPlus-Preview #16 overall; lifts Alibaba to #5 lab in Vision Arena globallyLM Arena (neutral)
Sustained autonomous runNot characterizedHeadline claim: 35-hour sustained autonomous run without performance degradationAlibaba-reported
Tool-call densityNot characterizedHeadline claim: 1,000+ tool calls per sessionAlibaba-reported
Agent framework optimizationGeneric tool useDeeply optimized for OpenClaw, Hermes Agent, Claude Code, Qwen Paw, QoderAlibaba-reported

Reading the agentic claims carefully

Alibaba's framing of Qwen 3.7 leans hard on the agent angle — the 35-hour run, the 1,000-tool-call session, the named optimization for OpenClaw, Hermes Agent, Claude Code, Qwen Paw, and Qoder. There's a real story in there worth unpacking before you take it at face value.

  • "35 hours" is a duration, not a benchmark. The interesting question is what the agent was doing for 35 hours, what counted as "without performance degradation," and how the evaluation was scored. Alibaba has not published the harness, the task set, or the rubric. Until they do, the figure tells you Alibaba is investing in long-horizon agent reliability — not that any specific task you care about will work for 35 hours.
  • "1,000+ tool calls" is a context-management story. Tool calls accumulate context, tool results pollute it, and most production agents fail not because the model is bad at calling tools but because context gets crowded and the planner loses the thread. If the 1,000-call claim is real and reproducible, the underlying improvement is probably about context compaction and trajectory recovery, not raw tool-calling accuracy.
  • The named-framework optimizations are integration moats. When a vendor says "deeply optimized for X, Y, Z," what they usually mean is they've shipped first-party integrations, runtime adapters, or fine-tuned tool-calling formats. Useful if you're already on those frameworks; uninteresting if you're not.

Two honest framings:

  • The neutral LM Arena ranks (text variant trailing GPT-5.5, Claude, and several Grok variants but holding strong top-10 specialty positions) are the most defensible data point you can cite right now. A #7 Math, #10 Coding, #13 overall debut for a brand-new preview is genuinely competitive at the frontier.
  • The 35-hour autonomous run and the 1,000+ tool-call claim are vendor-reported and currently unfalsifiable from the outside. Treat them as marketing-flavored signals of where Alibaba is investing, not as benchmarks. Whatever third-party reproducibility looks like will come from agent-framework maintainers (OpenClaw, Hermes, Claude Code, Qoder) over the coming weeks.

What do neutral benchmarks say?

Here is the only neutral leaderboard data with public methodology as of today (May 20, 2026, 18:00 UTC). All numbers from lmarena.ai's public boards, period May 14–20:

VariantBoardRankElo / signal
Qwen3.7-Max-PreviewText Arena (overall)#13Elo ~1,475
Qwen3.7-Max-PreviewMath#7Top-10
Qwen3.7-Max-PreviewExpert Prompts#9Top-10
Qwen3.7-Max-PreviewSoftware / IT#9Top-10
Qwen3.7-Max-PreviewCoding#10Top-10
Qwen3.7-Plus-PreviewVision Arena (overall)#16Lifts Alibaba to #5 lab in Vision

Two important callouts:

  • The ArtificialAnalysis Intelligence Index has not yet posted a Qwen 3.7 score. Their cadence is usually 1–2 weeks after a preview goes public; expect a number by early June. Until then, anyone quoting a specific "AA Index" position for Qwen 3.7 is fabricating it.
  • Alibaba has not published its own SWE-bench, GPQA, AIME, LiveCodeBench, or Terminal-Bench numbers for Qwen 3.7. This is unusual for a flagship launch and almost certainly intentional — they're letting LM Arena talk first. We'll add them here when (and only when) qwen.ai publishes a numbered methodology page.

Alibaba-published benchmarks (added May 22, 2026)

Two days after the Apsara keynote, Alibaba and Artificial Analysis published a composite set of benchmark numbers for Qwen3.7-Max. They are agentic and reasoning-heavy rather than the classic suite (no MMLU-Pro, GPQA, or SWE-bench Verified yet from Alibaba):

  • Artificial Analysis Intelligence Index v4.0: 56.6 — #5 globally, highest-ranked Chinese model (+4.8 vs Qwen3.6-Max-Preview's 51.8).
  • Terminal-Bench Hard: 50.8% (+6.9 vs 3.6-Max-Preview) — long-horizon shell-and-tooling tasks.
  • Humanity's Last Exam: 38.1% (+9.2) — frontier reasoning under uncertainty.
  • CritPt: 13.4% (+9.7) — graduate-level critical thinking.
  • GDPval-AA: 1546 Elo (+42) — real-world economic-task value.

Third-party numbers floating in some launch round-ups (e.g. an "MMLU-Pro 83.8" figure) are unsourced and should not be cited until Alibaba or a credible neutral benchmarker publishes them. LM Arena positioning is unchanged from launch day: #13 overall (Elo ~1475), #7 Math, #10 Coding.

When will open weights and API pricing land?

We are not going to invent a date. Here's the state of play:

  • API pricing: live as of May 21, 2026 on OpenRouter — $2.50 / 1M input, $7.50 / 1M output for qwen/qwen3.7-max. Alibaba Cloud Model Studio is the upstream provider; an official DashScope price sheet has not yet been posted, so the OpenRouter rate is the working number. That positions 3.7-Max well below GPT-5.5 ($5/$15) and Claude Opus 4.7 (frontier-tier), and above DeepSeek V4-Pro.
  • Hugging Face open weights: huggingface.co/Qwen currently lists Qwen3.5 and Qwen3.6 family checkpoints only. There is no Qwen3.7 repo. The Qwen 3.6 pattern was: closed-source flagship Max, open-weight smaller variants released over the following weeks. Whether that pattern repeats for 3.7 is unannounced.
  • Quantized GGUF / MLX builds: these depend entirely on the upstream weights landing first. None exist today.

The honest forecast: if Alibaba repeats the 3.6 playbook, expect smaller open-weight Qwen 3.7 variants within roughly 2–6 weeks of the Apsara announcement, with the flagship Max staying API-only. If they don't repeat the pattern — e.g. if Max ships open weights too — that itself would be the story. Either way, this page refreshes in place; bookmark it.

Companion guide

For the full Qwen-family deep dive — architecture, the 3.5/3.6 history, deployment patterns, and how the family compares against the rest of the open-weights frontier — see our Qwen 3.5 Complete Guide (2026).

How does Qwen 3.7 fit the broader open-weights landscape?

Qwen 3.7 lands into an unusually crowded frontier. The flagship-tier conversation in May 2026 is dominated by GPT-5.5, Claude 4.7, Grok 4, Gemini 2.5 Pro, and on the open / semi-open side, Kimi K2.6, DeepSeek V4, and GLM 5.1. The neutral LM Arena positioning — Qwen3.7-Max-Preview at #13 overall — puts it firmly in the second tier of closed-flagship contenders, with stronger specialty showings in math and software.

The more interesting comparison for most Codersera readers is on the open-weights axis, where Qwen historically punches above its weight precisely because the smaller variants are genuinely deployable on consumer hardware. The current best-in-class comparison frame — not Qwen-specific but useful for triangulating where 3.7 might land — lives in our Kimi K2.6 vs DeepSeek V4 vs GLM 5.1 comparison. For the full landscape view across labs, including how Qwen is positioned versus the rest of the open ecosystem, see the 2026 open-source LLMs landscape.

One under-discussed angle: the Zhenwu M890 chip + Panjiu AL128 + ICN Switch 1.0 stack co-announced today positions Alibaba as one of the only labs other than the hyperscalers with credible first-party silicon and interconnect under its frontier models. That doesn't change today's leaderboard, but it changes the unit economics of how Alibaba can price Qwen 3.7 over a 12-month horizon — relevant if you're betting an architecture on it.

What this means for the China-vs-US frontier narrative

The flat read of the LM Arena board today is that the top-5 overall is still GPT-5.5, Claude 4.7, Grok 4, and Gemini 2.5 Pro variants, with Qwen 3.7 sitting at #13 overall in its preview debut. Read literally, that says the US labs still lead the frontier on general-purpose reasoning. Read more carefully, three nuances matter:

  • Specialty leaderboards tell a different story. Qwen3.7-Max-Preview's #7 Math and #9 Software/IT ranks are within touching distance of the closed-flagship leaders, suggesting the gap is narrower on technical domains than on general-purpose chat.
  • Open-weights tier is where Qwen has always won. The interesting comparison isn't Qwen 3.7 vs GPT-5.5 — it's whether the eventual open Qwen 3.7 variants beat Kimi K2.6, DeepSeek V4, and GLM 5.1 on the same hardware envelope. That comparison can't be made yet.
  • Vision is now genuinely competitive. Plus-Preview lifting Alibaba to #5 lab in Vision Arena globally is a real shift — the multimodal gap between Chinese and US labs has been closing steadily through 2025 and Qwen 3.7 may be the moment it effectively closes for practical use cases.

Should you wait for Qwen 3.7 or use 3.6 now?

Direct guidance, by use case:

  • You're building production agent flows today. Run Qwen 3.6 (or a 3.6-derived fine-tune) now. It's stable, weights are public, the ecosystem is mature. Revisit when 3.7 weights drop — that's when the actual builder-side switch happens.
  • You're doing API-based prototyping and want frontier-tier reasoning. Try Qwen3.7-Max-Preview on chat.qwen.ai today, but don't wire it into a customer-facing product until Model Studio publishes stable pricing and an SLA.
  • You're doing vision work (document parsing, UI agents, screenshot understanding). Qwen3.7-Plus-Preview is worth a real evaluation now. Vision Arena #5 lab globally is a meaningful jump and the preview is free.
  • You're self-hosting on a workstation or single H100/H200. 3.7 is not your model yet — there are no weights. Stay on running Qwen 3.6 locally until the HF org lists 3.7 checkpoints.
  • You're a research lab comparing models. Use lmarena.ai's blind comparison, not vendor-reported headline claims. Wait for ArtificialAnalysis's Index before drawing strong conclusions.

The pragmatic call right now: Qwen 3.6 is the model you ship with this month; Qwen 3.7 is the model you're evaluating for next month. That gap closes the day Hugging Face lights up.

How will you know when open weights drop?

Three signals to watch, in priority order:

  1. huggingface.co/Qwen — the canonical drop point. Star the org or watch for new repos with the Qwen3.7 prefix.
  2. qwen.ai/blog — the Qwen team posts a release note with model card, benchmarks, and the license at the moment of any HF release. The Apsara announcement post (qwen.ai/blog?id=qwen3.7) will likely be appended or superseded.
  3. @Alibaba_Qwen on X — weight drops always get an X announcement, usually accompanied by a quantized GGUF tip-off from the community within hours.

For pricing specifically, watch the Model Studio console at alibabacloud.com/product/modelstudio. We'll update the relevant section above the moment any of these three sources publishes.

If you're hiring vetted remote developers experienced with Qwen, agentic frameworks, or building on China-based foundation models, codersera.com/hire matches you with engineers who've shipped production systems on Qwen, DeepSeek, and Llama-family stacks. Tell us the stack, the use case, and the timeline; we'll bring you candidates inside a week.

FAQ

Is Qwen 3.7 released today?

Qwen3.7-Max was officially announced on May 20, 2026 at the Apsara Summit in Hangzhou. Two preview variants — Qwen3.7-Max-Preview (text) and Qwen3.7-Plus-Preview (vision) — have been live for free testing on chat.qwen.ai and lmarena.ai since approximately May 14. The stable Max model is rolling out on Alibaba Cloud Model Studio. Open weights have not been released. OpenRouter posted public pricing on May 21 at $2.50 in / $7.50 out per 1M tokens.

What's the difference between Qwen3.7-Max-Preview and Plus-Preview?

Max-Preview is the text-only variant with deep-thinking reasoning enabled by default; it's the one debuting at LM Arena #13 overall (#7 Math, #10 Coding). Plus-Preview is the multimodal variant that adds vision: image understanding, document parsing, screenshot reasoning. Plus-Preview ranks #16 on LM Arena's Vision Arena board and lifts Alibaba to the #5 lab globally in vision.

Can I download Qwen 3.7 weights from Hugging Face?

No. As of May 20, 2026, huggingface.co/Qwen lists Qwen3.5 and Qwen3.6 family checkpoints only. There is no Qwen3.7 repository on the official Qwen Hugging Face org. Any "Qwen 3.7" upload you find on a third-party HF account today is not authoritative. Watch the official org for the canonical release.

Qwen 3.7 vs Qwen 3.6 — which should you use today?

Use Qwen 3.6 for any production deployment you ship this month — the weights are public, the ecosystem is mature, quantized builds run on consumer hardware. Use Qwen3.7-Max-Preview for API prototyping and reasoning evaluations on chat.qwen.ai or lmarena.ai. Switch your production stack to 3.7 only after the open weights and a stable Model Studio price sheet ship.

What is the 35-hour autonomous run claim?

It is an Alibaba-reported headline from the Apsara Summit keynote: Qwen 3.7 sustained a 35-hour autonomous agent run without performance degradation, executing 1,000+ tool calls per session. The claim is vendor-reported and not yet third-party reproducible. Treat it as a signal of where Alibaba is investing engineering effort, not as a benchmark to cite directly.

What is the Zhenwu M890?

Zhenwu M890 is the AI accelerator chip Alibaba co-launched with Qwen 3.7 at Apsara 2026. Headline specs: 144 GB on-chip memory, 800 GB/s inter-chip bandwidth. It pairs with the Panjiu AL128 supernode and ICN Switch 1.0 interconnect. The package signals that Qwen 3.7-generation models are being co-designed with first-party silicon, but it doesn't change how developers consume Qwen 3.7 today.

How much will Qwen 3.7 API cost?

OpenRouter posted the first public rate on May 21, 2026: $2.50 / 1M input tokens, $7.50 / 1M output tokens for qwen/qwen3.7-max, routing through Alibaba Cloud Model Studio. That positions 3.7-Max well below GPT-class flagship pricing (GPT-5.5 sits around $5/$15) and above DeepSeek V4-Pro. Alibaba has not yet published an official price sheet of its own on the DashScope console — when it does, the OpenRouter markup-vs-source delta will become visible. Treat the $2.50/$7.50 figures as the working number for budgeting today.

When is the Qwen 3.7 Hugging Face release?

Not announced. If Alibaba repeats the Qwen 3.6 pattern, open-weight smaller variants typically follow the flagship announcement by roughly 2–6 weeks, with the closed Max staying API-only. Whether that pattern holds for 3.7 is genuinely unknown today. Watch huggingface.co/Qwen, qwen.ai/blog, and @Alibaba_Qwen on X — releases always show up on all three within hours of each other.