Qwen

Qwen 3.7 vs Qwen 3.6: What's Actually Different (May 2026)

Qwen 3.6 is shipping with open weights today. Qwen 3.7-Max was announced May 20 with previews live but no weights yet. A grounded side-by-side.

Published 20 May 2026 • Updated 17 Jun 2026 • 11 min read

Quick answer. Use Qwen 3.6 today for anything production or self-hosted: weights are on Hugging Face under Apache 2.0 and it runs locally via Ollama, llama.cpp, vLLM, and MLX. Qwen 3.7-Max was officially announced on May 20, 2026 with previews on chat.qwen.ai and lmarena.ai, but no open weights or API pricing yet.

Last updated: June 17, 2026.

Alibaba officially announced Qwen3.7-Max at the Apsara Cloud Summit in Hangzhou today, May 20, 2026. Two preview variants — Qwen3.7-Max-Preview (text, deep-thinking) and Qwen3.7-Plus-Preview (multimodal) — have been live on chat.qwen.ai and lmarena.ai since roughly May 14. API access via Alibaba Cloud Model Studio is rolling out.

Meanwhile, Qwen 3.6 is still shipping: 27B dense, 35B-A3B MoE, and the hosted 3.6 Plus tier are all on huggingface.co/Qwen, deployable today, and the basis for the vast majority of real Qwen workloads in production. The honest question for any team picking between them right now isn't "which model is better on paper" — it's "what can I actually use today, and for what?"

That's what this comparison is for. Every number is labeled vendor-reported or neutral. If a benchmark isn't published yet, we say so instead of guessing. For broader context on the Qwen lineage and where 3.6/3.7 fit in, see our Qwen 3.7 release-date and what's new tracker, which we're updating in lockstep with this piece.

What changed between Qwen 3.6 and Qwen 3.7?

At a one-line level: Qwen 3.6 is the open-weight workhorse Alibaba shipped through the back half of 2025 and early 2026. Qwen 3.7 is the next-generation flagship, currently a closed (or at least not-yet-open) preview that Alibaba is positioning around long-horizon agentic workloads and a stronger multimodal stack.

The headline shifts Alibaba is talking about today:

Long-horizon agentic capability. Alibaba is publicly claiming Qwen3.7-Max can sustain a 35-hour autonomous run without measurable degradation and chain 1,000+ tool calls in a single session (Alibaba-reported, no third-party reproduction yet).
Deeper agent-stack integration. Qwen 3.7 is being marketed as optimized for OpenClaw, Hermes, Claude Code, Qwen Paw, and Qoder — Alibaba is leaning into the developer-agent ecosystem rather than just shipping a chat model.
Multimodal jump. Qwen3.7-Plus-Preview pushed Alibaba to the #5 lab in Vision Arena (LM Arena, neutral) within a week of the soft launch.
Neutral arena positioning. Qwen3.7-Max-Preview is ranked #13 overall (Elo ~1,475), #7 Math, #9 Expert Prompts, #9 Software/IT, #10 Coding on LM Arena as of May 14–20, 2026.

What hasn't changed: Qwen 3.6 is still the model you can actually pull from Hugging Face and run on your own hardware right now. That asymmetry — today vs. soon — drives most of the practical decisions below. If your goal is to run Qwen on a single workstation, our guide to running Qwen 3.6 locally still applies as-written, and our how-to-run-Qwen-3.7-locally tracker goes live the moment weights land.

What changed in the architecture

The most interesting technical shift between 3.6 and 3.7 isn't parameter count — it's the move toward a hybrid attention stack built around Gated DeltaNet, first surfaced in the Qwen3-Next preview line and now extended into the 3.7 family. Gated DeltaNet is a linear-attention variant that uses a delta-rule update with a data-dependent gate, giving the model long-context recall closer to softmax attention without the quadratic compute cost. Qwen 3.7 pairs blocks of Gated DeltaNet with periodic softmax-attention layers — a hybrid pattern that's become a quiet consensus across the frontier (Mamba-style hybrids, Jamba, MiniMax-01) for stretching the effective context window while keeping inference throughput tractable.

If you want to actually see the math instead of read about it, Sebastian Raschka (@rasbt) shipped a from-scratch Gated DeltaNet implementation as part of his LLMs-from-scratch repo — annotated PyTorch you can step through line by line. That's the cleanest path to internalizing the delta-rule + gate intuition before deciding how aggressively to bet on hybrid-attention models in your own stack.

What this means practically: Qwen 3.7 should hold up better than 3.6 on tasks with genuinely long contexts (multi-file repo reasoning, long-horizon agent traces, document comprehension over hundreds of pages) at comparable serving cost. The 35-hour autonomous run claim in Alibaba's launch material is downstream of this architectural shift, not an unrelated training trick — sustained agentic behavior over very long token windows is exactly what hybrid attention was designed to enable.

What's the release and availability status of each?

This is the most load-bearing column of the entire comparison — it's what should drive your decision today.

Dimension	Qwen 3.6 (today)	Qwen 3.7 (today)
Status	Released; weights on Hugging Face; open	Max officially announced May 20, 2026; previews live; weights not on HF yet
Access	Local download + hosted Plus tier	API via Alibaba Cloud Model Studio (rolling out) + free preview on chat.qwen.ai and lmarena.ai
License	Apache 2.0 (open variants — verify per model card)	Unknown; flagship pattern is closed for Max, open variants TBD
Sizes	27B dense, 35B-A3B MoE (~3B active), 3.6 Plus (hosted)	Max (params not disclosed), Max-Preview, Plus-Preview (multimodal)
Vision	Limited	Plus-Preview is the centerpiece; Vision Arena #5 lab
Local self-host	Yes — Ollama, llama.cpp, vLLM, MLX	No — no weights yet
Cost to try	Free locally (your hardware) or hosted Plus pricing	Free in chat.qwen.ai and lmarena.ai; OpenRouter API at $2.50/$7.50 per 1M (May 21)

The shape to internalize: Qwen 3.6 is a model. Qwen 3.7 is currently a product announcement plus two preview endpoints. Both are real, but you can only build against one of them right now without a fallback plan.

How do they compare on benchmarks?

Benchmark coverage for Qwen 3.7 is intentionally thin in this article, because most of the headline benchmarks the LLM community looks at — SWE-bench Verified, GPQA, AIME, LiveCodeBench, Terminal-Bench — have not been published by Alibaba for Qwen 3.7 as of today. ArtificialAnalysis hasn't posted an Intelligence Index entry for it yet either. We're not going to invent numbers.

What we do have is neutral arena data:

LM Arena category (May 14–20, 2026)	Qwen3.7-Max-Preview	Qwen3.7-Plus-Preview
Overall	#13 (Elo ~1,475)	#16
Math	#7	—
Expert Prompts	#9	—
Software / IT	#9	—
Coding	#10	—
Vision Arena (lab rank)	—	Pushed Alibaba to #5 lab
Qwen 3.7-Max (full, post-preview)	Reported (@HaaaaaaydenH): Elo ~1,498 overall, edges Plus-Preview on math + coding once the preview tag drops. Treat as community-reported until LM Arena publishes the official row.

For Qwen 3.6, the most-cited vendor claim is that the 27B-dense variant beats Qwen 3.5-397B-A17B on coding (Qwen-reported). That's a meaningful generational compression — a 27B dense model edging a 397B MoE on coding tasks — but it is a vendor claim, not a neutral measurement. Treat it as directional.

The honest synthesis: 3.7-Max-Preview is competitive with frontier models on neutral arena ranking, especially on math. Whether that translates to your specific evaluation harness is the entire question, and you'll need to test it yourself on chat.qwen.ai or wait for the API to mature before betting a workload on it.

How do they compare on agentic and long-horizon tasks?

Agentic workloads are the area Alibaba is leaning into hardest with Qwen 3.7. The headline claims:

35-hour sustained autonomous run without degradation (Alibaba-reported). No third-party reproduction has been published yet.
1,000+ tool calls in a single session (Alibaba-reported). Again, vendor claim — not independently verified.
Deep optimization for popular agent stacks including OpenClaw, Hermes, Claude Code, Qwen Paw, and Qoder (Alibaba marketing).

For Qwen 3.6, the agentic story is less aggressive but materially more tested: it works fine inside Claude Code, Aider, Cline, OpenCode, Roo Code, and most agent harnesses that accept an OpenAI-compatible endpoint. Long-horizon runs (multi-hour, hundreds of tool calls) are not its marketing pitch, but it gets used that way in production today.

If you're evaluating an agentic preview today, 3.7-Max-Preview on chat.qwen.ai is worth the hour. If you're shipping an agentic workload to customers this quarter, 3.6 is the only one you can deploy on your own infra. That's not a knock on 3.7 — it's the literal status.

How do they compare on vision and multimodal?

Vision is the cleanest 3.7 win on the board. Qwen3.7-Plus-Preview pushed Alibaba to #5 lab in Vision Arena (LM Arena, neutral) within a week of the soft launch — a meaningful jump for an Asia-headquartered lab on a neutral leaderboard dominated by US frontier labs.

Qwen 3.6's vision story is comparatively limited. The dense and MoE open-weight variants are primarily text-focused; multimodal use cases were not the generation's headline. If your workload is image understanding, OCR-at-scale, chart/diagram reasoning, or video frame analysis, 3.7-Plus-Preview is the experimental answer — with the caveat that it's preview-only and the production API and pricing are pending.

For an honest comparison against open-weight vision models you can self-host (Qwen-VL variants, Llama 4 multimodal, Gemma 4 vision), see our broader open-source LLMs landscape.

Which can you self-host today?

Qwen 3.6: yes. Pull weights from Hugging Face, run via Ollama for a one-command local server, llama.cpp for CPU-quantized deployments, vLLM for serving throughput, or MLX on Apple Silicon. Apache 2.0 on the open variants gives you commercial-friendly licensing — verify the specific model card before betting a product on it.

Qwen 3.7: no. As of May 20, 2026, huggingface.co/Qwen lists Qwen 3.5 and Qwen 3.6 variants; there is no Qwen 3.7 model card. Alibaba has not committed to a release date for open weights. The historical pattern for Qwen flagships has been closed-Max, open smaller variants — but "pattern" is not a release schedule, and we won't predict one.

Practical implication: every production workload that requires on-prem deployment, air-gapped inference, fine-tuning, or licensing certainty needs to use Qwen 3.6 (or an alternative open-weight model) today. There is no path to self-hosted Qwen 3.7 right now.

Companion guide

For the full Qwen lineage — architecture, sizes, benchmarks, and how 3.5/3.6/3.7 fit together — see our Qwen 3.5 complete guide (2026).

Pick X if — the decision matrix

Your situation	Pick	Why
Production workload deploying this quarter	Qwen 3.6	Weights are out, license is clear, the deployment stack (Ollama / vLLM / llama.cpp) is mature
Evaluating an agentic preview, not shipping	Qwen 3.7-Max-Preview	chat.qwen.ai is free; the 35h-run / 1000-tool-call claims are worth a real test
Vision / multimodal use case	Qwen 3.7-Plus-Preview	#5 lab in Vision Arena; 3.6 doesn't have a comparable open vision story
On-prem / air-gapped / fine-tunable	Qwen 3.6	Only one with open weights; 3.7 weights are not available
No-code team that just needs a chat UI	Qwen 3.7-Max-Preview	Free on chat.qwen.ai; nothing to install
Cost-sensitive at scale	Qwen 3.6	Self-hosted on your hardware = no per-token cost; 3.7 API pricing isn't published

How do they compare on cost?

Cost picture, updated May 22, 2026: OpenRouter began routing qwen/qwen3.7-max on May 21 at $2.50 in / $7.50 out per 1M tokens (Alibaba Cloud Model Studio is the upstream). What that means for budgeting:

Qwen 3.6 self-hosted costs whatever your hardware costs — commonly a single H100 / 2x4090 / single M3 Max box at the small end, scaling to vLLM clusters for higher throughput. No per-token fee.
Qwen 3.6 Plus hosted goes through Alibaba Cloud Model Studio at published rates (verify in Model Studio — rates change).
Qwen 3.7-Max-Preview / Plus-Preview are free to use today on chat.qwen.ai and lmarena.ai with reasonable rate limits.
Qwen 3.7-Max API: $2.50 / 1M input, $7.50 / 1M output via OpenRouter (live May 21, 2026). Routing through Alibaba Cloud Model Studio; an official DashScope price sheet has not yet posted, so the OpenRouter rate is the working number. That positions 3.7-Max well below GPT-5.5 ($5/$15) and Claude Opus 4.7 (frontier-tier), and above DeepSeek V4-Pro.

If your cost model needs predictable per-token economics today, the answer is 3.6 (self-hosted or Model Studio). If you can absorb the uncertainty of preview-tier pricing for a few weeks of evaluation, the answer is free 3.7 via chat.qwen.ai.

Which should you use today?

For everyone shipping anything: Qwen 3.6. Open weights, mature deployment story, predictable cost, no fundamental risk. For where it sits relative to Llama 4, DeepSeek V4, and the rest of the open field, see our open-source LLMs landscape (2026). The 27B dense model is genuinely competitive on coding, the 35B-A3B MoE is efficient at serving throughput, and the Plus hosted tier covers teams that don't want to operate inference.

For teams evaluating where Alibaba is going next: spend a week on Qwen 3.7-Max-Preview via chat.qwen.ai. Specifically stress-test the agentic claims — the 35-hour-run and 1000-tool-call numbers are vendor-reported and the most interesting differentiation in the entire 3.7 announcement. If they hold up on your workload, you'll want to be first in line when the API and weights mature.

For multimodal use cases specifically: Qwen 3.7-Plus-Preview is worth a serious look, and the Vision Arena #5 lab result is the strongest neutral signal in the launch. But again — preview, no production SLA, no published pricing.

If you're hiring vetted remote developers who actually ship LLM-backed agents, evaluators, or inference infra — not just prompt-tweakers — codersera.com/hire places senior engineers with Qwen, vLLM, llama.cpp, and agent-framework experience on your team in weeks, not months.

FAQ

Is Qwen 3.7 better than Qwen 3.6?

On neutral arena ranking, yes — Qwen3.7-Max-Preview sits at #13 overall and #7 Math on LM Arena (May 14–20, 2026), which is materially ahead of the published Qwen 3.6 positioning. But "better" depends on whether you can use it: 3.7 has no open weights and no published API pricing, while 3.6 is shipping with Apache 2.0 weights you can deploy today.

Can I download Qwen 3.7 weights?

Not as of May 20, 2026. The Hugging Face Qwen organization lists 3.5 and 3.6 variants only. Alibaba has not committed to a release date for Qwen 3.7 open weights.

Should I migrate from Qwen 3.6 to Qwen 3.7?

Not yet for self-hosted production — there's still nothing to migrate to: no Qwen 3.7 weights on Hugging Face. For hosted, OpenRouter began routing qwen/qwen3.7-max on May 21, 2026 at $2.50/$7.50 per 1M tokens, so the cost story is now concrete. The right move for most teams: stay on 3.6 (self-hosted) for production, run a parallel evaluation on 3.7-Max via OpenRouter, and reassess once Alibaba publishes its own DashScope price sheet (which will reveal the OpenRouter markup) and once 3.7 open weights land.

Qwen 3.7 vs Qwen 3.6 for coding?

Qwen3.7-Max-Preview ranks #10 on LM Arena Coding (neutral). Qwen 3.6's 27B dense is vendor-reported to beat Qwen 3.5-397B-A17B on coding. Both are credible. If you can self-host, 3.6 is the practical choice; if you're willing to use chat.qwen.ai for evaluation, 3.7 may edge it.

Qwen 3.7 vs Qwen 3.6 for vision?

Qwen 3.7 wins decisively here. Qwen3.7-Plus-Preview pushed Alibaba to #5 lab in Vision Arena (LM Arena, neutral). Qwen 3.6's open-weight variants are primarily text-focused.

Will Qwen 3.7 be open weights like Qwen 3.6?

Unknown. Alibaba's historical pattern for flagship Qwen releases has been closed-Max, open smaller variants — but that is a pattern, not a commitment. As of today, no Qwen 3.7 weights are published on Hugging Face.

What's the 35-hour autonomous run?

It's Alibaba's headline agentic claim for Qwen3.7-Max: the model can sustain a 35-hour autonomous run without measurable degradation, and chain over 1,000 tool calls in a single session. Both numbers are vendor-reported — no third-party reproduction has been published yet.

When will Qwen 3.7 hit Hugging Face?

No announced date. The Apsara Cloud Summit launch on May 20, 2026 focused on Qwen3.7-Max and the preview variants — open-weight timing was not part of the announcement. Watch huggingface.co/Qwen and the Qwen blog for the model card.