Qwen 3.7 vs Qwen 3.6: What's Actually Different (May 2026)

Qwen 3.6 is shipping with open weights today. Qwen 3.7-Max was announced May 20 with previews live but no weights yet. A grounded side-by-side.

Quick answer. Use Qwen 3.6 today for anything production or self-hosted: weights are on Hugging Face under Apache 2.0 and it runs locally via Ollama, llama.cpp, vLLM, and MLX. Qwen 3.7-Max was officially announced on May 20, 2026 with previews on chat.qwen.ai and lmarena.ai, but no open weights or API pricing yet.

Alibaba officially announced Qwen3.7-Max at the Apsara Cloud Summit in Hangzhou today, May 20, 2026. Two preview variants — Qwen3.7-Max-Preview (text, deep-thinking) and Qwen3.7-Plus-Preview (multimodal) — have been live on chat.qwen.ai and lmarena.ai since roughly May 14. API access via Alibaba Cloud Model Studio is rolling out.

Meanwhile, Qwen 3.6 is still shipping: 27B dense, 35B-A3B MoE, and the hosted 3.6 Plus tier are all on huggingface.co/Qwen, deployable today, and the basis for the vast majority of real Qwen workloads in production. The honest question for any team picking between them right now isn't "which model is better on paper" — it's "what can I actually use today, and for what?"

That's what this comparison is for. Every number is labeled vendor-reported or neutral. If a benchmark isn't published yet, we say so instead of guessing. For broader context on the Qwen lineage and where 3.6/3.7 fit in, see our Qwen 3.7 release-date and what's new tracker, which we're updating in lockstep with this piece.

What changed between Qwen 3.6 and Qwen 3.7?

At a one-line level: Qwen 3.6 is the open-weight workhorse Alibaba shipped through the back half of 2025 and early 2026. Qwen 3.7 is the next-generation flagship, currently a closed (or at least not-yet-open) preview that Alibaba is positioning around long-horizon agentic workloads and a stronger multimodal stack.

The headline shifts Alibaba is talking about today:

  • Long-horizon agentic capability. Alibaba is publicly claiming Qwen3.7-Max can sustain a 35-hour autonomous run without measurable degradation and chain 1,000+ tool calls in a single session (Alibaba-reported, no third-party reproduction yet).
  • Deeper agent-stack integration. Qwen 3.7 is being marketed as optimized for OpenClaw, Hermes, Claude Code, Qwen Paw, and Qoder — Alibaba is leaning into the developer-agent ecosystem rather than just shipping a chat model.
  • Multimodal jump. Qwen3.7-Plus-Preview pushed Alibaba to the #5 lab in Vision Arena (LM Arena, neutral) within a week of the soft launch.
  • Neutral arena positioning. Qwen3.7-Max-Preview is ranked #13 overall (Elo ~1,475), #7 Math, #9 Expert Prompts, #9 Software/IT, #10 Coding on LM Arena as of May 14–20, 2026.

What hasn't changed: Qwen 3.6 is still the model you can actually pull from Hugging Face and run on your own hardware right now. That asymmetry — today vs. soon — drives most of the practical decisions below. If your goal is to run Qwen on a single workstation, our guide to running Qwen 3.6 locally still applies as-written; a 3.7 self-host guide will follow weights, not announcements.

What's the release and availability status of each?

This is the most load-bearing column of the entire comparison — it's what should drive your decision today.

DimensionQwen 3.6 (today)Qwen 3.7 (today)
StatusReleased; weights on Hugging Face; openMax officially announced May 20, 2026; previews live; weights not on HF yet
AccessLocal download + hosted Plus tierAPI via Alibaba Cloud Model Studio (rolling out) + free preview on chat.qwen.ai and lmarena.ai
LicenseApache 2.0 (open variants — verify per model card)Unknown; flagship pattern is closed for Max, open variants TBD
Sizes27B dense, 35B-A3B MoE (~3B active), 3.6 Plus (hosted)Max (params not disclosed), Max-Preview, Plus-Preview (multimodal)
VisionLimitedPlus-Preview is the centerpiece; Vision Arena #5 lab
Local self-hostYes — Ollama, llama.cpp, vLLM, MLXNo — no weights yet
Cost to tryFree locally (your hardware) or hosted Plus pricingFree in chat.qwen.ai and lmarena.ai; API pricing not announced

The shape to internalize: Qwen 3.6 is a model. Qwen 3.7 is currently a product announcement plus two preview endpoints. Both are real, but you can only build against one of them right now without a fallback plan.

How do they compare on benchmarks?

Benchmark coverage for Qwen 3.7 is intentionally thin in this article, because most of the headline benchmarks the LLM community looks at — SWE-bench Verified, GPQA, AIME, LiveCodeBench, Terminal-Bench — have not been published by Alibaba for Qwen 3.7 as of today. ArtificialAnalysis hasn't posted an Intelligence Index entry for it yet either. We're not going to invent numbers.

What we do have is neutral arena data:

LM Arena category (May 14–20, 2026)Qwen3.7-Max-PreviewQwen3.7-Plus-Preview
Overall#13 (Elo ~1,475)#16
Math#7
Expert Prompts#9
Software / IT#9
Coding#10
Vision Arena (lab rank)Pushed Alibaba to #5 lab

For Qwen 3.6, the most-cited vendor claim is that the 27B-dense variant beats Qwen 3.5-397B-A17B on coding (Qwen-reported). That's a meaningful generational compression — a 27B dense model edging a 397B MoE on coding tasks — but it is a vendor claim, not a neutral measurement. Treat it as directional.

The honest synthesis: 3.7-Max-Preview is competitive with frontier models on neutral arena ranking, especially on math. Whether that translates to your specific evaluation harness is the entire question, and you'll need to test it yourself on chat.qwen.ai or wait for the API to mature before betting a workload on it.

How do they compare on agentic and long-horizon tasks?

Agentic workloads are the area Alibaba is leaning into hardest with Qwen 3.7. The headline claims:

  • 35-hour sustained autonomous run without degradation (Alibaba-reported). No third-party reproduction has been published yet.
  • 1,000+ tool calls in a single session (Alibaba-reported). Again, vendor claim — not independently verified.
  • Deep optimization for popular agent stacks including OpenClaw, Hermes, Claude Code, Qwen Paw, and Qoder (Alibaba marketing).

For Qwen 3.6, the agentic story is less aggressive but materially more tested: it works fine inside Claude Code, Aider, Cline, OpenCode, Roo Code, and most agent harnesses that accept an OpenAI-compatible endpoint. Long-horizon runs (multi-hour, hundreds of tool calls) are not its marketing pitch, but it gets used that way in production today.

If you're evaluating an agentic preview today, 3.7-Max-Preview on chat.qwen.ai is worth the hour. If you're shipping an agentic workload to customers this quarter, 3.6 is the only one you can deploy on your own infra. That's not a knock on 3.7 — it's the literal status.

How do they compare on vision and multimodal?

Vision is the cleanest 3.7 win on the board. Qwen3.7-Plus-Preview pushed Alibaba to #5 lab in Vision Arena (LM Arena, neutral) within a week of the soft launch — a meaningful jump for an Asia-headquartered lab on a neutral leaderboard dominated by US frontier labs.

Qwen 3.6's vision story is comparatively limited. The dense and MoE open-weight variants are primarily text-focused; multimodal use cases were not the generation's headline. If your workload is image understanding, OCR-at-scale, chart/diagram reasoning, or video frame analysis, 3.7-Plus-Preview is the experimental answer — with the caveat that it's preview-only and the production API and pricing are pending.

For an honest comparison against open-weight vision models you can self-host (Qwen-VL variants, Llama 4 multimodal, Gemma 4 vision), see our broader open-source LLMs landscape.

Which can you self-host today?

Qwen 3.6: yes. Pull weights from Hugging Face, run via Ollama for a one-command local server, llama.cpp for CPU-quantized deployments, vLLM for serving throughput, or MLX on Apple Silicon. Apache 2.0 on the open variants gives you commercial-friendly licensing — verify the specific model card before betting a product on it.

Qwen 3.7: no. As of May 20, 2026, huggingface.co/Qwen lists Qwen 3.5 and Qwen 3.6 variants; there is no Qwen 3.7 model card. Alibaba has not committed to a release date for open weights. The historical pattern for Qwen flagships has been closed-Max, open smaller variants — but "pattern" is not a release schedule, and we won't predict one.

Practical implication: every production workload that requires on-prem deployment, air-gapped inference, fine-tuning, or licensing certainty needs to use Qwen 3.6 (or an alternative open-weight model) today. There is no path to self-hosted Qwen 3.7 right now.

Companion guide

For the full Qwen lineage — architecture, sizes, benchmarks, and how 3.5/3.6/3.7 fit together — see our Qwen 3.5 complete guide (2026).

Pick X if — the decision matrix

Your situationPickWhy
Production workload deploying this quarterQwen 3.6Weights are out, license is clear, the deployment stack (Ollama / vLLM / llama.cpp) is mature
Evaluating an agentic preview, not shippingQwen 3.7-Max-Previewchat.qwen.ai is free; the 35h-run / 1000-tool-call claims are worth a real test
Vision / multimodal use caseQwen 3.7-Plus-Preview#5 lab in Vision Arena; 3.6 doesn't have a comparable open vision story
On-prem / air-gapped / fine-tunableQwen 3.6Only one with open weights; 3.7 weights are not available
No-code team that just needs a chat UIQwen 3.7-Max-PreviewFree on chat.qwen.ai; nothing to install
Cost-sensitive at scaleQwen 3.6Self-hosted on your hardware = no per-token cost; 3.7 API pricing isn't published

How do they compare on cost?

Cost is harder than usual to compare right now because Qwen 3.7 API pricing has not been announced. What we can say:

  • Qwen 3.6 self-hosted costs whatever your hardware costs — commonly a single H100 / 2x4090 / single M3 Max box at the small end, scaling to vLLM clusters for higher throughput. No per-token fee.
  • Qwen 3.6 Plus hosted goes through Alibaba Cloud Model Studio at published rates (verify in Model Studio — rates change).
  • Qwen 3.7-Max-Preview / Plus-Preview are free to use today on chat.qwen.ai and lmarena.ai with reasonable rate limits.
  • Qwen 3.7 API pricing: not announced. Treat any number you see in a third-party article as speculation until it lands on the Alibaba Cloud price sheet.

If your cost model needs predictable per-token economics today, the answer is 3.6 (self-hosted or Model Studio). If you can absorb the uncertainty of preview-tier pricing for a few weeks of evaluation, the answer is free 3.7 via chat.qwen.ai.

Which should you use today?

For everyone shipping anything: Qwen 3.6. Open weights, mature deployment story, predictable cost, no fundamental risk. The 27B dense model is genuinely competitive on coding, the 35B-A3B MoE is efficient at serving throughput, and the Plus hosted tier covers teams that don't want to operate inference.

For teams evaluating where Alibaba is going next: spend a week on Qwen 3.7-Max-Preview via chat.qwen.ai. Specifically stress-test the agentic claims — the 35-hour-run and 1000-tool-call numbers are vendor-reported and the most interesting differentiation in the entire 3.7 announcement. If they hold up on your workload, you'll want to be first in line when the API and weights mature.

For multimodal use cases specifically: Qwen 3.7-Plus-Preview is worth a serious look, and the Vision Arena #5 lab result is the strongest neutral signal in the launch. But again — preview, no production SLA, no published pricing.

If you're hiring vetted remote developers who actually ship LLM-backed agents, evaluators, or inference infra — not just prompt-tweakers — codersera.com/hire places senior engineers with Qwen, vLLM, llama.cpp, and agent-framework experience on your team in weeks, not months.

FAQ

Is Qwen 3.7 better than Qwen 3.6?

On neutral arena ranking, yes — Qwen3.7-Max-Preview sits at #13 overall and #7 Math on LM Arena (May 14–20, 2026), which is materially ahead of the published Qwen 3.6 positioning. But "better" depends on whether you can use it: 3.7 has no open weights and no published API pricing, while 3.6 is shipping with Apache 2.0 weights you can deploy today.

Can I download Qwen 3.7 weights?

Not as of May 20, 2026. The Hugging Face Qwen organization lists 3.5 and 3.6 variants only. Alibaba has not committed to a release date for Qwen 3.7 open weights.

Should I migrate from Qwen 3.6 to Qwen 3.7?

Not yet. There's nothing to migrate to in a self-hosted sense, and the Model Studio API is still rolling out without published pricing. Stay on 3.6 for production, run a parallel evaluation on 3.7-Max-Preview, and reassess when weights or stable pricing arrive.

Qwen 3.7 vs Qwen 3.6 for coding?

Qwen3.7-Max-Preview ranks #10 on LM Arena Coding (neutral). Qwen 3.6's 27B dense is vendor-reported to beat Qwen 3.5-397B-A17B on coding. Both are credible. If you can self-host, 3.6 is the practical choice; if you're willing to use chat.qwen.ai for evaluation, 3.7 may edge it.

Qwen 3.7 vs Qwen 3.6 for vision?

Qwen 3.7 wins decisively here. Qwen3.7-Plus-Preview pushed Alibaba to #5 lab in Vision Arena (LM Arena, neutral). Qwen 3.6's open-weight variants are primarily text-focused.

Will Qwen 3.7 be open weights like Qwen 3.6?

Unknown. Alibaba's historical pattern for flagship Qwen releases has been closed-Max, open smaller variants — but that is a pattern, not a commitment. As of today, no Qwen 3.7 weights are published on Hugging Face.

What's the 35-hour autonomous run?

It's Alibaba's headline agentic claim for Qwen3.7-Max: the model can sustain a 35-hour autonomous run without measurable degradation, and chain over 1,000 tool calls in a single session. Both numbers are vendor-reported — no third-party reproduction has been published yet.

When will Qwen 3.7 hit Hugging Face?

No announced date. The Apsara Cloud Summit launch on May 20, 2026 focused on Qwen3.7-Max and the preview variants — open-weight timing was not part of the announcement. Watch huggingface.co/Qwen and the Qwen blog for the model card.