Qwen 3.7 Locally: What You Can Actually Run (2026)

Quick answer. As of May 22, 2026, Qwen 3.7 weights are not on Hugging Face — you cannot run it locally yet. The only honest options today: free previews on chat.qwen.ai or lmarena.ai, or paid API access via OpenRouter (live May 21, 2026 at $2.50 in / $7.50 out per 1M tokens), routing through Alibaba Cloud Model Studio. To run a Qwen model locally right now, use Qwen 3.6 — full setup below.

Last updated: June 17, 2026 — added Qwen 3.7-Max benchmark notes vs Opus 4.7 / GPT-5.5 and the Agent-variant details from the Qwen team.

If you searched "how to run Qwen 3.7 locally," you probably saw a wave of SEO posts confidently publishing setup commands, VRAM tables, and Ollama snippets for it. Treat them with serious skepticism. As of today, the weights they pretend to install do not exist on Hugging Face. This page is the honest version: what's actually possible, what's not, and exactly what to run today.

Can you actually run Qwen 3.7 locally today?

No. Verified May 22, 2026:

The official Qwen organization on Hugging Face — huggingface.co/Qwen, the authoritative source for Qwen releases — has zero Qwen 3.7 repositories. The newest official model is Qwen 3.6 (variants up to Qwen3.6-35B-A3B-FP8).
Direct probes of every plausible model card — Qwen/Qwen3.7-Max, Qwen/Qwen3.7-Max-Preview, Qwen/Qwen3.7-Plus-Preview, Qwen/Qwen3.7-27B, Qwen/Qwen3.7-35B-A3B — return HTTP 401 (HF's response for a repo that doesn't exist).
The Ollama library — ollama.com/library/qwen3.7 and qwen3.7-max — returns 404. No GGUF, no quants, no ollama run target.
No llama.cpp conversion, no MLX build, no vLLM weights URL — because there are no weights to convert.

Alibaba officially announced Qwen 3.7-Max earlier today at the Apsara Cloud Summit in Hangzhou, and two preview variants (Max-Preview and Plus-Preview) have been live for testing since around May 14 — but that's a hosted preview, not a downloadable model. Anyone publishing local-setup steps for Qwen 3.7 right now is either guessing or copying from a model that doesn't exist.

What are the real ways to use Qwen 3.7 right now?

Three legitimate paths, in order of cost and friction:

Path	What you get	Cost	How
chat.qwen.ai	Qwen3.7-Max-Preview (text, deep-thinking on) and Qwen3.7-Plus-Preview (multimodal)	Free	Sign in, pick the model from the dropdown
lmarena.ai	Both previews via direct chat or model-vs-model battles	Free	Pick `qwen-3.7-max-preview` / `qwen-3.7-plus-preview` in Direct Chat
Alibaba Cloud Model Studio	API access to Qwen 3.7-Max	$2.50 in / $7.50 out per 1M tokens (OpenRouter, live May 21)	Apply via OpenRouter (instant) or Model Studio

For a hands-on read of what the previews actually feel like — strengths, weaknesses, where Max vs Plus matters — see our companion piece on what just launched with Qwen 3.7 (LM Arena ranks, the Alibaba-reported 35-hour autonomous-run claim, the variant matrix). None of these paths put weights on your machine — that's important to be clear about.

What should you run locally today instead?

Qwen 3.6. It is open-weights on Hugging Face, ships under Apache 2.0 for the core variants, and is a strong model in its own right — Alibaba reports the 27B dense beats the previous-generation Qwen 3.5-397B-A17B on coding (Alibaba-reported, validate on your own tasks).

The lineup, all live on huggingface.co/Qwen today:

Qwen 3.6-27B — dense, ~27B params. Single-GPU local inference; predictable latency. Best default for most setups.
Qwen 3.6-35B-A3B — Mixture-of-Experts, 35B total / ~3B active. Higher throughput at lower active compute.
Qwen 3.6 Plus — hosted/larger tier (not open-weight).

We have a full, command-by-command walkthrough — Ollama, llama.cpp, vLLM, MLX on Apple Silicon, VRAM tables per quant, real tok/s ranges — at how to run Qwen 3.6 locally (27B dense vs 35B MoE). That is the practical action today. Qwen 3.7 is not.

If the question on your mind is "is Qwen 3.7 worth waiting for, or should I deploy 3.6 now?" — that decision is the subject of the Qwen 3.7 vs Qwen 3.6 comparison. The short version: nothing is locally runnable for 3.7, so even if 3.7 is better, the only thing you can actually ship locally today is 3.6.

Qwen 3.7-Max benchmarks vs Opus 4.7 / GPT-5.5

Independent testing from third parties is starting to land alongside the official Alibaba numbers. The most-circulated comparison so far comes from @HaaaaaaydenH, who ran a three-way frontier-model evaluation:

"Qwen 3.7-max beats Opus 4.7 and GPT-5.5 — we tested three frontier models on a real engineering workload and Qwen came out ahead."

— @HaaaaaaydenH on X (~4.5k likes)

Two caveats before you act on this: (a) it is a single workload from a single tester, not a benchmark suite — treat it as a signal worth verifying, not a verdict; (b) Qwen 3.7-Max is the closed flagship served via OpenRouter and Alibaba Cloud Model Studio, so even if the head-to-head holds up, it doesn't change the local-running picture. The benchmark matters when you're choosing which hosted API to wire into production. It doesn't move the needle on which weights are sitting on Hugging Face today.

For the hardware side of this decision — what a hosted Qwen 3.7-Max call actually costs vs. running Qwen 3.6 on your own silicon, and where the break-even sits — see our local LLM hardware showdown (June 2026).

Running the Agent variant

Alibaba's @ChujieZheng — a core researcher on the Qwen team — framed the 3.7-Max release explicitly around agent workloads:

"📣 Meet Qwen3.7-Max — our latest flagship, made for the Agent."

— @ChujieZheng on X (~5k likes)

That positioning matches the 35-hour autonomous-run claim Alibaba demoed at the Apsara Cloud Summit — long-horizon tool-use and self-correction, not single-shot Q&A. Practically, if you're evaluating Qwen 3.7-Max as an agent backbone, the hosted preview on chat.qwen.ai (deep-thinking on) and the OpenRouter endpoint are the only two surfaces where the agent behavior is actually exercised end-to-end. There is no "Agent variant" weights file to download — agent capability is the same Max model, prompted and scaffolded for tool use.

Why are so many pages pretending you can run Qwen 3.7 locally?

The same reason you saw "Qwen 3.7 released in April" posts before today's actual announcement: SEO incentives reward speed over accuracy, and "how to run X locally" is a high-volume query that auto-publishing farms target the moment a model is announced — whether or not the weights exist. Common tells:

Confident parameter counts, context-window lengths, or VRAM tables for Qwen 3.7. Alibaba has not published any of those.
ollama run qwen3.7 snippets. The Ollama library does not contain Qwen 3.7. The command will fail.
Specific quantized file sizes (e.g. "Q4_K_M is 17.8 GB"). There is no quantized file because there is no source model.
Benchmark tables with SWE-bench Verified / GPQA / AIME numbers for Qwen 3.7. Alibaba has not published any of those either. The only neutral data point is LM Arena ranks for the previews.

If you see those signals, the page is filling in blanks. This page is the version that doesn't.

Companion guide

For the full Qwen family — capabilities, variants, benchmarks, and how to pick — see our Qwen complete guide for 2026.

How will you know when Qwen 3.7 is actually runnable locally?

Three concrete trigger conditions — any one of these flipping is the moment local-setup steps become real, not speculative:

A Qwen/Qwen3.7-* repository appears on huggingface.co/Qwen with a model card carrying real weights — params, context length, license, files. That is the definitive signal.
An ollama run qwen3.7 entry appears at ollama.com/library/qwen3.7. Ollama tracks first-party Qwen releases; an entry there means GGUF quants exist and a one-line install works.
An official qwen.ai/blog post announces open-weight Qwen 3.7 with download paths. The chat preview launch and the closed flagship announcement do not count — open-weight is a distinct event.

Bookmarking this page means you get the accurate local-setup guide the day weights ship, instead of the speculation that is circulating now. This URL refreshes in place into the real walkthrough when any of those three conditions are met.

Will Qwen 3.7 even be open weights?

Unknown. The pattern with the prior generation was a closed flagship — Qwen 3.6-Max stayed API-only — while smaller dense and MoE variants shipped open under Apache 2.0. If that pattern holds, expect a closed Qwen 3.7-Max alongside eventual open-weight sizes (e.g. a 3.7-27B dense and a 3.7-35B-A3B MoE), but Alibaba has not committed to any of this publicly. Plan as if the flagship may be API-only, and let the smaller open variants be the upside.

What hardware will you likely need when weights do land?

This is the only forward-looking section, and it is explicitly an extrapolation from Qwen's public 3.5 → 3.6 hardware trajectory, not specifications. If the pattern continues, expect roughly the same hardware budget as Qwen 3.6 for like-for-like sizes — a 27B dense workable on a single 24 GB card at INT4, a 35B-A3B MoE feasible on Apple Silicon via MLX, and any larger or vision variants pushing into multi-GPU territory. Real numbers come from the eventual model card; until then, plan for parity with 3.6 and add headroom for context-length growth.

Should you wait for Qwen 3.7 weights or use Qwen 3.6 now?

Use Qwen 3.6 now. "Wait for the next version" is almost always the wrong call when (a) there's no announced open-weight release date, (b) the current version is strong and openly available, and (c) your alternative is shipping nothing. Qwen 3.6 is a capable, open-weight model you can deploy today. If Qwen 3.7 ships open weights and is materially better, swapping a working Qwen 3.6 setup is a model-id and re-evaluation exercise — cheap compared to the cost of stalling.

If your team is moving fast on open-weight model adoption — evaluating each Qwen release, running your own benchmarks, and keeping self-hosted inference infrastructure current — that's real engineering work. Codersera matches you with vetted remote developers who have shipped LLM evaluation and self-hosting in production, with a risk-free trial so you can validate technical fit before committing.

FAQ

Can I download Qwen 3.7 weights from Hugging Face?

No. As of May 22, 2026 the official Qwen organization on Hugging Face has no Qwen 3.7 repositories. The newest model in the org is Qwen 3.6 (up to Qwen3.6-35B-A3B-FP8). A Qwen/Qwen3.7-* repo appearing on huggingface.co/Qwen is the definitive signal that local-running just became possible.

Can I run Qwen 3.7 with Ollama?

No. ollama.com/library/qwen3.7 and ollama.com/library/qwen3.7-max both return 404 — the entries don't exist yet. Any ollama run qwen3.7 snippet you see today will fail. Use ollama run qwen3.6 (or the specific 3.6 tag you want) until an Ollama entry exists.

Is Qwen 3.7 the same as Qwen 3.7-Max-Preview?

No. Qwen 3.7-Max-Preview is a hosted preview of the upcoming Max flagship, available only on chat.qwen.ai and lmarena.ai. It is not a downloadable model, and a preview label seen in a hosted UI is not the same as a released open-weight model you can run.

What's the cheapest way to use Qwen 3.7 today?

Free. Sign in to chat.qwen.ai and pick Qwen3.7-Max-Preview or Plus-Preview from the model selector, or use lmarena.ai's Direct Chat with the same identifiers. Both are free for the preview window. For paid API access, OpenRouter (qwen/qwen3.7-max) went live May 21, 2026 at $2.50 / $7.50 per 1M tokens (in/out), routing through Alibaba Cloud Model Studio. An official DashScope price sheet has not yet been posted, so the OpenRouter rate is the working number.

Will Qwen 3.7 be open weights like 3.6?

Unknown until Alibaba confirms. The prior pattern suggests a closed Qwen 3.7-Max with smaller open-weight variants (Apache 2.0) following — but Alibaba has not committed to either, and you should plan as if the flagship may be API-only.

Should I wait for Qwen 3.7 weights or use Qwen 3.6 now?

Use 3.6 now. There is no announced open-weight release date for 3.7, and Qwen 3.6 is a capable open-weight model you can deploy today. Migrating from 3.6 to a future 3.7 is a model-id and re-eval exercise — cheap compared to the cost of waiting on an unannounced release.

What about running Qwen 3.7-Max-Preview locally?

You can't. The Max-Preview is served only via chat.qwen.ai and lmarena.ai — Alibaba has not published Max-Preview weights. "Run the preview locally" is not on the table; the closest local equivalent today is Qwen 3.6.