Last updated April 2026 — refreshed for current model/tool versions.
Gemma 4 (Google DeepMind, April 2026) and Qwen3.6 (Alibaba, April 2026) are the current generation of open-weight LLMs. Both shipped under Apache 2.0, both run usefully on a single workstation GPU, and both have eclipsed their respective Gemma 3 / Qwen 3 predecessors on coding, math, and agentic benchmarks. This guide is the head-to-head comparison: architecture, sizes, benchmarks, multimodality, licensing, and deployment footprint — refreshed with 2026 numbers.
What changed in 2026: Google replaced Gemma 3 (1B–27B, Gemma license) with Gemma 4 (E2B / E4B / 26B MoE / 31B dense, Apache 2.0). Alibaba replaced Qwen 3 with Qwen3.5 (397B-A17B MoE, Feb 2026) and Qwen3.6 (27B dense + 35B-A3B MoE + 1M-context Plus Preview, March–April 2026). The earlier claim that "Qwen 3 has no vision" was always only partially true: the Qwen3-VL family (2B/4B/8B/32B dense and 30B-A3B / 235B-A22B MoE) shipped between September and October 2025, and Qwen3.6-27B is natively multimodal. Both 2026 flagships now process text, images, and video out of the box.
Want the full picture? Read our continuously-updated Qwen 3.5 Complete Guide (2026) — flavors, licensing, benchmarks, and on-device usage.
TL;DR (2026)
- Pick Gemma 4 (31B dense) if you want a single-GPU, Apache-2.0 frontier model with native image + video, 256K context, and 140+ languages (audio input is limited to the E2B / E4B sizes): strong general reasoning and the broadest multimodal envelope.
- Pick Qwen3.6-27B (dense) if your priority is agentic coding, terminal/SWE-bench performance, or the new Thinking Preservation reasoning mechanism. It outperforms Qwen3.5-397B-A17B on SWE-bench Verified with ~14× fewer total parameters.
- Pick Qwen3.6-35B-A3B if VRAM is the constraint: ~3.1B active parameters per token (top-4-of-64 routing) means it runs on a used RTX 3090, while still scoring 73.4 on SWE-bench Verified and 92.7 on AIME 2026.
- Pick Qwen3.6 Plus Preview when you need a 1M-token context window and reportedly ~3× the throughput of Claude Opus 4.6 in tokens/sec.
- Skip Gemma 3 / Qwen 3 for new builds. Both families are end-of-line; their successors ship under more permissive licenses with measurably better benchmarks.
If you also need to weigh closed-weight competition for coding (Claude 4.6/4.7, GPT-5.5, DeepSeek V4), our pillar guide covers that directly: DeepSeek V4 vs Claude vs GPT-5: AI coding model comparison (2026).
Overview
Gemma 4 (Google DeepMind, released April 2, 2026)
Gemma 4 is Google's latest open-weight LLM family, the successor to Gemma 3. The headline change is licensing: Gemma 4 ships under Apache 2.0, ending the restrictive Gemma license era. The 31B dense model is currently ranked #3 on the Arena AI text leaderboard among open models; the 26B MoE sits at #6.
Key features:
- Architecture: Decoder-only transformer (dense), plus a 26B Mixture-of-Experts variant
- Parameter sizes: Effective 2B (E2B), Effective 4B (E4B), 26B MoE, 31B dense
- Multimodality: All models natively process text, images, and video at variable resolutions; E2B and E4B also accept native audio input
- Context window: 128K tokens (E2B / E4B), up to 256K tokens (26B MoE / 31B dense)
- Multilingual: 140+ languages, natively trained
- License: Apache 2.0 (commercial use, fine-tuning, redistribution permitted)
- Tooling: Native function calling, structured JSON output, system instructions for autonomous agents
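The structured-output path above is easiest to see as a request body. This is a minimal sketch for Ollama's `/api/chat` endpoint, which accepts a JSON schema under its `format` field; the `gemma4:31b` model tag is an assumption for illustration — use whatever tag your local registry actually exposes.

```python
def structured_request(model_tag: str, prompt: str, schema: dict) -> dict:
    """Build a request body for Ollama's /api/chat endpoint that constrains
    the model's reply to a JSON schema (Ollama's structured-output feature)."""
    return {
        "model": model_tag,  # hypothetical tag; check your local registry
        "messages": [{"role": "user", "content": prompt}],
        "format": schema,    # Ollama accepts a JSON schema here
        "stream": False,
    }

# Example schema: force the model to return a typed classification object.
schema = {
    "type": "object",
    "properties": {
        "language": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["language", "confidence"],
}
body = structured_request("gemma4:31b", "Detect the language of: 'bonjour'", schema)
```

POSTing `body` to `http://localhost:11434/api/chat` returns a reply guaranteed to parse against the schema, which is what makes native structured output useful for agent pipelines.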
Qwen3.6 / Qwen3.5 (Alibaba, February–April 2026)
Qwen3.6 is Alibaba's current flagship series. Qwen3.5 (Feb 16, 2026) introduced the 397B-A17B MoE; Qwen3.6 (March–April 2026) added the dense 27B, the 35B-A3B MoE, and the 1M-context Plus Preview. The series introduces a Thinking Preservation mechanism that retains reasoning state across multi-turn agent runs, plus a hybrid Gated DeltaNet + self-attention architecture in the 27B dense model.
Key features:
- Architecture: Dense (27B) and MoE (35B-A3B, 397B-A17B); hybrid Gated DeltaNet linear attention + self-attention in the 27B
- Parameter sizes (current): Qwen3.6-27B dense, Qwen3.6-35B-A3B (≈3.1B active), Qwen3.5-397B-A17B (≈17B active)
- Multimodality: Native image + video in Qwen3.6-27B; the Qwen3-VL line (2B / 4B / 8B / 32B dense and 30B-A3B / 235B-A22B MoE, released Sep–Oct 2025) covers vision-language workloads explicitly
- Context window: 262K native, extensible to 1M tokens (27B and 35B-A3B); 1M tokens stock on Qwen3.6 Plus Preview
- Multilingual: 100+ languages
- License: Apache 2.0
- Reasoning modes: Thinking / Non-Thinking modes carried over from Qwen 3, plus the new Thinking Preservation mechanism
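The Thinking / Non-Thinking split is a per-request switch, not two separate models. On a vLLM OpenAI-compatible server, Qwen 3's chat template exposes an `enable_thinking` flag forwarded via `chat_template_kwargs`; assuming Qwen3.6 keeps that convention (the model id below is illustrative), toggling it looks like this:

```python
def chat_payload(prompt: str, thinking: bool) -> dict:
    """Request body for a vLLM OpenAI-compatible server hosting a Qwen model.
    enable_thinking toggles the <think> reasoning block in Qwen's chat
    template; vLLM forwards it through chat_template_kwargs."""
    return {
        "model": "Qwen3.6-27B",  # assumed id; match whatever you served
        "messages": [{"role": "user", "content": prompt}],
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

fast = chat_payload("Summarize this diff.", thinking=False)   # low latency
deep = chat_payload("Prove the loop invariant.", thinking=True)  # full reasoning
```

The practical pattern: default to Non-Thinking for latency-sensitive turns and flip the flag on only for hard reasoning steps.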
Specifications Side by Side
| Feature | Gemma 4 | Qwen3.6 |
|---|---|---|
| Release | April 2, 2026 | March–April 2026 (Qwen3.6); Feb 2026 (Qwen3.5) |
| Architecture | Decoder-only dense + 26B MoE | Dense 27B + MoE 35B-A3B / 397B-A17B; hybrid linear+self-attention |
| Sizes | E2B, E4B, 26B MoE, 31B dense | 27B dense, 35B-A3B MoE, 397B-A17B MoE |
| Active params (MoE) | 26B MoE — Google has not disclosed active count | ~3.1B (35B-A3B), ~17B (397B-A17B) |
| Context window | 128K (small), 256K (large) | 262K native, extensible to 1M; 1M stock on Plus Preview |
| Vision | Native (all sizes) | Native in 27B; Qwen3-VL family for dedicated vision |
| Video | Native (all sizes) | Yes (Qwen3-VL and 27B) |
| Audio input | Yes (E2B, E4B) | No (text + vision focus) |
| Languages | 140+ | 100+ |
| License | Apache 2.0 | Apache 2.0 |
| Function calling | Native | Native (with Thinking Preservation) |
2026 Benchmarks
All scores below are from the official model cards / launch reports for Gemma 4 (April 2026) and Qwen3.6 (April 2026). Compare cautiously: Qwen3.6 reports SWE-bench / Terminal-Bench prominently; Gemma 4's launch led with AIME 2026 and LiveCodeBench v6.
| Benchmark | Gemma 4 (31B dense) | Qwen3.6-27B | Qwen3.6-35B-A3B |
|---|---|---|---|
| AIME 2026 (math) | 89.2% | — | 92.7% |
| MMLU Pro | 85.2% | — | — |
| LiveCodeBench v6 | 80.0% | — | — |
| SWE-bench Verified | — | 77.2% | 73.4% |
| SWE-bench Pro | — | 53.5% | — |
| Terminal-Bench 2.0 | — | 59.3% | — |
| GPQA Diamond | — | — | 86.0% |
| HMMT February 2026 | — | — | 83.6% |
| Arena AI (text leaderboard, open) | #3 open | — | — |
Reference points: on AIME 2026 specifically, Gemma 4 (89.2%) edges Llama 4 (88.3%) and is far ahead of DeepSeek V4 (42.5%) and the GPT family (37.5%) per Google's launch numbers. Qwen3.6-27B's 77.2% on SWE-bench Verified beats Qwen3.5-397B-A17B (76.2%) with ~14× fewer total parameters — a strong case for the dense small-flagship pattern.
How to read these scores
- Gemma 4 dominates pure math and contest reasoning at the 31B dense tier, plus it has the most balanced multimodal stack.
- Qwen3.6 dominates real-software-engineering benchmarks. SWE-bench Verified, SWE-bench Pro, and Terminal-Bench 2.0 are closer to "what actually breaks in production" than HellaSwag or GSM8K, and the Qwen3.6 dense 27B wins all three among open models at this size.
- The MoE vs dense tradeoff is now clearer. Qwen3.6-35B-A3B activates ~3.1B parameters per token, fitting on a 24 GB GPU; Gemma 4 31B dense needs more VRAM but has stronger general-purpose multimodal coverage.
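The MoE memory point deserves a number: routing cuts per-token compute, but every expert's weights must still be resident, so total parameter count (not active count) drives VRAM. A rough back-of-envelope, using the sizes from this article's spec table and an assumed ~10% overhead for KV cache and runtime buffers at modest context:

```python
def weight_vram_gb(total_params_b: float, bits: int, overhead: float = 1.1) -> float:
    """Approximate VRAM for model weights: total params x bytes per param,
    plus ~10% for KV cache and runtime buffers. MoE note: routing reduces
    per-token compute, but all experts stay resident, so TOTAL params
    (35B for the 35B-A3B), not active params (~3.1B), set the memory bar."""
    bytes_per_param = bits / 8
    return total_params_b * bytes_per_param * overhead

qwen_moe_4bit = weight_vram_gb(35, 4)    # Qwen3.6-35B-A3B, 4-bit quant
gemma_dense_4bit = weight_vram_gb(31, 4) # Gemma 4 31B dense, 4-bit quant
gemma_dense_bf16 = weight_vram_gb(31, 16)  # 31B at bf16
```

At 4-bit, both flagships land under ~20 GB of weights and fit a 24 GB card; at bf16 the 31B dense needs close to 70 GB, i.e. H100-class hardware — which is why the "used RTX 3090" claim hinges entirely on quantization.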
Multimodal Capabilities
| Capability | Gemma 4 | Qwen3.6 / Qwen3-VL |
|---|---|---|
| Text | Yes | Yes |
| Image input | Yes (all sizes) | Yes (Qwen3.6-27B + Qwen3-VL family) |
| Video input | Yes (all sizes) | Yes (Qwen3-VL, Qwen3.6-27B) |
| Audio input | Yes (E2B / E4B) | No |
| OCR / chart understanding | Yes (highlighted in launch) | Yes (Qwen3-VL native) |
| Spatial reasoning / agents | Native function calling | Strong agentic tooling, Thinking Preservation |
This contradicts older Gemma 3 vs Qwen 3 comparisons, including the prior version of this article. Qwen 3 always had vision-capable variants: the Qwen-VL line predates Qwen 3, and the Qwen3-VL series shipped through Sep–Oct 2025 in sizes from 2B to 235B-A22B. With Qwen3.6-27B, multimodality moves into the base dense model rather than living in a separate "VL" SKU.
Deployment and Hardware Footprint
| Profile | Recommended model | Notes |
|---|---|---|
| Edge / browser / mobile | Gemma 4 E2B or E4B | Built explicitly for ultra-mobile and on-device; 128K context |
| Single 24 GB consumer GPU (RTX 3090 / 4090) | Qwen3.6-35B-A3B | ~3.1B active params per token; coding-tuned |
| Workstation / single H100 | Gemma 4 31B dense or Qwen3.6-27B dense | Both fit comfortably; pick by task profile |
| Cluster / high throughput | Gemma 4 26B MoE or Qwen3.5-397B-A17B | MoE shines for batched inference |
| 1M-context workloads | Qwen3.6 Plus Preview | Currently the only stock-1M open option; ~3× Claude Opus 4.6 throughput per user reports |
Quantization and runtimes
- Both families ship official GGUF / AWQ / GPTQ quantizations day-of-release on Hugging Face.
- Gemma 4 has first-party LM Studio, Ollama, and Vertex AI integration; the E-sizes are explicitly tuned for browser-side WebGPU.
- Qwen3.6-27B and 35B-A3B are supported in vLLM, SGLang, llama.cpp, and Ollama; the 1M-context Plus Preview is currently free via OpenRouter.
Licensing
This is now the simplest comparison in the article: both Gemma 4 and Qwen3.6 ship under Apache 2.0. The old "Gemma license is restrictive, Qwen is permissive" tradeoff no longer applies. Either model can be used commercially, fine-tuned, and redistributed without per-MAU caps or use-case restrictions. The remaining differentiation is technical, not legal.
Reasoning, Coding, and Math
- Math contests (AIME 2026): Qwen3.6-35B-A3B (92.7%) leads Gemma 4 31B dense (89.2%) by 3.5 points. Both are well above the closed-weight competition cited in Google's launch.
- Software engineering (SWE-bench Verified, Terminal-Bench 2.0): Qwen3.6-27B is the open-weight leader at this size class.
- General coding (LiveCodeBench v6): Gemma 4 31B at 80.0% is competitive with Llama 4 (77.1%) and far ahead of DeepSeek V4 (52.0%) and GPT (44.0%) on Google's reported numbers.
- Agentic workflows: Both have native function calling and structured-output support. Qwen3.6 adds Thinking Preservation, which the team reports improves multi-turn agent stability.
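Native function calling on either model reduces, on the client side, to the same loop: the model emits a structured tool call, your harness executes it, and the result goes back as the next message. A minimal dispatcher sketch, assuming the OpenAI-style tool-call shape that both vLLM and Ollama servers emit (the registry entries here are illustrative stubs, not real tools):

```python
import json

# Illustrative tool registry: in a real agent these would touch the repo.
TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",
    "run_tests": lambda target: f"ran tests for {target}: 0 failures",
}

def dispatch(tool_call: dict) -> str:
    """Route one model-emitted tool call (OpenAI chat-completions shape:
    {"function": {"name": ..., "arguments": "<json string>"}}) to a
    registered Python function and return its result as a string."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name not in TOOLS:
        return f"error: unknown tool {name}"  # feed the error back to the model
    return TOOLS[name](**args)

# A tool call shaped the way the server returns it:
call = {"function": {"name": "run_tests", "arguments": '{"target": "parser"}'}}
result = dispatch(call)
```

Thinking Preservation changes what travels *between* iterations of this loop (reasoning state survives turns); the dispatch mechanics themselves are identical for both model families.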
Use Cases: Which to Pick
Gemma 4 best for:
- Multimodal apps spanning text, image, video, and audio
- On-device / browser deployments (E2B and E4B)
- Wide-language coverage (140+ languages, including low-resource)
- STEM and contest-style reasoning
- Teams already invested in Vertex AI / LM Studio / Ollama
Qwen3.6 best for:
- Agentic coding pipelines (SWE-bench / Terminal-Bench)
- Long-context workloads up to 1M tokens (Plus Preview)
- VRAM-constrained deployments via 35B-A3B MoE
- High-throughput cloud serving via Qwen3.5-397B-A17B
- Vision-language pipelines via the Qwen3-VL SKUs (still actively maintained)
How This Compares to Closed-Weight Models
Both Gemma 4 and Qwen3.6 are competitive with — and on specific benchmarks beat — closed-weight peers in 2026. The pillar comparison breaks the closed-weight side down in detail: DeepSeek V4 vs Claude vs GPT-5: AI coding model comparison (2026). Short version: Claude 4.6/4.7 still leads on long-horizon agentic coding, GPT-5.5 leads on tool-use latency, and DeepSeek V4 is the closed-weight price-performance leader; Qwen3.6 is the strongest open-weight equivalent, and Gemma 4 is the strongest open-weight multimodal equivalent.
FAQs
Which model is best for coding in 2026?
- Among open weights, Qwen3.6-27B (dense) wins SWE-bench Verified at 77.2%. For VRAM-tight setups, Qwen3.6-35B-A3B is the practical pick.
Can either model run on a single consumer GPU?
- Yes. Qwen3.6-35B-A3B runs on a 24 GB card (RTX 3090/4090) thanks to ~3.1B active params. Gemma 4 E4B runs on most laptops; the 31B dense needs a workstation-class card or quantization.
Does Qwen support vision now?
- Yes — and it always did. The Qwen3-VL family (2B/4B/8B/32B dense, 30B-A3B / 235B-A22B MoE) covers dedicated vision-language workloads, and Qwen3.6-27B has native multimodal support in the base dense model.
Are both Apache 2.0?
- Yes, as of April 2026. The old Gemma license restriction is gone with Gemma 4.
Which has the longer context?
- Qwen3.6 — 262K native, extensible to 1M; Plus Preview is 1M out of the box. Gemma 4 caps at 256K on the larger sizes.
What about Gemma 3 / Qwen 3 if I'm already running them?
- Both still work, but for new builds the successors are strictly better on benchmarks, more permissively licensed, and have broader multimodal support. Plan a migration.
Final Comparison Summary
| Feature | Gemma 4 (31B dense) | Qwen3.6-27B |
|---|---|---|
| Architecture | Decoder-only dense | Hybrid Gated DeltaNet + self-attention, dense |
| Max context | 256K | 262K (extensible to 1M) |
| Vision / video | Yes, native | Yes, native |
| Audio | Yes (E2B / E4B) | No |
| Languages | 140+ | 100+ |
| Math (AIME 2026) | 89.2% | — (Qwen3.6-35B-A3B: 92.7%) |
| SWE-bench Verified | — | 77.2% |
| License | Apache 2.0 | Apache 2.0 |
| Function calling | Native | Native + Thinking Preservation |
| Best for | Multimodal + multilingual | Agentic coding, long context |
Conclusion
Choose Gemma 4 when your workload is multimodal-heavy, multilingual, or needs to run on edge / mobile / browser devices, and especially if you want one model family that handles text, image, video, and (on the E-sizes) audio under a permissive license.
Choose Qwen3.6 when your workload is agentic coding, long-context retrieval-augmented generation, or VRAM-constrained inference. The 27B dense or the 35B-A3B MoE will out-perform Gemma 4 on real software-engineering benchmarks while remaining Apache 2.0.
For most teams in 2026, the practical answer is "both" — Gemma 4 for product surfaces that touch users and media, Qwen3.6 for backend coding and tool-using agents.