Quick answer. Xiaomi MiMo-V2.5-Pro is an open-weight 1.02T-parameter MoE coding model that gets close to Claude Opus 4.7 on agentic-coding tasks while costing far less per token — but it does not match Claude. On Xiaomi's own benchmarks it trails Claude Opus 4.6 (73.7 vs 77.1 on Xiaomi's internal coding bench), and Claude Opus 4.7 leads it on independent SWE-bench. Its real edge is token efficiency and an MIT license.
On April 22, 2026, Xiaomi released two open-weight models that landed with very little blog coverage and a very loud vendor claim: that MiMo-V2.5-Pro "sits next to Claude on coding." Both models — MiMo-V2.5 and the larger MiMo-V2.5-Pro — are fully open-sourced under an MIT license, which alone makes them worth a serious look for any team running agentic coding at scale.
This article does the unglamorous part: separating what Xiaomi claims from what independent measurement shows, and answering the only question that matters for an engineering team — can you actually replace Claude with this for production coding work, and at what cost? Short version: it is the strongest open-weight agentic coder released so far, the efficiency story is real, and it still trails Claude Opus 4.7 on the hard benchmarks. The interesting decision is not "is it as good as Claude" — it is "is the gap worth the price difference for your workload."
What are MiMo-V2.5 and MiMo-V2.5-Pro?
They are two distinct open-weight models in the same release, not a small/large pair of the identical architecture:
- MiMo-V2.5 — a 310B-parameter sparse Mixture-of-Experts (MoE) model with ~15B active parameters per token. It is omnimodal: native text, image, video, and audio understanding in one model. Trained on roughly 48T tokens. 1M-token context. MIT license.
- MiMo-V2.5-Pro — a 1.02-trillion-parameter MoE with 42B active parameters per token, 384 routed experts, 8 experts activated per token. Text-focused, tuned for long-horizon agentic coding. Trained on 27T tokens. 1M-token context. MIT license.
The Pro variant is the one Xiaomi positions against Claude. The smaller MiMo-V2.5 is positioned as the efficient daily driver — Xiaomi states it matches V2.5-Pro on everyday coding at roughly half the inference cost, with multimodal capability the Pro model does not have.
Both are downloadable from Hugging Face (XiaomiMiMo/MiMo-V2.5 and XiaomiMiMo/MiMo-V2.5-Pro) with weights, tokenizer, and full model cards. GGUF quantizations (community, via Unsloth) exist for both.
How is MiMo-V2.5-Pro architected for efficiency?
The efficiency angle is the most technically interesting part of the release, and it is the thing that makes the cost story credible rather than just a price cut.
MiMo-V2.5-Pro uses a hybrid attention architecture: it interleaves Sliding Window Attention (SWA) and full Global Attention (GA) layers at a 6:1 ratio (per the model card, 60 SWA layers to 10 full-attention layers, SWA window size 128, GQA with 8 KV heads). Xiaomi reports this reduces KV-cache storage by roughly 7x versus an all-global-attention model of the same size while preserving long-context performance through a learnable attention-sink bias.
Why this matters in practice: for long agentic coding trajectories — the kind where an agent reads a repo, edits, runs tests, re-reads, and iterates for hours — KV-cache memory and per-token cost are the binding constraints, not raw parameter count. A 7x smaller KV cache is the difference between a trajectory that fits on a given GPU configuration and one that does not.
| Spec | MiMo-V2.5 | MiMo-V2.5-Pro |
|---|---|---|
| Total parameters | 310B | 1.02T |
| Active parameters / token | ~15B | 42B |
| Architecture | Sparse MoE, omnimodal | MoE + hybrid SWA/GA attention |
| Routed experts | 256 | 384 (8 per token) |
| Context window | 1M tokens | 1M tokens |
| Training tokens | ~48T | 27T |
| License | MIT | MIT |
How does MiMo-V2.5-Pro benchmark against Claude Opus 4.7?
This is where the careful reading matters. There are three categories of number in circulation, and they tell different stories.
1. Xiaomi's vendor-reported benchmarks (treat as vendor-claimed): Xiaomi reports MiMo-V2.5-Pro at 78.9 on SWE-bench Verified, 57.2 on SWE-Bench Pro, and 68.4 on Terminal-Bench 2.0, with ~64% on the general ClawEval subset. Xiaomi also reports a head-to-head on its own internal coding benchmark of 73.7 for MiMo-V2.5-Pro vs 77.1 for Claude Opus 4.6 — i.e. by Xiaomi's own measure it trails the previous Claude generation. Note all of these are Xiaomi's internal numbers; the "sits next to Claude on coding" framing is Xiaomi's marketing, not an independent finding.
2. Anthropic's reported numbers for Claude Opus 4.7 (released April 16, 2026, vendor-reported): 87.6% on SWE-bench Verified and 64.3% on SWE-Bench Pro. Even comparing vendor-claim to vendor-claim across labs — which is methodologically loose — Claude Opus 4.7 leads MiMo-V2.5-Pro on both SWE-bench tracks.
3. Independent third-party measurement (Artificial Analysis): MiMo-V2.5-Pro scores 54 on the Artificial Analysis Intelligence Index — well above the open-weight median (~30) and ranked near the top of its open-weight class, but below frontier closed models. Artificial Analysis did not publish a direct MiMo vs Claude Opus 4.7 head-to-head as of this writing, so the cleanest honest statement is: strong for an open-weight model, not at the frontier closed-model tier.
| Benchmark | MiMo-V2.5-Pro | Claude Opus 4.7 | Source / status |
|---|---|---|---|
| SWE-bench Verified | 78.9 | 87.6% | Both vendor-reported |
| SWE-Bench Pro | 57.2 | 64.3% | Both vendor-reported |
| Xiaomi internal coding bench | 73.7 | 77.1 (vs Opus 4.6) | Xiaomi-reported |
| AA Intelligence Index | 54 | not directly compared here | Independent (Artificial Analysis) |
The honest read: MiMo-V2.5-Pro does not match Claude Opus 4.7 on the hard, widely-cited coding benchmarks — even on Xiaomi's own numbers it trails Claude Opus 4.6. What it does is land within striking distance while being open-weight and dramatically cheaper per token. "Can it match Claude?" — on raw capability, no. On capability-per-dollar for many agentic workloads, it is genuinely competitive.
What is the real cost and efficiency difference vs Claude?
This is where MiMo-V2.5-Pro is most defensible, and it is a two-part story: lower per-token price and fewer tokens per task.
- Per-token price. Independent pricing tracked by Artificial Analysis puts MiMo-V2.5-Pro at roughly $1.00 per 1M input tokens and $3.00 per 1M output tokens (with a ~70% cache-hit discount). Claude Opus 4.7's published list pricing is several times higher per token ($5 input / $25 output per 1M, per Anthropic's rate card). To compare like-for-like on the cache-hit scenario: Anthropic prices cached input at $0.50 per 1M tokens (10% of list, per Anthropic's rate card), so even discounting both sides for caching, MiMo-V2.5-Pro remains materially cheaper per token.
- Tokens per task. Xiaomi reports MiMo-V2.5-Pro completes ClawEval agentic trajectories using ~70K tokens each — roughly 40–60% fewer tokens than Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 at comparable capability (vendor-claimed; not independently verified). Because the hybrid-attention KV-cache reduction is architectural, the "fewer tokens" claim is at least plausible rather than purely marketing.
Multiply those two effects together and the cost gap for a high-volume agentic-coding workload becomes large — frequently cited as an order of magnitude or more cheaper per completed task. The flip side: Xiaomi's token-efficiency comparisons are against Claude Opus 4.6, and they are Xiaomi's measurements. Treat the direction (much cheaper) as solid; treat the exact multiplier as vendor-flavored until independent harnesses confirm it on your task distribution.
Companion guide
For how MiMo-V2.5 fits alongside Claude, GPT, and the other agentic coders — selection criteria, harnesses, and where each one wins — see our AI coding agents complete guide for 2026.
Can you realistically self-host MiMo-V2.5?
Yes — and the MIT license means you can self-host, fine-tune, and commercially deploy without negotiation. But "can you" depends heavily on which variant.
MiMo-V2.5 (310B / 15B active) is the realistic self-host target for most teams. It runs under SGLang with FP8 quantization on a multi-GPU node. Per the model card, a representative SGLang launch:
python3 -m sglang.launch_server \
--model-path XiaomiMiMo/MiMo-V2.5 \
--served-model-name mimo-v2.5 \
--pp-size 1 --dp-size 2 --tp-size 8 \
--enable-dp-attention \
--moe-a2a-backend deepep \
--quantization fp8 \
--context-length 262144 \
--attention-backend fa3MiMo-V2.5-Pro (1.02T / 42B active) is realistically a cluster, not a workstation. The model card's reference SGLang config uses tensor parallelism of 16 and expert parallelism of 16 (e.g. --tp-size 16 --ep-size 16 --quantization fp8 --context-length 1048576). That is a serious multi-node deployment. vLLM is also supported via prebuilt images (vllm/vllm-openai:mimov25-cu130 / cu129) with tensor parallelism and expert parallelism enabled; the vLLM Recipes site documents the current config.
Practical guidance: if you want to own the model on your own hardware, target MiMo-V2.5 (the 310B). If you want MiMo-V2.5-Pro's capability without a cluster, use one of the hosted API providers serving it — you still get the open-weight pricing benefit (~$1/$3 per 1M tokens via tracked providers) and the MIT license still means no vendor lock-in on the weights themselves. Recommended sampling for both: temperature=1.0, top_p=0.95.
What are the limitations and caveats?
- It is not at Claude Opus 4.7's capability tier. On the hard benchmarks it trails — by Xiaomi's own numbers it trails Claude Opus 4.6. For the most demanding autonomous-engineering work, Claude Opus 4.7 is still ahead.
- Most headline numbers are vendor-reported. The 78.9 SWE-bench Verified, the 40–60% token savings, and the "sits next to Claude" framing are Xiaomi's. Independent measurement (Artificial Analysis Intelligence Index 54) confirms it is strong-for-open-weight, not frontier-tier.
- Generation speed is mid-pack. Independent tracking puts MiMo-V2.5-Pro around 55.7 tokens/sec median across providers — below average. For latency-sensitive interactive use this matters; for batch/background agentic sweeps it usually does not.
- Pro is a cluster-scale self-host. The 1.02T variant is not a single-box deployment. The economically realistic self-host is MiMo-V2.5 (310B).
- Xiaomi's Claude comparison uses Opus 4.6, not 4.7. Anytime you see "matches Claude," check which Claude. The gap to the current Claude generation (4.7) is wider than the gap to 4.6.
What's the verdict for an engineering team?
MiMo-V2.5-Pro is the most credible open-weight agentic coding model released so far, and the efficiency engineering behind it is real, not just a price tag. But "can it match Claude?" deserves a straight answer: no, not on raw capability — it trails Claude Opus 4.7 on the hard benchmarks, and trails even Claude Opus 4.6 on Xiaomi's own coding bench.
Where it wins is the economics. For high-volume, cost-sensitive agentic workloads — large refactor sweeps, CI-driven agents, internal tooling that runs thousands of trajectories — the combination of ~$1/$3 token pricing, fewer tokens per task, and an MIT license can make MiMo-V2.5-Pro the rational choice even though it is the less capable model. For frontier-difficulty autonomous engineering where capability dominates cost, stay on Claude Opus 4.7. Many teams will end up running both: MiMo for volume, Claude for the hard 10%.
FAQ
Is Xiaomi MiMo-V2.5 actually open source?
Yes. Both MiMo-V2.5 and MiMo-V2.5-Pro are released under the MIT license with weights, tokenizer, and full model cards on Hugging Face. MIT permits commercial use, fine-tuning, and redistribution without a separate agreement — it is one of the most permissive licenses an open-weight model of this size has shipped under.
Does MiMo-V2.5-Pro beat Claude Opus 4.7 at coding?
No. On the widely-cited coding benchmarks, Claude Opus 4.7 leads — Anthropic reports 87.6% on SWE-bench Verified and 64.3% on SWE-Bench Pro versus Xiaomi's reported 78.9 and 57.2 for MiMo-V2.5-Pro. By Xiaomi's own internal coding benchmark it even trails Claude Opus 4.6 (73.7 vs 77.1). It competes on cost-efficiency, not raw capability.
What is the difference between MiMo-V2.5 and MiMo-V2.5-Pro?
MiMo-V2.5 is a 310B-parameter omnimodal MoE (text, image, video, audio) with ~15B active parameters — the efficient daily driver and the realistic self-host target. MiMo-V2.5-Pro is a 1.02T-parameter text-focused MoE with 42B active parameters and a hybrid attention architecture, tuned for long-horizon agentic coding and positioned against Claude.
How much does MiMo-V2.5-Pro cost vs Claude?
Independent tracking puts MiMo-V2.5-Pro at roughly $1.00 per 1M input tokens and $3.00 per 1M output tokens. Claude Opus 4.7 list pricing is $5 input / $25 output per 1M tokens (Anthropic's published rate) — roughly 5x the input and 8x the output cost per token, before factoring MiMo's claimed lower token count per task.
Can I self-host MiMo-V2.5-Pro?
Technically yes (the MIT license allows it), but the 1.02T-parameter Pro variant realistically needs a multi-node cluster — the reference SGLang config uses tensor and expert parallelism of 16. For most teams the practical self-host is the 310B MiMo-V2.5; for Pro, use a hosted provider serving the open weights.
Are the MiMo-V2.5 benchmarks independently verified?
Mostly not. The SWE-bench, Terminal-Bench, ClawEval, and token-efficiency figures are Xiaomi-reported. The main independent data point is Artificial Analysis's Intelligence Index score of 54, which confirms it is strong among open-weight models but below frontier closed models. Treat vendor benchmarks as directional and validate on your own task distribution before switching production workloads.
Should I replace Claude with MiMo-V2.5 for agentic coding?
For high-volume, cost-sensitive workloads where the capability gap is tolerable — yes, it can be the rational choice. For frontier-difficulty autonomous engineering, keep Claude Opus 4.7. The common pattern is a tiered setup: MiMo-V2.5 for the bulk of trajectories, Claude for the hardest tasks where capability outweighs cost.
If you are building this kind of multi-model agentic infrastructure — routing, harnesses, cost guardrails, and evals across open and closed models — and want engineers who have shipped it in production, Codersera matches you with vetted remote developers experienced with agentic coding stacks. We run a risk-free trial so you can validate technical fit before committing.