MiniMax M3: First Open-Weights Reasoning + Agent Model (2026)
Quick answer. MiniMax M3 is out. It was announced June 1, 2026, and weights were live on Hugging Face by June 7. MiniMax positions it as the first open-weights model to combine reasoning and agent capabilities in a single release, a framing shared by AI researcher @arankomatsuzaki and cross-confirmed by MiniMax Head of Engineering @SkylerMiao7. Per the official model card: 428B total / 23B active MoE, 1M-token context, native multimodal. Per published API pricing: $0.30 / $1.20 per M tokens. Per vendor-reported benchmarks picked up by VentureBeat: 59.0% on SWE-Bench Pro. Weights are on the official MiniMaxAI/MiniMax-M3 Hugging Face repo. Run M3 today if you need open-weight agentic reasoning; M2.7 remains the stable fallback.
Last updated: June 19, 2026.
MiniMax M3 has shipped. If you landed here looking for a release date, the answer is now concrete: M3 was announced June 1, 2026, with weights on Hugging Face by June 7. It is the company's first generation that combines reasoning and agent behavior in a single open-weights release. This article was previously a "not out yet" tracker; it has been rewritten to reflect the launch and what M3 actually is.
Below: the launch context, the architectural step, how M3 compares to the M2.x line you may already be running, and a practical recommendation on whether to migrate now or stay on M2.7.
Is MiniMax M3 released?
Yes. M3 was announced June 1, 2026 and the launch was picked up immediately by the open-source ML community; weights landed on the official Hugging Face repository by June 7, 2026. The most-shared launch post came from researcher Aran Komatsuzaki:
"Introducing MiniMax M3: The First Open-Weights Model to combine reasoning + agent."
— @arankomatsuzaki on X
That framing was cross-confirmed by @SkylerMiao7, MiniMax's Head of Engineering. Treat the X posts as a useful launch signal, but treat the Hugging Face model card and MiniMax's own platform channels as the primary verification:
- Official Hugging Face org (MiniMaxAI): the MiniMax-M3 repository is now published alongside the existing M2-series weights.
- MiniMax news / platform: the M3 announcement and API availability are reflected on minimax.io/news.
What's actually new in MiniMax M3?
The headline claim is architectural: MiniMax positions M3 as the first open-weights model to combine reasoning and agent capabilities in one release. Prior open-weights work has had to choose: a strong reasoning model (long-chain math, code, planning) or a strong agent model (tool-use, multi-step task execution, environment interaction). M3 ships both behaviors out of the same base.
What that means in practice:
- Reasoning behavior baked in. Long-context multi-step reasoning is a first-class capability, not a fine-tune you apply after the fact.
- Agent behavior baked in. Tool-calling, planning loops, and multi-turn task execution are part of the base release rather than a separate "agent variant."
- One open-weight model to evaluate. Teams that previously stitched together a reasoning model + an agent harness can now benchmark a single artifact against their workload.
- Continuity with M2.x. M3 builds on the M-series' agentic-era positioning that started with M2 and was sharpened across M2.1, M2.5, and M2.7. The lineage is consistent; the new step is combining both capabilities in one model.
The numbers, sourced from the MiniMax-M3 model card and the Unsloth deployment docs: M3 is a 428B-parameter Mixture-of-Experts model that activates ~23B parameters per token, ships native multimodality (text, image, video from step one of training rather than bolted on later), and serves a 1M-token context window. The architectural step that pays for the long context is MiniMax Sparse Attention (MSA): per the Unsloth M3 deployment docs, MSA delivers ~15.6× faster decoding and ~9.7× faster prefill versus M2 at 1M context.
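To make the sparsity and speed-up figures concrete, here is a small worked calculation in Python. It uses only the numbers quoted above (428B total, ~23B active, ~15.6× decode and ~9.7× prefill versus M2 at 1M context). The function names and the baseline timings in the usage section are hypothetical, chosen purely for illustration, and are not measurements.

```python
# Figures quoted from the model card / Unsloth docs in the paragraph above.
TOTAL_PARAMS_B = 428      # total parameters, in billions
ACTIVE_PARAMS_B = 23      # approximate parameters activated per token, in billions
DECODE_SPEEDUP = 15.6     # reported decode speed-up vs M2 at 1M context
PREFILL_SPEEDUP = 9.7     # reported prefill speed-up vs M2 at 1M context


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


def scaled_time(baseline_seconds: float, speedup: float) -> float:
    """Time after applying a speed-up factor to a baseline duration."""
    return baseline_seconds / speedup


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    print(f"Active share per token: {frac:.1%}")

    # Hypothetical M2 baseline timings, for illustration only.
    m2_decode_s, m2_prefill_s = 156.0, 97.0
    print(f"Decode:  {scaled_time(m2_decode_s, DECODE_SPEEDUP):.1f} s")
    print(f"Prefill: {scaled_time(m2_prefill_s, PREFILL_SPEEDUP):.1f} s")
```

The takeaway is that only about one parameter in nineteen is active for a given token, which is why per-token compute looks closer to a mid-sized dense model than to a 428B one.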
On benchmarks, MiniMax reports 59.0% on SWE-Bench Pro (1,865 real pull requests from actively-maintained repos), 66% on Terminal-Bench 2.1, 34.8% on SWE-fficiency, and 28.8% on KernelBench Hard per the model card. VentureBeat's launch coverage walks through how that puts M3 above GPT-5.5 (58.6% SWE-Bench Pro) and Gemini 3.1 Pro (54.2%), while trailing Claude Opus 4.8 (69.2%) on the same suite. All numbers in this section are vendor-reported until independent reruns land; treat them as a strong directional signal, not the final word, and consider running your own evaluation on the queries you actually serve.
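Running your own evaluation can start very small. The sketch below is one minimal way to do it, assuming you can wrap each candidate model as a plain Python callable that maps a prompt to a response. The names `pass_rate`, `Case`, and `stub_model` are hypothetical; nothing here is a MiniMax API.

```python
from typing import Callable, Iterable, Tuple

# A case pairs a prompt with a checker that decides whether the output is acceptable.
Case = Tuple[str, Callable[[str], bool]]


def pass_rate(model: Callable[[str], str], cases: Iterable[Case]) -> float:
    """Fraction of cases whose model output satisfies its checker."""
    case_list = list(cases)
    if not case_list:
        raise ValueError("at least one evaluation case is required")
    passed = sum(1 for prompt, check in case_list if check(model(prompt)))
    return passed / len(case_list)


def stub_model(prompt: str) -> str:
    """Placeholder standing in for a real model client (hypothetical)."""
    return prompt.upper()


if __name__ == "__main__":
    cases = [
        ("fix the bug", lambda out: "BUG" in out),
        ("write a test", lambda out: out.startswith("def ")),
    ]
    print(f"pass rate: {pass_rate(stub_model, cases):.0%}")
```

Swapping `stub_model` for real clients pointed at M3 and M2.7 gives a like-for-like number on your own prompts, which is the comparison that matters more than any published leaderboard figure.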
M3 vs M2.7 — which do I run?
If your workload involves agentic reasoning — tool-use plus planning plus chain-of-thought — M3 is the natural target. The pitch is exactly that combination. Pull the weights from the official org, run your own evaluation against the queries you actually serve, and decide on the data.
If your workload is text-only or you've already standardized on M2.7 and you don't need agent behavior, there is no urgency. M2.7 is the stable, well-understood point release of the previous generation; it is not deprecated by M3 landing, and a calm migration window is a reasonable plan. That said, M3's 1M-context window and native multimodality may still be worth evaluating even for non-agent use cases if you're processing long documents or mixed-media inputs.
| Model | Type | Best for |
|---|---|---|
| MiniMax-M3 | Open-weight reasoning + agent, 428B/23B-active MoE, 1M ctx, multimodal | Agentic workloads needing both reasoning and tool-use in one model |
| MiniMax-M2.7 | Open-weight text / agentic (229B) | Stable flagship of the prior generation; reasonable to stay on |
| MiniMax-M2.5 / M2.1 / M2 | Open-weight text (229B) | Older M2-line releases; downgrade only if you need the specific version |
And the cross-vendor view for teams weighing M3 against the rest of the June 2026 frontier (SWE-Bench Pro figures and prices are vendor-published; treat as directional):
| Model | License | Context | SWE-Bench Pro | Output $ / M tokens |
|---|---|---|---|---|
| MiniMax-M3 | Open weights | 1M | 59.0% | $1.20 |
| Claude Opus 4.8 | Closed | 200K | 69.2% | $25.00 |
| GPT-5.5 | Closed | — | 58.6% | — |
| Gemini 3.1 Pro | Closed | — | 54.2% | — |
| Kimi K2.7 Code | Modified MIT | 256K | not yet independently tested | $0.95 |
M3 trades roughly ten SWE-Bench Pro points for an output token that is around twenty times cheaper than Opus 4.8, plus open weights you can self-host — though raw bench scores aren't the whole story (latency, tool-use accuracy, and your task mix will matter as much). Kimi K2.7 Code (Moonshot AI, shipped June 12, 2026) is the other open-weight coding-focused alternative to evaluate against M3, but it has not yet posted independent SWE-Bench scores as of mid-June.
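The trade-off in that paragraph is simple arithmetic, sketched here in Python using the vendor-published figures from the table above. One assumption to flag: the article lists M3 pricing as "$0.30 / $1.20 per M tokens" and the table confirms $1.20 as the output price, so the code treats $0.30 as the input price. The function names and the example token counts are hypothetical.

```python
def score_gap(higher: float, lower: float) -> float:
    """Difference between two benchmark scores, in percentage points."""
    return higher - lower


def price_ratio(expensive: float, cheap: float) -> float:
    """How many times more expensive one per-token price is than another."""
    return expensive / cheap


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of a request given prices per million tokens."""
    return (input_tokens / 1_000_000) * input_price_per_m \
        + (output_tokens / 1_000_000) * output_price_per_m


if __name__ == "__main__":
    # Vendor-published SWE-Bench Pro scores and output prices from the table.
    print(f"Score gap: {score_gap(69.2, 59.0):.1f} points")
    print(f"Output price ratio: {price_ratio(25.00, 1.20):.1f}x")

    # Assumption: $0.30 is M3's input price per million tokens.
    cost = request_cost(200_000, 8_000, 0.30, 1.20)
    print(f"Example M3 request cost: ${cost:.4f}")
```

With the table's figures, the gap works out to 10.2 points and the output price ratio to roughly 20.8×, which matches the "roughly ten points" and "around twenty times" framing.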
For setup, integration, and a practical evaluation playbook against M3, our MiniMax M3 developer guide walks through pulling weights, wiring up an agent harness, and benchmarking against M2.7.
How to verify the release yourself
You don't have to take any third-party article at face value, including this one. Two checks, both first-party:
- huggingface.co/MiniMaxAI: the official org. Look for the MiniMax-M3 repo with a model card, weights, and a license file.
- minimax.io/news: MiniMax's own announcement channel. The platform release notes will show M3 API availability separately if you want managed-endpoint rather than self-hosted access.
Together those two pages are the ground truth. Social posts (including the launch tweet quoted above) are useful as signals but should always be confirmed against the org and the platform.
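The Hugging Face check can also be scripted. The following is a minimal sketch, not an official procedure: it uses `list_repo_files` from the `huggingface_hub` package to fetch the file listing, then looks for the three things named above. The filename patterns are assumptions (a `README.md` model card, weights shipped as `.safetensors`, and a file whose name starts with `LICENSE`); adjust them if the repo is organized differently.

```python
from typing import Dict, Iterable, List


def check_release_files(filenames: Iterable[str]) -> Dict[str, bool]:
    """Report whether a repo listing contains a model card, weights, and a license.

    The patterns below are assumptions about how the repo is laid out.
    """
    names = [name.lower() for name in filenames]
    return {
        "model_card": any(n == "readme.md" for n in names),
        "weights": any(n.endswith(".safetensors") for n in names),
        "license": any(n.split("/")[-1].startswith("license") for n in names),
    }


def fetch_repo_files(repo_id: str = "MiniMaxAI/MiniMax-M3") -> List[str]:
    """Fetch the repo's file listing (needs network and `pip install huggingface_hub`)."""
    from huggingface_hub import list_repo_files
    return list_repo_files(repo_id)


if __name__ == "__main__":
    report = check_release_files(fetch_repo_files())
    for item, present in report.items():
        print(f"{item}: {'found' if present else 'MISSING'}")
```

A passing check only confirms the files exist; reading the model card and license text is still the actual verification step.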
Running M3 on your own hardware
The community quants for self-hosters are tracked at unsloth/MiniMax-M3-GGUF and the deployment notes live in the Unsloth M3 docs. Per those sources (community / vendor docs):
- UD-IQ1_M — smallest GGUF quant, ~128 GB disk footprint.
- UD-IQ3_XXS — ~159 GB, recommended balance between file size and quality on systems with sufficient RAM.
- 5-bit M3 has been demonstrated running locally on a single Apple M3 Ultra 512GB via Unsloth — see the Unsloth GGUF discussion thread for setup notes.
Tool support is moving fast: llama.cpp support for M3 is preliminary and not yet in a released build — per the Unsloth docs you'll need to compile from llama.cpp PR #24523 to run these GGUFs today.
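For sizing a machine against the quants listed above, a rough sketch like the following can help. The two quant sizes come from the Unsloth figures quoted in this section; the 16 GB headroom default, the function names, and the bits-per-weight estimate (which ignores quantization metadata, KV cache, and runtime overhead) are illustrative assumptions rather than Unsloth guidance.

```python
from typing import Dict, Optional

# Disk footprints quoted above, in GB.
QUANT_SIZES_GB: Dict[str, float] = {
    "UD-IQ1_M": 128,
    "UD-IQ3_XXS": 159,
}


def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Back-of-envelope weight size: parameters x bits / 8, in GB (overhead ignored)."""
    return params_billion * bits_per_weight / 8


def pick_quant(available_gb: float, headroom_gb: float = 16.0,
               sizes: Dict[str, float] = QUANT_SIZES_GB) -> Optional[str]:
    """Largest listed quant that fits in memory with the given headroom, else None."""
    fitting = {name: size for name, size in sizes.items()
               if size + headroom_gb <= available_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    print(f"5-bit estimate for 428B params: {estimate_weight_gb(428, 5):.1f} GB")
    for memory_gb in (96, 160, 192, 512):
        print(f"{memory_gb} GB available -> {pick_quant(memory_gb)}")
```

The 5-bit estimate lands near 268 GB, which is consistent with the report of a 5-bit build running on a 512 GB machine, though real memory use will be higher once context is allocated.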
Companion guide
For where M3 sits among open-weight models — alternatives, trade-offs, and how to choose what to self-host in 2026 — see our open-source LLMs landscape for 2026.
Should you migrate to M3 now?
Three honest takes by team shape:
- Building an agent product? Evaluate M3 this sprint. The "reasoning + agent in one open-weights model" framing is precisely the gap teams were stitching together with multiple models. If the benchmarks on your traffic look good, the consolidation is worth the migration.
- Running M2.7 for plain text inference? No rush. M2.7 isn't deprecated by M3 landing. Pencil in an M3 evaluation, but don't pause delivery for it.
- Planning a 2026 roadmap? Treat M3 as the new baseline for "what an open-weights agentic model looks like." Even if you stay on M2.7 this quarter, your inference layer should already be model-agnostic so M3 is a swap, not a rewrite.
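What "model-agnostic" means in code is that callers depend on one small interface rather than on a specific model client. The sketch below shows one minimal way to structure that, assuming each backend can be wrapped as a callable from prompt to text. `ModelRouter` and the two stub backends are hypothetical names for illustration, not part of any MiniMax SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Any backend is just a function from prompt to completion text.
Backend = Callable[[str], str]


@dataclass
class ModelRouter:
    """Routes generate() calls to whichever registered backend is active."""
    backends: Dict[str, Backend]
    active: str

    def register(self, name: str, backend: Backend) -> None:
        self.backends[name] = backend

    def switch(self, name: str) -> None:
        if name not in self.backends:
            raise KeyError(f"unknown backend: {name}")
        self.active = name

    def generate(self, prompt: str) -> str:
        return self.backends[self.active](prompt)


if __name__ == "__main__":
    # Stub backends standing in for real M2.7 and M3 clients (hypothetical).
    router = ModelRouter({"m2.7": lambda p: f"[m2.7] {p}"}, active="m2.7")
    router.register("m3", lambda p: f"[m3] {p}")

    print(router.generate("summarize this ticket"))
    router.switch("m3")  # the swap is one line; callers are unchanged
    print(router.generate("summarize this ticket"))
```

Application code only ever calls `router.generate`, so moving from M2.7 to M3 becomes a configuration change plus an evaluation run, not a rewrite of every call site.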
FAQ
Is MiniMax M3 out?
Yes. M3 was announced June 1, 2026 and weights landed on the official Hugging Face repository by June 7, 2026. MiniMax positions it as the first open-weights model to combine reasoning and agent capabilities in a single release. The launch was shared widely by AI researcher @arankomatsuzaki and cross-confirmed by MiniMax's Head of Engineering @SkylerMiao7.
What makes MiniMax M3 architecturally different?
Per the model card, M3 is a 428B-parameter MoE that activates ~23B parameters per token, ships native multimodality (text, image, video from first-step training), and serves a 1M-token context window via a new attention design — MiniMax Sparse Attention (MSA). MiniMax positions it as the first open-weights release to ship reasoning and agent behavior together in one base model. Previously you'd combine a reasoning-tuned LLM with a separate agent harness or a separately-tuned agent model; M3 collapses that into one artifact.
Is MiniMax M3 open-weight?
Yes. M3 weights are published on the official MiniMaxAI Hugging Face org, continuing the M-series' open-weights distribution model. Community GGUF quants (Unsloth) are already live.
M2.7 vs M3 — which should I build on?
If you need agentic reasoning (tool-use plus planning plus chain-of-thought), M3 is the natural target. If you're on M2.7 for plain text generation and you don't need agent behavior, there's no urgency — M2.7 is still the stable point release of the previous generation. Keep your inference layer model-agnostic so future generations are a swap, not a rewrite.
How do I get started with MiniMax M3?
See our MiniMax M3 developer guide for pulling weights, setting up an agent harness, and benchmarking against M2.7. If you want to run M3 locally, the Unsloth GGUF builds are the fastest path — UD-IQ3_XXS (~159 GB) is the recommended balance for systems with sufficient RAM. The model card on huggingface.co/MiniMaxAI/MiniMax-M3 is the canonical source for licensing and integration details.
Where is the official MiniMax M3 announcement?
The model card and weights are on huggingface.co/MiniMaxAI/MiniMax-M3; the press / platform announcement is at minimax.io/news. Treat both as ground truth above any aggregator coverage.