Quick answer. MiniMax M3, launched June 1, 2026, is a frontier open-weight model from Shanghai-based MiniMax that combines coding, a 1M-token context window, and native multimodal input (text, image, video). It uses a new MiniMax Sparse Attention (MSA) architecture that delivers ~15.6x faster decoding and ~9.7x faster prefill at 1M context vs MiniMax M2. API pricing on OpenRouter is $0.30 / $1.20 per million input/output tokens during launch promo.
What is MiniMax M3?
MiniMax M3 is a frontier large language model released by Shanghai AI lab MiniMax on June 1, 2026 (OpenRouter lists it under the dated slug minimax/minimax-m3-20260531). MiniMax describes it as “the first and only open-weight model” that brings together three traits that have so far lived in separate models: frontier-level coding, a 1-million-token context window, and native multimodality (text, image, and video input).
The model is available now through the MiniMax API at https://api.minimax.io/v1/text/chatcompletion_v2 and through aggregators like OpenRouter. MiniMax has committed to publishing the technical report and open weights on Hugging Face and GitHub within roughly 10 days of the launch announcement — at the time of writing, the MiniMaxAI organization on Hugging Face still lists MiniMax-M2.7 (229B parameters) as the latest published checkpoint, with the M3 weights forthcoming.
The headline architectural change is MiniMax Sparse Attention (MSA), a new sparse-attention scheme that drops per-token compute at 1M context to roughly 1/20th of the previous-generation M2 model while preserving accuracy. M3 is also natively multimodal — it was trained with mixed-modality data “from Step 0,” per MiniMax’s launch blog, rather than having vision bolted on after the fact.
What’s new since MiniMax M2.7?
MiniMax M2.7 (released April 2026) was a 229B-parameter dense / MoE text model with a 256K-token context window and no native multimodal input. M3 is a clean break across four dimensions:
- Architecture: MSA replaces full attention. M3 keeps a Grouped-Query Attention (GQA) backbone but layers MiniMax Sparse Attention on top, doing block-level selection on real, uncompressed key-values. This is deliberately different from DeepSeek’s Multi-head Latent Attention (MLA), which compresses KV state and trades off some long-context precision; MSA sidesteps that compression-precision tradeoff.
- Context window: 256K → 1M tokens. The platform guarantees a minimum of 512K and extends to 1M for the long-context tier. MiniMax reports a 9.7x prefill speedup and 15.6x decoding speedup at 1M context vs M2 thanks to MSA.
- Native multimodality. M3 accepts text, images, and video as input (text-only output). M2.7 did not. The model can also “operate a desktop computer,” which MiniMax positions as a computer-use capability comparable to Anthropic’s computer-use API.
- Coding and agentic uplift. M3 scores 59.0% on SWE-Bench Pro (vs GPT-5.5 at 58.6% and Claude Opus 4.7 at 64.3%) and 83.5% on BrowseComp (vs Opus 4.7 at 79.3%), per benchmarks reported in MiniMax’s launch blog and replicated by independent outlets.
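The block-level selection idea from the architecture bullet can be illustrated with a toy example. MiniMax has not yet published the technical report, so this is a generic block-sparse attention sketch for a single query, not MSA itself: the scoring rule (query dotted with each block’s mean key), the block size, the top-k value, and every function name are assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def block_sparse_attention(query, keys, values, block_size=4, top_k=2):
    """Toy single-query attention restricted to the top_k key blocks.

    Assumption: blocks are ranked by the dot product between the query
    and the mean key of each block. The real MSA rule is not public.
    """
    n = len(keys)
    dim = len(query)
    blocks = [range(start, min(start + block_size, n))
              for start in range(0, n, block_size)]

    # Step 1: one cheap score per block.
    def block_score(block):
        mean_key = [sum(keys[i][d] for i in block) / len(block)
                    for d in range(dim)]
        return dot(query, mean_key)

    ranked = sorted(blocks, key=block_score, reverse=True)
    selected = sorted(i for block in ranked[:top_k] for i in block)

    # Step 2: exact attention over the original, uncompressed keys and
    # values, but only for positions inside the selected blocks.
    scale = 1.0 / math.sqrt(dim)
    weights = softmax([dot(query, keys[i]) * scale for i in selected])
    out_dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, selected))
              for d in range(out_dim)]
    return output, selected


keys = [[0, 1], [0, 1], [5, 0], [5, 0], [-1, 0], [-1, 0]]
values = [[1.0], [1.0], [2.0], [2.0], [3.0], [3.0]]
output, selected = block_sparse_attention([1, 0], keys, values,
                                          block_size=2, top_k=1)
print(selected, output)
```

In this family of methods the exact attention step touches only top_k × block_size positions instead of all n, which is where the per-token savings at long context come from.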
How does MiniMax M3 compare to DeepSeek V4, Kimi K2.6, and Claude Opus 4.8?
The comparable peer set right now is the open-weight frontier (DeepSeek V4, Kimi K2.6, Qwen 3.5) plus the closed-source frontier (Claude Opus 4.8, GPT-5.5). The numbers MiniMax has published so far:
| Benchmark | MiniMax M3 | GPT-5.5 | Claude Opus 4.7 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-Bench Pro (coding) | 59.0% | 58.6% | 64.3% | 54.2% |
| Terminal-Bench 2.1 | 66.0% | 78.2% | n/a | 70.0% |
| SVG-Bench | 63.7% | n/a | 62.3% | 59.2% |
| BrowseComp (agentic browsing) | 83.5% | n/a | 79.3% | n/a |
| MCP Atlas | 74.2% | n/a | 77.0% | n/a |
A few honest caveats. MiniMax has not yet published head-to-head numbers against DeepSeek V4 or Kimi K2.6 on the same evaluation harness, so any “M3 vs DeepSeek V4” coding claim you read today is extrapolated from different runs. Likewise, Anthropic released Claude Opus 4.8 after MiniMax finalized M3’s evaluation suite, so the published comparisons are against Opus 4.7; expect an Opus 4.8 update once the field re-benchmarks. On ARC-AGI-2, M3 scores below 12%, in line with other Chinese frontier models — a real gap vs US labs on abstract-reasoning evals that MiniMax has not yet addressed publicly.
The honest positioning: M3 is competitive with closed frontier on coding and agentic browsing, ahead of every open-weight peer on long-context multimodality, and behind on raw abstract reasoning. If you need an open-weight model you can self-host and that natively handles 1M-token multimodal inputs, M3 is the only option in that exact intersection as of June 2026.
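To make the gaps in the comparison table concrete, here is a small sketch that computes M3’s lead or deficit in percentage points for every cell where both scores exist. The scores are copied from the table; the dictionary layout and the function name are illustrative choices, not anything MiniMax publishes.

```python
# Scores in percent, copied from the table; None stands for "n/a".
SCORES = {
    "SWE-Bench Pro": {"MiniMax M3": 59.0, "GPT-5.5": 58.6,
                      "Claude Opus 4.7": 64.3, "Gemini 3.1 Pro": 54.2},
    "Terminal-Bench 2.1": {"MiniMax M3": 66.0, "GPT-5.5": 78.2,
                           "Claude Opus 4.7": None, "Gemini 3.1 Pro": 70.0},
    "SVG-Bench": {"MiniMax M3": 63.7, "GPT-5.5": None,
                  "Claude Opus 4.7": 62.3, "Gemini 3.1 Pro": 59.2},
    "BrowseComp": {"MiniMax M3": 83.5, "GPT-5.5": None,
                   "Claude Opus 4.7": 79.3, "Gemini 3.1 Pro": None},
    "MCP Atlas": {"MiniMax M3": 74.2, "GPT-5.5": None,
                  "Claude Opus 4.7": 77.0, "Gemini 3.1 Pro": None},
}


def gaps_vs_baseline(scores, baseline="MiniMax M3"):
    """Return baseline-minus-rival gaps in percentage points."""
    gaps = {}
    for benchmark, row in scores.items():
        base = row[baseline]
        gaps[benchmark] = {
            model: round(base - value, 1)
            for model, value in row.items()
            if model != baseline and value is not None
        }
    return gaps


for benchmark, row in gaps_vs_baseline(SCORES).items():
    print(benchmark, row)
```

Positive values mean M3 is ahead; negative values mean it trails. Cells marked n/a are skipped rather than treated as zero.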
Pricing and access
M3 is available through three channels:
- MiniMax direct API at platform.minimax.io: pay-as-you-go with tiered pricing by input length (≤512K context vs >512K). During the launch week, ≤512K usage is priced at roughly 2.1 yuan (~$0.30) per million input tokens and 8.4 yuan (~$1.20) per million output tokens at the standard tier.
- OpenRouter: lists the model as minimax/minimax-m3 at $0.30/M input and $1.20/M output tokens during a 50%-off launch promo (regular pricing is $0.60 / $2.40). See the OpenRouter listing for live pricing.
- Open weights: not yet uploaded to Hugging Face at the time of writing; MiniMax has committed to a release within ~10 days of the June 1 announcement. Check huggingface.co/MiniMaxAI; once M3 lands it will sit alongside M2.7, M2.5, and the earlier MiniMax-Text-01.
MiniMax also offers subscription plans for its consumer product (Plus $20/mo ≈ 1.7B tokens, Max $50/mo ≈ 5.1B tokens, Ultra $120/mo ≈ 9.8B tokens), but for engineering use you almost certainly want pay-as-you-go or OpenRouter, not the seat-based plans.
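For pay-as-you-go budgeting, the per-million-token prices quoted above turn into a simple estimate. This is a rough sketch using the OpenRouter promo and regular prices from this section; it ignores the >512K tier (whose price is not stated here) and any prompt-caching discount, and the function and dictionary names are made up for the example.

```python
# USD per million tokens, as quoted for OpenRouter in this section.
PRICES = {
    "promo": {"input": 0.30, "output": 1.20},
    "regular": {"input": 0.60, "output": 2.40},
}


def estimate_cost_usd(input_tokens, output_tokens, tier="promo"):
    """Estimate the cost of one request in USD for the given price tier."""
    price = PRICES[tier]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Example: a 200K-token prompt that produces a 4K-token answer.
for tier in ("promo", "regular"):
    print(tier, round(estimate_cost_usd(200_000, 4_000, tier), 4))
```

For that example request the estimate works out to about $0.065 at promo pricing and about $0.13 at regular pricing.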
When should you use MiniMax M3 (and when shouldn’t you)?
Use M3 when:
- You need genuine 1M-token context with multimodal input. Most frontier models advertising 1M context are text-only at that length, or degrade sharply past 256K. M3’s MSA architecture is specifically engineered to keep the wall-clock cost of long-context inference manageable.
- You want an open-weight model for self-hosting. Once the weights land on Hugging Face, M3 will be the only open-weight model in its tier with native multimodality. For regulated industries or air-gapped deployments, that’s a real moat.
- You’re building an agent that browses, codes, or operates a desktop. The 83.5% BrowseComp and 59.0% SWE-Bench Pro scores measure agentic task completion, not just chat quality.
- You care about API cost per 1M tokens. At $0.30 / $1.20 promo (or $0.60 / $2.40 regular), M3 is materially cheaper than Claude Opus 4.8 or GPT-5.5 for equivalent agentic work.
Skip M3 (for now) when:
- You’re benchmarking against Opus 4.8 for pure code reasoning. Opus 4.7 already leads M3 by about 5 points on SWE-Bench Pro (64.3% vs 59.0%), and Opus 4.8 has not yet been compared against M3 on the same suite.
- Your workload is abstract-reasoning heavy. ARC-AGI-2 results suggest Chinese frontier models including M3 lag US labs on novel-pattern tasks.
- You need text-out into multiple languages with the same fluency as English. Multilingual quality has not been independently audited on M3 yet.
- You need open weights today. MiniMax has committed to Hugging Face availability within roughly 10 days of the June 1 launch.
How to call MiniMax M3 from your code
The MiniMax API is OpenAI-compatible at the chat-completions level. The fastest path is via the official endpoint or via OpenRouter (which lets you keep one API key across providers).
Python via the OpenAI SDK pointing at MiniMax:
from openai import OpenAI

# Point the OpenAI SDK at MiniMax's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_MINIMAX_KEY",
    base_url="https://api.minimax.io/v1",
)

response = client.chat.completions.create(
    model="MiniMax-M3",
    messages=[
        {"role": "system", "content": "You are a senior backend engineer."},
        {"role": "user", "content": "Refactor this Python function for clarity: ..."},
    ],
)

# Print the assistant's reply from the first choice.
print(response.choices[0].message.content)
Or via OpenRouter (recommended if you want failover and one bill):
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax/minimax-m3",
    "messages": [{"role": "user", "content": "Write a unit test for this function..."}]
  }'
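If you would rather stay in Python without adding a dependency, the same OpenRouter request can be sketched with the standard library. The URL, headers, and body mirror the curl call above; the helper names build_request and send are hypothetical, and the response parsing assumes the usual OpenAI-style choices[0].message.content layout.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_request(prompt, api_key, model="minimax/minimax-m3"):
    """Build (but do not send) a chat-completions request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        OPENROUTER_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


if __name__ == "__main__":
    key = os.environ.get("OPENROUTER_KEY")
    if key:  # Only call the network when a key is configured.
        print(send(build_request("Write a unit test for this function...", key)))
```

Splitting request construction from sending keeps the payload easy to inspect or unit-test without making a network call.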
For multimodal calls, pass image or video URLs in the message content as a structured list (the same shape OpenAI’s vision API uses); MiniMax’s endpoint accepts both URL and base64-encoded inputs. Full automatic prompt caching is enabled by default — you don’t need to wire it explicitly — which materially lowers cost on agentic workflows that re-send the same system prompt many times.
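As a sketch of that structured content list, the helpers below build an OpenAI-style user message with one text part and one or more image parts, either by URL or as a base64 data URL. This assumes MiniMax accepts the OpenAI image_url part shape unchanged, as the paragraph above suggests; the helper names are hypothetical, and video is left out because the field name for video parts is not given here.

```python
import base64


def image_part_from_url(url):
    """Image content part that references a remote URL."""
    return {"type": "image_url", "image_url": {"url": url}}


def image_part_from_bytes(data, mime_type="image/png"):
    """Image content part that embeds raw bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def build_multimodal_message(text, image_parts):
    """User message whose content is a list: text first, then images."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": text}, *image_parts],
    }


message = build_multimodal_message(
    "Describe what this diagram shows.",
    [image_part_from_url("https://example.com/diagram.png")],
)
print(message)
```

The resulting dictionary can be passed as one entry in the messages list of a chat-completions call.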
FAQ
When was MiniMax M3 released?
MiniMax M3 was publicly announced on June 1, 2026, with the API live the same day. OpenRouter lists it under the dated slug minimax/minimax-m3-20260531.
Is MiniMax M3 open source?
The weights and technical report are scheduled for release on Hugging Face and GitHub within roughly 10 days of the June 1 launch. As of the launch date, the API is live but the weights are not yet published. The earlier MiniMax-M2.7 is already on Hugging Face for reference.
How much does the MiniMax M3 API cost?
On OpenRouter, M3 is currently $0.30 per million input tokens and $1.20 per million output tokens during a 50%-off launch promotion (regular pricing $0.60 / $2.40). The MiniMax direct API uses tiered pricing by input length, with the ≤512K-context tier priced similarly during the launch week.
What is MiniMax Sparse Attention (MSA)?
MSA is a new sparse-attention architecture that operates on a Grouped-Query Attention backbone but does block-level selection over real, uncompressed key-values. Compared to MiniMax M2, MSA delivers about 9.7x faster prefill and 15.6x faster decoding at 1M-token context with per-token compute around 1/20th of the previous generation.
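To get a feel for what those multipliers mean in wall-clock terms, here is a back-of-the-envelope sketch. The two speedup constants are the figures MiniMax reports at 1M context; the M2 baseline timings in the example are invented purely for illustration, and the speedups may not carry over to shorter contexts.

```python
PREFILL_SPEEDUP = 9.7   # reported by MiniMax at 1M context vs M2
DECODE_SPEEDUP = 15.6   # reported by MiniMax at 1M context vs M2


def projected_latency_s(m2_prefill_s, m2_decode_s_per_token, output_tokens):
    """Project M3 request latency from a measured M2 baseline."""
    prefill = m2_prefill_s / PREFILL_SPEEDUP
    decode = m2_decode_s_per_token * output_tokens / DECODE_SPEEDUP
    return prefill + decode


# Hypothetical M2 baseline: 97 s prefill, 0.156 s per decoded token.
baseline_total = 97.0 + 0.156 * 1000
projected_total = projected_latency_s(97.0, 0.156, output_tokens=1000)
print(f"M2 baseline: {baseline_total:.1f} s, projected M3: {projected_total:.1f} s")
```

Substitute your own measured M2 timings for the placeholder baseline before drawing any conclusions.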
How does MiniMax M3 compare to DeepSeek V4 and Kimi K2.6?
MiniMax has not published head-to-head numbers against DeepSeek V4 or Kimi K2.6 on the same evaluation harness, so any direct comparison today is extrapolated. M3 is the only open-weight model in this peer group with native multimodality at 1M context; DeepSeek V4 and Kimi K2.6 currently lead on different sub-benchmarks but lack equivalent multimodal coverage.
Can MiniMax M3 process images and video?
Yes. M3 accepts text, image, and video as input (text-only output) and was trained with mixed-modality data from the start, not with a vision adapter bolted on later. It also has a published “operate a desktop computer” capability.
What is the context window of MiniMax M3?
Up to 1 million tokens, with a guaranteed minimum of 512K. The 1M tier uses MSA to keep per-token compute and wall-clock latency feasible — that’s the headline architectural change vs MiniMax M2.
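A small guard like the following can route requests by prompt size before you send them. It is a sketch under stated assumptions: "512K" and "1M" are read as 512,000 and 1,000,000 tokens (the platform may define them differently), and the tier labels and function name are invented for this example.

```python
STANDARD_LIMIT = 512_000   # assumed reading of the "512K" guaranteed window
MAX_CONTEXT = 1_000_000    # assumed reading of the "1M" long-context tier


def context_tier(prompt_tokens):
    """Classify a prompt by the context tier it would need."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if prompt_tokens <= STANDARD_LIMIT:
        return "standard"
    if prompt_tokens <= MAX_CONTEXT:
        return "long-context"
    raise ValueError("prompt exceeds the 1M-token context window")


for n in (120_000, 512_000, 800_000):
    print(n, context_tier(n))
```

Because the direct API prices the two tiers differently, it is worth checking the tier before sending a very large prompt.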
Where is the official MiniMax M3 documentation?
Official model page: https://minimax.io/models/text/m3. API docs: https://platform.minimax.io. Once published, open weights and the technical report will live on the MiniMaxAI Hugging Face organization.
Related guides
- Open-source LLMs landscape (2026) — how M3 fits into the broader open-weight frontier
- DeepSeek V4 complete guide — the other Chinese open-weight frontier model
- Kimi K2.6 complete guide — Moonshot’s competing long-context model
- Qwen 3.5 complete guide — Alibaba’s open-weight family
- Claude Opus complete guide — the closed-source coding benchmark to beat
- Self-hosting LLMs complete guide — once M3 weights ship, this is the deployment playbook
- AI coding agents complete guide — where M3’s SWE-Bench Pro / BrowseComp results actually matter