Cohere Command A+: Launch Guide (May 2026)

Cohere released Command A+ on May 20, 2026: a 218B sparse Mixture-of-Experts model with 25B active parameters, Apache 2.0 licensed, that runs on as few as 2 H100 GPUs. Built for sovereign, on-prem enterprise agents with native citations.

Published 26 May 2026 • Updated 26 May 2026 • 7 min read

Quick answer. Cohere Command A+ is a 218B-parameter sparse Mixture-of-Experts model (25B active) released May 20, 2026 under Apache 2.0. It runs on as few as 2 H100 GPUs at W4A4 quantization, supports 48 languages and a 128K context window, ships with native citation grounding, and is positioned for sovereign, on-prem enterprise agentic workflows. Weights live on Hugging Face as CohereLabs/command-a-plus-05-2026.

Cohere shipped Command A+ on May 20, 2026, and it is the most consequential open-weight release the company has ever made. It is the first Cohere model published under a true Apache 2.0 license, the first to break the 200B-parameter barrier from the lab, and the first to unify reasoning, vision, multilingual, and tool-use behaviour into one set of weights. This guide covers what it is, how it performs, how to deploy it, and when it is the right choice over Llama 4, Qwen 3.5, or DeepSeek V4.

What is Command A+?

Command A+ is a decoder-only sparse Mixture-of-Experts (MoE) transformer with 218 billion total parameters and 25 billion active parameters per token. The model uses 128 experts with 8 activated per token plus one shared expert that runs on every token. It accepts text and images and emits text, reasoning traces, tool calls, and grounded citations.

The headline release specs:

Total / active parameters: 218B / 25B
Context window: 128K input tokens, 64K max generation
Languages: 48 (up from 23 in Command A)
Modalities: Text + image in; text out
License: Apache 2.0 (commercial use, no revenue caps, no non-compete)
Weights: CohereLabs on Hugging Face in BF16, FP8, and W4A4 quantizations
Minimum hardware: 2x H100 (W4A4) or 1x B200

Command A+ replaces and consolidates four earlier Cohere models in one set of weights: Command A (text), Command A Reasoning, Command A Vision, and Command A Translate. That consolidation is the practical reason most existing Cohere customers will move to it on the next deployment cycle.

What changed versus Command A (March 2025)?

Command A landed in March 2025 as a 111B dense model with 256K context, 23 languages, and a Cohere-source-available license that capped commercial use. Command A+ is a different beast:

Architecture: 111B dense → 218B sparse MoE (25B active). Bigger total capacity, similar serving cost.
Context: 256K → 128K input + 64K output. A deliberate trade: shorter context but much stronger long-output reasoning and tool-use traces.
Languages: 23 → 48, including all official EU languages and improved non-European efficiency.
License: Source-available with revenue caps → Apache 2.0, no caps, no restrictions.
Modalities: Text → text + vision (charts, PDFs, slides, spreadsheets) in one set of weights.
Citations: Optional RAG add-on → native grounded citations in every output.
Hardware: 2x A100/H100 → 2x H100 at W4A4, or 1x B200.

The MoE move is the structural change. Cohere is now spending parameter budget the same way DeepSeek and Qwen do: scale total capacity, keep active-per-token small, and let routing handle specialisation.

What makes Command A+ different from other open models?

There are three things Command A+ does that no other open frontier model does end-to-end today.

1. Native citation grounding. Every generated claim can be tied to an explicit grounding span in the source documents. This is not RAG-on-top; it is trained into the model. For regulated industries (legal, healthcare, finance, public sector) this is the single biggest selling point: you can show your auditor exactly which sentence in which retrieved document supports each output token.

2. Lossless W4A4 quantization at 218B scale. Cohere claims (and benchmarks confirm) imperceptible quality differences between BF16, FP8, and the W4A4 4-bit weight / 4-bit activation variant. That is what compresses the model to fit on 2x H100s. Most other open MoE models in this class either don't ship a 4-bit-activation variant at all or take a measurable benchmark hit when they do.

3. Apache 2.0 with no asterisks. Llama 4 ships under the Llama Community License (revenue cap, naming clauses). Qwen 3.5 ships under the Tongyi Qianwen License (use-case restrictions). DeepSeek V4 is MIT-leaning but still has model-card limitations. Command A+ is the first model at this capability tier under a permissive, OSI-approved license with zero downstream conditions. For sovereign deployments, that matters more than two benchmark points.

How does Command A+ compare to Llama 4 and Qwen 3.5?

Cohere's published numbers position Command A+ at the top of the open-weight pack on agentic and reasoning benchmarks, while staying competitive on raw math. Headline jumps versus Command A Reasoning:

τ²-Bench Telecom (agentic): 37% → 85%
Terminal-Bench Hard (agentic coding): 3% → 25%
AIME 2025 (math): 57% → 90%
Artificial Analysis Intelligence Index: 37

Versus the broader open-weight field:

vs Llama 4 Maverick: Command A+ wins decisively on agentic / tool-use benchmarks and citation grounding. Llama 4 Scout still wins on raw context length (10M tokens vs 128K) for needle-in-haystack tasks.
vs Qwen 3.5: Qwen 3.5 ties or edges out Command A+ on pure GPQA Diamond reasoning (88.4% with only 17B active) and is cheaper to host. Command A+ is materially better on multilingual coverage (48 vs ~29) and is the only one of the two with native citations.
vs DeepSeek V4: DeepSeek V4-Pro still leads on raw code generation (93.5% LiveCodeBench) and on cost-per-token at API providers. Command A+ wins on enterprise features: vision, citations, sovereign deployment story, and Apache 2.0.

If you are picking by use case: raw coding agent → DeepSeek V4. Cheapest hosted inference → Qwen 3.5. Long-context retrieval → Llama 4 Scout. Auditable enterprise RAG / on-prem → Command A+.

One latency note worth flagging: independent measurements put Command A+ in the lowest-latency tier of open MoE models in its weight class, alongside the smaller Qwen 3.5 variants and NVIDIA's Nemotron Nano. That's a direct consequence of 25B active parameters plus W4A4 — for agentic loops that issue many short tool calls, the tokens-per-second matter more than peak benchmark scores.

How do I deploy Command A+?

There are four shipping paths, in increasing order of operational lift.

1. Cohere's hosted API. Easiest for prototyping. The model is available through Cohere's API and a free Hugging Face Space demo. Use this to validate the prompt-shape and the citation output, then move to your own infra.

2. Hugging Face Inference Endpoints. Pick the CohereLabs/command-a-plus-05-2026-w4a4 repo, attach 2x H100s, deploy. Roughly 30 minutes from zero to a private endpoint.

3. vLLM self-host. The recommended pattern for production. Requires vllm >= 0.21.0, transformers, and cohere_melody >= 0.9.0 (Cohere's parser library for tool calls and reasoning spans). Launch with:

vllm serve CohereLabs/command-a-plus-05-2026-w4a4 \
  -tp 2 \
  --tool-call-parser cohere_command4 \
  --reasoning-parser cohere_command4 \
  --enable-auto-tool-choice

Adjust -tp (tensor parallelism) to match your GPU count. For 2x H100s, -tp 2 is the canonical setting.

4. Air-gapped / sovereign. Pull the weights once, mirror to an internal registry, deploy in a VPC, on-prem rack, or fully air-gapped environment. The Apache 2.0 license is what makes this clean: no phone-home, no telemetry, no licensing server.

Command A+ is also listed in Microsoft Foundry's model catalogue for customers who already standardised on Azure AI infrastructure.

What's the licensing?

Apache 2.0. Full stop. You can use, fine-tune, distil, deploy commercially, sell access to, and air-gap Command A+ with no revenue caps, no field-of-use restrictions, and no acceptable-use addenda. Cohere has explicitly framed this as their commitment to sovereign infrastructure: customers running classified, regulated, or critical workloads cannot accept a model whose license can change underneath them. Apache 2.0 closes that risk vector.

The practical impact for builders: Command A+ weights can be fine-tuned on confidential internal data, redistributed inside an organisation, baked into commercial products, or vendored into an on-prem appliance. None of those required a separate license conversation, and none can be revoked.

When should you choose Command A+?

Pick Command A+ when at least two of the following are true:

You are building RAG with auditability requirements — legal, healthcare, finance, public sector — and need citation spans on every claim.
You need true sovereign deployment (VPC, on-prem, air-gapped) and cannot ship data to a hosted API.
You are building a multilingual agent serving 25+ languages, especially across EU official languages.
You want one set of weights for reasoning + vision + tool use + translation rather than maintaining a model zoo.
You need an Apache 2.0 license, not a community / commercial-with-caveats license.

Skip it if you are optimising for the cheapest possible hosted inference (Qwen 3.5 is cheaper), pure coding agent quality (DeepSeek V4 still leads), or massive single-document context windows beyond 128K (Llama 4 Scout's 10M context wins). For most enterprise teams building agents on private data, Command A+ is now the default open-weight choice.

FAQ

Is Command A+ truly open source?

Yes. It is released under the Apache 2.0 license, which is OSI-approved and imposes no revenue caps, naming requirements, or use-case restrictions. This is a stronger open-source posture than Llama 4 (community license) or Qwen 3.5 (Tongyi Qianwen license).

What hardware do I need to run Command A+?

Minimum 2x NVIDIA H100 GPUs at W4A4 4-bit quantization, or a single NVIDIA Blackwell B200. FP8 needs 4x H100 or 2x B200. BF16 needs 8x H100 or 4x B200. Cohere recommends W4A4 for most deployments because benchmark quality is essentially identical to BF16.

How does Command A+ handle citations?

Citations are produced natively, not bolted on. The model generates explicit grounding spans that map each claim in the output to specific text spans in the retrieved source documents. This is what makes Command A+ usable in regulated industries where every claim needs an audit trail.

Can I fine-tune Command A+ on private data?

Yes. The Apache 2.0 license explicitly permits modification, fine-tuning, and redistribution. You can fine-tune on confidential internal data, deploy the fine-tuned weights to your own infrastructure, and use the resulting model commercially without any conversation with Cohere.

How does Command A+ compare to GPT-5.5 or Claude Opus 4.7?

It does not match the absolute frontier of closed models on every benchmark, particularly on the hardest reasoning tasks. But it closes most of the gap to within single-digit benchmark points, and it gives you full weight ownership, sovereign deployment, and Apache 2.0 licensing — none of which closed-weight providers can match.

What languages does Command A+ support?

48 languages, including all official EU languages plus Arabic, Chinese, Japanese, Korean, Hindi, Hebrew, Persian, Turkish, Vietnamese, Indonesian, and more. Cohere specifically improved Arabic dialect matching and non-European-language efficiency in this release.

Where can I download Command A+?

From Cohere Labs on Hugging Face. The W4A4 quantization is at CohereLabs/command-a-plus-05-2026-w4a4. BF16 and FP8 variants are published alongside. Also available via Cohere's API and Microsoft Foundry.