Cohere North Mini Code 1.0: Open 30B Coding Model Guide
Cohere has spent most of its life shipping closed, enterprise-focused models. North Mini Code 1.0 is a notable turn: a compact, fully open-weight coding model built specifically for agentic workflows — the kind that plan, edit files, run tools, and self-correct across a long task rather than answering a single prompt. Here is what it is, how it performs, and how to run it.
What is Cohere North Mini Code 1.0?
North Mini Code 1.0 is a 30-billion-parameter, open-weight coding model released by Cohere in June 2026 under the Apache 2.0 license — the company's first fully open-source model aimed squarely at developers. It uses a sparse Mixture-of-Experts (MoE) design so that only 3B parameters are active per token, giving you the knowledge capacity of a larger model at the inference cost of a small one.
The "North" name ties it to Cohere's North agent platform; the "Mini Code" designation signals its focus: compact enough to self-host, specialized for code and tool use rather than general chat.
What is North Mini Code's architecture?
Under the hood it is a decoder-only Transformer with a sparse MoE feed-forward stack — 128 experts, 8 activated per token. Attention is interleaved, mixing sliding-window (with RoPE) and global attention in a 3:1 ratio to keep long-context inference efficient. Feed-forward blocks use SwiGLU activations with a sigmoid router. Cohere trained it with two-stage cascaded supervised fine-tuning followed by reinforcement learning with verifiable rewards, explicitly optimizing for agentic coding rather than one-shot code completion.
What are North Mini Code's specs and benchmarks?
| Attribute | North Mini Code 1.0 |
|---|---|
| Developer | Cohere |
| Release | June 2026 |
| Total parameters | 30B (sparse MoE) |
| Active parameters | 3B per token |
| Experts | 128 total, 8 active |
| Context window | 256K input |
| Max output | 64K tokens |
| Hardware | Single H100 (FP8) |
| License | Apache 2.0 |
On Cohere's published agentic-coding evaluations, North Mini Code posts the following:
| Benchmark | Score |
|---|---|
| SWE-Bench Verified | 67.6 |
| SWE-Bench Pro | 40.2 |
| Terminal-Bench v2 | 36 |
Those numbers are strong for a model with only 3B active parameters — a SWE-Bench Verified score in the high 60s from something you can serve on one GPU is the whole point of the release. As always, treat vendor-reported benchmarks as a starting point and validate on your own repository before committing.
How do you use North Mini Code 1.0?
You have two paths, matching the open-weight-plus-hosted-API pattern:
- Open weights on Hugging Face — the model card is
CohereLabs/North-Mini-Code-1.0. In FP8 it fits on a single H100, so a self-hosted deployment is realistic for one high-memory GPU. - Cohere Chat V2 API — the hosted endpoint uses the model identifier
north-mini-code-1-0if you would rather not run infrastructure. - Ollama — available as
north-mini-code-1.0for quick local experimentation.
Because it is Apache 2.0, there are no commercial strings attached — you can fine-tune it, ship it inside a product, or wire it into an in-house coding agent. If you are evaluating where it fits alongside other assistants, our complete guide to AI coding agents covers how to slot a model like this into a real agent loop, and the open-source LLMs landscape puts it in context against the rest of the 2026 field.
Who is North Mini Code for?
The sweet spot is teams that want an owned, self-hostable coding model for agentic tasks — code review bots, CI-triage agents, refactoring pipelines, or IDE integrations — without paying per-token to a frontier vendor or sending source code to a third party. The 3B active footprint keeps inference cheap; the 256K context handles large repositories; and Apache 2.0 means you control the deployment end to end.
Frequently asked questions
Is Cohere North Mini Code free to use?
The weights are released under Apache 2.0, so self-hosting is free and commercial use is permitted. Using Cohere's hosted Chat V2 API is metered like any API. Ollama and Hugging Face give you no-cost local access.
How many parameters does North Mini Code have?
30B total parameters in a sparse Mixture-of-Experts design, with only 3B active per token (128 experts, 8 activated).
What hardware do I need to run it?
Cohere states it runs on a single H100 GPU in FP8 precision. That makes it one of the more accessible strong coding models to self-host on a single accelerator.
How does North Mini Code score on SWE-Bench?
Cohere reports 67.6 on SWE-Bench Verified and 40.2 on SWE-Bench Pro, plus 36 on Terminal-Bench v2. These are the vendor-published figures; validate against your own codebase before relying on them.
What context window does it support?
256K tokens of input and up to 64K tokens of output — enough to hold large files or multiple modules in a single agentic session.
Hiring engineers to build with open coding models?
Open, self-hostable models like North Mini Code only pay off when you have engineers who can integrate them into agent loops, evaluate them honestly, and ship. Codersera connects you with vetted remote developers who extend your team and reduce hiring risk. Hire vetted remote developers with Codersera.