Kimi K2.7 Code: The Complete Guide — Benchmarks, Pricing & How to Use (2026)
Quick answer. Kimi K2.7 Code is Moonshot AI's newest open-weight coding model, which appeared on Hugging Face on June 12, 2026. It's a 1-trillion-parameter Mixture-of-Experts model (32B active, 384 experts) with a 256K-token context window, released under a Modified MIT license. It's specialised for long-horizon, agentic software engineering and uses roughly 30% fewer “thinking” tokens than Kimi K2.6 while scoring higher on Moonshot's coding benchmarks. API pricing is $0.95 / $4.00 per million input/output tokens; the model ID is kimi-k2.7-code.
Moonshot AI's Kimi line has become one of the most-watched open-weight families of 2026, and Kimi K2.7 Code is its newest release — landing on Hugging Face on June 12, 2026, hot on the heels of Kimi K2.6. This is a focused, coding-and-agents release rather than a general model bump: it keeps the trillion-parameter MoE architecture, sharpens long-horizon software-engineering ability, and — notably — cuts reasoning-token usage by about 30%, which directly lowers the cost of every agentic task. This guide covers what's confirmed, the benchmark picture (and its caveats), pricing, how to call it, and how to run it yourself.
Freshness note: this guide was last updated June 12, 2026, the day Kimi K2.7 Code appeared. At the time of writing, Moonshot had not yet published a standalone announcement blog or submitted K2.7 to independent benchmark suites (SWE-bench, GPQA, etc.), so the numbers below come from the official model card. We'll update as third-party benchmarks land.
What is Kimi K2.7 Code?
Kimi K2.7 Code is the coding-specialised variant in Moonshot's K2.7 generation. Architecturally it's a large Mixture-of-Experts (MoE) transformer — only a fraction of its parameters activate per token, which is how a 1-trillion-parameter model stays affordable to serve. The headline specs, straight from the official moonshotai/Kimi-K2.7-Code model card:
| Spec | Kimi K2.7 Code |
|---|---|
| Total parameters | 1T (≈1.1T on disk) |
| Active parameters / token | 32B |
| Experts | 384 (8 selected + 1 shared per token) |
| Layers | 61 (1 dense) |
| Attention | MLA (Multi-head Latent Attention) |
| Context window | 256K (262,144 tokens) |
| Vocabulary | 160K |
| Vision | MoonViT encoder (400M) — image and video input |
| License | Modified MIT (open weights) |
| Recommended engines | vLLM, SGLang, KTransformers |
| API model ID | kimi-k2.7-code |
One important behavioural detail: K2.7 Code forces “thinking” and preserve_thinking on, and you can't turn them off. The model always reasons before answering, and it keeps its full reasoning chain across multi-turn conversations. Moonshot says this “preserve thinking” mode is what boosts performance in coding-agent scenarios where context builds up over many steps.
What's new in Kimi K2.7 vs Kimi K2.6?
K2.7 Code is an iteration on the same family, tuned for real-world, multi-step engineering work. The model card frames it directly: “substantial improvements on real-world long-horizon coding tasks… while improving token efficiency, reducing thinking-token usage by approximately 30% compared with Kimi K2.6.” Two things matter here:
- ~30% fewer thinking tokens. Because thinking is always on and reasoning tokens are billed as output, a 30% reduction is a direct cost cut on every task — not just a quality claim.
- Preserve-thinking across turns. The reasoning chain is retained between turns, which helps coding agents that iterate on the same task without re-deriving context each step.
Moonshot's own benchmark deltas from K2.6 to K2.7 Code (from the model card):
| Benchmark (Moonshot-reported) | Kimi K2.6 | Kimi K2.7 Code |
|---|---|---|
| Kimi Code Bench v2 | 50.9 | 62.0 |
| Program Bench | 48.3 | 53.6 |
| MLS Bench Lite | 26.7 | 35.1 |
| Kimi Claw 24/7 Bench | 42.9 | 46.9 |
| MCP Atlas | 69.4 | 76.0 |
| MCP Mark Verified | 72.8 | 81.1 |
The biggest jump is Kimi Code Bench v2 (+11.1 points), and the MCP benchmarks — which measure tool-calling and Model Context Protocol workflows — show strong gains, consistent with the “Code” agentic focus.
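To make the deltas easy to check, here is a minimal sketch that recomputes them from the table above. The scores are copied from Moonshot's model-card figures as quoted in this guide; `SCORES` and `benchmark_deltas` are illustrative names, not part of any Moonshot tooling.

```python
# Moonshot-reported scores from the table above: (Kimi K2.6, Kimi K2.7 Code)
SCORES = {
    "Kimi Code Bench v2": (50.9, 62.0),
    "Program Bench": (48.3, 53.6),
    "MLS Bench Lite": (26.7, 35.1),
    "Kimi Claw 24/7 Bench": (42.9, 46.9),
    "MCP Atlas": (69.4, 76.0),
    "MCP Mark Verified": (72.8, 81.1),
}


def benchmark_deltas(scores):
    """Return (name, absolute gain in points, relative gain in %) sorted by absolute gain."""
    rows = []
    for name, (old, new) in scores.items():
        gain = new - old
        rows.append((name, round(gain, 1), round(100 * gain / old, 1)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, points, percent in benchmark_deltas(SCORES):
    print(f"{name:<22} +{points:>4} pts  (+{percent}%)")
```

Sorted by absolute points, Kimi Code Bench v2 comes first. In relative terms MLS Bench Lite moves the most (about 31%), because it starts from a much lower base.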
Kimi K2.7 benchmarks: what we actually know
Here's the honest picture, because it matters for how much weight to put on the numbers. Every benchmark published for K2.7 so far is one of Moonshot's own proprietary benchmarks (Kimi Code Bench v2, Program Bench, MLS Bench Lite, Kimi Claw 24/7 Bench, MCP Atlas, MCP Mark Verified). As of June 12, 2026, there are no independent third-party numbers for K2.7 on the standard public suites: SWE-bench Verified, SWE-bench Pro, Terminal-Bench, LiveCodeBench, GPQA Diamond, AIME, or MMLU-Pro.
That means two things. First, treat the scores above as vendor-reported and directional, not independently verified. Second, for a sense of where the family sits, K2.6 — the prior version — posted competitive numbers on standard suites (around 80% on SWE-bench Verified by some reports, though K2.6's own SWE-bench figures were contested across sources), and led on agentic and terminal benchmarks. K2.7 Code is positioned to improve on that for coding specifically. We'll publish a full third-party benchmark table here the moment independent results are available.
Kimi K2.7 vs Claude Opus 4.8, GPT-5.5, and DeepSeek V4
Without standard third-party benchmarks, a precise head-to-head isn't possible yet — so here's the honest, qualitative read based on confirmed specs and Moonshot's reported results:
- vs Claude Opus 4.8 / Claude Fable 5: Anthropic's flagships remain the frontier for raw capability and ship a 1M-token context (vs K2.7's 256K). But K2.7 Code is open-weight and roughly 5× cheaper on input and about 6× cheaper on output ($0.95/$4.00 vs Opus 4.8's $5/$25), and Moonshot reports strong MCP tool-use scores, a category where coding agents live. For cost-sensitive, high-volume agentic coding, K2.7 is compelling; for the hardest single-shot reasoning, the Claude flagships still lead.
- vs GPT-5.5: GPT-5.5 is a closed generalist frontier model; K2.7 is an open, coding-specialised one. Expect GPT-5.5 to lead broad reasoning; K2.7's pitch is open weights, lower cost, and a tuned agentic-coding profile.
- vs DeepSeek V4: the closest comparison, since both are open-weight Chinese MoE models targeting coding and agents. DeepSeek V4-Flash is cheaper still; the choice comes down to your own evals on your codebase. See our Kimi vs DeepSeek V4 comparison for the K2.6-era head-to-head, which we'll refresh for K2.7. For the new-generation matchup, see Kimi K2.7 vs GPT-5.5 vs Claude Opus 4.8.
Kimi K2.7 pricing and cost
Moonshot's official API pricing for kimi-k2.7-code (per 1M tokens):
| Token type | Price / 1M tokens |
|---|---|
| Input (cache miss) | $0.95 |
| Input (cache hit) | $0.19 |
| Output | $4.00 |
The base input/output rates are unchanged from K2.6 ($0.95 / $4.00); automatic context caching bills repeated context at the $0.19 cache-hit price, an 80% discount on input. The real cost story, though, is the ~30% reduction in thinking tokens: because K2.7 always reasons and those reasoning tokens bill as output at $4/1M, using fewer of them per task lowers your effective cost per completed task versus K2.6, even though the sticker price is identical. On a forced-thinking model, token efficiency is the price cut.
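A rough way to see the effect is to price one task under both token profiles. The sketch below uses the prices from the table above; the workload figures (200K input tokens, 150K of them cached, 10K answer tokens, 30K thinking tokens) are invented purely for illustration, and `task_cost` is a hypothetical helper, not a Moonshot API.

```python
# USD per 1M tokens, from the pricing table above
PRICE_INPUT_MISS = 0.95
PRICE_INPUT_HIT = 0.19
PRICE_OUTPUT = 4.00


def task_cost(input_tokens, cached_tokens, answer_tokens, thinking_tokens):
    """Return the USD cost of one task; thinking tokens are billed as output."""
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    output_tokens = answer_tokens + thinking_tokens
    total = (
        uncached_tokens * PRICE_INPUT_MISS
        + cached_tokens * PRICE_INPUT_HIT
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


# Hypothetical workload: same task, with 30% fewer thinking tokens on K2.7 Code
THINKING_K26 = 30_000
THINKING_K27 = round(THINKING_K26 * 0.7)

cost_k26 = task_cost(200_000, 150_000, 10_000, THINKING_K26)
cost_k27 = task_cost(200_000, 150_000, 10_000, THINKING_K27)

print(f"K2.6-style task: ${cost_k26:.3f}")
print(f"K2.7-style task: ${cost_k27:.3f}")
print(f"Saving per task: {1 - cost_k27 / cost_k26:.1%}")
```

In this made-up workload the total bill falls by roughly 15%, not 30%, because input tokens and answer tokens are unchanged. The more a task is dominated by reasoning output, the closer the saving gets to the full 30%.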
How to use Kimi K2.7 (API)
The Moonshot API is OpenAI- and Anthropic-compatible, so you can point existing tooling at it with a base-URL swap. Get a key at platform.moonshot.ai (keys start with sk-).
OpenAI-compatible (Python):
    from openai import OpenAI

    # Standard OpenAI client, pointed at Moonshot's OpenAI-compatible endpoint
    client = OpenAI(
        api_key="sk-...",  # your Moonshot API key
        base_url="https://api.moonshot.ai/v1",
    )

    resp = client.chat.completions.create(
        model="kimi-k2.7-code",
        messages=[{"role": "user", "content": "Refactor this module and add tests."}],
    )
    print(resp.choices[0].message.content)

For coding agents (Claude Code, Cline, Roo Code) via the Anthropic-compatible endpoint, set three environment variables and your agent talks to Kimi instead of Claude:
    export ANTHROPIC_BASE_URL=https://api.moonshot.ai/anthropic
    export ANTHROPIC_MODEL=kimi-k2.7-code
    export ANTHROPIC_API_KEY=sk-...

Cursor has no native Moonshot provider yet; connect it through the OpenAI-compatible base URL, or wait for third-party hosts like OpenRouter to mirror K2.7 (they listed K2.6 within weeks of its launch). You can also try the model family free in the web chat at kimi.com, though the free tier doesn't run the paid flagship.
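If you would rather not export the three agent variables globally, one option is to scope them to a single agent process. The sketch below assumes those three variables are all the agent needs. `kimi_agent_env` and `launch_agent` are hypothetical helper names, and `MOONSHOT_API_KEY` in the final comment is an invented variable name for wherever you keep your key.

```python
import os
import subprocess

KIMI_ANTHROPIC_BASE_URL = "https://api.moonshot.ai/anthropic"
KIMI_MODEL_ID = "kimi-k2.7-code"


def kimi_agent_env(api_key, base_env=None):
    """Return a copy of the environment with the three Kimi overrides applied."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = KIMI_ANTHROPIC_BASE_URL
    env["ANTHROPIC_MODEL"] = KIMI_MODEL_ID
    env["ANTHROPIC_API_KEY"] = api_key
    return env


def launch_agent(command, api_key):
    """Run an agent CLI with the Kimi settings applied to that process only."""
    return subprocess.run(command, env=kimi_agent_env(api_key), check=False)


# Example, not executed here (assumes your agent's CLI is on PATH, e.g. `claude`):
# launch_agent(["claude"], os.environ["MOONSHOT_API_KEY"])
```

Because the overrides live in a copied environment, other terminals and tools keep talking to their usual provider.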
How to run Kimi K2.7 locally
K2.7 Code is open-weight, so you can self-host it — but it's a genuinely large model, so be realistic about hardware. The weights ship with native INT4 quantization, and Moonshot recommends vLLM, SGLang, or KTransformers for serving (a deployment guide is on the Hugging Face repo). Key points:
- It's a 1T-parameter MoE. The official checkpoint is roughly 600GB on disk, which is about what INT4 storage of ~1T parameters implies (16-bit weights would be nearer 2TB); even aggressive community quantizations of the K2.6-class models land around 240GB. This is multi-GPU-server or heavy-RAM-offload territory, not a laptop model.
- INT4 is native, which helps memory, but you'll still want serious VRAM plus large system RAM if you offload experts.
- No official GGUF / Ollama / llama.cpp build for K2.7 yet. Community GGUFs existed for K2.6 (e.g. Unsloth dynamic quants) and will likely follow for K2.7 — but as of June 12, 2026 they're not out. For now, vLLM or SGLang on a GPU server is the path.
If you don't have the hardware, the API is far cheaper than the engineering time to host a trillion-parameter model — self-host mainly when data residency or privacy requires it.
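For a sense of scale, here is a back-of-the-envelope estimate of weight storage at different precisions. It counts weights only and ignores quantization scales, any layers kept at higher precision, the KV cache, and activations, so real requirements are higher. The 80GB GPU size and the 90% usable fraction are assumptions chosen for illustration, and both function names are hypothetical.

```python
import math

TOTAL_PARAMS = 1e12  # ~1T total parameters, from the spec table


def weight_size_gb(num_params, bits_per_param):
    """Approximate weight storage in decimal GB (weights only, no overhead)."""
    return num_params * bits_per_param / 8 / 1e9


def min_gpus_for_weights(size_gb, gpu_memory_gb, usable_fraction=0.9):
    """Lower bound on the GPU count needed just to hold the weights."""
    return math.ceil(size_gb / (gpu_memory_gb * usable_fraction))


print("INT4 weights (GB):  ", weight_size_gb(TOTAL_PARAMS, 4))
print("16-bit weights (GB):", weight_size_gb(TOTAL_PARAMS, 16))
# ~600GB checkpoint on hypothetical 80GB GPUs, weights only
print("GPUs needed, at minimum:", min_gpus_for_weights(600, 80))
```

Even this optimistic lower bound lands on a full multi-GPU node before any memory is set aside for long-context KV cache, which is why offloading experts to system RAM is the usual compromise.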
Strengths and weaknesses
Strengths: open weights under a permissive-ish Modified MIT license; strong, improving agentic-coding and MCP tool-use profile; ~30% better token efficiency than K2.6 (a real cost lever); 256K context; multimodal image/video input; OpenAI- and Anthropic-compatible APIs that drop into existing agents.
Weaknesses (honest): this is the Code variant — there's no general-purpose “Kimi K2.7” or Instruct sibling at launch, so it's tuned for engineering, not broad chat. The 256K context trails the 1M of the Claude flagships. Standard third-party benchmarks aren't published yet, so capability claims rest on Moonshot's own numbers for now. It's heavy to self-host. And forced, non-disableable thinking means you can't run it in a cheap, no-reasoning mode for trivial calls.
Is Kimi K2.7 free, open source, and safe to use?
Open weights, yes — “free” with nuance. The weights are published under a Modified MIT license and you can run them yourself at no licence cost; the hosted API is paid per token. As with earlier Kimi releases, check the licence text for any attribution requirement at very large commercial scale before you build on it.
Privacy / trust. Moonshot AI is a China-based lab, and the hosted API processes your prompts on Moonshot's infrastructure. For sensitive or proprietary source code, review Moonshot's data-handling terms — or use the open weights to self-host, which keeps everything in your environment. For general and open-source work, the hosted API is convenient and inexpensive.
Who should use Kimi K2.7?
Use it when you want a low-cost, open-weight model for high-volume agentic coding and tool-use workflows, you're cost-sensitive about a forced-thinking model (the token efficiency pays off), or you need to self-host a capable coding model for data-residency reasons. Look elsewhere when you need the absolute frontier on hard reasoning (Claude Opus 4.8 / Fable 5, GPT-5.5), a 1M-token context, a proven third-party benchmark record today, or a general-purpose chat model rather than a coding specialist.
FAQ
Is Kimi K2.7 better than Kimi K2.6?
For coding and agentic work, yes on Moonshot's own benchmarks — Kimi Code Bench v2 rises from 50.9 to 62.0, MCP Mark Verified from 72.8 to 81.1, and it uses about 30% fewer thinking tokens than K2.6. It's a coding-focused (“Code”) release, so for general chat the difference is less relevant.
Is Kimi K2.7 free and open source?
The weights are open under a Modified MIT license, so you can download and self-host them at no licence cost. The hosted Moonshot API is paid per token. There's a free web chat at kimi.com, but it doesn't run the paid flagship.
How much does the Kimi K2.7 API cost?
$0.95 per million input tokens (cache miss), $0.19 per million on a cache hit, and $4.00 per million output tokens — the same base rates as K2.6, but with ~30% fewer thinking tokens per task, so effective cost per task is lower.
Can I run Kimi K2.7 locally, and what hardware do I need?
Yes. It's open-weight with native INT4 quantization, served via vLLM, SGLang, or KTransformers. But it's a 1T-parameter MoE (≈600GB for the official checkpoint, ~240GB heavily quantized in the K2.6 class), so it needs a multi-GPU server or heavy RAM offload, not a laptop. No official GGUF/Ollama build exists yet as of June 12, 2026.
What is Kimi K2.7's context window?
256K tokens (262,144). That's large, though smaller than the 1M-token windows on the current Claude flagships.
Does Kimi K2.7 support MCP and tool calling?
Yes. Tool use and the Model Context Protocol are a core focus of the “Code” variant, and it posts solid reported gains on MCP benchmarks (MCP Atlas 69.4 to 76.0, MCP Mark Verified 72.8 to 81.1), though the single largest jump is on Kimi Code Bench v2. It works with coding agents like Claude Code, Cline, and Roo Code via the Anthropic-compatible endpoint.
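As a minimal sketch of what tool calling might look like from your own code, the example below assumes the OpenAI-compatible endpoint accepts the standard `tools` parameter and returns `tool_calls` in the usual OpenAI shape; this guide does not confirm those details, so verify them against Moonshot's API docs. The `count_lines` tool, its schema, and the helper names are all invented for illustration.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling format
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "count_lines",
            "description": "Count the lines in a block of source code.",
            "parameters": {
                "type": "object",
                "properties": {"source": {"type": "string"}},
                "required": ["source"],
            },
        },
    }
]


def count_lines(source):
    """Local implementation of the hypothetical tool."""
    return len(source.splitlines())


LOCAL_TOOLS = {"count_lines": count_lines}


def run_tool_call(name, arguments_json):
    """Execute one requested tool call; arguments arrive as a JSON string."""
    if name not in LOCAL_TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return LOCAL_TOOLS[name](**json.loads(arguments_json))


def ask_with_tools(client, prompt):
    """Send one request with tools attached; `client` is an openai.OpenAI instance."""
    resp = client.chat.completions.create(
        model="kimi-k2.7-code",
        messages=[{"role": "user", "content": prompt}],
        tools=TOOLS,
    )
    message = resp.choices[0].message
    results = []
    for call in message.tool_calls or []:
        output = run_tool_call(call.function.name, call.function.arguments)
        results.append((call.id, output))
    return message, results
```

A full agent loop would append each result to the conversation as a `tool` message and call the model again; that part is omitted to keep the sketch short.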
How do I access Kimi K2.7 and what's the model ID?
Use the Moonshot API at https://api.moonshot.ai/v1 (OpenAI-compatible) or https://api.moonshot.ai/anthropic (Anthropic-compatible), with the model ID kimi-k2.7-code. Get a key at platform.moonshot.ai. The open weights are on Hugging Face at moonshotai/Kimi-K2.7-Code.
Is there a general-purpose (non-coding) Kimi K2.7?
Not at launch. As of June 12, 2026, only Kimi-K2.7-Code is published — there's no separate Instruct, Thinking, or generalist K2.7 sibling yet. For general use, Kimi K2.6 remains the broader model.
New to the Kimi family? Start with our Kimi K2.6 complete guide, then compare the lineup in Kimi vs GPT-5.5 vs Claude Opus 4.8 and Kimi vs DeepSeek V4.