Kimi K2.7 vs GLM 5.2: Two Chinese Open-Weights Flagships Compared (2026)

Moonshot's Kimi K2.7 Code and Z.ai's freshly released GLM 5.2 are both Chinese open-weights coding flagships. Both shipped in June 2026, and they win on opposite axes. K2.7 leads on MCP tool use and pricing; GLM 5.2 leads on 1M context. We pick per workload.

Published 15 Jun 2026 • Updated 15 Jun 2026 • 10 min read

Quick answer. Both shipped in June 2026 as open-weights coding flagships from Chinese labs. Kimi K2.7 Code (Moonshot) leads on MCP tool use (76.0 MCP Atlas, 81.1 MCP Mark Verified, both vendor-reported), per-token pricing ($0.95 input / $4 output per M tokens, $0.19 for cache-hit input), and a mature track record of inference-engine support. GLM 5.2 (Z.ai) leads on context window (1M vs 256K) and the GLM Coding Plan flat-rate subscription model. Pick K2.7 for MCP-heavy agents; pick GLM 5.2 for repo-scale full-codebase awareness.

For the first time, the most interesting open-weights coding showdown in mid-2026 is between two Chinese flagships shipped in the same month. Moonshot's Kimi K2.7 Code and Z.ai's GLM 5.2 are both genuinely usable, both come under permissive open-weights licenses (GLM 5.2's weights follow a week after launch), and they take very different paths to the same agentic-coding destination. K2.7 bets on MCP tool use, reasoning-token efficiency, and prompt-cache economics. GLM 5.2 bets on a usable 1M-token context window and Coding Plan flat-rate pricing. Here's how they compare where it matters.

Kimi K2.7 vs GLM 5.2: at a glance

Dimension	Kimi K2.7 Code	GLM 5.2
Maker	Moonshot AI (China)	Zhipu Z.ai (China)
Released	June 2026	June 13, 2026
License	Modified MIT (open weights)	MIT (open weights due the week after launch)
Architecture	1T-param MoE (32B active), 384 experts	744B MoE (~40B active), 384 experts
Context window	256K	1,000,000 (usable)
Max output	~64K	131,072
API pricing	$0.95 input / $4.00 output per M (cache-hit input $0.19)	Coding Plan flat-rate sub; standalone API due the week after launch
Multi-modal	Vision (MoonViT 400M) + code	Text + code only
Coding positioning	Agentic + MCP-first + cache-friendly	Agentic + 1M repo-scale

What do we actually know about their coding quality?

Both shipped with vendor benchmarks only. As of mid-June 2026, neither has independent third-party numbers on SWE-bench Verified, SWE-bench Pro, LiveCodeBench, Terminal-Bench, or Aider Polyglot.

Kimi K2.7 — Moonshot's published numbers:

Kimi Code Bench v2: 62.0 (up from 50.9 on K2.6)
Program Bench: 53.6 (up from 48.3)
MLS Bench Lite: 35.1 (up from 26.7)
MCP Atlas: 76.0 (up from 69.4)
MCP Mark Verified: 81.1 (up from 72.8)
~30% fewer thinking tokens vs K2.6 on equivalent tasks

GLM 5.2 — no benchmarks at launch. Zhipu hasn't published any. The parent (GLM 5.1) set the bar high: 58.4 on SWE-Bench Pro (state-of-the-art at the time, narrowly ahead of GPT-5.4 and Claude Opus 4.6), 63.5 on Terminal-Bench 2.0, 68.7 on CyberGym, 70.6 on τ³-Bench. If GLM 5.2 holds those gains while adding a 1M context window, it has the makings of a serious flagship. The honest answer: until independent numbers land, that “if” is load-bearing.

So “which is better at coding” depends on what you trust. K2.7 has vendor benchmarks that show clear gains over K2.6 on MCP tool use. GLM 5.2 inherits GLM 5.1's well-documented strengths on long-horizon agentic + Terminal-Bench-style work, with the new 1M window as the headline addition. Neither has been re-run by third parties at the time of writing.

Is the 1M context window actually useful?

It depends on whether you're doing repo-scale or task-scale work.

For a typical SaaS bug fix (read three files, change two, write a test), 256K is plenty. Kimi K2.7 handles this all day at materially lower per-task cost than GLM 5.2's expected per-token API rates. The 1M window is wasted on this workload.

For an agent that needs to understand a 200-file service before proposing a refactor, the math flips. At ~3K tokens per file, 200 files is 600K tokens — over 2× K2.7's standard window. You either use RAG (which costs context fidelity), prune aggressively (which costs correctness), or use a model that fits the input. GLM 5.2 is now the only credible open-weights answer in that range.
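
The file-count arithmetic above can be turned into a quick pre-flight check. This is a minimal sketch, assuming the article's flat ~3K tokens per file and a hypothetical 20% of the window reserved for instructions, tool output and the reply; real counts depend on the tokenizer and the code itself.

```python
# Rough repo-fit check using the ~3K-tokens-per-file rule of thumb.
# TOKENS_PER_FILE and RESERVE_FRACTION are illustrative assumptions, not vendor figures.
TOKENS_PER_FILE = 3_000
RESERVE_FRACTION = 0.20  # hypothetical headroom for instructions, tool output, and the reply

CONTEXT_WINDOWS = {
    "Kimi K2.7 Code": 256_000,
    "GLM 5.2": 1_000_000,
}


def estimated_repo_tokens(num_files: int, tokens_per_file: int = TOKENS_PER_FILE) -> int:
    """Estimate how many tokens it takes to load num_files into context."""
    return num_files * tokens_per_file


def fits_in_window(num_files: int, window: int, reserve: float = RESERVE_FRACTION) -> bool:
    """True if the estimated repo fits in the usable (non-reserved) part of the window."""
    usable = int(window * (1 - reserve))
    return estimated_repo_tokens(num_files) <= usable


if __name__ == "__main__":
    files = 200
    need = estimated_repo_tokens(files)
    print(f"{files} files ≈ {need:,} tokens")
    for model, window in CONTEXT_WINDOWS.items():
        print(f"{model}: fits={fits_in_window(files, window)} ({need / window:.2f}x window)")
```

For a 500-file monorepo the same rule of thumb gives about 1.5M tokens, so even GLM 5.2 would be loading a relevant subset rather than the whole tree.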

The right question to ask first: does your agent need to think across the entire codebase, or against a focused slice of it? That determines which model fits before pricing even enters the picture.

What do the token economics look like?

K2.7 ships with simple per-token pricing: $0.95 input / $4.00 output per M tokens on Moonshot native, $0.75 / $3.50 on OpenRouter. The killer feature is cache-hit input at $0.19 per M, an 80% discount, so for agents that re-traverse the same codebase repeatedly, input becomes a minor line item.
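
To see how much the cache-hit rate matters, here is a minimal sketch of the effective input price at a given hit ratio, using the Moonshot native list prices quoted above. The hit ratios are illustrative assumptions; what actually counts as a cache hit depends on the provider's caching rules.

```python
# Effective Kimi K2.7 pricing on Moonshot native list prices.
# Cache-hit ratios passed in are illustrative assumptions.
INPUT_PER_M = 0.95         # USD per million uncached input tokens
CACHED_INPUT_PER_M = 0.19  # USD per million cache-hit input tokens
OUTPUT_PER_M = 4.00        # USD per million output tokens


def blended_input_price(cache_hit_ratio: float) -> float:
    """Effective USD per million input tokens for a cache-hit ratio between 0 and 1."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return cache_hit_ratio * CACHED_INPUT_PER_M + (1 - cache_hit_ratio) * INPUT_PER_M


def task_cost(input_tokens: int, output_tokens: int, cache_hit_ratio: float) -> float:
    """USD cost of one task with the given token counts and cache-hit ratio."""
    return (input_tokens * blended_input_price(cache_hit_ratio)
            + output_tokens * OUTPUT_PER_M) / 1_000_000


if __name__ == "__main__":
    for ratio in (0.0, 0.5, 0.9):
        print(f"cache hit {ratio:.0%}: ${blended_input_price(ratio):.3f}/M input")
```

At a 90% hit rate the effective input price falls to roughly $0.27 per M, which is why long agent loops over a stable codebase are where K2.7's pricing shows up most.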

GLM 5.2 ships behind the GLM Coding Plan (Lite / Pro / Max / Team), a flat monthly subscription. A standalone per-token API is due the week after launch. GLM 5.1 runs $1.40 input / $4.40 output per M, so expect something in the $1-2 input / $3-6 output range, meaningfully more than K2.7's native pricing.

The structural difference: K2.7 wins cleanly on pay-per-token economics, while GLM 5.2's flat-rate plan caps monthly spend predictably regardless of usage spikes. For shops with bursty inference (one engineer kicks off a refactor at 3pm and burns 50M tokens), a flat rate can come out cheaper than paying per token. For steady, predictable inference, K2.7's per-token rate plus the cache-hit discount likely wins.
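
A quick way to see which side of that line you are on is to compute the monthly token volume at which a flat plan price equals per-token spend. This is a minimal sketch, assuming K2.7's native list rates, a $30/mo plan price (the Coding Plan Pro tier), and a hypothetical 90/10 input/output mix. It ignores cache hits and any usage caps the Coding Plan may impose.

```python
# Break-even between a flat monthly plan and per-token billing.
# Rates default to K2.7 native list prices; the input/output mix is an assumption.
def per_token_cost(total_tokens: float, input_share: float,
                   input_per_m: float = 0.95, output_per_m: float = 4.00) -> float:
    """USD for total_tokens, split into input and output by input_share."""
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def break_even_tokens(plan_price: float, input_share: float,
                      input_per_m: float = 0.95, output_per_m: float = 4.00) -> float:
    """Monthly tokens at which per-token spend equals the flat plan price."""
    blended_per_m = input_share * input_per_m + (1 - input_share) * output_per_m
    return plan_price / blended_per_m * 1_000_000


if __name__ == "__main__":
    share = 0.9  # hypothetical: agent loops are input-heavy
    tokens = break_even_tokens(30.0, share)
    print(f"Break-even at ~{tokens / 1e6:.1f}M tokens/month")
    print(f"50M-token burst at per-token rates: ${per_token_cost(50e6, share):.2f}")
```

Cache hits lower the blended per-token rate and push the break-even volume up, so a cache-friendly workload tilts the comparison back toward per-token billing.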

How does the MCP tool-use axis compare?

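K2.7 was explicitly tuned against MCP Atlas and MCP Mark Verified, and Moonshot reports gains of 6.6 and 8.3 points over K2.6 on them. Until third parties reproduce those numbers, treat them as strong vendor evidence of real workflow improvements rather than proof. If your agent uses MCP-style multi-server tool orchestration, K2.7 is the better-aimed model.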

GLM 5.2 inherits GLM 5.1's strong tool-use story (Asynchronous Agent RL training across 10,000+ verifiable environments in nine languages), along with its claimed support for sustained 8-hour autonomous coding sessions. Its strength is long-horizon planning across a wide context.

The shapes are different: K2.7 is best for MCP-orchestrated multi-tool loops at moderate context. GLM 5.2 is best for long-horizon agent loops with full-codebase awareness.

Self-hosting: the two paths compared

Both are open-weights releases, but the hardware floor and the maturity timelines differ. K2.7 targets 8×H100-class nodes for serviceable serving; GLM 5.2 at long context realistically needs 8×H200-class memory.

K2.7 weights are available at release. Moonshot's inference engineering is mature; for past Kimi releases, vLLM, TensorRT-LLM, and SGLang support has reliably arrived within 1-2 weeks.

GLM 5.2's MIT-licensed open weights are due the week after the June 13 launch. Engine support follows by 1-2 weeks. Hosted endpoints (Together, Fireworks, DeepInfra, Groq) typically light up Z.ai releases within 7-14 days.

For the broader self-hosting playbook see our self-hosting LLMs guide.

Who should pick Kimi K2.7?

MCP-orchestrated agent stacks. The vendor-reported 76.0 MCP Atlas / 81.1 MCP Mark Verified scores are 6.6 and 8.3 points above K2.6.
Workloads with high prompt-cache reuse. At $0.19 per M, cache-hit input tokens cost 80% less than uncached ones, so re-traversing the same codebase is cheap on input.
Thinking-token-bound runs. Moonshot's claimed ~30% reduction in thinking tokens vs K2.6 compounds over a fleet of agentic runs.
Pay-per-token economics. Flat predictable per-token rates without a subscription floor.
Mixed text + image inputs. MoonViT vision encoder is included.

Who should pick GLM 5.2?

Repo-scale agents on monorepos. 1M-token context is the only credible open-weights answer for full-codebase awareness without RAG.
Shops already on the GLM Coding Plan. The flat-rate predictability + bundled Z.ai tooling is the package.
Long-horizon autonomous coding sessions. 8-hour sustained-execution claims inherited from GLM 5.1.
Teams that want the freshest open-weights research target. MIT license, fresh release, full fine-tuning leverage.

The decision in a line

Repo-scale agent on a 500-file monorepo? GLM 5.2. MCP-orchestrated multi-tool loop on a focused codebase slice? K2.7. Workload in between? Run both side by side on 100 representative tasks; per-task cost and per-task quality gaps usually become visible within the first 10 or so.
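
A side-by-side trial like that mostly needs consistent bookkeeping. This is a minimal sketch, assuming you already have a harness that runs each task against each model and records pass/fail and dollar cost. The `TaskResult` type and any records you feed it are hypothetical placeholders, not results from either model.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    """One task outcome from a hypothetical evaluation harness."""
    model: str
    task_id: str
    passed: bool
    cost_usd: float


def summarize(results: list[TaskResult]) -> dict[str, dict[str, float]]:
    """Per-model task count, pass rate, mean cost per task, and cost per passing task."""
    by_model: dict[str, list[TaskResult]] = {}
    for r in results:
        by_model.setdefault(r.model, []).append(r)
    summary = {}
    for model, rows in by_model.items():
        passes = sum(r.passed for r in rows)
        total_cost = sum(r.cost_usd for r in rows)
        summary[model] = {
            "tasks": len(rows),
            "pass_rate": passes / len(rows),
            "mean_cost": mean(r.cost_usd for r in rows),
            # Cost per solved task penalizes cheap-but-wrong runs.
            "cost_per_pass": total_cost / passes if passes else float("inf"),
        }
    return summary


if __name__ == "__main__":
    # Placeholder records for illustration only; replace with your harness output.
    demo = [
        TaskResult("model-a", "t1", True, 0.12),
        TaskResult("model-a", "t2", False, 0.08),
        TaskResult("model-b", "t1", True, 0.30),
        TaskResult("model-b", "t2", True, 0.26),
    ]
    for model, stats in summarize(demo).items():
        print(model, stats)
```

Comparing cost per passing task, rather than raw cost per task, keeps a cheaper model that fails more often from looking artificially good.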

Post-launch reality (June 15, 2026)

Two days after Z.ai shipped GLM 5.2 on June 13, here is what is actually confirmed vs still pending. We are pulling from the launch announcement, the Hacker News reception thread, vendor docs, and early third-party reviewers.

What is live today on the Coding Plan

GLM 5.2 access ships included on every Coding Plan tier at no extra cost: Lite $10/mo, Pro $30/mo, Max $80/mo, plus seat-based Team pricing. Quarterly billing drops the same tiers to roughly $27 / $81 / $216 per quarter, about 10% off three monthly payments.
Drop-in tool integrations confirmed at launch: Claude Code, Cline, OpenCode, Roo Code, Goose, Crush, OpenClaw, Kilo Code. Most connect through the OpenAI-compatible endpoint; Claude Code, which speaks the Anthropic API, points at Z.ai's Anthropic-compatible endpoint instead (three settings.json changes; nothing custom needed).
Cursor, Continue and Aider are NOT yet wired. Cursor has an open community thread requesting GLM-5 support but no merged work; expect community config repos in the weeks after the open-weights drop.
Two thinking-effort levels exposed: High and Max — no Low/Auto. Thinking adds roughly 30-80% to first-token latency and roughly halves throughput on long runs.
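
For the Claude Code case, the settings.json changes live in the `env` block of `~/.claude/settings.json`. This is a minimal sketch that merges three such entries into existing settings and prints the result for review instead of overwriting the file. The base URL reflects Z.ai's Coding Plan documentation for GLM 5.x as we understand it, and `glm-5.2` is a hypothetical model id, so check both against Z.ai's current docs before use.

```python
import json
from pathlib import Path

# Assumptions: Z.ai's Anthropic-compatible base URL for Coding Plan users,
# and "glm-5.2" as the model id. Verify both against Z.ai's docs.
ZAI_ANTHROPIC_BASE_URL = "https://api.z.ai/api/anthropic"
GLM_MODEL_ID = "glm-5.2"  # hypothetical until Z.ai publishes the id


def with_glm_env(settings: dict, api_key: str) -> dict:
    """Return a copy of Claude Code settings with the GLM endpoint wired into `env`."""
    updated = dict(settings)
    env = dict(updated.get("env", {}))  # keep any existing env entries
    env.update({
        "ANTHROPIC_BASE_URL": ZAI_ANTHROPIC_BASE_URL,
        "ANTHROPIC_AUTH_TOKEN": api_key,
        "ANTHROPIC_MODEL": GLM_MODEL_ID,
    })
    updated["env"] = env
    return updated


if __name__ == "__main__":
    path = Path.home() / ".claude" / "settings.json"
    current = json.loads(path.read_text()) if path.exists() else {}
    # Print for review rather than writing back to disk.
    print(json.dumps(with_glm_env(current, "YOUR_ZAI_API_KEY"), indent=2))
```

Returning a copy keeps unrelated settings and existing env entries intact, which makes it easy to switch back to Anthropic's endpoint later.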

What is still pending (as of June 15)

Standalone per-token API not yet live on open.bigmodel.cn / z.ai/pricing. Z.ai said "next week" on launch day. For sizing, GLM 5.1 standalone runs $1.40 input / $4.40 output per M tokens; expect GLM 5.2 to land near or below that.
MIT-licensed open weights not yet on Hugging Face. Promised "next week" — track huggingface.co/zai-org for the GLM-5.2 repo and a matching GLM-5.2-FP8 companion, mirroring the 5.1 release pattern.
Hosted-provider endpoints (Together, Fireworks, DeepInfra, Groq, OpenRouter) — none list GLM 5.2 yet because the weights are not public. Expect 3-10 day catch-up after the MIT drop based on the GLM 5.1 cadence; Fireworks and DeepInfra were first on 5.1.
chat.z.ai still serves GLM 5.1 in the free chatbot tier; 5.2 chatbot rollout is part of the same "next week" batch.

What independent benchmarks exist

Honest answer: none on the standard suites yet. As of 48 hours post-launch, no third party has published SWE-bench Verified, SWE-bench Pro, LiveCodeBench, Terminal-Bench 2.0, Aider Polyglot, GPQA Diamond, or HumanEval scores specifically for 5.2. Artificial Analysis, vals.ai, lmcouncil.ai and the SWE-bench Pro Leaderboard all show GLM 5.1 as the most recent Zhipu entry. Anyone quoting a SWE-bench number for 5.2 right now is conflating it with 5.1.

What we DO have is a strong GLM 5.1 baseline: 58.4 on SWE-Bench Pro (state-of-the-art at that time, narrowly ahead of GPT-5.4 and Claude Opus 4.6), 63.5 on Terminal-Bench 2.0 standalone (66.5 with Claude Code scaffolding), 68.7 on CyberGym, 70.6 on τ³-Bench, and 71.8 on MCP-Atlas Public Set. If 5.2 holds these gains while extending to 1M context, it is a peer-class flagship; that is the bet community devs are making until the third-party runs land.

Community sentiment after the first 48 hours

The Hacker News reception thread (269+ points, 146 comments within hours) split into two consistent camps:

Positive — "punches above its weight" on UI/design code, code taste, and modern conventions. One commenter described shipping a non-trivial GTK/Rust/Lua app where "GLM wrote 93%." Another flagged 1M context as the upgrade most likely to matter in practice: stop chunking files, just dump the relevant subset.
Cautious — "about six months behind the frontier labs, similar to Opus in January" on architecture-heavy, multi-file reasoning. Run-to-run variance and harness sensitivity (Terminal-Bench swung 40.4% → 48.3% on GLM 5 depending on agent wrapper) are unresolved carry-overs from earlier GLM releases.

The HN top comment captures the practical verdict: "Test it today if you are already on the Coding Plan; do not rebuild your stack around it until third-party benchmarks land next week."

Architecture details that matter for capacity planning

Same architecture family as GLM 5/5.1: 744B total parameters / ~40B active per token, 384 experts, 61 layers with Multi-head Latent Attention, DeepSeek Sparse Attention for the long context, 28.5T pretrain tokens. For self-host capacity planning the practical numbers are:

BF16 weights: ~1.65 TB on disk
FP8 weights: ~800 GB on disk
AWQ/GPTQ INT4: ~370-400 GB on disk (744B parameters × 4 bits ≈ 372 GB before quantization scales)
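Production sweet spot: 8× H200 SXM (1,128 GB HBM) at FP8, which leaves roughly 300 GB for the 1M-token KV cache. 8× H100 80GB (640 GB) cannot hold the ~800 GB FP8 weights at all; it is workable only with INT4 weights or weight offload, and then realistically at ≤128K context with aggressive KV offload.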
vLLM and SGLang already have GLM 5/5.1 recipes that 5.2 will load on the same code paths once the config drops. TensorRT-LLM lags by a few weeks on new architectures.
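
The on-disk figures above can be sanity-checked from the parameter count. This is a minimal sketch, assuming raw weights at the nominal bit width only. Published checkpoints run somewhat larger than these figures (likely extra layers, scales, or unquantized tensors), so treat the computed headroom as an upper bound on what is left for KV cache and activations.

```python
# Weight-memory check for GLM 5.2 self-hosting, using 744B total parameters.
# Raw weight size only: ignores KV cache, activations, quantization scales, and runtime overhead.
TOTAL_PARAMS = 744e9

BITS_PER_PARAM = {"BF16": 16, "FP8": 8, "INT4": 4}

NODES_GB = {
    "8x H200 SXM (141 GB each)": 8 * 141,
    "8x H100 80GB": 8 * 80,
}


def weight_gb(params: float, bits: int) -> float:
    """Raw weight footprint in GB (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


def headroom_gb(node_gb: float, params: float, bits: int) -> float:
    """HBM left after loading weights; a negative value means the weights do not fit."""
    return node_gb - weight_gb(params, bits)


if __name__ == "__main__":
    for node, gb in NODES_GB.items():
        for fmt, bits in BITS_PER_PARAM.items():
            print(f"{node} @ {fmt}: {headroom_gb(gb, TOTAL_PARAMS, bits):+.0f} GB headroom")
```

Even at the raw 744 GB, FP8 weights overflow an 8× H100 node, which is why INT4 or offload is the only route there.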

Legal and compliance notes

The MIT license, when it ships, has no field-of-use restrictions, no MAU threshold, and no acceptable-use clause. The only obligations are the standard copyright-notice + no-warranty boilerplate.
Zhipu has been on the US BIS Entity List since January 15, 2025. Downloading and using MIT-licensed open weights is not a regulated export under current EAR readings, BUT US federal customers and most defense primes will not approve a Chinese-origin model regardless of license — treat as effectively blocked for FedRAMP, DoD, and IC workloads.
EU AI Act: GLM 5.2 is a GPAI model sitting near the systemic-risk presumption threshold of 10^25 training FLOPs (a standard 6 × active-parameters × tokens estimate puts pretraining alone at roughly 7×10^24, before post-training compute). Zhipu has not signed the GPAI Code of Practice and has not published a model card or training-data summary, which leaves the full Article 53 burden on downstream EU deployers. Finance, health and critical-infrastructure use cases need to wait for Annex XI documentation.
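
Because 10^25 FLOPs is the AI Act's presumption threshold for systemic risk, it is worth estimating where GLM 5.2 lands. This is a minimal sketch using the common 6 × N × D approximation for training compute, with N taken as the ~40B active parameters per token and D as the 28.5T pretraining tokens quoted in the architecture section. It is a back-of-envelope estimate, not how a regulator would formally measure compute.

```python
# Back-of-envelope pretraining compute using the common C ≈ 6 · N · D approximation,
# with N = active parameters per token (MoE) and D = training tokens.
EU_SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25  # EU AI Act presumption threshold


def training_flops(active_params: float, tokens: float) -> float:
    """Approximate training FLOPs for one pass over `tokens`."""
    return 6 * active_params * tokens


if __name__ == "__main__":
    flops = training_flops(40e9, 28.5e12)  # ~40B active, 28.5T pretrain tokens
    print(f"Pretraining estimate: {flops:.2e} FLOPs")
    print(f"Share of 1e25 threshold: {flops / EU_SYSTEMIC_RISK_THRESHOLD_FLOPS:.0%}")
```

The estimate leaves out post-training and RL compute, so treat it as a lower bound rather than a classification.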

Bottom line vs Kimi K2.7 Code: different bets, both credible. K2.7 wins on MCP tool use (76.0 MCP Atlas / 81.1 MCP Mark Verified, the strongest vendor-reported open-weight MCP scores so far), cache-hit input pricing ($0.19/M), and immediately available open weights. GLM 5.2 wins on context window (1M vs 256K), the flat-rate Coding Plan for bursty workloads, and the inherited GLM 5.1 SWE-Bench Pro / Terminal-Bench leadership. Repo-scale agent on a monorepo? GLM 5.2. MCP-orchestrated multi-tool loop on a focused codebase slice? K2.7.

FAQ

Is Kimi K2.7 or GLM 5.2 better for coding?

Both shipped in June 2026 with vendor-only benchmarks. K2.7's strongest claims are around MCP tool use and reasoning-token efficiency. GLM 5.2 inherits GLM 5.1's well-documented long-horizon agentic strengths plus the new 1M context window. Until independent third-party benchmarks land for both, the “better” verdict is workload-shape dependent.

Which model has the bigger context window?

GLM 5.2 at 1,000,000 tokens vs K2.7's 256K — roughly 4× larger. For repo-scale agents on large monorepos, GLM 5.2 is the only realistic open-weights option in that range.

Which is cheaper to run?

On per-token API pricing, K2.7 native ($0.95 / $4) is cheaper than GLM 5.2's expected standalone API rates ($1-2 / $3-6, extrapolated from GLM 5.1's $1.40 / $4.40). On flat-rate subscription, the GLM Coding Plan can be materially cheaper for bursty heavy usage.

Can I self-host both models?

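Yes, once GLM 5.2's weights land the week after launch; both use MIT-class licenses. K2.7 targets an 8×H100 node for serviceable serving. GLM 5.2 at full 1M context needs 8×H200-class memory, with 8×H100 limited to quantized weights and ≤128K context.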

Does either support image inputs?

K2.7 has MoonViT (400M vision encoder). GLM 5.2 is text + code only. Z.ai ships multi-modal models separately (the GLM-V family, e.g. GLM-4.5V).