Published: June 16, 2026. We refresh this guide whenever Z.ai ships a GLM-5.2 patch, OpenRouter pricing moves, or a major agent framework changes its integration.
Quick answer. GLM-5.2 is Z.ai's frontier open-weights model, live on OpenRouter as z-ai/glm-5.2. It's a reasoning model focused on coding and agentic tasks. In our hands-on Python test it returned clean, runnable code in ~75 seconds (including 821 reasoning tokens) for $0.0041 on that single call. Nous Research added support in Hermes Agent within days. Source: openrouter.ai, @jun_song, @Teknium.
GLM-5.2 is the model Z.ai dropped without much pre-launch fanfare and that the open-source AI community absorbed within days. It ships as open weights, lands on OpenRouter in roughly the same pricing tier as GLM-4.7, and, per Z.ai's launch claim, brings “significant improvements in coding and agentic tasks” plus “strong long-horizon capabilities.” Nous Research wired it into Hermes Agent the same week. This is the page we point engineering teams to when they need to evaluate, deploy, or migrate to GLM-5.2.
TL;DR — Should you care?
- If you ship code with an LLM in the loop: yes. Z.ai positions coding as the headline improvement over GLM-5.1, and our first-impression test (below) returned clean, runs-as-is Python on the first try.
- If you run an agent loop: very much yes. Z.ai calls out agentic + long-horizon work as the second big jump, and Nous Research adopting it inside Hermes Agent within days is the strongest external signal we have.
- If you self-host: the weights are open. That's the core of the pitch: you don't have to take Z.ai's word for anything, because you can run the model on your own hardware and verify.
- If you're price-sensitive: our one measured non-trivial coding call on OpenRouter cost $0.0041 (912 completion tokens), which puts GLM-5.2 in roughly the same pricing tier as GLM-4.7. It's not the cheapest model on OpenRouter, but on our limited testing the price-to-quality trade-off looks competitive within the open-weights tier.
What's new vs GLM-5.1
The launch claim from Z.ai is short and specific. Three deltas vs GLM-5.1:
- Significant improvements in coding — the headline pitch.
- Significant improvements in agentic tasks — tool use, multi-step planning, the loop you actually run in production.
- Strong long-horizon capabilities — staying coherent across long task chains rather than drifting after a few steps.
Here is the launch announcement verbatim:
“Introducing GLM-5.2: Frontier Intelligence, Open Weights — Significant improvements in coding and agentic tasks. Strong long-horizon capabilities.”
— @jun_song (Z.ai), June 2026 · 1,048 likes · 166 retweets
Z.ai hasn't published a benchmark-by-benchmark comparison versus GLM-5.1 in the launch material, so we're treating those three lines as the canonical claims and verifying them ourselves rather than citing numbers we can't trace to a primary source.
Hands-on first impressions
We pinged GLM-5.2 on OpenRouter (model id z-ai/glm-5.2) with a small but representative coding task — the kind of thing a developer would actually ask:
“Write a 12-line Python function that fetches a webpage with requests, parses it with BeautifulSoup, and extracts all <h2> headings. Include error handling for network failure. No explanation, just the code.”
The response came back in ~75 seconds. Worth flagging up front: GLM-5.2 is a reasoning model, so that latency includes 821 reasoning tokens of internal thinking before any output token streams. If you compare it head-to-head with a non-reasoning model like GLM-4.7-flash, GLM-5.2 will feel slower. That's not a regression; it's the architecture.
What we got back, verbatim and runnable:
```python
import requests
from bs4 import BeautifulSoup

def fetch_h2_headings(url):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        headings = [h2.get_text(strip=True) for h2 in soup.find_all('h2')]
        return headings
    except requests.exceptions.RequestException:
        return None
```
Notes from running it: the code is clean, ships with a sensible 10-second network timeout, and calls raise_for_status() before parsing (the correct order: you don't want BeautifulSoup chewing on a 500 page). It strips whitespace on the extracted text and returns None instead of raising on any requests failure, which covers network errors and, because HTTPError subclasses RequestException, HTTP error statuses too. Callers should treat None (the fetch failed) differently from an empty list (the page loaded but had no <h2> tags). That is what an experienced Python developer would write. It ran first try, no edits.
The call cost $0.0041 for 912 completion tokens, of which 821 (about 90%) were reasoning tokens that don't appear in the visible output. That puts it in roughly the same pricing tier as GLM-4.7. For a one-off small task that's noise; for a high-volume agent loop, budget against the reasoning-token share, because you pay for those tokens whether or not you see them.
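Here is that budgeting as plain arithmetic, a minimal sketch built only on our single measured call (912 completion tokens, 821 of them reasoning, $0.0041 total). It assumes future calls look like this one, which they won't exactly: cost moves with prompt length and with how long the model chooses to reason. The constant and function names are ours.

```python
# Rough budgeting arithmetic based on ONE measured OpenRouter call.
# Assumption: per-call cost and token mix stay close to this sample,
# which will not hold exactly for other prompts.

MEASURED_COST_USD = 0.0041   # total spend for the call
COMPLETION_TOKENS = 912      # includes the hidden reasoning tokens
REASONING_TOKENS = 821       # billed, but not shown in the output


def reasoning_share(completion_tokens: int, reasoning_tokens: int) -> float:
    """Fraction of billed completion tokens spent on hidden reasoning."""
    if completion_tokens <= 0:
        raise ValueError("completion_tokens must be positive")
    return reasoning_tokens / completion_tokens


def estimate_spend(cost_per_call: float, calls: int) -> float:
    """Linear extrapolation of spend for a batch of similar calls."""
    return cost_per_call * calls


if __name__ == "__main__":
    share = reasoning_share(COMPLETION_TOKENS, REASONING_TOKENS)
    visible = COMPLETION_TOKENS - REASONING_TOKENS
    print(f"reasoning share: {share:.0%}")
    print(f"visible output tokens: {visible}")
    for calls in (1_000, 100_000):
        spend = estimate_spend(MEASURED_COST_USD, calls)
        print(f"{calls:>7,} calls ~ ${spend:,.2f}")
```

At this sample's mix, roughly nine of every ten completion tokens you pay for are reasoning, which is the number to watch when an agent loop makes thousands of calls.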
The honest summary: quality felt frontier-class for the task we threw at it, latency is reasoning-model latency, cost is what you'd expect. We'd reach for it again on coding work without hesitation.
Adoption velocity — Hermes Agent integration in days
The external signal worth tracking on any new open-weights model is how fast the open-source agent community absorbs it. For GLM-5.2 the answer was: within days. Teknium of Nous Research posted:
“GLM 5.2 is now available in Hermes Agent from Nous Portal and OpenRouter :)”
— @Teknium (Nous Research), June 2026 · 326 likes · 15 retweets
For a model that just shipped, getting picked up by a serious open-source agent framework like Hermes Agent inside the same week is a kind of validation that benchmarks can't manufacture. We read it as a sign that the Nous Research team expects the model to hold up in their loop.
How to use it
Three practical routes today.
1. OpenRouter (easiest, recommended for first eval)
GLM-5.2 is live on OpenRouter as z-ai/glm-5.2. If you already have an OpenRouter key, this is one curl away:
```shell
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "z-ai/glm-5.2",
    "messages": [
      {"role": "user", "content": "Write a 12-line Python function that fetches a webpage with requests, parses it with BeautifulSoup, and extracts all <h2> headings. Include error handling for network failure. No explanation, just the code."}
    ]
  }'
```
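If you'd rather stay in Python without adding dependencies, the same request can be built with the standard library. This is a minimal sketch mirroring the curl call: the endpoint URL and model id come from this guide, while the function names and the 180-second timeout (chosen to leave headroom over the ~75 s we measured) are our own choices.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL_ID = "z-ai/glm-5.2"


def build_request(prompt: str, api_key: str, model: str = MODEL_ID) -> urllib.request.Request:
    """Build the same POST the curl example sends, without sending it."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 180.0) -> dict:
    """Send the request and decode the JSON response body.

    The generous timeout leaves room for long reasoning before output.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_request("Say hello in one word.", os.environ["OPENROUTER_API_KEY"])
    print(send(req)["choices"][0]["message"]["content"])
```

Splitting request construction from sending keeps the payload easy to inspect and test without spending tokens.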
OpenRouter speaks the OpenAI Chat Completions shape, so anything that already targets the OpenAI SDK works with a base-URL swap and a model-id change. That includes most agent frameworks, IDE plugins, and homegrown loops.
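Because the response follows the Chat Completions shape, code that reads it carries over unchanged between providers. The sketch below pulls the reply text and token counts from a response dict. The `choices[0].message.content` and `usage.prompt_tokens` / `usage.completion_tokens` fields are the standard shape; reading reasoning tokens from `usage.completion_tokens_details.reasoning_tokens` follows OpenAI's schema and is an assumption about what OpenRouter populates for GLM-5.2, so the code falls back to 0 when the field is missing.

```python
from typing import Any, Mapping


def extract_answer(response: Mapping[str, Any]) -> str:
    """Return the assistant text from a Chat Completions-shaped response."""
    return response["choices"][0]["message"]["content"]


def usage_summary(response: Mapping[str, Any]) -> dict:
    """Pull token counts from the `usage` block, defaulting missing fields to 0.

    Assumption: reasoning tokens live under completion_tokens_details,
    as in OpenAI's schema; providers may omit this block.
    """
    usage = response.get("usage") or {}
    details = usage.get("completion_tokens_details") or {}
    return {
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "reasoning_tokens": details.get("reasoning_tokens", 0),
    }


if __name__ == "__main__":
    # Hand-written sample shaped like a Chat Completions response (not real output).
    sample = {
        "choices": [{"message": {"role": "assistant", "content": "import requests ..."}}],
        "usage": {
            "prompt_tokens": 50,  # placeholder value
            "completion_tokens": 912,
            "completion_tokens_details": {"reasoning_tokens": 821},
        },
    }
    print(extract_answer(sample))
    print(usage_summary(sample))
```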
2. Hermes Agent via Nous Portal
If you want a hosted agent loop rather than a raw API and you're already in the Nous Research ecosystem, GLM-5.2 is selectable as the underlying model inside Hermes Agent directly through Nous Portal. Use this route when you want someone else to own the tool-calling scaffolding, retries, and memory layer.
3. Self-host (the open-weights play)
Because the weights ship open, you can pull them down and serve GLM-5.2 yourself. Realistically this is a meaningful infrastructure commitment: a frontier-class reasoning model is not single-GPU friendly, and you'll want the same FP8/FP4 quantization workflow you'd use for any modern open-weights flagship. The win is full control, zero per-token vendor cost, and no rate limit other than your own hardware. For the broader playbook on running models like this, see our self-hosting LLMs guide.
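A first-order way to see why quantization matters for self-hosting: weight memory scales linearly with bits per parameter, so FP8 halves the BF16 footprint and FP4 halves it again. This is a minimal sketch; the 400-billion parameter count is a hypothetical placeholder, since this guide doesn't state GLM-5.2's size, and the estimate covers weights only.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal GB (1e9 bytes).

    Ignores KV cache, activations, runtime overhead, and the small per-block
    scale factors that real FP8/FP4 formats store alongside the weights.
    """
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # HYPOTHETICAL parameter count for illustration only; substitute the
    # number from GLM-5.2's official model card.
    hypothetical_params = 400e9
    for label, bits in (("BF16", 16), ("FP8", 8), ("FP4", 4)):
        gb = weight_memory_gb(hypothetical_params, bits)
        print(f"{label}: ~{gb:,.0f} GB of weights")
```

Serving needs headroom beyond this for the KV cache, which grows with context length and concurrency, so size GPUs above the weights-only figure.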
Where it fits in the open-source LLM landscape
GLM-5.2 lands in the part of the open-weights map that DeepSeek V4, Qwen 3.6, and Llama 4 also occupy (the frontier-class, coding-and-agent-focused tier) rather than the small-and-cheap end where Gemma and the smaller Qwens live. The positioning question that matters is “why pick GLM-5.2 over those others,” and the honest answer is: it depends on your evaluation. The open-source frontier tier is now crowded enough that “just pick one” leaves real quality on the table. We recommend running the same handful of tasks against two or three of these models on OpenRouter, eyeballing the outputs, and tracking cost per task. It's a one-afternoon job, and the differences tend to surface quickly.
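That afternoon eval can be as small as the harness below: every prompt against every model, recording outputs for side-by-side reading plus average cost and latency per task. It's a minimal sketch; the `call(model, prompt) -> (answer, cost_usd)` signature is our own assumption, so wire it to OpenRouter with whatever client you already use. The second model id in the demo is a placeholder, not a real OpenRouter id.

```python
import time
from typing import Callable, Dict, List, Tuple

# Assumed signature: send one prompt to one model id, return (answer_text, cost_usd).
CallFn = Callable[[str, str], Tuple[str, float]]


def compare_models(models: List[str], prompts: List[str], call: CallFn) -> Dict[str, dict]:
    """Run every prompt against every model; record outputs, latency, and cost."""
    if not prompts:
        raise ValueError("need at least one prompt")
    results: Dict[str, dict] = {}
    for model in models:
        outputs: List[str] = []
        total_cost = 0.0
        total_seconds = 0.0
        for prompt in prompts:
            start = time.perf_counter()
            answer, cost = call(model, prompt)
            total_seconds += time.perf_counter() - start
            total_cost += cost
            outputs.append(answer)
        results[model] = {
            "outputs": outputs,  # read these side by side; no automatic scoring
            "cost_per_task": total_cost / len(prompts),
            "seconds_per_task": total_seconds / len(prompts),
        }
    return results


if __name__ == "__main__":
    # Stub for a dry run; replace with a real OpenRouter request.
    def stub_call(model: str, prompt: str) -> Tuple[str, float]:
        return f"[{model}] {prompt}", 0.0

    # "another-vendor/model-id" is a placeholder, not a real OpenRouter id.
    report = compare_models(["z-ai/glm-5.2", "another-vendor/model-id"],
                            ["task one", "task two"], stub_call)
    for model, row in report.items():
        print(model, row["cost_per_task"], round(row["seconds_per_task"], 4))
```

Taking the call function as a parameter keeps the harness independent of any particular SDK and lets you dry-run it before spending tokens.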
Three useful adjacencies:
- The open-source LLM landscape as a whole. If you're trying to make sense of the full open-weights ecosystem — who ships what, when, and under what license — start with our Open-Source LLMs Landscape (2026). GLM-5.2 is one entry in a now-very-busy map.
- DeepSeek V4 for direct comparison. The closest like-for-like is DeepSeek V4 Pro: another frontier-class open-weights reasoning model, also strong on code, also self-hostable. Pricing on V4 Pro is roughly $0.435 / $0.87 per million tokens in/out on the DeepSeek API — cheaper per token than GLM-5.2 on OpenRouter, but you're on a different upstream. See our DeepSeek V4 complete guide.
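To compare providers on equal terms, convert per-million-token rates into a per-call figure. The sketch below applies the DeepSeek V4 Pro rates quoted above to a call the size of our GLM-5.2 test. The 100-token prompt is a hypothetical placeholder, and a different model will likely spend a different number of reasoning tokens on the same task, so treat this as a way to do the math rather than a like-for-like result.

```python
def call_cost(prompt_tokens: int, completion_tokens: int,
              input_per_million: float, output_per_million: float) -> float:
    """USD cost of one call, given per-million-token input/output rates.

    Assumes reasoning tokens are billed at the output rate, so include them
    in completion_tokens.
    """
    return (prompt_tokens * input_per_million
            + completion_tokens * output_per_million) / 1_000_000


if __name__ == "__main__":
    HYPOTHETICAL_PROMPT_TOKENS = 100  # placeholder, not measured
    COMPLETION_TOKENS = 912           # size of our measured GLM-5.2 reply
    cost = call_cost(HYPOTHETICAL_PROMPT_TOKENS, COMPLETION_TOKENS, 0.435, 0.87)
    print(f"DeepSeek V4 Pro rates, same token counts: ${cost:.6f}")
```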
- If you're picking a model for an agent. The selection criteria for coding agents are different from chat — tool-call reliability, multi-turn coherence, refusal behavior. Our AI coding agents complete guide walks through the trade-offs.
FAQ
What is GLM-5.2?
GLM-5.2 is Z.ai's June 2026 frontier open-weights model, positioned as a coding and agentic upgrade over GLM-5.1. It's a reasoning model — it generates internal reasoning tokens before producing visible output — with open weights, listed on OpenRouter as z-ai/glm-5.2.
When should I choose GLM-5.2 over GLM-4.7?
Reach for GLM-5.2 when output quality matters more than latency, especially for non-trivial coding and multi-step agent tasks. GLM-4.7 (and the glm-4.7-flash variant on OpenRouter) is still a defensible pick for high-throughput, latency-sensitive workloads where the reasoning overhead doesn't pay for itself.
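That guidance can be sketched as a small router, assuming you call both models through OpenRouter. The flash model id below is our guess at OpenRouter's naming and should be checked against its model list; the 75-second threshold is only our single measured GLM-5.2 latency, not a published figure.

```python
# "z-ai/glm-5.2" comes from this guide; the flash id is an ASSUMPTION.
QUALITY_MODEL = "z-ai/glm-5.2"
FAST_MODEL = "z-ai/glm-4.7-flash"

# Our one measured GLM-5.2 call took ~75 s end to end; a rough prior, not an SLA.
OBSERVED_GLM52_SECONDS = 75.0


def pick_model(latency_budget_s: float, needs_multistep_reasoning: bool) -> str:
    """Route to GLM-5.2 only when the task needs it and the budget allows it."""
    if needs_multistep_reasoning and latency_budget_s >= OBSERVED_GLM52_SECONDS:
        return QUALITY_MODEL
    return FAST_MODEL


if __name__ == "__main__":
    print(pick_model(latency_budget_s=300, needs_multistep_reasoning=True))
    print(pick_model(latency_budget_s=5, needs_multistep_reasoning=True))
```

In practice you'd replace the fixed threshold with latency percentiles from your own traffic.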
Can I self-host GLM-5.2?
Yes. The weights are open; that's the whole point of Z.ai shipping it as an open-weights release. Practically you'll need GPU capacity sized for a frontier-class reasoning model, the same way you would for DeepSeek V4 Pro or Llama 4 Maverick. Quantization (FP8, FP4) brings the footprint down considerably.
What's the OpenRouter pricing?
Our measured spend was $0.0041 for a 912-completion-token coding response (821 of those being reasoning tokens). That's in line with GLM-4.7's pricing tier on OpenRouter. Check OpenRouter's z-ai/glm-5.2 model page for the current per-million-token rates — they update those over time.
Does GLM-5.2 work with Claude Code or Cursor?
It works with most tools that let you point at an OpenAI-compatible endpoint and pick a model id. OpenRouter exposes GLM-5.2 through the standard Chat Completions API, so a base-URL and model-id swap is generally enough. Cursor has first-class custom-model support. Claude Code is built around Anthropic's Messages API rather than Chat Completions, so a plain OpenAI-compatible endpoint isn't enough on its own; check its current docs for supported gateways.
Is GLM-5.2 good for agents?
Z.ai's launch message calls out agentic improvements specifically, and Nous Research integrating GLM-5.2 into Hermes Agent within days of release is meaningful external validation. We'd still run your own agent's task suite against it before swapping in production, but it earned the benefit of the doubt faster than most new releases this year.
How does it compare to DeepSeek V4?
Both are frontier-class open-weights reasoning models with strong coding pedigrees. DeepSeek V4 Pro is meaningfully cheaper on its native API ($0.435/$0.87 per million in/out) than GLM-5.2 on OpenRouter, and has a longer track record at this point. GLM-5.2 is newer and Z.ai is claiming agentic-task wins specifically. The right move is to run your own evals; both are credible picks.
How does it compare to Qwen 3.6?
Qwen 3.6 and GLM-5.2 occupy adjacent positions in the open-weights frontier tier. Qwen 3.6 has the deeper multilingual story and a bigger family of size variants; GLM-5.2's pitch is the coding + agentic + long-horizon combination. For a code-focused agent loop, we'd test both head-to-head with your actual prompts rather than pick on reputation.
Is it really “frontier”?
That's a marketing word, but the evidence we have (clean code on the first try, fast adoption by Nous Research, open weights so anyone can verify) lines up with the claim. As Z.ai's @jun_song put it bluntly on launch: “If anyone posts that open-source AI is falling behind by more than 3 months, it simply means they're stupid and have no clue what they're talking about.” In our limited testing the gap looks small, and it will vary by task.
Where do I get help integrating it?
If you'd rather not build the integration yourself, Codersera has vetted remote developers shipping LLM integrations every day — OpenRouter, Hermes Agent, self-hosted inference, the lot. Talk to us about hiring.
Want the full picture? Read our Open-Source LLMs Landscape (2026) — the canonical guide to the open-weights ecosystem with every model in this space ranked, compared, and updated quarterly.
Need help evaluating or deploying GLM-5.2?
Codersera connects you with vetted remote developers who ship LLM integrations daily — OpenRouter routing, agent loops, self-hosted inference. Hire a developer or partner with us.