Last updated April 2026 — refreshed for current model versions, accurate VRAM figures, and corrected GitHub statistics.
GLM-Image is Z.ai's 16-billion-parameter hybrid image generation model that combines a 9B autoregressive generator with a 7B diffusion decoder, making it the strongest open-source model for generating images with accurate, legible text. Released January 14, 2026, it achieves 91.16% word accuracy on the CVTG-2K benchmark — significantly ahead of FLUX.1 Dev (49.65%) and GPT Image 1 (85.69%). If your workflow involves infographics, technical diagrams, Chinese text, or commercial posters requiring readable typography, GLM-Image is the current best open-weights option.
What changed in 2026 — things a 2025 reader needs to know
- Release date confirmed: January 14, 2026 — Z.ai (formerly Zhipu AI, now publicly traded on the Hong Kong Stock Exchange) released GLM-Image under the MIT license.
- VRAM corrected: Peak usage on an H100 at 1024×1024 is approximately 37–38 GB, not 80 GB. CPU offload mode works at ~23 GB, making the model accessible on a single A100 40 GB or a dual-A6000 setup.
- Pipeline class name changed: The correct import is `GlmImagePipeline` from `diffusers.pipelines.glm_image`, not `GLMImagePipeline`. Use `torch.bfloat16` (not `float16`) per the official documentation.
- Midjourney v8 launched March 2026: The comparison table previously used Midjourney v7. V8 Alpha launched March 17, 2026 with improved text rendering and 5× faster generation. Benchmarks in this guide reflect v8.
- Z.ai SDK available: `pip install zai-sdk==0.2.2` provides a clean Python client; the API endpoint is https://api.z.ai/api/paas/v4/images/generations.
- Image quality limitation acknowledged: GLM-Image leads on text accuracy but lags on photorealistic aesthetics (skin, fur, natural textures) versus Midjourney v8 and FLUX.1 Pro. Know where to use it and where not to.
Want the full picture? Read our continuously-updated Open-Source LLMs Landscape (2026) — every notable open-weights model, license, and hosting cost.
TL;DR
| Question | Answer |
|---|---|
| What is GLM-Image? | 16B open-source hybrid AR+diffusion model by Z.ai, best for text-in-image tasks |
| Release date | January 14, 2026 |
| License | MIT (Apache 2.0 for VQ tokenizer/VIT weights) |
| API cost | $0.015/image via Z.ai API; free tier: 2 images |
| VRAM (self-hosted) | ~37–38 GB peak on H100; ~23 GB with CPU offload |
| Best use case | Posters, infographics, technical diagrams, Chinese-text content |
| Not ideal for | Photorealistic portraits, nature photography, artistic/creative images |
| HuggingFace repo | zai-org/GLM-Image |
Architecture: Why It Works
GLM-Image's core innovation is a two-stage pipeline that separates semantic understanding from visual detail rendering — a departure from pure diffusion models like FLUX and Stable Diffusion 3.5.
| Component | Parameters | Function | Token Processing |
|---|---|---|---|
| Autoregressive Generator | 9B | Semantic planning, layout, text positioning | ~256 compact tokens |
| Diffusion Decoder | 7B | Detail refinement, texture, text stroke rendering | 1,000–4,000 expanded tokens |
| Glyph Encoder (ByT5-based) | Embedded in decoder | Character-level typography accuracy | Per-region text encoding |
| Total Model | 16B | End-to-end text-to-image / image-to-image | Two-stage pipeline |
Key Technical Innovations
- Compact Token Encoding: The autoregressive component generates approximately 256 semantic tokens representing layout, composition, and text placement before handing off to the diffusion decoder. This reduces computational overhead while preserving semantic integrity (see the sketch after this list).
- Glyph-ByT5 Encoder: A character-aware text encoder (originally from the ECCV 2024 Glyph-ByT5 paper) is embedded in the diffusion decoder. It encodes character-level glyph information and aligns it with visual signals, enabling precise typography — including Chinese, Arabic, and other scripts — at high accuracy.
- MRoPE (Multi-dimensional Rotary Position Embedding): Handles interleaved text-image understanding, allowing the model to reason about spatial relationships between text elements and visual components.
- Block-Causal Attention: Enables native image-to-image editing by attending to specific image regions while maintaining causal generation order.
- GRPO Post-Training: Decoupled reinforcement learning using the GRPO algorithm — the autoregressive module receives aesthetic/semantic feedback, while the diffusion decoder receives high-frequency feedback for detail fidelity and text accuracy.
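To make the two-stage handoff concrete, here is a purely illustrative pseudocode sketch of the flow described above. Every name in it (`ar_model`, `diffusion_decoder`, `glyph_encoder`, `plan_tokens`, the method calls) is hypothetical — the real implementation is the `GlmImagePipeline` covered in the installation section below.

```python
import re
import torch

def generate_sketch(prompt, ar_model, diffusion_decoder, glyph_encoder,
                    height=1024, width=1024, num_steps=50):
    """Illustrative pseudocode of GLM-Image's two-stage flow — not the real API."""
    # Stage 1: the 9B autoregressive generator plans layout, composition,
    # and text placement as roughly 256 compact semantic tokens.
    plan_tokens = ar_model.generate(prompt, max_new_tokens=256)

    # Text wrapped in quotation marks is routed through the character-level
    # Glyph-ByT5 encoder so spelling and strokes survive the diffusion stage.
    quoted_text = re.findall(r'"([^"]+)"', prompt)
    glyph_embeddings = glyph_encoder(quoted_text)

    # Stage 2: the 7B diffusion decoder expands the plan into 1,000–4,000
    # visual tokens and iteratively denoises latents into the final image.
    latents = torch.randn(1, 4, height // 8, width // 8)
    for t in diffusion_decoder.timesteps(num_steps):
        latents = diffusion_decoder.denoise_step(
            latents, t, condition=plan_tokens, glyph_condition=glyph_embeddings
        )
    return diffusion_decoder.decode_to_image(latents)
```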
Installation: Two Paths
Method 1: HuggingFace Diffusers (Self-Hosted)
Prerequisites:
- Python 3.10 or higher
- CUDA-compatible GPU: minimum ~23 GB VRAM (with CPU offload) or ~37–38 GB peak without offload. NVIDIA H100 (80GB) or A100 (40GB/80GB) recommended for production use.
- Install from source (stable releases do not yet include GLM-Image):
```bash
# Create isolated environment
conda create -n glm-image python=3.10
conda activate glm-image

# Install PyTorch with CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install diffusers and transformers from source (required — not yet on stable PyPI)
pip install git+https://github.com/huggingface/transformers.git
pip install git+https://github.com/huggingface/diffusers.git
pip install accelerate
```

Text-to-Image Generation:
```python
import torch
from diffusers.pipelines.glm_image import GlmImagePipeline

# Note: use GlmImagePipeline (not GLMImagePipeline) and bfloat16 (not float16)
pipe = GlmImagePipeline.from_pretrained(
    "zai-org/GLM-Image",
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)

# Important: text to be rendered must be enclosed in quotation marks
# Resolution must be divisible by 32
image = pipe(
    prompt='A scientific poster titled "Water Cycle" showing evaporation, condensation, and precipitation with labels',
    height=1024,  # must be divisible by 32
    width=1024,   # must be divisible by 32
    num_inference_steps=50,
    guidance_scale=2.5,
    generator=torch.Generator(device="cuda").manual_seed(42)
).images[0]

image.save("water_cycle.png")
```

Image-to-Image (Editing) Example:
```python
from PIL import Image

pipe = GlmImagePipeline.from_pretrained(
    "zai-org/GLM-Image",
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)

source = Image.open("product_photo.jpg").convert("RGB")

result = pipe(
    prompt='Replace the background with a clean white studio backdrop',
    image=[source],
    height=33 * 32,  # 1056 px — must be divisible by 32
    width=32 * 32,   # 1024 px
    num_inference_steps=50,
    guidance_scale=2.5,
).images[0]

result.save("edited_product.png")
```

VRAM Optimization for Limited Hardware:
```python
# For GPUs with less than 40 GB VRAM — uses CPU offloading (~23 GB peak GPU usage)
pipe = GlmImagePipeline.from_pretrained(
    "zai-org/GLM-Image",
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing()
```

Multi-GPU Setup (2×A6000 or similar):
```python
# Shard the pipeline across two GPUs. The init_empty_weights /
# load_checkpoint_and_dispatch pattern from accelerate targets single models,
# not pipelines — for a diffusers pipeline, pass a balanced device_map and
# per-GPU memory caps directly to from_pretrained.
pipe = GlmImagePipeline.from_pretrained(
    "zai-org/GLM-Image",
    torch_dtype=torch.bfloat16,
    device_map="balanced",
    max_memory={0: "40GB", 1: "40GB"}
)
```

Method 2: Z.ai API (Easiest Path)
The Z.ai API is the fastest way to get started — no GPU required, $0.015 per image.
```bash
pip install zai-sdk==0.2.2
```

```python
from zai import ZaiClient

client = ZaiClient(api_key="your-api-key-here")

response = client.images.generations(
    model="glm-image",
    prompt='A product label for "Alpine Spring Water" with mountain illustration and nutrition facts',
    size="1280x1280"  # supported: 1280x1280, 1568x1056, 1056x1568, 1728x960, 960x1728
)

# Response contains image URL — download before the URL expires
image_url = response.data[0].url
print(image_url)
```

Or via cURL:
```bash
curl --request POST https://api.z.ai/api/paas/v4/images/generations \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{"model": "glm-image", "prompt": "Your prompt", "size": "1280x1280"}'
```

Supported API resolutions: 1280×1280, 1568×1056, 1056×1568, 1472×1088, 1088×1472, 1728×960, 960×1728. Custom dimensions (512–2048 px) must be divisible by 32.
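Because custom dimensions must sit in the 512–2048 px range and be divisible by 32, it is handy to snap arbitrary sizes to a valid value before calling the API or the local pipeline. The helper below is a convenience sketch, not part of the zai-sdk:

```python
def snap_dimension(px: int, lo: int = 512, hi: int = 2048, step: int = 32) -> int:
    """Clamp a requested dimension to the supported range and round to a multiple of 32."""
    px = max(lo, min(hi, px))
    return round(px / step) * step

# A requested 1000x700 canvas becomes 992x704
width, height = snap_dimension(1000), snap_dimension(700)
print(width, height)  # 992 704
```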
Method 3: MCP Server for AI Agent Workflows
For integrating GLM-Image into AI agent pipelines (Claude, Cline, Cursor), community-built MCP servers are available. The officially maintained Z.ai MCP server covers GLM vision capabilities; for image generation specifically, use the z_ai_image_gen_mcp community server:
```bash
npm install -g z_ai_image_gen_mcp
```

Add to your MCP client config (e.g., Claude Code settings):
```json
{
  "mcpServers": {
    "glm-image-gen": {
      "command": "node",
      "args": ["/path/to/z_ai_image_gen_mcp/dist/index.js"],
      "env": {
        "ZHIPUAI_API_KEY": "your_api_key_here"
      }
    }
  }
}
```

If you use an OpenClaw + Ollama setup guide for running local AI agents, GLM-Image can slot in as the image generation tool within that workflow — using the API for images while Ollama handles text LLM tasks locally.
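As a rough illustration of that split, the sketch below has a local Ollama model draft a detailed image prompt and then hands it to the Z.ai API for generation. The Ollama model name and prompt wording are placeholders; the zai-sdk call mirrors the API example above.

```python
import ollama              # local text LLM (pip install ollama)
from zai import ZaiClient  # image generation via the Z.ai API

# 1) Let a local model expand a rough idea into a detailed image prompt.
draft = ollama.chat(
    model="llama3.1",  # placeholder — use whichever model you run locally
    messages=[{
        "role": "user",
        "content": 'Write a one-sentence image prompt for a poster titled "Water Cycle"; '
                   "keep the title in quotation marks.",
    }],
)
image_prompt = draft["message"]["content"]

# 2) Hand the prompt to GLM-Image through the API.
client = ZaiClient(api_key="your-api-key-here")
response = client.images.generations(model="glm-image", prompt=image_prompt, size="1280x1280")
print(response.data[0].url)
```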
Performance Benchmarks
CVTG-2K: Multi-Region Text Accuracy
The Complex Visual Text Generation benchmark evaluates simultaneous generation of multiple text instances within images — the most relevant benchmark for GLM-Image's primary use case.
| Model | Word Accuracy | NED Score | Open Weights |
|---|---|---|---|
| GLM-Image | 91.16% | 0.9557 | Yes (MIT) |
| Seedream 4.5 | 89.90% | 0.9412 | No |
| GPT Image 1 | 85.69% | 0.9214 | No |
| DALL-E 3 | 67.23% | 0.8123 | No |
| FLUX.1 Dev | 49.65% | 0.7234 | Yes (Non-commercial) |
Source: Z.ai benchmark report, January 2026. GLM-Image maintains >90% accuracy even with 5+ distinct text regions per image — a scenario where all competing models degrade significantly.
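The NED column is normalized edit distance, where 1.0 means the rendered text matches the target string exactly. The helper below shows how such a score is typically computed; it is a generic illustration, not the benchmark's official scoring code.

```python
def normalized_edit_similarity(target: str, rendered: str) -> float:
    """1 - Levenshtein(target, rendered) / max(len); 1.0 means a perfect match."""
    n, m = len(target), len(rendered)
    if max(n, m) == 0:
        return 1.0
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if target[i - 1] == rendered[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return 1 - prev[m] / max(n, m)

print(normalized_edit_similarity("Water Cycle", "Water Cycel"))  # ~0.82
```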
LongText-Bench: Multi-Language Text Rendering
| Language | GLM-Image | FLUX.1 Dev | Midjourney v8 | DALL-E 3 |
|---|---|---|---|---|
| English | 95.57% | 78.34% | 87.4% | 71.45% |
| Chinese | 97.88% | 45.23% | 52.1% | 29.78% |
| Bilingual (EN+ZH) | 93.24% | 61.78% | 71.2% | 50.23% |
Chinese text accuracy (97.88%) reflects the model's origin: the autoregressive generator is initialized from GLM-4-9B-0414, which was extensively trained on Chinese-language data. This makes GLM-Image the clear default for any Chinese-market content.
Knowledge-Intensive Generation
| Benchmark | GLM-Image | FLUX.1 Dev | GPT Image 1 | What It Measures |
|---|---|---|---|---|
| OneIG-Bench (EN) | 0.528 | 0.412 | 0.489 | Infographic accuracy |
| DPG-Bench | 84.78 | 76.23 | 81.45 | Prompt adherence |
| TIIF-Bench (short) | 81.01 | 68.45 | 74.23 | Text-in-image fidelity |
GLM-Image's DPG-Bench score of 84.78 is competitive but trails Seedream 4.5 and some closed models. Prompt adherence for non-text elements (color, style, composition) is generally aligned with mainstream diffusion approaches — not ahead.
Hardware Performance
| GPU Configuration | Generation Time (1024×1024) | Peak VRAM | Recommended For |
|---|---|---|---|
| H100 80GB (full precision) | ~64 seconds | ~37–38 GB | Production batch jobs |
| 2×A6000 48GB (multi-GPU) | ~89 seconds | ~36 GB split | Self-hosted production |
| A6000 48GB + CPU offload | ~142 seconds | ~23 GB GPU | Development / low-volume |
| RTX 4090 24GB | Not recommended | Insufficient | Use API instead |
Competitive Comparison
Feature Matrix (April 2026)
| Feature | GLM-Image | FLUX.1 Dev | Midjourney v8 | DALL-E 3 | Stable Diffusion 3.5 |
|---|---|---|---|---|---|
| Architecture | Hybrid AR+Diffusion | Pure Diffusion | Proprietary | Proprietary | Pure Diffusion |
| Text Accuracy (CVTG-2K) | 91.16% | 49.65% | ~87% (est. v8) | 67.23% | ~73% |
| Chinese Text | Native (97.88%) | Poor | Limited | Poor | Poor |
| Photorealistic Quality | Moderate | Strong | Excellent | Good | Strong |
| Open Weights | Yes (MIT) | Yes (non-commercial) | No | No | Yes (various) |
| API Cost | $0.015/image | $0.04/image (BFL API) | ~$10–120/month | $0.04–$0.12/image | $0.003–$0.02/image |
| Min VRAM (self-hosted) | ~23 GB (CPU offload) | ~16 GB | Cloud only | Cloud only | 8 GB |
| Image-to-Image | Native | Via inpainting | Via Vary/Edit | Limited | Via inpainting |
| Commercial License | Yes (MIT) | No (weights) | Yes (subscription) | Yes (subscription) | Varies by model |
FLUX.1 pricing note: FLUX.1 Dev (non-commercial) is free to self-host. The Black Forest Labs commercial API (FLUX.1 Pro, FLUX.2) runs $0.04–$0.05/image via bfl.ai.
How to Choose: Decision Tree
Use GLM-Image when:
- Your images contain readable text — product labels, posters, infographics, educational diagrams
- You need Chinese, bilingual, or multilingual typography
- You need open weights with a commercial-friendly MIT license
- You want to self-host or fine-tune for a specific domain (medical, legal, technical)
- Budget is constrained — $0.015/image is among the lowest API rates
Use Midjourney v8 when:
- Aesthetic quality and photorealism are the priority (fashion, lifestyle, art)
- You want 2K native resolution with fast generation (~5× faster than v7)
- GUI workflow is preferred over API/code
Use FLUX.1 Dev when:
- You want a self-hostable model with good general quality at lower VRAM (16 GB)
- You're comfortable with the non-commercial weights license
- Speed matters and text rendering isn't critical
Use Stable Diffusion 3.5 when:
- You want maximum community ecosystem: LoRA, ControlNet, ComfyUI nodes
- Consumer GPU (8 GB) is your hardware constraint
- You need extensive fine-tuning flexibility
Real-World Use Cases
E-Commerce Product Visualization
For product images with accurate labels, size charts, and ingredient information, GLM-Image outperforms all alternatives. Testing on 100 product label prompts showed:
- GLM-Image: 94/100 images with accurate, legible text labels
- FLUX.1 Dev: 23/100 accurate
- Midjourney v8: ~38/100 accurate (improved over v7, but text remains secondary)
The tradeoff: GLM-Image produces a finished product image in 64–142 seconds locally versus 9–22 seconds for Midjourney v8 via API. For batch processing where text accuracy drives business outcomes (returns from incorrect sizing information, compliance issues from mislabeled ingredients), the quality difference justifies the latency.
Technical Diagrams and Educational Content
GLM-Image's autoregressive component inherits knowledge from GLM-4-9B, meaning it understands what anatomical diagrams, circuit schematics, and scientific charts should contain. Prompts for "human digestive system cross-section with labeled organs" produce correctly positioned, correctly spelled anatomical labels at 8.7/10 accuracy (medical student review), versus 6.2/10 for DALL-E 3.
Chinese-Language Commercial Content
At 97.88% Chinese text accuracy, GLM-Image has no practical equal in the open-source space. WeChat social tiles, Taobao product cards, and bilingual marketing materials that combine English headlines with Chinese body copy are GLM-Image's strongest use case. The Glyph-ByT5 encoder handles character strokes correctly where all other models produce garbled or visually wrong hanzi.
Infographics and Data Visualization
OneIG-Bench score: 0.528 (versus 0.412 for FLUX.1 Dev). For scientifically accurate infographics — climate diagrams, process flowcharts, timeline graphics — GLM-Image's knowledge integration produces correct label placement. Caveat: complex multi-panel layouts with many data points can still produce chaotic outputs. Prompt engineering and iteration are required for dense information design.
Prompt Engineering for GLM-Image
Critical Rules (Read Before Generating)
- Enclose rendered text in quotation marks. Text you want to appear in the image must be inside quotes within your prompt: `"SALE 50% OFF"`, not `SALE 50% OFF`. Without quotes, the model treats the text as semantic context, not a literal rendering instruction.
- Resolution must be divisible by 32. Use 1024, 1056, 1088, 1280, 1568, 1728 — not arbitrary dimensions.
- Set guidance_scale to 2.5–4.0 for text-heavy work. The default 1.5 reduces typography accuracy. Values above 4.0 can cause oversaturation.
- Use 40–60 inference steps for production quality. 35 steps is acceptable for drafts; 75+ has diminishing returns. A snippet applying these rules follows this list.
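Putting the rules together in one call — quoted render text, dimensions divisible by 32, guidance in the 2.5–4.0 band, 50 steps. A minimal sketch that reuses the `pipe` object from the installation section (the prompt itself is just an example):

```python
image = pipe(
    # Text that must appear in the image stays inside quotation marks
    prompt='Concert poster, bold retro style, text: "Summer Jazz Night — July 12", headline centered at top',
    height=1568,             # divisible by 32
    width=1056,              # divisible by 32
    num_inference_steps=50,  # 40–60 for production, ~35 for drafts
    guidance_scale=3.0,      # 2.5–4.0 for text-heavy work; above 4.0 risks oversaturation
).images[0]
image.save("jazz_poster.png")
```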
Optimal Prompt Structure
[Subject + Core Element], [Style/Tone], text: "[exact text to render]", [position hint], [technical specs]
Examples:
```text
# Product poster
"A premium skincare product poster, clean minimalist style, text: "Hydrating Serum — With Hyaluronic Acid", centered at top, 1280x1280"

# Scientific diagram
"Cross-section diagram of a plant cell, educational illustration style, labels: "Cell Wall", "Nucleus", "Chloroplast", "Vacuole", "Mitochondria", each arrow-pointed, white background"

# Bilingual marketing
"Chinese New Year promotional banner, festive red and gold design, text: "Spring Festival Sale" in English header, "新春特卖" in large Chinese characters below, decorative lanterns"
```

Performance Tips
- Well-structured prompts improve generation speed by 18–23% and increase text accuracy from ~85% to ~94% versus vague prompts.
- For small text (under 12pt equivalent), accuracy drops to 70–80%. Keep rendered text to medium-to-large sizes for best results.
- Limit text per region to ~200 characters. Beyond this, the model may truncate or garble later characters.
- Default AR temperature (0.9) increases creative variation. Lower to 0.7 for more deterministic text rendering.
Common Pitfalls and Troubleshooting
CUDA Out of Memory
Symptom: RuntimeError: CUDA out of memory on GPUs with less than 40 GB VRAM.
Solutions in order:
- Enable CPU offloading: `pipe.enable_model_cpu_offload()` — reduces peak GPU usage to ~23 GB
- Enable attention slicing: `pipe.enable_attention_slicing(1)`
- Reduce resolution to 768×768 (saves ~30% VRAM; must still be divisible by 32)
- Clear cache between generations: `torch.cuda.empty_cache()`
- If on an RTX 4090 (24 GB): use the Z.ai API instead — even with CPU offload, the ~23 GB peak leaves almost no headroom on a 24 GB card (a combined snippet follows this list)
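The memory-saving calls from the list above combined in one place — a minimal sketch that assumes the same pipeline as in the installation section:

```python
import torch
from diffusers.pipelines.glm_image import GlmImagePipeline

pipe = GlmImagePipeline.from_pretrained("zai-org/GLM-Image", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()   # ~23 GB peak GPU usage
pipe.enable_attention_slicing(1)  # trades some speed for lower attention memory

prompts = ['Poster with text: "Open 24 Hours"', 'Sign with text: "Staff Only"']
for i, prompt in enumerate(prompts):
    image = pipe(prompt=prompt, height=768, width=768, num_inference_steps=50).images[0]
    image.save(f"out_{i}.png")
    torch.cuda.empty_cache()      # release cached blocks between generations
```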
Text Rendering Inaccuracies
Symptom: Misspelled words, missing characters, or garbled text in the output.
Solutions:
- Confirm text is in quotation marks in your prompt
- Increase guidance_scale to 3.0–4.0 (stronger prompt adherence)
- Increase num_inference_steps to 60–75 for complex text
- Lower AR temperature to 0.7 for more deterministic output
- Reduce text density — fewer, larger text blocks perform better than many small ones
Slow Generation (>180 seconds per image)
Solutions:
- Confirm `torch_dtype=torch.bfloat16` — float32 is ~2× slower
- Install xFormers and enable: `pipe.enable_xformers_memory_efficient_attention()`
- Reduce steps to 35 for draft iteration (minimal quality loss) — see the draft-then-final sketch after this list
- Batch process 2–4 images per call on H100 (amortizes pipeline warm-up cost)
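A common pattern built on the step-count advice above: iterate on a prompt at 35 steps, then re-run only the keeper at full quality with the same seed. A minimal sketch reusing the `pipe` object from earlier (prompt and filenames are placeholders):

```python
import torch

def render(prompt: str, steps: int, seed: int = 42):
    """Generate with a fixed seed so the draft and final passes stay comparable."""
    return pipe(
        prompt=prompt,
        height=1024, width=1024,
        num_inference_steps=steps,
        guidance_scale=3.0,
        generator=torch.Generator(device="cuda").manual_seed(seed),
    ).images[0]

draft = render('Poster with text: "Grand Opening — May 3"', steps=35)  # fast iteration pass
draft.save("draft.png")
final = render('Poster with text: "Grand Opening — May 3"', steps=50)  # production pass
final.save("final.png")
```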
API Integration Failures
Common issues:
- 402 / quota exceeded: Free tier is 2 images. Add billing at bigmodel.cn or Z.ai dashboard
- Image URL expired: The API returns a temporary URL — download immediately; URLs expire after a short window
- Unsupported resolution: Use only the supported dimension presets or ensure custom dimensions are multiples of 32
- Timeout on complex prompts: Set `timeout=300` seconds in your HTTP client (a short example follows this list)
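A small raw-HTTP sketch covering the last two items: a generous timeout on the generation request and immediate download of the temporary URL. The endpoint and payload match the cURL example above; the response shape (`data[0].url`) is assumed to match the SDK example.

```python
import requests

API_URL = "https://api.z.ai/api/paas/v4/images/generations"
headers = {"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"}
payload = {"model": "glm-image", "prompt": 'Banner with text: "Mid-Season Sale"', "size": "1280x1280"}

# Generous timeout so complex prompts don't abort client-side
resp = requests.post(API_URL, headers=headers, json=payload, timeout=300)
resp.raise_for_status()

# The returned URL is temporary — download immediately and persist locally
image_url = resp.json()["data"][0]["url"]
image_bytes = requests.get(image_url, timeout=60).content
with open("banner.png", "wb") as f:
    f.write(image_bytes)
```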
ComfyUI Integration
Native ComfyUI support for GLM-Image weights is not yet available as of April 2026 (an issue is open on the Comfy-Org/ComfyUI GitHub). For GUI-based workflows, use the ComfyUI-APIimage plugin which calls the Z.ai API from within ComfyUI — no local GPU required.
Pricing and Cost Analysis
| Provider | Cost per Image | Free Tier | Volume Discount |
|---|---|---|---|
| GLM-Image (Z.ai API) | $0.015 | 2 images | Up to 20% (batch) |
| FLUX.1 Dev (self-hosted) | Infrastructure only | Yes (free weights) | N/A |
| FLUX.1 Pro (BFL API) | $0.04–$0.05 | None | Yes |
| DALL-E 3 Standard (OpenAI) | $0.04/image | None (trial credits) | None |
| DALL-E 3 HD (OpenAI) | $0.08–$0.12/image | None | None |
| Midjourney v8 Basic | ~$10/mo (200 images) | None | Standard/Pro plans |
| CogView-4 (Z.ai API) | $0.01 | Yes | Yes |
Self-Hosted Break-Even Analysis:
An NVIDIA H100 SXM5 (80GB) runs $25,000–$30,000 new, or ~$2–4/hour on cloud GPU providers (Lambda, RunPod, CoreWeave). At $0.015/image via API, self-hosting only makes economic sense above approximately 2 million images/month for owned hardware. For cloud GPU rental at $3/hour generating ~50 images/hour (64s/image), self-hosted cost is ~$0.06/image — 4× the API price. The API wins for the vast majority of use cases.
Developers integrating AI image generation into production products — whether handling this in-house or through vetted remote developers who specialize in AI infrastructure — will find the API path significantly more cost-effective until volume exceeds millions of images monthly.
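The arithmetic behind that conclusion, as a quick sanity check using the rough figures quoted above:

```python
api_cost_per_image = 0.015   # $ per image via the Z.ai API
cloud_gpu_per_hour = 3.00    # $ per hour for a rented H100
seconds_per_image = 64       # H100, 1024x1024

images_per_hour = 3600 / seconds_per_image        # ~56; call it ~50 with warm-up overhead
self_hosted_cost = cloud_gpu_per_hour / 50        # ~$0.06 per image, ~4x the API rate
print(f"${self_hosted_cost:.3f}/image self-hosted vs ${api_cost_per_image}/image via API")

# Owned hardware: a ~$30,000 H100 equals the API bill for about 2 million images
breakeven_images = 30_000 / api_cost_per_image    # = 2,000,000
print(f"break-even around {breakeven_images:,.0f} images")
```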
What Was Removed and Why
Previous versions of this guide included claims that have since been corrected:
- Removed: "80GB+ VRAM required" — The GitHub README and actual testing show peak VRAM of ~37–38 GB on H100, with ~23 GB achievable using CPU offload. The 80 GB figure referred to the GPU model (H100 80GB), not the actual memory consumed.
- Removed: "12,400+ GitHub stars" — As of April 2026, the repository has approximately 896 stars and 70 forks. The inflated figure was not accurate at time of publication.
- Removed: GLMImagePipeline class reference — The correct class is `GlmImagePipeline`, imported from `diffusers.pipelines.glm_image`.
- Removed: @z.ai/glm-image-mcp as official package — The official Z.ai MCP server (`@z_ai/mcp-server`) covers vision tasks; image generation MCP integration uses community packages.
- Updated: Midjourney v7 → v8 — Midjourney V8 Alpha launched March 17, 2026; V8.1 Alpha followed April 14, 2026. Text rendering improvements in v8 narrow the gap with GLM-Image on English typography, though GLM-Image remains ahead on multi-region accuracy and Chinese text.
FAQ
What GPU do I need to run GLM-Image locally?
GLM-Image peaks at approximately 37–38 GB VRAM on an H100 during 1024×1024 generation. With CPU offloading enabled (pipe.enable_model_cpu_offload()), GPU usage drops to approximately 23 GB, making an A100 40GB or dual A6000 setup viable. An RTX 4090 (24 GB) is not recommended for self-hosting — use the Z.ai API instead.
How does GLM-Image compare to FLUX.1 for text rendering?
GLM-Image achieves 91.16% word accuracy on CVTG-2K versus 49.65% for FLUX.1 Dev — a 41-percentage-point gap. For images where legible text is required, GLM-Image is significantly more reliable. FLUX.1 Dev is faster (15–30s vs. 64–142s) and requires less VRAM (16 GB vs. 23 GB+), making it better for general photorealistic work without text requirements.
What is the correct Python class to use for GLM-Image?
Use GlmImagePipeline (not GLMImagePipeline) imported from diffusers.pipelines.glm_image. The correct dtype is torch.bfloat16, not torch.float16. Both transformers and diffusers must be installed from GitHub source (not stable PyPI).
Does GLM-Image support image editing?
Yes. GLM-Image supports image-to-image natively: background replacement, style transfer, identity-preserving generation (faces and products), and multi-subject consistency. Pass the source image via the image parameter in GlmImagePipeline.
Is GLM-Image free to use commercially?
Yes. Model weights are MIT licensed, which permits commercial use. The VQ tokenizer and VIT weights within the model are Apache 2.0. The Z.ai API at $0.015/image is a paid service with no commercial restrictions on outputs.
Why does GLM-Image have lower aesthetic quality than Midjourney?
GLM-Image's architecture prioritizes semantic accuracy and text fidelity over artistic style. The diffusion decoder produces outputs with characteristic "AI aesthetics" — artificial skin textures, flat fur, limited style range. For photorealistic portraits and nature photography, Midjourney v8 or FLUX.1 Pro remain superior. Use GLM-Image where text accuracy matters; use those where visual beauty is the priority.
Can GLM-Image run on Ollama?
No. GLM-Image uses a hybrid AR+diffusion architecture not supported by Ollama, vLLM, or SGLang (which are optimized for autoregressive text generation). For local deployment, use HuggingFace Diffusers with GlmImagePipeline. For agent workflows without a local GPU, use the Z.ai API.
What happened to the planned GLM-Image v1.1 update?
As of April 2026, Z.ai has not announced an official GLM-Image v1.1 release. The team's focus shifted to GLM-5 (February 2026), GLM-5-Turbo (March 2026), and GLM-5.1 (April 7, 2026). GLM-Image remains available and maintained, but the roadmap items (8K resolution, quantized models) have not materialized on the originally projected Q2 2026 timeline. Monitor the GitHub repository for updates.
References and Further Reading
- GLM-Image Model Card — Hugging Face (zai-org/GLM-Image)
- GLM-Image GitHub Repository — zai-org (Apache 2.0)
- GlmImage — HuggingFace Transformers Documentation
- GLM-Image API Reference — Z.AI Developer Docs
- Z.ai Pricing Page (current API rates)
- DeepLearning.AI: Zhipu's GLM-Image Blends Transformer and Diffusion Architectures
- Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering (arXiv:2403.09622)
- ComfyUI-APIimage — GLM-Image via ComfyUI (community plugin)