Zhipu AI's 744B MoE model — SOTA on SWE-Bench Pro, MIT licensed, trained without Nvidia
58.4 · SWE-Bench Pro
$1.00 · Per 1M Input
744B · Parameters
The Problem
Closed APIs, High Cost
Frontier coding models are locked behind closed APIs at $15-25 per million output tokens. No open model matches them on real-world engineering benchmarks, and self-hosting a frontier-class model has not been an option.
Vendor Lock-In · $15-25/M
What Is GLM-5.1
Z.ai's Flagship Open-Weights Model
A text-only LLM from Zhipu AI (Z.ai): a 744B-parameter MoE with 256 experts, 8 active per token (~40B active parameters). 200K context window, 128K max output tokens. MIT-licensed weights on Hugging Face and ModelScope. The first open-weights model to achieve SOTA on SWE-Bench Pro. Designed for sustained 8-hour agentic workflows rather than single-shot generation. Compatible with vLLM, SGLang, Claude Code, and OpenClaw.
Open Weights · MoE 744B · 40B Active · MIT License
Mental Model
Junior Engineer, Senior Output
Think of it as hiring a junior engineer who works 8-hour shifts, costs 1/8th as much as a senior, and scores 94.6% of what the senior scores on coding tests.
The question: does benchmark performance survive production?
Agentic Endurance
8-Hour Autonomous Sessions
A plan-execute-analyze-optimize loop. In a vector-database optimization demo it ran 600+ iterations and 6,000+ tool calls for a 6x performance gain. It also built a complete in-browser Linux desktop environment from scratch.
600+ Iterations · 6,000+ Tool Calls
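The plan-execute-analyze-optimize loop described above can be sketched as a hill-climbing harness: propose a change, measure it with a tool call, keep it only if the score improves. Everything here is illustrative — `toy_benchmark` and `toy_optimizer` are hypothetical stand-ins, not GLM-5.1's actual agent harness.

```python
# Sketch of a plan-execute-analyze-optimize agent loop.
# toy_benchmark / toy_optimizer are hypothetical stand-ins for real tools.

def run_agent_loop(benchmark, optimize_step, max_iters=600):
    """Repeatedly apply a candidate change and keep it only if the
    measured score improves -- the shape of a long agentic session."""
    best_config = {}
    best_score = benchmark(best_config)
    tool_calls = 1                              # each benchmark run = one tool call
    for i in range(max_iters):
        candidate = optimize_step(best_config, i)   # plan + execute
        score = benchmark(candidate)                # analyze
        tool_calls += 1
        if score > best_score:                      # optimize: keep only wins
            best_config, best_score = candidate, score
    return best_score, tool_calls

# Toy stand-ins: the "score" peaks when a config parameter reaches 64.
def toy_benchmark(cfg):
    return -abs(64 - cfg.get("batch", 1))

def toy_optimizer(cfg, i):
    new = dict(cfg)
    new["batch"] = cfg.get("batch", 1) + 1
    return new

score, calls = run_agent_loop(toy_benchmark, toy_optimizer, max_iters=100)
print(score, calls)  # the loop converges well before the iteration budget
```

A real session replaces the toy functions with compiler runs, test suites, and profilers — the loop structure stays the same.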
MoE Architecture
256 Experts, 8 Active
744B total parameters. 256 experts, 8 active per token (~40B active). Asynchronous RL + sparse attention. 200K context window, 128K output tokens.
Sparse MoE · Async RL
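The "8 of 256 experts" figure means a gating network scores all experts per token and runs only the top-8. A minimal sketch of that routing step, using a toy gate (GLM-5.1's actual router internals are not public):

```python
import math

# Toy top-k expert routing, as in sparse MoE layers: score all experts,
# keep the top-k, renormalize their gate weights to sum to 1.

def route(gate_logits, k=8):
    topk = sorted(range(len(gate_logits)),
                  key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in topk]
    total = sum(exps)
    return {i: e / total for i, e in zip(topk, exps)}

logits = [0.0] * 256
logits[3], logits[17] = 2.0, 1.0      # two experts score highly for this token
weights = route(logits, k=8)
print(len(weights))                    # 8 experts active out of 256
print(max(weights, key=weights.get))   # expert 3 carries the largest weight

# Back-of-envelope: expert layers alone contribute 744B * 8/256 ~= 23B
# active parameters; shared components (attention, embeddings) presumably
# account for the rest of the quoted ~40B.
print(round(744 * 8 / 256))
```

Only the routed experts' weights touch compute per token, which is why a 744B model can serve at roughly 40B-model cost.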
Huawei Hardware
Zero Nvidia Dependency
Trained entirely on Ascend 910B chips. First frontier model trained on domestic Chinese hardware. No CUDA, no H100s, no export-controlled components.
Ascend 910B · Domestic Stack
Open Weights
MIT License, Self-Host Ready
Hugging Face + ModelScope. vLLM and SGLang for self-hosting. Compatible with Claude Code and OpenClaw agent frameworks.
MIT · vLLM · SGLang
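Self-hosting would follow vLLM's standard serve pattern. This is a config sketch, not a verified recipe: the Hugging Face model ID and flag values are assumptions, and a 744B MoE needs a multi-GPU node sized to fit ~40B active parameters plus the full expert set.

```shell
# Hypothetical launch command patterned on vLLM's standard CLI.
# Model ID and parallelism settings are assumptions, not confirmed values.
vllm serve zai-org/GLM-5.1 \
  --tensor-parallel-size 8 \
  --max-model-len 200000
```

Once up, the server exposes an OpenAI-compatible endpoint, which is what makes the Claude Code and OpenClaw integrations possible.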
Benchmark Comparison
| Benchmark | GLM-5.1 | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-Bench Pro | 58.4 (SOTA) | 57.3 | 57.7 | 54.2 |
| SWE-Bench Verified | 77.8% | 80.8% | 80.0% | — |
| AIME 2026 | 95.3 | — | 98.7 | — |
| GPQA-Diamond | 86.2 | 91.3 | — | — |
| HLE (w/ tools) | 52.3 | — | — | — |
| Terminal-Bench 2.0 | 63.5 | — | — | — |
| MCP-Atlas | 71.8 | — | — | — |
Pricing Comparison
| Model | Input / 1M | Output / 1M | License | Self-Host |
|---|---|---|---|---|
| GLM-5.1 | $1.00 | $3.20 | MIT | Yes |
| Gemini 3.1 Pro | ~$3.00 | $12.00 | Closed | No |
| GPT-5.4 | ~$5.00 | $15.00 | Closed | No |
| Claude Opus 4.6 | ~$7.50 | $25.00 | Closed | No |
GLM-5.1 output is 3.75x cheaper than Gemini 3.1 Pro, 4.7x cheaper than GPT-5.4, and 7.8x cheaper than Opus 4.6.
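The quoted multiples follow directly from the output prices in the table; a quick check:

```python
# Recomputing the output-price ratios from the pricing table above.
glm_out = 3.20  # GLM-5.1, $ per 1M output tokens

for name, price in [("Gemini 3.1 Pro", 12.00),
                    ("GPT-5.4",        15.00),
                    ("Claude Opus 4.6", 25.00)]:
    print(f"{name}: {price / glm_out:.2f}x more expensive")
```

At agentic-workload volumes (thousands of tool calls per session, mostly output tokens), these ratios dominate total cost.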
Limitations
Where It Falls Short
Text-only: no multimodal support
GPQA-Diamond: 86.2 vs Opus 4.6's 91.3
Kernel optimization: 3.6x speedup vs Claude's 4.2x
Documentation is Chinese-first
SWE-Bench Verified (77.8%) trails the closed leaders
Agentic Workflow — 600+ Iteration Loop
01 — Helicopter · GLM-5.1 Deep Dive · sangampandey.info