GLM-5.1
Refined post-training for coding and agentic engineering workflows
About the model
GLM-5.1 is Z.ai's post-training upgrade to GLM-5, delivering a 28% coding performance improvement through refined reinforcement learning while retaining the same 744B parameter (40B activated) MoE architecture with 200K context. The model supports thinking mode, tool calling, and structured JSON output for agentic engineering workflows, with compatibility across coding agent frameworks.
- 28% — coding improvement over GLM-5 through refined RL post-training
- 744B — total parameters (40B activated), MoE with DeepSeek Sparse Attention
- 200K — context window, with 131K max output tokens
- Refined Coding Performance: 28% improvement over GLM-5 through targeted post-training RL, scoring 45.3 on Z.ai coding evaluation
- Agentic Engineering: Long-horizon execution across frontend, backend, and systems engineering with sustained coherence and minimal human intervention
- Thinking Mode & Tool Calling: Step-by-step reasoning with structured JSON output and function orchestration for production agentic pipelines
- Open-Source Leadership: 77.8% SWE-Bench Verified, 50.4% HLE w/ tools, #1 on Vending Bench 2 among open-source models
| Model | AIME 2025 | GPQA Diamond | HLE | LiveCodeBench | MATH500 | SWE-bench Verified |
|---|---|---|---|---|---|---|
| GLM-5.1 | 92.7% | 86.0% | 50.4% | — | — | 77.8% |
API usage
Endpoint: zai-org/GLM-5.1
Model card
Architecture Overview:
• 744B total parameter MoE architecture with 40B parameters activated per token
• DeepSeek Sparse Attention (DSA) reducing deployment cost while preserving long-context capacity
• 200K context window with 131,072 max output tokens
• Supports thinking mode for step-by-step reasoning before generating final answers
• Tool calling and structured JSON output for agentic workflows
• Compatible with coding agent frameworks including Claude Code, Kilo Code, Cline, and Roo Code
Training Methodology:
• Post-training upgrade to GLM-5 with refined reinforcement learning pipeline targeting coding task distributions
• Same 744B base architecture pre-trained on 28.5T tokens, with enhanced RL post-training for coding performance
• Asynchronous RL infrastructure (slime) enabling fine-grained post-training iterations at scale
Performance Characteristics:
• 28% coding performance improvement over GLM-5 through post-training refinement
• 45.3 on Z.ai coding evaluation benchmark using Claude Code as the testing harness
• Benchmarks carried over from the GLM-5 base: 77.8% SWE-bench Verified, 50.4% HLE with tools, 92.7% AIME 2025, 86.0% GPQA Diamond
• #1 open-source on Vending Bench 2 for long-horizon planning
• Gains across frontend development, backend systems engineering, and long-horizon execution tasks
Prompting
Together AI API Access:
• Access GLM-5.1 via Together AI APIs using the endpoint zai-org/GLM-5.1
• Authenticate using your Together AI API key in request headers
• Supports thinking mode, tool calling, and structured JSON output
• Available on both serverless and dedicated infrastructure
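The request shape above can be sketched against Together's OpenAI-compatible chat completions API. This is a minimal illustration, not official client code: the endpoint URL, payload fields, and response structure follow the common OpenAI-compatible convention and are assumptions here; only the model name `zai-org/GLM-5.1` comes from this page.

```python
import json
import os
import urllib.request

# OpenAI-compatible chat completions endpoint (assumed URL)
API_URL = "https://api.together.xyz/v1/chat/completions"


def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-compatible chat completion payload for GLM-5.1."""
    return {
        "model": "zai-org/GLM-5.1",  # endpoint name from this model card
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send(payload: dict) -> dict:
    """POST the payload, authenticating with the Together AI API key
    from the environment, and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    reply = send(build_request("Write a Python function that reverses a string."))
    print(reply["choices"][0]["message"]["content"])
```

The same payload works unchanged with Together's official Python SDK or any OpenAI-compatible client by pointing the base URL at Together's API.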
Applications & use cases
Agentic Coding:
• Autonomous software development with frontend, backend, and full-stack coverage
• Repository-scale navigation, refactoring, and comprehensive testing
• Long-horizon execution tasks maintaining coherence across multi-step workflows
• Compatible with coding agent frameworks for production development
Systems Engineering:
• Complex system design spanning architecture, implementation, and testing
• Backend refactoring and deep debugging with minimal human intervention
• Multi-step workflows requiring sustained goal alignment and tool coordination
Reasoning & Tool Use:
• Thinking mode for step-by-step reasoning on complex problems
• Tool calling and function orchestration for agentic pipelines
• Structured JSON output for integration into production systems
• Long-context understanding across 200K tokens for large codebases and documentation
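Tool calling in agentic pipelines generally follows the OpenAI-style function-calling format; a hedged sketch of what that looks like in practice is below. The `get_weather` tool is a hypothetical example, and the exact response shape is an assumption based on the common OpenAI-compatible convention rather than anything this page specifies.

```python
import json

# Hypothetical tool definition in the OpenAI-style function-calling format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",  # example tool, not a real API
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def extract_tool_calls(message: dict) -> list:
    """Parse (name, arguments) pairs from an assistant message's tool calls.

    The model returns arguments as a JSON string, so each one is decoded
    into a dict before being handed to the actual tool implementation.
    """
    calls = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls


# Assumed shape of an assistant message that requests a tool call:
assistant_message = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Berlin"}'},
        }
    ],
}
```

An agent loop would pass `WEATHER_TOOL` in the request's `tools` list, run `extract_tool_calls` on each assistant reply, execute the named tools, and feed results back as `tool`-role messages until the model produces a final answer.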
- Model provider: Z.ai
- Type: Reasoning
- Main use cases: Reasoning
- Features: Function Calling, JSON Mode
- Deployment: Serverless, On-Demand Dedicated
- Endpoint: zai-org/GLM-5.1
- Parameters: 744B
- Activated parameters: 40B
- Context length: 200K
- Input price: $1.40 / 1M tokens
- Output price: $4.40 / 1M tokens
- Input modalities: Text
- Output modalities: Text
- Released: March 26, 2026
- Quantization level: FP4
- Category: Chat