GLM-5.1
Refined post-training for coding and agentic engineering workflows
About the model
GLM-5.1 is Z.ai's post-training upgrade to GLM-5, delivering a 28% coding performance improvement through refined reinforcement learning while retaining the same 744B parameter (40B activated) MoE architecture with 200K context. The model supports thinking mode, tool calling, and structured JSON output for agentic engineering workflows, with compatibility across coding agent frameworks.
- 28% — coding improvement over GLM-5 through refined RL post-training
- 744B — total parameters (40B activated), MoE with DeepSeek Sparse Attention
- 200K — context window, with 131K max output tokens
- Refined Coding Performance: 28% improvement over GLM-5 through targeted post-training RL, scoring 45.3 on Z.ai coding evaluation
- Agentic Engineering: Long-horizon execution across frontend, backend, and systems engineering with sustained coherence and minimal human intervention
- Thinking Mode & Tool Calling: Step-by-step reasoning with structured JSON output and function orchestration for production agentic pipelines
- Open-Source Leadership: 77.8% SWE-Bench Verified, 50.4% HLE w/ tools, #1 on Vending Bench 2 among open-source models
| Model | AIME 2025 | GPQA Diamond | HLE | LiveCodeBench | MATH500 | SWE-bench Verified |
|---|---|---|---|---|---|---|
| GLM-5.1 | 92.7% | 86.0% | 50.4% | — | — | 77.8% |
API usage
Endpoint: zai-org/GLM-5.1
Model card
Architecture Overview:
• 744B total parameter MoE architecture with 40B parameters activated per token
• DeepSeek Sparse Attention (DSA) reducing deployment cost while preserving long-context capacity
• 200K context window with 131,072 max output tokens
• Supports thinking mode for step-by-step reasoning before generating final answers
• Tool calling and structured JSON output for agentic workflows
• Compatible with coding agent frameworks including Claude Code, Kilo Code, Cline, and Roo Code
Training Methodology:
• Post-training upgrade to GLM-5 with refined reinforcement learning pipeline targeting coding task distributions
• Same 744B base architecture pre-trained on 28.5T tokens, with enhanced RL post-training for coding performance
• Asynchronous RL infrastructure (slime) enabling fine-grained post-training iterations at scale
Performance Characteristics:
• 28% coding performance improvement over GLM-5 through post-training refinement
• 45.3 on Z.ai coding evaluation benchmark using Claude Code as the testing harness
• Benchmarks carried over from the GLM-5 base: 77.8% SWE-bench Verified, 50.4% HLE with tools, 92.7% AIME 2025, 86.0% GPQA Diamond
• #1 open-source on Vending Bench 2 for long-horizon planning
• Gains across frontend development, backend systems engineering, and long-horizon execution tasks
Prompting
Together AI API Access:
• Access GLM-5.1 via Together AI APIs using the endpoint zai-org/GLM-5.1
• Authenticate using your Together AI API key in request headers
• Supports thinking mode, tool calling, and structured JSON output
• Available on both serverless and dedicated infrastructure
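The request shape above can be sketched against Together's OpenAI-compatible chat completions API. This is a minimal illustration, not official client code: the endpoint URL, payload fields, and response structure follow the common OpenAI-compatible convention and are assumptions here; only the model name `zai-org/GLM-5.1` comes from this page.

```python
import json
import os
import urllib.request

# OpenAI-compatible chat completions endpoint (assumed URL)
API_URL = "https://api.together.xyz/v1/chat/completions"


def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-compatible chat completion payload for GLM-5.1."""
    return {
        "model": "zai-org/GLM-5.1",  # endpoint name from this model card
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send(payload: dict) -> dict:
    """POST the payload, authenticating with the Together AI API key
    from the environment, and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    reply = send(build_request("Write a Python function that reverses a string."))
    print(reply["choices"][0]["message"]["content"])
```

The same payload works unchanged with Together's official Python SDK or any OpenAI-compatible client by pointing the base URL at Together's API.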
Applications & use cases
Agentic Coding:
• Autonomous software development with frontend, backend, and full-stack coverage
• Repository-scale navigation, refactoring, and comprehensive testing
• Long-horizon execution tasks maintaining coherence across multi-step workflows
• Compatible with coding agent frameworks for production development
Systems Engineering:
• Complex system design spanning architecture, implementation, and testing
• Backend refactoring and deep debugging with minimal human intervention
• Multi-step workflows requiring sustained goal alignment and tool coordination
Reasoning & Tool Use:
• Thinking mode for step-by-step reasoning on complex problems
• Tool calling and function orchestration for agentic pipelines
• Structured JSON output for integration into production systems
• Long-context understanding across 200K tokens for large codebases and documentation
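Tool calling in agentic pipelines generally follows the OpenAI-style function-calling format; a hedged sketch of what that looks like in practice is below. The `get_weather` tool is a hypothetical example, and the exact response shape is an assumption based on the common OpenAI-compatible convention rather than anything this page specifies.

```python
import json

# Hypothetical tool definition in the OpenAI-style function-calling format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",  # example tool, not a real API
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def extract_tool_calls(message: dict) -> list:
    """Parse (name, arguments) pairs from an assistant message's tool calls.

    The model returns arguments as a JSON string, so each one is decoded
    into a dict before being handed to the actual tool implementation.
    """
    calls = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls


# Assumed shape of an assistant message that requests a tool call:
assistant_message = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Berlin"}'},
        }
    ],
}
```

An agent loop would pass `WEATHER_TOOL` in the request's `tools` list, run `extract_tool_calls` on each assistant reply, execute the named tools, and feed results back as `tool`-role messages until the model produces a final answer.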
- Model provider: Z.ai
- Type: Reasoning
- Main use cases: Reasoning
- Features: Function Calling, JSON Mode
- Deployment: Serverless, On-Demand Dedicated
- Endpoint: zai-org/GLM-5.1
- Parameters: 744B
- Activated parameters: 40B
- Context length: 200K
- Input price: $1.40 / 1M tokens
- Output price: $4.40 / 1M tokens
- Input modalities: Text
- Output modalities: Text
- Released: March 26, 2026
- Quantization level: FP4
- Category: Chat