Reasoning

GLM-5.1

Refined post-training for coding and agentic engineering workflows

About model

GLM-5.1 is Z.ai's post-training upgrade to GLM-5, delivering a 28% coding performance improvement through refined reinforcement learning while retaining the same 744B parameter (40B activated) MoE architecture with 200K context. The model supports thinking mode, tool calling, and structured JSON output for agentic engineering workflows, with compatibility across coding agent frameworks.

Coding Improvement

28%

Over GLM-5 through refined RL post-training

Total Parameters (40B Activated)

744B

MoE with DeepSeek Sparse Attention

Context Window

200K

With 131K max output tokens

Model key capabilities
  • Refined Coding Performance: 28% improvement over GLM-5 through targeted post-training RL, scoring 45.3 on Z.ai coding evaluation
  • Agentic Engineering: Long-horizon execution across frontend, backend, and systems engineering with sustained coherence and minimal human intervention
  • Thinking Mode & Tool Calling: Step-by-step reasoning with structured JSON output and function orchestration for production agentic pipelines
  • Open-Source Leadership: 77.8% SWE-Bench Verified, 50.4% HLE w/ tools, #1 on Vending Bench 2 among open-source models
Performance benchmarks

GLM-5.1 (open-source) compared with competitor closed-source models:

| Model | AIME 2025 | GPQA Diamond | HLE | LiveCodeBench | MATH500 | SWE-bench Verified |
|---|---|---|---|---|---|---|
| GLM-5.1 | 92.70% | 86.00% | 50.40% | | | 77.80% |
| Claude Opus 4.6 | | 90.5% | 34.2% | | | 78.7% |
| OpenAI o3 | | 83.3% | 24.9% | | 99.2% | 62.3% |
| OpenAI o1 | | 76.8% | | | 96.4% | 48.9% |
| GPT-4o | | 49.2% | 2.7% | 32.3% | 89.3% | 31.0% |

  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    zai-org/GLM-5.1

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "zai-org/GLM-5.1",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
    }'
    
    from together import Together
    
    client = Together()
    
    response = client.chat.completions.create(
      model="zai-org/GLM-5.1",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)
    
    import Together from 'together-ai';
    const together = new Together();
    
    const completion = await together.chat.completions.create({
      model: 'zai-org/GLM-5.1',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
         }
      ],
    });
    
    console.log(completion.choices[0].message.content);
    
  • Model card

    Architecture Overview:
    • 744B total parameter MoE architecture with 40B parameters activated per token
    • DeepSeek Sparse Attention (DSA) reducing deployment cost while preserving long-context capacity
    • 200K context window with 131,072 max output tokens
    • Supports thinking mode for step-by-step reasoning before generating final answers
    • Tool calling and structured JSON output for agentic workflows
    • Compatible with coding agent frameworks including Claude Code, Kilo Code, Cline, and Roo Code

    Training Methodology:
    • Post-training upgrade to GLM-5 with refined reinforcement learning pipeline targeting coding task distributions
    • Same 744B base architecture pre-trained on 28.5T tokens, with enhanced RL post-training for coding performance
    • Asynchronous RL infrastructure (slime) enabling fine-grained post-training iterations at scale

    Performance Characteristics:
    • 28% coding performance improvement over GLM-5 through post-training refinement
    • 45.3 on Z.ai coding evaluation benchmark using Claude Code as the testing harness
    • GLM-5 base benchmarks: 77.8% SWE-Bench Verified, 50.4% HLE w/ tools, 92.7% AIME 2025, 86.0% GPQA-Diamond
    • #1 open-source on Vending Bench 2 for long-horizon planning
    • Gains across frontend development, backend systems engineering, and long-horizon execution tasks

  • Prompting

    Together AI API Access:
    • Access GLM-5.1 via Together AI APIs using the endpoint zai-org/GLM-5.1
    • Authenticate using your Together AI API key in request headers
    • Supports thinking mode, tool calling, and structured JSON output
    • Available on both serverless and dedicated infrastructure
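The structured JSON output mentioned above is requested through the OpenAI-compatible `response_format` field on the chat completions endpoint. A minimal sketch of a request payload is below; the extraction task and schema are illustrative only, and the exact `response_format` field names should be checked against Together AI's current API reference:

```python
import json

# Hypothetical JSON Schema for a structured extraction task (illustrative).
release_notes_schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "breaking_changes": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary", "breaking_changes"],
}

# Request body for POST https://api.together.xyz/v1/chat/completions.
payload = {
    "model": "zai-org/GLM-5.1",
    "messages": [
        {"role": "user", "content": "Summarize these release notes as JSON: ..."}
    ],
    # Ask the model to emit JSON conforming to the schema above.
    "response_format": {"type": "json_object", "schema": release_notes_schema},
}

print(json.dumps(payload, indent=2))
```

With the Python SDK the same payload maps directly to `client.chat.completions.create(**payload)`, after which the response's `message.content` should parse cleanly with `json.loads`.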

  • Applications & use cases

    Agentic Coding:
    • Autonomous software development with frontend, backend, and full-stack coverage
    • Repository-scale navigation, refactoring, and comprehensive testing
    • Long-horizon execution tasks maintaining coherence across multi-step workflows
    • Compatible with coding agent frameworks for production development

    Systems Engineering:
    • Complex system design spanning architecture, implementation, and testing
    • Backend refactoring and deep debugging with minimal human intervention
    • Multi-step workflows requiring sustained goal alignment and tool coordination

    Reasoning & Tool Use:
    • Thinking mode for step-by-step reasoning on complex problems
    • Tool calling and function orchestration for agentic pipelines
    • Structured JSON output for integration into production systems
    • Long-context understanding across 200K tokens for large codebases and documentation
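The tool-calling loop described above follows the OpenAI-compatible pattern: declare function schemas in the `tools` parameter, let the model emit `tool_calls`, execute them locally, and append the results as `role: "tool"` messages. A minimal offline sketch of the dispatch step; the `read_file` tool and its implementation are hypothetical:

```python
import json

# Hypothetical tool schema advertised to the model via the `tools` parameter.
tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the repository.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }
]

# Toy implementation standing in for real tool execution.
TOOL_IMPLS = {"read_file": lambda path: f"<contents of {path}>"}

def dispatch(tool_call: dict) -> dict:
    """Execute one model-emitted tool call and format the result message."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = TOOL_IMPLS[name](**args)
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": result}

# Simulated entry as it would appear in response.choices[0].message.tool_calls.
example_call = {
    "id": "call_0",
    "function": {"name": "read_file", "arguments": '{"path": "src/main.py"}'},
}
print(dispatch(example_call))
```

In a real agent loop, each dispatched result message is appended to `messages` and the conversation is resubmitted until the model returns a final answer with no further `tool_calls`.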

Model details
  • Model provider
    ZAI
  • Type
    Reasoning
  • Main use cases
    Reasoning
  • Features
    Function Calling
    JSON Mode
  • Deployment
    Serverless
    On-Demand Dedicated
  • Parameters
    744B
  • Activated parameters
    40B
  • Context length
    200K
  • Input price
    $1.40 / 1M tokens
  • Output price
    $4.40 / 1M tokens

  • Input modalities
    Text
  • Output modalities
    Text
  • Released
    March 26, 2026
  • Quantization level
    FP4
  • Category
    Chat
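At the listed rates ($1.40 per 1M input tokens, $4.40 per 1M output tokens), per-request cost is simple arithmetic. A small budgeting helper, sketched under the assumption of list prices with no cached-token or batch discounts:

```python
INPUT_USD_PER_M = 1.40   # input price per 1M tokens
OUTPUT_USD_PER_M = 4.40  # output price per 1M tokens

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one GLM-5.1 request at list prices."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# Worst case: full 200K-token input context plus the 131,072-token max output.
print(round(request_cost_usd(200_000, 131_072), 4))  # about 0.8567 USD
```

Output tokens dominate at roughly 3x the input rate, so capping `max_tokens` is the most direct lever for controlling spend in long-horizon agentic runs.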