Models / ZAI
Chat
Reasoning

GLM-4.5-Air

An efficient 106B‑parameter MoE model (12B active) with a 128K‑token context window and hybrid reasoning modes, optimized for efficiency while maintaining competitive performance.

About model

GLM-4.5-Air delivers competitive performance with 106B total parameters and 12B activated per token, offering the same 128K context window and hybrid reasoning capabilities as GLM-4.5 while being optimized for efficiency. It is well suited to cost-conscious deployments that still require sophisticated AI capabilities.

Performance benchmarks

GLM-4.5-Air is evaluated on AIME 2025, GPQA Diamond, HLE, LiveCodeBench, MATH500, and SWE-bench Verified, alongside related open-source models and competitor closed-source models including Claude Opus 4.6, OpenAI o3, OpenAI o1, and GPT-4o.

  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    zai-org/GLM-4.5-Air-FP8

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "zai-org/GLM-4.5-Air-FP8",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
      }'
    
    from together import Together
    
    client = Together()
    
    response = client.chat.completions.create(
      model="zai-org/GLM-4.5-Air-FP8",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)
    
    import Together from 'together-ai';
    const together = new Together();
    
    const completion = await together.chat.completions.create({
      model: 'zai-org/GLM-4.5-Air-FP8',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
        }
      ],
    });
    
    console.log(completion.choices[0].message.content);
    
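    The cURL, Python, and TypeScript snippets above all send the same request. As a minimal standard-library sketch (the helper name `build_chat_request` is illustrative, not part of any SDK), the headers and JSON body can be assembled like this and sent with any HTTP client:

```python
import json
import os

API_URL = "https://api.together.xyz/v1/chat/completions"

def build_chat_request(content: str, model: str = "zai-org/GLM-4.5-Air-FP8"):
    """Assemble headers and a JSON body matching the cURL example above."""
    headers = {
        "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
    })
    return headers, body

headers, body = build_chat_request("What are some fun things to do in New York?")
```

    Passing the assembled `headers` and `body` to, for example, `requests.post(API_URL, headers=headers, data=body)` reproduces the cURL call.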
  • Model card

    Architecture Overview:
    • Compact Mixture-of-Experts design with 106B total parameters and 12B active parameters
    • 128K token context window matching full GLM-4.5 capabilities
    • Optimized MoE routing with reduced width and increased depth for efficiency
    • Grouped-Query Attention with Multi-Token Prediction layer support

    Training Methodology:
    • Shared training pipeline with GLM-4.5 using 15T general + 7T code & reasoning tokens
    • Specialized post-training for efficiency-performance balance
    • Reinforcement learning optimization for agentic task performance
    • FP8 and BF16 mixed precision training for accelerated inference

    Performance Characteristics:
    • Ranked 6th overall with a 59.8 average benchmark score, demonstrating competitive efficiency
    • Strong agentic performance with 69.4 on τ-bench and 76.4 on BFCL-v3
    • Solid coding capabilities with 57.6% on SWE-bench Verified
    • Sits on the Pareto frontier of the performance-scale trade-off
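
    The efficiency characteristics above follow from the MoE design: only 12B of the 106B parameters are activated per token. A back-of-the-envelope sketch (the 2 FLOPs per active parameter per token rule of thumb is a common approximation, not a published figure for this model):

```python
TOTAL_PARAMS = 106e9   # total parameters (from the model card)
ACTIVE_PARAMS = 12e9   # parameters activated per token

# Fraction of the network doing work on any given token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Rough decode cost: ~2 FLOPs per active parameter per generated token.
flops_per_token = 2 * ACTIVE_PARAMS

print(f"Active fraction: {active_fraction:.1%}")  # ~11.3%
```

    Roughly 11% of the parameters are exercised per token, which is where the cost and latency savings relative to dense models of similar total size come from.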

  • Applications & use cases

    Enterprise Applications:
    • Cost-effective conversational AI for high-volume deployments
    • Efficient intelligent agents for standard automation tasks
    • Resource-conscious development environments and coding assistance
    • Scalable customer support and virtual assistant implementations

    Development & Technical:
    • Lightweight coding assistance and software development support
    • Efficient reasoning for educational and training applications
    • Streamlined tool integration for standard agentic workflows
    • Multi-language processing for global accessibility requirements

    Business Solutions:
    • SME and startup-friendly AI integration with competitive performance
    • Batch processing and automated content generation at scale
    • Mobile and edge deployment scenarios requiring efficiency
    • Proof-of-concept and prototyping for AI-powered applications
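
    For the batch-processing and high-volume scenarios above, prompts are typically grouped into fixed-size batches and submitted concurrently. A minimal sketch (the helper and batch size are illustrative, not part of the Together SDK):

```python
from typing import Iterable, List

def chunk_prompts(prompts: Iterable[str], batch_size: int = 8) -> List[List[str]]:
    """Split prompts into fixed-size batches for concurrent submission."""
    items = list(prompts)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# 20 prompts with batch_size=8 -> batches of 8, 8, and 4
batches = chunk_prompts([f"Summarize document {i}" for i in range(20)], batch_size=8)
```

    Each batch can then be fanned out with `client.chat.completions.create(...)` calls as in the API usage example, for instance via `concurrent.futures.ThreadPoolExecutor`.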

Related models
  • Model provider
    ZAI
  • Type
    Chat
    Reasoning
  • Main use cases
    Chat
    Function Calling
  • Features
    Function Calling
    JSON Mode
  • Deployment
    Serverless
  • Parameters
    106B
  • Activated parameters
    12B
  • Context length
    128K
  • Input price

    $0.20 / 1M tokens

  • Output price

    $1.10 / 1M tokens

  • Input modalities
    Text
  • Output modalities
    Text
  • Released
    July 19, 2025
  • Quantization level
    FP8
  • Category
    Chat
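
Given the listed prices of $0.20 per 1M input tokens and $1.10 per 1M output tokens, per-request cost is straightforward to estimate. A small sketch (the token counts in the example are illustrative):

```python
INPUT_PRICE_PER_M = 0.20   # USD per 1M input tokens (from the pricing above)
OUTPUT_PRICE_PER_M = 1.10  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from prompt and completion token counts."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token completion
cost = estimate_cost(2_000, 500)
print(f"${cost:.6f}")  # $0.000950
```

At these rates, one million tokens in and one million tokens out together cost $1.30.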