GLM-4.5-Air
An efficient 106B‑parameter MoE model with a 128K‑token context window and hybrid reasoning modes, optimized for efficiency while maintaining competitive performance.
About the model
GLM-4.5-Air delivers competitive AI performance with 106B total parameters and 12B activated per token, offering the same 128K context window and hybrid reasoning capabilities as GLM-4.5 but optimized for efficiency. It is well suited to cost-conscious deployments that still require sophisticated AI capabilities.
Benchmarks: GLM-4.5-Air is compared against related open-source and competitor closed-source models on AIME 2025, GPQA Diamond, HLE, LiveCodeBench, MATH500, and SWE-bench Verified.
API usage
Endpoint:
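The endpoint URL is left unspecified above. As a sketch, the request shape below assumes an OpenAI-compatible chat-completions endpoint and the model id `GLM-4.5-Air`; the URL, model id, and the parameter for toggling the hybrid reasoning mode are assumptions to verify against the provider's documentation.

```python
import json

# Placeholder values -- substitute the real endpoint and model id
# from the provider's documentation.
API_URL = "https://api.example.com/v1/chat/completions"  # hypothetical
MODEL_ID = "GLM-4.5-Air"  # assumed model identifier

def build_request(prompt: str, thinking: bool = True) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    if not thinking:
        # Hybrid reasoning: the non-thinking mode is assumed here to be
        # selected via a "thinking" field; check the provider's API docs.
        body["thinking"] = {"type": "disabled"}
    return body

payload = build_request("Summarize the benefits of MoE models.")
print(json.dumps(payload, indent=2))
```

The payload can then be POSTed to the endpoint with any HTTP client, passing the provider's API key as a bearer token.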
Model card
Architecture Overview:
• Compact Mixture-of-Experts design with 106B total parameters and 12B active parameters
• 128K token context window matching full GLM-4.5 capabilities
• Optimized MoE routing with reduced width and increased depth for efficiency
• Grouped-Query Attention with Multi-Token Prediction layer support
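The sparse activation described above (12B of 106B parameters active per token) comes from top-k expert routing: a gate scores all experts but only the best k run for each token. A toy sketch, with made-up expert counts that are not GLM-4.5-Air's actual configuration:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k=2):
    """Select the k highest-scoring experts and renormalize their weights.

    Only the selected experts execute for this token, which is how an MoE
    model can hold 106B parameters yet activate only ~12B per token.
    """
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return list(zip(chosen, weights))

# 8 experts, 2 active per token (toy numbers only)
logits = [0.1, 2.3, -0.5, 1.7, 0.0, -1.2, 0.9, 0.4]
print(top_k_route(logits))  # experts 1 and 3 carry this token
```

The same idea scales to hundreds of experts; the ratio of active to total experts sets the active-parameter count.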
Training Methodology:
• Shared training pipeline with GLM-4.5 using 15T general + 7T code & reasoning tokens
• Specialized post-training for efficiency-performance balance
• Reinforcement learning optimization for agentic task performance
• FP8 and BF16 mixed precision training for accelerated inference
Performance Characteristics:
• Ranked 6th overall with a score of 59.8, demonstrating competitive efficiency
• Strong agentic performance with 69.4 on τ-bench and 76.4 on BFCL-v3
• Solid coding capabilities with 57.6% on SWE-bench Verified
• Optimal efficiency on performance-scale trade-off boundary
Applications & use cases
Enterprise Applications:
• Cost-effective conversational AI for high-volume deployments
• Efficient intelligent agents for standard automation tasks
• Resource-conscious development environments and coding assistance
• Scalable customer support and virtual assistant implementations
Development & Technical:
• Lightweight coding assistance and software development support
• Efficient reasoning for educational and training applications
• Streamlined tool integration for standard agentic workflows
• Multi-language processing for global accessibility requirements
Business Solutions:
• SME and startup-friendly AI integration with competitive performance
• Batch processing and automated content generation at scale
• Mobile and edge deployment scenarios requiring efficiency
• Proof-of-concept and prototyping for AI-powered applications
- Model provider: ZAI
- Type: Chat, Reasoning
- Main use cases: Chat, Function Calling
- Features: Function Calling, JSON Mode
- Deployment: Serverless
- Endpoint:
- Parameters: 106B
- Activated parameters: 12B
- Context length: 128K
- Input price: $0.20 / 1M tokens
- Output price: $1.10 / 1M tokens
- Input modalities: Text
- Output modalities: Text
- Released: July 19, 2025
- Quantization level: FP8
- External link:
- Category: Chat
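At the listed rates ($0.20 per 1M input tokens, $1.10 per 1M output tokens), per-request cost is simple arithmetic:

```python
# Rates from the model card above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.20
OUTPUT_PRICE_PER_M = 1.10

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 2,000-token prompt producing a 500-token reply
print(f"${request_cost(2_000, 500):.6f}")  # $0.000950
```

Note that in thinking mode, reasoning tokens typically bill as output tokens, so reasoning-heavy requests skew toward the higher output rate.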