Gemma 4 31B
Multimodal reasoning model with thinking mode and native function calling
About model
Gemma 4 31B is Google DeepMind's 31B-parameter dense multimodal model supporting text and image input with a 256K-token context window. It features a configurable thinking mode for step-by-step reasoning, native function calling for agentic workflows, and pre-training on 140+ languages. The model delivers strong performance across reasoning (89.2% AIME 2025), coding (80.0% LiveCodeBench v6), scientific understanding (84.3% GPQA Diamond), and multimodal tasks (76.9% MMMU Pro), and is released under the Apache 2.0 license.
- 256K context window, with hybrid attention for long-context optimization
- 89.20% AIME 2025, mathematical reasoning without tools
- 80.00% LiveCodeBench v6, code generation and completion
- Multimodal Understanding: Text and image input with variable aspect ratio support, document parsing, OCR, and chart comprehension, scoring 76.9% on MMMU Pro
- Configurable Thinking: Built-in reasoning mode for step-by-step problem solving, scoring 89.2% on AIME 2025 and 84.3% on GPQA Diamond
- Native Function Calling: Structured tool use with JSON mode for agentic workflows, scoring 76.9% on the Tau2 benchmark
- Multilingual & Long Context: Pre-trained on 140+ languages with 256K token context window and hybrid attention architecture
| Model | AIME 2025 | GPQA Diamond | HLE | LiveCodeBench | MATH500 | SWE-bench Verified |
|---|---|---|---|---|---|---|
| Gemma 4 31B | 89.20% | 84.30% | 19.50% | 80.00% | n/a | n/a |
API usage
Endpoint: google/gemma-4-31B-it
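A minimal request sketch in Python, assuming Together AI's OpenAI-compatible chat completions route; the environment variable name, prompt, and parameter values are placeholders, not part of the model card:

```python
import os
import requests

# Minimal chat completion against the Together AI serverless endpoint.
# Assumes the OpenAI-compatible /v1/chat/completions route; the prompt
# and max_tokens value are illustrative.
resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "google/gemma-4-31B-it",
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Explain hybrid attention in two sentences."},
        ],
        "max_tokens": 256,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```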
Model card
Architecture Overview:
• 30.7B parameter dense transformer with hybrid attention (interleaved local sliding window + full global attention; a toy sketch follows this list)
• 256K token context window with proportional RoPE for long-context optimization
• Multimodal: text and image input with variable aspect ratio and resolution support via ~550M parameter vision encoder
• Configurable thinking mode for step-by-step reasoning before generating answers
• Native function calling and structured JSON output for agentic workflows
• Native system prompt support for structured and controllable conversations
• Served with FP8 quantization
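To make the interleaved attention pattern concrete, here is a toy sketch of how such a stack might alternate between local sliding-window and full global attention masks. The window size, layer count, and local-to-global ratio below are illustrative assumptions, not published Gemma 4 values:

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Full global attention: each token attends to all earlier tokens."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Local attention: each token attends only to the most recent `window` tokens."""
    idx = np.arange(seq_len)
    recent = idx[None, :] > idx[:, None] - window
    return causal_mask(seq_len) & recent

def layer_masks(n_layers: int, seq_len: int, window: int, global_every: int):
    """Interleave local and global layers, one global layer per `global_every`
    layers; this ratio is a placeholder, not the model's actual schedule."""
    for layer in range(n_layers):
        if (layer + 1) % global_every == 0:
            yield "global", causal_mask(seq_len)
        else:
            yield "local", sliding_window_mask(seq_len, window)

# Toy configuration: 12 layers, 16-token sequence, window of 4,
# one global layer after every 3 local layers.
for i, (kind, mask) in enumerate(layer_masks(12, 16, window=4, global_every=4)):
    print(f"layer {i:2d}: {kind:6s} attends to {mask.sum()} positions")
```

The intuition behind the design: local layers keep the attention cost and KV cache small, while the periodic global layers preserve full-context access, which is what makes the 256K window tractable.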
Training Methodology:
• Pre-trained on 140+ languages with out-of-the-box support for 35+ languages
• Instruction-tuned variant with thinking mode, function calling, and multimodal capabilities
• Hybrid attention design with unified keys and values on global layers for memory-efficient long-context processing
Performance Characteristics:
• 89.2% AIME 2025 (no tools) for mathematical reasoning
• 80.0% LiveCodeBench v6 for code generation
• 84.3% GPQA Diamond for scientific reasoning
• 76.9% MMMU Pro for multimodal understanding
• 85.6% MATH-Vision for visual mathematical reasoning
• 85.2% MMLU Pro for general knowledge
• 88.4% MMMLU for multilingual understanding
• 76.9% Tau2 for agentic tool use
Prompting
Together AI API Access:
• Access Gemma 4 31B via Together AI APIs using the endpoint google/gemma-4-31B-it
• Authenticate using your Together AI API key in request headers
• Supports thinking mode, function calling, JSON mode, and multimodal image input (JSON mode sketched after this list)
• $0.20 per million input tokens / $0.50 per million output tokens
• Available on Together AI serverless infrastructure
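A sketch of the JSON mode listed above, assuming the OpenAI-compatible response_format parameter applies to this endpoint; the extraction task is a made-up example:

```python
import os
import json
import requests

# Request structured output via JSON mode. The response_format field
# follows the OpenAI-compatible convention; confirm support in the
# Together AI docs before relying on it.
resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "google/gemma-4-31B-it",
        "messages": [
            {"role": "user",
             "content": "Extract the city and year as JSON: "
                        "'The 2024 summit was held in Nairobi.'"},
        ],
        "response_format": {"type": "json_object"},
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(json.loads(resp.json()["choices"][0]["message"]["content"]))
```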
Applications & use cases
Reasoning & Coding:
• Mathematical reasoning with configurable thinking mode for step-by-step problem solving
• Code generation, completion, and correction across multiple languages
• Scientific reasoning and complex problem solving (84.3% GPQA Diamond)
Multimodal Understanding:
• Document and PDF parsing with OCR including multilingual support
• Chart comprehension, screen and UI understanding, and handwriting recognition
• Variable aspect ratio and resolution image processing for diverse visual inputs (see the image-input sketch below)
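A sketch of multimodal input, assuming the OpenAI-compatible image_url message format is supported for this model; the image URL and question are placeholders:

```python
import os
import requests

# Send an image alongside a text question using the OpenAI-compatible
# multimodal message format. The URL below is a placeholder.
resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "google/gemma-4-31B-it",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the trend shown in this chart."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/quarterly-revenue.png"}},
            ],
        }],
        "max_tokens": 256,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```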
Agentic Workflows:
• Native function calling with structured JSON output for tool orchestration (see the sketch after this list)
• Agentic pipelines with 76.9% Tau2 benchmark performance
• System prompt support for structured multi-turn conversations
• 256K context for processing large codebases and documentation
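A function calling sketch, assuming the OpenAI-compatible tools parameter; get_weather is a hypothetical example tool, not a built-in:

```python
import os
import json
import requests

# Declare a tool and let the model decide whether to call it.
# get_weather is a hypothetical example tool definition.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "google/gemma-4-31B-it",
        "messages": [{"role": "user", "content": "What's the weather in Lagos?"}],
        "tools": tools,
        "max_tokens": 256,
    },
    timeout=60,
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
# If the model chose to call the tool, the arguments arrive as a JSON string.
for call in message.get("tool_calls", []):
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
```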
- Type: Reasoning, Vision
- Main use cases: Reasoning
- Features: Function Calling, JSON Mode
- Deployment: Serverless
- Endpoint: google/gemma-4-31B-it
- Parameters: 31B
- Context length: 256K
- Input price: $0.20 / 1M tokens
- Output price: $0.50 / 1M tokens
- Input modalities: Text, Image
- Output modalities: Text
- Released: April 2, 2026
- Quantization level: FP8
- Category: Chat