Gemma 4 31B
Multimodal reasoning model with thinking mode and native function calling
About model
Gemma 4 31B is Google DeepMind's 31B-parameter dense multimodal model supporting text and image input with a 256K-token context window. It features a configurable thinking mode for step-by-step reasoning, native function calling for agentic workflows, and pre-training on 140+ languages. The model delivers strong performance across reasoning (89.2% AIME 2025), coding (80.0% LiveCodeBench v6), scientific understanding (84.3% GPQA Diamond), and multimodal tasks (76.9% MMMU Pro), and is released under the Apache 2.0 license.
- 256K context window, with hybrid attention for long-context optimization
- 89.20% AIME 2025, mathematical reasoning without tools
- 80.00% LiveCodeBench v6, code generation and completion
- Multimodal Understanding: Text and image input with variable aspect ratio support, document parsing, OCR, and chart comprehension, scoring 76.9% on MMMU Pro
- Configurable Thinking: Built-in reasoning mode for step-by-step problem solving, scoring 89.2% on AIME 2025 and 84.3% on GPQA Diamond
- Native Function Calling: Structured tool use with JSON mode for agentic workflows, scoring 76.9% on the Tau2 benchmark
- Multilingual & Long Context: Pre-trained on 140+ languages with 256K token context window and hybrid attention architecture
| Model | AIME 2025 | GPQA Diamond | HLE | LiveCodeBench | MATH500 | SWE-bench Verified |
|---|---|---|---|---|---|---|
| Gemma 4 31B | 89.20% | 84.30% | 19.50% | 80.00% | n/a | n/a |
API usage
Endpoint: google/gemma-4-31B-it
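A minimal request sketch in Python, assuming Together AI's OpenAI-compatible chat completions route; the environment variable name, prompt, and parameter values are placeholders, not part of the model card:

```python
import os
import requests

# Minimal chat completion against the Together AI serverless endpoint.
# Assumes the OpenAI-compatible /v1/chat/completions route; the prompt
# and max_tokens value are illustrative.
resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "google/gemma-4-31B-it",
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Explain hybrid attention in two sentences."},
        ],
        "max_tokens": 256,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```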
Model card
Architecture Overview:
• 30.7B parameter dense transformer with hybrid attention (interleaved local sliding window + full global attention; a toy sketch follows this list)
• 256K token context window with proportional RoPE for long-context optimization
• Multimodal: text and image input with variable aspect ratio and resolution support via ~550M parameter vision encoder
• Configurable thinking mode for step-by-step reasoning before generating answers
• Native function calling and structured JSON output for agentic workflows
• Native system prompt support for structured and controllable conversations
• Served with FP8 quantization
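To make the interleaved attention pattern concrete, here is a toy sketch of how such a stack might alternate between local sliding-window and full global attention masks. The window size, layer count, and local-to-global ratio below are illustrative assumptions, not published Gemma 4 values:

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Full global attention: each token attends to all earlier tokens."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Local attention: each token attends only to the most recent `window` tokens."""
    idx = np.arange(seq_len)
    recent = idx[None, :] > idx[:, None] - window
    return causal_mask(seq_len) & recent

def layer_masks(n_layers: int, seq_len: int, window: int, global_every: int):
    """Interleave local and global layers, one global layer per `global_every`
    layers; this ratio is a placeholder, not the model's actual schedule."""
    for layer in range(n_layers):
        if (layer + 1) % global_every == 0:
            yield "global", causal_mask(seq_len)
        else:
            yield "local", sliding_window_mask(seq_len, window)

# Toy configuration: 12 layers, 16-token sequence, window of 4,
# one global layer after every 3 local layers.
for i, (kind, mask) in enumerate(layer_masks(12, 16, window=4, global_every=4)):
    print(f"layer {i:2d}: {kind:6s} attends to {mask.sum()} positions")
```

The intuition behind the design: local layers keep the attention cost and KV cache small, while the periodic global layers preserve full-context access, which is what makes the 256K window tractable.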
Training Methodology:
• Pre-trained on 140+ languages with out-of-the-box support for 35+ languages
• Instruction-tuned variant with thinking mode, function calling, and multimodal capabilities
• Hybrid attention design with unified keys and values on global layers for memory-efficient long-context processing
Performance Characteristics:
• 89.2% AIME 2025 (no tools) for mathematical reasoning
• 80.0% LiveCodeBench v6 for code generation
• 84.3% GPQA Diamond for scientific reasoning
• 76.9% MMMU Pro for multimodal understanding
• 85.6% MATH-Vision for visual mathematical reasoning
• 85.2% MMLU Pro for general knowledge
• 88.4% MMMLU for multilingual understanding
• 76.9% Tau2 for agentic tool use
Prompting
Together AI API Access:
• Access Gemma 4 31B via Together AI APIs using the endpoint google/gemma-4-31B-it
• Authenticate using your Together AI API key in request headers
• Supports thinking mode, function calling, JSON mode, and multimodal image input (JSON mode sketched after this list)
• $0.20 per million input tokens / $0.50 per million output tokens
• Available on Together AI serverless infrastructure
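A sketch of the JSON mode listed above, assuming the OpenAI-compatible response_format parameter applies to this endpoint; the extraction task is a made-up example:

```python
import os
import json
import requests

# Request structured output via JSON mode. The response_format field
# follows the OpenAI-compatible convention; confirm support in the
# Together AI docs before relying on it.
resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "google/gemma-4-31B-it",
        "messages": [
            {"role": "user",
             "content": "Extract the city and year as JSON: "
                        "'The 2024 summit was held in Nairobi.'"},
        ],
        "response_format": {"type": "json_object"},
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(json.loads(resp.json()["choices"][0]["message"]["content"]))
```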
Applications & use cases
Reasoning & Coding:
• Mathematical reasoning with configurable thinking mode for step-by-step problem solving
• Code generation, completion, and correction across multiple languages
• Scientific reasoning and complex problem solving (84.3% GPQA Diamond)
Multimodal Understanding:
• Document and PDF parsing with OCR including multilingual support
• Chart comprehension, screen and UI understanding, and handwriting recognition
• Variable aspect ratio and resolution image processing for diverse visual inputs (see the image-input sketch below)
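A sketch of multimodal input, assuming the OpenAI-compatible image_url message format is supported for this model; the image URL and question are placeholders:

```python
import os
import requests

# Send an image alongside a text question using the OpenAI-compatible
# multimodal message format. The URL below is a placeholder.
resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "google/gemma-4-31B-it",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the trend shown in this chart."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/quarterly-revenue.png"}},
            ],
        }],
        "max_tokens": 256,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```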
Agentic Workflows:
• Native function calling with structured JSON output for tool orchestration (see the sketch after this list)
• Agentic pipelines with 76.9% Tau2 benchmark performance
• System prompt support for structured multi-turn conversations
• 256K context for processing large codebases and documentation
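A function calling sketch, assuming the OpenAI-compatible tools parameter; get_weather is a hypothetical example tool, not a built-in:

```python
import os
import json
import requests

# Declare a tool and let the model decide whether to call it.
# get_weather is a hypothetical example tool definition.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "google/gemma-4-31B-it",
        "messages": [{"role": "user", "content": "What's the weather in Lagos?"}],
        "tools": tools,
        "max_tokens": 256,
    },
    timeout=60,
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
# If the model chose to call the tool, the arguments arrive as a JSON string.
for call in message.get("tool_calls", []):
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
```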
- Type: Reasoning, Vision
- Main use cases: Reasoning
- Features: Function Calling, JSON Mode
- Deployment: Serverless
- Endpoint: google/gemma-4-31B-it
- Parameters: 31B
- Context length: 256K
- Input price: $0.20 / 1M tokens
- Output price: $0.50 / 1M tokens
- Input modalities: Text, Image
- Output modalities: Text
- Released: April 2, 2026
- Quantization level: FP8
- Category: Chat