Models / Google
Reasoning
Vision

Gemma 4 31B

Multimodal reasoning model with thinking mode and native function calling

About model

Gemma 4 31B is Google DeepMind's 31B parameter dense multimodal model supporting text and image input with a 256K token context window. It features a configurable thinking mode for step-by-step reasoning, native function calling for agentic workflows, and pre-training on 140+ languages. The model delivers strong performance across reasoning (89.2% AIME 2026), coding (80.0% LiveCodeBench v6), scientific understanding (84.3% GPQA Diamond), and multimodal tasks (76.9% MMMU Pro), and is released under the Apache 2.0 license.

Context Window

256K

With hybrid attention for long-context optimization

AIME 2026

89.20%

Mathematical reasoning without tools

LiveCodeBench v6

80.00%

Code generation and completion

Model key capabilities
  • Multimodal Understanding: Text and image input with variable aspect ratio support, document parsing, OCR, chart comprehension, and 76.9% MMMU Pro
  • Configurable Thinking: Built-in reasoning mode for step-by-step problem solving with 89.2% AIME and 84.3% GPQA Diamond
  • Native Function Calling: Structured tool use with JSON mode for agentic workflows, scoring 76.9% on Tau2 benchmark
  • Multilingual & Long Context: Pre-trained on 140+ languages with 256K token context window and hybrid attention architecture
Performance benchmarks

Model           | AIME 2026 | GPQA Diamond | HLE   | LiveCodeBench | MATH500 | SWE-bench Verified
Gemma 4 31B     | 89.2%     | 84.3%        | 19.5% | 80.0%         | —       | —
Claude Opus 4.6 | 90.5%     | —            | 34.2% | —             | —       | 78.7%
OpenAI o3       | —         | 83.3%        | 24.9% | —             | 99.2%   | 62.3%
OpenAI o1       | —         | 76.8%        | —     | —             | 96.4%   | 48.9%
GPT-4o          | —         | 49.2%        | 2.7%  | 32.3%         | 89.3%   | 31.0%

The last four rows are competitor closed-source models; "—" marks scores not reported on this page.

  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    google/gemma-4-31B-it

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "google/gemma-4-31B-it",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
    }'
    
    from together import Together
    
    client = Together()
    
    response = client.chat.completions.create(
      model="google/gemma-4-31B-it",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)
    
    import Together from 'together-ai';
    const together = new Together();
    
    const completion = await together.chat.completions.create({
      model: 'google/gemma-4-31B-it',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
        }
      ],
    });
    
    console.log(completion.choices[0].message.content);
    
  • Model card

    Architecture Overview:
    • 30.7B parameter dense transformer with hybrid attention (interleaved local sliding window + full global attention)
    • 256K token context window with proportional RoPE for long-context optimization
    • Multimodal: text and image input with variable aspect ratio and resolution support via ~550M parameter vision encoder
    • Configurable thinking mode for step-by-step reasoning before generating answers
    • Native function calling and structured JSON output for agentic workflows
    • Native system prompt support for structured and controllable conversations
    • FP8 quantization
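
    The interleaved local/global layout described above can be sketched in a few lines. Note the 5:1 local-to-global ratio and the 4-token window below are illustrative assumptions for the sketch, not published Gemma 4 hyperparameters:

```python
import numpy as np

def layer_schedule(n_layers, local_per_global=5):
    # Interleave local sliding-window layers with full global layers,
    # e.g. 5 local layers followed by 1 global layer, repeated.
    return ["global" if (i + 1) % (local_per_global + 1) == 0 else "local"
            for i in range(n_layers)]

def causal_mask(seq_len, window=None):
    # True means position i may attend to position j.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    mask = j <= i                 # causal: never attend to the future
    if window is not None:        # local layers also cap the lookback
        mask &= (i - j) < window
    return mask

sched = layer_schedule(12)             # ['local', ..., 'global', ...]
local = causal_mask(8, window=4)       # sliding-window (local) attention
globl = causal_mask(8)                 # full causal (global) attention
```

    The memory win comes from the local layers: their KV cache only needs the last `window` tokens, so at 256K context only the (less frequent) global layers pay full-length attention cost.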

    Training Methodology:
    • Pre-trained on 140+ languages with out-of-the-box support for 35+ languages
    • Instruction-tuned variant with thinking mode, function calling, and multimodal capabilities
    • Hybrid attention design with unified keys and values on global layers for memory-efficient long-context processing

    Performance Characteristics:
    • 89.2% AIME 2026 (no tools) for mathematical reasoning
    • 80.0% LiveCodeBench v6 for code generation
    • 84.3% GPQA Diamond for scientific reasoning
    • 76.9% MMMU Pro for multimodal understanding
    • 85.6% MATH-Vision for visual mathematical reasoning
    • 85.2% MMLU Pro for general knowledge
    • 88.4% MMMLU for multilingual understanding
    • 76.9% Tau2 for agentic tool use

  • Prompting

    Together AI API Access:
    • Access Gemma 4 31B via Together AI APIs using the endpoint google/gemma-4-31B-it
    • Authenticate using your Together AI API key in request headers
    • Supports thinking mode, function calling, JSON mode, and multimodal image input
    • $0.20 per million input tokens / $0.50 per million output tokens
    • Available on Together AI serverless infrastructure
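
    At these rates, per-request cost is simple arithmetic; the token counts below are made-up example values:

```python
INPUT_PER_M = 0.20   # $ per 1M input tokens (listed price above)
OUTPUT_PER_M = 0.50  # $ per 1M output tokens (listed price above)

def request_cost(input_tokens, output_tokens):
    # Linear pricing: each side billed per million tokens.
    return (input_tokens * INPUT_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. a 200K-token context with a 2K-token answer
cost = request_cost(200_000, 2_000)   # ≈ $0.041
```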

  • Applications & use cases

    Reasoning & Coding:
    • Mathematical reasoning with configurable thinking mode for step-by-step problem solving
    • Code generation, completion, and correction across multiple languages
    • Scientific reasoning and complex problem solving at 84.3% GPQA Diamond

    Multimodal Understanding:
    • Document and PDF parsing with OCR including multilingual support
    • Chart comprehension, screen and UI understanding, and handwriting recognition
    • Variable aspect ratio and resolution image processing for diverse visual inputs
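
    Image input can be sent alongside text using the OpenAI-style `image_url` content parts that Together's chat endpoint accepts; a minimal sketch of building such a payload, where the image bytes and prompt are placeholders:

```python
import base64

def image_message(image_bytes, prompt, mime="image/png"):
    # Embed the image as a base64 data URL next to the text prompt.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

msg = image_message(b"\x89PNG...", "What does this chart show?")
payload = {"model": "google/gemma-4-31B-it", "messages": [msg]}
```

    The same payload shape works for chart comprehension, OCR, and document parsing prompts; only the prompt text and image bytes change.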

    Agentic Workflows:
    • Native function calling with structured JSON output for tool orchestration
    • Agentic pipelines with 76.9% Tau2 benchmark performance
    • System prompt support for structured multi-turn conversations
    • 256K context for processing large codebases and documentation
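
    Tool declarations follow the OpenAI-compatible `tools` schema used by Together's chat completions endpoint. A sketch of the request payload, where `get_weather` is a hypothetical tool invented for illustration:

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",   # hypothetical tool, for illustration only
        "description": "Get the current weather for a city",
        "parameters": {          # JSON Schema for the tool's arguments
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "google/gemma-4-31B-it",
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
    "tools": tools,
    # "response_format": {"type": "json_object"},  # JSON mode, if desired
}
```

    When the model decides to call the tool, the response carries structured `tool_calls` with JSON arguments; your code executes the function and appends the result as a `tool`-role message before the next turn.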

Related models
  • Model provider
    Google
  • Type
    Reasoning
    Vision
  • Main use cases
    Reasoning
  • Features
    Function Calling
    JSON Mode
  • Deployment
    Serverless
  • Parameters
    31B
  • Context length
    256K
  • Input price

    $0.20 / 1M tokens

  • Output price

    $0.50 / 1M tokens

  • Input modalities
    Text
    Image
  • Output modalities
    Text
  • Released
    April 2, 2026
  • Quantization level
    FP8
  • Category
    Chat