Gemma 3n Description
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3n models are designed for efficient execution on low-resource devices.
They accept multimodal input, handling text, images, video, and audio, and generate text outputs, with open weights for both pre-trained and instruction-tuned variants. These models were trained with data in over 140 spoken languages.
Gemma 3n models use selective parameter activation technology to reduce resource requirements. This technique allows the models to operate at an effective size of 2B or 4B parameters, which is lower than the total number of parameters they contain.
Inputs and Outputs
Input:
- Text string, such as a question, a prompt, or a document to be summarized
- Images, normalized to 256x256, 512x512, or 768x768 resolution and encoded to 256 tokens each
- Audio data encoded to 6.25 tokens per second from a single channel
- Total input context of 32K tokens
Output:
- Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
- Total output length of up to 32K tokens, minus the tokens consumed by the request input
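The input and output budgets above can be sketched as a quick calculation. This is an illustrative helper, not an official API; it assumes the figures given here (256 tokens per image, 6.25 tokens per audio second) and takes "32K" to mean 32,768 tokens.

```python
# Illustrative token-budget sketch using the figures stated above.
# Function names and the 32,768 interpretation of "32K" are assumptions.

IMAGE_TOKENS = 256           # each image encodes to 256 tokens
AUDIO_TOKENS_PER_SEC = 6.25  # single-channel audio
CONTEXT_TOKENS = 32_768      # total input context ("32K")

def input_tokens(text_tokens: int, num_images: int = 0, audio_seconds: float = 0.0) -> int:
    """Estimate how many context tokens a request consumes."""
    total = text_tokens
    total += num_images * IMAGE_TOKENS
    total += int(audio_seconds * AUDIO_TOKENS_PER_SEC)
    return total

def remaining_output_budget(text_tokens: int, num_images: int = 0, audio_seconds: float = 0.0) -> int:
    """Tokens left for generated output after the input is accounted for."""
    return max(0, CONTEXT_TOKENS - input_tokens(text_tokens, num_images, audio_seconds))

# Example: a 500-token prompt with two images and 60 seconds of audio.
used = input_tokens(500, num_images=2, audio_seconds=60)
print(used)  # 500 + 2*256 + 60*6.25 = 1387
print(remaining_output_budget(500, num_images=2, audio_seconds=60))  # 32768 - 1387 = 31381
```

The point of the sketch is that images and audio consume the same shared context as text, so long multimodal inputs directly shrink the available output budget.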
Training Dataset
These models were trained on a dataset that includes a wide variety of sources totaling approximately 11 trillion tokens. The knowledge cutoff date for the training data was June 2024.
Key components:
- Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages.
- Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions.
- Mathematics: Training on mathematical text helps the model learn logical reasoning, work with symbolic representations, and address mathematical queries.
- Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks.
- Audio: A diverse set of sound samples enables the model to recognize speech, transcribe text from recordings, and identify information in audio data.
Data Preprocessing
Key data cleaning and filtering methods applied to the training data:
- CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content.
- Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets.
- Additional methods: Filtering based on content quality and safety in line with our policies.
Implementation Information
Hardware
Gemma was trained using Tensor Processing Unit (TPU) hardware (TPUv4p, TPUv5p, and TPUv5e). Training generative models requires significant computational power. TPUs offer several advantages:
- Performance: Specifically designed to handle the massive computations involved in training generative models
- Memory: Large amounts of high-bandwidth memory for handling large models and batch sizes
- Scalability: TPU Pods provide scalable solutions for handling growing complexity
- Cost-effectiveness: A more cost-effective solution than CPU-based infrastructure
Software
Training was done using JAX and ML Pathways. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models.
Benchmark Results

| Benchmark | Metric | n-shot | E2B PT | E4B PT |
|---|---|---|---|---|
| HellaSwag | Accuracy | 10-shot | 72.2 | 78.6 |
| BoolQ | Accuracy | 0-shot | 76.4 | 81.6 |
| PIQA | Accuracy | 0-shot | 78.9 | 81.0 |
| SocialIQA | Accuracy | 0-shot | 48.8 | 50.0 |
| TriviaQA | Accuracy | 5-shot | 60.8 | 70.2 |
| Natural Questions | Accuracy | 5-shot | 15.5 | 20.9 |
| ARC-c | Accuracy | 25-shot | 51.7 | 61.6 |
| ARC-e | Accuracy | 0-shot | 75.8 | 81.6 |
| WinoGrande | Accuracy | 5-shot | 66.8 | 71.7 |
| BIG-Bench Hard | Accuracy | few-shot | 44.3 | 52.9 |
| DROP | Token F1 score | 1-shot | 53.9 | 60.8 |
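To compare the two sizes, the benchmark scores above can be dropped into a short script that computes the absolute E4B-over-E2B improvement per benchmark. The numbers are copied verbatim from the table; the script itself is only illustrative.

```python
# Benchmark scores from the table above: {name: (E2B PT, E4B PT)}.
SCORES = {
    "HellaSwag":         (72.2, 78.6),
    "BoolQ":             (76.4, 81.6),
    "PIQA":              (78.9, 81.0),
    "SocialIQA":         (48.8, 50.0),
    "TriviaQA":          (60.8, 70.2),
    "Natural Questions": (15.5, 20.9),
    "ARC-c":             (51.7, 61.6),
    "ARC-e":             (75.8, 81.6),
    "WinoGrande":        (66.8, 71.7),
    "BIG-Bench Hard":    (44.3, 52.9),
    "DROP":              (53.9, 60.8),
}

def deltas(scores):
    """Absolute E4B - E2B improvement per benchmark, rounded to one decimal."""
    return {name: round(e4b - e2b, 1) for name, (e2b, e4b) in scores.items()}

d = deltas(SCORES)
largest = max(d, key=d.get)
print(largest, d[largest])  # ARC-c 9.9
```

On these numbers, E4B improves on E2B across every benchmark, with the largest gains on knowledge- and reasoning-heavy tasks (ARC-c, TriviaQA, BIG-Bench Hard).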
Intended Usage
Content Creation and Communication
- Text Generation: Generate creative text formats such as poems, scripts, code, marketing copy, and email drafts
- Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications
- Text Summarization: Generate concise summaries of a text corpus, research papers, or reports
- Image Data Extraction: Extract, interpret, and summarize visual data for text communications
- Audio Data Extraction: Transcribe spoken language, translate speech to text in other languages, and analyze sound-based data
Research and Education
- NLP Research: These models can serve as a foundation for researchers to experiment with generative models and NLP techniques
- Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice
- Knowledge Exploration: Assist researchers in exploring large bodies of data by generating summaries or answering questions about specific topics
Limitations
- Training Data: The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses.
- Context and Task Complexity: Models are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging.
- Language Ambiguity and Nuance: Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language.
- Factual Accuracy: Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements.
- Common Sense: Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations.