Gemma 3 4B API
Chat, Code, Vision
Lightweight Gemma 3 model (4B) with 128K context, vision-language input, and multilingual support for on-device AI.
To run this model, you first need to deploy it on a Dedicated Endpoint.
Gemma 3 4B API Usage
Endpoint: google/gemma-3-4b-it
RUN INFERENCE (cURL)
curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-dedicated-endpoint-url",
    "messages": [
      {
        "role": "user",
        "content": "What are some fun things to do in New York?"
      }
    ]
  }'
RUN INFERENCE (Python)
from together import Together

# The client reads the API key from the TOGETHER_API_KEY environment variable.
client = Together()

response = client.chat.completions.create(
    model="your-dedicated-endpoint-url",
    messages=[
        {
            "role": "user",
            "content": "What are some fun things to do in New York?"
        }
    ]
)

# Print the text of the first completion choice.
print(response.choices[0].message.content)
RUN INFERENCE (TypeScript)
import Together from "together-ai";

// The client reads the API key from the TOGETHER_API_KEY environment variable.
const together = new Together();

const response = await together.chat.completions.create({
  messages: [
    {
      role: "user",
      content: "What are some fun things to do in New York?"
    }
  ],
  model: "your-dedicated-endpoint-url"
});

// Print the text of the first completion choice.
console.log(response.choices[0].message.content);
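The model description lists vision-language input, but the samples above send text only. The following is a minimal sketch of how an image could be paired with a prompt, assuming the endpoint accepts OpenAI-style content parts (`type: "text"` and `type: "image_url"`). The helper names `image_message`, `to_data_url`, and `build_request_body` are hypothetical, the image URL is a placeholder, and support for base64 data URLs on a given deployment is an assumption to verify. The sketch only builds the request body and does not call the API.

```python
import base64
import json


def image_message(prompt: str, image_url: str) -> dict:
    """Build one user message that pairs a text prompt with an image reference."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (assumed to be accepted)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_request_body(model: str, prompt: str, image_url: str) -> dict:
    """Assemble a chat completions body with a single multimodal user turn."""
    return {"model": model, "messages": [image_message(prompt, image_url)]}


body = build_request_body(
    "your-dedicated-endpoint-url",
    "What is shown in this image?",
    "https://example.com/photo.png",  # placeholder image URL
)
print(json.dumps(body, indent=2))
```

The resulting `body["messages"]` list has the same shape as the `messages` argument in the Python sample, so it could be passed to `client.chat.completions.create` in its place.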
Type: Chat
Parameters: 4B
Deployment: Serverless, On-Demand Dedicated, Monthly Reserved
Context length: 64K
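Because the listed context length is 64K, long conversations may need trimming before each request. Here is a rough sketch of one way to do that, under stated assumptions: "64K" is read as 64,000 tokens, token counts are estimated at about four characters per token (a crude heuristic, not the Gemma tokenizer), and message contents are plain strings. The names `estimate_tokens` and `trim_history` are hypothetical.

```python
CONTEXT_TOKENS = 64_000  # assumption: "64K" read as 64,000 tokens
CHARS_PER_TOKEN = 4      # rough heuristic, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string from its character length."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def trim_history(messages, budget=CONTEXT_TOKENS, reserve=1_000):
    """Keep the most recent messages whose estimated size fits the budget.

    `reserve` leaves headroom for the model's reply.
    """
    limit = budget - reserve
    kept, used = [], 0
    # Walk from newest to oldest so recent turns are kept first.
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if used + cost > limit:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "a" * 400},
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": "c" * 400},
]
print(len(trim_history(history, budget=250, reserve=0)))
```

For accurate budgeting, the estimate would need to be replaced with counts from the model's actual tokenizer.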
Looking for production scale? Deploy on a dedicated endpoint
Deploy Gemma 3 4B on a dedicated endpoint with custom hardware configuration, as many instances as you need, and auto-scaling.
