Llama 4 Maverick API
SOTA 128-expert MoE powerhouse for multilingual image/text understanding, creative writing, and enterprise-scale applications.

Together AI offers day-one support for the new Llama 4 multilingual vision models, which can analyze multiple images and answer questions about them.
Register for a Together AI account to get an API key; new accounts come with free credits to start. Then install the Together AI library for your preferred language.
Llama 4 Maverick API Usage
Endpoint
cURL
curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    "messages": [{"role": "user", "content": "What are the top 3 things to do in New York?"}],
    "stream": true
  }'
Python
from together import Together

client = Together()

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[{"role": "user", "content": "What are the top 3 things to do in New York?"}],
    stream=True,
)

# Print tokens as they stream in; delta.content can be None on some chunks
for token in response:
    if hasattr(token, "choices") and token.choices:
        print(token.choices[0].delta.content or "", end="", flush=True)
TypeScript
import Together from "together-ai";

const together = new Together();

const response = await together.chat.completions.create({
  model: "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
  messages: [{ role: "user", content: "What are the top 3 things to do in New York?" }],
  stream: true,
});

// Write tokens as they stream in, without inserting extra newlines
for await (const token of response) {
  process.stdout.write(token.choices[0]?.delta?.content ?? "");
}
Model Provider:
Meta
Type:
Chat
Variant:
Parameters:
400B
Deployment:
✔ Serverless
✔ On-Demand Dedicated
Quantization:
FP8
Context length:
1M
Pricing:
Input: $0.27 | Output: $0.85
Check pricing
Run in playground
Deploy model
Quickstart docs
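Assuming the prices quoted above are per million tokens (the usual convention for serverless pricing), a quick sketch of estimating per-request cost:

```python
# Estimate request cost, assuming the quoted prices are USD per 1M tokens.
INPUT_PRICE_PER_M = 0.27   # $ per 1M input tokens
OUTPUT_PRICE_PER_M = 0.85  # $ per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 2,000-token prompt that produces a 500-token answer
print(f"${estimate_cost(2000, 500):.6f}")  # → $0.000965
```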
How to use Llama 4 Maverick
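A minimal non-streaming request, mirroring the streaming snippets above. The prompt is a placeholder; the API call is guarded by an environment-variable check so the file runs even without a key configured.

```python
import os

MODEL = "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"

# Example prompt — substitute your own content
messages = [
    {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}
]

# Only call the API when a key is configured
if os.environ.get("TOGETHER_API_KEY"):
    from together import Together  # pip install together

    client = Together()
    response = client.chat.completions.create(model=MODEL, messages=messages)
    print(response.choices[0].message.content)
```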
Function Calling
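A sketch of a tool-calling request, assuming Together's OpenAI-compatible `tools` schema. The `get_weather` tool is a made-up example, not a real API; the network call is guarded behind an environment-variable check.

```python
import json
import os

# Hypothetical tool definition in the OpenAI-compatible function-calling schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # example tool, not a real service
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What's the weather in Paris right now?"}]

if os.environ.get("TOGETHER_API_KEY"):
    from together import Together

    client = Together()
    response = client.chat.completions.create(
        model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
        messages=messages,
        tools=tools,
    )
    # If the model decided to call the tool, arguments arrive as a JSON string
    call = response.choices[0].message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
```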
Query models with multiple images
This model currently supports up to 5 images per request.
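A sketch of a multi-image query, assuming the OpenAI-compatible multimodal message format (one `image_url` content part per image). The URLs are placeholders, and the request stays within the 5-image limit noted above.

```python
import os

MAX_IMAGES = 5  # per the model page, at most 5 images per request

image_urls = [
    "https://example.com/slide1.png",  # placeholder URLs
    "https://example.com/slide2.png",
]
assert len(image_urls) <= MAX_IMAGES

# One text part followed by one image_url part per image
content = [{"type": "text", "text": "Compare these two slides."}]
content += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]

messages = [{"role": "user", "content": content}]

if os.environ.get("TOGETHER_API_KEY"):
    from together import Together

    client = Together()
    response = client.chat.completions.create(
        model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
        messages=messages,
    )
    print(response.choices[0].message.content)
```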
Model details
- Model String: meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
- Specs:
- 17B active parameters (400B total)
- 128-expert MoE architecture
- 524,288 context length (will be increased to 1M)
- Support for 12 languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese
- Multimodal capabilities (text + images)
- Supports function calling
- Best for: Enterprise applications, multilingual support, advanced document intelligence
- Knowledge Cutoff: August 2024
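Given the 524,288-token context window above, a rough budget check before sending a long document can save a failed request. The ~4-characters-per-token heuristic is an assumption for English text, not an exact tokenizer:

```python
CONTEXT_LENGTH = 524_288  # current context window per the specs above

def rough_token_count(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, reserved_for_output: int = 4096) -> bool:
    """Check whether a prompt plausibly fits, leaving room for the reply."""
    return rough_token_count(prompt) + reserved_for_output <= CONTEXT_LENGTH

print(fits_in_context("hello " * 100))  # → True
```

For exact counts, use the model's actual tokenizer rather than this heuristic.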
Prompting Llama 4 Maverick
Applications & Use Cases
- Multilingual customer support with visual context: Process and respond to customer inquiries with attached screenshots in 12 different languages, enabling support teams to quickly diagnose technical issues by understanding both the user's description and visual evidence simultaneously.
- Generating marketing content from multimodal PDFs: Create compelling marketing materials by analyzing existing multimedia PDFs containing both text and visuals, extracting key themes, and generating new content that maintains brand consistency across formats.
- Advanced document intelligence with text, diagrams, and tables: Extract structured information from complex documents containing a mix of text, diagrams, tables, and graphs, enabling automated analysis of technical manuals, financial reports, and research papers with unprecedented accuracy.
Looking for production scale? Deploy on a dedicated endpoint
Deploy Llama 4 Maverick on a dedicated endpoint with custom hardware configuration, as many instances as you need, and auto-scaling.
