Llama 4 Scout API
State-of-the-art 109B-parameter model with 17B active parameters and a large context window, excelling at multi-document analysis, codebase reasoning, and personalized tasks.

Together AI offers day 1 support for the new Llama 4 multilingual vision models that can analyze multiple images and respond to queries about them.
Register for a Together AI account to get an API key. New accounts come with free credits to start. Install the Together AI library for your preferred language.
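As a minimal setup sketch with the Python SDK (installable with pip install together): the client reads TOGETHER_API_KEY from the environment by default, and the explicit key below is a placeholder for illustration.

from together import Together

# Reads TOGETHER_API_KEY from the environment if api_key is omitted;
# the explicit value here is a placeholder
client = Together(api_key="your-api-key-here")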
Llama 4 Scout API Usage
Endpoint: https://api.together.xyz/v1/chat/completions

cURL
curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "messages": [{"role": "user", "content": "What are some fun things to do in New York?"}],
    "stream": true
  }'
Python
from together import Together

client = Together()

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "What are some fun things to do in New York?"}],
    stream=True,
)

# Print streamed tokens as they arrive
for token in response:
    if hasattr(token, 'choices'):
        print(token.choices[0].delta.content, end='', flush=True)
TypeScript
import Together from "together-ai";

const together = new Together();

const response = await together.chat.completions.create({
  messages: [{ role: "user", content: "What are some fun things to do in New York?" }],
  model: "meta-llama/Llama-4-Scout-17B-16E-Instruct",
  stream: true,
});

// Print streamed tokens as they arrive
for await (const token of response) {
  console.log(token.choices[0]?.delta?.content);
}
Model Provider: Meta
Type: Chat
Parameters: 109B
Deployment: ✔ Serverless | ✔ On-Demand Dedicated
Quantization: FP16
Context length: 1M
Pricing: Input $0.18 | Output $0.59 (per 1M tokens)
Check pricing
Run in playground
Deploy model
Quickstart docs
How to use Llama 4 Scout
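The snippet below is a minimal Python sketch of a plain (non-streaming) request; the user question is only an illustrative example.

from together import Together

client = Together()

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Summarize the trade-offs of a mixture-of-experts architecture."}],
)

# The assistant's reply is returned as a single message
print(response.choices[0].message.content)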
Function Calling
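Together exposes an OpenAI-compatible tools parameter for function calling. The Python sketch below is illustrative only: the get_weather tool and its schema are hypothetical, and the model returns the call as structured JSON rather than executing it.

import json
from together import Together

client = Together()

# Hypothetical tool definition, for illustration only
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Tokyo right now?"}],
    tools=tools,
)

# If the model chooses to call the tool, inspect the structured arguments
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    call = tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))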
Query models with multiple images
Currently, this model supports up to 5 images as input.
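Images are passed as image_url content parts alongside text in a single user message, following the OpenAI-compatible vision format. The Python sketch below assumes two publicly reachable image URLs, which are placeholders.

from together import Together

client = Together()

# Placeholder image URLs, for illustration only
image_urls = [
    "https://example.com/chart-q1.png",
    "https://example.com/chart-q2.png",
]

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare these charts and summarize the key differences."},
            *[{"type": "image_url", "image_url": {"url": url}} for url in image_urls],
        ],
    }],
)

print(response.choices[0].message.content)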
Model details
- Model String: meta-llama/Llama-4-Scout-17B-16E-Instruct
- Specs:
  - 17B active parameters (109B total)
  - 16-expert MoE architecture
  - 327,680-token context length (will be increased to 10M)
  - Support for 12 languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese
  - Multimodal capabilities (text + images)
  - Supports function calling
- Best for: Multi-document analysis, codebase reasoning, and personalized tasks
- Knowledge Cutoff: August 2024
Prompting Llama 4 Scout
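Llama 4 Scout uses standard chat-style prompting; the chat completions endpoint applies the model's chat template for you, so no special tokens are required. The Python sketch below shows one way to steer tone and format with a system message (both messages are illustrative assumptions).

from together import Together

client = Together()

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {"role": "system", "content": "You are a concise assistant that answers in bullet points."},
        {"role": "user", "content": "What should I consider when choosing a vector database?"},
    ],
)

print(response.choices[0].message.content)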
Applications & Use Cases
- Multi-document summarization for legal/financial analysis: Analyze multiple legal contracts or financial statements simultaneously, identifying key terms, inconsistencies, and patterns across documents to generate comprehensive summaries and risk assessments (see the sketch after this list).
- Personalized task automation using years of user data: Create tailored automation workflows by analyzing an individual's historical data patterns, communication style, and preferences, enabling highly personalized digital assistants that adapt to specific user needs.
- Efficient image parsing for multimodal applications: Process and understand image content in conjunction with text to power applications like visual search, content moderation, and accessibility features that require understanding the relationship between visual and textual elements.
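As a sketch of the multi-document use case, several documents can be concatenated into a single long prompt and analyzed in one request, taking advantage of the model's long context; the document contents below are placeholders.

from together import Together

client = Together()

# Placeholder documents; in practice these could be full contracts or filings
documents = {
    "contract_a.txt": "Full text of contract A ...",
    "contract_b.txt": "Full text of contract B ...",
}

prompt = "Compare the following documents and list key terms, inconsistencies, and risks.\n\n"
for name, text in documents.items():
    prompt += f"--- {name} ---\n{text}\n\n"

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)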
Looking for production scale? Deploy on a dedicated endpoint
Deploy Llama 4 Scout on a dedicated endpoint with custom hardware configuration, as many instances as you need, and auto-scaling.
