DeepSeek V4 Flash
Efficient million-token context intelligence at 13B activated parameters
[Benchmark chart: DeepSeek V4 Flash vs. related open-source models and competitor closed-source models on AIME 2025, GPQA Diamond, HLE, LiveCodeBench, MATH500, and SWE-bench Verified. Legible DeepSeek V4 Flash scores: 91.60% on AIME 2025 and 79.00% on GPQA Diamond; the remaining per-benchmark values are not recoverable from the chart.]
This model is coming soon to Together’s Serverless API.
Deploy this model on an on-demand Dedicated Endpoint or pick a supported alternative from the Model Library.
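Once the model is live, calls should follow Together's standard chat-completions pattern. Below is a minimal sketch using the official Together Python SDK (`pip install together`); the model slug `deepseek-ai/DeepSeek-V4-Flash` is an assumption, since the final identifier has not been published yet.

```python
# Minimal sketch of a Serverless API call via the Together Python SDK.
# NOTE: the model slug below is a placeholder guess; check the Model
# Library for the final identifier once the model is live.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",  # assumed identifier
    messages=[
        {"role": "user", "content": "Summarize the trade-offs of MoE models in two sentences."}
    ],
)
print(response.choices[0].message.content)
```

A Dedicated Endpoint is queried through the same interface, with the endpoint's model name substituted in.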
- Type: Reasoning, Chat, Code, LLM
- Main use cases: Reasoning
- Features: Function Calling, JSON Mode (see the sketch after this list)
- Intelligence: High
- Parameters: 284B
- Activated parameters: 13B
- Context length: 1M tokens
- Quantization level: FP4
- External link
- Category: Chat
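The Function Calling and JSON Mode features listed above map to Together's OpenAI-compatible request parameters. The following is a minimal JSON Mode sketch, again using the assumed placeholder model identifier; whether V4 Flash accepts a `schema` constraint in `response_format` is an assumption until the model ships.

```python
# Hedged sketch of the listed JSON Mode feature via Together's
# OpenAI-compatible response_format parameter. Support for the schema
# field on this specific model is assumed, not confirmed.
import json
from together import Together

client = Together()

schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["answer", "confidence"],
}

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",  # assumed identifier
    messages=[{"role": "user", "content": "What is 17 * 24? Reply in JSON."}],
    response_format={"type": "json_object", "schema": schema},
)
print(json.loads(response.choices[0].message.content))
```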