DeepSeek
See the reasoning. Slash the bill.
DeepSeek is the first open-weight model to outperform GPT-4 with transparent reasoning tokens, at one-tenth the price. Build with confidence.

Get Started in Minutes
Drop-in OpenAI replacement: no code changes, no surprises on your bill. Switch from closed models to DeepSeek instantly with OpenAI-compatible endpoints.
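The switch can be sketched with the OpenAI Python SDK: only the base URL and model name change from a stock OpenAI integration. The model id and environment variable name below are illustrative; check the Together dashboard for the exact values for your account.

```python
# Minimal sketch: pointing an existing OpenAI-SDK app at Together's
# DeepSeek endpoint. Only BASE_URL and MODEL change.
# Assumes the `openai` package and a TOGETHER_API_KEY env var.
import os

BASE_URL = "https://api.together.xyz/v1"   # instead of https://api.openai.com/v1
MODEL = "deepseek-ai/DeepSeek-R1"          # instead of an OpenAI model id

def ask(prompt: str) -> str:
    from openai import OpenAI  # imported lazily so the sketch loads without the SDK
    client = OpenAI(base_url=BASE_URL, api_key=os.environ["TOGETHER_API_KEY"])
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# ask("Why is the sky blue?")  # needs network access and a valid API key
```

The rest of your code, including streaming and tool-calling paths that use the same SDK, stays as-is.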
Why DeepSeek on Together AI?
Know exactly why your model answers the way it does.
The first reasoning models with fully transparent reasoning tokens, proven benchmark superiority, and complete model ownership for advanced enterprise deployment.
Unmatched Performance
Native chain-of-thought reasoning built into model architecture through large-scale reinforcement learning. DeepSeek R1 exposes its complete thinking process in <think> tags, enabling debugging and verification of model decisions.
DeepSeek R1 beats OpenAI o1 on verified benchmarks
Breakthrough Economics
Mixture-of-experts architecture activates only 37B of 671B parameters per token, delivering frontier performance at dramatically reduced computational cost and faster inference speeds.
90% cost reduction vs. closed models without quality compromise
Full Model Control
Download the weights or call the API—deploy on Together’s cloud or on-prem. No vendor lock-in.
Complete data & model ownership vs closed models
Meet the DeepSeek Pod
From frontier reasoning to efficient MoE design, choose the DeepSeek model that fits your needs.

Breakthrough Technical Innovations
DeepSeek models introduce game-changing architectural advances that redefine reasoning in open-source AI.
Mixture of Experts (MoE)
Sparse expert routing activates only 37B out of 671B parameters for each token in V3. Advanced load balancing without auxiliary losses maintains performance while reducing computational cost.
V3: 671B params, 37B active per token
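The routing idea can be shown in a toy sketch: score every expert per token, keep only the top-k, and renormalize their weights. The numbers and expert count here are illustrative, not DeepSeek's real router.

```python
# Toy sketch of sparse top-k expert routing (illustrative, not V3's real gate).
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(expert_logits, k=2):
    """Pick the top-k experts for one token and renormalize their weights."""
    probs = softmax(expert_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# 8 experts, but only 2 are activated for this token:
experts = route([0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2], k=2)
```

Because the non-selected experts never run, compute per token scales with the active parameter count (37B) rather than the total (671B).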
Group Relative Policy Optimization (GRPO)
New RL approach that removes the separate value network required by PPO-style RLHF, using grouped relative advantage estimation to cut compute requirements while maintaining training stability.
R1: First major reasoning model trained with GRPO methodology
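The core of grouped relative advantage estimation fits in a few lines: sample a group of completions per prompt, score them, and use each reward's standardized deviation within the group as its advantage, with no learned value network. This is a simplified sketch (e.g. it uses population standard deviation), not the full GRPO objective.

```python
# Sketch of GRPO-style grouped relative advantages: each completion's
# advantage is its reward's z-score within the sampled group.
import statistics

def group_advantages(rewards):
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Four sampled completions for the same prompt, scored by a reward model:
adv = group_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions above the group mean get positive advantages, those below get negative ones, and the critic network that PPO would need is gone entirely.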
Native Reasoning Transparency
First reasoning model to expose complete thinking process in <think> tags. Native reasoning capabilities built into model foundation through large-scale reinforcement learning.
Pure RL training methodology enables step-by-step transparency
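Because the reasoning trace arrives inline, separating it from the final answer is a one-regex job. The sample completion below is invented for illustration; only the `<think>...</think>` wrapper reflects R1's actual output format.

```python
# Sketch: splitting a DeepSeek R1 completion into its reasoning trace
# and final answer. The sample text is invented.
import re

def split_reasoning(completion: str):
    m = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    reasoning = m.group(1).strip() if m else ""
    answer = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
    return reasoning, answer

sample = "<think>12 * 12 = 144, so the square root of 144 is 12.</think>The answer is 12."
reasoning, answer = split_reasoning(sample)
```

This makes it straightforward to log reasoning traces for debugging while showing users only the final answer.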
FP8 Mixed Precision Training
First successful implementation of FP8 mixed precision training on a 671B-parameter model, paired with a pioneering reinforcement learning approach that skips supervised fine-tuning as a preliminary step.
V3: 2.788M H800 GPU hours
Multi-Head Latent Attention
Innovative attention mechanism that reduces KV-cache memory requirements while maintaining modeling performance. Optimized for efficient inference deployment.
Optimized for inference efficiency
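A back-of-the-envelope comparison shows why compressing the KV-cache matters: standard multi-head attention caches full per-head keys and values for every token, while a latent-attention scheme caches one small compressed vector. All dimensions below are illustrative round numbers, not V3's actual configuration.

```python
# Rough KV-cache size comparison: full per-head K/V vs. one compressed
# latent per token. Numbers are illustrative, not DeepSeek's real config.

def kv_cache_bytes(n_layers, seq_len, per_token_floats, bytes_per_float=2):
    return n_layers * seq_len * per_token_floats * bytes_per_float

n_layers, seq_len = 60, 128_000
mha = kv_cache_bytes(n_layers, seq_len, per_token_floats=2 * 128 * 128)  # K and V: 128 heads x 128 dims
mla = kv_cache_bytes(n_layers, seq_len, per_token_floats=512)            # one compressed latent
ratio = mha / mla
```

With these toy numbers the latent cache is 64x smaller, which is what lets long-context inference fit on far less GPU memory.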
Multi-Token Prediction
Novel training objective that allows the model to predict multiple tokens simultaneously. Enhanced performance and efficiency through advanced training techniques.
V3 Enhanced performance & efficiency optimization
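The training objective can be sketched as target construction: at each position the model learns to predict the next k tokens, not just the next one. This toy function only builds the targets; the real objective also involves additional prediction heads and a combined loss.

```python
# Toy sketch of multi-token prediction targets: each position is trained
# against its next k tokens instead of a single next token.
def mtp_targets(tokens, k=2):
    """For each position, return the next k tokens as training targets."""
    return [tokens[i + 1 : i + 1 + k] for i in range(len(tokens) - k)]

targets = mtp_targets(["the", "cat", "sat", "on", "the", "mat"], k=2)
```

Each training position now supervises k predictions, densifying the learning signal per sequence.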
Deploy on Together AI
Access DeepSeek models through Together's optimized inference platform with enterprise-grade security and performance guarantees.
Serverless Endpoints
Pay-per-token pricing with automatic scaling. Perfect for getting started or variable workloads.
Best for:
Prototyping and development
Variable or unpredictable traffic
Cost optimization for low volume
Getting started quickly
DeepSeek R1-0528:
Starting at $0.55/1M tokens
DeepSeek V3:
Starting at $1.25/1M tokens
On-Demand Dedicated
Dedicated GPU capacity with guaranteed performance. No rate limits. Built for production.
Best for:
Production applications
Extended model library access
Predictable latency requirements
Enterprise SLA needs
DeepSeek R1-0528:
$0.67/minute (8x H200)
DeepSeek V3:
$0.67/minute (8x H200)
Monthly Reserved
Committed GPU capacity, enterprise features and volume discounts. Optimized for scale.
Best for:
High-volume committed usage
Enterprise security requirements
Priority hardware access
Maximum cost efficiency
Reserved GPU pricing:
Starting at $0.98/hr
Volume Discounts:
Up to 40% savings
Enterprise-Grade Security
Your data and models remain fully under your control with industry-leading security standards.
SOC 2 Type II
Comprehensive security controls audited by third parties.
HIPAA Compliant
Healthcare-grade data protection for sensitive workloads.
Model Ownership
You own your fine-tuned models and can deploy anywhere.
US-Based Infrastructure
Models hosted on secure North American servers with strict data sovereignty controls.
Real Performance Benchmarks
See how DeepSeek models stack up against the competition on verified benchmarks that matter.
Try DeepSeek Models - Free
Experience the performance difference in Together Chat.
Frequently Asked Questions
How does DeepSeek R1's reasoning compare to OpenAI o1?
DeepSeek R1 offers superior reasoning capabilities with native chain-of-thought built into the architecture. Verified benchmarks show R1 achieving 97.3% on MATH-500 vs o1's 96.4%, with full transparency of the reasoning process through <think> tags.
What are the current pricing rates for DeepSeek models?
- DeepSeek R1: $3 input / $7 output per million tokens
- DeepSeek R1 Throughput: $0.55 input / $2.19 output per million tokens
- DeepSeek V3: $1.25 per million tokens
All models offer 70–90% cost savings compared to similar closed models.
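The savings are easy to check with a small cost calculator using the R1 Throughput rates quoted above ($0.55 in / $2.19 out per million tokens). The closed-model rates used for comparison are a hypothetical reference point, not a real quote.

```python
# Quick cost sketch at the serverless rates above. The "closed" rates
# are a hypothetical comparison point, not any vendor's real pricing.
def cost_usd(input_toks, output_toks, in_rate, out_rate):
    return (input_toks * in_rate + output_toks * out_rate) / 1_000_000

# Example workload: 100M input + 20M output tokens per month.
deepseek = cost_usd(100_000_000, 20_000_000, in_rate=0.55, out_rate=2.19)
closed = cost_usd(100_000_000, 20_000_000, in_rate=15.00, out_rate=60.00)
savings = 1 - deepseek / closed
```

At these assumed rates, the workload costs $98.80 on DeepSeek versus $2,700 on the hypothetical closed model, a savings of over 90%.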
Can I fine-tune DeepSeek models on my own data?
Yes! DeepSeek models are open-weight with MIT licensing, meaning you can fine-tune them for your specific use cases, own the resulting model weights, and use them commercially. Deploy anywhere without restrictions.
What are the context length limits for each model?
- DeepSeek V3: 128K token context length
- DeepSeek R1: 128K token context length
How do I migrate from OpenAI to DeepSeek on Together AI?
Migration is seamless with Together AI’s OpenAI-compatible API. Simply change the base URL and model name in your existing code. Same API format, better reasoning, and transparent costs.
What makes the Mixture of Experts (MoE) architecture special?
DeepSeek models use MoE architecture where only 37B of 671B parameters activate per token. This delivers frontier performance at dramatically reduced computational cost and faster inference speeds.
Is there really a free DeepSeek model?
Yes! DeepSeek R1 Distilled Llama 70B Free is completely free with reduced rate limits. It beats GPT-4o on math problems and matches o1-mini on coding tasks.
What's the difference between R1 and the distilled models?
R1 is the full 671B parameter reasoning model. Distilled models are smaller (1.5B–70B) versions trained on reasoning examples from R1, offering similar capabilities at lower cost and faster speeds.