
DeepSeek

See the reasoning. Slash the bill.

DeepSeek is the first open-weight model to outperform GPT-4 with transparent reasoning tokens, at one-tenth the price. Build with confidence.

Get Started in Minutes

A drop-in OpenAI replacement: no code changes, no surprises on your bill. Switch from closed models to DeepSeek instantly with OpenAI-compatible endpoints.


# Install the Together AI library
pip install together

# Get started with DeepSeek-R1
from together import Together

# The client reads your API key from the TOGETHER_API_KEY environment variable
client = Together()

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[
        {
            "role": "user",
            "content": "What are some fun things to do in New York?"
        }
    ]
)

print(response.choices[0].message.content)
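R1 returns its chain of thought inside <think> tags in the message content (see "Native Reasoning Transparency" below). Here is a minimal sketch of splitting that reasoning from the final answer, assuming a single <think>...</think> block as described; split_reasoning is a hypothetical helper, not part of the SDK:

import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split an R1 completion into (reasoning, answer).

    Assumes the reasoning is wrapped in one <think>...</think> block;
    returns an empty reasoning string if no such block is found.
    """
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(response.choices[0].message.content)
print("Reasoning:", reasoning[:200])
print("Answer:", answer)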

View API Docs

Why DeepSeek on Together AI?

Know exactly why your model answers the way it does.

DeepSeek offers the first reasoning models with fully transparent reasoning tokens, benchmark results that rival top closed models, and complete model ownership for enterprise deployment.

Meet the DeepSeek Pod

From full-scale chain-of-thought reasoning to efficient MoE chat models and lightweight distills, choose the DeepSeek model that fits your needs.

DeepSeek R1-0528

Advanced Chain-of-Thought Reasoning

  • 37B Active Params

  • 671B Total Params

  • 128K Input Context

  • Built-in CoT

Key Strengths:

  • Superior mathematical reasoning

  • Transparent thinking process

  • Outperforms OpenAI o1 on MATH benchmarks

DeepSeek R1-0528 Throughput

Production-Optimized Reasoning

  • 2x Faster than R1

  • FP8 Quantization

  • 128K Input Context

  • Built-in CoT

Key Strengths:

  • 90% cost reduction vs o1

  • Throughput-optimized

  • Production-ready scaling

DeepSeek V3-0324

Fast MoE Chat & Code Model

  • 37B Active Params

  • 671B Total Params

  • 131K Input Context

  • MoE Architecture

Key Strengths:

  • Efficient MoE design

  • Strong multilingual support

  • Competitive with GPT-4o at lower cost

R1 Distilled Qwen 1.5B

Ultra-Fast Lightweight Reasoning

  • 1.5B Parameters

  • FP16 Quantization

  • 28.9% AIME'24

  • 83.9% MATH-500

Key Strengths:

  • Lowest latency

  • Minimal cost per request

  • High-frequency workload optimized

The real* DeepSeek whale seen leaping out of the water near the Golden Gate Bridge.

Breakthrough Technical Innovations

DeepSeek models introduce game-changing architectural advances that redefine reasoning in open-source AI.

  • Mixture of Experts (MoE)

    Sparse expert routing activates only 37B out of 671B parameters for each token in V3. Advanced load balancing without auxiliary losses maintains performance while reducing computational cost.

    V3: 671B params, 37B active per token

  • Group Relative Policy Optimization (GRPO)

    New RL approach that removes the separate value network used in PPO-style RLHF, using grouped relative advantage estimation to cut compute requirements while maintaining training stability (a minimal sketch follows this list).

    R1: First major reasoning model trained with GRPO methodology

  • Native Reasoning Transparency

    First reasoning model to expose complete thinking process in <think> tags. Native reasoning capabilities built into model foundation through large-scale reinforcement learning.

    Pure RL training methodology enables step-by-step transparency

  • FP8 Mixed-Precision Training

    First successful implementation of FP8 mixed-precision training on a 671B-parameter model. Also pioneered pure reinforcement learning without supervised fine-tuning as a preliminary step.

    V3: 2.788M H800 GPU hours

  • Multi-Head Latent Attention

    Innovative attention mechanism that reduces KV-cache memory requirements while maintaining modeling performance. Optimized for efficient inference deployment.

    Optimized for inference efficiency

  • Multi-Token Prediction

    Novel training objective that allows the model to predict multiple tokens simultaneously. Enhanced performance and efficiency through advanced training techniques.

    V3 Enhanced performance & efficiency optimization
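
For intuition, here is the minimal sketch of GRPO's grouped relative advantage estimation promised above. It illustrates the idea only, not DeepSeek's training code: each sampled completion is scored against the mean and standard deviation of its own group, so no learned value network is needed.

import statistics

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Grouped relative advantage: normalize each completion's reward
    by its own group's mean and standard deviation. This replaces the
    learned value network used in PPO-style RLHF."""
    mean = statistics.mean(rewards)
    std = statistics.stdev(rewards) if len(rewards) > 1 else 1.0
    std = std or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# One prompt, a group of 4 sampled completions scored by a reward model
group_rewards = [0.9, 0.4, 0.7, 0.1]
print(grpo_advantages(group_rewards))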

Deploy on Together AI

Access DeepSeek models through Together's optimized inference platform with enterprise-grade security and performance guarantees.

  • Serverless Endpoints

    Pay-per-token pricing with automatic scaling. Perfect for getting started or variable workloads.

    Best for:

    • Prototyping and development

    • Variable or unpredictable traffic

    • Cost optimization for low volume

    • Getting started quickly

    DeepSeek R1-0528:
    Starting at $0.55/1M tokens

    DeepSeek V3:
    Starting at $1.25/1M tokens

  • On-Demand Dedicated

    Dedicated GPU capacity with guaranteed performance. No rate limits. Built for production.

    Best for:

    • Production applications

    • Extended model library access

    • Predictable latency requirements

    • Enterprise SLA needs

    DeepSeek R1-0528:
    $0.67/minute (8x H200)

    DeepSeek V3:
    $0.67/minute (8x H200)

  • Monthly Reserved

    Committed GPU capacity, enterprise features and volume discounts. Optimized for scale.

    Best for:

    • High-volume committed usage

    • Enterprise security requirements

    • Priority hardware access

    • Maximum cost efficiency

    Reserved GPU pricing:
    Starting at $0.98/hr

    Volume Discounts:

    Up to 40% savings

Enterprise-Grade Security

Your data and models remain fully under your control with industry-leading security standards.

  • SOC 2 Type II

    Comprehensive security controls audited by third parties.

  • HIPAA Compliant

    Healthcare-grade data protection for sensitive workloads.

  • Model Ownership

    You own your fine-tuned models and can deploy anywhere.

  • US-Based Infrastructure

    Models hosted on secure North American servers with strict data sovereignty controls.

Real Performance Benchmarks

See how DeepSeek models stack up against the competition on verified benchmarks that matter.

| Model | AIME 2024 (Pass@1) | AIME 2025 (Pass@1) | GPQA Diamond (Pass@1) | LiveCodeBench (Pass@1) | Aider (Pass@1) | Humanity's Last Exam (Pass@1) |
|---|---|---|---|---|---|---|
| DeepSeek-R1-0528 | 91.4% | 87.5% | 81.0% | 73.3% | 71.6% | 17.7% |
| OpenAI-o3 | 91.6% | 88.9% | 83.3% | 77.3% | 79.6% | 20.6% |
| Gemini-2.5-Pro-0506 | 90.8% | 83.0% | 83.0% | 71.8% | 76.9% | 18.4% |
| Qwen3-235B | 85.7% | 81.5% | 71.1% | 66.5% | 65.0% | 11.8% |
| DeepSeek-R1 | 79.8% | 70.0% | 71.5% | 63.5% | 57.0% | 8.5% |

| Model | MMLU-Pro (EM) | GPQA Diamond (Pass@1) | MATH-500 (Pass@1) | AIME 2024 (Pass@1) | LiveCodeBench (Pass@1) |
|---|---|---|---|---|---|
| DeepSeek-V3-0324 | 81.2% | 68.4% | 94.0% | 59.4% | 49.2% |
| DeepSeek-V3 | 75.9% | 59.1% | 90.2% | 39.6% | 39.2% |
| Qwen-Max | 76.1% | 60.1% | 82.6% | 26.7% | 38.7% |
| GPT-4.5 | 86.1% | 71.4% | 90.7% | 36.7% | 44.4% |
| Claude-Sonnet-3.7 | 80.7% | 68.0% | 82.2% | 23.3% | 42.2% |

| Model | MATH-500 | AIME 2024 | LiveCodeBench | Cost per 1M tokens |
|---|---|---|---|---|
| R1 Distilled Llama 70B | 94.5% | 70.0% | 57.5% | $2.00 |
| R1 Distilled Qwen 32B | 94.3% | 72.6% | 57.2% | $1.60 |
| GPT-4o | 74.6% | 9.3% | 34.2% | $5-20 |
| OpenAI o1-mini | 90.0% | 63.6% | 53.8% | $3-12 |
| Claude 3.5 Sonnet | 78.3% | 16.0% | 33.8% | $3-15 |

Try DeepSeek Models for Free

Experience the performance difference in Together Chat.

Frequently Asked Questions

How does DeepSeek R1's reasoning compare to OpenAI o1?

DeepSeek R1 offers superior reasoning capabilities with native chain-of-thought built into the architecture. Verified benchmarks show R1 achieving 97.3% on MATH-500 vs o1's 96.4%, with full transparency of the reasoning process through <think> tags.

What are the current pricing rates for DeepSeek models?

- DeepSeek R1: $3 input / $7 output per million tokens
- DeepSeek R1 Throughput: $0.55 input / $2.19 output per million tokens
- DeepSeek V3: $1.25 per million tokens

All models offer 70–90% cost savings compared to similar closed models.
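
As a quick worked example using the rates above (a sketch; real bills depend on your exact input/output token mix):

# Cost of 1M input + 1M output tokens at the listed per-million rates
input_m, output_m = 1.0, 1.0
r1_cost = 3.00 * input_m + 7.00 * output_m             # $10.00
r1_throughput_cost = 0.55 * input_m + 2.19 * output_m  # $2.74
print(f"R1: ${r1_cost:.2f}  R1 Throughput: ${r1_throughput_cost:.2f}")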

Can I fine-tune DeepSeek models on my own data?

Yes! DeepSeek models are open-weight with MIT licensing, meaning you can fine-tune them for your specific use cases, own the resulting model weights, and use them commercially. Deploy anywhere without restrictions.

What are the context length limits for each model?

- DeepSeek V3: 128K token context length
- DeepSeek R1: 128K token context length

How do I migrate from OpenAI to DeepSeek on Together AI?

Migration is seamless with Together AI’s OpenAI-compatible API. Simply change the base URL and model name in your existing code. Same API format, better reasoning, and transparent costs.
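
A minimal sketch of the switch using the official openai Python client, pointed at Together's OpenAI-compatible endpoint (https://api.together.xyz/v1):

import os
from openai import OpenAI

# Point the standard OpenAI client at Together's endpoint
client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # instead of e.g. "gpt-4o"
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)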

What makes the Mixture of Experts (MoE) architecture special?

DeepSeek models use MoE architecture where only 37B of 671B parameters activate per token. This delivers frontier performance at dramatically reduced computational cost and faster inference speeds.
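
For intuition, here is a deliberately simplified top-k router in the spirit of that design. It is not DeepSeek's implementation (which adds shared experts and auxiliary-loss-free load balancing), and all names here are illustrative:

import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Simplified sparse MoE layer: route a token to its top-k experts
    and combine their outputs weighted by renormalized gate scores.
    Only the selected experts run, so most parameters stay inactive."""
    scores = np.exp(x @ gate_w)      # unnormalized gate scores
    scores /= scores.sum()           # softmax over experts
    top_k = np.argsort(scores)[-k:]  # indices of the k best experts
    weights = scores[top_k] / scores[top_k].sum()
    return sum(w * experts[i](x) for i, w in zip(top_k, weights))

# Toy example: 16 experts, each token routed to only 2 of them
rng = np.random.default_rng(0)
d = 32
experts = [lambda x, W=rng.normal(size=(d, d)) / np.sqrt(d): x @ W
           for _ in range(16)]
gate_w = rng.normal(size=(d, 16)) / np.sqrt(d)
token = rng.normal(size=d)
print(moe_forward(token, gate_w, experts, k=2).shape)  # (32,)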

Is there really a free DeepSeek model?

Yes! DeepSeek R1 Distilled Llama 70B Free is completely free with reduced rate limits. It beats GPT-4o on math problems and matches o1-mini on coding tasks.

What's the difference between R1 and the distilled models?

R1 is the full 671B parameter reasoning model. Distilled models are smaller (1.5B–70B) versions trained on reasoning examples from R1, offering similar capabilities at lower cost and faster speeds.
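
Because every size shares the same chat completions API, switching tiers is a one-line model-string change. A sketch, with the free distill's model string assumed from Together's catalog (verify the exact name before use):

from together import Together

client = Together()

# Same request shape as R1 -- only the model string changes.
# Model name assumed from Together's catalog; confirm before use.
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free",
    messages=[{"role": "user", "content": "What is 17 * 23?"}],
)
print(response.choices[0].message.content)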