Model Library

How to deploy DeepSeek-R1 and distilled models securely on Together AI

January 31, 2025

By 

Together AI

DeepSeek-R1 has taken the world by storm by establishing itself as a formidable open-weight competitor to proprietary reasoning models like OpenAI’s o1, delivering powerful reasoning at a fraction of the cost. 

Together AI is one of the only platforms offering both the full R1 and the distilled models, with opt-out privacy controls and serverless pay-per-token pricing—letting you experiment freely without costly GPU deployments.

To help developers get started, we’re also offering a completely free endpoint for DeepSeek-R1 Llama 70B distilled, giving you immediate access to the power of reasoning models.

Below, we cover all the deployment options for DeepSeek-R1 on Together AI. This list will continue to evolve as we add more capabilities to the DeepSeek-R1 experience.

{{custom-cta-1}}

Deploying the main DeepSeek-R1 model 

The core of the DeepSeek-R1 family is the main R1 model—an AI powerhouse that rivals OpenAI’s o1 in reasoning tasks, while running 9x cheaper. At 671 billion parameters, with 37 billion activated, it’s a very large model, and deploying a model of this size is no small task. Together AI is one of the only providers that serve the full DeepSeek-R1 model on high-performance serverless infrastructure, ensuring you pay only per token at a competitive rate of $7 per 1 million tokens—9x cheaper than OpenAI’s o1.

Why serverless deployment for R1 matters

Many companies and developers are eager to test DeepSeek-R1 to see how it stacks up against proprietary models. Our OpenAI-compatible APIs make this seamless, and because of its large size, deploying DeepSeek-R1 on Together Serverless offers significant advantages. Unlike hyperscalers, which require instance-based GPU deployments and charge by the GPU hour, Together Serverless provides:

  • Pay-per-token pricing that’s ideal for experimentation without the upfront cost of dedicated GPUs
  • High-performance infrastructure tailored to large models
  • Full flexibility to scale deployments as needed

Try the full DeepSeek-R1 model on Together Serverless today → 

Other providers only offer the R1 distilled – not DeepSeek-R1 

It’s important to clarify the difference between the large DeepSeek-R1 model and the distilled variants. The distilled models are not the main DeepSeek-R1 model. Instead they are other leading open-source models like Llama and Qwen that have been fine-tuned with reasoning examples that were generated by DeepSeek-R1.

While some providers offer only the distilled models, Together AI stands out by offering both. This distinction is crucial—because of its size, the full DeepSeek-R1 model requires specialized high-performance GPU infrastructure that many platforms lack. At Together AI, we run the full DeekSeek-R1 model on dedicated high-performance GPUs in our data centers, ensuring optimal performance and cost efficiency.

Security-first deployments: Deploying DeepSeek-R1 with privacy controls

Security is a key consideration when deploying AI models. Unlike DeepSeek’s own API, which doesn’t provide opt-out controls for data sharing, Together AI prioritizes privacy. With full opt-out privacy controls, your data remains secure and is never shared back with DeepSeek.

Because we host all the R1 models in our own data centers, you can be assured that your sensitive information is always protected while you experiment with both the main DeepSeek-R1, and the distilled models.

Deploying the DeepSeek-R1 distilled models on Together Serverless

The DeepSeek-R1 distilled models extend the reach of advanced reasoning by fine-tuning smaller open-source models like Llama and Qwen on 800,000 examples from the main DeepSeek-R1 model. This brings powerful reasoning capabilities to models that are more accessible for a range of applications. These distilled models are particularly strong in areas like code and math, check out the benchmarks below:

Together AI offers these R1 distilled models on Together Serverless:

  • DeepSeek-R1 Distilled Llama 70: Surpasses GPT-4o with 94.5% accuracy on MATH-500 and matches o1-mini on coding tasks. Try it now →
  • DeepSeek-R1 Distilled Qwen 14: Outperforms GPT-4o in math and matches o1-mini on coding. Try it now →
  • DeepSeek-R1 Distilled Qwen 1.5: Small Qwen 1.5B model, fine-tuned to deliver superior performance on math while remaining compact and efficient. Try it now →

With pay-per-token pricing on Together Serverless, developers can easily experiment with these models without the overhead of traditional deployments.

Free endpoint for DeepSeek-R1 Llama 70B distilled

We’re also excited to offer a completely free endpoint for DeepSeek-R1 Llama 70B distilled, making it easier than ever to experiment with powerful reasoning models.

All the DeepSeek-R1 models show their chain-of-thought (CoT) reasoning traces making it easy to see how they “think” and further improve your prompts and outputs. 

Note: The free model endpoint has reduced rate limits and performance compared to our paid Turbo endpoints for any of the DeepSeek-R1 models.

{{custom-cta-2}}

Get started with DeepSeek-R1 on Together Serverless

We’re excited to bring the full DeepSeek-R1 model family to developers, making it easier than ever to leverage cutting-edge reasoning models securely and affordably.

Getting started is easy:

  1. Sign up for an account at Together AI.
  2. Get your API key and add some credits to your account.
  3. Start sending requests to DeepSeek-R1 via our playground, API, or Python/TypeScript SDKs. Follow our API quickstart to get up and running in minutes.

Our APIs are fully OpenAI compatible, making integration simple and seamless.

Contact us to discuss production traffic deployments of DeepSeek-R1 and learn how we can support your enterprise needs.

Try DeepSeek-R1 securely on Together AI

Run the full DeepSeek-R1 model with opt-out privacy controls, and only pay per token pricing.

Start experimenting with R1 reasoning for free

Try out our free endpoint for DeepSeek-R1 70B distilled today.

Try DeepSeek-R1 securely on Together AI

Run the full DeepSeek-R1 model with opt-out privacy controls, and only pay per token pricing.

LOREM IPSUM

Tag

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

$0.030/image

Try it out

LOREM IPSUM

Tag

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

$0.030/image

Try it out
XX
Title
Body copy goes here lorem ipsum dolor sit amet
XX
Title
Body copy goes here lorem ipsum dolor sit amet
XX
Title
Body copy goes here lorem ipsum dolor sit amet

Value Prop #1

Body copy goes here lorem ipsum dolor sit amet

  • Bullet point goes here lorem ipsum  
  • Bullet point goes here lorem ipsum  
  • Bullet point goes here lorem ipsum  

Value Prop #1

Body copy goes here lorem ipsum dolor sit amet

  • Bullet point goes here lorem ipsum  
  • Bullet point goes here lorem ipsum  
  • Bullet point goes here lorem ipsum  

Value Prop #1

Body copy goes here lorem ipsum dolor sit amet

  • Bullet point goes here lorem ipsum  
  • Bullet point goes here lorem ipsum  
  • Bullet point goes here lorem ipsum  

List Item  #1

  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

List Item  #1

  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

List Item  #1

  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

List Item  #1

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

List Item  #2

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

List Item  #3

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

Build

Benefits included:

  • ✔ Up to $15K in free platform credits*

  • ✔ 3 hours of free forward-deployed engineering time.

Funding: Less than $5M

Grow

Benefits included:

  • ✔ Up to $30K in free platform credits*

  • ✔ 6 hours of free forward-deployed engineering time.

Funding: $5M-$10M

Scale

Benefits included:

  • ✔ Up to $50K in free platform credits*

  • ✔ 10 hours of free forward-deployed engineering time.

Funding: $10M-$25M

Multilinguality

Word limit

Disclaimer

JSON formatting

Uppercase only

Remove commas

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, respond only in Arabic, no other language is allowed. Here is the question:

Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, respond with less than 860 words. Here is the question:

Recall that a palindrome is a number that reads the same forward and backward. Find the greatest integer less than $1000$ that is a palindrome both when written in base ten and when written in base eight, such as $292 = 444_{\\text{eight}}.$

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, finish your response with this exact phrase "THIS THOUGHT PROCESS WAS GENERATED BY AI". No other reasoning words should follow this phrase. Here is the question:

Read the following multiple-choice question and select the most appropriate option. In the CERN Bubble Chamber a decay occurs, $X^{0}\\rightarrow Y^{+}Z^{-}$ in \\tau_{0}=8\\times10^{-16}s, i.e. the proper lifetime of X^{0}. What minimum resolution is needed to observe at least 30% of the decays? Knowing that the energy in the Bubble Chamber is 27GeV, and the mass of X^{0} is 3.41GeV.

  • A. 2.08*1e-1 m
  • B. 2.08*1e-9 m
  • C. 2.08*1e-6 m
  • D. 2.08*1e-3 m

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, your response should be wrapped in JSON format. You can use markdown ticks such as ```. Here is the question:

Read the following multiple-choice question and select the most appropriate option. Trees most likely change the environment in which they are located by

  • A. releasing nitrogen in the soil.
  • B. crowding out non-native species.
  • C. adding carbon dioxide to the atmosphere.
  • D. removing water from the soil and returning it to the atmosphere.

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, your response should be in English and in all capital letters. Here is the question:

Among the 900 residents of Aimeville, there are 195 who own a diamond ring, 367 who own a set of golf clubs, and 562 who own a garden spade. In addition, each of the 900 residents owns a bag of candy hearts. There are 437 residents who own exactly two of these things, and 234 residents who own exactly three of these things. Find the number of residents of Aimeville who own all four of these things.

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, refrain from the use of any commas. Here is the question:

Alexis is applying for a new job and bought a new set of business clothes to wear to the interview. She went to a department store with a budget of $200 and spent $30 on a button-up shirt, $46 on suit pants, $38 on a suit coat, $11 on socks, and $18 on a belt. She also purchased a pair of shoes, but lost the receipt for them. She has $16 left from her budget. How much did Alexis pay for the shoes?

Start
building
yours
here →