How to deploy DeepSeek-R1 and distilled models securely on Together AI

DeepSeek-R1 has taken the world by storm by establishing itself as a formidable open-weight competitor to proprietary reasoning models like OpenAI's o1, delivering powerful reasoning at a fraction of the cost.

Together AI is one of the only platforms offering both the full R1 and the distilled models, with opt-out privacy controls and serverless pay-per-token pricing—letting you experiment freely without costly GPU deployments.

To help developers get started, we're also offering a completely free endpoint for DeepSeek-R1 Llama 70B distilled, giving you immediate access to the power of reasoning models.

Below, we cover all the deployment options for DeepSeek-R1 on Together AI. This list will continue to evolve as we add more capabilities to the DeepSeek-R1 experience.

Deploying the main DeepSeek-R1 model

The core of the DeepSeek-R1 family is the main R1 model—an AI powerhouse that rivals OpenAI's o1 in reasoning tasks, while running 9x cheaper. At 671 billion parameters, with 37 billion activated, it's a very large model, and deploying a model of this size is no small task. Together AI is one of the only providers that serve the full DeepSeek-R1 model on high-performance serverless infrastructure, ensuring you pay only per token at a competitive rate of $7 per 1 million tokens—9x cheaper than OpenAI's o1.

Why serverless deployment for R1 matters

Many companies and developers are eager to test DeepSeek-R1 to see how it stacks up against proprietary models. Our OpenAI-compatible APIs make this seamless, and because of its large size, deploying DeepSeek-R1 on Together Serverless offers significant advantages. Unlike hyperscalers, which require instance-based GPU deployments and charge by the GPU hour, Together Serverless provides:

Pay-per-token pricing that's ideal for experimentation without the upfront cost of dedicated GPUs
High-performance infrastructure tailored to large models
Full flexibility to scale deployments as needed

Try the full DeepSeek-R1 model on Together Serverless today →

Other providers only offer the R1 distilled – not DeepSeek-R1

It's important to clarify the difference between the large DeepSeek-R1 model and the distilled variants. The distilled models are not the main DeepSeek-R1 model. Instead they are other leading open-source models like Llama and Qwen that have been fine-tuned with reasoning examples that were generated by DeepSeek-R1.

While some providers offer only the distilled models, Together AI stands out by offering both. This distinction is crucial—because of its size, the full DeepSeek-R1 model requires specialized high-performance GPU infrastructure that many platforms lack. At Together AI, we run the full DeekSeek-R1 model on dedicated high-performance GPUs in our data centers, ensuring optimal performance and cost efficiency.

Security-first deployments: Deploying DeepSeek-R1 with privacy controls

Security is a key consideration when deploying AI models. Unlike DeepSeek's own API, which doesn't provide opt-out controls for data sharing, Together AI prioritizes privacy. With full opt-out privacy controls, your data remains secure and is never shared back with DeepSeek.

Because we host all the R1 models in our own data centers, you can be assured that your sensitive information is always protected while you experiment with both the main DeepSeek-R1, and the distilled models.

Deploying the DeepSeek-R1 distilled models on Together Serverless

The DeepSeek-R1 distilled models extend the reach of advanced reasoning by fine-tuning smaller open-source models like Llama and Qwen on 800,000 examples from the main DeepSeek-R1 model. This brings powerful reasoning capabilities to models that are more accessible for a range of applications. These distilled models are particularly strong in areas like code and math, check out the benchmarks below:

Together AI offers these R1 distilled models on Together Serverless:

DeepSeek-R1 Distilled Llama 70: Surpasses GPT-4o with 94.5% accuracy on MATH-500 and matches o1-mini on coding tasks. Try it now →
DeepSeek-R1 Distilled Qwen 14: Outperforms GPT-4o in math and matches o1-mini on coding. Try it now →
DeepSeek-R1 Distilled Qwen 1.5: Small Qwen 1.5B model, fine-tuned to deliver superior performance on math while remaining compact and efficient. Try it now →

With pay-per-token pricing on Together Serverless, developers can easily experiment with these models without the overhead of traditional deployments.

Free endpoint for DeepSeek-R1 Llama 70B distilled

We're also excited to offer a completely free endpoint for DeepSeek-R1 Llama 70B distilled, making it easier than ever to experiment with powerful reasoning models.

Chat interface with model and parameter settings, including output length and temperature sliders.

All the DeepSeek-R1 models show their chain-of-thought (CoT) reasoning traces making it easy to see how they "think" and further improve your prompts and outputs.

Note: The free model endpoint has reduced rate limits and performance compared to our paid Turbo endpoints for any of the DeepSeek-R1 models.

Get started with DeepSeek-R1 on Together Serverless

We're excited to bring the full DeepSeek-R1 model family to developers, making it easier than ever to leverage cutting-edge reasoning models securely and affordably.

Getting started is easy:

Sign up for an account at Together AI.
Get your API key and add some credits to your account.
Start sending requests to DeepSeek-R1 via our playground, API, or Python/TypeScript SDKs. Follow our API quickstart to get up and running in minutes.

Our APIs are fully OpenAI compatible, making integration simple and seamless.

Contact us to discuss production traffic deployments of DeepSeek-R1 and learn how we can support your enterprise needs.

Try DeepSeek-R1 securely on Together AI

Run the full DeepSeek-R1 model with opt-out privacy controls, and only pay per token pricing.

Get started

Start experimenting with R1 reasoning for free

Try out our free endpoint for DeepSeek-R1 70B distilled today.

Get started for free