Interested in running DeepSeek-R1 in production?

Request access to Together Dedicated Endpoints—private and fast DeepSeek-R1 inference at scale.

  • Fastest inference: Our DeepSeek-R1 API runs 10x faster than DeepSeek's API
  • Flexible scaling: Deploy via Together Serverless or dedicated endpoints
  • High throughput: Up to 334 tokens/sec on dedicated infrastructure
  • Secure & reliable: Private, compliant, and built for production


DeepSeek-R1 on Together AI

Unmatched performance. Cost-effective scaling. Secure infrastructure.

  • Fastest inference engine

    We run DeepSeek-R1 10x faster than DeepSeek's API, ensuring low-latency performance for production workloads.

  • Scalable infrastructure

    Whether you're just starting out or scaling to production workloads, choose Together Serverless APIs for flexible, pay-per-token usage or Dedicated Endpoints for predictable, high-volume operations.

  • Security-first approach

    We host all models in our own data centers, with no data sharing back to DeepSeek. Developers retain full control over their data with opt-out privacy settings.

Seamlessly scale your R1 deployment

  • Together Serverless API

    Start instantly with flexible, pay-per-token access to DeepSeek-R1, ideal for prototyping and for variable workloads; see the example call after this list.

    • Instant scalability and generous rate limits
    • Flexible, pay-per-token pricing with no long-term commitments
    • Full opt-out privacy controls
  • Together Dedicated Endpoints

    Reserve dedicated capacity for predictable performance on high-volume production workloads.

    • Low latency (up to 334 tokens/sec) from the Together Inference stack
    • High-performance NVIDIA HGX B200 GPUs, optimized for reasoning models
    • Contract-based pricing for predictable, cost-effective scaling
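
Here is how a serverless call might look, as a minimal sketch using the Together Python SDK; it assumes the OpenAI-compatible chat completions interface, a TOGETHER_API_KEY environment variable, and the model ID "deepseek-ai/DeepSeek-R1" (check the model catalog for the exact identifier):

    # Minimal serverless sketch: one chat completion against DeepSeek-R1.
    # Assumes TOGETHER_API_KEY is set and the model ID below is current.
    import os

    from together import Together

    client = Together(api_key=os.environ["TOGETHER_API_KEY"])

    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1",  # assumed model ID
        messages=[{"role": "user", "content": "How many primes are below 100?"}],
        max_tokens=1024,
    )
    print(response.choices[0].message.content)

With pay-per-token pricing, a snippet like this costs nothing while idle, which suits prototyping before committing to dedicated capacity.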

Powering the next generation of reasoning models

Use our API to run DeepSeek-R1 on the fastest inference stack available, at optimal cost efficiency. Servers are hosted in North America with complete data privacy controls.
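
Reasoning models like R1 can emit a long chain of thought before the final answer, so streaming the response is often preferable to waiting for the full completion. A sketch under the same assumptions as the example above (OpenAI-compatible streaming via the Together Python SDK, hypothetical prompt):

    # Streaming sketch: print tokens as they arrive instead of waiting
    # for the full (potentially long) reasoning trace.
    import os

    from together import Together

    client = Together(api_key=os.environ["TOGETHER_API_KEY"])

    stream = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1",  # assumed model ID, as above
        messages=[
            {"role": "user", "content": "Explain the birthday paradox step by step."}
        ],
        stream=True,
    )
    for chunk in stream:
        # Each chunk carries a delta with the next slice of generated text.
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)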