

Inference pricing

Over 100 leading open-source Chat, Language, Image, Code, and Embedding models are available through the Together Inference API. For these models you pay just for what you use.

Serverless Endpoints

Prices are per 1 million tokens. For Chat, Language, and Code models, both input and output tokens are counted; for Embedding models, only input tokens are counted. Image model prices are based on image size and number of steps.
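As a rough illustration of the per-million-token billing described above, here is a minimal Python sketch. The function name `serverless_cost` and the prices passed to it are hypothetical placeholders, not published rates; only the counting rule (input plus output tokens for Chat, Language, and Code models, input tokens only for Embedding models) comes from this page.

```python
def serverless_cost(input_tokens: int,
                    output_tokens: int,
                    price_per_million: float,
                    embedding: bool = False) -> float:
    """Estimate the cost of a serverless request in the same currency as the price.

    Assumption: billing is linear in tokens with no rounding or minimum charge.
    """
    # Embedding models are billed on input tokens only; other text models
    # are billed on input and output tokens combined.
    billable_tokens = input_tokens if embedding else input_tokens + output_tokens
    return billable_tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    # Hypothetical price of 0.20 per 1M tokens, for illustration only.
    print(serverless_cost(600_000, 400_000, price_per_million=0.20))
    # Hypothetical embedding price of 0.01 per 1M tokens; output tokens are ignored.
    print(serverless_cost(500_000, 0, price_per_million=0.01, embedding=True))
```

Substitute the actual price for the model's size tier before relying on the result.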

  • Chat, language, code and moderation models

    • Model size

      price 1M tokens

    • Up to 4B

      price 1M tokens


    • 4.1B - 8B

      price 1M tokens


    • 8.1B - 21B

      price 1M tokens


    • 21.1B - 41B

      price 1M tokens


    • 41.1B - 80B

      price 1M tokens


    • 80.1B - 110B

      price 1M tokens
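The size tiers listed above for chat, language, code and moderation models can be looked up programmatically. The sketch below is one possible approach in Python; the function name `chat_model_tier` is hypothetical. It also assumes that a size falling in the gap between two listed ranges (for example 8.05B) belongs to the next tier up, which this page does not state.

```python
from bisect import bisect_left

# Upper bounds (in billions of parameters) of the tiers listed on this page.
TIER_UPPER_BOUNDS_B = [4, 8, 21, 41, 80, 110]
TIER_LABELS = [
    "Up to 4B",
    "4.1B - 8B",
    "8.1B - 21B",
    "21.1B - 41B",
    "41.1B - 80B",
    "80.1B - 110B",
]


def chat_model_tier(params_billion: float) -> str:
    """Return the label of the pricing tier for a model of the given size."""
    if params_billion <= 0:
        raise ValueError("parameter count must be positive")
    # bisect_left finds the first tier whose upper bound is >= the model size.
    index = bisect_left(TIER_UPPER_BOUNDS_B, params_billion)
    if index == len(TIER_UPPER_BOUNDS_B):
        raise ValueError("model is larger than the largest tier listed here")
    return TIER_LABELS[index]


if __name__ == "__main__":
    for size in (3, 8, 13, 70):
        print(size, "->", chat_model_tier(size))
```

A price table keyed by these labels could then be joined to the result once the actual rates are known.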


  • Mixture-of-experts

    • Model size

      price 1M tokens

    • Up to 56B total parameters

      price 1M tokens


    • 56.1B - 176B total parameters

      price 1M tokens


    • 176.1B - 480B total parameters

      price 1M tokens


  • Embedding models

    • Model size

      price 1M tokens

    • Up to 150M

      price 1M tokens


    • 151M - 350M

      price 1M tokens


  • Image models

    • Image Size

      25 steps

      50 steps

      75 steps

      100 steps

    • 512x512

      25 steps


      50 steps


      75 steps


      100 steps


    • 1024x1024

      25 steps


      50 steps


      75 steps


      100 steps





Dedicated endpoints

When you host your own model, you pay per minute for the GPU endpoint, whether the model is one you fine-tuned with Together Fine-tuning or any other model you choose to host. You can start or stop your endpoint at any time through the web-based Playground.

  • Your fine-tuned models

    • Hardware type

      price per minute hosted

    • 1x RTX-6000 48GB


    • 1x L40 48GB


    • 1x L40S 48GB


    • 1x A100 PCIe 80GB


    • 1x A100 SXM 40GB


    • 1x A100 SXM 80GB


    • 1x H100 80GB
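To make the per-minute billing for dedicated endpoints concrete, here is a minimal Python sketch that totals the cost over several start/stop sessions. The function name `dedicated_endpoint_cost` and the rate used are hypothetical, and the sketch assumes fractional minutes are billed proportionally; this page does not say how partial minutes are rounded.

```python
from datetime import datetime
from typing import Iterable, Tuple

Session = Tuple[datetime, datetime]  # (started_at, stopped_at)


def dedicated_endpoint_cost(sessions: Iterable[Session],
                            price_per_minute: float) -> float:
    """Estimate the hosting cost for an endpoint across its running sessions."""
    total_minutes = 0.0
    for started_at, stopped_at in sessions:
        if stopped_at < started_at:
            raise ValueError("a session cannot stop before it starts")
        # Only time between start and stop is counted.
        total_minutes += (stopped_at - started_at).total_seconds() / 60
    return total_minutes * price_per_minute


if __name__ == "__main__":
    sessions = [
        (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 10, 30)),   # 90 minutes
        (datetime(2024, 1, 1, 14, 0), datetime(2024, 1, 1, 14, 45)),  # 45 minutes
    ]
    # Hypothetical rate of 0.05 per minute, for illustration only.
    print(dedicated_endpoint_cost(sessions, price_per_minute=0.05))
```

Because the endpoint can be stopped at any time, stopping it when idle directly reduces the billed minutes.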


Interested in a dedicated endpoint for your own model?

Fine-tuning pricing

Pricing for fine-tuning is based on model size, dataset size, and the number of epochs.

  • Download checkpoints and final model weights.

  • View job status and logs through CLI or Playgrounds.

  • Deploy a model instantly once it’s fine-tuned.


Together GPU Clusters Pricing

Together Compute provides private, state-of-the-art clusters with H100 and A100 GPUs, connected over fast 200 Gbps non-blocking Ethernet or up to 3.2 Tbps InfiniBand networks.



  • A100 PCIe 80GB

    price 1k tokens

    200 Gbps non-blocking Ethernet

  • A100 SXM 80GB

    price 1k tokens

    200 Gbps non-blocking Ethernet or 1.6 Tbps InfiniBand configs available

  • H100 80GB

    price 1k tokens

    3.2 Tbps InfiniBand