
Inference pricing

Over 100 leading open-source Chat, Language, Image, Code, and Embedding models are available through the Together Inference API. For these models, you pay only for what you use.

Serverless Endpoints

Prices are per 1 million tokens. For Chat, Language, and Code models, both input and output tokens are counted; for Embedding models, only input tokens are counted. Image models are priced per image, based on image size and number of steps. A short cost sketch follows the tables below.

  • Llama 3 and Llama 3.1 Lite, Turbo, and Reference models (price per 1M tokens)
    • 8B: Lite $0.10, Turbo $0.18, Reference $0.20*
    • 70B: Lite $0.54, Turbo $0.88, Reference $0.90*
    • 405B: Turbo $5.00

  • Chat, language, code, and moderation models (price per 1M tokens)
    • Up to 4B: $0.10
    • 4.1B - 8B: $0.20
    • 8.1B - 21B: $0.30
    • 21.1B - 41B: $0.80
    • 41.1B - 80B: $0.90
    • 80.1B - 110B: $1.80

  • Mixture-of-experts models (price per 1M tokens)
    • Up to 56B total parameters: $0.60
    • 56.1B - 176B total parameters: $1.20
    • 176.1B - 480B total parameters: $2.40

  • Embedding models (price per 1M input tokens)
    • Up to 150M: $0.008
    • 151M - 350M: $0.016

  • Image models (price per image, by size and number of steps)
    • 512x512: 25 steps $0.001, 50 steps $0.002, 75 steps $0.0035, 100 steps $0.005
    • 1024x1024: 25 steps $0.01, 50 steps $0.02, 75 steps $0.035, 100 steps $0.05

  • Genomic models (price per 1M tokens)
    • 4.1B - 8B: $2.00
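
A quick way to sanity-check serverless spend is to multiply token counts by the listed rates. The sketch below does exactly that; the rates are copied from the Llama Turbo rows above, while the model keys and request sizes are illustrative placeholders rather than real API identifiers, and actual metering is done by Together.

```python
# Minimal cost sketch for serverless Chat/Language/Code pricing.
# Rates are USD per 1M tokens, taken from the tables above; the
# dictionary keys and token counts are hypothetical examples.
PRICE_PER_1M_TOKENS = {
    "llama-3.1-8b-turbo": 0.18,
    "llama-3.1-70b-turbo": 0.88,
    "llama-3.1-405b-turbo": 5.00,
}

def inference_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Chat, Language, and Code models are billed on input + output tokens."""
    rate = PRICE_PER_1M_TOKENS[model]
    return (input_tokens + output_tokens) / 1_000_000 * rate

# Example: a 2,000-token prompt with an 800-token completion on 70B Turbo.
print(f"${inference_cost('llama-3.1-70b-turbo', 2_000, 800):.6f}")  # $0.002464
```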

Dedicated endpoints

When hosting your own model, you pay per minute for the GPU endpoint, whether it is a model you fine-tuned with Together Fine-tuning or any other model you choose to host. You can start or stop your endpoint at any time through the web-based Playground. A monthly-cost sketch follows the hardware table below.

  • Your fine-tuned models (price per minute hosted)
    • 1x RTX-6000 48GB: $0.034
    • 1x L40 48GB: $0.034
    • 1x L40S 48GB: $0.048
    • 1x A100 PCIe 80GB: $0.068
    • 1x A100 SXM 40GB: $0.068
    • 1x A100 SXM 80GB: $0.085
    • 1x H100 80GB: $0.128
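
Because dedicated endpoints are metered per minute, projected spend is simply the listed rate multiplied by the minutes the endpoint is running. The sketch below uses rates from the table above; the usage pattern (hours per day, days per month) is a hypothetical example.

```python
# Rough spend sketch for a dedicated endpoint, assuming you are billed
# only for the minutes it is running. Rates are USD per minute, copied
# from the table above; the usage pattern below is hypothetical.
PRICE_PER_MINUTE = {
    "1x L40S 48GB": 0.048,
    "1x A100 SXM 80GB": 0.085,
    "1x H100 80GB": 0.128,
}

def endpoint_cost(hardware: str, hours_per_day: float, days: int) -> float:
    """Per-minute rate times total minutes the endpoint is kept up."""
    return PRICE_PER_MINUTE[hardware] * hours_per_day * 60 * days

# Example: an H100 endpoint kept up 8 hours a day for a 30-day month.
print(f"${endpoint_cost('1x H100 80GB', 8, 30):,.2f}")  # $1,843.20
```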

Interested in a dedicated endpoint for your own model?

Fine-tuning pricing

Pricing for fine-tuning is based on model size, dataset size, and the number of epochs. A rough cost sketch follows the list below.

  • Download checkpoints and final model weights.

  • View job status and logs through CLI or Playgrounds.

  • Deploy a model instantly once it’s fine-tuned.
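
No fine-tuning rates are published in this section, so the sketch below is only a back-of-the-envelope estimate: it assumes cost scales with total tokens processed (dataset tokens times epochs) at a placeholder per-token rate, which is an assumption rather than a published formula. Use the interactive calculator linked below for real quotes.

```python
# Back-of-the-envelope fine-tuning estimate. Both the rate and the
# "tokens processed = dataset tokens x epochs" proportionality are
# assumptions; the interactive calculator is the authoritative source.
RATE_PER_1M_TOKENS = 0.50  # placeholder rate (USD per 1M tokens), not from this page

def finetune_cost_estimate(dataset_tokens: int, epochs: int,
                           rate_per_1m: float = RATE_PER_1M_TOKENS) -> float:
    """Estimate cost as total tokens processed times a per-token rate."""
    tokens_processed = dataset_tokens * epochs
    return tokens_processed / 1_000_000 * rate_per_1m

# Example: a 20M-token dataset trained for 3 epochs at the placeholder rate.
print(f"${finetune_cost_estimate(20_000_000, 3):,.2f}")  # $30.00
```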

Try the interactive calculator

Together GPU Clusters Pricing

Together Compute provides private, state-of-the-art clusters with H100 and A100 GPUs, connected over fast 200 Gbps non-blocking Ethernet or up to 3.2 Tbps InfiniBand networks.

  • Hardware types available and networking
    • A100 PCIe 80GB: 200 Gbps non-blocking Ethernet
    • A100 SXM 80GB: 200 Gbps non-blocking Ethernet or 1.6 Tbps InfiniBand configurations available
    • H100 80GB: 3.2 Tbps InfiniBand