
Serverless Inference

Text & Vision Models

State-of-the-art language and multimodal models.


Price per 1M tokens

| Model | Organization | Input | Output |
| --- | --- | --- | --- |
| Llama 4 Maverick | Llama | $0.27 | $0.85 |
| Llama 4 Scout | Llama | $0.18 | $0.59 |
| Llama 3.3 70B Instruct-Turbo | Llama | $0.88 | $0.88 |
| Llama 3.2 3B Instruct Turbo | Llama | $0.06 | $0.06 |
| Llama 3.1 405B Instruct Turbo | Llama | $3.50 | $3.50 |
| Llama 3.1 70B Instruct Turbo | Llama | $0.88 | $0.88 |
| Llama 3.1 8B Instruct Turbo | Llama | $0.18 | $0.18 |
| Llama 3 8B Instruct Lite | Llama | $0.10 | $0.10 |
| Llama 3 70B Instruct Reference | Llama | $0.88 | $0.88 |
| Llama 3 70B Instruct Turbo | Llama | $0.88 | $0.88 |
| LLaMA-2 | Llama | $0.90 | $0.90 |
| DeepSeek-R1 | DeepSeek | $3.00 | $7.00 |
| DeepSeek R1 Distilled Qwen 14B | DeepSeek | $0.18 | $0.18 |
| DeepSeek R1 Distilled Llama 70B | DeepSeek | $2.00 | $2.00 |
| DeepSeek R1-0528-tput | DeepSeek | $0.55 | $2.19 |
| DeepSeek-V3.1 | DeepSeek | $0.60 | $1.70 |
| DeepSeek-V3 | DeepSeek | $1.25 | $1.25 |
| gpt-oss-120B | OpenAI | $0.15 | $0.60 |
| gpt-oss-20B | OpenAI | $0.05 | $0.20 |
| Qwen3-Coder 480B A35B Instruct | Qwen | $2.00 | $2.00 |
| Qwen3 235B A22B Instruct 2507 FP8 | Qwen | $0.20 | $0.60 |
| Qwen3 235B A22B Thinking 2507 FP8 | Qwen | $0.65 | $3.00 |
| Qwen3 235B A22B FP8 Throughput | Qwen | $0.20 | $0.60 |
| Qwen 2.5 72B | Qwen | $1.20 | $1.20 |
| Qwen2.5-VL 72B Instruct | Qwen | $1.95 | $8.00 |
| Qwen2.5 Coder 32B Instruct | Qwen | $0.80 | $0.80 |
| Qwen2.5 7B Instruct Turbo | Qwen | $0.30 | $0.30 |
| Qwen QwQ-32B | Qwen | $1.20 | $1.20 |
| GLM-4.5-Air | GLM | $0.20 | $1.10 |
| Kimi K2 Instruct | Kimi | $1.00 | $3.00 |
| Mistral (7B) Instruct v0.2 | Mistral | $0.20 | $0.20 |
| Mistral Instruct | Mistral | $0.20 | $0.20 |
| Mistral Small 3 | Mistral | $0.80 | $0.80 |
| Mixtral 8x7B Instruct v0.1 | Mistral | $0.60 | $0.60 |
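
As a worked example of the per-1M-token rates above, the sketch below estimates the cost of a single request. The `request_cost` helper is illustrative, not part of any SDK.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """USD cost of one request, given per-1M-token input/output rates."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Llama 4 Maverick ($0.27 input / $0.85 output per 1M tokens),
# with 8,000 prompt tokens and 1,000 completion tokens:
print(f"${request_cost(8_000, 1_000, 0.27, 0.85):.5f}")  # $0.00301
```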


All Other Models

Price per 1M tokens

| Model | Organization | Input | Output |
| --- | --- | --- | --- |
| Marin 8B Instruct | Marin Community | $0.18 | $0.18 |
| Arcee AI AFM-4.5B | Arcee | $0.10 | $0.40 |
| Arcee AI Coder-Large | Arcee | $0.50 | $0.80 |
| Arcee AI Maestro | Arcee | $0.90 | $3.30 |
| Arcee AI Virtuoso-Large | Arcee | $0.75 | $1.20 |
| Cogito v2 preview - 109B MoE | Cogito | $0.18 | $0.59 |
| Cogito v2 preview - 405B | Cogito | $3.50 | $3.50 |
| Cogito v2 preview - 671B MoE | Cogito | $1.25 | $1.25 |
| Cogito v2 preview - 70B | Cogito | $0.88 | $0.88 |
| Refuel LLM-2 | Refuel | $0.60 | $0.60 |
| Refuel LLM-2 Small | Refuel | $0.20 | $0.20 |
| Typhoon 2 70B Instruct | Refuel | $0.88 | $0.88 |
| gemma-3n-E4B-it | Google | $0.02 | $0.04 |


Image Models

Generate stunning visuals with the latest and greatest image models.

| Model | Price per MP | Images per $1 (1MP) | Default Steps |
| --- | --- | --- | --- |
| FLUX.1 Krea [dev] | $0.025 | 40 | 28 |
| FLUX.1 Kontext [dev] | $0.025 | 40 | 28 |
| FLUX.1 Kontext [pro] | $0.04 | 25 | 28 |
| FLUX.1 Kontext [max] | $0.08 | 12.5 | 28 |
| FLUX1.1 [pro] | $0.04 | 25 | - |
| FLUX.1 [dev] | $0.025 | 40 | 28 |
| FLUX.1 [pro] | $0.05 | 20 | 28 |
| FLUX.1 [schnell] | $0.0027 | 370 | 4 |

Prices include default steps shown above. Additional costs apply only when exceeding default steps. See full pricing details →
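
To make the per-megapixel pricing concrete, here is a small sketch (the `image_cost` helper is hypothetical): the cost of one image at default steps is the per-MP price times the image's megapixels, and the "Images per $1 (1MP)" column is simply the reciprocal of the per-MP price.

```python
def image_cost(price_per_mp: float, width: int, height: int) -> float:
    """USD cost of one image at default steps: per-MP price x megapixels."""
    return price_per_mp * (width * height) / 1_000_000

# FLUX.1 [schnell] at $0.0027/MP for a 1024x1024 (~1.05 MP) image:
print(round(image_cost(0.0027, 1024, 1024), 6))  # 0.002831
# "Images per $1 (1MP)" is the reciprocal of the per-MP price:
print(round(1 / 0.0027))  # 370, matching the table
```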

Audio Models

Speech synthesis and processing models.

Price per 1M characters

| Model | Price |
| --- | --- |
| Cartesia Sonic-2 | $65.00 |

Transcription Models

Models for automatic speech recognition (ASR) and speech translation.

Price per audio minute

| Model | Price |
| --- | --- |
| Whisper Large v3 | $0.0015 |

Embedding Models

Vector embeddings for semantic search and RAG.

Price per 1M tokens

Rerank Models

Improve search relevance with reranking models.

Price per 1M tokens

Moderation Models

Classify and filter unsafe content with moderation models.

Price per 1M tokens

Dedicated Endpoints

Deploy models on custom hardware with guaranteed performance and full control.

Single-tenant GPU instances with:
  • Guaranteed performance (no sharing)
  • Support for custom models
  • Autoscaling & traffic spike handling
  • Per-minute billing

Ideal for workloads > 130,000 tokens/minute

| Hardware Type | Price/Minute | Price/Hour |
| --- | --- | --- |
| 1x H200 141GB | $0.083 | $4.99 |
| 1x H100 80GB | $0.056 | $3.36 |
| 1x A100 SXM 80GB | $0.043 | $2.56 |
| 1x A100 SXM 40GB | $0.040 | $2.40 |
| 1x A100 PCIe 80GB | $0.040 | $2.40 |
| 1x L40S 48GB | $0.035 | $2.10 |
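
The ">130,000 tokens/minute" guideline above can be sanity-checked with quick arithmetic. The sketch below compares serverless spend per minute (at the Llama 3.3 70B Instruct-Turbo rate of $0.88 per 1M tokens, input and output billed alike) against the dedicated 1x H100 per-minute rate; it is a cost-only comparison that ignores whether a single GPU could sustain that throughput.

```python
def serverless_cost_per_minute(tokens_per_minute: int, price_per_m: float) -> float:
    """USD per minute on serverless at a flat per-1M-token rate."""
    return tokens_per_minute * price_per_m / 1_000_000

H100_PER_MINUTE = 0.056  # 1x H100 80GB dedicated endpoint, $/minute

# At 130,000 tokens/minute, serverless already costs about double the H100 rate:
print(round(serverless_cost_per_minute(130_000, 0.88), 4))  # 0.1144
```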

Fine-tuning

Customize open-source models with your data.

| Size | Supervised Fine-Tuning (LoRA) | Supervised Fine-Tuning (Full) | Direct Preference Optimization (LoRA) | Direct Preference Optimization (Full) |
| --- | --- | --- | --- | --- |
| Up to 16B | $0.48 | $0.54 | $1.20 | $1.35 |
| 17B-69B | $1.50 | $1.65 | $3.75 | $4.12 |
| 70B-100B | $2.90 | $3.20 | $7.25 | $8.00 |
| gpt-oss-120B** | $5.00 | N/A | $12.50 | N/A |

Price is based on the sum of tokens processed in the fine-tuning training dataset (training dataset size × number of epochs) plus any tokens in the optional evaluation dataset (validation dataset size × number of evaluations).
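
The billing formula stated above can be sketched as follows; the function name is illustrative, and the final line assumes the table rates are per 1M billable tokens.

```python
def fine_tune_billable_tokens(train_tokens: int, epochs: int,
                              val_tokens: int = 0, evaluations: int = 0) -> int:
    """Billable tokens = training tokens x epochs + validation tokens x evaluations."""
    return train_tokens * epochs + val_tokens * evaluations

# 50M training tokens for 3 epochs, plus a 2M-token validation set scored twice:
total = fine_tune_billable_tokens(50_000_000, 3, 2_000_000, 2)
print(total)  # 154000000
# At the "Up to 16B" LoRA rate of $0.48, assuming per-1M-token billing:
print(round(total / 1_000_000 * 0.48, 2))  # 73.92
```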

Code Execution

Code Sandbox

Customize a deployment of VM sandboxes for large development environments.

Price per hour

| Resource | Price |
| --- | --- |
| Per vCPU | $0.0446 |
| Per GiB RAM | $0.0149 |
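
A sandbox's hourly cost is the sum of its vCPU and RAM charges at the rates above; the helper and the 4 vCPU / 8 GiB shape below are illustrative, not a listed configuration.

```python
def sandbox_hourly_cost(vcpus: int, ram_gib: int) -> float:
    """USD per hour for a Code Sandbox VM: vCPU rate + GiB-RAM rate."""
    return vcpus * 0.0446 + ram_gib * 0.0149

# A hypothetical 4 vCPU / 8 GiB machine:
print(round(sandbox_hourly_cost(4, 8), 4))  # 0.2976
```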

Code Interpreter

Execute LLM-generated code securely using our API.

Price per session

| Item | Price |
| --- | --- |
| Session (60 minutes) | $0.03 |

GPU Cloud

Instant GPU Clusters
| Hardware Type | On-Demand Hourly | 1 to 6 Days | Up to 3 Months |
| --- | --- | --- | --- |
| NVIDIA GB200 | Coming soon | | |
| NVIDIA B200 | Coming soon | | |
| NVIDIA H200 | $3.79 GPU/hr | $3.45 GPU/hr | $3.15 GPU/hr |
| NVIDIA H100 | $3.19 GPU/hr | $2.85 GPU/hr | $2.65 GPU/hr |

| Storage Type | Storage Size | Pricing |
| --- | --- | --- |
| Shared Storage | Up to 1 PB | $0.16 GiB/mo |
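
Cluster spend at these tiers is just GPUs × hours × the matching per-GPU-hour rate; the helper below is a sketch (actual billing granularity may differ).

```python
def cluster_cost(gpus: int, hours: float, rate_per_gpu_hour: float) -> float:
    """USD for a cluster reservation: GPU count x hours x per-GPU-hour rate."""
    return gpus * hours * rate_per_gpu_hour

# 8x H100 for 3 days (72 hours) at the 1-to-6-day rate of $2.85/GPU-hr:
print(round(cluster_cost(8, 72, 2.85), 2))  # 1641.6
```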

GPU Calculator

Storage Calculator

Reserved GPU Clusters

State-of-the-art clusters with NVIDIA Blackwell and Hopper GPUs.
3-month minimum commitment. 64 → 1K+ GPUs.

Price per hour

| Hardware | GPU Memory | Price |
| --- | --- | --- |
| NVIDIA GB200 NVL72 | 384GB HBM3e | |
| NVIDIA B200 | 192GB HBM3e | |
| NVIDIA H200 | 141GB HBM3e | Starting at $2.09 |
| NVIDIA H100 | 80GB HBM2e | Starting at $1.75 |
| NVIDIA A100 | 80GB HBM2e | Starting at $1.30 |

Frontier AI Factory

Large-scale, custom-built private GPU clusters.
1K → 10K → 100K+ NVIDIA GPUs.

NVIDIA Blackwell GPUs at scale
Talk to our team of experts to get a custom quote for your AI Factory project plan.
Contact Sales

Interested in a custom large-scale deployment?