Pricing

Serverless Inference

Most teams start with serverless inference and move to dedicated endpoints at scale.

Price per 1M tokens

Model	Input	Output
MiniMax M2.5	$0.30 ($0.06 cached)	$1.20
Kimi K2.5	$0.50	$2.80
GLM-5.1	$1.40	$4.40
Gemma 4 31B	$0.20	$0.50
MiniMax M2.7	$0.30 ($0.06 cached)	$1.20
gpt-oss-120B	$0.15	$0.60
LFM2 24B A2B	$0.03	$0.12
Qwen3.5-397B-A17B	$0.60	$3.60
GLM-5	$1.00	$3.20
Qwen3-Coder-Next	$0.50	$1.20
Qwen3.5 9B	$0.10	$0.15
DeepSeek-V3.1	$0.60	$1.70
Cogito v2.1 671B	$1.25	$1.25
Qwen3-Coder 480B A35B Instruct	$2.00	$2.00
Rnj-1 Instruct	$0.15	$0.15
Kimi K2 Instruct	$1.00	$3.00
DeepSeek-R1-0528	$3.00	$7.00
Llama 3.3 70B	$0.88	$0.88
Gemma 3n E4B Instruct	$0.06	$0.12
gpt-oss-20B	$0.05	$0.20
Qwen2.5 7B Instruct Turbo	$0.30	$0.30
Mistral (7B) Instruct v0.2	$0.20	$0.20
Llama 3 8B Instruct Lite	$0.10	$0.10

Displayed prices refer to the lowest resolution/duration settings. Actual prices might vary.
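
As a rough illustration of how the per-token prices in the text model table translate into a bill, the sketch below multiplies token counts by the listed rates. The helper name `serverless_cost` and the `PRICES` dictionary are hypothetical, and the handling of cached tokens (billed at the cached rate when one is listed, otherwise at the regular input rate) is an assumption, not a documented billing rule.

```python
# USD per 1M tokens, copied from the table above (only a few rows shown).
PRICES = {
    "gpt-oss-120B": {"input": 0.15, "output": 0.60},
    "Kimi K2.5": {"input": 0.50, "output": 2.80},
    "MiniMax M2.5": {"input": 0.30, "cached_input": 0.06, "output": 1.20},
}


def serverless_cost(model, input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the USD cost of a workload (hypothetical helper).

    input_tokens counts non-cached input tokens only.
    """
    p = PRICES[model]
    # Assumption: cached tokens use the cached rate when the table lists one,
    # and fall back to the regular input rate otherwise.
    cached_rate = p.get("cached_input", p["input"])
    total = (
        input_tokens * p["input"]
        + cached_input_tokens * cached_rate
        + output_tokens * p["output"]
    )
    return total / 1_000_000


# 2M input tokens and 500K output tokens on gpt-oss-120B.
print(f"${serverless_cost('gpt-oss-120B', 2_000_000, 500_000):.2f}")
```

For this workload the input and output sides each come to $0.30, so the estimate is $0.60.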

Price per 1M tokens

Model	Input	Output
Kimi K2.5	$0.50	$2.80
Gemma 4 31B	$0.20	$0.50
Qwen3.5 9B	$0.10	$0.15
Gemma 3n E4B Instruct	$0.06	$0.12

Model	Price per MP	Price per image	Default steps
Wan 2.6 Image	-	$0.03	-
GPT Image 1.5	-	$0.034	-
FLUX.2 [pro]	-	$0.03	-
Nano Banana Pro (Gemini 3 Pro Image)	-	$0.134	-
Seedream 5.0 Lite	-	$0.04	-
Gemini 3.1 Flash Image (Nano Banana 2)	-	$0.05	-
Qwen Image 2.0 Pro	-	$0.08	-
Qwen Image 2.0	-	$0.04	-
FLUX.2 [dev]	-	$0.0154	-
FLUX.2 [flex]	-	$0.03	-
FLUX.2 [max]	$0.070	-	50
FLUX.1 Krea [dev]	$0.025	-	28
FLUX.1 Kontext [pro]	$0.04	-	28
FLUX1.1 [pro]	$0.04	-	-
FLUX.1 Kontext [max]	$0.08	-	28
FLUX.1 [schnell]	$0.0027	-	4
Stable Diffusion 3	$0.0019	-	-
Dreamshaper	$0.0006	-	-
SD XL	$0.0019	-	-
HiDream-I1-Fast	$0.0032	-	-
Ideogram 3.0	$0.06	-	-
HiDream-I1-Dev	$0.0045	-	-
HiDream-I1-Full	$0.009	-	-
Juggernaut Lightning Flux	$0.0017	-	-
Qwen Image	$0.0058	-	-
Google Imagen 4.0 Fast	$0.02	-	-
ByteDance Seedream 4.0	$0.03	-	-
Google Imagen 4.0 Preview	$0.04	-	-
Gemini Flash Image 2.5 (Nano Banana)	-	$0.039	-
Google Imagen 4.0 Ultra	$0.06	-	-
ByteDance Seedream 3.0	$0.018	-	-
Juggernaut Pro Flux	$0.0049	-	-

Prices include the default steps shown above. Additional costs apply only when a request exceeds the default step count.
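
For models priced per MP, a rough estimate can be sketched as below. This assumes that MP means megapixels (1,000,000 pixels), that cost scales linearly with pixel count, and that the request stays within the default steps; the surcharge for extra steps is not given here, so it is not modeled. The function names are hypothetical.

```python
def megapixels(width, height):
    """Image size in megapixels, assuming 1 MP = 1,000,000 pixels."""
    return width * height / 1_000_000


def image_cost(price_per_mp, width, height, n_images=1):
    """Estimated USD cost at default steps (hypothetical helper).

    Assumption: billing is proportional to the pixel count of each image.
    """
    return price_per_mp * megapixels(width, height) * n_images


# 100 images at 1024x1024 with FLUX.1 [schnell] ($0.0027 per MP).
print(f"${image_cost(0.0027, 1024, 1024, n_images=100):.4f}")
```

Models listed with a price per image need no such calculation: the cost is the listed price times the number of images.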

Price per 1M characters

Model	Price
Cartesia Sonic-3	$65.00
NVIDIA Parakeet TDT 0.6B v3	$0.0015
Orpheus TTS	$0.27
Kokoro-82M TTS	$10.00
Cartesia Sonic-2	$65.00

Price per video

Model	Price
Google Veo 3.0	$1.60
Kling 1.6 Pro	$0.32
Kling 1.6 Standard	$0.19
Kling 2.0 Master	$0.92
Kling 2.1 Master	$0.92
Kling 2.1 Pro	$0.32
Kling 2.1 Standard	$0.18
Vidu 2.0	$0.28
Vidu Q1	$0.22
Wan 2.2 I2V	$0.31
Wan 2.2 T2V	$0.66
Sora 2	$0.80
PixVerse v5	$0.30
ByteDance Seedance 1.0 Lite	$0.14
ByteDance Seedance 1.0 Pro	$0.57
Google Veo 3.0 Fast + Audio	$1.20
Google Veo 3.0 Fast	$0.80
Google Veo 3.0 + Audio	$3.20
Google Veo 2.0	$2.50
MiniMax Hailuo 02	$0.49
MiniMax 01 Director	$0.28

Price per audio minute

Model	Price
Whisper Large v3	$0.0015
Whisper Large v3 (Streaming)	$0.27

Price per 1M tokens

Model	Price
Multilingual e5 large instruct	$0.02

Price per 1M tokens

Model	Price
VirtueGuard Text Lite	$0.20
Llama Guard 4 12B	$0.20

Dedicated Inference

Deploy models on custom hardware with guaranteed performance and full control.

Single-tenant GPU instances with:

Guaranteed performance (no sharing)
Support for custom models
Autoscaling & traffic spike handling

Hardware Type	Price/hour
1x H100 80GB	$3.99
1x H200 141GB	$5.49
1x B200 180GB	$9.95

GPU Clusters

On-demand

Pay-as-you-go GPU capacity, billed hourly.

Hardware	Hourly
NVIDIA HGX H100	$3.49
NVIDIA HGX H200	$4.19
NVIDIA HGX B200	$7.49

Reserved

Reserve GPU capacity for terms longer than 6 days.

Hardware	1 Week - 1 Month	2 - 3 Months	4 - 6 Months	6+ Months
NVIDIA HGX H100	$2.99	$2.69	$2.55	Contact us
NVIDIA HGX H200	$3.49	$3.19	$2.89	Contact us
NVIDIA HGX B200	$7.15	$6.75	$6.39	Contact us
NVIDIA GB200 NVL72	Contact us	Contact us	Contact us	Contact us
NVIDIA GB300 NVL72	Contact us	Contact us	Contact us	Contact us
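
To compare on-demand and reserved pricing for a given commitment, a calculation along these lines may help. It assumes the listed hourly rates are per GPU (the tables do not state the unit explicitly) and uses a hypothetical 8-GPU cluster held for one week; `cluster_cost` is an illustrative helper, not part of any API.

```python
# USD per hour, copied from the On-demand and Reserved tables above.
ON_DEMAND = {"HGX H100": 3.49, "HGX H200": 4.19, "HGX B200": 7.49}
RESERVED_1W_1M = {"HGX H100": 2.99, "HGX H200": 3.49, "HGX B200": 7.15}


def cluster_cost(hourly_rate, hours, num_gpus=1):
    """Total USD cost for a cluster (hypothetical helper).

    Assumption: the listed rate is per GPU, so cost scales with num_gpus.
    """
    return hourly_rate * hours * num_gpus


hours = 7 * 24  # one week, the shortest reserved term
on_demand = cluster_cost(ON_DEMAND["HGX H100"], hours, num_gpus=8)
reserved = cluster_cost(RESERVED_1W_1M["HGX H100"], hours, num_gpus=8)

print(f"on-demand: ${on_demand:,.2f}")
print(f"reserved:  ${reserved:,.2f}")
print(f"savings:   ${on_demand - reserved:,.2f}")
```

Under these assumptions, a week on eight H100s costs about $4,690.56 on demand and $4,018.56 reserved, a difference of $672.00.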

Sandbox

Code Sandbox

Customize a deployment of VM sandboxes for large development environments.

Compute costs	Price/Hour
Per vCPU	$0.0446
Per GiB RAM	$0.0149
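
Because Code Sandbox compute is priced per vCPU and per GiB of RAM, the hourly cost of a sandbox depends on its shape. The sketch below shows the arithmetic for a hypothetical 4 vCPU, 8 GiB sandbox; it assumes the two components simply add up and ignores any rounding or minimum billing increments, which are not described here.

```python
# USD per hour, copied from the Code Sandbox table above.
PRICE_PER_VCPU_HOUR = 0.0446
PRICE_PER_GIB_RAM_HOUR = 0.0149


def sandbox_cost(vcpus, ram_gib, hours):
    """Estimated USD cost of one sandbox (hypothetical helper).

    Assumption: vCPU and RAM charges are additive and billed linearly in time.
    """
    hourly = vcpus * PRICE_PER_VCPU_HOUR + ram_gib * PRICE_PER_GIB_RAM_HOUR
    return hourly * hours


# A 4 vCPU / 8 GiB sandbox running for 10 hours.
print(f"${sandbox_cost(4, 8, 10):.4f}")
```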

Code Interpreter

Execute LLM-generated code securely using our API.

Duration	Price/Session
Session (60 minutes)	$0.03

Storage

High-bandwidth, parallel filesystem colocated with your compute.

Storage type	Price	Unit
Shared Filesystem	$0.16	GiB/month

Fine-Tuning

Train open-source models for real production use.

Per 1M tokens

Size	Supervised Fine-Tuning (LoRA)	Supervised Fine-Tuning (Full)	Direct Preference Optimization (LoRA)	Direct Preference Optimization (Full)
Up to 16B	$0.48	$0.54	$1.20	$1.35
17B-69B	$1.50	$1.65	$3.75	$4.12
70B-100B	$2.90	$3.20	$7.25	$8.00

Model	Supervised Fine-Tuning (LoRA)	Direct Preference Optimization (LoRA)	Minimum charge
DeepSeek-R1, DeepSeek-R1-0528, DeepSeek-V3, DeepSeek-V3-0324, DeepSeek-V3.1, DeepSeek-V3.1-Base	$10.00	$25.00	$20.00
GLM-4.6, GLM-4.7	$9.00	$22.50	$27.00
GLM-5, GLM-5.1	$40.00	$100.00	$60.00
gpt-oss-120B	$5.00	$12.50	$6.00
Kimi K2 Thinking, Kimi K2 Instruct-0905, Kimi K2 Instruct, Kimi K2 Base	$15.00	$37.50	$60.00
Llama 4 Maverick, Llama 4 Maverick Instruct	$8.00	$20.00	$16.00
Llama 4 Scout	$3.00	$7.50	$6.00
Qwen3-Coder-480B-A35B-Instruct	$9.00	$22.50	$18.00
Qwen3-235B-A22B, Qwen3-235B-A22B-Instruct-2507	$6.00	$15.00	No min. price
Qwen3.5-122B-A10B	$6.00	$15.00	$10.00
Qwen3.5-397B-A17B	$8.00	$20.00	$22.00

The price is based on the total number of tokens processed: tokens in the training dataset (training dataset size × number of epochs) plus any tokens in the optional evaluation dataset (validation dataset size × number of evaluations).
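
The billing rule above can be sketched as a small calculation. The function name and parameters are hypothetical, and treating the minimum charge as a per-job floor is an assumption; the tables list the minimum but do not spell out how it is applied.

```python
def fine_tuning_cost(train_tokens, epochs, price_per_1m,
                     eval_tokens=0, n_evals=0, minimum_charge=0.0):
    """Estimated USD cost of one fine-tuning job (hypothetical helper)."""
    # Billable tokens: training set times epochs, plus the optional
    # evaluation set times the number of evaluations.
    billable_tokens = train_tokens * epochs + eval_tokens * n_evals
    cost = billable_tokens / 1_000_000 * price_per_1m
    # Assumption: the minimum charge acts as a floor on the job total.
    return max(cost, minimum_charge)


# LoRA SFT on a model up to 16B ($0.48 per 1M tokens):
# 10M training tokens, 3 epochs, 1M evaluation tokens, 3 evaluations.
small = fine_tuning_cost(10_000_000, 3, 0.48, eval_tokens=1_000_000, n_evals=3)
print(f"${small:.2f}")

# LoRA SFT on gpt-oss-120B ($5.00 per 1M tokens, $6.00 minimum):
# a 500K-token dataset for one epoch falls below the minimum.
tiny = fine_tuning_cost(500_000, 1, 5.00, minimum_charge=6.00)
print(f"${tiny:.2f}")
```

The first job bills 33M tokens for about $15.84. The second would come to $2.50 by token count, so under the floor assumption it is charged the $6.00 minimum.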
