Pricing

Serverless Inference

Most teams start with serverless inference and move to dedicated endpoints at scale.

Price per 1M tokens

Model                           Input                 Output
MiniMax M2.5                    $0.30 ($0.06 cached)  $1.20
Kimi K2.5                       $0.50                 $2.80
GLM-5.1                         $1.40                 $4.40
Gemma 4 31B                     $0.20                 $0.50
MiniMax M2.7                    $0.30 ($0.06 cached)  $1.20
gpt-oss-120B                    $0.15                 $0.60
LFM2 24B A2B                    $0.03                 $0.12
Qwen3.5-397B-A17B               $0.60                 $3.60
GLM-5                           $1.00                 $3.20
Qwen3-Coder-Next                $0.50                 $1.20
Qwen3.5 9B                      $0.10                 $0.15
DeepSeek-V3.1                   $0.60                 $1.70
Cogito v2.1 671B                $1.25                 $1.25
Qwen3-Coder 480B A35B Instruct  $2.00                 $2.00
Rnj-1 Instruct                  $0.15                 $0.15
Kimi K2 Instruct                $1.00                 $3.00
DeepSeek-R1-0528                $3.00                 $7.00
Llama 3.3 70B                   $0.88                 $0.88
Gemma 3n E4B Instruct           $0.06                 $0.12
gpt-oss-20B                     $0.05                 $0.20
Qwen2.5 7B Instruct Turbo       $0.30                 $0.30
Mistral (7B) Instruct v0.2      $0.20                 $0.20
Llama 3 8B Instruct Lite        $0.10                 $0.10
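
As a rough illustration of how per-1M-token pricing translates into the cost of a single request, here is a minimal sketch. The prices are copied from the list above for three models. The function name `request_cost` and the treatment of cached tokens (a subset of the input tokens billed at the cached rate) are assumptions made for this example, not a statement of how billing is actually computed.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the serverless price list above.
PRICES = {
    "gpt-oss-120B": {"input": Decimal("0.15"), "output": Decimal("0.60")},
    "Llama 3.3 70B": {"input": Decimal("0.88"), "output": Decimal("0.88")},
    "MiniMax M2.5": {
        "input": Decimal("0.30"),
        "cached_input": Decimal("0.06"),
        "output": Decimal("1.20"),
    },
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def request_cost(model, input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached_input_tokens is the part of input_tokens that is
    billed at the cached rate; models without a cached rate bill all
    input tokens at the regular input rate.
    """
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    price = PRICES[model]
    cached_rate = price.get("cached_input", price["input"])
    fresh_tokens = input_tokens - cached_input_tokens
    total = (
        fresh_tokens * price["input"]
        + cached_input_tokens * cached_rate
        + output_tokens * price["output"]
    )
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # 1M input tokens and 500K output tokens on gpt-oss-120B.
    print(request_cost("gpt-oss-120B", 1_000_000, 500_000))
```

Using `Decimal` rather than floats keeps the arithmetic exact at these small per-token amounts.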

Displayed prices refer to the lowest resolution/duration settings. Actual prices might vary.

Price per 1M tokens

Model                  Input  Output
Kimi K2.5              $0.50  $2.80
Gemma 4 31B            $0.20  $0.50
Qwen3.5 9B             $0.10  $0.15
Gemma 3n E4B Instruct  $0.06  $0.12

Price per 1M Characters

Model                        Price
Cartesia Sonic-3             $65.00
NVIDIA Parakeet TDT 0.6B v3  $0.0015
Orpheus TTS                  $0.27
Kokoro-82M TTS               $10.00
Cartesia Sonic-2             $65.00

Price per audio minute

Model                         Price
Whisper Large v3              $0.0015
Whisper Large v3 (Streaming)  $0.27

Price per 1M tokens

Model                           Price
Multilingual e5 large instruct  $0.02

Price per 1M tokens

Model                  Price
VirtueGuard Text Lite  $0.20
Llama Guard 4 12B      $0.20

Dedicated Inference

Deploy models on custom hardware with guaranteed performance and full control.

Single-tenant GPU instances with:

  • Guaranteed performance (no sharing)

  • Support for custom models

  • Autoscaling & traffic spike handling

Hardware Type   Price/hour
1x H100 80GB    $3.99
1x H200 141GB   $5.49
1x B200 180GB   $9.95
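
One way to reason about when a dedicated instance becomes cheaper than serverless is to compute the hourly token volume at which the two bills are equal. The sketch below does only that arithmetic, using the $3.99/hour 1x H100 80GB price and the Llama 3.3 70B serverless price ($0.88 per 1M tokens for both input and output) listed on this page. The function names and the `output_share` parameter are assumptions for illustration, and the sketch does not claim that a given GPU can actually sustain the resulting throughput for a given model.

```python
def blended_price_per_million(input_price, output_price, output_share):
    """Average USD price per 1M tokens for a given input/output mix.

    output_share is the fraction of all tokens that are output tokens
    (an assumed workload parameter, between 0 and 1).
    """
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    return (1.0 - output_share) * input_price + output_share * output_price


def breakeven_tokens_per_hour(hourly_price, input_price, output_price, output_share):
    """Tokens per hour at which serverless spend equals the hourly instance price."""
    blended = blended_price_per_million(input_price, output_price, output_share)
    return hourly_price / blended * 1_000_000


if __name__ == "__main__":
    # 1x H100 80GB at $3.99/hour vs. Llama 3.3 70B serverless at $0.88 / $0.88.
    tokens = breakeven_tokens_per_hour(3.99, 0.88, 0.88, output_share=0.5)
    print(f"Break-even volume: {tokens:,.0f} tokens per hour")
```

Under these assumptions the break-even point is roughly 4.5 million tokens per hour. Below that volume serverless is the cheaper bill, and above it the dedicated instance is, provided the instance can serve that load.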

GPU Clusters

On-demand

Pay-as-you-go GPU capacity, billed hourly.

Hardware          Hourly
NVIDIA HGX H100   $3.49
NVIDIA HGX H200   $4.19
NVIDIA HGX B200   $7.49

Reserved

Reserve GPU capacity for a duration above 6 days.

Hardware

1 Week - 1 Month

2 - 3 Months

4 - 6 Months

6+ Months

NVIDIA HGX H100

$2.99

$2.69

$2.55

NVIDIA HGX H200

$3.49

$3.19

$2.89

NVIDIA HGX B200

$7.15

$6.75

$6.39

NVIDIA GB200 NVL72

NVIDIA GB300 NVL72

Sandbox

Code Sandbox

Customize a deployment of VM sandboxes for large development environments.

Compute costs   Price/Hour
Per vCPU        $0.0446
Per GiB RAM     $0.0149
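
Because sandbox compute is priced per vCPU and per GiB of RAM, the hourly cost of a VM depends on its shape. The following is a minimal sketch of that arithmetic using the two rates above. The helper name `sandbox_cost` is hypothetical, and the sketch assumes cost scales linearly with hours, with no rounding or minimum billing increment.

```python
from decimal import Decimal

# USD per hour, copied from the compute cost rates above.
VCPU_PER_HOUR = Decimal("0.0446")
GIB_RAM_PER_HOUR = Decimal("0.0149")


def sandbox_cost(vcpus, ram_gib, hours):
    """Estimate USD cost of one sandbox VM (hypothetical helper).

    Assumes linear billing: (vCPUs * vCPU rate + GiB * RAM rate) * hours.
    """
    hourly = Decimal(str(vcpus)) * VCPU_PER_HOUR + Decimal(str(ram_gib)) * GIB_RAM_PER_HOUR
    return hourly * Decimal(str(hours))


if __name__ == "__main__":
    # A 4 vCPU / 8 GiB sandbox running for 10 hours.
    print(sandbox_cost(4, 8, 10))
```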

Code Interpreter

Execute LLM-generated code securely using our API.

Duration               Price/Session
Session (60 minutes)   $0.03

Storage

High-bandwidth, parallel filesystem colocated with your compute.

Storage costs       Price   Unit
Shared Filesystem   $0.16   GiB/month

Fine-Tuning

Train open-source models for real production use.

Per 1M tokens

                Supervised Fine-Tuning          Direct Preference Optimization
Size            LoRA      Full Fine-Tuning      LoRA      Full Fine-Tuning
Up to 16B       $0.48     $0.54                 $1.20     $1.35
17B-69B         $1.50     $1.65                 $3.75     $4.12
70B-100B        $2.90     $3.20                 $7.25     $8.00

Model                           Supervised Fine-Tuning (LoRA)  Direct Preference Optimization (LoRA)  Minimum charge
DeepSeek-R1                     $10.00                         $25.00                                 $20.00
DeepSeek-R1-0528                $10.00                         $25.00                                 $20.00
DeepSeek-V3                     $10.00                         $25.00                                 $20.00
DeepSeek-V3-0324                $10.00                         $25.00                                 $20.00
DeepSeek-V3.1                   $10.00                         $25.00                                 $20.00
DeepSeek-V3.1-Base              $10.00                         $25.00                                 $20.00
GLM-4.6                         $9.00                          $22.50                                 $27.00
GLM-4.7                         $9.00                          $22.50                                 $27.00
GLM-5                           $40.00                         $100.00                                $60.00
GLM-5.1                         $40.00                         $100.00                                $60.00
gpt-oss-120B                    $5.00                          $12.50                                 $6.00
Kimi K2 Thinking                $15.00                         $37.50                                 $60.00
Kimi K2 Instruct-0905           $15.00                         $37.50                                 $60.00
Kimi K2 Instruct                $15.00                         $37.50                                 $60.00
Kimi K2 Base                    $15.00                         $37.50                                 $60.00
Llama 4 Maverick                $8.00                          $20.00                                 $16.00
Llama 4 Maverick Instruct       $8.00                          $20.00                                 $16.00
Llama 4 Scout                   $3.00                          $7.50                                  $6.00
Qwen3-Coder-480B-A35B-Instruct  $9.00                          $22.50                                 $18.00
Qwen3-235B-A22B                 $6.00                          $15.00                                 No min. price
Qwen3-235B-A22B-Instruct-2507   $6.00                          $15.00                                 No min. price
Qwen3.5-122B-A10B               $6.00                          $15.00                                 $10.00
Qwen3.5-397B-A17B               $8.00                          $20.00                                 $22.00

Price is based on the total number of tokens processed: the tokens in the fine-tuning training dataset (training dataset size × number of epochs) plus any tokens in the optional evaluation dataset (validation dataset size × number of evaluations).
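
The token-count rule above can be sketched as a short calculation. The example uses the $0.48 per 1M tokens rate listed for supervised LoRA fine-tuning of models up to 16B, and the $5.00 rate with a $6.00 minimum charge listed for gpt-oss-120B. The function names are hypothetical, and treating the minimum charge as a simple floor on the job price is an assumption, since the page does not spell out how the minimum is applied.

```python
from decimal import Decimal

TOKENS_PER_UNIT = Decimal(1_000_000)


def billable_tokens(train_tokens, epochs, eval_tokens=0, evaluations=0):
    """Tokens processed: training size * epochs + validation size * evaluations."""
    return train_tokens * epochs + eval_tokens * evaluations


def fine_tuning_cost(tokens, price_per_million, minimum_charge=Decimal("0")):
    """Estimate USD cost of a fine-tuning job (hypothetical helper).

    Assumption: the minimum charge acts as a floor on the computed price.
    """
    cost = Decimal(tokens) * price_per_million / TOKENS_PER_UNIT
    return max(cost, minimum_charge)


if __name__ == "__main__":
    # 10M training tokens for 3 epochs, plus a 1M-token validation set
    # evaluated 3 times, at $0.48 per 1M tokens (SFT LoRA, up to 16B).
    tokens = billable_tokens(10_000_000, 3, eval_tokens=1_000_000, evaluations=3)
    print(tokens, fine_tuning_cost(tokens, Decimal("0.48")))

    # A small 1M-token job on gpt-oss-120B ($5.00 per 1M, $6.00 minimum).
    print(fine_tuning_cost(1_000_000, Decimal("5.00"), Decimal("6.00")))
```

In the first case 33M tokens are processed, so the estimate is 33 × $0.48 = $15.84. In the second, the computed $5.00 falls below the listed minimum, so under the floor assumption the job would be billed at $6.00.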
