
Serverless Inference

Text & Vision Models

State-of-the-art language and multimodal models.


Price per 1M tokens

Batch API price

Model                              Family    Input   Output
Llama 4 Maverick                   Llama     $0.27   $0.85
Llama 4 Scout                      Llama     $0.18   $0.59
Llama 3.3 70B Instruct-Turbo       Llama     $0.88   $0.88
Llama 3.2 3B Instruct Turbo        Llama     $0.06   $0.06
Llama 3.1 405B Instruct Turbo      Llama     $3.50   $3.50
Llama 3.1 70B Instruct Turbo       Llama     $0.88   $0.88
Llama 3.1 8B Instruct Turbo        Llama     $0.18   $0.18
Llama 3 8B Instruct Lite           Llama     $0.10   $0.10
Llama 3 70B Instruct Reference     Llama     $0.88   $0.88
Llama 3 70B Instruct Turbo         Llama     $0.88   $0.88
LLaMA-2                            Llama     $0.90   $0.90
DeepSeek-R1                        DeepSeek  $3.00   $7.00
DeepSeek R1 Distilled Qwen 14B     DeepSeek  $0.18   $0.18
DeepSeek R1 Distilled Llama 70B    DeepSeek  $2.00   $2.00
DeepSeek R1-0528-tput              DeepSeek  $0.55   $2.19
DeepSeek-V3-1                      DeepSeek  $0.60   $1.70
DeepSeek-V3                        DeepSeek  $1.25   $1.25
gpt-oss-120B                       OpenAI    $0.15   $0.60
gpt-oss-20B                        OpenAI    $0.05   $0.20
Qwen3-Coder 480B A35B Instruct     Qwen      $2.00   $2.00
Qwen3 235B A22B Instruct 2507 FP8  Qwen      $0.20   $0.60
Qwen3 235B A22B Thinking 2507 FP8  Qwen      $0.65   $3.00
Qwen3 235B A22B FP8 Throughput     Qwen      $0.20   $0.60
Qwen 2.5 72B                       Qwen      $1.20   $1.20
Qwen2.5-VL 72B Instruct            Qwen      $1.95   $8.00
Qwen2.5 Coder 32B Instruct         Qwen      $0.80   $0.80
Qwen2.5 7B Instruct Turbo          Qwen      $0.30   $0.30
Qwen QwQ-32B                       Qwen      $1.20   $1.20
GLM-4.5-Air                        GLM       $0.20   $1.10
Kimi K2 Instruct                   Kimi      $1.00   $3.00
Kimi K2 0905                       Kimi      $1.00   $3.00
Mistral (7B) Instruct v0.2         Mistral   $0.20   $0.20
Mistral Instruct                   Mistral   $0.20   $0.20
Mistral Small 3                    Mistral   $0.80   $0.80
Mixtral 8x7B Instruct v0.1         Mistral   $0.60   $0.60
Marin 8B Instruct                  Other     $0.18   $0.18
Arcee AI AFM-4.5B                  Other     $0.10   $0.40
Arcee AI Coder-Large               Other     $0.50   $0.80
Arcee AI Maestro                   Other     $0.90   $3.30
Arcee AI Virtuoso-Large            Other     $0.75   $1.20
Cogito v2 preview - 109B MoE       Other     $0.18   $0.59
Cogito v2 preview - 405B           Other     $3.50   $3.50
Cogito v2 preview - 671B MoE       Other     $1.25   $1.25
Cogito v2 preview - 70B            Other     $0.88   $0.88
Refuel LLM-2                       Other     $0.60   $0.60
Refuel LLM-2 Small                 Other     $0.20   $0.20
Typhoon 2 70B Instruct             Other     $0.88   $0.88
gemma-3n-E4B-it                    Other     $0.02   $0.04
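As a rough illustration of how the per-1M-token prices listed above translate into the cost of a single request, the sketch below multiplies input and output token counts by their respective rates. The function name `request_cost` and the `PRICES` dictionary are hypothetical helpers written for this example, not part of any Together API, and only two models are copied in.

```python
from decimal import Decimal

# Per-1M-token prices in USD, copied from the listing above for two models.
PRICES = {
    "Llama 4 Scout": {"input": Decimal("0.18"), "output": Decimal("0.59")},
    "DeepSeek-R1": {"input": Decimal("3.00"), "output": Decimal("7.00")},
}

MILLION = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes input and output tokens are billed independently at the
    listed per-1M-token rates, with no rounding or minimum charge.
    """
    price = PRICES[model]
    total = price["input"] * input_tokens + price["output"] * output_tokens
    return total / MILLION


if __name__ == "__main__":
    # 200,000 input tokens and 50,000 output tokens on Llama 4 Scout.
    print(request_cost("Llama 4 Scout", 200_000, 50_000))
```

Under these assumptions, 200,000 input tokens plus 50,000 output tokens on Llama 4 Scout works out to about $0.0655.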


Image Models

Generate stunning visuals with the latest and greatest image models.

Price per MP

Model                 Price    Images Per $1 (1MP)  Default steps
FLUX.1 Krea [dev]     $0.025   40                   28
FLUX.1 Kontext [dev]  $0.025   40                   28
FLUX.1 Kontext [pro]  $0.04    25                   28
FLUX.1 Kontext [max]  $0.08    12.5                 28
FLUX1.1 [pro]         $0.04    25                   -
FLUX.1 [dev]          $0.025   40                   28
FLUX.1 [pro]          $0.05    20                   28
FLUX.1 [schnell]      $0.0027  370                  4

Prices include the default number of steps shown above. Additional costs apply only when a request exceeds the default steps.
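The "Images Per $1 (1MP)" figures appear to be the reciprocal of the per-megapixel price. The sketch below checks that relationship and estimates the cost of one image at the default step count. It assumes that cost scales linearly with pixel count and that 1 MP means 1,000,000 pixels. Neither is stated above, and the surcharge for extra steps is not modeled because its formula is not given here. The helper names are hypothetical.

```python
from decimal import Decimal

# Price per megapixel in USD, copied from the listing above for three models.
PRICE_PER_MP = {
    "FLUX.1 [dev]": Decimal("0.025"),
    "FLUX.1 Kontext [max]": Decimal("0.08"),
    "FLUX.1 [schnell]": Decimal("0.0027"),
}


def image_cost(model: str, width_px: int, height_px: int) -> Decimal:
    """Estimated cost of one image at the default step count.

    Assumption: 1 MP = 1,000,000 pixels and cost is linear in megapixels.
    """
    megapixels = Decimal(width_px * height_px) / Decimal(1_000_000)
    return PRICE_PER_MP[model] * megapixels


def images_per_dollar(model: str, megapixels: Decimal = Decimal(1)) -> Decimal:
    """Number of images of the given size that $1 buys (may be fractional)."""
    return Decimal(1) / (PRICE_PER_MP[model] * megapixels)


if __name__ == "__main__":
    for name in PRICE_PER_MP:
        print(name, images_per_dollar(name))
```

For FLUX.1 [schnell] the reciprocal is roughly 370.37, which matches the listed 370 once truncated to a whole image.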

Audio Models

Speech synthesis and processing models.

Price per 1M Characters

Model             Price
Cartesia Sonic-2  $65.00

Transcription Models

Models for automatic speech recognition (ASR) and speech translation.

Price per audio minute

Batch API price

Model             Price
Whisper Large v3  $0.0015

Embedding Models

Vector embeddings for semantic search and RAG.

Price per 1M tokens

Rerank Models

Improve search relevance with reranking models.

Price per 1M tokens

Moderation Models

Filter and classify content for safety and compliance.

Price per 1M tokens

Dedicated Endpoints

Deploy models on custom hardware with guaranteed performance and full control.

Single-tenant GPU instances with:
  • Guaranteed performance (no sharing)
  • Support for custom models
  • Autoscaling & traffic spike handling

Hardware Type      Price/Hour
1x H200 141GB      $4.99
1x H100 80GB       $3.36
1x A100 SXM 80GB   $2.56
1x A100 SXM 40GB   $2.40
1x A100 PCIe 80GB  $2.40
1x L40S 48GB       $2.10
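Because dedicated endpoints are priced per hour rather than per token, a quick estimate of running cost is the hourly rate times the hours the endpoint is up. The sketch below assumes each replica is billed at the listed rate for every hour it runs. The `replicas` parameter and the function name are hypothetical, and how autoscaled replicas are actually billed is not stated above.

```python
from decimal import Decimal

# Hourly prices in USD, copied from the listing above for three hardware types.
HOURLY_RATE = {
    "1x H200 141GB": Decimal("4.99"),
    "1x H100 80GB": Decimal("3.36"),
    "1x L40S 48GB": Decimal("2.10"),
}


def endpoint_cost(hardware: str, hours: int, replicas: int = 1) -> Decimal:
    """Estimated USD cost of keeping an endpoint running.

    Assumption: every replica is billed at the listed hourly rate for
    each hour it is up; partial-hour billing is not modeled.
    """
    return HOURLY_RATE[hardware] * hours * replicas


if __name__ == "__main__":
    # One H100 endpoint running around the clock for a 30-day month.
    print(endpoint_cost("1x H100 80GB", hours=24 * 30))
```

Under these assumptions, one H100 endpoint left running for a 30-day month comes to $2,419.20.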

Fine-tuning

Standard pricing

              Supervised Fine-Tuning         Direct Preference Optimization
Size          LoRA      Full Fine-Tuning     LoRA      Full Fine-Tuning
Up to 16B     $0.48     $0.54                $1.20     $1.35
17B-69B       $1.50     $1.65                $3.75     $4.12
70-100B       $2.90     $3.20                $7.25     $8.00

Price is based on the sum of tokens processed in the fine-tuning training dataset (training dataset size * number of epochs) plus any tokens in the optional evaluation dataset (validation dataset size * number of evaluations).
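The token-counting rule above can be written out as a short calculation. The sketch below assumes the prices in the standard table are per 1M billable tokens. That unit is not stated in this section, so treat it as an assumption. Both function names are hypothetical.

```python
from decimal import Decimal


def billable_tokens(train_tokens: int, epochs: int,
                    eval_tokens: int = 0, evaluations: int = 0) -> int:
    """Tokens counted for billing, following the rule stated above:
    training tokens * epochs + validation tokens * number of evaluations."""
    return train_tokens * epochs + eval_tokens * evaluations


def fine_tuning_cost(price: Decimal, tokens: int) -> Decimal:
    """Estimated USD cost. Assumption: `price` is per 1M billable tokens."""
    return price * tokens / Decimal(1_000_000)


if __name__ == "__main__":
    # Supervised LoRA fine-tuning of a model up to 16B ($0.48 in the table):
    # 10M training tokens for 3 epochs, 1M validation tokens evaluated 3 times.
    tokens = billable_tokens(10_000_000, 3, 1_000_000, 3)
    print(tokens, fine_tuning_cost(Decimal("0.48"), tokens))
```

In this example the job bills 33M tokens, which at the assumed per-1M unit comes to $15.84.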

Specialized pricing

Fine-tuning for the models below incurs minimum charges and is limited to LoRA fine-tuning.

Model                             Supervised Fine-Tuning (LoRA)  Direct Preference Optimization (LoRA)  Minimum charge
gpt-oss-120B                      $5.00                          $12.50                                 $6.00
Llama 4 Scout,                    $3.00                          $7.50                                  $6.00
  Llama 4 Scout Instruct
Llama 4 Maverick,                 $8.00                          $20.00                                 $16.00
  Llama 4 Maverick Instruct
DeepSeek-R1,                      $10.00                         $25.00                                 $20.00
  DeepSeek-R1-0528,
  DeepSeek-V3,
  DeepSeek-V3-0324,
  DeepSeek-V3.1,
  DeepSeek-V3.1-Base
Qwen3-Coder-480B-A35B-Instruct    $9.00                          $22.50                                 $18.00
Qwen3-235B-A22B,                  $6.00                          $15.00                                 No min price
  Qwen3-235B-A22B-Instruct-2507

Price is based on the sum of tokens processed in the fine-tuning training dataset (training dataset size * number of epochs) plus any tokens in the optional evaluation dataset (validation dataset size * number of evaluations).
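One plausible reading of the minimum charge is that it acts as a floor on the total for a job. The sketch below works through that reading for two of the models listed above. Both the floor interpretation and the per-1M-token unit are assumptions, since this section does not spell out either, and the names used are hypothetical.

```python
from decimal import Decimal

# Specialized LoRA prices in USD, copied from the listing above for two models.
# "min" is the listed minimum charge; None stands for "No min price".
SPECIALIZED = {
    "gpt-oss-120B": {
        "sft": Decimal("5.00"), "dpo": Decimal("12.50"), "min": Decimal("6.00"),
    },
    "Qwen3-235B-A22B": {
        "sft": Decimal("6.00"), "dpo": Decimal("15.00"), "min": None,
    },
}


def specialized_cost(model: str, method: str, billable_tokens: int) -> Decimal:
    """Estimated job cost in USD.

    Assumptions: prices are per 1M billable tokens, and the minimum
    charge is a floor applied to the usage-based total.
    """
    entry = SPECIALIZED[model]
    usage = entry[method] * billable_tokens / Decimal(1_000_000)
    if entry["min"] is None:
        return usage
    return max(usage, entry["min"])


if __name__ == "__main__":
    # A small job whose usage-based cost falls below the minimum charge.
    print(specialized_cost("gpt-oss-120B", "sft", 500_000))
```

With these assumptions, a 500,000-token supervised job on gpt-oss-120B would cost $2.50 by usage but be charged the $6.00 minimum, while the same job on Qwen3-235B-A22B would be charged $3.00.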

Code Execution

Code Sandbox

Customize a deployment of VM sandboxes for large development environments.

Price per hour

             Price
Per vCPU     $0.0446
Per GiB RAM  $0.0149
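Sandbox pricing has two hourly components, one for vCPUs and one for RAM. The sketch below simply adds them for a given VM shape. It assumes the two charges are additive and scale linearly with running time, which the listing suggests but does not state outright. The function name is hypothetical.

```python
from decimal import Decimal

# Hourly prices in USD, copied from the listing above.
VCPU_PER_HOUR = Decimal("0.0446")
GIB_RAM_PER_HOUR = Decimal("0.0149")


def sandbox_cost(vcpus: int, ram_gib: int, hours: int) -> Decimal:
    """Estimated USD cost of one sandbox VM.

    Assumption: vCPU and RAM charges add up and scale linearly with
    hours. Pass ints or Decimals, since floats do not mix with Decimal.
    """
    hourly = VCPU_PER_HOUR * vcpus + GIB_RAM_PER_HOUR * ram_gib
    return hourly * hours


if __name__ == "__main__":
    # A 4 vCPU, 8 GiB sandbox kept running for 10 hours.
    print(sandbox_cost(vcpus=4, ram_gib=8, hours=10))
```

A 4 vCPU, 8 GiB sandbox comes to $0.2976 per hour on this reading, or $2.976 for ten hours.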

Code Interpreter

Execute LLM-generated code securely using our API.

Price per session

                      Price
Session (60 minutes)  $0.03

GPU Cloud

All Together Instant and Reserved Clusters feature:
  • Choice of Kubernetes or Slurm on Kubernetes
  • Free network ingress and egress
  • NVIDIA InfiniBand and NVLink networking


Instant Clusters

Ready to use, self-service GPUs.

Price per hour per GPU

Hardware                   1 Week - 3 Months  1 - 6 Days  Hourly
NVIDIA HGX H100 Inference  $1.76              $2.00       $2.39
NVIDIA HGX H100 SXM        $2.20              $2.50       $2.99

NVIDIA HGX H200: $3.79 (single price listed)
NVIDIA HGX B200: $5.50 (single price listed)
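Since Instant Cluster prices are quoted per hour per GPU, the total for a booking is the rate times the GPU count times the hours. The sketch below uses the two H100 rows listed above and takes the commitment tier as an explicit argument, because the exact rule for which tier applies to a given booking is not described here. The function and dictionary names are hypothetical.

```python
from decimal import Decimal

# Price per hour per GPU in USD, copied from the two H100 rows listed above.
RATES = {
    "NVIDIA HGX H100 Inference": {
        "1 Week - 3 Months": Decimal("1.76"),
        "1 - 6 Days": Decimal("2.00"),
        "Hourly": Decimal("2.39"),
    },
    "NVIDIA HGX H100 SXM": {
        "1 Week - 3 Months": Decimal("2.20"),
        "1 - 6 Days": Decimal("2.50"),
        "Hourly": Decimal("2.99"),
    },
}


def cluster_cost(hardware: str, tier: str, gpus: int, hours: int) -> Decimal:
    """Estimated USD cost: rate * GPUs * hours.

    The caller chooses the tier; this sketch does not infer it from
    the booking length.
    """
    return RATES[hardware][tier] * gpus * hours


if __name__ == "__main__":
    # Eight H100 SXM GPUs for three days (72 hours) at the "1 - 6 Days" rate.
    print(cluster_cost("NVIDIA HGX H100 SXM", "1 - 6 Days", gpus=8, hours=72))
```

Eight H100 SXM GPUs for 72 hours at the "1 - 6 Days" rate comes to $1,440.00, compared with $1,722.24 for the same hours at the hourly rate.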

Reserved Clusters

Dedicated capacity, with expert support.

Price per hour

Hardware            GPU Memory   Price
NVIDIA GB200 NVL72  384GB HBM3e
NVIDIA B200         192GB HBM3e
NVIDIA H200         141GB HBM3e  Starting at $2.09
NVIDIA H100         80GB HBM2e   Starting at $1.75
NVIDIA A100         80GB HBM2e   Starting at $1.30

Frontier AI Factory

Large-scale, custom-built private GPU clusters.
1K → 10K → 100K+ NVIDIA GPUs.

NVIDIA Blackwell GPUs at scale
Talk to our team of experts to get a custom quote for your AI Factory project plan.
Storage

High-bandwidth, parallel filesystem colocated with your compute.

Item               Price  Unit
Shared Filesystem  $0.16  GiB/month
