
Serverless Inference

Text & Vision Models

State-of-the-art language and multimodal models.


Price per 1M tokens

| Model | Organization | Input | Output |
| --- | --- | --- | --- |
| Llama 4 Maverick | Llama | $0.27 | $0.85 |
| Llama 4 Scout | Llama | $0.18 | $0.59 |
| Llama 3.3 70B Instruct-Turbo | Llama | $0.88 | $0.88 |
| Llama 3.2 3B Instruct Turbo | Llama | $0.06 | $0.06 |
| Llama 3.1 405B Instruct Turbo | Llama | $3.50 | $3.50 |
| Llama 3.1 70B Instruct Turbo | Llama | $0.88 | $0.88 |
| Llama 3.1 8B Instruct Turbo | Llama | $0.18 | $0.18 |
| Llama 3 8B Instruct Lite | Llama | $0.10 | $0.10 |
| Llama 3 70B Instruct Reference | Llama | $0.88 | $0.88 |
| Llama 3 70B Instruct Turbo | Llama | $0.88 | $0.88 |
| LLaMA-2 | Llama | $0.90 | $0.90 |
| DeepSeek-R1 | DeepSeek | $3.00 | $7.00 |
| DeepSeek R1 Distilled Qwen 14B | DeepSeek | $0.18 | $0.18 |
| DeepSeek R1 Distilled Llama 70B | DeepSeek | $2.00 | $2.00 |
| DeepSeek R1-0528-tput | DeepSeek | $0.55 | $2.19 |
| DeepSeek-V3.1 | DeepSeek | $0.60 | $1.70 |
| DeepSeek-V3 | DeepSeek | $1.25 | $1.25 |
| gpt-oss-120B | OpenAI | $0.15 | $0.60 |
| gpt-oss-20B | OpenAI | $0.05 | $0.20 |
| Qwen3-Coder 480B A35B Instruct | Qwen | $2.00 | $2.00 |
| Qwen3 235B A22B Instruct 2507 FP8 | Qwen | $0.20 | $0.60 |
| Qwen3 235B A22B Thinking 2507 FP8 | Qwen | $0.65 | $3.00 |
| Qwen3 235B A22B FP8 Throughput | Qwen | $0.20 | $0.60 |
| Qwen 2.5 72B | Qwen | $1.20 | $1.20 |
| Qwen2.5-VL 72B Instruct | Qwen | $1.95 | $8.00 |
| Qwen2.5 Coder 32B Instruct | Qwen | $0.80 | $0.80 |
| Qwen2.5 7B Instruct Turbo | Qwen | $0.30 | $0.30 |
| Qwen QwQ-32B | Qwen | $1.20 | $1.20 |
| GLM-4.5-Air | GLM | $0.20 | $1.10 |
| Kimi K2 Instruct | Kimi | $1.00 | $3.00 |
| Mistral (7B) Instruct v0.2 | Mistral | $0.20 | $0.20 |
| Mistral Instruct | Mistral | $0.20 | $0.20 |
| Mistral Small 3 | Mistral | $0.80 | $0.80 |
| Mixtral 8x7B Instruct v0.1 | Mistral | $0.60 | $0.60 |
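
As a worked example of the per-1M-token rates above, the sketch below estimates the cost of a single request. The `request_cost` helper is illustrative, not part of any SDK.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """USD cost of one request, given per-1M-token input/output rates."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Llama 4 Maverick ($0.27 input / $0.85 output per 1M tokens),
# with 8,000 prompt tokens and 1,000 completion tokens:
print(f"${request_cost(8_000, 1_000, 0.27, 0.85):.5f}")  # $0.00301
```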


All Other Models

Price per 1M tokens

| Model | Organization | Input | Output |
| --- | --- | --- | --- |
| Marin 8B Instruct | Marin Community | $0.18 | $0.18 |
| Arcee AI AFM-4.5B | Arcee | $0.10 | $0.40 |
| Arcee AI Coder-Large | Arcee | $0.50 | $0.80 |
| Arcee AI Maestro | Arcee | $0.90 | $3.30 |
| Arcee AI Virtuoso-Large | Arcee | $0.75 | $1.20 |
| Cogito v2 preview - 109B MoE | Cogito | $0.18 | $0.59 |
| Cogito v2 preview - 405B | Cogito | $3.50 | $3.50 |
| Cogito v2 preview - 671B MoE | Cogito | $1.25 | $1.25 |
| Cogito v2 preview - 70B | Cogito | $0.88 | $0.88 |
| Refuel LLM-2 | Refuel | $0.60 | $0.60 |
| Refuel LLM-2 Small | Refuel | $0.20 | $0.20 |
| Typhoon 2 70B Instruct | Refuel | $0.88 | $0.88 |
| gemma-3n-E4B-it | Google | $0.02 | $0.04 |


Image Models

Generate stunning visuals with the latest and greatest image models.

| Model | Price per MP | Images per $1 (1MP) | Default Steps |
| --- | --- | --- | --- |
| FLUX.1 Krea [dev] | $0.025 | 40 | 28 |
| FLUX.1 Kontext [dev] | $0.025 | 40 | 28 |
| FLUX.1 Kontext [pro] | $0.04 | 25 | 28 |
| FLUX.1 Kontext [max] | $0.08 | 12.5 | 28 |
| FLUX1.1 [pro] | $0.04 | 25 | - |
| FLUX.1 [dev] | $0.025 | 40 | 28 |
| FLUX.1 [pro] | $0.05 | 20 | 28 |
| FLUX.1 [schnell] | $0.0027 | 370 | 4 |

Prices include default steps shown above. Additional costs apply only when exceeding default steps. See full pricing details →
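
To make the per-megapixel pricing concrete, here is a small sketch (the `image_cost` helper is hypothetical): the cost of one image at default steps is the per-MP price times the image's megapixels, and the "Images per $1 (1MP)" column is simply the reciprocal of the per-MP price.

```python
def image_cost(price_per_mp: float, width: int, height: int) -> float:
    """USD cost of one image at default steps: per-MP price x megapixels."""
    return price_per_mp * (width * height) / 1_000_000

# FLUX.1 [schnell] at $0.0027/MP for a 1024x1024 (~1.05 MP) image:
print(round(image_cost(0.0027, 1024, 1024), 6))  # 0.002831
# "Images per $1 (1MP)" is the reciprocal of the per-MP price:
print(round(1 / 0.0027))  # 370, matching the table
```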

Audio Models

Speech synthesis and processing models.

Price per 1M characters

| Model | Price |
| --- | --- |
| Cartesia Sonic-2 | $65.00 |

Transcription Models

Models for automatic speech recognition (ASR) and speech translation.

Price per audio minute

| Model | Price |
| --- | --- |
| Whisper Large v3 | $0.0015 |

Embedding Models

Vector embeddings for semantic search and RAG.

Price per 1M tokens

Rerank Models

Improve search relevance with reranking models.

Price per 1M tokens

Moderation Models

Classify and filter unsafe content with moderation models.

Price per 1M tokens

Dedicated Endpoints

Deploy models on custom hardware with guaranteed performance and full control.

Single-tenant GPU instances with:
  • Guaranteed performance (no sharing)
  • Support for custom models
  • Autoscaling & traffic spike handling
  • Per-minute billing

Ideal for workloads > 130,000 tokens/minute

| Hardware Type | Price/Minute | Price/Hour |
| --- | --- | --- |
| 1x H200 141GB | $0.083 | $4.99 |
| 1x H100 80GB | $0.056 | $3.36 |
| 1x A100 SXM 80GB | $0.043 | $2.56 |
| 1x A100 SXM 40GB | $0.040 | $2.40 |
| 1x A100 PCIe 80GB | $0.040 | $2.40 |
| 1x L40S 48GB | $0.035 | $2.10 |
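
The ">130,000 tokens/minute" guideline above can be sanity-checked with quick arithmetic. The sketch below compares serverless spend per minute (at the Llama 3.3 70B Instruct-Turbo rate of $0.88 per 1M tokens, input and output billed alike) against the dedicated 1x H100 per-minute rate; it is a cost-only comparison that ignores whether a single GPU could sustain that throughput.

```python
def serverless_cost_per_minute(tokens_per_minute: int, price_per_m: float) -> float:
    """USD per minute on serverless at a flat per-1M-token rate."""
    return tokens_per_minute * price_per_m / 1_000_000

H100_PER_MINUTE = 0.056  # 1x H100 80GB dedicated endpoint, $/minute

# At 130,000 tokens/minute, serverless already costs about double the H100 rate:
print(round(serverless_cost_per_minute(130_000, 0.88), 4))  # 0.1144
```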

Fine-tuning

Customize open-source models with your data.

| Size | Supervised Fine-Tuning (LoRA) | Supervised Fine-Tuning (Full) | Direct Preference Optimization (LoRA) | Direct Preference Optimization (Full) |
| --- | --- | --- | --- | --- |
| Up to 16B | $0.48 | $0.54 | $1.20 | $1.35 |
| 17B-69B | $1.50 | $1.65 | $3.75 | $4.12 |
| 70B-100B | $2.90 | $3.20 | $7.25 | $8.00 |
| gpt-oss-120B** | $5.00 | N/A | $12.50 | N/A |

Price is based on the sum of tokens processed in the fine-tuning training dataset (training dataset size × number of epochs) plus any tokens in the optional evaluation dataset (validation dataset size × number of evaluations).
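
The billing formula stated above can be sketched as follows; the function name is illustrative, and the final line assumes the table rates are per 1M billable tokens.

```python
def fine_tune_billable_tokens(train_tokens: int, epochs: int,
                              val_tokens: int = 0, evaluations: int = 0) -> int:
    """Billable tokens = training tokens x epochs + validation tokens x evaluations."""
    return train_tokens * epochs + val_tokens * evaluations

# 50M training tokens for 3 epochs, plus a 2M-token validation set scored twice:
total = fine_tune_billable_tokens(50_000_000, 3, 2_000_000, 2)
print(total)  # 154000000
# At the "Up to 16B" LoRA rate of $0.48, assuming per-1M-token billing:
print(round(total / 1_000_000 * 0.48, 2))  # 73.92
```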

Code Execution

Code Sandbox

Customize a deployment of VM sandboxes for large development environments.

Price per hour

| Resource | Price |
| --- | --- |
| Per vCPU | $0.0446 |
| Per GiB RAM | $0.0149 |
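
A sandbox's hourly cost is the sum of its vCPU and RAM charges at the rates above; the helper and the 4 vCPU / 8 GiB shape below are illustrative, not a listed configuration.

```python
def sandbox_hourly_cost(vcpus: int, ram_gib: int) -> float:
    """USD per hour for a Code Sandbox VM: vCPU rate + GiB-RAM rate."""
    return vcpus * 0.0446 + ram_gib * 0.0149

# A hypothetical 4 vCPU / 8 GiB machine:
print(round(sandbox_hourly_cost(4, 8), 4))  # 0.2976
```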

Code Interpreter

Execute LLM-generated code securely using our API.

Price per session

| Item | Price |
| --- | --- |
| Session (60 minutes) | $0.03 |

GPU Cloud

Instant GPU Clusters
| Hardware Type | On-Demand Hourly | 1 to 6 Days | Up to 3 Months |
| --- | --- | --- | --- |
| NVIDIA GB200 | Coming soon | | |
| NVIDIA B200 | Coming soon | | |
| NVIDIA H200 | $3.79 GPU/hr | $3.45 GPU/hr | $3.15 GPU/hr |
| NVIDIA H100 | $3.19 GPU/hr | $2.85 GPU/hr | $2.65 GPU/hr |

| Storage Type | Storage Size | Pricing |
| --- | --- | --- |
| Shared Storage | Up to 1 PB | $0.16 GiB/mo |
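
Cluster spend at these tiers is just GPUs × hours × the matching per-GPU-hour rate; the helper below is a sketch (actual billing granularity may differ).

```python
def cluster_cost(gpus: int, hours: float, rate_per_gpu_hour: float) -> float:
    """USD for a cluster reservation: GPU count x hours x per-GPU-hour rate."""
    return gpus * hours * rate_per_gpu_hour

# 8x H100 for 3 days (72 hours) at the 1-to-6-day rate of $2.85/GPU-hr:
print(round(cluster_cost(8, 72, 2.85), 2))  # 1641.6
```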

GPU Calculator

Storage Calculator

Reserved GPU Clusters

State-of-the-art clusters with NVIDIA Blackwell and Hopper GPUs.
3-month minimum commitment. 64 → 1K+ GPUs.

Price per hour

| Hardware | GPU Memory | Price |
| --- | --- | --- |
| NVIDIA GB200 NVL72 | 384GB HBM3e | |
| NVIDIA B200 | 192GB HBM3e | |
| NVIDIA H200 | 141GB HBM3e | Starting at $2.09 |
| NVIDIA H100 | 80GB HBM2e | Starting at $1.75 |
| NVIDIA A100 | 80GB HBM2e | Starting at $1.30 |

Frontier AI Factory

Large-scale, custom-built private GPU clusters.
1K → 10K → 100K+ NVIDIA GPUs.

NVIDIA Blackwell GPUs at scale
Talk to our team of experts to get a custom quote for your AI Factory project plan.
Contact Sales

Interested in a custom large-scale deployment?