Serverless Inference
State-of-the-art language and multimodal models.
Price per 1M tokens
Batch API price
Generate stunning visuals with the latest and greatest image models.
Price per MP
Prices include default steps shown above. Additional costs apply only when exceeding default steps.
Speech synthesis and processing models.
Price per 1M characters
Models for automatic speech recognition (ASR) and speech translation.
Price per audio minute
Batch API price
Vector embeddings for semantic search and RAG.
Price per 1M tokens
Improve search relevance with reranking models.
Price per 1M tokens
Filter and classify content for safety and compliance.
Price per 1M tokens
Deploy models on custom hardware with guaranteed performance and full control.
Guaranteed performance (no sharing)
Support for custom models
Autoscaling & traffic spike handling
Fine-tuning
Price is based on the sum of tokens processed in the fine-tuning training dataset (training dataset size * number of epochs) plus any tokens in the optional evaluation dataset (validation dataset size * number of evaluations).
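As a rough illustration of the token arithmetic described above, the following Python sketch computes the billable token count and a cost estimate. The function names, the example dataset sizes, and the rate of $0.50 per 1M tokens are placeholders chosen for illustration, not published prices, and the sketch does not model any minimum charge.

```python
def billable_finetuning_tokens(
    train_tokens: int,
    epochs: int,
    eval_tokens: int = 0,
    evaluations: int = 0,
) -> int:
    """Tokens billed: training passes plus optional evaluation passes."""
    if min(train_tokens, epochs, eval_tokens, evaluations) < 0:
        raise ValueError("token counts and pass counts must be non-negative")
    # training dataset size * number of epochs
    # + validation dataset size * number of evaluations
    return train_tokens * epochs + eval_tokens * evaluations


def estimate_finetuning_cost(total_tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count to dollars at a per-1M-token rate (assumed rate)."""
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical job: 2M training tokens for 3 epochs,
# plus a 100K-token validation set evaluated 3 times.
tokens = billable_finetuning_tokens(2_000_000, 3, 100_000, 3)
cost = estimate_finetuning_cost(tokens, usd_per_million_tokens=0.50)  # placeholder rate
print(f"{tokens:,} billable tokens, estimated ${cost:.2f}")
```

Leaving the evaluation arguments at their defaults gives the training-only case, since the evaluation dataset is optional.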
Fine-tuning for the models below incurs minimum charges and is limited to LoRA fine-tuning.
Code Execution
Customize a deployment of VM sandboxes for large development environments.
Price per hour
Execute LLM-generated code securely using our API.
Price per session
GPU Cloud
All Together Instant and Reserved Clusters include a choice of Kubernetes or Slurm on Kubernetes, free network ingress and egress, and NVIDIA InfiniBand and NVLink networking.
Ready to use, self-service GPUs.
Price per hour per GPU
Dedicated capacity, with expert support.
Price per hour
Large-scale, custom-built private GPU clusters. 1K → 10K → 100K+ NVIDIA GPUs.
High-bandwidth, parallel filesystem colocated with your compute.