🔎 ATLAS: runtime-learning accelerators delivering up to 4x faster LLM inference
⚡ Together Instant Clusters: self-service NVIDIA GPUs, now generally available
📦 Batch Inference API: process billions of tokens at 50% lower cost for most models
🪛 Fine-Tuning Platform Upgrades: larger models, longer contexts
Inference
Serverless Inference
High-performance inference as APIs
Batch Inference
Inference for batch workloads
Dedicated Model Inference
Inference on custom hardware
Dedicated Container Inference
Inference for custom models
Model Library
Explore the top open-source models
Compute
Accelerated Compute
Scale with research-optimized GPUs
Sandbox
Build development environments for AI
Managed Storage
Store model weights & data securely
GB300
GB200
B200
H200
Model Shaping
Fine-Tuning
Shape models with your data
Evaluations
Measure model quality
Fine-tune top open-source models
Research
Systems research for production AI
Research blog
All our research publications
Featured publications
FlashAttention
ATLAS
Kernel Collection
ThunderKittens
DSGym
Developers
Documentation
Technical docs for Together AI
Demos
Our open-source demo apps
Cookbooks
Practical implementation guides
Model Library
Playground
Together Chat
Which LLM to use
Company
Solutions
Customer stories
Testimonials from AI Natives
Startup accelerator
Build and scale your startup
Customer support
Find answers to your questions
About
Get to know us
Careers
Join our mission
Resources
Blog
Our latest news & blog posts
Events
Explore our events calendar
Pricing
Featured publication authors: Tri Dao, Daniel Haziza, Francisco Massa, Grigory Sizov; Tianle Cai*, Yuhong Li*, Zhengyang Geng, Hongwu Peng, Tri Dao (* equal contribution)