# Together AI

> Together AI is the AI Native Cloud — a platform for running, fine-tuning, and deploying
> open-source and frontier AI models at scale. Developers use Together for serverless
> inference, dedicated endpoints, GPU clusters, fine-tuning, and code sandboxes via an
> OpenAI-compatible API.

## Getting Started
- [Quickstart](https://docs.together.ai/docs/quickstart): Get your first API call working in minutes
- [OpenAI Compatibility](https://docs.together.ai/docs/openai-api-compatibility): Drop-in replacement for OpenAI SDK
- [Pricing](https://www.together.ai/pricing): Per-token and per-compute pricing for all products
- [API Authentication](https://docs.together.ai/reference/authentication-1): How to authenticate API requests
- [Introduction](https://docs.together.ai/docs/introduction): Platform overview and capabilities
- [Integrations](https://docs.together.ai/docs/integrations): Connect Together AI with your existing tools
- [Multiple API Keys](https://docs.together.ai/docs/multiple-api-keys): Manage API keys across teams and projects

## Products
- [Serverless Inference](https://www.together.ai/serverless-inference): Pay-per-token access to 200+ open-source and frontier models
- [Dedicated Model Inference](https://www.together.ai/dedicated-model-inference): Reserved model capacity for production workloads
- [Dedicated Container Inference](https://www.together.ai/dedicated-container-inference): Deploy custom containers on dedicated infrastructure
- [Batch Inference](https://www.together.ai/batch-inference): Async large-scale inference at lower cost
- [Fine-tuning](https://www.together.ai/fine-tuning): Full fine-tuning and LoRA adapters for custom models
- [Accelerated Compute / GPU Clusters](https://www.together.ai/accelerated-compute): On-demand H100/H200/GB200/B300 clusters for training and inference
- [Sandbox](https://www.together.ai/sandbox): Secure code interpreter and execution environment
- [Managed Storage](https://www.together.ai/managed-storage): Persistent storage for AI workloads
- [Evaluations](https://www.together.ai/evaluations): Evaluate and benchmark model performance
- [Enterprise](https://www.together.ai/scale-enterprise): Private deployments, SLAs, and compliance features

## Models
- [Models Catalog](https://www.together.ai/models): Browse 200+ available models by category and capability
- [Llama 4 Scout](https://www.together.ai/deploy/llama-4-scout): Meta's latest multimodal frontier model
- [Llama 4 Maverick](https://www.together.ai/deploy/llama-4-maverick): Meta's high-performance reasoning model
- [DeepSeek R1](https://www.together.ai/deploy/deepseek-r1): Top-ranked open reasoning model
- [DeepSeek V3](https://www.together.ai/deploy/deepseek-v3): High-performance open frontier model
- [DeepSeek V3.1](https://www.together.ai/deploy/deepseek-v3-1): Updated DeepSeek V3 with improvements
- [Qwen3 Coder](https://www.together.ai/deploy/qwen3-coder): State-of-the-art open coding model
- [Qwen3 Instruct](https://www.together.ai/deploy/qwen3-instruct): Qwen3 instruction-tuned model family
- [Llama 3.3 70B](https://www.together.ai/deploy/llama-3-3-70b): Meta's Llama 3.3 70B instruct model
- [OpenAI Open Models](https://www.together.ai/deploy/openai-open-models): OpenAI's open-weight models on Together

## API Reference
- [Chat Completions](https://docs.together.ai/reference/chat-completions-1): OpenAI-compatible chat endpoint
- [Text Completions](https://docs.together.ai/reference/completions-1): Legacy completions endpoint
- [Image Generation](https://docs.together.ai/reference/post_images-generations): Text-to-image API
- [Embeddings](https://docs.together.ai/reference/embeddings-2): Vector embeddings for RAG
- [Rerank](https://docs.together.ai/reference/rerank-1): Rerank documents by relevance
- [Fine-tuning Jobs](https://docs.together.ai/reference/finetune): Create and manage fine-tuning runs
- [Fine-tuning Events](https://docs.together.ai/reference/get_fine-tunes-id-events): Monitor fine-tuning job progress
- [Fine-tuning Checkpoints](https://docs.together.ai/reference/get_fine-tunes-id-checkpoints): Download fine-tuning checkpoints
- [Dedicated Endpoints: Create](https://docs.together.ai/reference/createendpoint): Deploy a dedicated endpoint
- [Dedicated Endpoints: List](https://docs.together.ai/reference/listendpoints): List all dedicated endpoints
- [Dedicated Endpoints: Get](https://docs.together.ai/reference/getendpoint): Get endpoint details
- [Dedicated Endpoints: Update](https://docs.together.ai/reference/updateendpoint): Update endpoint configuration
- [Dedicated Endpoints: Delete](https://docs.together.ai/reference/deleteendpoint): Delete a dedicated endpoint
- [Batch API: Submit](https://docs.together.ai/reference/post_batches): Submit async inference batches
- [Batch API: Get](https://docs.together.ai/reference/get_batches-id): Get batch job status
- [Batch API: List](https://docs.together.ai/reference/get_batches): List all batch jobs
- [Files: Upload](https://docs.together.ai/reference/post_files-upload): Upload training or batch files
- [Files: List](https://docs.together.ai/reference/get_files): List uploaded files
- [Files: Get](https://docs.together.ai/reference/get_files-id): Get file metadata
- [Files: Download](https://docs.together.ai/reference/get_files-id-content): Download file content
- [Files: Delete](https://docs.together.ai/reference/delete_files-id): Delete a file
- [Models List](https://docs.together.ai/reference/models-1): List all available models
- [Audio: Speech](https://docs.together.ai/reference/audio-speech): Text-to-speech API
- [Audio: Transcriptions](https://docs.together.ai/reference/audio-transcriptions): Speech-to-text transcription
- [Audio: Translations](https://docs.together.ai/reference/audio-translations): Translate audio to English
- [Inference: Execute](https://docs.together.ai/reference/inference): Direct inference endpoint
- [Hardware: List](https://docs.together.ai/reference/listhardware): List available GPU hardware
- [Upload Model](https://docs.together.ai/reference/uploadmodel): Upload a custom model

## Key Documentation
- [Function Calling](https://docs.together.ai/docs/function-calling): Structured outputs and tool use
- [JSON Mode](https://docs.together.ai/docs/json-mode): Guaranteed JSON-formatted responses
- [Reasoning Models Guide](https://docs.together.ai/docs/reasoning-models-guide): Working with chain-of-thought models
- [RAG Quickstart](https://docs.together.ai/docs/quickstart-retrieval-augmented-generation-rag): Build retrieval-augmented apps
- [Rate Limits](https://docs.together.ai/docs/rate-limits): Understand and increase rate limits
- [Error Codes](https://docs.together.ai/docs/error-codes): API error reference
- [Inference Best Practices Guide](https://www.together.ai/guides/best-practices-to-accelerate-inference-for-large-scale-production-workloads): Optimizing large-scale production workloads
- [Serverless Models Reference](https://docs.together.ai/docs/serverless-models): Complete list of serverless-available models
- [Dedicated Inference Guide](https://docs.together.ai/docs/dedicated-inference): Configure and optimize dedicated deployments
- [Fine-tuning Quickstart](https://docs.together.ai/docs/fine-tuning-quickstart): Fine-tune your first model
- [Fine-tuning Data Preparation](https://docs.together.ai/docs/fine-tuning-data-preparation): Format training data correctly
- [Fine-tuning Pricing](https://docs.together.ai/docs/fine-tuning-pricing): Per-token pricing for training runs
- [Fine-tuning FAQs](https://docs.together.ai/docs/fine-tuning-faqs): Common questions about fine-tuning
- [Batch Inference Guide](https://docs.together.ai/docs/batch-inference): Submit and manage async inference jobs
- [Embeddings Overview](https://docs.together.ai/docs/embeddings-overview): Vector embeddings for semantic search
- [Embeddings + RAG](https://docs.together.ai/docs/embeddings-rag): Build RAG pipelines with embeddings
- [Rerank Overview](https://docs.together.ai/docs/rerank-overview): Improve retrieval quality with reranking
- [Images Overview](https://docs.together.ai/docs/images-overview): Text-to-image generation guide
- [Vision Overview](https://docs.together.ai/docs/vision-overview): Multimodal image understanding
- [Logprobs](https://docs.together.ai/docs/logprobs): Access token log probabilities
- [LoRA Inference](https://docs.together.ai/docs/lora-inference): Deploy fine-tuned LoRA adapters
- [Custom Models](https://docs.together.ai/docs/custom-models): Upload and run your own model weights
- [Deploying a Fine-tuned Model](https://docs.together.ai/docs/deploying-a-fine-tuned-model): Deploy a trained model to production
- [Deployment Options](https://docs.together.ai/docs/deployment-options): Overview of serverless vs dedicated
- [Deprecations](https://docs.together.ai/docs/deprecations): Model deprecation schedule
- [Instant Clusters](https://docs.together.ai/docs/instant-clusters): On-demand GPU cluster provisioning
- [Cluster Storage](https://docs.together.ai/docs/cluster-storage): Persistent storage for GPU clusters
- [Cluster User Management](https://docs.together.ai/docs/cluster-user-management): Manage team access to clusters
- [Speech to Text](https://docs.together.ai/docs/speech-to-text): Whisper-based transcription API
- [Text to Speech](https://docs.together.ai/docs/text-to-speech): TTS model API
- [OCR](https://docs.together.ai/docs/ocr): Optical character recognition guide
- [Language Overview](https://docs.together.ai/docs/language-overview): Chat and completion model overview
- [Chat Overview](https://docs.together.ai/docs/chat-overview): Chat model capabilities
- [Preference Fine-tuning](https://docs.together.ai/docs/preference-fine-tuning): DPO and RLHF fine-tuning
- [Deepseek FAQs](https://docs.together.ai/docs/deepseek-faqs): Common questions about DeepSeek models
- [Prompting DeepSeek R1](https://docs.together.ai/docs/prompting-deepseek-r1): Effective prompting for reasoning models
- [DeepSeek R1 Guide](https://docs.together.ai/docs/deepseek-r1): Deploy and use DeepSeek R1
- [Llama 4 Quickstart](https://docs.together.ai/docs/llama4-quickstart): Get started with Llama 4 models
- [Together Code Sandbox](https://docs.together.ai/docs/together-code-sandbox): Secure code execution environment
- [Workflows](https://docs.together.ai/docs/workflows): Multi-step AI pipeline orchestration
- [Mixture of Agents](https://docs.together.ai/docs/mixture-of-agents): Combine multiple models for better results
- [SLURM](https://docs.together.ai/docs/slurm): SLURM job scheduling on GPU clusters

## Framework & Integration Guides
- [LangGraph](https://docs.together.ai/docs/langgraph): Build stateful LLM agents with LangGraph
- [CrewAI](https://docs.together.ai/docs/crewai): Multi-agent frameworks with CrewAI
- [AutoGen](https://docs.together.ai/docs/autogen): Microsoft AutoGen integration
- [PydanticAI](https://docs.together.ai/docs/pydanticai): Type-safe LLM apps with PydanticAI
- [Agno](https://docs.together.ai/docs/agno): Build AI agents with Agno
- [DSPy](https://docs.together.ai/docs/dspy): Programmatic LLM pipelines with DSPy
- [Composio](https://docs.together.ai/docs/composio): Tool calling integrations via Composio
- [Vercel AI SDK](https://docs.together.ai/docs/using-together-with-vercels-ai-sdk): Next.js and Edge Runtime integration
- [Next.js Chat Quickstart](https://docs.together.ai/docs/nextjs-chat-quickstart): Build a chat app with Next.js
- [Cline](https://docs.together.ai/docs/cline): Cline IDE extension integration
- [Inference Web Interface](https://docs.together.ai/docs/inference-web-interface): Use the Together Playground UI
- [HuggingFace Integration](https://docs.together.ai/docs/quickstart-using-hugging-face-inference): Run HuggingFace models via Together
- [Together + LlamaRank](https://docs.together.ai/docs/together-and-llamarank): Salesforce LlamaRank integration

## Cookbook Quickstarts
- [AI Search Engine](https://docs.together.ai/docs/ai-search-engine): Build a semantic search engine
- [AI Tutor](https://docs.together.ai/docs/ai-tutor): Build an adaptive tutoring system
- [Data Analyst Agent](https://docs.together.ai/docs/data-analyst-agent): Build a data analysis agent
- [Building a RAG Workflow](https://docs.together.ai/docs/building-a-rag-workflow): End-to-end RAG implementation
- [OCR + RAG](https://docs.together.ai/docs/quickstart-flux-kontext): FLUX Kontext image editing quickstart
- [FLUX LoRA Quickstart](https://docs.together.ai/docs/quickstart-flux-lora): Fine-tune and run FLUX LoRA models
- [FLUX Tools Quickstart](https://docs.together.ai/docs/quickstart-flux-tools-models): Use FLUX Canny and Depth models
- [Building Coding Agents](https://docs.together.ai/docs/how-to-build-coding-agents): Practical guide to AI coding agents
- [NotebookLM Clone](https://docs.together.ai/docs/open-notebooklm-pdf-to-podcast): PDF-to-podcast with open models
- [Multimodal Document RAG](https://docs.together.ai/docs/quickstart): Vision models + ColQwen2 retrieval
- [Contextual RAG](https://docs.together.ai/docs/how-to-implement-contextual-rag-from-anthropic): Anthropic-style contextual RAG
- [Search with Rerankers](https://docs.together.ai/docs/how-to-improve-search-with-rerankers): Improve search quality with reranking
- [Sequential Agent Workflow](https://docs.together.ai/docs/sequential-agent-workflow): Chain agents in sequence
- [Parallel Workflows](https://docs.together.ai/docs/parallel-workflows): Run agents in parallel
- [Conditional Workflows](https://docs.together.ai/docs/conditional-workflows): Branch workflows based on model output
- [Iterative Workflow](https://docs.together.ai/docs/iterative-workflow): Self-improving agent loops
- [Create Slack Tickets](https://docs.together.ai/docs/create-tickets-in-slack): Automate ticket creation in Slack

## Customers
- [Customer Stories](https://www.together.ai/customers): How leading AI companies build on Together
- [Cursor](https://www.together.ai/customers/cursor): Real-time low-latency inference at scale for AI coding
- [Decagon](https://www.together.ai/customers/decagon): AI customer support with sub-second voice AI
- [Washington Post](https://www.together.ai/customers/washington-post): AI-powered editorial workflows at a major news publisher
- [Zomato](https://www.together.ai/customers/zomato): AI customer support bot that doubled satisfaction scores
- [Arcee AI](https://www.together.ai/customers/arcee-ai): Custom model training and deployment at scale
- [Dippy AI](https://www.together.ai/customers/dippy-ai): Character AI and roleplay applications
- [Hedra](https://www.together.ai/customers/hedra): AI video generation platform
- [HeroUI](https://www.together.ai/customers/heroui): AI-powered UI component generation
- [Latent Health](https://www.together.ai/customers/latent-health): Healthcare AI applications
- [LegionEdge](https://www.together.ai/customers/legionedge): AI-powered gaming infrastructure
- [Runware](https://www.together.ai/customers/runware): High-throughput image generation
- [Scaled Cognition](https://www.together.ai/customers/scaled-cognition): AI reasoning and agent systems
- [SCB10X](https://www.together.ai/customers/scb10x): Financial services AI
- [Slingshot AI](https://www.together.ai/customers/slingshot-ai): Sales intelligence AI
- [Vercept](https://www.together.ai/customers/vercept): AI-powered video analysis

## Company
- [About](https://www.together.ai/about-us): Company mission and team
- [Research](https://www.together.ai/research): Together AI research publications
- [Research Blog](https://www.together.ai/research-blog): Technical research posts
- [Blog](https://www.together.ai/blog): Engineering and product updates
- [Careers](https://www.together.ai/careers): Open roles at Together AI
- [Brand](https://www.together.ai/brand): Together AI brand assets and guidelines
- [Events](https://www.together.ai/events): Upcoming and past Together AI events

## GPU Hardware
- [NVIDIA GB200 NVL72](https://www.together.ai/nvidia-gb200-nvl72): GB200 NVL72 specs and availability
- [NVIDIA GB300 NVL72](https://www.together.ai/nvidia-gb300-nvl72): GB300 NVL72 specs and availability
- [NVIDIA HGX B200](https://www.together.ai/nvidia-hgx-b200): HGX B200 specs and availability
- [NVIDIA H200](https://www.together.ai/nvidia-h200): H200 specs and availability
- [NVIDIA H100](https://www.together.ai/nvidia-h100): H100 specs and availability
- [B300 GPU Cluster](https://www.together.ai/gpu/b300): B300 cluster configuration and pricing

## Optional
- [Cookbooks](https://www.together.ai/cookbooks): End-to-end implementation examples
- [Demos](https://www.together.ai/demos): Interactive product demos
- [Startup Accelerator](https://www.together.ai/startup-accelerator): Credits and support for early-stage AI startups
- [Contact Sales](https://www.together.ai/contact-sales): Enterprise and GPU cluster inquiries
- [Support](https://www.together.ai/support): Documentation and help center
- [Privacy Policy](https://www.together.ai/privacy): Data handling and privacy
- [Terms of Service](https://www.together.ai/terms-of-service): Usage terms
- [Cookie Policy](https://www.together.ai/cookie-policy): Cookie usage policy
- [Data Center Locations](https://www.together.ai/data-center-locations): Infrastructure regions

## All Model Pages
https://www.together.ai/models/afm-4-5b-preview
https://www.together.ai/models/apriel-1-5-15b-thinker
https://www.together.ai/models/apriel-1-6-15b-thinker
https://www.together.ai/models/arcana-v2
https://www.together.ai/models/arcee-ai-afm-4-5b
https://www.together.ai/models/arcee-ai-arcee-blitz
https://www.together.ai/models/arcee-ai-arcee-spotlight
https://www.together.ai/models/arcee-ai-caller
https://www.together.ai/models/arcee-ai-coder-large
https://www.together.ai/models/arcee-ai-maestro-reasoning
https://www.together.ai/models/arcee-ai-virtuoso-large
https://www.together.ai/models/arcee-ai-virtuoso-medium
https://www.together.ai/models/bge-base-en-v1-5
https://www.together.ai/models/bge-large-en-v1-5
https://www.together.ai/models/bytedance-seedance-1-0-lite
https://www.together.ai/models/bytedance-seedance-1-0-pro
https://www.together.ai/models/bytedance-seededit
https://www.together.ai/models/bytedance-seedream-3-0
https://www.together.ai/models/bytedance-seedream-4-0
https://www.together.ai/models/cartesia-sonic
https://www.together.ai/models/cartesia-sonic-3
https://www.together.ai/models/cogito-109b-moe
https://www.together.ai/models/cogito-405b
https://www.together.ai/models/cogito-671b-moe
https://www.together.ai/models/cogito-70b
https://www.together.ai/models/cogito-v1-preview-llama-3b
https://www.together.ai/models/cogito-v1-preview-llama-70b
https://www.together.ai/models/cogito-v1-preview-qwen-14b
https://www.together.ai/models/cogito-v1-preview-qwen-32b
https://www.together.ai/models/cogito-v2-1-671b
https://www.together.ai/models/dbrx-instruct
https://www.together.ai/models/deepseek-r1
https://www.together.ai/models/deepseek-r1-0528-throughput
https://www.together.ai/models/deepseek-r1-distilled-llama-70
https://www.together.ai/models/deepseek-r1-distilled-llama-70b-free
https://www.together.ai/models/deepseek-r1-distilled-qwen-14
https://www.together.ai/models/deepseek-v3
https://www.together.ai/models/deepseek-v3-1
https://www.together.ai/models/deepseek-v3-2-exp
https://www.together.ai/models/devstral-small-2505
https://www.together.ai/models/dreamshaper
https://www.together.ai/models/exaone-3-5-32b-instruct
https://www.together.ai/models/exaone-deep-32b
https://www.together.ai/models/flux-1-canny-pro
https://www.together.ai/models/flux-1-dev
https://www.together.ai/models/flux-1-kontext-dev
https://www.together.ai/models/flux-1-kontext-max
https://www.together.ai/models/flux-1-kontext-pro
https://www.together.ai/models/flux-1-krea-dev
https://www.together.ai/models/flux-1-schnell-2
https://www.together.ai/models/flux-1-schnell-fixedres
https://www.together.ai/models/flux-2-flex
https://www.together.ai/models/flux-2-max
https://www.together.ai/models/flux-2-pro
https://www.together.ai/models/flux1-1-pro
https://www.together.ai/models/gemini-flash-image-2-5
https://www.together.ai/models/gemma-3-12b
https://www.together.ai/models/gemma-3-1b
https://www.together.ai/models/gemma-3-27b
https://www.together.ai/models/gemma-3-4b
https://www.together.ai/models/gemma-3n-e4b-it
https://www.together.ai/models/gemma-instruct-2b
https://www.together.ai/models/glm-4-5-air
https://www.together.ai/models/glm-4-6
https://www.together.ai/models/glm-4-7
https://www.together.ai/models/glm-5
https://www.together.ai/models/google-imagen-4-0-fast
https://www.together.ai/models/google-imagen-4-0-preview
https://www.together.ai/models/google-imagen-4-0-ultra
https://www.together.ai/models/google-veo-2-0
https://www.together.ai/models/google-veo-3-0
https://www.together.ai/models/google-veo-3-0-audio
https://www.together.ai/models/google-veo-3-0-fast
https://www.together.ai/models/google-veo-3-0-fast-audio
https://www.together.ai/models/gpt-oss-120b
https://www.together.ai/models/gpt-oss-20b
https://www.together.ai/models/gryphe-mythomax-l2-lite-13b
https://www.together.ai/models/gte-modernbert-base
https://www.together.ai/models/hidream-i1-dev
https://www.together.ai/models/hidream-i1-fast
https://www.together.ai/models/hidream-i1-full
https://www.together.ai/models/ideogram-3-0
https://www.together.ai/models/juggernaut-lightning-flux
https://www.together.ai/models/juggernaut-pro-flux
https://www.together.ai/models/kimi-k2-0905
https://www.together.ai/models/kimi-k2-5
https://www.together.ai/models/kimi-k2-instruct
https://www.together.ai/models/kimi-k2-thinking
https://www.together.ai/models/kling-1-6-pro
https://www.together.ai/models/kling-1-6-standard
https://www.together.ai/models/kling-2-0-master
https://www.together.ai/models/kling-2-1-master
https://www.together.ai/models/kling-2-1-pro
https://www.together.ai/models/kling-2-1-standard
https://www.together.ai/models/kokoro-82m
https://www.together.ai/models/lfm2-24b-a2b
https://www.together.ai/models/llama-2-chat-13b
https://www.together.ai/models/llama-2-chat-7b
https://www.together.ai/models/llama-3-1
https://www.together.ai/models/llama-3-1-70b
https://www.together.ai/models/llama-3-2-3b-instruct-turbo
https://www.together.ai/models/llama-3-3-70b
https://www.together.ai/models/llama-3-3-70b-free
https://www.together.ai/models/llama-3-70b-instruct-reference
https://www.together.ai/models/llama-3-70b-instruct-turbo
https://www.together.ai/models/llama-3-8b-instruct-lite
https://www.together.ai/models/llama-4-maverick
https://www.together.ai/models/llama-4-scout
https://www.together.ai/models/llama-guard-2-8b
https://www.together.ai/models/llama-guard-3-11b-vision-turbo
https://www.together.ai/models/llama-guard-3-8b
https://www.together.ai/models/llama-guard-4-12b
https://www.together.ai/models/llama-guard-7b
https://www.together.ai/models/m2-bert-80m-2k-retrieval
https://www.together.ai/models/m2-bert-80m-32k-retrieval
https://www.together.ai/models/m2-bert-80m-8k-retrieval
https://www.together.ai/models/magistral-small-2506
https://www.together.ai/models/marin-8b-instruct
https://www.together.ai/models/minimax-01-director
https://www.together.ai/models/minimax-hailuo-02
https://www.together.ai/models/minimax-m1-40k
https://www.together.ai/models/minimax-m1-80k
https://www.together.ai/models/minimax-m2-5
https://www.together.ai/models/minimax-m21
https://www.together.ai/models/minimax-speech-2-6-turbo
https://www.together.ai/models/ministral-3-14b-instruct-2512
https://www.together.ai/models/ministral-3-3b-instruct-2512
https://www.together.ai/models/mist-v2
https://www.together.ai/models/mistral-7b-instruct-v0-2
https://www.together.ai/models/mistral-beb7b
https://www.together.ai/models/mistral-instruct
https://www.together.ai/models/mistral-small-3
https://www.together.ai/models/mixtral-8x7b-v0-1
https://www.together.ai/models/mixtral-instruct
https://www.together.ai/models/multilingual-e5-large-instruct
https://www.together.ai/models/mxbai-rerank-large-v2
https://www.together.ai/models/mythomax-l2
https://www.together.ai/models/nim-llama-3-1-70b-instruct
https://www.together.ai/models/nim-llama-3-1-8b-instruct
https://www.together.ai/models/nim-llama-3-1-nemotron-70b-instruct
https://www.together.ai/models/nim-llama-3-2-11b-vision-instruct
https://www.together.ai/models/nim-llama-3-2-90b-vision-instruct
https://www.together.ai/models/nim-llama-3-3-70b-instruct
https://www.together.ai/models/nim-llama-3-3-nemotron-super-49b-v1
https://www.together.ai/models/nim-mistral-nemo-12b-instruct
https://www.together.ai/models/nim-mixtral-8x22b-instruct-v0-1
https://www.together.ai/models/nim-mixtral-8x7b-instruct-v0-1
https://www.together.ai/models/nvidia-nemotron-3-nano
https://www.together.ai/models/nvidia-nemotron-nano-9b-v2
https://www.together.ai/models/openai-whisper-large-v3
https://www.together.ai/models/orpheus-tts
https://www.together.ai/models/pixverse-v5
https://www.together.ai/models/qwen-2-5
https://www.together.ai/models/qwen-2-5-coder-32b-instruct
https://www.together.ai/models/qwen-image
https://www.together.ai/models/qwen-image-edit
https://www.together.ai/models/qwen-qwq-32b
https://www.together.ai/models/qwen2-5-7b-instruct-turbo
https://www.together.ai/models/qwen2-5-vl-72b-instruct
https://www.together.ai/models/qwen3-0-6b
https://www.together.ai/models/qwen3-0-6b-base
https://www.together.ai/models/qwen3-1-7b
https://www.together.ai/models/qwen3-1-7b-base
https://www.together.ai/models/qwen3-14b-base
https://www.together.ai/models/qwen3-235b-a22b-fp8-tput
https://www.together.ai/models/qwen3-235b-a22b-instruct-2507-fp8
https://www.together.ai/models/qwen3-235b-a22b-thinking-2507
https://www.together.ai/models/qwen3-30b-a3b
https://www.together.ai/models/qwen3-30b-a3b-base
https://www.together.ai/models/qwen3-32b
https://www.together.ai/models/qwen3-4b
https://www.together.ai/models/qwen3-4b-base
https://www.together.ai/models/qwen3-5-397b-a17b
https://www.together.ai/models/qwen3-8b
https://www.together.ai/models/qwen3-coder-480b-a35b-instruct
https://www.together.ai/models/qwen3-coder-next
https://www.together.ai/models/qwen3-next-80b-a3b-instruct
https://www.together.ai/models/qwen3-next-80b-a3b-thinking
https://www.together.ai/models/qwen3-vl-32b-instruct
https://www.together.ai/models/refuel-llm-2
https://www.together.ai/models/refuel-llm-2-small
https://www.together.ai/models/rime-arcana-v3
https://www.together.ai/models/rime-arcana-v3-turbo
https://www.together.ai/models/rnj-1-instruct
https://www.together.ai/models/salesforce-llamarank
https://www.together.ai/models/sd-xl
https://www.together.ai/models/sora-2
https://www.together.ai/models/sora-2-pro
https://www.together.ai/models/stable-diffusion-3
https://www.together.ai/models/trinity-mini
https://www.together.ai/models/typhoon-2-8b-instruct
https://www.together.ai/models/typhoon2-1-gemma3-12b
https://www.together.ai/models/uae-large-v1
https://www.together.ai/models/virtueguard-text-lite
https://www.together.ai/models/vidu-2-0
https://www.together.ai/models/vidu-q1
https://www.together.ai/models/voxtral-mini-3b-2507
https://www.together.ai/models/wan-2-2-i2v
https://www.together.ai/models/wan-2-2-t2v
https://www.together.ai/models/whisper-large-v3-streaming

## All Blog Posts
https://www.together.ai/blog/20-exaflops-gpu-clusters
https://www.together.ai/blog/40-new-image-and-video-models
https://www.together.ai/blog/a-practitioners-guide-to-testing-and-running-large-gpu-clusters-for-training-generative-ai-models
https://www.together.ai/blog/adaptive-learning-speculator-system-atlas
https://www.together.ai/blog/ai-agents-to-automate-complex-engineering-tasks
https://www.together.ai/blog/alon-gavrielov-as-vp-of-infrastructure-strategy
https://www.together.ai/blog/announcing-the-availability-of-openais-open-models-on-together-ai
https://www.together.ai/blog/announcing-together-ai-startup-accelerator
https://www.together.ai/blog/announcing-together-custom-models
https://www.together.ai/blog/api-announcement
https://www.together.ai/blog/arcee-ai
https://www.together.ai/blog/august-2023-pricing-update
https://www.together.ai/blog/axiomatic-agents
https://www.together.ai/blog/based
https://www.together.ai/blog/batch-api
https://www.together.ai/blog/batch-inference-api-updates-2025
https://www.together.ai/blog/benchmarking-language-models-using-the-together-research-computer
https://www.together.ai/blog/bitdelta
https://www.together.ai/blog/build-ultra-low-latency-voice-ai-applications-with-together-ai-and-cartesia-sonic
https://www.together.ai/blog/building-an-autonomous-and-open-data-scientist-agent-from-scratch
https://www.together.ai/blog/cache-aware-disaggregated-inference
https://www.together.ai/blog/chipmunk
https://www.together.ai/blog/clustermax-gold
https://www.together.ai/blog/cocktailsgd
https://www.together.ai/blog/code-sandbox-code-interpreter
https://www.together.ai/blog/collinear-simulations-together-evals
https://www.together.ai/blog/consistency-diffusion-language-models
https://www.together.ai/blog/continued-fine-tuning
https://www.together.ai/blog/customized-speculative-decoding
https://www.together.ai/blog/decentralized-training-of-foundation-models-in-heterogeneous-environments
https://www.together.ai/blog/deepcoder
https://www.together.ai/blog/deepseek-v3-1-hybrid-thinking-model-now-available-on-together-ai
https://www.together.ai/blog/deepswe
https://www.together.ai/blog/deploy-deepseek-r1-and-distilled-models-securely-on-together-ai
https://www.together.ai/blog/deploy-deepseek-r1-at-scale-fast-secure-serverless-apis-and-large-scale-together-reasoning-clusters
https://www.together.ai/blog/dippy-ai
https://www.together.ai/blog/direct-preference-optimization
https://www.together.ai/blog/dragonfly-v1
https://www.together.ai/blog/embeddings-endpoint-release
https://www.together.ai/blog/evaluate-and-benchmark-llms
https://www.together.ai/blog/even-better-even-faster-quantized-llms-with-qtip
https://www.together.ai/blog/evo
https://www.together.ai/blog/fastest-inference-for-deepseek-r1-0528-with-nvidia-hgx-b200
https://www.together.ai/blog/fastest-inference-for-the-top-open-source-models
https://www.together.ai/blog/fine-tune-gpt-oss-models-into-domain-experts-together-ai
https://www.together.ai/blog/fine-tune-small-open-source-llms-outperform-closed-models
https://www.together.ai/blog/fine-tuning-api-introducing-long-context-training-conversation-data-support-and-more-configuration-options
https://www.together.ai/blog/fine-tuning-language-models-over-slow-networks-using-activation-compression-with-guarantees
https://www.together.ai/blog/fine-tuning-llms-for-multi-turn-conversations-a-technical-deep-dive
https://www.together.ai/blog/fine-tuning-open-llm-judges-to-outperform-gpt-5-2
https://www.together.ai/blog/fine-tuning-updates-sept-2025
https://www.together.ai/blog/finetuning
https://www.together.ai/blog/flash-decoding-for-long-context-inference
https://www.together.ai/blog/flashattentionfandm
https://www.together.ai/blog/flashfftconv
https://www.together.ai/blog/flexgen-high-throughput-generative-inference-of-large-language-models-with-a-single-gpu
https://www.together.ai/blog/flux-1-kontext
https://www.together.ai/blog/flux-2-multi-reference-image-generation-now-available-on-together-ai
https://www.together.ai/blog/flux-api-is-now-available-on-together-ai-new-pro-free-access-to-flux-schnell
https://www.together.ai/blog/flux-tools-models-together-apis-canny-depth-image-generation
https://www.together.ai/blog/function-calling-json-mode
https://www.together.ai/blog/futurebench
https://www.together.ai/blog/generate-images-with-specific-styles-using-flux-loras-on-together-ai
https://www.together.ai/blog/h3
https://www.together.ai/blog/how-speech-models-fail
https://www.together.ai/blog/how-to-build-a-coding-agent-from-scratch-a-practical-guide-for-developers
https://www.together.ai/blog/how-to-build-a-real-time-image-generator-with-together-ai
https://www.together.ai/blog/how-to-choose-the-right-open-model-for-production
https://www.together.ai/blog/how-zomato-built-an-ai-customer-support-bot-that-doubled-customer-satisfaction
https://www.together.ai/blog/hungry-hungry-hippos-towards-language-modeling-with-state-space-models
https://www.together.ai/blog/hyena-hierarchy-towards-larger-convolutional-language-models
https://www.together.ai/blog/instant-gpu-clusters
https://www.together.ai/blog/introducing-autojudge-streamlined-inference-acceleration-via-automated-dataset-curation
https://www.together.ai/blog/introducing-fine-tuning-platform
https://www.together.ai/blog/introducing-the-together-enterprise-platform
https://www.together.ai/blog/introducing-together-evaluations
https://www.together.ai/blog/kimi-k2-leading-open-source-model-now-available-on-together-ai
https://www.together.ai/blog/large-reasoning-models-fail-to-follow-instructions-during-reasoning-a-benchmark-study
https://www.together.ai/blog/learn-how-cursor-partnered-with-together-ai-to-deliver-real-time-low-latency-inference-at-scale
https://www.together.ai/blog/linearizing-llms-with-lolcats
https://www.together.ai/blog/llama-2-7b-32k
https://www.together.ai/blog/llama-2-7b-32k-instruct
https://www.together.ai/blog/llama-3-2-vision-stack
https://www.together.ai/blog/llama-3-3
https://www.together.ai/blog/llama-31-quality
https://www.together.ai/blog/llama-4
https://www.together.ai/blog/long-context-retrieval-models-with-monarch-mixer
https://www.together.ai/blog/mahadev-konar-svp-infrastructure-engineering
https://www.together.ai/blog/mamba-3b-slimpj
https://www.together.ai/blog/medusa
https://www.together.ai/blog/meta-llama-3-1
https://www.together.ai/blog/minions
https://www.together.ai/blog/mistral-small-3-api-now-available-on-together-ai-a-new-category-leader-in-small-models
https://www.together.ai/blog/mixtral
https://www.together.ai/blog/monarch-mixer
https://www.together.ai/blog/multi-node-gpu-training
https://www.together.ai/blog/multimodal-document-rag-with-llama-3-2-vision-and-colqwen2
https://www.together.ai/blog/nemotron-3-nano-now-available-on-together-ai
https://www.together.ai/blog/neurips-2022-overcoming-communication-bottlenecks-for-decentralized-training-12
https://www.together.ai/blog/neurips-2022-overcoming-communication-bottlenecks-for-decentralized-training-2
https://www.together.ai/blog/nvidia-ai-foundry-partnership
https://www.together.ai/blog/nvidia-blackwell-test-drive
https://www.together.ai/blog/nvidia-cloud-partner
https://www.together.ai/blog/nvidia-gb200-together-gpu-cluster-36k
https://www.together.ai/blog/nvidia-h200-and-h100-gpu-cluster-performance-together-kernel-collection
https://www.together.ai/blog/nvidia-hgx-b200-with-together-kernel-collection
https://www.together.ai/blog/nvidia-nim
https://www.together.ai/blog/on-demand-dedicated-endpoints
https://www.together.ai/blog/open-deep-research
https://www.together.ai/blog/openais-new-open-gpt-oss-models-vs-o4-mini-a-real-world-comparison
https://www.together.ai/blog/openchatkit
https://www.together.ai/blog/openchatkit-016
https://www.together.ai/blog/optimizing-inference-speed-and-costs
https://www.together.ai/blog/python-sdk-v1
https://www.together.ai/blog/qwen-3-coder
https://www.together.ai/blog/rag-fine-tuning
https://www.together.ai/blog/rag-tutorial-langchain
https://www.together.ai/blog/rag-tutorial-llamaindex
https://www.together.ai/blog/rag-tutorial-mongodb
https://www.together.ai/blog/redpajama
https://www.together.ai/blog/redpajama-3b-updates
https://www.together.ai/blog/redpajama-7b
https://www.together.ai/blog/redpajama-data-v2
https://www.together.ai/blog/redpajama-models-v1
https://www.together.ai/blog/redpajama-training-progress
https://www.together.ai/blog/redpajama-v2-faq
https://www.together.ai/blog/releasing-v1-of-gpt-jt-powered-by-open-source-ai
https://www.together.ai/blog/research-pov-yes-agi-can-happen
https://www.together.ai/blog/rime-arcana-v3-turbo-and-rime-arcana-v3-now-available-on-together-ai
https://www.together.ai/blog/safety-models
https://www.together.ai/blog/seed-funding
https://www.together.ai/blog/sequoia
https://www.together.ai/blog/series-a
https://www.together.ai/blog/series-a2
https://www.together.ai/blog/serverless-multi-lora-fine-tune-and-deploy-hundreds-of-adapters-for-model-customization-at-scale
https://www.together.ai/blog/snorkel-partnership
https://www.together.ai/blog/snowflake-artic-llm
https://www.together.ai/blog/soc-2-compliance
https://www.together.ai/blog/sota-search-stack-for-llms
https://www.together.ai/blog/specexec
https://www.together.ai/blog/speculative-decoding-for-high-throughput-long-context-inference
https://www.together.ai/blog/speech-to-text-whisper-apis
https://www.together.ai/blog/stanford-open-source-software-award
https://www.together.ai/blog/stripedhyena-7b
https://www.together.ai/blog/teal-training-free-activation-sparsity-in-large-language-models
https://www.together.ai/blog/the-fastest-inference-for-realtime-voice-ai-agents
https://www.together.ai/blog/the-frontier-is-open
https://www.together.ai/blog/the-mamba-in-the-llama-distilling-and-accelerating-hybrid-models
https://www.together.ai/blog/thunderkittens
https://www.together.ai/blog/thunderkittens-nvidia-blackwell-gpus
https://www.together.ai/blog/together-ai-acquires-refuel-ai
https://www.together.ai/blog/together-ai-and-meta-partner-to-bring-pytorch-reinforcement-learning-to-the-ai-native-cloud
https://www.together.ai/blog/together-ai-announcing-305m-series-b
https://www.together.ai/blog/together-ai-available-aws-marketplace-to-accelerate-enterprise-ai-development
https://www.together.ai/blog/together-ai-expands-in-europe
https://www.together.ai/blog/together-ai-partners-with-meta-to-release-meta-llama-3-for-inference-and-fine-tuning
https://www.together.ai/blog/together-ai-powers-pioneers-at-nvidia-gtc-2025
https://www.together.ai/blog/together-ai-welcomes-kai-mak
https://www.together.ai/blog/together-chat
https://www.together.ai/blog/together-crusoe-reduce-carbon-impact-of-generative-ai
https://www.together.ai/blog/together-evaluations-v2
https://www.together.ai/blog/together-inference-engine-2
https://www.together.ai/blog/together-inference-engine-v1
https://www.together.ai/blog/together-instant-clusters-ga
https://www.together.ai/blog/together-moa
https://www.together.ai/blog/together-python-sdk-2-0
https://www.together.ai/blog/together-rerank-api-and-salesforce-llamarank
https://www.together.ai/blog/torchforge-reinforcement-learning-pipelines
https://www.together.ai/blog/tri-dao-flash-attention
https://www.together.ai/blog/virtueguard
https://www.together.ai/blog/yaqa

## Webinars & Events
https://www.together.ai/events
https://www.together.ai/build-coding-agent-webinar
https://www.together.ai/codesandbox-sdk-webinar
https://www.together.ai/deepseek-r1-how-it-works-simplified-together-ai-webinar
https://www.together.ai/nvidia-blackwell-deep-dive-webinar
https://www.together.ai/webinar-how-advanced-tool-calling-transforms-agentic-use-cases
https://www.together.ai/webinar-openai-gpt-oss-deep-dive