
Together AI launches full stack for developers to build with open-source AI

July 14, 2023



Introducing Together API and Together Compute — simple, powerful, cost-effective cloud services to train, fine-tune, and run the world’s leading open-source AI models

We’re in the middle of an AI revolution that will impact nearly every aspect of society.

Whether this AI is open and accessible will shape the pace and direction of innovation for decades to come. Most of today’s leading generative AI models are closed behind commercial APIs—limiting companies and researchers from inspecting, understanding, and customizing models for their needs. At the same time, training large custom models is complicated, expensive, and time-consuming, requiring significant AI expertise and management of large-scale infrastructure.

Today, we’re excited to announce two products to help change this: Together API and Together Compute.

These cloud services offer a full stack solution for AI developers to train, fine-tune, and run the world’s leading open-source AI models. We currently host more than 50 models, including RedPajama, LLaMA, Falcon, and Stable Diffusion.

Together API: With an easy-to-use fine-tuning API, powered by one of the most efficient distributed training systems, Together API makes fine-tuning large AI models simple and fast. It also offers optimized, private API endpoints for low-latency inference. Deploy your first model in seconds.
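As a rough illustration of what calling a hosted inference endpoint looks like, here is a minimal sketch in Python. The model name, field names, and the commented-out endpoint URL are assumptions in the style of common completion APIs, not a specification of the Together API — consult the API docs for the actual interface.

```python
import json

# Hypothetical request payload for a completion-style inference endpoint.
# Model name and field names are illustrative assumptions, not the
# documented Together API schema.
payload = {
    "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
    "prompt": "Q: What is the capital of France?\nA:",
    "max_tokens": 32,
    "temperature": 0.7,
}

# Actually sending the request would look roughly like this
# (requires a valid API key, so it is left commented out):
#
#   import urllib.request
#   req = urllib.request.Request(
#       "https://api.together.xyz/inference",          # assumed URL
#       data=json.dumps(payload).encode(),
#       headers={"Authorization": "Bearer <TOGETHER_API_KEY>",
#                "Content-Type": "application/json"},
#   )
#   print(urllib.request.urlopen(req).read().decode())

print(json.dumps(payload, indent=2))
```

The same payload shape typically works for both a base model and a fine-tuned variant: fine-tuning produces a new model identifier, and inference just swaps it into the `model` field.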

Together Compute: For AI/ML research groups who want to pre-train models on their own datasets, we offer clusters of high-end GPUs paired with our distributed training stack. Together AI’s research team is behind breakthroughs like FlashAttention, FlexGen, and CocktailSGD that are core to modern optimization—making Together Compute the most cost-effective way to build new models with supercharged speed. Reserve a training cluster.

One of our goals is making AI accessible by radically reducing costs.

Generative AI models have billions of parameters, and every token flows through all of them, which makes these models expensive to run in production. Hosting them for inference on hyperscalers can cost $4-6 per hour per A100 GPU; a typical cloud instance with NVIDIA’s 8xA100 cards costs roughly $25,000 a month to operate.
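A quick back-of-the-envelope check shows how the per-GPU hourly rate adds up to the monthly figure. The rates come from the text; the 720-hour month is an assumption (24 hours x 30 days).

```python
# Rough monthly cost of an 8xA100 instance at hyperscaler rates.
gpus = 8
rate_low, rate_high = 4.0, 6.0   # $ per A100 GPU-hour, range cited above
hours_per_month = 24 * 30        # assumed ~720-hour month

low = gpus * rate_low * hours_per_month    # $23,040
high = gpus * rate_high * hours_per_month  # $34,560
print(f"${low:,.0f} - ${high:,.0f} per month")
```

The cited ~$25,000/month sits at the low end of this range, consistent with the $4-6 per GPU-hour pricing.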

We are achieving significant cost reductions for interactive inference workloads on large models. We optimize down the stack: thousands of GPUs located in multiple secure facilities, plus software for virtualization, scheduling, and model optimization that significantly brings down operating costs.

Our A100-based inference VMs start as low as $0.11 per hour, and our users often pay one-fifth of what it would cost on hyperscalers to train and fine-tune models.
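To put those two figures side by side, here is a small sketch using only the numbers cited above. Note the caveat in the comments: the $0.11/hour VM is presumably a fractional share of an A100, so the raw ratio is a rough upper bound rather than a like-for-like comparison.

```python
# Back-of-the-envelope savings comparison, using figures from the post.
# Caveat: the $0.11/hr inference VM is likely a fractional GPU slice,
# so this ratio is a rough upper bound, not a like-for-like comparison.
hyperscaler_rate = 4.0   # $ per A100 GPU-hour, low end of the $4-6 range
together_rate = 0.11     # $ per hour for the cheapest inference VM

inference_ratio = hyperscaler_rate / together_rate
print(round(inference_ratio))   # roughly 36x cheaper at list price

# For training and fine-tuning, the post cites about 1/5th of
# hyperscaler cost:
training_fraction = 1 / 5
print(training_fraction)
```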

We believe AI is having its Linux moment—and we’re proud to be part of it.

In April, along with collaborators, we released RedPajama, a set of leading open-source models and the largest-ever open pre-training dataset. The RedPajama dataset has been used to train over 200 models! We plan to keep engaging, building, and releasing models, datasets, and research in the open.

In May, we announced our seed funding led by Lux Capital to build a cloud platform for open-source AI. Today we are thrilled to share the first version of that platform with you and excited to see what you build!

Read the API Docs, and join us in #api on our Discord.

See you there!

Editorial note: Together Compute was renamed to Together GPU Clusters on November 5th 2023.
