

Advancing the frontier of open-source AI.

Our research team contributes cutting-edge models, datasets, and optimizations to the open-source community.


RedPajama provides a set of leading open-source foundation models built on the largest-ever open pre-training dataset.

  • 01 RedPajama-Data-30T

    The largest open-source pre-training dataset, used by over 500 leading generative AI models. This dataset and the open research approach used to create the RedPajama models are helping to advance the frontier of open-source AI.

    Learn more
  • 02 RedPajama-7B

    A suite of fully open-source base, instruction-tuned, and chat models.

    The instruct model is the highest-scoring open model on HELM benchmarks, making it ideal for a wide range of tasks. It outperforms LLaMA-7B and state-of-the-art open models such as Falcon-7B (Base and Instruct) and MPT-7B (Base and Instruct) on HELM by 2-9 points.

    Learn more
  • 03 RedPajama-3B

    The smaller RedPajama model is ideally suited for running on the edge, with support for iPhones, Android smartphones, Raspberry Pi, and other devices.

    Learn more


Innovations that make training and inference faster, more scalable, and reliable.

  • 01 FlashAttention-2

    This update to FlashAttention, now broadly used across transformer models, speeds up training and fine-tuning of LLMs by up to 9x and achieves 72% model FLOPs utilization when training on NVIDIA A100s.

    Learn more
  • 02 Sub-quadratic model architectures

    In collaboration with Hazy Research, we are actively working on the next core architecture for generative AI models, one that delivers much faster performance with longer context. Research in this area includes Hyena, Monarch Mixer, and FlashConv.

    Learn more
  • 03 Cocktail SGD

    One of the key challenges in training generative AI models is network communication. To enable faster, more reliable training in a distributed environment, we created Cocktail SGD – a set of optimizations that reduces network communication by 117x.

    Learn more


Read the latest research from our team and academic partners here →