Together Research
Foundational research for production AI
Our research areas
Inference
Design and optimization of production inference systems, spanning scheduling, batching, and hardware–software co-design for reliable high throughput.
Kernels
Development of high-performance GPU kernels for training and inference, optimizing memory, attention, and custom operators at production scale.
Model Shaping
Advancement of post-training methods like fine-tuning, distillation, and quantization to shape efficient, controllable model behavior.
Agents
Studies of long-horizon reasoning and decision-making, focusing on tool use, multi-step planning, and reinforcement learning for reliable agentic systems.
Recognized research
Papers accepted at top conferences
Key open-source projects
FlashAttention
IO-aware exact attention, universally adopted
Flash Decoding
8× faster long-context token generation
Mixture of Agents
Open models, working together, beat GPT-4o
Dragonfly
Tiny 8B model beats Med-Gemini on every benchmark
RedPajama Datasets
100T+ tokens powering 500+ models
DeepCoder
First open model to match o3-mini on code
Open Deep Research
Open-source multi-model deep research agent
Open Data Scientist Agent
Autonomous agent tops Adyen's real-world benchmark
Research blogs
In the spotlight

At Slush 2025, Together AI VP of Kernels Dan Fu dives into building, using, and managing AI agents.
Research team
Researchers and engineers pushing the boundaries of AI

Ce Zhang
Founder & CTO

Chris Ré
Founder

Tri Dao
Founder & Chief Scientist

Percy Liang
Founder

Ben Athiwaratkun
Core ML

Dan Fu
Kernels

James Zou
Frontier Agents

Leon Song
Core ML

Max Ryabinin
Model Shaping

Simran Arora
Kernels

Yineng Zhang
Inference

