Research blog

Architecture

Aakash Lahoti* (CMU), Kevin Y. Li* (CMU), Berlin Chen* (Princeton), Caitlin Wang* (Princeton), Aviv Bick (CMU), J. Zico Kolter (CMU), Tri Dao (Princeton, Together AI), Albert Gu (CMU, Cartesia AI)

Image: Mamba-3

Kernels

Key research and product announcements at the AI Native Conf

Kernels

FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling

Ted Zadouri (Princeton University, Together AI), Markus Hoehnerbach (Meta), Jay Shah (Colfax Research), Timmy Liu (NVIDIA), Vijay Thakkar (Meta, Georgia Tech), Tri Dao (Princeton University, Together AI)

Inference

Cache-aware prefill–decode disaggregation (CPD) for up to 40% faster long-context LLM serving

Jiejing Zhang, Yubo Wang, Yinghui Liu, Mourya Vangala Srinivasa, Chenxi Li, Jue Wang, Yineng Zhang, Shuaiwen Leon Song, Ce Zhang

Figure: bar chart showing CPD outperforming the baseline in queries per second for the 1D vs. 2P1D and 2D vs. 2P2D tests.