⚡️ FlashAttention-4: up to 1.3× faster than cuDNN on NVIDIA Blackwell →

Introducing Together AI's new look →

🔎 ATLAS: runtime-learning accelerators delivering up to 4x faster LLM inference →

⚡ Together GPU Clusters: self-service NVIDIA GPUs, now generally available →

📦 Batch Inference API: Process billions of tokens at 50% lower cost for most models →

🪛 Fine-Tuning Platform Upgrades: Larger Models, Longer Contexts →

Explore Research

Research blog

Architecture

Mamba-3B-SlimPJ: State-space models rivaling the best Transformer architecture

Tri Dao, Albert Gu

Architecture

Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers

Together

Kernels

FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores

Dan Fu, Hermann Kumbong, Eric Nguyen, Chris Ré

Inference

RedPajama-Data-v2: An open dataset with 30 trillion tokens for training large language models

Together

No search result

Try expanding your search or changing the filters.