MiniMax Speech 2.6 Turbo

Ultra-low latency TTS for production voice agents on Together AI.

About model

Production-Ready Voice Intelligence: MiniMax Speech 2.6 Turbo is an enterprise-grade text-to-speech model optimized for real-time voice agents. With under 250ms end-to-end latency and native handling of formats such as URLs, dates, and phone numbers, it powers voice interactions for platforms like LiveKit, Pipecat, and Vapi. Deploy it on Together AI dedicated endpoints for reliable, scalable voice infrastructure that runs alongside your LLM workloads.

Dedicated Deployment: Available as on-demand dedicated endpoints or monthly reserved capacity on Together AI, giving production voice applications consistent performance without infrastructure management overhead.

Quickstart guides

Audio

Open NotebookLM: PDF to Podcast

Model card
Model Architecture:
• Advanced speech synthesis model trained for natural, human-like voice quality
• Optimized audio generation pipeline achieving industry-leading sub-250ms end-to-end latency
• Supports 40+ languages with native format handling and prosodic naturalness

Advanced Capabilities:
• Seamless handling of specialized formats: URLs, email addresses, phone numbers, dates, monetary amounts, and IP addresses across multiple languages without preprocessing
• Fluent LoRA technology: high-fidelity voice cloning that transforms non-fluent or accented recordings into fluent, natural speech while preserving timbre
• Direct integration with large language models: LLM output containing dynamic entities (URLs, dates, amounts) can be sent as-is, with no text preprocessing required
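
For context, the kind of preprocessing step that native format handling is meant to make unnecessary can be sketched as follows. This is an illustrative example only: the function names, the regular expressions, and the US-style phone pattern are assumptions made for this sketch, and nothing here is part of MiniMax or Together AI tooling.

```python
import re

# Hypothetical normalizer of the sort a TTS pipeline WITHOUT native
# format handling might need. Names and patterns are illustrative only.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# Assumed formats: 3-3-4 digit phone numbers and simple email addresses.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def spell_digits(match: re.Match) -> str:
    """Read a phone number digit by digit, dropping the hyphens."""
    return " ".join(DIGIT_WORDS[int(ch)] for ch in match.group(0) if ch.isdigit())


def spell_email(match: re.Match) -> str:
    """Turn '@' and '.' into the words a speaker would say."""
    return match.group(0).replace("@", " at ").replace(".", " dot ")


def normalize_for_tts(text: str) -> str:
    """Rewrite emails and phone numbers into speakable text."""
    text = EMAIL.sub(spell_email, text)
    text = PHONE.sub(spell_digits, text)
    return text


if __name__ == "__main__":
    print(normalize_for_tts("Call 555-123-4567 or email help@example.com"))
```

With a model that handles these formats natively, the raw LLM output would be passed to the speech endpoint unchanged and a step like `normalize_for_tts` could be dropped from the pipeline.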

Production Performance:
• Powers voice infrastructure for major platforms including LiveKit (used by ChatGPT's advanced voice mode), Pipecat, and Vapi
• Deployed in smart hardware products: Haivivi Bubble Pal, Fuzozo, and Rokid Glasses
• Proven at scale across global voice intelligence applications

Applications & use cases
Real-Time Voice Agents:
• AI customer service and support with sub-250ms response times
• Interactive voice response (IVR) systems requiring natural speech
• Live conversational AI assistants and chatbots
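
When validating a latency target such as the sub-250ms figure above in your own deployment, one common approach is to time how long the first audio chunk takes to arrive. The sketch below assumes the audio arrives as an iterable of byte chunks; `fake_audio_stream` is a stand-in for a real streaming response, and time-to-first-chunk is used here as a proxy that may not match how the vendor defines end-to-end latency.

```python
import time
from typing import Iterable, Iterator, Tuple

LATENCY_BUDGET_MS = 250.0  # target taken from the page; adjust as needed


def time_to_first_chunk(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (elapsed milliseconds, first chunk) for a stream of audio chunks."""
    start = time.perf_counter()
    first = next(iter(chunks))  # blocks until the first chunk is produced
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, first


def within_budget(elapsed_ms: float, budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """True if the measured latency meets the budget."""
    return elapsed_ms <= budget_ms


def fake_audio_stream(delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response."""
    time.sleep(delay_s)  # simulated synthesis and network delay
    yield b"\x00\x01"
    yield b"\x02\x03"


if __name__ == "__main__":
    elapsed, _ = time_to_first_chunk(fake_audio_stream())
    print(f"first chunk after {elapsed:.1f} ms, within budget: {within_budget(elapsed)}")
```

In a real test, the measurement would start when the request is sent and would be repeated many times, since a single sample says little about tail latency.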

Multilingual Voice Applications:
• Global contact centers with 40+ language support
• Voice cloning for localized brand voices across markets
• Smart hardware with voice interaction capabilities

Enterprise Voice Infrastructure:
• Voice agent platforms requiring reliable, scalable TTS
• Integration with LLM reasoning for end-to-end voice pipelines
• Production deployments on dedicated Together AI infrastructure with consistent performance and isolated workloads

Model specifications

Model data

Model provider: MiniMax AI
Type: Audio
Main use cases: Text-to-Speech
Deployment: On-Demand Dedicated, Monthly Reserved
Price: $30 / 1M characters + GPU hourly (by hardware)
Input modalities: Text
Output modalities: Audio
Category: Audio
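
As a rough illustration of how the listed price combines, the sketch below adds the per-character charge ($30 per 1M characters, from the specification above) to a GPU-hour charge. The GPU hourly rate varies by hardware and is not given on this page, so `gpu_hourly_usd` is a placeholder the caller must supply; the function name and the example figures are assumptions for illustration.

```python
USD_PER_MILLION_CHARS = 30.0  # from the listed price


def estimate_cost_usd(characters: int, gpu_hours: float, gpu_hourly_usd: float) -> float:
    """Estimate spend as character charge plus GPU time.

    gpu_hourly_usd is a placeholder: the actual rate depends on the
    hardware selected for the dedicated endpoint.
    """
    character_cost = characters / 1_000_000 * USD_PER_MILLION_CHARS
    gpu_cost = gpu_hours * gpu_hourly_usd
    return character_cost + gpu_cost


if __name__ == "__main__":
    # Example with made-up usage: 2.5M characters and 10 GPU-hours
    # at a hypothetical $2.00 per hour.
    print(estimate_cost_usd(2_500_000, gpu_hours=10, gpu_hourly_usd=2.0))
```

For the made-up usage in the example, the character charge is 2.5 × $30 = $75 and the GPU charge is 10 × $2.00 = $20.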
