MiniMax Speech 2.6 Turbo
Ultra-low latency TTS for production voice agents on Together AI.
About the model
Production-Ready Voice Intelligence: MiniMax Speech 2.6 Turbo is an enterprise-grade text-to-speech model optimized for real-time voice agent scenarios. With under 250ms end-to-end latency and advanced format handling, it powers voice interactions for platforms like LiveKit, Pipecat, and Vapi. Deploy on Together AI dedicated endpoints for reliable, scalable voice infrastructure that integrates seamlessly with your LLM workloads.
Dedicated Deployment: Available on Together AI as on-demand dedicated endpoints or monthly reserved capacity, giving production voice applications consistent performance without infrastructure overhead.
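The following is a minimal sketch of sending text to a deployed endpoint using only the Python standard library. It assumes the endpoint is reachable through Together's OpenAI-style `/v1/audio/speech` route with `model`, `input`, and `voice` fields. The helper names `build_speech_request` and `synthesize` are hypothetical, and the model name and voice ID must come from your own dedicated endpoint's page. Check that page for the exact values and the supported output formats.

```python
import json
import os
import urllib.request

# Assumed route: Together's OpenAI-style speech endpoint. Verify before use.
SPEECH_URL = "https://api.together.xyz/v1/audio/speech"


def build_speech_request(text: str, model: str, voice: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the endpoint to synthesize `text`."""
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        SPEECH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text: str, model: str, voice: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to `out_path`."""
    req = build_speech_request(text, model, voice, os.environ["TOGETHER_API_KEY"])
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
```

An example call is `synthesize("Call us at +1 415 555 0100 before 5 PM.", model="your-endpoint-model-name", voice="your-voice-id", out_path="reply.mp3")`. Both the model name and the voice ID are placeholders, and the file extension should match whatever audio format the endpoint actually returns. Keeping request construction separate from sending makes the payload easy to inspect or log before any network call.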
Model card
Model Architecture:
• Advanced speech synthesis model trained for natural, human-like voice quality
• Optimized audio generation pipeline achieving industry-leading sub-250ms end-to-end latency
• Supports 40+ languages with native format handling and prosodic naturalness
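The sub-250ms figure is a vendor-stated end-to-end number. To see what a client actually observes on a given deployment, one minimal approach is to time the first non-empty audio chunk of a streaming response. The helper below is hypothetical. It assumes only that the response is exposed as an iterable of audio byte chunks, ideally a generator that sends the request on first iteration so the measurement covers the network round trip. Client-side timings include that network time, so they will not match the vendor figure exactly.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def time_to_first_audio(
    stream: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], int]:
    """Consume an audio stream; return (seconds to first non-empty chunk, total bytes).

    The clock starts when this function is called, so pass a lazily evaluated
    stream (e.g. a generator that sends the request on first iteration).
    """
    start = clock()
    first_chunk_s: Optional[float] = None
    total_bytes = 0
    for chunk in stream:
        # Skip empty keep-alive chunks when deciding when audio "arrived".
        if chunk and first_chunk_s is None:
            first_chunk_s = clock() - start
        total_bytes += len(chunk)
    return first_chunk_s, total_bytes
```

The injectable `clock` keeps the helper deterministic under test. In practice you would compare the result against your own budget, for example `latency is not None and latency < 0.250`.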
Advanced Capabilities:
• Seamless handling of specialized formats: URLs, email addresses, phone numbers, dates, monetary amounts, and IP addresses across multiple languages without preprocessing
• Fluent LoRA technology: high-fidelity voice cloning that transforms non-fluent or accented recordings into fluent, natural speech while preserving timbre
• Direct integration with large language models: LLM-generated text containing dynamic entity information can be passed straight to the model without preprocessing
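In a voice pipeline, LLM output usually arrives as a token stream. A common pattern is to flush text to TTS one sentence at a time so speech can start before the full reply is generated. The sketch below uses a hypothetical helper that is not part of any MiniMax or Together SDK. Its boundary rule, sentence punctuation followed by whitespace, is deliberately naive. It does, however, leave the dots inside prices, email addresses, URLs, and IP addresses untouched, so those entities reach the model unmodified.

```python
import re
from typing import Iterable, Iterator

# Split only where ".", "!" or "?" is followed by whitespace, so dots inside
# "$4.99", "support@example.com" or "10.0.0.1" never end a chunk.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed LLM tokens into sentence-sized chunks for a TTS call."""
    buffer = ""
    for token in tokens:
        buffer += token
        # All parts except the last end at a sentence boundary; the last part
        # is still incomplete and stays in the buffer.
        *complete, buffer = _BOUNDARY.split(buffer)
        for sentence in complete:
            if sentence.strip():
                yield sentence.strip()
    # Flush whatever remains once the LLM stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Feeding the tokens `["Your order total is $4", ".99. Email ", "support@example.com or visit ", "example.com/help", ". Thanks!"]` yields three chunks: the price sentence, the email/URL sentence, and "Thanks!". Abbreviations such as "Dr. Smith" would still be split, so a production pipeline would likely swap in a proper sentence tokenizer.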
Production Performance:
• Powers voice infrastructure for major platforms including LiveKit (the real-time infrastructure behind ChatGPT's Advanced Voice Mode), Pipecat, and Vapi
• Deployed in smart hardware products: Haivivi Bubble Pal, Fuzozo, and Rokid Glasses
• Proven at scale across global voice intelligence applications
Applications & use cases
Real-Time Voice Agents:
• AI customer service and support with sub-250ms speech synthesis latency
• Interactive voice response (IVR) systems requiring natural speech
• Live conversational AI assistants and chatbots
Multilingual Voice Applications:
• Global contact centers with 40+ language support
• Voice cloning for localized brand voices across markets
• Smart hardware with voice interaction capabilities
Enterprise Voice Infrastructure:
• Voice agent platforms requiring reliable, scalable TTS
• Integration with LLM reasoning for end-to-end voice pipelines
• Production deployments on dedicated Together AI infrastructure with consistent performance and isolated workloads
- Model provider: MiniMax AI
- Type: Audio
- Main use cases: Text-to-Speech
- Deployment: On-Demand Dedicated, Monthly Reserved
- Input modalities: Text
- Output modalities: Audio
- Category: Audio