
MiniMax Speech 2.6 Turbo

Ultra-low latency TTS for production voice agents on Together AI.

About model

Production-Ready Voice Intelligence: MiniMax Speech 2.6 Turbo is an enterprise-grade text-to-speech model optimized for real-time voice agents. It delivers end-to-end latency under 250 ms, reads structured text such as URLs, dates, and phone numbers without preprocessing, and powers voice interactions on platforms such as LiveKit, Pipecat, and Vapi. Deploy it on Together AI dedicated endpoints for reliable, scalable voice infrastructure that runs alongside your LLM workloads.

Dedicated Deployment: Available as on-demand dedicated or monthly reserved capacity on Together AI, giving production voice applications consistent performance without infrastructure overhead.

  • Model card

    Model Architecture:
    • Advanced speech synthesis model trained for natural, human-like voice quality
    • Optimized audio generation pipeline with end-to-end latency under 250 ms
    • Supports 40+ languages, with native handling of structured text formats and natural prosody

    Advanced Capabilities:
    • Seamless handling of specialized formats: URLs, email addresses, phone numbers, dates, monetary amounts, and IP addresses across multiple languages without preprocessing
    • Fluent LoRA technology: high-fidelity voice cloning that transforms non-fluent or accented recordings into fluent, natural speech while preserving timbre
    • Direct integration with large language models: dynamic entities in LLM output need no text preprocessing before synthesis

    Production Performance:
    • Powers voice infrastructure for major platforms including LiveKit (the infrastructure behind ChatGPT's Advanced Voice Mode), Pipecat, and Vapi
    • Deployed in smart hardware products: Haivivi Bubble Pal, Fuzozo, and Rokid Glasses
    • Proven at scale across global voice intelligence applications

  • Applications & use cases

    Real-Time Voice Agents:
    • AI customer service and support with sub-250ms response times
    • Interactive voice response (IVR) systems requiring natural speech
    • Live conversational AI assistants and chatbots
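
To check a latency target like the 250 ms figure above in your own agent, one option is to time how long a streaming response takes to deliver its first audio chunk. The sketch below is a minimal illustration, not a Together AI or MiniMax API: `fake_tts_stream` is a hypothetical stand-in for a real streaming response, and treating "end-to-end latency" as time to first audio chunk is an assumption, since the page does not define how the figure is measured.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple

LATENCY_BUDGET_S = 0.250  # the 250 ms figure quoted on this page


def time_to_first_chunk(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, bytes]:
    """Return (seconds until the first non-empty chunk arrived, that chunk)."""
    start = clock()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            return clock() - start, chunk
    raise ValueError("stream ended without producing audio")


def fake_tts_stream(delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response (not a real API)."""
    time.sleep(delay_s)  # simulated network and synthesis delay
    yield b"\x00\x01"
    yield b"\x02\x03"


if __name__ == "__main__":
    elapsed, _ = time_to_first_chunk(fake_tts_stream())
    status = "within" if elapsed <= LATENCY_BUDGET_S else "over"
    print(f"first audio after {elapsed * 1000:.0f} ms ({status} budget)")
```

The `clock` parameter is injectable so the measurement can be unit-tested without real delays; in production, replace `fake_tts_stream()` with the iterator of audio chunks returned by your TTS client.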

    Multilingual Voice Applications:
    • Global contact centers with 40+ language support
    • Voice cloning for localized brand voices across markets
    • Smart hardware with voice interaction capabilities

    Enterprise Voice Infrastructure:
    • Voice agent platforms requiring reliable, scalable TTS
    • Integration with LLM reasoning for end-to-end voice pipelines
    • Production deployments on dedicated Together AI infrastructure with consistent performance and isolated workloads
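
An end-to-end voice pipeline of the kind described above typically forwards LLM output to the TTS model sentence by sentence, so speech can start before the LLM finishes generating. The following is a minimal sketch under stated assumptions: `sentences_from_tokens`, `speak`, and the `synthesize` callable are hypothetical names for illustration, not part of any Together AI or MiniMax SDK. The splitter only breaks on punctuation followed by whitespace, so entities such as IP addresses and email addresses reach the model intact.

```python
import re
from typing import Callable, Iterable, Iterator

# Split after ., !, or ? only when whitespace follows, so that
# "example.com" or "10.0.0.1" are never broken apart.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed LLM tokens and yield complete sentences."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()


def speak(
    tokens: Iterable[str],
    synthesize: Callable[[str], bytes],  # hypothetical TTS call
) -> Iterator[bytes]:
    """Yield one audio payload per sentence as soon as it is complete."""
    for sentence in sentences_from_tokens(tokens):
        yield synthesize(sentence)


if __name__ == "__main__":
    stream = ["Your server is at 10.0.0.1", ". Email ",
              "help@example.com", " for details."]
    for s in sentences_from_tokens(stream):
        print(repr(s))
```

This naive splitter will also break after abbreviations such as "Dr." when a space follows; a production pipeline would likely need a more careful segmentation rule.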

  • Model provider
    Minimax AI
  • Type
    Audio
  • Main use cases
    Text-to-Speech
  • Deployment
    On-Demand Dedicated
    Monthly Reserved
  • Input modalities
    Text
  • Output modalities
    Audio
  • Category
    Audio