
Published 12/23/2025

MiniMax Speech 2.6 Turbo now available natively on Together AI

State-of-the-art multilingual TTS with human-level, emotionally aware voices in 40+ languages and real-time latency on dedicated, production-grade infrastructure.

Summary

  • MiniMax Speech 2.6 Turbo on Together AI: Top-ranked on Artificial Analysis Arena, available on dedicated infrastructure only on Together AI
  • Sub-250ms latency, 40-plus languages with streaming inline switching, 10-second voice cloning, automatic emotional awareness
  • Expands elite proprietary TTS models on Together AI alongside Cartesia and Rime models
  • Dedicated GPU endpoints co-located with LLM and STT workloads

Building a real-time voice agent usually forces an ugly choice: ship a voice that sounds convincingly human, or ship a voice that responds instantly and holds up in production. Most teams split the difference with a patchwork of providers: one for showcase experiences, another for low-latency turns, and others for cloning or global language coverage. Over time that patchwork becomes the product. Behavior diverges by market, latency and quality drift, and "upgrade the voice" turns into a cross-vendor infrastructure project instead of a product decision.

Starting today, Together AI, the AI Native Cloud, is the only platform where you can run MiniMax Speech 2.6 Turbo on dedicated infrastructure alongside your LLM and STT workloads, so naturalness and speed live on one platform instead of being traded off across vendors. MiniMax Speech 2.6 Turbo is benchmarked at the top of public TTS leaderboards, built by the team behind Talkie (150 million users with 90+ minute average sessions), and trained for real conversational interaction rather than read-aloud narration. Requests run on Together AI infrastructure with zero data retention, SOC 2 Type II and HIPAA support, and data residency options. You get a single production surface for streaming delivery, capacity, and debugging with one API, one auth, and unified metrics, so conversational latency becomes an infrastructure guarantee rather than an integration tax.
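As a sketch of what "one API, one auth" looks like in practice, the snippet below assembles a single synthesis request. The endpoint path, model id, and payload field names are assumptions based on Together AI's OpenAI-compatible API shape; check the TTS documentation for the exact names before shipping.

```python
# Minimal sketch of a TTS request against Together AI's speech endpoint.
# Endpoint path, model id, and field names below are assumptions -- verify
# against the TTS docs.
import os

API_URL = "https://api.together.xyz/v1/audio/speech"  # assumed path

def build_tts_request(text: str, model: str = "minimax/speech-2.6-turbo") -> dict:
    """Assemble headers and JSON body for a single synthesis call."""
    return {
        "url": API_URL,
        "headers": {
            "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        "json": {
            "model": model,            # assumed model id
            "input": text,             # text to synthesize
            "response_format": "mp3",  # assumed field
        },
    }

request = build_tts_request("Welcome to our service.")
# To send it: requests.post(**request), then write response.content to a file.
```

Because the same bearer token and base URL serve LLM and STT endpoints, swapping TTS models becomes a change to the `model` string rather than a new vendor integration.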

MiniMax multilingual

English to Japanese to Spanish streaming language switching

"Welcome to our service. Our AI seamlessly bridges the gap between cultures in real-time. 日本語でもサポートできます。言葉の壁を越えて、世界中の人々と自然につながることができます。También ofrecemos soporte en español. Porque creemos que la comunicación global debe ser así de simple."

Why naturalness drives engagement

MiniMax Speech 2.6 Turbo ranks at the top of Artificial Analysis Arena in blind human evaluation. The model is trained on Talkie conversation data, where 150 million users chose to engage with AI voice for sessions averaging more than 90 minutes. Instead of learning from audiobook and podcast narration, MiniMax Speech 2.6 Turbo learned from real dialogue, which produces different prosody, pacing, and emotional range.

Teams building AI-native voice products choose models where voice quality directly drives completion rates. A customer service agent can have correct intent recognition and strong LLM reasoning, but synthetic-sounding delivery still causes users to drop off. MiniMax Speech 2.6 Turbo is now available on Together AI with performance isolation and reliability tuned for production workloads at scale.

Technical capabilities

40-plus languages with streaming inline switching

Native-quality speech across major global languages, with streaming inline language switching: the model can move between English, Japanese, Spanish, Mandarin, French, and German mid-sentence with authentic accents. It detects language boundaries and switches to native pronunciation in real time.
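Because the model detects language boundaries itself, mixed-language text can go through as one synthesis request rather than being split across per-language jobs. The segments below are illustrative:

```python
# Mixed-language input sent as a single request -- no per-language routing
# layer. MiniMax Speech 2.6 Turbo switches pronunciation at the boundaries.
segments = [
    "Welcome to our service.",                # English
    "日本語でもサポートできます。",           # Japanese
    "También ofrecemos soporte en español.",  # Spanish
]

# One request body instead of three per-language jobs stitched together.
tts_input = " ".join(segments)
```

Without inline switching, each segment would need its own request against a language-specific voice, plus audio stitching on your side.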

Automatic emotional awareness

The model analyzes semantic context and adapts prosody. When your LLM outputs apologetic language, MiniMax adjusts to empathetic delivery. Upbeat greetings sound upbeat. Serious warnings sound serious. This happens automatically across all 40-plus languages without prompt engineering or markup.

MiniMax emotional awareness

Same phrase in empathetic, upbeat, and serious tones

"Empathetic: "I understand. I'm sorry to hear you're experiencing this issue." Upbeat: "I understand! Great question, let me help with that." Serious: "I understand. This is a critical security matter.""

10-second voice cloning

Clone a voice from a 10-second audio sample. That voice speaks 40-plus languages with native accents. The model handles imperfect recordings—background noise, accent, disfluency—and produces fluent output while preserving unique timbre. Create a branded voice for your application and deploy it globally through Together AI. Professional voice cloning services available through Sales.
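A cloning workflow typically has two steps: register a reference sample, then reference the returned voice id in later synthesis calls. The endpoint paths and field names below are illustrative assumptions, not the documented API; see the Together AI TTS docs for the real shape.

```python
# Hypothetical two-step cloning flow: upload a ~10s sample, then synthesize
# with the returned voice id. Paths and field names are assumptions.

def build_clone_request(sample_path: str) -> dict:
    """Step 1: register a reference sample (assumed multipart upload)."""
    return {
        "url": "https://api.together.xyz/v1/audio/voices",  # assumed path
        "files": {"sample": sample_path},
    }

def build_cloned_tts_request(voice_id: str, text: str) -> dict:
    """Step 2: synthesize with the cloned voice (assumed 'voice' field)."""
    return {
        "url": "https://api.together.xyz/v1/audio/speech",  # assumed path
        "json": {
            "model": "minimax/speech-2.6-turbo",  # assumed model id
            "voice": voice_id,
            "input": text,
        },
    }

req = build_cloned_tts_request("voice_abc123", "Now in my cloned voice.")
```

The same voice id can then be reused across all supported languages, which is what makes one branded narrator scale globally.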

MiniMax voice original

10-second original sample

"A specific, you know, a specific piece of information or some event or something on their website or something that they know, hey, when they have this information, they have a much higher propensity to need our products."
MiniMax voice cloning

Multilingual output generated from the 10-second sample

"Now, I am speaking with that exact same voice, created from just ten seconds of audio. 甚至可以用这个声音说中文,音色和说话习惯都完美保留了下来。 Et maintenant, écoutez ma voix en français. Remarquez la fluidité de la prononciation, qui reste fidèle à mon timbre original."

Sub-250ms latency on Together AI infrastructure

MiniMax achieves sub-250ms latency on Together AI dedicated endpoints. When TTS runs alongside LLM and STT workloads on the same infrastructure, you eliminate cross-vendor network overhead. The complete pipeline from speech recognition through reasoning to synthesis stays fast enough for real-time conversation.
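A rough per-turn budget makes the co-location argument concrete. Only the sub-250ms TTS figure comes from this post; the STT and LLM stage numbers below are illustrative assumptions.

```python
# Back-of-the-envelope first-audio latency for a co-located voice pipeline.
# Only the TTS figure is from this post; the other stages are assumed.
budget_ms = {
    "stt_final_transcript": 150,  # assumed
    "llm_first_token": 300,       # assumed
    "tts_first_audio": 250,       # sub-250ms figure from this post
}

total_ms = sum(budget_ms.values())
# Cross-vendor hops typically add a network round trip (often 50-100ms)
# per stage boundary; co-locating STT, LLM, and TTS removes those hops.
print(f"first-audio turn latency ~= {total_ms} ms")
```

Under these assumptions the pipeline answers in well under a second, which is roughly the threshold where a pause starts to feel like a gap in conversation.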

Automatic format handling

URLs, email addresses, phone numbers, dates, and currency amounts convert without preprocessing. Works with LLM output directly without building text normalization pipelines.

Use cases

Customer service agents

Deploy voice agents on Together AI where naturalness determines whether customers complete calls. MiniMax Speech 2.6 Turbo voice quality reduces hang-ups from synthetic detection. Automatic emotional awareness means your LLM focuses on reasoning rather than tone management. Streaming multilingual support means one deployment handles customers switching between English, Spanish, and Japanese.

Content generation at scale

Audiobooks, e-learning courses, and podcast narration where voice quality determines completion rates. Talkie's 90-plus minute average sessions demonstrate that MiniMax Speech 2.6 Turbo voices hold attention. 10-second voice cloning means one narrator voice scales across 40-plus languages with native pronunciation. Deploy content generation workloads on Together AI infrastructure with the same reliability and observability as your other AI workloads.

Interactive entertainment

Character voices for games, interactive fiction, and virtual companions. MiniMax Speech 2.6 Turbo delivered the expressiveness that made Talkie successful. Automatic emotional intelligence means characters respond naturally to conversation context. 10-second cloning enables rapid prototyping of character voices. Deploy gaming voice infrastructure on Together AI dedicated endpoints for guaranteed performance during traffic spikes.

MiniMax gaming character voices

Emotional range from angry to cautious to warm

"You dare challenge me? Stop right there!One more step and you will regret crossing me! Wait... Perhaps... is that the ancient amulet? I haven't seen that symbol in centuries... Welcome, my friend! Oh, you are one of the chosen ones! Please, come in, let us share a drink!"

Multilingual applications

Applications serving global user bases need consistent voice quality across languages. MiniMax Speech 2.6 Turbo handles 40-plus languages with streaming inline switching on Together AI infrastructure. Build one voice stack that handles multilingual users without separate deployments or vendor fragmentation. The same dedicated endpoints, observability, and reliability across all languages.

Production infrastructure on Together AI for TTS models

Together AI offers TTS models across different performance and cost profiles:

  • Open-source models such as Orpheus and Kokoro: Cost-efficient, high-volume deployment
  • Enterprise proprietary models such as Rime Arcana v2: Deterministic pronunciation, 40-plus voices, trained on 1B-plus conversations
  • Elite naturalness such as MiniMax Speech 2.6 Turbo: Top-ranked on Artificial Analysis Arena, 40-plus languages, automatic emotion control, 10-second voice cloning

MiniMax Speech 2.6 Turbo runs only on Together AI dedicated endpoints—isolated GPU capacity with a 99.9% uptime SLA supporting over one million developers' production workloads.

Infrastructure

  • ✔ Dedicated GPU capacity with isolated workloads

  • ✔ 99.9% uptime SLA

  • ✔ SOC 2 Type II, HIPAA ready, PCI compliant

  • ✔ Global data centers

  • ✔ WebSocket streaming support

  • ✔ Zero data retention and full data ownership and control
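For the WebSocket streaming path, a common pattern is to push text chunks as the LLM produces them and receive audio frames as they are ready. The sketch below assumes a streaming URL and message shape that are not documented in this post; the sentence-chunking helper is generic and lets audio start before the full reply is generated.

```python
# Sketch of streaming synthesis over WebSocket. The endpoint URL and JSON
# message shape are assumptions; sentence_chunks is a generic helper.
import asyncio
import json
import re

WS_URL = "wss://api.together.xyz/v1/audio/speech/stream"  # assumed URL

def sentence_chunks(text: str) -> list[str]:
    """Split LLM output on sentence boundaries so audio can start early."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

async def stream_tts(text: str) -> None:
    import websockets  # third-party: pip install websockets
    async with websockets.connect(WS_URL) as ws:
        for chunk in sentence_chunks(text):
            await ws.send(json.dumps({"input": chunk}))  # assumed shape
            audio_frame = await ws.recv()  # audio bytes for this chunk
            # ...play or buffer audio_frame...

# asyncio.run(stream_tts("Hello there. How can I help?"))
```

Chunking at sentence boundaries keeps prosody intact while letting the first audio frame arrive before the LLM has finished its full response.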

Developer experience

  • ✔ Same SDKs and authentication as LLM and STT endpoints

  • ✔ Unified pronunciation API across Arcana v2 and Mist v2

  • ✔ Single observability and logging surface for entire voice pipeline

  • ✔ Model selection and swapping via configuration

  • ✔ Professional voice cloning services available

  • ✔ Batch processing for high-volume workflows

Get started

Try the model now

→ Read TTS Documentation

Contact Sales for enterprise dedicated endpoint deployment and volume pricing