MiniMax Speech 2.6 Turbo now available natively on Together AI

State-of-the-art multilingual TTS with human-level, emotionally aware voices in 40+ languages and real-time latency on dedicated, production-grade infrastructure.

December 23, 2025

By Arielle Fidel, Rajas Bansal, Sahil Yadav, Rishabh Bhargava, Sonny Khan

Summary

  • MiniMax Speech 2.6 Turbo on Together AI: Top-ranked on Artificial Analysis Arena, available on dedicated infrastructure only on Together AI
  • Sub-250ms latency, 40-plus languages with streaming inline switching, 10-second voice cloning, automatic emotional awareness
  • Expands elite proprietary TTS models on Together AI alongside Cartesia and Rime models
  • Dedicated GPU endpoints co-located with LLM and STT workloads

Building a real-time voice agent usually forces an ugly choice: ship a voice that sounds convincingly human, or ship a voice that responds instantly and holds up in production. Most teams split the difference with a patchwork of providers: one for showcase experiences, another for low-latency turns, and others for cloning or global language coverage. Over time that patchwork becomes the product. Behavior diverges by market, latency and quality drift, and “upgrade the voice” turns into a cross-vendor infrastructure project instead of a product decision.

Starting today, Together AI, the AI Native Cloud, is the only platform where you can run MiniMax Speech 2.6 Turbo on dedicated infrastructure alongside your LLM and STT workloads, so naturalness and speed live on one platform instead of being traded off across vendors. MiniMax Speech 2.6 Turbo is benchmarked at the top of public TTS leaderboards, built by the team behind Talkie (150 million users with 90+ minute average sessions), and trained for real conversational interaction rather than read-aloud narration. Requests run on Together AI infrastructure with zero data retention, SOC 2 Type II and HIPAA support, and data residency options. You get a single production surface for streaming delivery, capacity, and debugging with one API, one auth, and unified metrics, so conversational latency becomes an infrastructure guarantee rather than an integration tax.

Audio sample: MiniMax multilingual
English to Japanese to Spanish streaming language switching
"Welcome to our service. Our AI seamlessly bridges the gap between cultures in real-time. 日本語でもサポートできます。言葉の壁を越えて、世界中の人々と自然につながることができます。También ofrecemos soporte en español. Porque creemos que la comunicación global debe ser así de simple."

Why naturalness drives engagement

MiniMax Speech 2.6 Turbo ranks at the top of Artificial Analysis Arena in blind human evaluation. The model is trained on Talkie conversation data, where 150 million users chose to engage with AI voice for sessions averaging more than 90 minutes. Instead of learning from audiobook and podcast narration, MiniMax Speech 2.6 Turbo learned from real dialogue, which produces different prosody, pacing, and emotional range.

Teams building AI-native voice products choose models where voice quality directly drives completion rates. A customer service agent can have correct intent recognition and strong LLM reasoning, yet a synthetic-sounding delivery can still cause users to drop off. MiniMax Speech 2.6 Turbo is now available on Together AI with performance isolation and reliability tuned for production workloads at scale.


Technical capabilities

40-plus languages with streaming inline switching

The model produces native-quality speech across major global languages with streaming inline language switching. It can move between English, Japanese, Spanish, Mandarin, French, and German mid-sentence with authentic accents. The model detects language boundaries and switches with native pronunciation in real time.

Automatic emotional awareness

The model analyzes semantic context and adapts prosody. When your LLM outputs apologetic language, MiniMax adjusts to empathetic delivery. Upbeat greetings sound upbeat. Serious warnings sound serious. This happens automatically across all 40-plus languages without prompt engineering or markup.

Audio sample: MiniMax emotional awareness
Same phrase in empathetic, upbeat, and serious tones
Empathetic: “I understand. I’m sorry to hear you’re experiencing this issue.”
Upbeat: “I understand! Great question, let me help with that.”
Serious: “I understand. This is a critical security matter.”

10-second voice cloning

Clone a voice from a 10-second audio sample. That voice speaks 40-plus languages with native accents. The model handles imperfect recordings (background noise, accent, disfluency) and produces fluent output while preserving unique timbre. Create a branded voice for your application and deploy it globally through Together AI. Professional voice cloning services are available through Sales.

Audio sample: MiniMax voice original
10-second original sample
"A specific, you know, a specific piece of information or some event or something on their website or something that they know, hey, when they have this information, they have a much higher propensity to need our products."

Audio sample: MiniMax voice cloning
Multilingual output generated using a sample
"Now, I am speaking with that exact same voice, created from just ten seconds of audio. 甚至可以用这个声音说中文,音色和说话习惯都完美保留了下来。 Et maintenant, écoutez ma voix en français. Remarquez la fluidité de la prononciation, qui reste fidèle à mon timbre original."

Sub-250ms latency on Together AI infrastructure

MiniMax achieves sub-250ms latency on Together AI dedicated endpoints. When TTS runs alongside LLM and STT workloads on the same infrastructure, you eliminate cross-vendor network overhead. The complete pipeline from speech recognition through reasoning to synthesis stays fast enough for real-time conversation.
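
To make the co-location argument concrete, here is a minimal latency-budget sketch for one conversational turn. Every number in it is an illustrative assumption rather than a measurement, and the Stage type and turn_latency_ms helper are hypothetical names introduced only for this example. The TTS figure uses 250 ms as an upper bound taken from the sub-250ms claim above; the per-hop network costs are made up to show the shape of the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    compute_ms: float  # time spent inside the service (assumed value)
    network_ms: float  # hop needed to reach the service (assumed value)


def turn_latency_ms(stages):
    """Sum compute and network time across a sequential voice pipeline."""
    return sum(s.compute_ms + s.network_ms for s in stages)


# Illustrative numbers only: three vendors, each reached over the public internet.
multi_vendor = [
    Stage("stt", 150.0, 40.0),
    Stage("llm", 300.0, 40.0),
    Stage("tts", 250.0, 40.0),
]

# Same compute cost, but each hop stays inside one provider's network.
co_located = [Stage(s.name, s.compute_ms, 2.0) for s in multi_vendor]

saved_ms = turn_latency_ms(multi_vendor) - turn_latency_ms(co_located)
print(f"multi-vendor: {turn_latency_ms(multi_vendor):.0f} ms")
print(f"co-located:   {turn_latency_ms(co_located):.0f} ms")
print(f"saved:        {saved_ms:.0f} ms")
```

The model compute time is unchanged in both cases. The saving comes entirely from the network hops, which is why it grows with the number of stages chained in a turn.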

Automatic format handling

URLs, email addresses, phone numbers, dates, and currency amounts are converted to natural speech without preprocessing. The model works directly on LLM output, so you do not need to build a text normalization pipeline.
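
For contrast, the sketch below shows the kind of hand-written normalization step that teams often maintain when a TTS model does not handle these formats itself. It is not part of MiniMax Speech 2.6 Turbo or of any Together AI API. The normalize_for_tts function and its two patterns are hypothetical, and they cover only whole-dollar amounts and simple email addresses.

```python
import re

# Whole-dollar amounts such as "$200"; decimals like "$9.99" are left untouched.
_DOLLARS = re.compile(r"\$(\d+)(?!\.\d)\b")
# A deliberately loose email pattern, good enough for a sketch.
_EMAIL = re.compile(r"\b([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)\b")


def _spell_email(match: re.Match) -> str:
    """Rewrite an email address the way a person would read it aloud."""
    local = match.group(1).replace(".", " dot ")
    domain = match.group(2).replace(".", " dot ")
    return f"{local} at {domain}"


def normalize_for_tts(text: str) -> str:
    """Expand a few written forms into speakable words before synthesis."""
    text = _DOLLARS.sub(r"\1 dollars", text)
    return _EMAIL.sub(_spell_email, text)


print(normalize_for_tts("Send $200 to help@example.com"))
# prints: Send 200 dollars to help at example dot com
```

Even this small version needs special cases (decimals, dotted names), and phone numbers, dates, and URLs each add more. That maintenance burden is what automatic format handling removes.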


Use cases

Customer service agents

Deploy voice agents on Together AI where naturalness determines whether customers complete calls. MiniMax Speech 2.6 Turbo voice quality reduces the hang-ups that happen when callers detect a synthetic voice. Automatic emotional awareness means your LLM focuses on reasoning rather than tone management. Streaming multilingual support means one deployment handles customers switching between English, Spanish, and Japanese.

Content generation at scale

Produce audiobooks, e-learning courses, and podcast narration, where voice quality determines completion rates. Talkie's 90-plus minute average sessions demonstrate that MiniMax Speech 2.6 Turbo voices hold attention. With 10-second voice cloning, one narrator voice scales across 40-plus languages with native pronunciation. Deploy content generation workloads on Together AI infrastructure with the same reliability and observability as your other AI workloads.

Interactive entertainment

Create character voices for games, interactive fiction, and virtual companions. MiniMax Speech 2.6 Turbo delivered the expressiveness that made Talkie successful. Automatic emotional awareness means characters respond naturally to conversation context. 10-second cloning enables rapid prototyping of character voices. Deploy gaming voice infrastructure on Together AI dedicated endpoints for guaranteed performance during traffic spikes.

Audio sample: MiniMax gaming character voices
Emotional range from angry to cautious to warm
"You dare challenge me? Stop right there! One more step and you will regret crossing me! Wait... Perhaps... is that the ancient amulet? I haven't seen that symbol in centuries... Welcome, my friend! Oh, you are one of the chosen ones! Please, come in, let us share a drink!"

Multilingual applications

Applications serving global user bases need consistent voice quality across languages. MiniMax Speech 2.6 Turbo handles 40-plus languages with streaming inline switching on Together AI infrastructure. Build one voice stack that handles multilingual users without separate deployments or vendor fragmentation. The same dedicated endpoints, observability, and reliability apply across all languages.


Production infrastructure on Together AI for TTS models

Together AI offers TTS models across different performance and cost profiles:

  • Open-source such as Orpheus and Kokoro: Cost-efficient, high-volume deployment
  • Enterprise proprietary such as Rime Arcana v2: Deterministic pronunciation, 40-plus voices, trained on 1B-plus conversations
  • Elite naturalness such as MiniMax Speech 2.6 Turbo: Top-ranked on Artificial Analysis Arena, 40-plus languages, automatic emotion control, 10-second voice cloning

MiniMax Speech 2.6 Turbo runs only on Together AI dedicated endpoints: isolated GPU capacity with a 99.9% uptime SLA, on infrastructure that supports production workloads for over one million developers.

Infrastructure

  • ✔ Dedicated GPU capacity with isolated workloads
  • ✔ 99.9% uptime SLA
  • ✔ SOC 2 Type II, HIPAA ready, PCI compliant
  • ✔ Global data centers
  • ✔ WebSocket streaming support
  • ✔ Zero data retention and full data ownership and control

Developer experience

  • ✔ Same SDKs and authentication as LLM and STT endpoints
  • ✔ Unified pronunciation API across Arcana v2 and Mist v2
  • ✔ Single observability and logging surface for entire voice pipeline
  • ✔ Model selection and swapping via configuration
  • ✔ Professional voice cloning services available
  • ✔ Batch processing for high-volume workflows
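
As one way to picture "model selection and swapping via configuration", the sketch below keeps the model choice in a config dictionary and builds a request body from it. This is an assumption-heavy illustration, not the Together AI SDK: the model and voice identifiers are placeholders, the field names (model, voice, input) are modeled on common speech-API shapes, and build_speech_request is a hypothetical helper. Check the TTS documentation for the real identifiers and request schema.

```python
import json

# Hypothetical config. The model and voice IDs are placeholders, not real
# identifiers; look up the actual values in the TTS documentation.
VOICE_CONFIG = {
    "active": "premium",
    "profiles": {
        "premium": {"model": "<minimax-speech-2.6-turbo-id>", "voice": "<voice-id>"},
        "budget": {"model": "<open-source-tts-id>", "voice": "<voice-id>"},
    },
}


def build_speech_request(text, config=VOICE_CONFIG, profile=None):
    """Return a JSON-serializable request body for the selected profile.

    The field names are assumptions for illustration only.
    """
    name = profile or config["active"]
    try:
        selected = config["profiles"][name]
    except KeyError:
        raise ValueError(f"unknown voice profile: {name!r}") from None
    return {"model": selected["model"], "voice": selected["voice"], "input": text}


# Swapping models is a config change; the calling code stays the same.
print(json.dumps(build_speech_request("Hello!"), indent=2))
print(json.dumps(build_speech_request("Hello!", profile="budget"), indent=2))
```

Because the application only ever calls build_speech_request, moving traffic from one model tier to another means editing the "active" key rather than touching call sites.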


Get started

Try the model now

→ Read TTS Documentation

Contact Sales for enterprise dedicated endpoint deployment and volume pricing
