Audio

Mist v3

English TTS with deterministic pronunciation for enterprise voice agents

About model

Rime Mist v3 is Rime's English text-to-speech model with a phoneme-first architecture delivering deterministic pronunciation control. The same input always produces the same phonetic output across all voices and calls, with pronunciation corrections applied in minutes without model retraining. Mist v3 features an updated inference engine optimized for high-throughput concurrent requests, SSML support for controllable pauses and speed adjustment, and full backwards compatibility with Mist v2 voices.

Cupola

Professional Health

"Okay, cool, cool, gotcha. So first things first, let's get your date of birth and then we can get you set right up with an appointment."

Vespera

Casual Finance

"Oh, yeah, believe me, I definitely understand how daunting this finance stuff can be. But, you know, I'm here for you and we'll work through it together."

Eliphas

Calm Telecom

"Okay, so now the modem should be showing a blinking yellow light. Is that what you're seeing?"

Pronunciation: Deterministic. The same input produces the same phonetic output every time.

Language: English. Optimized for enterprise voice deployments.

Control: SSML. Controllable pauses and inline speed adjustment.

Model key capabilities

Deterministic Pronunciation: Phoneme-first architecture ensuring the same input always produces the same phonetic output across all voices and calls
Pronunciation Control: Correct brand names, medical terms, and domain vocabulary in minutes without model retraining via custom phoneme mapping
High Throughput: Updated inference engine optimized for concurrent request handling at enterprise contact center volumes
SSML Support: Controllable pauses and inline speed adjustment for fine-grained voice output control

Model card
Architecture Overview:
• Updated inference engine for the Mist text-to-speech model, optimized for high-throughput concurrent requests
• Phoneme-first architecture delivering deterministic pronunciation: the same input always produces the same phonetic output
• English language support
• SSML features including controllable pauses and inline speed adjustment
• Same voice catalog as Mist v2 with full backwards compatibility
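As a rough illustration of the pause and speed controls listed above, the sketch below assembles an SSML string from plain text segments. It uses the standard W3C SSML elements `<break>` and `<prosody rate="...">`; whether Mist v3 accepts these exact tags and value formats is an assumption, and the helper name `build_ssml` is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_ssml(segments):
    """Build an SSML document from (text, rate, pause_ms) tuples.

    rate     -- a prosody rate such as "slow" or "90%", or None for the default
    pause_ms -- pause to insert after the segment, in milliseconds (0 for none)

    Assumption: the target engine accepts W3C-style <break> and <prosody>.
    """
    parts = ["<speak>"]
    for text, rate, pause_ms in segments:
        body = escape(text)  # escape &, <, > so the markup stays well-formed
        if rate is not None:
            body = f"<prosody rate={quoteattr(rate)}>{body}</prosody>"
        parts.append(body)
        if pause_ms > 0:
            parts.append(f'<break time="{int(pause_ms)}ms"/>')
    parts.append("</speak>")
    ssml = "".join(parts)
    ET.fromstring(ssml)  # raises ParseError if the result is not well-formed XML
    return ssml


ssml = build_ssml([
    ("Your appointment is confirmed.", None, 400),
    ("The confirmation code is A, 7, Q, 2.", "slow", 0),
])
print(ssml)
```

Parsing the result before returning it catches malformed markup on the client rather than at synthesis time.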

Training Methodology:
• Trained on real customer service conversations for natural pacing, rhythm, and conversational cadence
• Optimized for clarity and pronunciation accuracy in production voice environments
• Robust text normalization layer providing deterministic control over brand names, medical terms, and domain-specific vocabulary

Performance Characteristics:
• Deterministic pronunciation: define a word once and it renders consistently across all voices and calls
• High-throughput concurrent request handling for enterprise contact center volumes
• Pronunciation corrections applied in minutes without model retraining
• SSML support for fine-grained control over pauses, pacing, and speed
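The "define a word once" idea can be sketched on the client side as a lookup table applied to text before it is sent for synthesis. This is an illustrative sketch, not Rime's normalization layer: the function name, the lexicon, the brand names, and the bracketed strings are all hypothetical placeholders, not real phonetic spellings.

```python
import re


def apply_pronunciations(text, lexicon):
    """Replace each whole-word match of a lexicon key with its bracketed override.

    lexicon maps a written word to the string that should appear inside curly
    brackets. Matching is case-insensitive and limited to whole words.
    """
    if not lexicon:
        return text
    # Longest keys first so multi-word entries win over the words they contain.
    keys = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    lowered = {k.lower(): v for k, v in lexicon.items()}
    return pattern.sub(lambda m: "{" + lowered[m.group(0).lower()] + "}", text)


# Hypothetical entries: the values stand in for real phonetic spellings.
lexicon = {"Zorvex": "PHONEMES_FOR_ZORVEX", "Acme": "PHONEMES_FOR_ACME"}
result = apply_pronunciations("Acme ships Zorvex. acme again.", lexicon)
print(result)
```

Because the substitution is a pure function of the text and the lexicon, the same input always yields the same bracketed output, which mirrors the consistency property described above.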

Prompting
Together AI API Access:
• Access Rime Mist v3 via Together AI APIs using the endpoint rime-labs/rime-mist-v3
• Authenticate using your Together AI API key in request headers
• Set a custom pronunciation by wrapping the word in curly brackets and enabling phonemizeBetweenBrackets
• Use SSML for controllable pauses and inline speed adjustment
• Available on Together AI dedicated infrastructure co-located with LLM and STT workloads
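The points above can be sketched as a small request builder. Only the model identifier `rime-labs/rime-mist-v3`, the API-key header, and the `phonemizeBetweenBrackets` option come from this page. The URL, the `input` and `voice` field names, the lowercase voice ID, and the placement of `phonemizeBetweenBrackets` in the JSON body are assumptions to verify against the Together AI documentation, and a dedicated deployment may expose a different route. The sketch builds the request without sending it.

```python
import json

# Assumed route for speech synthesis; confirm it for your deployment.
API_URL = "https://api.together.xyz/v1/audio/speech"
MODEL_ID = "rime-labs/rime-mist-v3"  # identifier given on this page


def build_tts_request(api_key, text, voice, phonemize_between_brackets=False):
    """Return (headers, payload) for a text-to-speech request.

    Field names other than the model ID and phonemizeBetweenBrackets
    are assumptions, not confirmed by this page.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"model": MODEL_ID, "input": text, "voice": voice}
    if phonemize_between_brackets:
        # Ask the engine to treat {...} spans as custom pronunciations.
        payload["phonemizeBetweenBrackets"] = True
    return headers, payload


headers, payload = build_tts_request(
    api_key="YOUR_TOGETHER_API_KEY",
    text="Thanks for calling {PLACEHOLDER_PHONEMES} support.",
    voice="cupola",  # assumed ID for the Cupola voice
    phonemize_between_brackets=True,
)
print(json.dumps(payload, indent=2))
```

The headers and payload can then be posted to `API_URL` with any HTTP client; the response format is not described on this page.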

Applications & use cases
Enterprise Contact Centers:
• High-volume voice agent deployments requiring consistent pronunciation across millions of calls
• Customer support automation with natural conversational cadence
• IVR modernization with deterministic pronunciation control

Healthcare Voice Agents:
• Medication names, medical terms, and clinical vocabulary pronounced correctly every time
• Co-located with LLM and STT on Together AI HIPAA-ready infrastructure
• Deterministic pronunciation eliminates mispronunciation risk in patient interactions

Financial Services:
• Account numbers, routing numbers, and financial product names read clearly and consistently
• Compliance-grade voice output on SOC 2 and PCI compliant infrastructure
• Brand name and proprietary term pronunciation locked across all voice channels

Model specifications

Model data

Model provider: Rime
Type: Audio
Main use cases: Text-to-Speech
Deployment: On-Demand Dedicated
Price: $10 / 1M characters + GPU hourly (by hardware)
Input modalities: Text
Output modalities: Audio
Category: Audio
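To make the listed price concrete, the sketch below estimates a bill from a character count and GPU time. The $10 per 1M characters figure is from this page. The GPU hourly rate depends on hardware and is not given here, so it is a caller-supplied parameter, and the rate in the example is a made-up placeholder. How characters are counted for billing (for example, whether markup is included) is also not stated and is not modeled.

```python
from decimal import Decimal

CHAR_PRICE_PER_MILLION = Decimal("10")  # USD, from the listed price


def estimate_cost(characters, gpu_hours=0, gpu_hourly_rate="0"):
    """Estimate cost in USD: per-character charge plus GPU time.

    gpu_hourly_rate varies by hardware; pass the rate quoted for your deployment.
    """
    char_cost = Decimal(characters) / Decimal(1_000_000) * CHAR_PRICE_PER_MILLION
    gpu_cost = Decimal(str(gpu_hours)) * Decimal(str(gpu_hourly_rate))
    return char_cost + gpu_cost


# 2.5M characters of synthesized text, character charge only.
print(estimate_cost(2_500_000))

# Same volume plus 24 GPU hours at a hypothetical $3.50/hour.
print(estimate_cost(2_500_000, gpu_hours=24, gpu_hourly_rate="3.50"))
```

Using `Decimal` keeps the arithmetic exact for currency amounts.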
