
Mist v3 Omni

Multilingual TTS with deterministic pronunciation across four languages

About model

Rime Mist v3 Omni is the multilingual variant of Rime's Mist v3 text-to-speech model. It supports English, Spanish, French, and German, with deterministic pronunciation control across all four languages. Built on the same phoneme-first architecture as Mist v3, it pronounces brand names, medical terms, and domain vocabulary consistently across languages without model retraining. The model runs on an updated inference engine built for high-throughput concurrent requests, supports SSML, and consolidates four-language TTS into a single model.

Languages

English, Spanish, French, and German, all with deterministic pronunciation

Pronunciation

Deterministic

The same input produces the same phonetic output in every supported language

Control

SSML

Controllable pauses and inline speed adjustment

Model key capabilities

Multilingual Deterministic Pronunciation: Phoneme-first architecture ensuring consistent pronunciation across English, Spanish, French, and German
Single Model Multilingual: Consolidate four-language TTS into one model without language-specific routing or separate infrastructure
Pronunciation Control: Correct brand names, medical terms, and domain vocabulary across all languages in minutes without model retraining
SSML Support: Controllable pauses and inline speed adjustment for fine-grained voice output control in all supported languages
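The SSML controls listed above can be assembled programmatically. The sketch below is a minimal Python example that assumes Mist v3 Omni accepts the standard SSML 1.1 `<speak>`, `<break time>`, and `<prosody rate>` elements; the exact tags and attribute values Rime supports are an assumption here and should be checked against Rime's documentation. The helper names `ssml_break`, `ssml_rate`, and `ssml_document` are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def ssml_break(ms: int) -> str:
    """Return an SSML pause element lasting `ms` milliseconds (assumed <break> support)."""
    if ms < 0:
        raise ValueError("pause length must be non-negative")
    return f'<break time="{ms}ms"/>'


def ssml_rate(text: str, rate: str) -> str:
    """Wrap text in a <prosody> element that changes speaking rate.

    `rate` follows SSML 1.1 conventions such as "90%"; whether Rime honors
    <prosody rate> for inline speed changes is an assumption.
    """
    # quoteattr adds the surrounding quotes and escapes the attribute value;
    # escape() protects &, <, and > in the spoken text.
    return f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"


def ssml_document(*fragments: str) -> str:
    """Join already-escaped fragments into a single <speak> document."""
    return "<speak>" + "".join(fragments) + "</speak>"


if __name__ == "__main__":
    doc = ssml_document(
        escape("Your order has shipped."),
        ssml_break(400),
        ssml_rate("Your tracking number is A 4 7 2.", "85%"),
    )
    print(doc)
```

Escaping happens at the point each fragment is built, so plain text passed to `ssml_document` must go through `escape` first, as in the usage above.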

Model card
Architecture Overview:
• Multilingual variant of Mist v3, running on an updated inference engine and supporting English, Spanish, French, and German
• Phoneme-first architecture delivering deterministic pronunciation across all four languages
• Optimized for high-throughput concurrent requests in multilingual production environments
• SSML features including controllable pauses and inline speed adjustment
• Compatible with the same voice catalog, extended with multilingual coverage

Training Methodology:
• Trained on real conversational data across English, Spanish, French, and German
• Optimized for natural prosody and pronunciation accuracy in each supported language
• Robust text normalization layer providing deterministic control over domain-specific vocabulary across languages

Performance Characteristics:
• Deterministic pronunciation across all four languages: define a word once and it renders consistently
• High-throughput concurrent request handling for multilingual enterprise deployments
• Pronunciation corrections take effect within minutes, in any supported language, without model retraining
• SSML support for fine-grained control over pauses, pacing, and speed in all languages
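"Define a word once and it renders consistently" suggests keeping one lexicon and applying it identically to every request, whatever the language. The sketch below is a hypothetical client-side helper, not part of Rime's or Together AI's API. It assumes, following Rime's `phonemizeBetweenBrackets` option, that the text placed inside curly brackets is the term's phonetic spelling; the phonetic strings in `LEXICON` are placeholders, not real Rime phoneme notation.

```python
import re
from typing import Mapping


def apply_lexicon(text: str, lexicon: Mapping[str, str]) -> str:
    """Replace each lexicon term in `text` with "{phonetic spelling}".

    Matching is case-sensitive and on whole words. Longer terms are tried
    first, so a multi-word entry such as "Acme Pay" wins over "Acme".
    Terms should begin and end with a word character so that the
    word-boundary anchors can match.
    """
    if not lexicon:
        return text
    # Longest-first ordering makes the regex alternation prefer longer terms.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b")
    return pattern.sub(lambda m: "{" + lexicon[m.group(0)] + "}", text)


# Hypothetical lexicon shared by all four languages. Values are placeholders;
# real entries would use Rime's phoneme notation.
LEXICON = {
    "Acme": "PHONEMES_FOR_ACME",
    "Acme Pay": "PHONEMES_FOR_ACME_PAY",
}

if __name__ == "__main__":
    for sentence in ("Pay with Acme Pay.", "Pague con Acme Pay.", "Payez avec Acme."):
        print(apply_lexicon(sentence, LEXICON))
```

Because the substitution is a pure function of the text and the lexicon, the same sentence always yields the same bracketed output, which mirrors the determinism the model itself claims.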
Prompting
Together AI API Access:
• Access Rime Mist v3 Omni through Together AI APIs using the model ID rime-labs/rime-mist-v3-omni
• Authenticate using your Together AI API key in request headers
• Supports English, Spanish, French, and German, with the language selected through the API
• Use custom pronunciation by wrapping words in curly brackets with phonemizeBetweenBrackets enabled
• SSML support for controllable pauses and inline speed adjustment
• Available on Together AI dedicated infrastructure co-located with LLM and STT workloads
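A single synthesis call can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not Together AI's documented client: it assumes the `https://api.together.xyz/v1/audio/speech` endpoint with `model`, `input`, `voice`, `language`, and `response_format` fields, ISO 639-1 language codes, and that Rime's `phonemizeBetweenBrackets` flag is passed through under that name. Verify each of these against the Together AI quickstart docs before use.

```python
import json
import urllib.request

# Assumptions: endpoint URL, field names, and language codes below are not
# confirmed by this page; check them against Together AI's documentation.
SPEECH_URL = "https://api.together.xyz/v1/audio/speech"
MODEL_ID = "rime-labs/rime-mist-v3-omni"
SUPPORTED_LANGUAGES = frozenset({"en", "es", "fr", "de"})


def build_speech_request(
    text: str,
    voice: str,
    language: str,
    api_key: str,
    phonemize_between_brackets: bool = False,
    response_format: str = "wav",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for one synthesis call."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    body = {
        "model": MODEL_ID,
        "input": text,
        "voice": voice,
        "language": language,
        "response_format": response_format,
    }
    if phonemize_between_brackets:
        # Rime's custom-pronunciation flag; passing it through Together AI
        # under this exact name is an assumption.
        body["phonemizeBetweenBrackets"] = True
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The API key travels in the Authorization header.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(request: urllib.request.Request, timeout: float = 60.0) -> bytes:
    """Send the request and return the raw audio bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

A call would look like `synthesize(build_speech_request("Bonjour, comment puis-je vous aider ?", voice="VOICE_ID", language="fr", api_key=os.environ["TOGETHER_API_KEY"]))`, where `VOICE_ID` is a hypothetical placeholder for a voice from the catalog. Building the request separately from sending it keeps the payload easy to inspect and test without network access.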
Applications & use cases
Global Contact Centers:
• Multilingual voice agent deployments across English, Spanish, French, and German markets
• Consistent pronunciation control across all four languages for brand and product terms
• Single model consolidating multilingual TTS infrastructure without language-specific routing

Healthcare Voice Agents:
• Medical terminology pronounced correctly across all supported languages
• Co-located with LLM and STT workloads on Together AI's HIPAA-ready infrastructure
• Deterministic pronunciation for patient-facing interactions in multilingual healthcare environments

Financial Services & Retail:
• Account identifiers, product names, and financial terms read consistently across languages
• Compliance-grade multilingual voice output on SOC 2- and PCI-compliant infrastructure
• Serve European and Latin American markets from a single model and voice pipeline

Model specifications

Model provider: Rime
Type: Audio
Main use cases: Text-to-Speech
Deployment: On-Demand Dedicated
Price: $10 / 1M characters + GPU hourly (by hardware)
Input modalities: Text
Output modalities: Audio
Category: Audio
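A rough cost estimate follows from the price line: the character fee at $10 per million characters, plus GPU hours multiplied by the hourly rate of the chosen hardware. The sketch below assumes the fee is billed on submitted characters (whether SSML markup or bracketed phonemes count toward it is not stated here) and uses a hypothetical $3.00/hour GPU rate, since the actual rate depends on hardware.

```python
# Published character fee; the GPU hourly rate depends on the hardware chosen
# and is therefore a caller-supplied parameter.
PRICE_PER_MILLION_CHARS_USD = 10.0


def estimate_cost_usd(characters: int, gpu_hours: float, gpu_hourly_rate_usd: float) -> float:
    """Estimate spend as the per-character fee plus dedicated GPU time."""
    if characters < 0 or gpu_hours < 0 or gpu_hourly_rate_usd < 0:
        raise ValueError("inputs must be non-negative")
    character_fee = characters / 1_000_000 * PRICE_PER_MILLION_CHARS_USD
    gpu_fee = gpu_hours * gpu_hourly_rate_usd
    return character_fee + gpu_fee


if __name__ == "__main__":
    # 2.5M characters over one day on a GPU billed at a hypothetical $3.00/hour:
    # 2.5 * $10 + 24 * $3.00 = $25 + $72
    print(estimate_cost_usd(2_500_000, 24, 3.00))  # 97.0
```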
