Mist v3 Omni

Multilingual TTS with deterministic pronunciation across four languages

About model

Rime Mist v3 Omni is the multilingual variant of Rime's Mist v3 text-to-speech model. It supports English, Spanish, French, and German, with deterministic pronunciation control in all four languages. Built on the same phoneme-first architecture as Mist v3, it pronounces brand names, medical terms, and domain vocabulary consistently across languages without model retraining. The model features an updated inference engine for high-throughput concurrent requests, supports SSML, and consolidates multilingual TTS into a single model.

Languages

4

English, Spanish, French, and German, with deterministic pronunciation

Pronunciation

Deterministic

Same input produces same phonetic output across all languages

Control

SSML

Controllable pauses and inline speed adjustment

Model key capabilities
  • Multilingual Deterministic Pronunciation: Phoneme-first architecture ensuring consistent pronunciation across English, Spanish, French, and German
  • Single Model Multilingual: Consolidate four-language TTS into one model without language-specific routing or separate infrastructure
  • Pronunciation Control: Correct brand names, medical terms, and domain vocabulary across all languages in minutes without model retraining
  • SSML Support: Controllable pauses and inline speed adjustment for fine-grained voice output control in all supported languages
  • Model card

    Architecture Overview:
    • Multilingual variant of the Mist v3 inference engine supporting English, Spanish, French, and German
    • Phoneme-first architecture delivering deterministic pronunciation across all four languages
    • Optimized for high-throughput concurrent requests in multilingual production environments
    • SSML features including controllable pauses and inline speed adjustment
    • Compatible with the same voice catalog, with multilingual coverage
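
As a rough illustration of the SSML controls listed above, the sketch below builds an input string with a pause and a slowed-down phrase. It uses the standard W3C SSML `<break>` and `<prosody>` elements; whether Mist v3 Omni accepts these exact tags and attribute values is an assumption, and the helper names `pause`, `at_speed`, and `speak` are hypothetical.

```python
from xml.sax.saxutils import escape


def pause(milliseconds: int) -> str:
    """Return a W3C SSML break element for a pause of the given length."""
    if milliseconds < 0:
        raise ValueError("pause length must be non-negative")
    return f'<break time="{milliseconds}ms"/>'


def at_speed(text: str, rate_percent: int) -> str:
    """Wrap text in a W3C SSML prosody element with a relative speaking rate."""
    if rate_percent <= 0:
        raise ValueError("rate must be positive")
    return f'<prosody rate="{rate_percent}%">{escape(text)}</prosody>'


def speak(*parts: str) -> str:
    """Join already-built SSML fragments inside a single <speak> root."""
    return "<speak>" + "".join(parts) + "</speak>"


# Assumption: the model accepts standard SSML markup in its text input.
ssml = speak(
    escape("Your confirmation code is"),
    pause(300),
    at_speed("4 7 2 9", 80),
)
print(ssml)
```

Escaping plain text before joining keeps characters such as `&` and `<` from being read as markup.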

    Training Methodology:
    • Trained on real conversational data across English, Spanish, French, and German
    • Optimized for natural prosody and pronunciation accuracy in each supported language
    • Robust text normalization layer providing deterministic control over domain-specific vocabulary across languages

    Performance Characteristics:
    • Deterministic pronunciation across all four languages: define a word once and it renders consistently
    • High-throughput concurrent request handling for multilingual enterprise deployments
    • Pronunciation corrections applied in minutes without model retraining across any supported language
    • SSML support for fine-grained control over pauses, pacing, and speed in all languages
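
One way to picture "define a word once and it renders consistently" is a small pronunciation lexicon applied to text before synthesis. The sketch below is not Rime's implementation: it assumes that a custom pronunciation is supplied as a phonetic spelling wrapped in curly brackets, and the lexicon entries are placeholders rather than real phoneme strings. The names `LEXICON` and `apply_lexicon` are hypothetical.

```python
import re

# Hypothetical lexicon: each term maps to a phonetic spelling.
# The values are placeholders, not real phoneme strings in Rime's notation.
LEXICON = {
    "Acme": "PLACEHOLDER_ACME",
    "Examplex": "PLACEHOLDER_EXAMPLEX",
}


def apply_lexicon(text: str, lexicon: dict[str, str]) -> str:
    """Replace each whole-word lexicon term with its bracketed phonetic spelling."""
    if not lexicon:
        return text
    # Longest terms first so longer entries win over shorter ones they contain.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda m: "{" + lexicon[m.group(1)] + "}", text)


print(apply_lexicon("Acme recommends Examplex.", LEXICON))
```

Because the same lexicon is applied to every request, a correction made once takes effect everywhere the term appears, without touching the model.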

  • Prompting

    Together AI API Access:
    • Access Rime Mist v3 Omni via Together AI APIs using the endpoint rime-labs/rime-mist-v3-omni
    • Authenticate using your Together AI API key in request headers
    • Supports English, Spanish, French, and German with language selection via API
    • Apply custom pronunciations by wrapping words in curly brackets and enabling phonemizeBetweenBrackets
    • SSML support for controllable pauses and inline speed adjustment
    • Available on Together AI dedicated infrastructure co-located with LLM and STT workloads
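
The access steps above can be sketched as a function that assembles the request headers and JSON body. This is a minimal sketch under stated assumptions, not the documented request schema: the body field names `input` and `language`, the two-letter language codes, and the placement of `phonemizeBetweenBrackets` in the body are all assumptions, and `build_tts_request` is a hypothetical helper. No request is sent; consult the Together AI API reference for the actual URL and parameters.

```python
import json

MODEL_NAME = "rime-labs/rime-mist-v3-omni"  # name given in the text above
# Assumed language codes; the page names the languages but not their codes.
LANGUAGE_CODES = {"English": "en", "Spanish": "es", "French": "fr", "German": "de"}


def build_tts_request(api_key: str, text: str, language: str,
                      phonemize_between_brackets: bool = False) -> tuple[dict, str]:
    """Return (headers, JSON body) for a hypothetical text-to-speech request."""
    if language not in LANGUAGE_CODES:
        raise ValueError(f"unsupported language: {language}")
    headers = {
        "Authorization": f"Bearer {api_key}",  # API key sent in the request headers
        "Content-Type": "application/json",
    }
    body = {
        "model": MODEL_NAME,
        "input": text,  # assumed field name
        "language": LANGUAGE_CODES[language],  # assumed field name and values
        # Option named in the text; its placement in the body is assumed.
        "phonemizeBetweenBrackets": phonemize_between_brackets,
    }
    return headers, json.dumps(body)


headers, body = build_tts_request(
    "YOUR_API_KEY", "Bienvenue chez {PLACEHOLDER}.", "French", True
)
print(body)
```

Keeping request construction separate from the HTTP call makes it easy to check the payload before any audio is generated.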

  • Applications & use cases

    Global Contact Centers:
    • Multilingual voice agent deployments across English, Spanish, French, and German markets
    • Consistent pronunciation control across all four languages for brand and product terms
    • Single model consolidating multilingual TTS infrastructure without language-specific routing

    Healthcare Voice Agents:
    • Medical terminology pronounced correctly across all supported languages
    • Co-located with LLM and STT on Together AI HIPAA-ready infrastructure
    • Deterministic pronunciation for patient-facing interactions in multilingual healthcare environments

    Financial Services & Retail:
    • Account identifiers, product names, and financial terms read consistently across languages
    • Compliance-grade multilingual voice output on SOC 2 and PCI-compliant infrastructure
    • Serve European and Latin American markets from a single model and voice pipeline

Model details
  • Model provider
    Rime
  • Type
    Audio
  • Main use cases
    Text-to-Speech
  • Deployment
    On-Demand Dedicated
  • Price
    $10 per 1M characters, plus an hourly GPU charge that varies by hardware
  • Input modalities
    Text
  • Output modalities
    Audio
  • Category
    Audio
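
To make the two-part pricing listed above concrete, the sketch below adds the per-character charge to the GPU time charge. The $10 per 1M characters figure comes from the price entry; the GPU hourly rate is not stated on this page, so the $3.00 used here is purely hypothetical, as is the helper name `estimate_cost`.

```python
from decimal import Decimal

PRICE_PER_MILLION_CHARS = Decimal("10")  # USD, from the price entry above


def estimate_cost(characters: int, gpu_hours: Decimal, gpu_hourly_rate: Decimal) -> Decimal:
    """Estimate total USD cost: per-character charge plus GPU time."""
    if characters < 0 or gpu_hours < 0 or gpu_hourly_rate < 0:
        raise ValueError("inputs must be non-negative")
    character_cost = PRICE_PER_MILLION_CHARS * characters / Decimal(1_000_000)
    gpu_cost = gpu_hours * gpu_hourly_rate
    return character_cost + gpu_cost


# 2.5M characters over 24 GPU hours at a hypothetical $3.00/hour rate.
print(estimate_cost(2_500_000, Decimal("24"), Decimal("3.00")))
```

In this example the character charge is $25 and the GPU charge is $72, for a total of $97.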