Arcana v2

Expressive text-to-speech with extensive voice library and multi-lingual support

About model

Arcana v2 is an expressive text-to-speech model featuring 300+ voices with 35 flagship options across multiple languages. With support for multi-lingual code-switching and paralinguistic features like false starts and breathwork, it enables developers to create natural, engaging voice experiences for content, agents, and entertainment applications.

Voices: 300+ (35 flagship options)
Languages: 5 (EN, ES, FR, DE, plus bilingual)
Code-switching: mid-sentence language mixing
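
The flagship count can be cross-checked against the per-language breakdown given in the model card (18 English, 4 Spanish, 3 bilingual English/Spanish, 5 French, 5 German). The Python sketch below is only an illustration: the dictionary labels are hypothetical and are not Rime voice identifiers, and treating a bilingual voice as usable for both of its languages is an assumption.

```python
# Flagship voice counts per language group, taken from the model card breakdown.
# The keys are illustrative labels, not identifiers from any Rime API.
FLAGSHIP_VOICES: dict[str, int] = {
    "English": 18,
    "Spanish": 4,
    "Bilingual English/Spanish": 3,
    "French": 5,
    "German": 5,
}


def total_flagship(counts: dict[str, int]) -> int:
    """Sum flagship voices across all language groups."""
    return sum(counts.values())


def voices_covering(counts: dict[str, int], language: str) -> int:
    """Count flagship voices usable for `language`.

    Assumption: a bilingual group covers every language named in its label.
    """
    return sum(n for group, n in counts.items() if language in group)


print(total_flagship(FLAGSHIP_VOICES))              # 35
print(voices_covering(FLAGSHIP_VOICES, "Spanish"))  # 7 (4 Spanish + 3 bilingual)
```

Under that assumption, an English/Spanish application would have 21 flagship voices available for English and 7 for Spanish.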

Key model capabilities
  • Extensive Voice Library: 300+ voices with diverse accents, ages, and styles
  • Multi-Lingual Code-Switching: Seamless mixing of English with Spanish (Spanglish), French (Franglais), or German (Denglish)
  • Expressive Speech: False starts, breathwork, vocal nuances
  • 5 Language Options: English, Spanish, French, German, and bilingual combinations
  • Model card

    Architecture Overview:
    • Autoregressive TTS model with discrete audio tokenization and high-resolution codec.
    • Large language model backbone trained on text and conversational audio data.
    • 300+ voice library, including 35 flagship voices: 18 English, 4 Spanish, 3 bilingual English/Spanish, 5 French, and 5 German.
    • Multi-lingual code-switching enables seamless mid-sentence language transitions.
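
    To illustrate what autoregressive generation over discrete audio tokens means, here is a toy Python sketch. It is not Arcana's implementation: `toy_backbone` is a hypothetical stand-in for the language-model backbone, `toy_codec_decode` is a hypothetical stand-in for the codec decoder, and the vocabulary size, end-of-audio token, and greedy decoding rule are assumptions chosen for readability.

```python
from typing import Callable, List, Sequence

EOS = 0          # hypothetical end-of-audio token id
VOCAB_SIZE = 8   # toy codec vocabulary; real codecs use much larger codebooks

# A backbone maps (text tokens, audio tokens so far) to next-token probabilities.
NextTokenFn = Callable[[Sequence[int], Sequence[int]], List[float]]


def generate_audio_tokens(text_tokens: Sequence[int],
                          next_token_probs: NextTokenFn,
                          max_tokens: int = 100) -> List[int]:
    """Greedy autoregressive decoding: each new audio token is conditioned on
    the text and on every audio token generated so far."""
    audio_tokens: List[int] = []
    for _ in range(max_tokens):
        probs = next_token_probs(text_tokens, audio_tokens)
        token = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if token == EOS:
            break
        audio_tokens.append(token)
    return audio_tokens


def toy_backbone(text_tokens: Sequence[int], audio_tokens: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for the LLM backbone: emits two audio tokens per
    text token, then puts all probability on EOS."""
    probs = [0.0] * VOCAB_SIZE
    if len(audio_tokens) >= 2 * len(text_tokens):
        probs[EOS] = 1.0
    else:
        # Deterministic toy rule: derive the token from its source text token.
        src = text_tokens[len(audio_tokens) // 2]
        probs[1 + src % (VOCAB_SIZE - 1)] = 1.0
    return probs


def toy_codec_decode(audio_tokens: Sequence[int], samples_per_token: int = 4) -> List[float]:
    """Hypothetical stand-in for the codec decoder: expands each token into a
    few waveform samples."""
    return [t / VOCAB_SIZE for t in audio_tokens for _ in range(samples_per_token)]


tokens = generate_audio_tokens([3, 9, 14], toy_backbone)
print(tokens)                         # [4, 4, 3, 3, 1, 1]
print(len(toy_codec_decode(tokens)))  # 24
```

    A real system would usually sample from the predicted distribution rather than take the argmax; the point of the sketch is the loop structure, in which every audio token depends on all tokens generated before it.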

    Training Methodology:
    • Three-stage training: pre-training, conversational fine-tuning, speaker optimization.
    • Trained on large-scale conversational speech with sociolinguistic annotations.
    • Captures paralinguistic features: false starts, breathwork, glottal stops, vocal fry.
    • Multi-lingual training for code-switching between English, Spanish, French, and German.

    Performance Characteristics:
    • 300+ voices with extensive accent, age, and stylistic diversity for varied applications.
    • Paralinguistic features (false starts, breathwork, pauses) create expressive, natural speech.
    • Multi-lingual code-switching supports Spanglish, Franglais, Denglish without interruption.
    • Faster-than-real-time synthesis with natural rhythm and emotional range.
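
    Faster-than-real-time synthesis is commonly quantified with the real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced, so an RTF below 1 means audio is generated faster than it plays back. A minimal Python sketch follows; the numbers (10 s of 24 kHz audio synthesized in 2.5 s) are hypothetical and are not measured Arcana v2 figures.

```python
def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate_hz: int) -> float:
    """RTF = wall-clock synthesis time / duration of the generated audio."""
    if num_samples <= 0 or sample_rate_hz <= 0:
        raise ValueError("num_samples and sample_rate_hz must be positive")
    audio_seconds = num_samples / sample_rate_hz
    return synthesis_seconds / audio_seconds


def is_faster_than_real_time(rtf: float) -> bool:
    """Audio is produced faster than playback when RTF < 1."""
    return rtf < 1.0


# Hypothetical measurement: 240,000 samples at 24 kHz (10 s) produced in 2.5 s.
rtf = real_time_factor(2.5, 240_000, 24_000)
print(rtf)                            # 0.25
print(is_faster_than_real_time(rtf))  # True
```

    In practice the synthesis time would come from timing an actual request, and for streaming use the time to first audio chunk matters as much as the overall RTF.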

  • Applications & use cases

    Content Production:
    • Audiobook generation with expressive narration and character voices.
    • Podcast creation with natural conversational delivery and multiple speakers.
    • E-learning course voiceovers with clear, engaging presentation.
    • YouTube video narration and explainer content.

    Conversational AI:
    • Voice agents requiring expressive speech and emotional range.
    • Customer service bots with natural personality and varied voice options.
    • Interactive storytelling and narrative experiences.

    Media & Entertainment:
    • Character voices for games, animations, and interactive fiction.
    • Voice acting for indie game development and virtual productions.
    • Voiceover for commercials, trailers, and promotional content.

    Multi-Lingual Applications:
    • Bilingual content creation with code-switching (Spanglish, Franglais, Denglish).
    • Language learning apps with authentic pronunciation and natural speech.
    • International content localization with native-sounding voices.

    Accessibility:
    • Screen readers with high-quality, natural voice output.
    • Text-to-speech for visually impaired users with expressive delivery.
    • Assistive technology requiring diverse voice options and languages.

Model details
  • Model provider
    Rime
  • Type
    Audio
  • Main use cases
    Text-to-Speech
  • Deployment
    On-Demand Dedicated
    Monthly Reserved
  • Input modalities
    Text
  • Output modalities
    Audio