Arcana v2

Expressive text-to-speech with an extensive voice library and multi-lingual support

About model

Arcana v2 is an expressive text-to-speech model with 300+ voices, 35 of them flagship, across English, Spanish, French, German, and bilingual combinations. It supports multi-lingual code-switching and paralinguistic features such as false starts and breathwork, so developers can build natural, engaging voice experiences for content, agents, and entertainment applications.

Voices: 300+ (35 flagship options)
Languages: EN, ES, FR, DE + bilingual
Code-switching: Mid-sentence language mixing

Model key capabilities

Extensive Voice Library: 300+ voices with diverse accents, ages, and styles
Multi-Lingual Code-Switching: Seamless Spanglish, Franglais, Denglish
Expressive Speech: False starts, breathwork, vocal nuances
Languages: English, Spanish, French, and German, plus bilingual combinations

Quickstart guides

Open NotebookLM: PDF to Podcast

Model card
Architecture Overview:
• Autoregressive TTS model with discrete audio tokenization and high-resolution codec.
• Large language model backbone trained on text and conversational audio data.
• 300+ voice library, including 35 flagship voices: 18 English, 4 Spanish, 3 bilingual English/Spanish, 5 French, 5 German.
• Multi-lingual code-switching enables seamless mid-sentence language transitions.
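
The flagship breakdown above can be tallied with a short sketch. The counts come from the model card. The dictionary layout and the function name are illustrative only and do not reflect any provider API.

```python
# Flagship voice counts by language, as listed in the model card.
# The structure and names here are hypothetical, chosen for illustration.
FLAGSHIP_VOICES = {
    "English": 18,
    "Spanish": 4,
    "English/Spanish (bilingual)": 3,
    "French": 5,
    "German": 5,
}


def total_flagship(voices: dict[str, int]) -> int:
    """Sum the per-language flagship voice counts."""
    return sum(voices.values())


if __name__ == "__main__":
    for language, count in FLAGSHIP_VOICES.items():
        print(f"{language}: {count}")
    print("Total flagship voices:", total_flagship(FLAGSHIP_VOICES))
```

The total works out to 35, matching the "35 flagship options" figure quoted for the model.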

Training Methodology:
• Three-stage training: pre-training, conversational fine-tuning, speaker optimization.
• Trained on large-scale conversational speech with sociolinguistic annotations.
• Captures paralinguistic features: false starts, breathwork, glottal stops, vocal fry.
• Multi-lingual training for code-switching between English, Spanish, French, German.

Performance Characteristics:
• 300+ voices with extensive accent, age, and stylistic diversity for varied applications.
• Paralinguistic features (false starts, breathwork, pauses) create expressive, natural speech.
• Multi-lingual code-switching supports Spanglish, Franglais, Denglish without interruption.
• Faster-than-real-time synthesis with natural rhythm and emotional range.
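
"Faster-than-real-time" is commonly expressed as a real-time factor (RTF): synthesis time divided by the duration of the audio produced, with values below 1.0 meaning the audio is generated faster than it plays back. The sketch below shows that calculation. The function names and the sample timings are hypothetical and are not measured Arcana v2 figures.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if synthesis_seconds < 0:
        raise ValueError("synthesis_seconds must be non-negative")
    return synthesis_seconds / audio_seconds


def is_faster_than_real_time(synthesis_seconds: float, audio_seconds: float) -> bool:
    """True when audio is produced faster than it would take to play back."""
    return real_time_factor(synthesis_seconds, audio_seconds) < 1.0


if __name__ == "__main__":
    # Illustrative numbers only: 2 s of compute for 10 s of audio.
    rtf = real_time_factor(synthesis_seconds=2.0, audio_seconds=10.0)
    print(f"RTF = {rtf:.2f}")
    print("Faster than real time:", is_faster_than_real_time(2.0, 10.0))
```

To apply this to a real deployment, time the synthesis call yourself and read the audio duration from the returned file.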
Applications & use cases
Content Production:
• Audiobook generation with expressive narration and character voices.
• Podcast creation with natural conversational delivery and multiple speakers.
• E-learning course voiceovers with clear, engaging presentation.
• YouTube video narration and explainer content.

Conversational AI:
• Voice agents requiring expressive speech and emotional range.
• Customer service bots with natural personality and varied voice options.
• Interactive storytelling and narrative experiences.

Media & Entertainment:
• Character voices for games, animations, and interactive fiction.
• Voice acting for indie game development and virtual productions.
• Voiceover for commercials, trailers, and promotional content.

Multi-Lingual Applications:
• Bilingual content creation with code-switching (Spanglish, Franglais, Denglish).
• Language learning apps with authentic pronunciation and natural speech.
• International content localization with native-sounding voices.

Accessibility:
• Screen readers with high-quality, natural voice output.
• Text-to-speech for visually impaired users with expressive delivery.
• Assistive technology requiring diverse voice options and languages.

Model specifications

Model data

Model provider: Rime
Type: Audio
Main use cases: Text-to-Speech
Deployment: Dedicated
Price: $10 / 1M characters + GPU hourly (by hardware)
Input modalities: Text
Output modalities: Audio

Released: August 19, 2025
External link: Provider docs
Category: Audio
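
The listed price has two parts: a per-character charge and an hourly GPU charge that depends on hardware. A rough cost estimate can be sketched as below. It assumes the two parts simply add together. The GPU hourly rate is not stated here, so the rate in the example is a placeholder, and the function name is hypothetical.

```python
PRICE_PER_MILLION_CHARS_USD = 10.0  # from the listed price: $10 / 1M characters


def estimate_cost_usd(num_chars: int, gpu_hours: float, gpu_hourly_rate_usd: float) -> float:
    """Estimate total cost as character charge plus GPU time charge.

    Assumption: the two components add linearly. The GPU rate varies by
    hardware and must be supplied by the caller.
    """
    if num_chars < 0 or gpu_hours < 0 or gpu_hourly_rate_usd < 0:
        raise ValueError("inputs must be non-negative")
    char_cost = num_chars / 1_000_000 * PRICE_PER_MILLION_CHARS_USD
    gpu_cost = gpu_hours * gpu_hourly_rate_usd
    return char_cost + gpu_cost


if __name__ == "__main__":
    # Placeholder GPU rate of $3.00/hour, for illustration only.
    total = estimate_cost_usd(num_chars=500_000, gpu_hours=2.0, gpu_hourly_rate_usd=3.0)
    print(f"Estimated cost: ${total:.2f}")
```

With these placeholder inputs, the character charge is $5.00 and the GPU charge is $6.00, for $11.00 in total. Substitute the actual hourly rate for the chosen hardware before relying on the result.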
