Arcana V3 Turbo
Real-time bilingual TTS with native code-switching for production voice agents.
About model
Rime Arcana V3 supports 11 languages with native code-switching that preserves prosody and accent consistency across language boundaries. When a customer switches from French to English for technical terms, then back to French for clarification, V3 handles the transitions while maintaining natural cadence and emphasis, so the conversation doesn't sound stitched together. Teams can consolidate what used to require separate models or vendors per language into a single endpoint that serves multilingual customers, deployed alongside LLM and STT workloads on Together AI's unified infrastructure.
- 11 languages: Native code-switching support
- 1 endpoint: Consolidates multilingual infrastructure
- 99.9%: Production-ready infrastructure
- Multilingual Code-Switching: 11 languages with natural transitions preserving prosody and accent consistency across language boundaries
- Single Model Deployment: Consolidate infrastructure that previously required separate models or vendors per language into a unified endpoint
- Natural Prosody: Transitions between languages preserve cadence and emphasis rather than sounding mechanical or stitched together
- Unified Infrastructure: Co-located with LLM and STT on Together AI; track performance across all languages from a single dashboard
Model card
Architecture Overview:
• High-performance TTS optimized for real-time bilingual voice agents
• Native code-switching trained on bilingual speech patterns, preserving prosody across language boundaries
• ~120ms time-to-first-audio on Together AI dedicated endpoints
• Efficient concurrency enabling higher GPU utilization for high-volume deployments
• WebSocket streaming support for real-time voice applications
• Co-located with LLM and STT workloads on unified infrastructure
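The time-to-first-audio figure above can be checked against any streaming source. The following is a minimal sketch, assuming audio arrives as an async stream of byte chunks; `fake_tts_stream` is a hypothetical stand-in that only simulates latency, and none of the names here come from the Rime or Together AI APIs.

```python
import asyncio
import time
from typing import AsyncIterator, Tuple


async def fake_tts_stream(first_chunk_delay_s: float, n_chunks: int) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response; not the real API."""
    await asyncio.sleep(first_chunk_delay_s)  # simulated synthesis latency
    for _ in range(n_chunks):
        yield b"\x00" * 320  # placeholder audio bytes
        await asyncio.sleep(0)  # hand control back to the event loop


async def measure_ttfa(stream: AsyncIterator[bytes]) -> Tuple[float, int]:
    """Return (seconds until the first non-empty chunk, total bytes received)."""
    start = time.perf_counter()
    ttfa = None
    total = 0
    async for chunk in stream:
        if ttfa is None and chunk:
            ttfa = time.perf_counter() - start
        total += len(chunk)
    if ttfa is None:
        raise RuntimeError("stream ended before any audio arrived")
    return ttfa, total


if __name__ == "__main__":
    ttfa, total = asyncio.run(measure_ttfa(fake_tts_stream(0.12, 5)))
    print(f"time-to-first-audio: {ttfa * 1000:.0f} ms, {total} bytes")
```

To measure a real deployment, the simulated generator would be replaced with whatever async iterator the chosen client yields for incoming audio frames; the timing logic stays the same.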
Training Methodology:
• Trained on native bilingual speech patterns including pause placement and stress shifts at language boundaries
• Prosody optimization for English-Spanish code-switching within sentences
• Performance tuning for sub-200ms time-to-first-audio in production environments
• Emphasis and cadence modeling matching natural bilingual speaker behavior
Key Capabilities:
• English-Spanish Code-Switching: Native transitions with consistent prosody when callers switch languages mid-sentence
• Real-Time Performance: ~120ms time-to-first-audio leaves headroom for full voice pipeline processing
• Production Efficiency: Higher concurrency per GPU reduces infrastructure costs for high-volume deployments
• Natural Prosody: Pauses and emphasis match how bilingual speakers actually talk, rather than switching languages mechanically
• Infrastructure Integration: Same API, authentication, and observability as LLM and STT endpoints
Applications & use cases
Bilingual Voice Agents:
• Customer service agents handling English-Spanish code-switching in real-time conversations
• Contact centers in bilingual metro markets where callers naturally mix languages
• Voice assistants for bilingual communities maintaining natural speech patterns
• Automated phone systems responding to code-switched queries without latency spikes
• Reduces transfers to human agents by handling natural language mixing
Regulated Services in Bilingual Markets:
• Banking and financial services serving bilingual customer bases
• Healthcare providers handling mixed-language symptom descriptions
• Government services in bilingual jurisdictions maintaining accessibility
• Insurance claims processing with natural code-switching support
• Single compliance review covering LLM, STT, and TTS on unified infrastructure
High-Volume Contact Centers:
• Enterprise contact centers handling thousands of concurrent bilingual calls
• Customer support for brands serving English-Spanish markets at scale
• Appointment scheduling and confirmation systems in bilingual regions
• Order management and tracking for bilingual customer bases
• Efficient GPU utilization reducing total cost of ownership at production scale
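The link between per-GPU concurrency and infrastructure cost is simple arithmetic. Here is a rough sizing sketch; the per-GPU stream counts are hypothetical placeholders, since the text above gives no concurrency figures, and `gpus_needed` is an illustrative helper rather than part of any SDK.

```python
import math


def gpus_needed(concurrent_calls: int, streams_per_gpu: int) -> int:
    """GPUs required to serve a peak number of concurrent TTS streams."""
    if concurrent_calls < 0:
        raise ValueError("concurrent_calls must be non-negative")
    if streams_per_gpu <= 0:
        raise ValueError("streams_per_gpu must be positive")
    return math.ceil(concurrent_calls / streams_per_gpu)


# Hypothetical figures: 2,000 peak calls at two assumed per-GPU concurrency levels.
baseline = gpus_needed(2000, streams_per_gpu=50)
improved = gpus_needed(2000, streams_per_gpu=100)
print(baseline, improved)  # 40 20
```

Under these assumed numbers, doubling the streams each GPU can carry halves the fleet size for the same call volume, which is the effect the bullet above describes.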
Real-Time Voice Applications:
• Voice agents requiring sub-700ms end-to-end latency
• Interactive voice response (IVR) systems with natural bilingual flow
• Voice assistants co-located with LLM reasoning and STT processing
• Conversational AI maintaining natural cadence across language switches
• WebSocket-based streaming for low-latency voice synthesis
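To see how the ~120ms time-to-first-audio fits inside a 700ms end-to-end target, a short budget calculation helps. This sketch uses the two figures from this page; the STT, LLM, and network latencies are hypothetical values chosen only for illustration, and the pipeline is assumed to run its stages sequentially.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: float


def remaining_budget_ms(budget_ms: float, stages: Iterable[Stage]) -> float:
    """Budget left after summing the sequential stage latencies."""
    return budget_ms - sum(s.latency_ms for s in stages)


# Figures from the text: ~120 ms TTS time-to-first-audio, 700 ms end-to-end target.
TTS = Stage("tts_first_audio", 120.0)
BUDGET_MS = 700.0

# Headroom left for STT, LLM, and network once TTS is accounted for.
headroom = remaining_budget_ms(BUDGET_MS, [TTS])
print(headroom)  # 580.0

# Hypothetical values for the other stages, for illustration only.
pipeline = [
    Stage("stt_final", 150.0),
    Stage("llm_first_token", 300.0),
    Stage("network", 50.0),
    TTS,
]
print(remaining_budget_ms(BUDGET_MS, pipeline))  # 80.0
```

A negative result would mean the assumed stages already exceed the target, so one of them would need to be faster or overlapped with another.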
Multilingual Business Operations:
• International business operations in US Hispanic markets
• Cross-border commerce serving English and Spanish speakers
• Tourism and hospitality voice agents in bilingual destinations
• Educational platforms with bilingual voice instruction
• Technical support handling code-switched terminology
- Model provider: Rime
- Type: Audio
- Main use cases: Text-to-Speech
- Deployment: On-Demand Dedicated, Monthly Reserved
- Input modalities: Text
- Output modalities: Audio
- Released: February 3, 2026
- Category: Audio