Deepgram Aura-2
Professional text-to-speech for real-time voice applications

About model
Aura-2 is Deepgram's text-to-speech model, purpose-built for real-time voice applications in business environments. It delivers low latency, 40+ professional voices, and context-aware pronunciation across 7 languages, and it runs on the same infrastructure as Nova-3 STT (speech-to-text) for unified voice AI deployment.
Low latency: ultra-responsive for real-time voice agents
40+ voices: business-optimized voices across 7 languages
Context-aware: domain-specific handling for medical, legal, and financial terms
- Low Latency: Low time-to-first-byte latency, even across thousands of concurrent requests
- Professional Voices: 40+ voices optimized for business contexts across 7 languages
- Context-Aware Delivery: Intelligent pacing, pauses, and pronunciation based on content type
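Time-to-first-byte (TTFB) is the delay between issuing a synthesis request and receiving the first chunk of audio. Below is a minimal Python sketch for measuring it on the client side. It assumes the audio arrives as an iterable of byte chunks, such as a streamed HTTP response body. `measure_ttfb` and `simulated_stream` are hypothetical helpers, not part of any Deepgram SDK, and the demo stream is simulated rather than a real Aura-2 response.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_ttfb(chunks: Iterable[bytes]) -> Tuple[Optional[float], int]:
    """Return (seconds until the first non-empty chunk, total bytes).

    The clock starts when this function is called, so pass a lazily
    started stream (e.g. a generator that sends the request on first
    iteration) if you want the request round trip included.
    Returns (None, 0) for a stream that produced no audio.
    """
    start = time.perf_counter()
    ttfb: Optional[float] = None
    total = 0
    for chunk in chunks:
        # Record the elapsed time only once, at the first non-empty chunk.
        if chunk and ttfb is None:
            ttfb = time.perf_counter() - start
        total += len(chunk)
    return ttfb, total


def simulated_stream(delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stand-in for a streamed TTS response."""
    time.sleep(delay_s)  # pretend server-side latency before the first chunk
    for _ in range(3):
        yield b"\x00" * 320  # placeholder audio bytes, not real audio


if __name__ == "__main__":
    ttfb, total = measure_ttfb(simulated_stream())
    print(f"TTFB: {ttfb * 1000:.1f} ms, {total} bytes")
```

Measuring TTFB separately from total transfer time matters for voice agents, because playback can begin as soon as the first chunk arrives.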
Model card
Architecture Overview:
• Text-to-speech architecture optimized for real-time voice applications
• 40+ professional voice personas purpose-built for business contexts
• Context-aware delivery adjusting pacing, pauses, tone, and expression based on content
• Powered by Deepgram Enterprise Runtime for model hot-swapping and flexible deployment
Training Methodology:
• Trained on conversational data across healthcare, customer support, and financial services
• Each voice developed with defined tonal profile aligned to business use cases
• Optimized for domain-specific pronunciation including medical terms, legal references, and alphanumerics
• Multi-language training across 7 languages with native phonological characteristics
Performance Characteristics:
• Low time-to-first-byte latency for real-time interactions
• Supports thousands of concurrent requests while maintaining low latency
• Context-aware pronunciation for dates, times, currency, phone numbers, and structured data
• Uniform volume and articulation throughout extended conversations
• Support for 7 languages (English, Spanish, Dutch, French, German, Italian, and Japanese), with multiple accents
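Server-side capacity aside, client applications that fan out many synthesis requests usually cap how many are in flight at once. Below is a sketch using `asyncio.Semaphore`. It assumes you already have an async `synthesize(text) -> bytes` coroutine for your deployment. `synthesize_all` and `max_in_flight` are hypothetical names, and the right limit depends on your plan or dedicated capacity, which this page does not specify.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def synthesize_all(
    texts: Sequence[str],
    synthesize: Callable[[str], Awaitable[bytes]],
    max_in_flight: int = 100,  # hypothetical default; tune for your deployment
) -> List[bytes]:
    """Run `synthesize` on every text with at most `max_in_flight` calls active.

    Results come back in the same order as `texts`.
    """
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded(text: str) -> bytes:
        # Wait for a free slot before starting the request.
        async with semaphore:
            return await synthesize(text)

    # gather preserves input order in its result list.
    return list(await asyncio.gather(*(bounded(t) for t in texts)))


if __name__ == "__main__":
    async def fake_synthesize(text: str) -> bytes:
        # Placeholder for a real TTS call.
        await asyncio.sleep(0.01)
        return text.encode("utf-8")

    results = asyncio.run(
        synthesize_all([f"prompt {i}" for i in range(20)], fake_synthesize, max_in_flight=5)
    )
    print(len(results), "clips synthesized")
```

A semaphore keeps the code simple while still letting requests overlap. A fixed-size worker pool reading from an `asyncio.Queue` is the usual alternative when prompts arrive continuously rather than as a batch.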
Applications & use cases
Voice Agent Applications:
• Customer support automation with professional voice delivery
• Virtual assistants for healthcare, banking, and insurance with domain-specific pronunciation
• Interactive voice response systems with context-aware pacing and tone
• Automated appointment scheduling and reminder systems
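For reminder or IVR prompts, a single request that turns a text string into audio is often enough. Below is a minimal standard-library sketch. It assumes Deepgram's REST text-to-speech endpoint `POST /v1/speak`, with the voice selected by a `model` query parameter and authentication via a `Token` header. `aura-2-thalia-en` is used as an example voice ID. `build_speak_request` and its `base_url` parameter (e.g. for a dedicated or self-hosted deployment) are hypothetical. Check all of these details against Deepgram's API reference before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumption: public Deepgram API host; a dedicated/on-prem host would differ.
DEFAULT_BASE_URL = "https://api.deepgram.com"


def build_speak_request(
    text: str,
    api_key: str,
    model: str = "aura-2-thalia-en",  # example voice ID; assumption
    base_url: str = DEFAULT_BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech POST request."""
    query = urllib.parse.urlencode({"model": model})
    url = f"{base_url.rstrip('/')}/v1/speak?{query}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_speak_request("Reminder: your appointment is tomorrow at 9 AM.", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    # To actually send it (requires a valid key and network access):
    # with urllib.request.urlopen(req) as resp:
    #     audio = resp.read()
```

Separating request construction from sending keeps the function easy to test offline and lets the same code target a different host by changing only `base_url`.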
Business Communications:
• Internal notification systems with consistent professional voice
• Multilingual customer communications for global deployment
• Accessibility solutions for visually impaired users
• Training and e-learning platforms with clear voice narration
Regulated Industries:
• Medical applications with accurate pronunciation of drug names and medical terminology
• Legal services with proper handling of legal references and case citations
• Financial services for automated reporting and compliance communications
• Government applications requiring on-premises deployment for security
- Model provider: Deepgram
- Type: Audio
- Deployment: On-Demand Dedicated
- Price: $30/1M characters + GPU hourly / 1M characters
- Input modalities: Text
- Output modalities: Audio
- Category: Audio
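The per-character part of the listed price scales linearly: cost = characters × $30 / 1,000,000. For example, 250,000 characters come to $7.50. Below is a sketch of that arithmetic. It ignores the GPU hourly component entirely. It also assumes billing counts input characters the way Python's `len` does (Unicode code points), and actual metering may differ. `estimate_character_cost` is a hypothetical helper.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Iterable

# Listed per-character rate; the GPU hourly component is NOT included.
PRICE_PER_MILLION_CHARS = Decimal("30")


def estimate_character_cost(texts: Iterable[str]) -> Decimal:
    """Estimate the per-character charge in dollars, rounded to cents.

    Assumption: billable characters equal len(text) (Unicode code points).
    """
    chars = sum(len(t) for t in texts)
    cost = Decimal(chars) * PRICE_PER_MILLION_CHARS / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    prompts = ["Your appointment is tomorrow at 9 AM."] * 10_000
    print(f"${estimate_character_cost(prompts)}")
```

`Decimal` is used instead of `float` so that currency amounts round predictably to whole cents.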