Deepgram Aura-2

Professional text-to-speech for real-time voice applications

About model

Aura-2 is Deepgram's text-to-speech model, purpose-built for real-time voice applications in business environments. It offers low latency, 40+ professional voices, and context-aware pronunciation across 7 languages, and it runs on the same infrastructure as Nova-3 STT for unified voice AI deployment.

TTFB Latency: Low (ultra-responsive for real-time voice agents)
Professional Voices: 40+ (business-optimized voices across 7 languages)
Pronunciation: Context-Aware (domain-specific handling for medical, legal, and financial terms)

Model key capabilities

Low Latency: Low time-to-first-byte (TTFB) latency, maintained across thousands of concurrent requests
Professional Voices: 40+ voices optimized for business contexts across 7 languages
Context-Aware Delivery: Intelligent pacing, pauses, and pronunciation based on content type

Model card
Architecture Overview:
• Text-to-speech architecture optimized for real-time voice applications
• 40+ professional voice personas purpose-built for business contexts
• Context-aware delivery adjusting pacing, pauses, tone, and expression based on content
• Powered by Deepgram Enterprise Runtime for model hot-swapping and flexible deployment

Training Methodology:
• Trained on conversational data across healthcare, customer support, and financial services
• Each voice developed with a defined tonal profile aligned to business use cases
• Optimized for domain-specific pronunciation including medical terms, legal references, and alphanumerics
• Multi-language training across 7 languages with native phonological characteristics

Performance Characteristics:
• Low time-to-first-byte latency for real-time interactions
• Supports thousands of concurrent requests while maintaining low latency
• Context-aware pronunciation for dates, times, currency, phone numbers, and structured data
• Uniform volume and articulation throughout extended conversations
• 7-language support: English, Spanish, Dutch, French, German, Italian, and Japanese, with multiple accents
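
The time-to-first-byte figure referred to above can be checked empirically against any streaming deployment. The following is a minimal sketch, not Deepgram tooling: `measure_ttfb` and `fake_tts_stream` are hypothetical names, and the fake stream is only a stand-in for whatever chunked audio response your own endpoint returns.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_ttfb(chunks: Iterable[bytes]) -> Tuple[float, float, int]:
    """Consume a stream of audio chunks and time it.

    Returns (ttfb_seconds, total_seconds, total_bytes). TTFB is the delay
    between starting to read and receiving the first non-empty chunk.
    """
    start = time.perf_counter()
    ttfb = None
    total_bytes = 0
    for chunk in chunks:
        # Record the arrival time of the first chunk that carries data.
        if chunk and ttfb is None:
            ttfb = time.perf_counter() - start
        total_bytes += len(chunk)
    total = time.perf_counter() - start
    if ttfb is None:
        raise ValueError("stream produced no audio bytes")
    return ttfb, total, total_bytes


def fake_tts_stream(first_delay: float = 0.05, n_chunks: int = 4) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response (not a real API)."""
    time.sleep(first_delay)      # simulated synthesis delay before first audio
    for _ in range(n_chunks):
        yield b"\x00" * 320      # simulated chunk of raw audio bytes


if __name__ == "__main__":
    ttfb, total, size = measure_ttfb(fake_tts_stream())
    print(f"TTFB {ttfb * 1000:.1f} ms, total {total * 1000:.1f} ms, {size} bytes")
```

In practice, the iterable passed to `measure_ttfb` would be the chunk iterator of a real streaming HTTP or WebSocket response; measuring from the client side this way includes network time as well as synthesis time.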
Applications & use cases
Voice Agent Applications:
• Customer support automation with professional voice delivery
• Virtual assistants for healthcare, banking, and insurance with domain-specific pronunciation
• Interactive voice response systems with context-aware pacing and tone
• Automated appointment scheduling and reminder systems

Business Communications:
• Internal notification systems with consistent professional voice
• Multilingual customer communications for global deployment
• Accessibility solutions for visually impaired users
• Training and e-learning platforms with clear voice narration

Regulated Industries:
• Medical applications with accurate pronunciation of drug names and medical terminology
• Legal services with proper handling of legal references and case citations
• Financial services for automated reporting and compliance communications
• Government applications requiring on-premises deployment for security

Model specifications

Model data

Model provider: Deepgram
Type: Audio
Deployment: Dedicated
Price: $30 / 1M characters + GPU hourly
Input modalities: Text
Output modalities: Audio
Category: Audio
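
As a rough illustration of the listed price, the sketch below estimates cost from a character count. It assumes the two price components simply add up; the GPU hourly rate is not stated on this page, so `gpu_hourly_rate` is a placeholder you would need to supply, and `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Listed price: $30 per 1M input characters (GPU hourly charge is separate).
PRICE_PER_MILLION_CHARS = Decimal("30")


def estimate_cost(
    text_chars: int,
    gpu_hours: Decimal = Decimal("0"),
    gpu_hourly_rate: Decimal = Decimal("0"),  # placeholder: rate not given in the source
) -> Decimal:
    """Estimate cost in USD as character charge plus GPU time charge."""
    if text_chars < 0:
        raise ValueError("text_chars must be non-negative")
    char_cost = PRICE_PER_MILLION_CHARS * text_chars / Decimal(1_000_000)
    return char_cost + gpu_hours * gpu_hourly_rate


if __name__ == "__main__":
    # 250,000 characters, character charge only.
    print(estimate_cost(250_000))
```

For example, 250,000 characters come to 30 × 0.25 = $7.50 in character charges, before any GPU time is added.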
