Deepgram Aura-2

Professional text-to-speech for real-time voice applications

About model

Aura-2 is Deepgram's text-to-speech model, purpose-built for real-time voice applications in business environments. It delivers low latency, 40+ professional voices, and context-aware pronunciation across 7 languages, and it runs on the same infrastructure as Nova-3 STT for unified voice AI deployment.
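As a rough illustration of how a text-to-speech request to Aura-2 might look, here is a minimal sketch using only the Python standard library. It assumes Deepgram's hosted REST endpoint (`/v1/speak`), token-based authorization, and the voice name `aura-2-thalia-en`; none of these appear on this page, and a dedicated deployment may expose a different URL, so treat them as placeholders to check against your provider's documentation. The helper names `build_speak_request` and `synthesize_to_file` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Deepgram's hosted TTS endpoint. A dedicated deployment
# may use a different base URL and authentication scheme.
SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_request(text, api_key, model="aura-2-thalia-en"):
    """Build (but do not send) a POST request asking for synthesized speech.

    `model` selects the voice; the default is an assumed Aura-2 voice name.
    """
    query = urllib.parse.urlencode({"model": model})
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{SPEAK_URL}?{query}",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, api_key, path, model="aura-2-thalia-en"):
    """Send the request and stream the returned audio bytes to `path`."""
    request = build_speak_request(text, api_key, model)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        # Write chunks as they arrive instead of buffering the whole
        # response; this is where low time-to-first-byte matters.
        while chunk := response.read(4096):
            out.write(chunk)
```

Reading the response in chunks lets a voice agent start playback before synthesis has finished, which is how a low TTFB translates into perceived responsiveness.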

TTFB Latency

Low

Ultra-responsive for real-time voice agents

Professional Voices

40+

Business-optimized voices across 7 languages

Pronunciation

Context-Aware

Domain-specific handling for medical, legal, financial terms

Model key capabilities
  • Low Latency: Low time-to-first-byte (TTFB) latency, maintained across thousands of concurrent requests
  • Professional Voices: 40+ voices optimized for business contexts across 7 languages
  • Context-Aware Delivery: Intelligent pacing, pauses, and pronunciation based on content type
  • Model card

    Architecture Overview:
    • Text-to-speech architecture optimized for real-time voice applications
    • 40+ professional voice personas purpose-built for business contexts
    • Context-aware delivery adjusting pacing, pauses, tone, and expression based on content
    • Powered by Deepgram Enterprise Runtime for model hot-swapping and flexible deployment

    Training Methodology:
    • Trained on conversational data across healthcare, customer support, and financial services
    • Each voice developed with a defined tonal profile aligned to business use cases
    • Optimized for domain-specific pronunciation including medical terms, legal references, and alphanumerics
    • Multi-language training across 7 languages with native phonological characteristics

    Performance Characteristics:
    • Low time-to-first-byte latency for real-time interactions
    • Supports thousands of concurrent requests while maintaining low latency
    • Context-aware pronunciation for dates, times, currency, phone numbers, and structured data
    • Uniform volume and articulation throughout extended conversations
    • 7-language support: English, Spanish, Dutch, French, German, Italian, and Japanese, with multiple accents

  • Applications & use cases

    Voice Agent Applications:
    • Customer support automation with professional voice delivery
    • Virtual assistants for healthcare, banking, and insurance with domain-specific pronunciation
    • Interactive voice response systems with context-aware pacing and tone
    • Automated appointment scheduling and reminder systems

    Business Communications:
    • Internal notification systems with consistent professional voice
    • Multilingual customer communications for global deployment
    • Accessibility solutions for visually impaired users
    • Training and e-learning platforms with clear voice narration

    Regulated Industries:
    • Medical applications with accurate pronunciation of drug names and medical terminology
    • Legal services with proper handling of legal references and case citations
    • Financial services for automated reporting and compliance communications
    • Government applications requiring on-premises deployment for security

Model details
  • Model provider
    Deepgram
  • Type
    Audio
  • Deployment
    On-Demand Dedicated
  • Price

    $30 per 1M characters + hourly GPU cost

  • Input modalities
    Text
  • Output modalities
    Audio
  • Category
    Audio
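
The listed price combines a per-character charge ($30 per 1M characters) with an hourly GPU charge for the dedicated deployment. The sketch below shows one way to estimate a bill from those two components. The GPU hourly rate is not stated on this page, so `gpu_hourly_rate` is a hypothetical parameter you would need to supply, and the assumption that the two charges simply add is mine.

```python
from decimal import Decimal

# From the listing: $30 per 1,000,000 input characters.
PRICE_PER_MILLION_CHARS = Decimal("30")


def estimate_cost(characters, gpu_hours=0, gpu_hourly_rate=Decimal("0")):
    """Estimate total cost in USD.

    characters      -- number of text characters sent for synthesis
    gpu_hours       -- hours the dedicated GPU deployment was running
    gpu_hourly_rate -- hypothetical USD rate per GPU hour (not given in the listing)
    """
    char_cost = Decimal(characters) / Decimal(1_000_000) * PRICE_PER_MILLION_CHARS
    gpu_cost = Decimal(str(gpu_hours)) * Decimal(str(gpu_hourly_rate))
    return char_cost + gpu_cost
```

For example, 250,000 characters come to $7.50 in character charges before any GPU time is added.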