Audio

Deepgram Nova-3

Real-time speech-to-text for voice agents

About model

Nova-3 is Deepgram's speech-to-text model, providing real-time transcription with self-serve vocabulary customization. Built on a latent space architecture, Nova-3 delivers low latency and robust performance in challenging acoustic environments for enterprise production workloads.

Latency

Low

Real-time transcription for voice agents

Customization

Self-Serve

Instant vocabulary adaptation with keyterm boosting

Performance

Noise Robust

Maintains accuracy in challenging acoustic environments
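Real-time transcription is typically consumed over a streaming connection: the client sends small chunks of raw audio and receives partial and final transcripts as the speaker talks. Below is a minimal sketch of the client-side setup. It assumes Deepgram's hosted WebSocket endpoint `wss://api.deepgram.com/v1/listen` and its `encoding`, `sample_rate`, `channels`, and `interim_results` query parameters. A dedicated deployment may use a different URL, and the 20 ms chunk length is an illustrative choice, not a documented requirement. The WebSocket connection itself is omitted so the sketch stays dependency-free.

```python
from urllib.parse import urlencode

# Assumed endpoint: Deepgram's hosted streaming API. A deployment of this
# model on dedicated GPUs may expose a different URL.
STREAMING_ENDPOINT = "wss://api.deepgram.com/v1/listen"


def streaming_url(sample_rate: int = 16000, channels: int = 1) -> str:
    """Build a streaming URL for raw 16-bit little-endian PCM (linear16) audio."""
    params = {
        "model": "nova-3",
        "encoding": "linear16",
        "sample_rate": sample_rate,
        "channels": channels,
        "interim_results": "true",  # request partial transcripts mid-utterance
    }
    return f"{STREAMING_ENDPOINT}?{urlencode(params)}"


def chunk_bytes(sample_rate: int, channels: int, chunk_ms: int, bytes_per_sample: int = 2) -> int:
    """Number of PCM bytes covering chunk_ms milliseconds of audio."""
    return sample_rate * channels * bytes_per_sample * chunk_ms // 1000


def iter_chunks(pcm: bytes, size: int):
    """Yield fixed-size slices of a PCM buffer; the final slice may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

At 16 kHz mono, `chunk_bytes(16000, 1, 20)` gives 640 bytes per 20 ms frame. In this sketch, each slice from `iter_chunks` is intended to be sent as one binary WebSocket message. Smaller chunks reduce the time audio waits on the client before it can be transcribed, at the cost of more messages.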

Model key capabilities
  • Real-Time Transcription: Low latency for voice agent and captioning applications
  • Noise Robustness: Maintains accuracy in challenging acoustic environments with background noise
  • Domain Optimization: Optimized for medical, legal, financial, and technical terminology
  • Model card

    Architecture Overview:
    • Latent space architecture compressing audio into expressive representations while preserving acoustic features
    • Audio embedding framework using representation learning for diverse acoustic conditions
    • Audio-text alignment enabling training on challenging examples

    Training Methodology:
    • Multi-stage training combining synthetic data with real-world conversational datasets
    • Targeted data augmentation for specialized vocabulary in realistic acoustic conditions
    • Trained on conversational data covering challenging environments and domain terminology
    • Optimization for medical, legal, financial, and technical vocabulary

    Performance Characteristics:
    • Maintains accuracy in noisy environments with speaker distance variation and overlapping speech
    • Self-serve customization with keyterm boosting for vocabulary adaptation
    • Optional personal information redaction for compliance requirements

  • Applications & use cases

    Enterprise Transcription:
    • Contact center analytics with transcription for quality monitoring and coaching
    • Medical transcription for clinical documentation with healthcare vocabulary
    • Legal transcription for depositions, court proceedings, and legal discovery
    • Financial services for earnings calls, compliance monitoring, and regulatory documentation

    Real-Time Applications:
    • Voice agent backends requiring speech-to-text for conversational AI
    • Live captioning for meetings, webinars, and virtual events
    • Broadcast media for live subtitling and accessibility compliance

    Production Deployments:
    • Customer support automation with accurate transcription
    • Interactive voice response systems for complex workflows
    • Automated quality monitoring and compliance recording
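Keyterm boosting and personal information redaction are described as self-serve options, so they are set per request rather than through retraining. Below is a minimal sketch of such a request. It assumes Deepgram's hosted pre-recorded endpoint `https://api.deepgram.com/v1/listen`, its `keyterm` and `redact` query parameters, and an `Authorization: Token <key>` header. The medical terms and the `pci`/`ssn` redaction categories are illustrative, and a dedicated deployment may expose a different base URL. The request is only built here, not sent.

```python
from typing import Optional, Sequence
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumed endpoint: Deepgram's hosted pre-recorded API. A dedicated deployment
# of this model may expose a different base URL.
LISTEN_ENDPOINT = "https://api.deepgram.com/v1/listen"


def listen_url(keyterms: Sequence[str], redact: Optional[Sequence[str]] = None) -> str:
    """Build a Nova-3 request URL with keyterm boosting and optional redaction."""
    params = [("model", "nova-3")]
    # Each boosted term is passed as its own repeated `keyterm` parameter.
    params += [("keyterm", term) for term in keyterms]
    # Redaction categories are likewise passed as repeated `redact` parameters.
    params += [("redact", kind) for kind in (redact or [])]
    # quote_via=quote encodes spaces as %20 rather than "+".
    return f"{LISTEN_ENDPOINT}?{urlencode(params, quote_via=quote)}"


def build_request(audio: bytes, api_key: str, url: str, mimetype: str = "audio/wav") -> Request:
    """Prepare (but do not send) a POST request carrying raw audio bytes."""
    return Request(
        url,
        data=audio,
        headers={"Authorization": f"Token {api_key}", "Content-Type": mimetype},
        method="POST",
    )
```

For a clinical dictation, `listen_url(["tachycardia", "atrial fibrillation"], redact=["pci", "ssn"])` produces one `keyterm` entry per term. Multi-word terms stay intact as a single entry. Passing the request to `urllib.request.urlopen` would send it.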

Model details
  • Model provider
    Deepgram
  • Type
    Audio
  • Deployment
    On-Demand, Dedicated
  • Price

    $0.0077/min + GPU hourly

  • Input modalities
    Audio
  • Output modalities
    Text
  • Released
    February 11, 2025
  • Category
    Transcribe
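The listed rate makes the usage portion of the cost easy to estimate. For example, 1,000 hours of audio is 60,000 minutes, and 60,000 × $0.0077 = $462 before any GPU charges. The sketch below assumes the per-minute rate and the GPU hourly charge simply add. The card does not list the GPU rate, so `gpu_hourly_usd` is a hypothetical input that defaults to zero.

```python
# Per-minute rate listed on this card; the GPU hourly rate is not listed.
PER_MINUTE_USD = 0.0077


def estimate_cost_usd(audio_hours: float, gpu_hours: float = 0.0, gpu_hourly_usd: float = 0.0) -> float:
    """Estimate cost as per-minute audio usage plus a hypothetical GPU-hour charge."""
    usage = audio_hours * 60 * PER_MINUTE_USD  # convert hours to billed minutes
    gpu = gpu_hours * gpu_hourly_usd          # hypothetical: rate supplied by caller
    return round(usage + gpu, 2)


print(estimate_cost_usd(1000))  # → 462.0 (usage portion only)
```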