Deepgram Nova-3

Real-time speech-to-text for voice agents

About model

Nova-3 is Deepgram's speech-to-text model for real-time transcription with self-serve vocabulary customization. Built on a latent space architecture, it delivers low-latency transcription and stays robust in challenging acoustic environments, targeting enterprise production workloads.

Latency: Low. Real-time transcription for voice agents
Customization: Self-serve. Instant vocabulary adaptation with keyterm boosting
Performance: Noise robust. Maintains accuracy in challenging acoustic environments

Model key capabilities

Real-Time Transcription: Low latency for voice agent and captioning applications
Noise Robustness: Maintains accuracy in challenging acoustic environments with background noise
Domain Optimization: Optimized for medical, legal, financial, and technical terminology

Model card
Architecture Overview:
• Latent space architecture compressing audio into expressive representations while preserving acoustic features
• Audio embedding framework using representation learning for diverse acoustic conditions
• Audio-text alignment enabling training on challenging examples

Training Methodology:
• Multi-stage training combining synthetic data with real-world conversational datasets
• Targeted data augmentation for specialized vocabulary in realistic acoustic conditions
• Trained on conversational data covering challenging environments and domain terminology
• Optimization for medical, legal, financial, and technical vocabulary

Performance Characteristics:
• Maintains accuracy in noisy environments with speaker distance variation and overlapping speech
• Self-serve customization with keyterm boosting for vocabulary adaptation
• Optional personal information redaction for compliance requirements
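
As an illustration of the keyterm boosting and redaction options above, here is a minimal sketch of how a transcription request might be parameterized. It assumes Deepgram's hosted pre-recorded endpoint (`https://api.deepgram.com/v1/listen`) and its `model`, `keyterm`, and `redact` query parameters; a dedicated deployment may expose a different base URL or option names, so treat both as assumptions and check the quickstart for your deployment. The helper name `build_listen_url` is hypothetical, and the code only builds the URL; it does not send audio.

```python
from urllib.parse import urlencode

# Assumed base URL (Deepgram's hosted API); a dedicated deployment may differ.
BASE_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(keyterms, redact_pii=False, base_url=BASE_URL):
    """Build a request URL that boosts the given key terms (hypothetical helper)."""
    params = [("model", "nova-3")]
    # One `keyterm` parameter per term (assumed parameter name).
    params += [("keyterm", term) for term in keyterms]
    if redact_pii:
        # Assumed value for optional personal information redaction.
        params.append(("redact", "pii"))
    return f"{base_url}?{urlencode(params)}"


url = build_listen_url(["metoprolol", "tachycardia"], redact_pii=True)
print(url)
```

Repeating the `keyterm` parameter once per term keeps each term separate, which suits a short, domain-specific vocabulary list such as drug names or product names.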
Applications & use cases
Enterprise Transcription:
• Contact center analytics with transcription for quality monitoring and coaching
• Medical transcription for clinical documentation with healthcare vocabulary
• Legal transcription for depositions, court proceedings, and legal discovery
• Financial services for earnings calls, compliance monitoring, and regulatory documentation

Real-Time Applications:
• Voice agent backends requiring speech-to-text for conversational AI
• Live captioning for meetings, webinars, and virtual events
• Broadcast media for live subtitling and accessibility compliance

Production Deployments:
• Customer support automation with accurate transcription
• Interactive voice response systems for complex workflows
• Automated quality monitoring and compliance recording

Model specifications

Model data

Model provider: Deepgram
Type: Audio, Transcribe
Deployment: Dedicated
Price: $0.0077 / min + GPU hourly
Input modalities: Audio
Output modalities: Text

Released: February 11, 2025
Category: Transcribe
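
The listed price has two parts: $0.0077 per minute of audio plus an hourly GPU charge. The sketch below shows one way to estimate a bill under that structure. The GPU hourly rate is not stated on this page, so it is passed in as a parameter, and the $2.00 figure in the example is a made-up placeholder rather than a real quote. The function name `estimate_cost` is hypothetical.

```python
PER_MINUTE_USD = 0.0077  # per-minute transcription price listed on this page


def estimate_cost(audio_minutes, gpu_hours, gpu_hourly_usd):
    """Estimate total cost in USD: per-minute usage plus GPU time.

    gpu_hourly_usd is NOT given on this page; pass the rate quoted
    for your dedicated deployment.
    """
    usage = audio_minutes * PER_MINUTE_USD
    gpu = gpu_hours * gpu_hourly_usd
    return round(usage + gpu, 2)


# Example with a placeholder GPU rate of $2.00/hour (an assumption, not a real quote).
print(estimate_cost(audio_minutes=10_000, gpu_hours=24, gpu_hourly_usd=2.00))
```

Because the GPU charge accrues per hour regardless of traffic, the per-minute component dominates only at high audio volumes; at low volumes the GPU term is most of the total.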
