Deepgram Nova-3
Real-time speech-to-text for voice agents

About model
Nova-3 is Deepgram's speech-to-text model, providing real-time transcription with self-serve vocabulary customization. Built on a latent space architecture, Nova-3 delivers low latency and robust performance in challenging acoustic environments for enterprise production workloads.
- Low Latency: Real-time transcription for voice agents
- Self-Serve: Instant vocabulary adaptation with keyterm boosting
- Noise Robust: Maintains accuracy in challenging acoustic environments
- Real-Time Transcription: Low latency for voice agent and captioning applications
- Noise Robustness: Maintains accuracy in challenging acoustic environments with background noise
- Domain Optimization: Optimized for medical, legal, financial, and technical terminology
Model card
Architecture Overview:
• Latent space architecture compressing audio into expressive representations while preserving acoustic features
• Audio embedding framework using representation learning for diverse acoustic conditions
• Audio-text alignment enabling training on challenging examples
Training Methodology:
• Multi-stage training combining synthetic data with real-world conversational datasets
• Targeted data augmentation for specialized vocabulary in realistic acoustic conditions
• Trained on conversational data covering challenging environments and domain terminology
• Optimization for medical, legal, financial, and technical vocabulary
Performance Characteristics:
• Maintains accuracy in noisy environments with speaker distance variation and overlapping speech
• Self-serve customization with keyterm boosting for vocabulary adaptation
• Optional personal information redaction for compliance requirements
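As a rough illustration of how keyterm boosting and redaction might be requested together, the sketch below only builds a request URL and sends nothing. It assumes the endpoint and the `model`, `keyterm`, and `redact` query parameters of Deepgram's public pre-recorded API; a hosting provider may expose a different interface, and the helper name `build_listen_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: endpoint and parameter names follow Deepgram's public API.
# A third-party host of Nova-3 may use a different URL and request shape.
BASE_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(keyterms, redact_pii=False):
    """Return a request URL that selects Nova-3 with optional keyterms and redaction.

    Hypothetical helper for illustration; it only assembles the query string.
    """
    params = [("model", "nova-3")]
    # One `keyterm` entry per term that should be boosted.
    params += [("keyterm", term) for term in keyterms]
    if redact_pii:
        # Ask for personal information to be redacted in the transcript.
        params.append(("redact", "pii"))
    return f"{BASE_URL}?{urlencode(params)}"


url = build_listen_url(["tretinoin", "escrow"], redact_pii=True)
print(url)
```

Keeping the terms as a list of pairs, rather than a dict, lets the same parameter name repeat once per term.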
Applications & use cases
Enterprise Transcription:
• Contact center analytics with transcription for quality monitoring and coaching
• Medical transcription for clinical documentation with healthcare vocabulary
• Legal transcription for depositions, court proceedings, and legal discovery
• Financial services for earnings calls, compliance monitoring, and regulatory documentation
Real-Time Applications:
• Voice agent backends requiring speech-to-text for conversational AI
• Live captioning for meetings, webinars, and virtual events
• Broadcast media for live subtitling and accessibility compliance
Production Deployments:
• Customer support automation with accurate transcription
• Interactive voice response systems for complex workflows
• Automated quality monitoring and compliance recording
- Model provider: Deepgram
- Type: Audio
- Deployment: On-Demand Dedicated
- Price: $0.0077/min + GPU hourly / min
- Input modalities: Audio
- Output modalities: Text
- Released: February 11, 2025
- Category: Transcribe
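The listed price can be turned into a rough budget estimate. The sketch below is one possible reading of the price line, assuming a per-minute audio charge plus a separately billed GPU hourly charge. The listing does not state the GPU hourly rate, so it is left as an input, and the function name `estimate_cost` is hypothetical.

```python
PER_MINUTE_USD = 0.0077  # per-minute rate taken from the listing


def estimate_cost(audio_minutes, gpu_hours=0.0, gpu_hourly_usd=0.0):
    """Estimate the total cost in USD.

    Assumption: total = per-minute audio charge + GPU hours * GPU hourly rate.
    The GPU hourly rate is not given in the listing and must be supplied.
    """
    audio_charge = audio_minutes * PER_MINUTE_USD
    gpu_charge = gpu_hours * gpu_hourly_usd
    return audio_charge + gpu_charge


# Audio charge alone for 10,000 minutes of transcription.
print(f"${estimate_cost(10_000):.2f}")
```

Passing the GPU figures explicitly keeps the unknown rate visible in the calculation.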