Deepgram Flux
Conversational speech recognition with native turn detection

About model
Flux combines speech recognition and turn detection in a single model for voice agent applications. Built for conversational AI, Flux delivers turn-complete transcripts with native end-of-turn detection, eliminating the need for separate VAD and endpointing systems.
- Low: Median detection latency for natural conversation flow
- Single Model: Integrated transcription and timing without separate systems
- Configurable: Adjustable thresholds for latency-accuracy tradeoffs
- Integrated Turn Detection: Native end-of-turn detection using conversational context, not just silence
- Streaming Transcription: Turn-complete transcripts without partial updates or reconstruction
- Configurable Timing: Adjustable thresholds for latency-accuracy tradeoffs across use cases
Model card
Architecture Overview:
• Fused speech recognition architecture integrating transcription and turn detection in a single model
• Model-native turn detection using acoustic, semantic, and conversational context beyond silence thresholds
• Supports configurable turn-taking dynamics with eager end-of-turn for speculative response generation
• Streaming architecture delivering turn-complete transcripts without partial updates
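The eager end-of-turn idea above can be illustrated with a small sketch: a reply is drafted as soon as the turn looks finished, dropped if the speaker continues, and reused if the turn is confirmed with the same transcript. This is only an illustration under stated assumptions. The handler names (on_eager_end_of_turn, on_turn_resumed, on_end_of_turn) and the fake_llm stand-in are hypothetical and are not taken from the Flux API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SpeculativeResponder:
    """Drafts a reply early and keeps it only if the turn really ended."""

    generate: Callable[[str], str]          # stand-in for an LLM call (hypothetical)
    draft: Optional[str] = None             # reply prepared speculatively
    draft_transcript: Optional[str] = None  # transcript the draft was built from
    sent: List[str] = field(default_factory=list)

    def on_eager_end_of_turn(self, transcript: str) -> None:
        # Likely end of turn: start preparing a reply before it is confirmed.
        self.draft_transcript = transcript
        self.draft = self.generate(transcript)

    def on_turn_resumed(self) -> None:
        # The speaker kept talking, so the draft is stale and is discarded.
        self.draft = None
        self.draft_transcript = None

    def on_end_of_turn(self, transcript: str) -> str:
        # Confirmed end of turn: reuse the draft only if the transcript is unchanged.
        if self.draft is not None and self.draft_transcript == transcript:
            reply = self.draft
        else:
            reply = self.generate(transcript)
        self.draft = None
        self.draft_transcript = None
        self.sent.append(reply)
        return reply


# Usage with a fake generator that records every call it receives.
calls: List[str] = []


def fake_llm(text: str) -> str:
    calls.append(text)
    return f"reply to: {text}"


responder = SpeculativeResponder(generate=fake_llm)
responder.on_eager_end_of_turn("I need to book")           # speculative draft
responder.on_turn_resumed()                                # speaker continued
responder.on_eager_end_of_turn("I need to book a flight")  # new draft
final_reply = responder.on_end_of_turn("I need to book a flight")
```

In this run the generator is called twice, once for each eager signal, and the confirmed end of turn reuses the second draft instead of generating again. The cost of the approach is the wasted first call, which is the price of a faster reply when the eager guess is right.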
Training Methodology:
• Trained on conversational datasets to model natural dialogue flow and turn-taking patterns
• Multi-stage training combining transcription accuracy with conversational timing objectives
• Optimized for voice agent scenarios including interruption handling and barge-in use cases
Performance Characteristics:
• Transcription accuracy comparable to Nova-3 on conversational audio benchmarks
• Median end-of-turn detection latency under 300ms with p95 at 1.5 seconds
• Configurable eot_threshold parameter for latency-accuracy tradeoffs
• Keyterm prompting support for domain-specific vocabulary recognition
• Reduces false interruptions in voice agent applications compared with baseline VAD systems
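The latency-accuracy tradeoff behind eot_threshold can be shown with a toy example. The sketch below assumes that end-of-turn confidence is a score between 0 and 1 that arrives at a fixed interval, and that a turn ends the first time the score reaches the threshold. The score values, the 80 ms spacing, and the function name are illustrative assumptions, not a description of how Flux computes or applies the parameter.

```python
from typing import Optional, Sequence

FRAME_MS = 80  # hypothetical spacing between confidence scores, in milliseconds


def first_end_of_turn(confidences: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the first score at or above the threshold, or None."""
    for i, score in enumerate(confidences):
        if score >= threshold:
            return i
    return None


# Made-up confidences following a pause in speech.
# The bump at index 1 imitates a mid-sentence pause that is not a real end of turn.
scores = [0.20, 0.65, 0.30, 0.55, 0.72, 0.85, 0.93]

low = first_end_of_turn(scores, threshold=0.6)   # fires early, on the pause
high = first_end_of_turn(scores, threshold=0.8)  # waits for stronger evidence

print(f"threshold 0.6 fires after {low * FRAME_MS} ms")
print(f"threshold 0.8 fires after {high * FRAME_MS} ms")
```

With these made-up scores the lower threshold fires at index 1 and would cut the speaker off during the pause, while the higher threshold waits until index 5. A lower threshold gives a faster response at the risk of false interruptions, and a higher one trades latency for fewer cutoffs.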
Applications & use cases
Real-Time Voice Agents:
• Customer service voice bots with natural turn-taking behavior
• Interactive voice response systems with reduced false cutoffs
• Virtual assistants requiring responsive conversation flow
• Healthcare voice agents for patient intake and appointment scheduling
High-Concurrency Applications:
• Contact center automation with concurrent voice agent sessions
• Automated order-taking systems for restaurants and retail
• Banking voice authentication and transaction processing
• Insurance claims processing with multi-turn workflows
Development Simplification:
• Replaces complex ASR+VAD+endpointing pipeline integration
• Single API for speech recognition and turn management
• Conversation-native events for voice agent state machines
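As a rough sketch of how conversation-level events could drive a voice agent state machine, the example below maps events to state transitions, including barge-in while the agent is thinking or speaking. The event names (start_of_turn, end_of_turn, reply_ready, playback_done) and the three states are hypothetical and chosen for illustration. They are not the event names emitted by Flux.

```python
from enum import Enum, auto
from typing import Iterable, List


class AgentState(Enum):
    LISTENING = auto()  # the user has the floor
    THINKING = auto()   # the turn ended and a reply is being generated
    SPEAKING = auto()   # agent audio is playing


# Hypothetical event names; substitute whatever the real stream emits.
TRANSITIONS = {
    (AgentState.LISTENING, "end_of_turn"): AgentState.THINKING,
    (AgentState.THINKING, "reply_ready"): AgentState.SPEAKING,
    (AgentState.SPEAKING, "playback_done"): AgentState.LISTENING,
    # Barge-in: the user starts talking while the agent is thinking or speaking.
    (AgentState.THINKING, "start_of_turn"): AgentState.LISTENING,
    (AgentState.SPEAKING, "start_of_turn"): AgentState.LISTENING,
}


def step(state: AgentState, event: str) -> AgentState:
    """Return the next state; events with no transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events: Iterable[str], state: AgentState = AgentState.LISTENING) -> List[AgentState]:
    """Feed events through the machine and return every state visited."""
    visited = [state]
    for event in events:
        state = step(state, event)
        visited.append(state)
    return visited


# The user finishes a turn, the agent starts replying, then the user barges in.
trace = run(["end_of_turn", "reply_ready", "start_of_turn"])
print([s.name for s in trace])
```

Keeping the transitions in a single table makes the barge-in rules easy to audit. In a real agent each transition would also trigger side effects such as cancelling playback, which are left out here.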
- Model provider: Deepgram
- Type: Audio
- Deployment: On-Demand Dedicated
- Price: $0.0077/min + GPU hourly / min
- Input modalities: Audio
- Output modalities: Text
- Category: Transcribe