Deepgram Flux

Conversational speech recognition with native turn detection

About model

Flux combines speech recognition and turn detection in a single model for voice agent applications. Built for conversational AI, Flux delivers turn-complete transcripts with native end-of-turn detection, eliminating the need for separate voice activity detection (VAD) and endpointing systems.

End-of-Turn Latency: Low (median detection latency suited to natural conversation flow)
STT + Turn Detection: Single Model (integrated transcription and timing without separate systems)
Turn Dynamics: Configurable (adjustable thresholds for latency-accuracy tradeoffs)

Model key capabilities

Integrated Turn Detection: Native end-of-turn detection using conversational context, not just silence
Streaming Transcription: Turn-complete transcripts without partial updates or reconstruction
Configurable Timing: Adjustable thresholds for latency-accuracy tradeoffs across use cases
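Flux makes its end-of-turn decision inside the model, so its internals are not reproduced here. To illustrate what an adjustable threshold trades off, the sketch below assumes a hypothetical stream of per-frame end-of-turn probabilities and fires events when they cross a threshold. The class `TurnDetector`, the field names `eot_threshold` and `eager_eot_threshold`, the default of 0.7, and the event strings are all assumptions for this example, not Flux's implementation or API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class TurnDetector:
    """Toy threshold-based end-of-turn detector (illustrative, not Flux's internals).

    eot_threshold: probability at which the turn is declared finished
                   (the 0.7 default is an assumed value).
    eager_eot_threshold: optional lower bar for an early, speculative signal.
    """
    eot_threshold: float = 0.7
    eager_eot_threshold: Optional[float] = None

    def events(self, probs: Iterable[float]) -> Iterator[Tuple[int, str]]:
        """Yield (frame_index, event_name) pairs for a single turn."""
        eager_sent = False
        for i, p in enumerate(probs):
            if p >= self.eot_threshold:
                yield i, "EndOfTurn"  # confident enough: close the turn
                return
            if self.eager_eot_threshold is None:
                continue
            if not eager_sent and p >= self.eager_eot_threshold:
                eager_sent = True
                yield i, "EagerEndOfTurn"  # early hint that the turn may be over
            elif eager_sent and p < self.eager_eot_threshold:
                eager_sent = False
                yield i, "TurnResumed"  # probability fell back: speaker continues


if __name__ == "__main__":
    # Hypothetical per-frame end-of-turn probabilities.
    probs = [0.1, 0.3, 0.55, 0.4, 0.6, 0.75]
    for detector in (TurnDetector(0.5), TurnDetector(0.7, 0.5), TurnDetector(0.9)):
        print(detector, list(detector.events(probs)))
```

On the same input, a lower `eot_threshold` fires earlier (lower latency, more risk of cutting the speaker off), while a higher one delays or suppresses the event.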

Model card
Architecture Overview:
• Fused speech recognition architecture that integrates transcription and turn detection in a single model
• Model-native turn detection that uses acoustic, semantic, and conversational context rather than silence thresholds alone
• Configurable turn-taking dynamics, including eager end-of-turn signals for speculative response generation
• Streaming architecture delivering turn-complete transcripts without partial updates
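Eager end-of-turn lets an agent begin generating a reply before the turn is confirmed. The sketch below shows that pattern under stated assumptions: the client receives `(event, transcript)` pairs with event names `EagerEndOfTurn`, `TurnResumed`, and `EndOfTurn` (assumed names and shape; check the Flux API reference for the actual message schema), and `generate` stands in for a hypothetical LLM call. It is synchronous for brevity; a production agent would run and cancel generation asynchronously.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# (event name, transcript so far): an assumed message shape for illustration.
Event = Tuple[str, str]


def run_speculative_agent(events: Iterable[Event], generate: Callable[[str], str]) -> List[str]:
    """Draft a reply on EagerEndOfTurn, discard it on TurnResumed, send on EndOfTurn.

    If the final transcript differs from the one the draft was based on,
    the reply is regenerated. Returns the replies that were actually sent.
    """
    sent: List[str] = []
    draft: Optional[Tuple[str, str]] = None  # (transcript used, drafted reply)
    for name, transcript in events:
        if name == "EagerEndOfTurn":
            draft = (transcript, generate(transcript))  # speculative generation
        elif name == "TurnResumed":
            draft = None  # user kept talking: throw the draft away
        elif name == "EndOfTurn":
            if draft is not None and draft[0] == transcript:
                sent.append(draft[1])  # speculation paid off: no extra generation
            else:
                sent.append(generate(transcript))  # draft missing or stale
            draft = None
    return sent
```

The transcript comparison guards against sending a reply built on words the user later extended.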

Training Methodology:
• Trained on conversational datasets to model natural dialogue flow and turn-taking patterns
• Multi-stage training combining transcription accuracy with conversational timing objectives
• Optimized for voice agent scenarios including interruption handling and barge-in use cases

Performance Characteristics:
• Transcription accuracy comparable to Nova-3 on conversational audio benchmarks
• Median end-of-turn detection latency under 300 ms, with p95 latency of 1.5 s
• Configurable eot_threshold parameter for tuning the latency-accuracy tradeoff
• Keyterm prompting support for recognizing domain-specific vocabulary
• Reduces false interruptions in voice agent applications compared with baseline VAD systems
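Parameters such as `eot_threshold` and keyterm prompts are typically supplied when the streaming connection is opened. The helper below sketches building such a connection URL from query parameters. The base URL, the model name `flux-general-en`, the 0.7 default, the rule that the eager threshold must not exceed `eot_threshold`, and the `eager_eot_threshold` and `keyterm` parameter names are assumptions for illustration; a dedicated deployment exposes its own endpoint, so confirm names and valid ranges in the provider's documentation.

```python
from typing import List, Optional, Sequence, Tuple
from urllib.parse import urlencode


def build_flux_url(
    base_url: str,
    *,
    eot_threshold: float = 0.7,  # assumed default
    eager_eot_threshold: Optional[float] = None,  # assumed parameter name
    keyterms: Sequence[str] = (),
    model: str = "flux-general-en",  # assumed model identifier
) -> str:
    """Return a streaming URL with turn-detection and keyterm query parameters."""
    if not 0.0 < eot_threshold <= 1.0:
        raise ValueError("eot_threshold must be in (0, 1]")
    params: List[Tuple[str, object]] = [("model", model), ("eot_threshold", eot_threshold)]
    if eager_eot_threshold is not None:
        if not 0.0 < eager_eot_threshold <= eot_threshold:
            raise ValueError("eager_eot_threshold must be in (0, eot_threshold]")
        params.append(("eager_eot_threshold", eager_eot_threshold))
    # One repeated query parameter per keyterm (assumed convention).
    params.extend(("keyterm", term) for term in keyterms)
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_flux_url("wss://example.invalid/v2/listen",
                         eot_threshold=0.8, keyterms=["order number"]))
```

`urlencode` percent-encodes each value, so multi-word keyterms are sent safely (spaces become `+`).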
Applications & use cases
Real-Time Voice Agents:
• Customer service voice bots with natural turn-taking behavior
• Interactive voice response systems with reduced false cutoffs
• Virtual assistants requiring responsive conversation flow
• Healthcare voice agents for patient intake and appointment scheduling

High-Concurrency Applications:
• Contact center automation with concurrent voice agent sessions
• Automated order-taking systems for restaurants and retail
• Banking voice authentication and transaction processing
• Insurance claims processing with multi-turn workflows

Development Simplification:
• Replaces complex integration of separate ASR, VAD, and endpointing components
• Single API for speech recognition and turn management
• Conversation-native events for voice agent state machines
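Conversation-native turn events map naturally onto a small agent state machine. The sketch below is hypothetical: `StartOfTurn` and `EndOfTurn` are assumed Flux turn-event names, while `ReplyReady` and `TtsFinished` are invented application-side events. It illustrates barge-in handling: a `StartOfTurn` arriving while the agent is speaking stops text-to-speech and returns to listening.

```python
from enum import Enum, auto
from typing import Iterable, List, Tuple


class AgentState(Enum):
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()


def step(state: AgentState, event: str) -> Tuple[AgentState, List[str]]:
    """Return the next state and the actions the agent should take."""
    if event == "StartOfTurn":  # assumed Flux event: the user started talking
        if state is AgentState.SPEAKING:
            return AgentState.LISTENING, ["stop_tts"]  # barge-in: cut the agent off
        if state is AgentState.THINKING:
            return AgentState.LISTENING, ["cancel_llm"]  # drop the pending reply
        return AgentState.LISTENING, []
    if event == "EndOfTurn" and state is AgentState.LISTENING:  # assumed Flux event
        return AgentState.THINKING, ["start_llm"]
    if event == "ReplyReady" and state is AgentState.THINKING:  # hypothetical app event
        return AgentState.SPEAKING, ["start_tts"]
    if event == "TtsFinished" and state is AgentState.SPEAKING:  # hypothetical app event
        return AgentState.LISTENING, []
    return state, []  # ignore events that do not apply in the current state


def run(events: Iterable[str]) -> Tuple[AgentState, List[str]]:
    """Fold a sequence of events into a final state and the list of actions emitted."""
    state, actions = AgentState.LISTENING, []
    for event in events:
        state, new_actions = step(state, event)
        actions.extend(new_actions)
    return state, actions
```

Keeping `step` a pure function makes each transition easy to test in isolation; side effects such as stopping audio playback live in whatever executes the returned actions.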

Model specifications

Model provider: Deepgram
Type: Audio, Transcribe
Deployment: Dedicated
Price: $0.0077/min + GPU hourly
Input modalities: Audio
Output modalities: Text
Category: Transcribe
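The listed price combines a per-minute fee with GPU time billed hourly. Assuming the $0.0077 fee applies per minute of audio processed and that the GPU hourly rate is supplied separately (it is not stated on this page), a rough total can be estimated as below; `estimate_cost` and `gpu_hourly_rate` are illustrative names.

```python
def estimate_cost(
    audio_minutes: float,
    gpu_hours: float,
    gpu_hourly_rate: float,  # hypothetical: not stated on this page
    per_minute_rate: float = 0.0077,  # assumed to apply per audio minute
) -> float:
    """Estimate total cost as per-minute usage plus dedicated GPU time."""
    if min(audio_minutes, gpu_hours, gpu_hourly_rate, per_minute_rate) < 0:
        raise ValueError("inputs must be non-negative")
    return audio_minutes * per_minute_rate + gpu_hours * gpu_hourly_rate


if __name__ == "__main__":
    # 10,000 audio minutes on a GPU held for 24 hours at a placeholder rate.
    print(round(estimate_cost(10_000, 24, gpu_hourly_rate=5.0), 2))
```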
