Deepgram Flux

Conversational speech recognition with native turn detection

About the model

Flux combines speech recognition and turn detection in a single model for voice agent applications. Built for conversational AI, Flux delivers turn-complete transcripts with native end-of-turn detection, eliminating the need for separate voice activity detection (VAD) and endpointing systems.

  • End-of-Turn Latency: Low, with median detection latency under 300 ms for natural conversation flow
  • STT + Turn Detection: Single model integrating transcription and turn timing without separate systems
  • Turn Dynamics: Configurable, with adjustable thresholds for latency-accuracy tradeoffs
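The latency-accuracy tradeoff behind a configurable threshold can be illustrated with a toy model. This sketch assumes a stream of per-frame end-of-turn probabilities and fires when one reaches the threshold; `first_crossing` and the probability values are hypothetical, and this is a generic illustration of thresholding, not a description of how Flux computes its end-of-turn decision.

```python
from typing import Optional, Sequence


def first_crossing(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the first frame whose end-of-turn probability
    reaches the threshold, or None if no frame does."""
    for i, p in enumerate(probs):
        if p >= threshold:
            return i
    return None


# Hypothetical per-frame end-of-turn probabilities: a brief mid-sentence
# pause around frame 3, then the real end of turn from frame 6 onward.
probs = [0.05, 0.10, 0.40, 0.62, 0.30, 0.20, 0.75, 0.88, 0.95]

for threshold in (0.6, 0.7, 0.9):
    print(threshold, first_crossing(probs, threshold))
```

With these made-up numbers, a threshold of 0.6 fires at frame 3 during the pause (a false cutoff), 0.7 fires at frame 6, and 0.9 waits until frame 8: raising the threshold buys fewer premature turn ends at the cost of added latency.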

Key model capabilities
  • Integrated Turn Detection: Native end-of-turn detection using conversational context, not just silence
  • Streaming Transcription: Turn-complete transcripts without partial updates or reconstruction
  • Configurable Timing: Adjustable thresholds for latency-accuracy tradeoffs across use cases
  • Model card

    Architecture Overview:
    • Fused speech recognition architecture integrating transcription and turn detection in a single model
    • Model-native turn detection using acoustic, semantic, and conversational context beyond silence thresholds
    • Supports configurable turn-taking dynamics, including an eager end-of-turn signal for speculative response generation
    • Streaming architecture delivering turn-complete transcripts without partial updates
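For contrast with model-native turn detection, here is a minimal sketch of the conventional silence-threshold endpointing that the architecture above goes beyond. It is a generic baseline, not Deepgram code, and it assumes audio already split into 20 ms frames and labeled speech or non-speech by some upstream VAD; `silence_endpoint` and `FRAME_MS` are hypothetical names.

```python
from typing import Optional, Sequence

FRAME_MS = 20  # assumed frame duration in milliseconds


def silence_endpoint(is_speech: Sequence[bool], min_silence_ms: int) -> Optional[int]:
    """Return the frame index at which a silence-only endpointer declares
    end of turn, or None if no qualifying silence occurs.

    End of turn fires once `min_silence_ms` of consecutive non-speech
    frames follows at least one speech frame."""
    needed = max(1, min_silence_ms // FRAME_MS)
    heard_speech = False
    silent_run = 0
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i
    return None


# 200 ms of speech, a 300 ms thinking pause, 200 ms more speech, then silence.
frames = [True] * 10 + [False] * 15 + [True] * 10 + [False] * 40

print(silence_endpoint(frames, 300))
print(silence_endpoint(frames, 500))
```

Here a 300 ms threshold declares end of turn during the pause (frame 24), a false cutoff, while 500 ms avoids it but only fires at frame 59, half a second after the user actually stopped. Silence alone cannot distinguish a pause from a finished turn, which is the gap that turn detection using conversational context targets.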

    Training Methodology:
    • Trained on conversational datasets to model natural dialogue flow and turn-taking patterns
    • Multi-stage training combining transcription accuracy with conversational timing objectives
    • Optimized for voice agent scenarios including interruption handling and barge-in use cases

    Performance Characteristics:
    • Transcription accuracy comparable to Nova-3 on conversational audio benchmarks
    • Median end-of-turn detection latency under 300 ms, with p95 at 1.5 seconds
    • Configurable eot_threshold parameter for latency-accuracy tradeoffs
    • Keyterm prompting support for domain-specific vocabulary recognition
    • Reduces false interruptions in voice agent applications compared with baseline VAD systems
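To compare the stated median and p95 end-of-turn latency against your own traffic, you can summarize latencies measured by your own instrumentation. A minimal sketch, assuming you already log one latency value per turn in milliseconds: it uses `statistics.median` for the median and the nearest-rank method for p95. The `nearest_rank_percentile` helper and the sample values are hypothetical.

```python
import math
import statistics
from typing import Sequence


def nearest_rank_percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical end-of-turn latencies (ms) collected from your own logs.
latencies = [180, 220, 240, 260, 270, 290, 310, 420, 900, 1600]

median = statistics.median(latencies)
p95 = nearest_rank_percentile(latencies, 95)
print(f"median={median} ms, p95={p95} ms")
```

With these made-up samples the median is 280 ms while p95 is 1600 ms, a reminder that a handful of slow turns dominates the tail and that median and p95 should be tracked separately.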

  • Applications & use cases

    Real-Time Voice Agents:
    • Customer service voice bots with natural turn-taking behavior
    • Interactive voice response systems with reduced false cutoffs
    • Virtual assistants requiring responsive conversation flow
    • Healthcare voice agents for patient intake and appointment scheduling

    High-Concurrency Applications:
    • Contact center automation with concurrent voice agent sessions
    • Automated order-taking systems for restaurants and retail
    • Banking voice authentication and transaction processing
    • Insurance claims processing with multi-turn workflows

    Development Simplification:
    • Replaces a complex ASR + VAD + endpointing pipeline with a single model
    • Single API for speech recognition and turn management
    • Conversation-native events for voice agent state machines
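Conversation-native turn events map naturally onto a small agent state machine. The sketch below is hypothetical: the `Event` names are modeled on the concepts on this page (start of turn, eager end-of-turn, turn resumed, end of turn) and are not necessarily the names the Flux API emits. It shows speculative drafting on an eager end-of-turn, discarding the draft if the user resumes, and stopping playback on barge-in.

```python
from enum import Enum, auto
from typing import List, Optional


class Event(Enum):
    # Hypothetical turn events; the real Flux event names may differ.
    START_OF_TURN = auto()
    EAGER_END_OF_TURN = auto()
    TURN_RESUMED = auto()
    END_OF_TURN = auto()


class State(Enum):
    LISTENING = auto()
    DRAFTING = auto()  # speculative reply started on an eager end-of-turn
    SPEAKING = auto()


class Agent:
    def __init__(self) -> None:
        self.state = State.LISTENING
        self.draft_for: Optional[str] = None  # transcript the draft was built from
        self.log: List[str] = []

    def handle(self, event: Event, transcript: str = "") -> State:
        if event is Event.START_OF_TURN and self.state is State.SPEAKING:
            # Barge-in: the user talks over the agent, so stop playback.
            self.log.append("stop playback")
            self.state = State.LISTENING
        elif event is Event.EAGER_END_OF_TURN and self.state is State.LISTENING:
            # Start generating a reply early, before the turn is confirmed.
            self.log.append(f"draft reply to: {transcript}")
            self.draft_for = transcript
            self.state = State.DRAFTING
        elif event is Event.TURN_RESUMED and self.state is State.DRAFTING:
            # The user kept talking: throw the speculative draft away.
            self.log.append("discard draft")
            self.draft_for = None
            self.state = State.LISTENING
        elif event is Event.END_OF_TURN and self.state in (State.LISTENING, State.DRAFTING):
            if self.state is State.DRAFTING and transcript == self.draft_for:
                self.log.append("send draft")  # speculation paid off
            else:
                # No draft, or the final transcript changed: regenerate.
                self.log.append(f"generate reply to: {transcript}")
            self.draft_for = None
            self.state = State.SPEAKING
        return self.state

    def finished_speaking(self) -> None:
        # Hypothetical hook called when reply playback completes.
        if self.state is State.SPEAKING:
            self.state = State.LISTENING


agent = Agent()
agent.handle(Event.START_OF_TURN)
agent.handle(Event.EAGER_END_OF_TURN, "book a table for")
agent.handle(Event.TURN_RESUMED)
agent.handle(Event.EAGER_END_OF_TURN, "book a table for two")
agent.handle(Event.END_OF_TURN, "book a table for two")
agent.handle(Event.START_OF_TURN)  # user interrupts the reply
print(agent.log)
```

The draft is reused only when the final transcript matches the one it was built from; otherwise the agent regenerates, so a wrong eager guess costs wasted compute but never a wrong answer.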

Model details
  • Model provider
    Deepgram
  • Type
    Audio
  • Deployment
    On-Demand Dedicated
  • Price

    $0.0077/min + GPU hourly

  • Input modalities
    Audio
  • Output modalities
    Text
  • Category
    Transcribe
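As a rough illustration of the per-minute component of the Price entry above, the sketch below multiplies audio minutes by $0.0077, assuming billing is linear in audio minutes. It deliberately leaves out the GPU hourly component, whose rate is not given on this page; the call volume and the `audio_cost_usd` helper are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

PER_MINUTE_USD = Decimal("0.0077")  # per-minute rate from the Price entry


def audio_cost_usd(minutes: Decimal) -> Decimal:
    """Per-minute audio charge rounded to cents; excludes any GPU hourly charge."""
    return (minutes * PER_MINUTE_USD).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical month: 2,000 calls averaging 4.5 minutes each.
minutes = Decimal(2000) * Decimal("4.5")
print(minutes, audio_cost_usd(minutes))
```

For 9,000 minutes this gives $69.30 of per-minute charges; `Decimal` is used instead of floats so the cent rounding is exact.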