Orpheus TTS API

This model is not currently supported on Together AI.
Orpheus TTS is a family of speech LLMs built on Llama-3B and designed for human-level speech generation with natural emotion and intonation. Trained on more than 100,000 hours of English speech data, Orpheus is presented as evidence that open-source TTS can compete with, and even surpass, closed-source models in real-world quality.
Model details
Architecture Overview:
• Llama-3B backbone architecture adapted for speech-LLM applications
• Trained on 100k+ hours of English speech data and billions of text tokens
• SNAC audio tokenizer that emits 7 tokens per frame, generated and decoded as a single flattened sequence
• CNN-based detokenizer with a sliding-window modification that allows streaming without audible popping at chunk boundaries
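The flattened 7-token frames and the sliding-window decoding described above can be sketched as follows. This is a minimal illustration, not the model's actual decoder: the mapping of the 7 slots onto three codebook layers (`LAYER_OF_SLOT`), the window size, and both function names are assumptions made for the example, and any per-slot token-id offsets are ignored.

```python
from typing import Iterator, List, Sequence, Tuple

TOKENS_PER_FRAME = 7  # stated in the architecture overview

# ASSUMPTION: slot -> codebook layer (1 coarse, 2 mid, 4 fine codes per frame).
# The real layout may differ; this only illustrates the de-flattening idea.
LAYER_OF_SLOT = (0, 1, 2, 2, 1, 2, 2)


def unflatten_frames(tokens: Sequence[int]) -> Tuple[List[int], List[int], List[int]]:
    """Split a flattened token stream into three per-layer code lists."""
    # Ignore a trailing partial frame, which can occur mid-stream.
    usable = len(tokens) - len(tokens) % TOKENS_PER_FRAME
    layers: Tuple[List[int], List[int], List[int]] = ([], [], [])
    for i in range(usable):
        layers[LAYER_OF_SLOT[i % TOKENS_PER_FRAME]].append(tokens[i])
    return layers


def sliding_windows(
    tokens: Sequence[int], window_frames: int = 4, hop_frames: int = 1
) -> Iterator[List[int]]:
    """Yield overlapping windows of whole frames for incremental decoding.

    ASSUMPTION: a 4-frame window advanced by 1 frame. A streaming decoder
    would decode each window and keep only an interior slice of the audio,
    so samples near a window edge are never played back.
    """
    size = window_frames * TOKENS_PER_FRAME
    hop = hop_frames * TOKENS_PER_FRAME
    for start in range(0, len(tokens) - size + 1, hop):
        yield list(tokens[start:start + size])
```

Overlapping windows trade some redundant decoding for continuity: each emitted audio slice is produced with context on both sides, which is one plausible way to avoid discontinuities between chunks.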
Training Methodology:
• Pretrained on large-scale speech and text data to retain language understanding
• Text token training boosts TTS performance while preserving semantic reasoning ability
• Trained exclusively on permissive/non-copyrighted audio data
• Fine-tuned models available for production use with 8 distinct voices (tara, leah, jess, leo, dan, mia, zac, zoe)
• Supports custom fine-tuning with as few as 50 examples per speaker
Performance Characteristics:
• Handles disfluencies naturally without artifacts
• Streaming inference runs faster than real-time playback on an A100 40GB GPU for the 3B-parameter model
• vLLM implementation enables efficient GPU utilization
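"Faster than real-time" can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 24 kHz sample rate and 2048 samples per frame used in the usage note are assumed values, not figures stated on this page. Only the 7 tokens per frame comes from the architecture overview.

```python
def realtime_speedup(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of compute.

    A value above 1.0 means generation outpaces playback.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def tokens_per_second_for_real_time(
    sample_rate_hz: float,
    samples_per_frame: float,
    tokens_per_frame: int = 7,  # from the architecture overview
) -> float:
    """Token throughput needed to keep up with playback.

    ASSUMPTION: every frame of `tokens_per_frame` tokens decodes to a fixed
    `samples_per_frame` audio samples.
    """
    if sample_rate_hz <= 0 or samples_per_frame <= 0:
        raise ValueError("sample rate and frame size must be positive")
    frames_per_second = sample_rate_hz / samples_per_frame
    return frames_per_second * tokens_per_frame
```

Under the assumed values, `tokens_per_second_for_real_time(24_000, 2048)` gives roughly 82 tokens per second, so a server sustaining more than that per stream would stay ahead of playback.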
Prompting Orpheus TTS
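A minimal sketch of building a prompt for one of the eight fine-tuned voices listed under Training Methodology. The `"<voice>: <text>"` layout and the `build_prompt` helper are assumptions made for illustration. This page does not specify the prompt format, so check the model's own documentation before relying on it.

```python
# Voice names as listed for the fine-tuned models.
VOICES = ("tara", "leah", "jess", "leo", "dan", "mia", "zac", "zoe")


def build_prompt(text: str, voice: str = "tara") -> str:
    """Return a prompt string for the given voice.

    ASSUMPTION: the fine-tuned model expects "<voice>: <text>".
    """
    voice = voice.strip().lower()
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {VOICES}")
    # Collapse runs of whitespace so stray newlines do not reach the model.
    text = " ".join(text.split())
    if not text:
        raise ValueError("text must not be empty")
    return f"{voice}: {text}"
```

Validating the voice name up front keeps a typo from silently producing output in an unintended voice.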
Applications & Use Cases
Conversational AI & Virtual Assistants:
• Low-latency streaming enables natural conversational experiences
• Emotional intelligence and empathy expression for human-like interactions
• Multiple voice options for personalized assistant experiences
• Handles natural disfluencies and conversational patterns
Content Creation & Media:
• Audiobook narration with natural emotion and intonation
• Podcast generation with multiple speaker voices
• Video voiceovers with guided emotion control
• Character voices for gaming and animation
Enterprise & Production Applications:
• Contact center automation with empathetic customer service voices
• E-learning and training content with engaging narration
• Accessibility applications for text-to-speech needs
• Real-time translation and dubbing services
Creative Applications:
• Guided emotion and intonation for dramatic readings
• Role-playing and character voice generation
• Music and audio production with vocal synthesis
• Interactive storytelling with dynamic voice expressions
