Kokoro-82M API
Fast, cost-efficient TTS with quality comparable to larger models

This model is not currently supported on Together AI.
Model details
Architecture Overview:
• Based on StyleTTS 2 architecture with ISTFTNet vocoder
• Lightweight 82-million-parameter design optimized for efficiency
• Decoder-only architecture with no diffusion or encoder
• Uses the misaki G2P (grapheme-to-phoneme) library for text processing
• Fine-tuned from the StyleTTS2-LJSpeech base model
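For a rough sense of scale, the sketch below estimates the memory needed for the weights alone, given the 82-million-parameter figure above. The list of precisions and the bytes-per-parameter table are illustrative assumptions, not details of the model release, and the estimate ignores activations, voice data, and runtime overhead.

```python
# Back-of-the-envelope weight-memory estimate for an 82M-parameter model.
# Assumption: each parameter is stored at a single fixed precision.
PARAMS = 82_000_000

# Bytes per parameter for some common storage precisions (illustrative).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_megabytes(params: int, dtype: str) -> float:
    """Return the approximate weight size in megabytes (1 MB = 10**6 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e6


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_megabytes(PARAMS, dtype):.0f} MB")
```

Under these assumptions the weights come to roughly 328 MB at 32-bit precision and half that at 16-bit, which fits comfortably in the memory of a typical laptop.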
Training Methodology:
• Trained exclusively on permissive/non-copyrighted audio data and IPA phoneme labels
• v1.0: A few hundred hours of audio across 8 languages, with 54 voices
• v0.19: Fewer than 100 hours of audio for the initial English-only release, with 10 voices
• Total training cost: $1,000 for 1,000 hours of A100 80GB GPU time (about $1 per GPU-hour)
• Uses public domain audio, Apache/MIT licensed audio, and synthetic audio from large providers
• Includes CC BY licensed datasets: Koniwa tnc (<1h, CC BY 3.0) and SIWIS (<11h, CC BY 4.0)
Performance Characteristics:
• Delivers quality comparable to larger TTS models despite its compact 82M-parameter size
• Significantly faster inference than larger alternatives
• Deployed in numerous commercial APIs and production projects
Applications & Use Cases
Production API Services:
• Deployed in numerous commercial APIs at market-leading prices
• Cost-effective TTS for high-volume applications (under $1 per million characters)
• Ideal for startups and businesses needing affordable voice synthesis
• Apache-2.0 license permits commercial deployment, subject to the license's attribution and notice terms
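To make the "under $1 per million characters" figure concrete, here is a small sketch that turns a character count into an upper bound on synthesis cost. The function name, the default rate of exactly $1 per million characters, and the example text length are assumptions for illustration. Actual pricing varies by provider.

```python
# Upper-bound cost estimate for character-priced TTS.
# Assumption: billing is linear in characters at a flat rate per million.

def max_tts_cost_usd(num_chars: int, usd_per_million_chars: float = 1.0) -> float:
    """Return the cost ceiling in USD for synthesizing num_chars characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / 1_000_000 * usd_per_million_chars


if __name__ == "__main__":
    # Hypothetical example: a manuscript of 500,000 characters.
    chars = 500_000
    print(f"{chars:,} characters cost at most ${max_tts_cost_usd(chars):.2f}")
```

At the assumed ceiling, a 500,000-character text would cost no more than $0.50 to synthesize.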
Personal & Developer Projects:
• Lightweight 82M-parameter model suitable for local deployment
• Easy integration into applications via a simple Python API
• Perfect for indie developers and hobbyists
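A minimal sketch of what local use might look like, assuming the open-source `kokoro` Python package and its `KPipeline` class as described in the upstream project's README, plus `soundfile` for writing WAV files. None of this comes from this page: the voice name `af_heart`, the language code `a` (American English), the 24 kHz sample rate, and the `segment_path` helper are assumptions for illustration, so check the package documentation for the current interface.

```python
from pathlib import Path

# Assumption: Kokoro produces 24 kHz audio, per the upstream README.
SAMPLE_RATE = 24_000


def segment_path(out_dir: str, index: int) -> Path:
    """Hypothetical helper: build a zero-padded file name for one audio segment."""
    return Path(out_dir) / f"segment_{index:03d}.wav"


def synthesize(text: str, voice: str = "af_heart", lang_code: str = "a",
               out_dir: str = ".") -> list:
    """Synthesize text to one WAV file per generated segment; return the paths."""
    # Imported lazily so this module still loads when the optional
    # third-party packages are not installed.
    from kokoro import KPipeline
    import soundfile as sf

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    pipeline = KPipeline(lang_code=lang_code)
    paths = []
    # The pipeline yields (graphemes, phonemes, audio) for each text segment.
    for i, (_graphemes, _phonemes, audio) in enumerate(pipeline(text, voice=voice)):
        path = segment_path(out_dir, i)
        sf.write(str(path), audio, SAMPLE_RATE)
        paths.append(path)
    return paths
```

Calling `synthesize("Hello from Kokoro.", out_dir="out")` would, under these assumptions, write files such as `out/segment_000.wav`. Long inputs are split into several segments, hence one file per segment.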
Multilingual Applications:
• Support for 8 languages with 54 voices in v1.0
• International content creation and localization
• Cross-language accessibility solutions
• Global customer service automation
Content Creation:
• Audiobook narration with cost-efficient processing
• Podcast and video voiceovers
• E-learning content with multiple language support
• Social media and marketing content generation
Accessibility & Assistive Technology:
• Screen readers and text-to-speech assistive devices
• Educational tools for language learning
• Communication aids for speech-impaired users
• Document reading applications
