Kokoro-82M
Fast, cost-efficient TTS with quality comparable to larger models
About model
Kokoro is an ultra-lightweight TTS model with just 82 million parameters. Despite being dramatically smaller than competing models, it delivers comparable speech quality while being significantly faster and more cost-efficient. With Apache-2.0 licensing and a total training cost of about $1000, it is one of the most accessible production-grade TTS models available.
82M
Compact architecture, blazing-fast inference
$0.06
Market-rate API price per hour of audio output (under $1 per million input characters)
8 × 54
Languages × voices in the v1.0 release
- Extreme Efficiency: Quality matching larger models at a fraction of the computational cost
- Truly Open: Apache-2.0 licensed, so it can be deployed in production and personal projects alike under the license's terms
- Accessible Training: Total cost of about $1000 (1000 A100 GPU hours) makes it reproducible for the community
- Battle-Tested: Deployed in numerous commercial APIs and real-world production environments
API usage
Endpoint:
Model card
Architecture Overview:
• Based on StyleTTS 2 architecture with ISTFTNet vocoder
• 82 million parameter lightweight design optimized for efficiency
• Decoder-only architecture with no diffusion or encoder
• Uses misaki G2P (grapheme-to-phoneme) library for text processing
• Fine-tuned from StyleTTS2-LJSpeech base model
Training Methodology:
• Trained exclusively on permissive/non-copyrighted audio data and IPA phoneme labels
• v1.0: Few hundred hours of audio across 8 languages with 54 voices
• v0.19: Less than 100 hours for initial English-only release with 10 voices
• Total training cost: $1000 for 1000 hours of A100 80GB GPU time
• Uses public domain audio, Apache/MIT licensed audio, and synthetic audio from large providers
• Includes CC BY licensed datasets: Koniwa tnc (<1h, CC BY 3.0) and SIWIS (<11h, CC BY 4.0)
Performance Characteristics:
• Delivers comparable quality to larger TTS models despite compact 82M size
• Significantly faster inference than larger alternatives
• Deployed in numerous commercial APIs and production projects
Applications & use cases
Production API Services:
• Deployed in numerous commercial APIs at market-leading prices
• Cost-effective TTS for high-volume applications (under $1 per million characters)
• Ideal for startups and businesses needing affordable voice synthesis
• Apache-2.0 license permits commercial deployment
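To make the per-character pricing concrete, here is a minimal sketch of a cost estimate. It uses only the two prices quoted on this page: the market rate of under $1 per million characters (treated as an upper bound) and the listed price of $10.00 per million characters. The function name `tts_cost` and the 500,000-character workload are hypothetical, chosen purely for illustration.

```python
from decimal import Decimal

# Prices quoted on this page, in USD per one million input characters.
MARKET_RATE_PER_M = Decimal("1.00")    # "under $1 per million characters", used as an upper bound
LISTED_PRICE_PER_M = Decimal("10.00")  # price shown in this page's listing


def tts_cost(num_chars: int, price_per_million: Decimal) -> Decimal:
    """Return the cost in USD of synthesizing num_chars characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    cost = Decimal(num_chars) * price_per_million / Decimal(1_000_000)
    return cost.quantize(Decimal("0.0001"))


# Hypothetical workload: a 500,000-character audiobook manuscript.
book_chars = 500_000
print(tts_cost(book_chars, MARKET_RATE_PER_M))   # 0.5000
print(tts_cost(book_chars, LISTED_PRICE_PER_M))  # 5.0000
```

`Decimal` is used instead of floats so that small per-character amounts add up without rounding drift when costs are summed over many requests.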
Personal & Developer Projects:
• Lightweight 82M parameters suitable for local deployment
• Easy integration into applications via simple Python API
• Perfect for indie developers and hobbyists
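As a rough illustration of local use through the Python API, the sketch below follows the usage pattern published by the upstream `kokoro` package. None of these details appear on this page, so treat them as assumptions: the `KPipeline` class, the `lang_code="a"` setting (American English), the `af_heart` voice name, and the 24 kHz sample rate. The helpers `segment_path` and `synthesize` are hypothetical names introduced here. It assumes `kokoro` and `soundfile` are installed, and the first call downloads the model weights.

```python
from pathlib import Path

SAMPLE_RATE = 24_000  # Assumption: output sample rate in Hz, taken from upstream examples


def segment_path(out_dir: str, index: int) -> Path:
    """Hypothetical helper: file path for the index-th generated audio segment."""
    return Path(out_dir) / f"segment_{index:03d}.wav"


def synthesize(text: str, out_dir: str = "out", voice: str = "af_heart") -> list[Path]:
    """Synthesize text with Kokoro and write one WAV file per generated segment."""
    # Imported lazily so this module loads even without the optional dependencies.
    from kokoro import KPipeline
    import soundfile as sf

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    pipeline = KPipeline(lang_code="a")  # Assumption: "a" selects American English
    written = []
    # The pipeline yields (graphemes, phonemes, audio) for each text segment.
    for i, (_graphemes, _phonemes, audio) in enumerate(pipeline(text, voice=voice)):
        path = segment_path(out_dir, i)
        sf.write(str(path), audio, SAMPLE_RATE)
        written.append(path)
    return written
```

Calling `synthesize("Hello from Kokoro.")` would write files such as `out/segment_000.wav`; longer inputs are split by the pipeline into several segments, one file each.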
Multilingual Applications:
• Support for 8 languages with 54 voices in v1.0
• International content creation and localization
• Cross-language accessibility solutions
• Global customer service automation
Content Creation:
• Audiobook narration with cost-efficient processing
• Podcast and video voiceovers
• E-learning content with multiple language support
• Social media and marketing content generation
Accessibility & Assistive Technology:
• Screen readers and text-to-speech assistive devices
• Educational tools for language learning
• Communication aids for speech-impaired users
• Document reading applications
- Model provider: hexgrad
- Type: Audio
- Main use cases: Text-to-Speech
- Deployment: Serverless
- Endpoint
- Parameters: 82M
- Price: $10.00 / 1M characters
- Input modalities: Text
- Output modalities: Audio
- Released: December 25, 2024
- Last updated: November 2, 2025
- External link
- Category: Audio