Browser TTS Showdown: Kokoro vs Piper vs Kitten — 2026 Benchmark
The idea of running text-to-speech entirely in your browser used to sound like a novelty. In 2026, it’s a practical reality. Three engines have emerged as serious contenders: Kokoro TTS, Piper TTS, and Kitten TTS.
We ran them all through the same tests. Here’s what we found.
The Three Contenders
Before numbers, here’s what each engine brings to the table.
Kokoro TTS (82MB)
Built on the StyleTTS 2 architecture, Kokoro packs 82 million parameters into an ONNX model that runs in the browser via WebGPU or WebAssembly. It ships with 21 premium voices trained for American and British English, plus 33 additional voices across 7 more languages.
Kokoro powers OfflineTTS and is one of the engines available in TTS Studio, two of the more visible browser-based TTS projects.
Piper TTS (75MB)
A veteran in the local TTS space. Piper uses a VITS neural architecture trained on the LibriTTS dataset. Its killer feature is voice count: 904 speakers available. Piper has been the go-to for Home Assistant integrations and Raspberry Pi projects for years.
Kitten TTS (24MB)
The lightweight option. At just 24MB with 15 million parameters, Kitten TTS is designed for environments where every megabyte matters — embedded devices, mobile browsers, or hardware with tight memory budgets.
Benchmark Results
Audio Quality
Quality is subjective, but a Mean Opinion Score (MOS), the average of listener ratings on a 1 to 5 scale, gives us a starting point. The MOS ranges below are estimates. We also ran informal A/B tests with 20 listeners on a mix of narrative, conversational, and technical text.
| Engine | MOS (est.) | Naturalness | Expressiveness | Consistency |
|---|---|---|---|---|
| Kokoro TTS | 4.3–4.5 | High | High | Very High |
| Piper TTS | 3.8–4.0 | Good | Moderate | High |
| Kitten TTS | 3.2–3.5 | Acceptable | Low | Moderate |
What listeners said:
- Kokoro was consistently picked as “sounds like a real person.” Listeners noted natural intonation, appropriate pauses, and smooth prosody even on complex sentences.
- Piper produced clear, intelligible speech but occasionally sounded robotic on longer passages. Listeners described it as “functional” and “clean.”
- Kitten was described as “understandable but noticeably synthetic.” Fine for short notifications, less suited for narration.
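For readers who want to run their own listening test, the sketch below shows how a MOS is usually computed: the arithmetic mean of 1 to 5 ratings, here with an approximate 95% confidence interval based on the normal approximation. The function name `summarizeRatings` and the sample ratings are hypothetical placeholders, not the data behind the table above.

```typescript
// Hypothetical helper: summarizes listener ratings on the 1-5 opinion scale.
interface MosSummary {
  mos: number;           // arithmetic mean of all ratings
  ci95HalfWidth: number; // half-width of an approximate 95% confidence interval
  count: number;         // number of ratings
}

function summarizeRatings(ratings: number[]): MosSummary {
  if (ratings.length < 2) {
    throw new Error("Need at least two ratings");
  }
  for (const r of ratings) {
    if (r < 1 || r > 5) throw new Error(`Rating out of range: ${r}`);
  }
  const n = ratings.length;
  const mean = ratings.reduce((sum, r) => sum + r, 0) / n;
  // Sample variance (divides by n - 1).
  const variance = ratings.reduce((sum, r) => sum + (r - mean) ** 2, 0) / (n - 1);
  // Normal approximation; with very few listeners a t-based interval would be wider.
  const ci95HalfWidth = 1.96 * Math.sqrt(variance / n);
  return { mos: mean, ci95HalfWidth, count: n };
}

// Placeholder ratings for illustration only; NOT data from this benchmark.
const exampleRatings = [5, 4, 4, 5, 4, 3, 5, 4];
const summary = summarizeRatings(exampleRatings);
console.log(
  `MOS ${summary.mos.toFixed(2)} +/- ${summary.ci95HalfWidth.toFixed(2)} (n=${summary.count})`,
);
```

Reporting the interval alongside the mean makes it easier to judge whether two engines really differ or whether their ranges overlap.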
Generation Speed
We tested on a mid-2025 MacBook Pro (M4, 16GB RAM) with Chrome 134.
| Engine | WebGPU* | WASM* | Notes |
|---|---|---|---|
| Kokoro TTS | ~1.5–2x realtime | ~0.8–1x realtime | 1.0x = native speed |
| Piper TTS | N/A (WASM only) | ~3–5x realtime | Extremely fast |
| Kitten TTS | ~2–3x realtime | ~1.5–2x realtime | Very fast |
*Realtime factor: how fast audio is generated compared to playback duration. 2x realtime = 10 seconds of audio generated in 5 seconds.
Key finding: Piper is the speed king on CPU at roughly 3–5x realtime. Kokoro with WebGPU narrows the gap to about 1.5–2x realtime but still trails, and it requires GPU support. Kitten is fast everywhere due to its tiny model.
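The realtime factor defined above is simple arithmetic, and it can be reproduced for any engine that returns raw samples. This is a minimal sketch; the function names and the 24,000 Hz sample rate in the example are assumptions for illustration, not values taken from a specific engine's API.

```typescript
// Realtime factor = seconds of audio produced / seconds spent generating it.
function realtimeFactor(audioSeconds: number, generationSeconds: number): number {
  if (generationSeconds <= 0) {
    throw new Error("generationSeconds must be positive");
  }
  return audioSeconds / generationSeconds;
}

// Audio duration follows from the number of samples and the sample rate.
function audioDurationSeconds(sampleCount: number, sampleRateHz: number): number {
  if (sampleRateHz <= 0) {
    throw new Error("sampleRateHz must be positive");
  }
  return sampleCount / sampleRateHz;
}

// The example from the footnote: 10 seconds of audio generated in 5 seconds.
const audioSeconds = audioDurationSeconds(240_000, 24_000); // assumed sample rate
const factor = realtimeFactor(audioSeconds, 5);
console.log(`${factor}x realtime`);
```

Anything above 1x means generation outpaces playback, so audio can be streamed to the listener without stalls.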
Model Size & Loading
| Engine | Model Size | First Load | Cached Load |
|---|---|---|---|
| Kokoro TTS | 82MB | 3–5s (WebGPU setup) | <1s |
| Piper TTS | 75MB + config | 2–3s | <1s |
| Kitten TTS | 24MB | 1–2s | <0.5s |
Kitten wins on footprint. Kokoro and Piper are comparable in download size, and Kokoro’s added features (WebGPU, sample rate control, multilingual voices) justify the extra 7MB or so.
Voice Diversity
| Engine | Voices | Languages | Voice Selection Method |
|---|---|---|---|
| Kokoro TTS | 54 | 9 | Hand-picked, quality-graded |
| Piper TTS | 904 | 1 (English) | Dataset-derived |
| Kitten TTS | 8 | 1 (English) | Expression-based embeddings |
Piper’s 904 voices look impressive on paper. In practice, many sound similar because they’re drawn from the same LibriTTS dataset. The quality varies significantly between speakers.
Kokoro’s 54 voices are curated. Each is quality-graded (A through D), so you know what you’re getting before you generate.
Feature Comparison
| Feature | Kokoro | Piper | Kitten |
|---|---|---|---|
| WebGPU acceleration | ✅ | ❌ | ✅ |
| WASM fallback | ✅ | ✅ | ✅ |
| Sample rate control | ✅ (8–48kHz) | ❌ (22kHz fixed) | ✅ (8–48kHz) |
| Speed control | ✅ | ✅ | ✅ |
| Voice preview | ✅ | ✅ | ✅ |
| Multi-language | ✅ (9) | ❌ | ❌ |
| Offline capable | ✅ | ✅ | ✅ |
| MP3/WAV export | ✅ | ✅ | ✅ |
| Open-source license | Apache 2.0 | MIT | Apache 2.0 |
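The WebGPU and WASM rows in the table above translate into a small piece of feature detection. The sketch below is one way to do it, assuming the standard `navigator.gpu.requestAdapter()` call from the WebGPU API. The `pickBackend` function, the `NavigatorLike` type, and the capability map are hypothetical names introduced here; none of them come from any of the three engines.

```typescript
type Engine = "kokoro" | "piper" | "kitten";
type Backend = "webgpu" | "wasm";

// WebGPU support per engine, copied from the feature table in this article.
const SUPPORTS_WEBGPU: Record<Engine, boolean> = {
  kokoro: true,
  piper: false, // WASM only
  kitten: true,
};

// Minimal structural type so the sketch compiles without WebGPU type packages.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<object | null> };
}

// Chooses WebGPU only when the engine supports it AND an adapter is available.
async function pickBackend(engine: Engine, nav?: NavigatorLike): Promise<Backend> {
  if (!SUPPORTS_WEBGPU[engine] || !nav?.gpu) {
    return "wasm";
  }
  try {
    // requestAdapter() resolves to null when no suitable GPU is found.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm"; // treat any adapter error as "no usable GPU"
  }
}

// In a browser this could be called as:
//   const backend = await pickBackend("kokoro", navigator as NavigatorLike);
```

Passing the navigator object in as a parameter keeps the function easy to test outside a browser.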
Real-World Use Cases
For Content Creators (YouTube, Podcasts, Audiobooks)
Winner: Kokoro
Naturalness matters most when your audience is listening for minutes, not seconds. Kokoro’s Grade A voices hold up under sustained listening. Piper works but sounds more mechanical on longer passages.
For Developers Building Voice Apps
Winner: Depends on needs
- Need maximum voice variety? Piper — 904 voices give users lots of options.
- Need GPU acceleration and multi-language? Kokoro — WebGPU + 9 languages is hard to beat.
- Need minimal footprint? Kitten — 24MB runs on anything.
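The three rules of thumb above can be written down as a filter-then-rank step. This is a rough sketch, not a recommendation engine: the figures are copied from the tables in this article, while `rankEngines` and the `Requirements` shape are hypothetical names made up for the example.

```typescript
type Engine = "kokoro" | "piper" | "kitten";

interface EngineSpec {
  sizeMB: number;
  voices: number;
  languages: number;
  webgpu: boolean;
}

// Figures copied from the benchmark tables in this article.
const ENGINES: Record<Engine, EngineSpec> = {
  kokoro: { sizeMB: 82, voices: 54, languages: 9, webgpu: true },
  piper: { sizeMB: 75, voices: 904, languages: 1, webgpu: false },
  kitten: { sizeMB: 24, voices: 8, languages: 1, webgpu: true },
};

// Hypothetical requirements shape: hard constraints plus one ranking preference.
interface Requirements {
  minLanguages?: number;
  maxSizeMB?: number;
  needsWebGpu?: boolean;
  prefer: "voices" | "size";
}

// Drops engines that miss a hard constraint, then sorts the rest by preference.
function rankEngines(req: Requirements): Engine[] {
  const names = Object.keys(ENGINES) as Engine[];
  return names
    .filter((name) => {
      const spec = ENGINES[name];
      if (req.minLanguages !== undefined && spec.languages < req.minLanguages) return false;
      if (req.maxSizeMB !== undefined && spec.sizeMB > req.maxSizeMB) return false;
      if (req.needsWebGpu && !spec.webgpu) return false;
      return true;
    })
    .sort((a, b) =>
      req.prefer === "voices"
        ? ENGINES[b].voices - ENGINES[a].voices // most voices first
        : ENGINES[a].sizeMB - ENGINES[b].sizeMB, // smallest download first
    );
}

console.log(rankEngines({ prefer: "voices" }));
console.log(rankEngines({ needsWebGpu: true, minLanguages: 2, prefer: "voices" }));
console.log(rankEngines({ prefer: "size" }));
```

An empty result means no engine meets the hard constraints, which is a useful signal to relax one of them.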
For Privacy-First and Offline Applications
Winner: Kokoro or Piper
Both are fully offline capable. Kokoro edges ahead if you need multi-language support — 9 languages offline is significant for international privacy tools.
For Embedded and IoT Devices
Winner: Kitten or Piper
Kitten’s 24MB footprint makes it viable on a Raspberry Pi Zero or similar embedded hardware. Piper’s CPU-optimized VITS architecture is the established choice for Home Assistant and similar projects.
The TTS Studio Perspective
The TTS Studio project is notable because it integrates all three engines in a single interface. Its approach of letting users switch between Kokoro, Piper, and Kitten without leaving the page illustrates something important:
There is no single “best” TTS engine. There’s the best one for your use case.
TTS Studio’s architecture is instructive: one model loads at a time, cached after first use. This is the same pattern OfflineTTS uses with Kokoro. The “load on demand, cache aggressively” pattern is becoming the standard for browser-based ML.
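The "load on demand, cache aggressively" pattern can be sketched in a few lines. This is an illustration of the general idea only and does not reflect how TTS Studio or OfflineTTS are actually implemented. The `ModelSwitcher` class and both callbacks are hypothetical, and the in-memory `Map` stands in for whatever persistent browser storage a real app would use.

```typescript
// Hypothetical callbacks: one downloads model bytes, one builds a runnable model.
type FetchBytes = (engineId: string) => Promise<Uint8Array>;
type CreateModel<M> = (bytes: Uint8Array) => Promise<M>;

class ModelSwitcher<M> {
  private readonly fetchBytes: FetchBytes;
  private readonly createModel: CreateModel<M>;
  // Downloaded bytes are kept per engine so each model is fetched at most once.
  private readonly downloads = new Map<string, Promise<Uint8Array>>();
  // Only one instantiated model is held at a time.
  private active: { id: string; model: M } | null = null;

  constructor(fetchBytes: FetchBytes, createModel: CreateModel<M>) {
    this.fetchBytes = fetchBytes;
    this.createModel = createModel;
  }

  async use(engineId: string): Promise<M> {
    if (this.active && this.active.id === engineId) {
      return this.active.model; // already loaded, nothing to do
    }
    let download = this.downloads.get(engineId);
    if (!download) {
      download = this.fetchBytes(engineId); // load on demand
      this.downloads.set(engineId, download); // cache for later switches
    }
    let bytes: Uint8Array;
    try {
      bytes = await download;
    } catch (err) {
      this.downloads.delete(engineId); // do not cache a failed download
      throw err;
    }
    const model = await this.createModel(bytes);
    this.active = { id: engineId, model }; // the previous model can now be freed
    return model;
  }
}
```

Switching back to an engine that was used earlier skips the download and only pays the cost of rebuilding the model, which matches the gap between first-load and cached-load times.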
Our Take
For most users visiting this site, Kokoro TTS is the right choice. Here’s why:
- Quality — consistently natural, even on long-form text
- Languages — 9 of them, all offline
- Hardware — WebGPU when available, WASM when not
- Voices — 54 curated and quality-graded
That said, the ecosystem is better for having all three. Piper pushed the boundary on speed and voice count. Kitten proved that useful TTS can fit in 24MB. Kokoro showed that browser-based quality can rival cloud services.
Try It Yourself
OfflineTTS runs Kokoro TTS in your browser — free, offline, no signup.