Piper TTS
Free AI text to speech with Piper TTS. 25 curated voices from a 904-speaker dataset, ~75MB model, WASM. Runs entirely in your browser, offline and private.
TTS works best on desktop
Audio generation uses WebGPU/WASM. Desktop Chrome or Edge gives the most reliable result.
About Piper TTS
Piper TTS is one of the most established open-source TTS engines, widely used in Home Assistant, accessibility tools, and edge computing projects. It uses the VITS neural architecture trained on the LibriTTS dataset.
The full dataset contains 904 English speakers. We curate 25 of the most distinctive and useful voices for the browser interface, ranging from warm narrators to professional presenters.
Piper is optimized for CPU inference via WebAssembly and generates audio 3-5x faster than realtime on standard hardware, so a minute of speech takes roughly 12 to 20 seconds to produce. Its output sample rate is fixed at 22.05kHz, and because it runs without WebGPU it works in any modern browser with WebAssembly support.
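To make the 3-5x figure concrete, here is a small Python sketch of the arithmetic. It is only an estimate under the speed range quoted above; the function names and the chapter lengths are hypothetical and not part of Piper or this site.

```python
def generation_seconds(audio_seconds: float, speedup: float) -> float:
    """Seconds needed to synthesize `audio_seconds` of speech at `speedup` x realtime."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def generation_range(
    audio_seconds: float, slow_speedup: float = 3.0, fast_speedup: float = 5.0
) -> tuple[float, float]:
    """Return (best_case, worst_case) seconds for the quoted 3-5x speed range."""
    return (
        generation_seconds(audio_seconds, fast_speedup),
        generation_seconds(audio_seconds, slow_speedup),
    )


# Hypothetical batch: three chapters with these spoken lengths in minutes.
chapter_minutes = [12, 18, 30]
total_audio_seconds = sum(chapter_minutes) * 60

best, worst = generation_range(total_audio_seconds)
print(f"Audio length: {total_audio_seconds / 60:.0f} min")
print(f"Estimated generation time: {best / 60:.0f} to {worst / 60:.0f} min")
```

With these assumed chapter lengths, an hour of finished audio works out to about 12 to 20 minutes of generation. Actual speed depends on your device and browser.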
Getting Started with Piper TTS
1. Download the Model
Piper's model is ~75MB, a one-time download that is then cached in your browser.
2. Browse 25 Curated Voices
Each voice has a distinct vocal character — warm narrators, professional presenters, conversational tones. Pick one that matches your content style.
3. Enter Your Text
Type or paste up to 50,000 characters of English text. Piper handles punctuation naturally — commas, periods, and question marks all create distinct speech patterns.
4. Generate at CPU Speed
Piper runs purely on CPU via WebAssembly — no GPU needed. It generates speech 3-5x faster than realtime, even on modest hardware.
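If your text is longer than the 50,000-character limit in step 3, one option is to split it at sentence boundaries and generate each piece separately. The sketch below is one possible approach, not how this page processes text internally; `split_text` and `MAX_CHARS` are hypothetical names.

```python
import re

MAX_CHARS = 50_000  # input limit stated in step 3


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `max_chars` characters.

    Breaks after sentence-ending punctuation (. ! ?) where possible, and
    falls back to a hard cut only for a single sentence longer than the limit.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""

    for sentence in sentences:
        if not sentence:
            continue
        # A single oversized sentence cannot fit anywhere: cut it by length.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence

    if current:
        chunks.append(current)
    return chunks


# Small limit used here only to make the behaviour visible.
for chunk in split_text("One. Two! Three?", max_chars=9):
    print(repr(chunk))
```

Splitting at sentence ends keeps the pauses that punctuation produces inside a chunk, instead of cutting a sentence in the middle.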
Tips for Piper TTS
No WebGPU required. Unlike Kokoro and Kitten, Piper runs entirely on WASM/CPU. This means it works in modern browsers even where WebGPU is unavailable or disabled, such as older versions of Safari.
Fastest CPU generation. Piper generates audio 3-5x faster than realtime on a standard CPU. If you need bulk generation, such as batch processing chapters or scripts, Piper is the fastest option.
Explore the full 904-voice dataset. The browser interface curates 25 of the best voices, but the full LibriTTS dataset has 904 speakers. If you need a specific voice character, the broader dataset may have what you need.
Fixed 22.05kHz sample rate. Piper outputs at 22.05kHz. This is fine for most use cases including podcasts and YouTube. If you need higher sample rates, use Kokoro (24kHz) or Kitten (configurable up to 48kHz).
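To see what the sample rate means in practice, the Python sketch below compares the three rates mentioned above. It assumes uncompressed 16-bit mono PCM, which is an assumption for illustration and not something this page specifies; the function names are hypothetical.

```python
def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


def pcm_bytes(
    seconds: float, sample_rate_hz: int, bits_per_sample: int = 16, channels: int = 1
) -> int:
    """Size of raw PCM audio data in bytes (assumes 16-bit mono by default)."""
    samples = int(seconds * sample_rate_hz)
    return samples * (bits_per_sample // 8) * channels


# Rates named in the tip above: Piper, Kokoro, and Kitten's maximum.
rates = {"Piper": 22_050, "Kokoro": 24_000, "Kitten (max)": 48_000}

for name, rate in rates.items():
    size_mb = pcm_bytes(60, rate) / 1_000_000
    print(
        f"{name}: {rate} Hz, frequencies up to {nyquist_hz(rate):.0f} Hz, "
        f"about {size_mb:.2f} MB per minute"
    )
```

At 22.05kHz the audio can carry frequencies up to about 11kHz, which covers speech well. Higher rates mainly add high-frequency detail and larger files.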