
Kokoro TTS

Free AI text to speech with Kokoro TTS. 54 voices across 9 languages, WebGPU + WASM, highest quality. Runs entirely in your browser — offline and private.

Private generation · WAV + MP3 export · Default TTS workspace

Switch tool (TTS + STT):

🎙️ Kokoro TTS: 54 voices · Best quality
🐱 Kitten TTS: 8 voices · Lightest
🔊 Piper TTS: 25 voices · Fastest CPU
🎤 Supertonic TTS: 5 languages · Local
📝 Whisper STT: 99 langs · captions

Model size: 305-326 MB (q4 / fp32)
Voices: 54 across 9 languages
Parameters: 82M (StyleTTS 2)
Backends: WebGPU with WASM fallback (GPU + CPU)

TTS works best on desktop

You can still try lightweight engines on mobile, but desktop Chrome or Edge remains the most reliable setup for large model downloads and long-form generation.

About Kokoro TTS

Kokoro TTS is the flagship engine on OfflineTTS, offering 54 voices across 9 languages, including English (American and British), Japanese, Chinese, Spanish, French, Hindi, Italian, and Portuguese. Powered by an 82M-parameter StyleTTS 2 model with ISTFTNet, it delivers the highest-quality speech synthesis of the engines offered here.

English voices work fully offline after the initial model download. Non-English voices use a lightweight text-to-phoneme step before local audio synthesis, so the browser still handles waveform generation on your device.

It supports two model sizes: q4 (~305MB, recommended) and fp32 (~326MB, full precision). q8 quantization is not available as it produces garbled audio with this model.
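To put the two download sizes in perspective, here is a rough sketch of the one-time download cost. The sizes are the approximate figures quoted above; the function name and the link speed are illustrative assumptions, and real downloads also depend on caching and protocol overhead.

```typescript
// Approximate model sizes in megabytes, as quoted in the text above.
type Variant = "q4" | "fp32";

const MODEL_SIZE_MB: Record<Variant, number> = {
  q4: 305, // recommended
  fp32: 326, // full precision
};

// Estimate the one-time download duration in seconds.
// Assumes 1 MB = 8 megabits and ignores overhead (hypothetical helper).
function estimateDownloadSeconds(variant: Variant, linkMbps: number): number {
  if (linkMbps <= 0) throw new RangeError("linkMbps must be positive");
  return (MODEL_SIZE_MB[variant] * 8) / linkMbps;
}
```

On a 50 Mbit/s connection this works out to roughly 49 seconds for q4 and 52 seconds for fp32, so the size difference between the two variants is small in practice.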

Compare engines: Kitten TTS (8 voices, 24MB, lightest) · Piper TTS (25 voices, fastest CPU) · Supertonic TTS (5 languages, local inference)

Getting Started with Kokoro TTS

New to AI text to speech? Here's how to get the best results from Kokoro TTS in under two minutes.

1. Choose Your Model Size

Use q4 (~305MB) for the best quality/size balance, or choose fp32 (~326MB) for full precision. Both run on WebGPU or WASM.

2. Pick a Voice

Heart (A-rated) is the best all-rounder for English. Bella (A-rated) adds more expressiveness. Browse all 54 voices to find the tone that matches your project.

3. Write Your Script

Use proper punctuation — commas add pauses, periods create full stops, question marks raise pitch. Well-punctuated text produces the most natural speech.

4. Generate & Download

Click generate, wait for the audio to play, then download as WAV (lossless) or MP3 (compressed). WAV is recommended for further editing.

Tips for Best TTS Quality

Use WebGPU for speed. Chrome 113+ and Edge 113+ support WebGPU, which generates speech 3-5x faster than WASM. The tool auto-detects and uses the best available backend.
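How the tool performs its auto-detection is not documented here, but a minimal sketch of the usual approach looks like this: check for `navigator.gpu`, request an adapter, and fall back to WASM if either step fails. The `pickBackend` name and the `GpuCapableNavigator` type are assumptions made for illustration.

```typescript
type Backend = "webgpu" | "wasm";

// Minimal structural type so the sketch compiles without WebGPU typings.
interface GpuCapableNavigator {
  gpu?: { requestAdapter(): Promise<unknown> };
}

// Prefer WebGPU when an adapter is available; otherwise use WASM.
async function pickBackend(nav: GpuCapableNavigator): Promise<Backend> {
  if (!nav.gpu) return "wasm"; // browser does not expose WebGPU at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm"; // null means no usable GPU
  } catch {
    return "wasm"; // treat any failure as "WebGPU unavailable"
  }
}
```

In a page, the function would be called with the global `navigator` object. Requesting an adapter matters because a browser can expose `navigator.gpu` and still return no adapter on unsupported hardware.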

Punctuate properly. This is the single most important factor for natural-sounding speech. Commas, periods, question marks, and exclamation marks all create distinct prosodic effects.

Break long text into paragraphs. The tool handles up to 50,000 characters, but shorter paragraphs with clear punctuation produce better rhythm and pacing.
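If you prepare scripts programmatically, the following sketch shows one way to pre-split a long text on paragraph and sentence boundaries before pasting it in. The `chunkText` helper and the chunk size you pass to it are hypothetical; only the 50,000-character limit comes from the tool.

```typescript
const MAX_INPUT_CHARS = 50_000; // input limit quoted above

// Split text into paragraphs, then pack whole sentences into chunks of at
// most `maxLen` characters. Whitespace between sentences is normalized to a
// single space. A single sentence longer than `maxLen` is kept whole.
function chunkText(text: string, maxLen: number): string[] {
  const chunks: string[] = [];
  for (const para of text.split(/\n\s*\n/)) {
    const trimmed = para.trim();
    if (!trimmed) continue;
    // A sentence ends at ., ! or ? followed by whitespace, or at end of text,
    // so decimals such as "3.5" are not split.
    const sentences = trimmed.match(/\S[\s\S]*?(?:[.!?]+(?=\s)|$)/g) ?? [];
    let current = "";
    for (const sentence of sentences) {
      const candidate = current ? `${current} ${sentence}` : sentence;
      if (candidate.length <= maxLen || !current) {
        current = candidate;
      } else {
        chunks.push(current);
        current = sentence;
      }
    }
    if (current) chunks.push(current);
  }
  return chunks;
}
```

For example, `chunkText(script, 500)` yields pieces of a few sentences each that never cross a paragraph break, and each piece stays well under `MAX_INPUT_CHARS`.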

Try multiple voices. Different voices suit different content types. Heart excels at warm narration, Bella at energetic delivery, Michael at professional reviews.

Use WAV for production. WAV preserves full audio quality for editing. MP3 is fine for quick sharing, but use WAV if you plan to mix, master, or further process the audio.
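For readers curious why WAV is lossless, here is a minimal sketch of how raw samples become a 16-bit PCM WAV file: a fixed 44-byte header followed by the uncompressed sample data. This is not the tool's actual export code; `encodeWav` is a hypothetical helper, and the 24 kHz sample rate in the usage note is an assumption about the model's output.

```typescript
// Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const view = new DataView(new ArrayBuffer(44 + dataSize));

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] before scaling to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    view.setInt16(44 + i * bytesPerSample, Math.round(clamped * 32767), true);
  }
  return new Uint8Array(view.buffer);
}
```

Calling `encodeWav(samples, 24000)` returns bytes that can be wrapped in a `Blob` with type `audio/wav` for download. Because nothing beyond 16-bit quantization is discarded, repeated editing does not accumulate compression artifacts the way MP3 re-encoding does.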