Kokoro TTS
Free AI text-to-speech with Kokoro TTS: 54 voices across 9 languages, WebGPU and WASM backends, and top-tier voice quality. Runs entirely in your browser, offline and private.
TTS works best on desktop
Audio generation runs on WebGPU or WASM, so desktop Chrome or Edge gives the most reliable results.
About Kokoro TTS
Kokoro TTS is the flagship engine on OfflineTTS, offering 54 voices across 9 languages: American English, British English, Japanese, Mandarin Chinese, Spanish, French, Hindi, Italian, and Brazilian Portuguese. Powered by Kokoro-82M, an 82-million-parameter model based on the StyleTTS 2 and ISTFTNet architectures, it delivers some of the highest-quality speech synthesis available in a browser.
It supports multiple model sizes (q4 ~90MB, q8 ~300MB, fp32 ~600MB) and runs on both WebGPU and WASM backends, automatically selecting the fastest option for your device.
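The page does not publish its backend-selection code, but the usual pattern is a feature check with a fallback: use WebGPU only if the browser exposes `navigator.gpu` and `requestAdapter()` actually returns an adapter (it can resolve to `null` even when `navigator.gpu` exists), otherwise use WASM. The sketch below assumes that pattern; `detectDevice` and the small `GpuLike`/`NavigatorLike` interfaces are hypothetical names. The resulting string matches the `device` values (`"webgpu"`, `"wasm"`) accepted by the open-source kokoro-js package, if you load the model yourself.

```typescript
// Minimal shape of the WebGPU entry point we rely on (avoids needing @webgpu/types).
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type Device = "webgpu" | "wasm";

// Hypothetical helper: prefer WebGPU only when navigator.gpu exists AND an
// adapter is granted; otherwise fall back to WASM, which works in any modern browser.
async function detectDevice(
  nav: NavigatorLike | undefined = (globalThis as unknown as { navigator?: NavigatorLike }).navigator,
): Promise<Device> {
  if (!nav?.gpu) return "wasm";
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm"; // adapter request failed, e.g. an unsupported or blocklisted GPU
  }
}

detectDevice().then((device) => console.log(`Selected backend: ${device}`));
```

Passing the navigator in as a parameter keeps the function testable outside a browser while defaulting to the real global.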
Getting Started with Kokoro TTS
New to AI text to speech? Here's how to get the best results from Kokoro TTS in under two minutes.
1. Choose Your Model Size
Start with the q4 model (~90MB) for quick testing. Switch to q8 (~300MB) for production quality. The fp32 model (~600MB) delivers the highest quality but takes longer to download.
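If you automate this choice, one simple rule is to take the highest-quality variant whose download fits a size budget. This is a hypothetical sketch: `pickDtype` and the budget idea are not part of the tool, and the sizes are the approximate figures quoted on this page. The variant names match the `dtype` strings used by the kokoro-js package.

```typescript
type Dtype = "q4" | "q8" | "fp32";

// Approximate download sizes quoted on this page, ordered best quality first.
const MODEL_SIZES_MB: ReadonlyArray<[Dtype, number]> = [
  ["fp32", 600],
  ["q8", 300],
  ["q4", 90],
];

// Hypothetical helper: return the best variant that fits the budget,
// falling back to q4 (the smallest) if nothing fits.
function pickDtype(budgetMB: number): Dtype {
  for (const [dtype, sizeMB] of MODEL_SIZES_MB) {
    if (sizeMB <= budgetMB) return dtype;
  }
  return "q4";
}

console.log(pickDtype(100)); // q4
console.log(pickDtype(350)); // q8
console.log(pickDtype(1000)); // fp32
```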
2. Pick a Voice
Heart (A-rated) is the best all-rounder for English. Bella (A-rated) adds more expressiveness. Browse all 54 voices to find the tone that matches your project.
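Assuming the page exposes the upstream Kokoro v1.0 voice IDs (where Heart, Bella, and Michael appear as `af_heart`, `af_bella`, and `am_michael`), each ID encodes the language in its first letter and the gender in its second. A small parser makes that convention explicit when filtering the catalogue; `parseVoiceId` and `voicesFor` are hypothetical helpers.

```typescript
// First-letter language codes used in upstream Kokoro v1.0 voice IDs.
const LANGUAGES: Record<string, string> = {
  a: "American English",
  b: "British English",
  e: "Spanish",
  f: "French",
  h: "Hindi",
  i: "Italian",
  j: "Japanese",
  p: "Brazilian Portuguese",
  z: "Mandarin Chinese",
};

interface VoiceInfo {
  id: string;
  language: string;
  gender: "female" | "male";
  name: string;
}

// Split an ID such as "af_heart" into language, gender, and voice name.
function parseVoiceId(id: string): VoiceInfo {
  const match = /^([a-z])([fm])_(\w+)$/.exec(id);
  if (!match) throw new Error(`Unrecognized voice ID: ${id}`);
  const [, langCode, genderCode, name] = match;
  const language = LANGUAGES[langCode];
  if (!language) throw new Error(`Unknown language code: ${langCode}`);
  return { id, language, gender: genderCode === "f" ? "female" : "male", name };
}

// Keep only the voices for one language, e.g. to populate a voice picker.
function voicesFor(ids: string[], language: string): VoiceInfo[] {
  return ids.map(parseVoiceId).filter((voice) => voice.language === language);
}

console.log(parseVoiceId("am_michael").language); // American English
```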
3. Write Your Script
Use proper punctuation: commas add pauses, periods create full stops, and question marks raise pitch. Well-punctuated text produces the most natural speech.
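If your scripts come from notes, headings, or bullet points, lines often end without any punctuation, which gives the model no clear stop. A hypothetical pre-pass (not something the tool requires) can append a period to any such line before you paste the text in:

```typescript
// Closing punctuation, optionally followed by closing quotes or brackets.
const TERMINAL = /[.!?…:;]["'”’)\]]*$/;

// Hypothetical helper: append "." to every non-empty line that lacks
// closing punctuation. Trailing whitespace on each line is trimmed.
function ensureTerminalPunctuation(script: string): string {
  return script
    .split("\n")
    .map((line) => {
      const trimmed = line.trimEnd();
      if (trimmed === "" || TERMINAL.test(trimmed)) return trimmed;
      return `${trimmed}.`;
    })
    .join("\n");
}

console.log(ensureTerminalPunctuation("Chapter one\nIt was late. Was anyone awake?"));
// Chapter one.
// It was late. Was anyone awake?
```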
4. Generate & Download
Click generate, wait for the audio to play, then download as WAV (lossless) or MP3 (compressed). WAV is recommended for further editing.
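A WAV file is a 44-byte header followed by raw PCM samples, which is why it is easy to produce and edit. The sketch below assumes the generator yields mono `Float32Array` samples at 24 kHz (Kokoro's native output rate) and writes them as 16-bit PCM; the page's actual export code is not shown, and `encodeWav` is a hypothetical name. MP3 requires a separate encoder, so it is left out.

```typescript
// Hypothetical helper: encode mono float samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate = 24000): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  // RIFF header
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // total file size minus 8 bytes
  writeAscii(8, "WAVE");
  // "fmt " chunk: uncompressed PCM, one channel
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  // "data" chunk
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample, then scale it to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}
```

In a browser, wrapping the returned bytes in `new Blob([bytes], { type: "audio/wav" })` and passing that to `URL.createObjectURL` gives a link the user can download.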
Tips for Best TTS Quality
Use WebGPU for speed. Chrome 113+ and Edge 113+ support WebGPU on Windows, macOS, and ChromeOS, and it generates speech 3-5x faster than WASM. The tool auto-detects and uses the best available backend.
Punctuate properly. This is the single most important factor for natural-sounding speech. Commas, periods, question marks, and exclamation marks all create distinct prosodic effects.
Break long text into paragraphs. The tool handles up to 50,000 characters, but shorter paragraphs with clear punctuation produce better rhythm and pacing.
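The page does not document how it segments long input. One common approach, sketched here under that assumption, is to group whole sentences into chunks below a character limit and synthesize them one at a time. `chunkText` and its 500-character default are hypothetical.

```typescript
// Hypothetical helper: group whole sentences into chunks of at most maxChars.
function chunkText(text: string, maxChars = 500): string[] {
  // A "sentence" is a run ending in . ! or ? (plus closing quotes), or the unpunctuated tail.
  const sentences = text.match(/[^.!?]+[.!?]+["'”’)]*\s*|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    // Start a new chunk if adding this sentence would exceed the limit.
    if (current && (current + sentence).length > maxChars) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

console.log(chunkText("One. Two. Three.", 10)); // two chunks: "One. Two." and "Three."
```

A single sentence longer than the limit is kept whole rather than cut mid-sentence, trading a slightly oversized chunk for intact prosody.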
Try multiple voices. Different voices suit different content types: Heart excels at warm narration, Bella at energetic delivery, and Michael at professional-sounding reviews.
Use WAV for production. WAV preserves full audio quality for editing. MP3 is fine for quick sharing, but use WAV if you plan to mix, master, or further process the audio.