Browser TTS workspace

Kokoro TTS

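Free AI text to speech with Kokoro TTS: 54 voices across 9 languages, WebGPU with a WASM fallback, and the highest quality of the engines offered here. Audio is synthesized entirely in your browser and stays private; English works fully offline after the model download.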

Private generation · WAV + MP3 export · Default TTS workspace

305-326 MB model (q4 / fp32) · 54 voices (9 languages) · 82M params (StyleTTS 2) · WebGPU + WASM fallback (GPU + CPU)

TTS works best on desktop

Audio generation uses WebGPU/WASM. Desktop Chrome or Edge gives the most reliable result.

About Kokoro TTS

Kokoro TTS is the flagship engine on OfflineTTS, offering 54 voices across 9 languages: American English, British English, Japanese, Chinese, Spanish, French, Hindi, Italian, and Portuguese. Powered by an 82M-parameter StyleTTS 2 model with ISTFTNet, it delivers the highest-quality speech synthesis of the engines offered here.

English voices use kokoro-js's built-in phonemizer and work fully offline after the model download. For non-English languages, a lightweight server API converts text to IPA phonemes (using misaki for Japanese/Chinese and espeak-ng for others), then audio synthesis runs locally via WebGPU/WASM. The server receives only plain text and returns phoneme strings — no audio is sent, no data is stored.
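The routing described above can be sketched as a small lookup. This is a minimal illustration, not OfflineTTS's actual code: the function name `phonemizerRoute`, the `PhonemizerRoute` type, and the use of BCP 47-style language tags are all assumptions made for the example.

```typescript
// Hypothetical sketch: names and language tags are illustrative only.
type PhonemizerRoute =
  | { where: "local"; engine: "kokoro-js built-in" }
  | { where: "server"; engine: "misaki" | "espeak-ng" };

// Decide where text is converted to phonemes for a given language tag.
// Audio synthesis itself is local in every case; only phonemization differs.
function phonemizerRoute(lang: string): PhonemizerRoute {
  const base = lang.toLowerCase().split("-")[0];
  if (base === "en") {
    // English: phonemized in the browser, so no request leaves the device.
    return { where: "local", engine: "kokoro-js built-in" };
  }
  if (base === "ja" || base === "zh") {
    // Japanese and Chinese: plain text goes to the server, which uses misaki.
    return { where: "server", engine: "misaki" };
  }
  // Every other language: plain text goes to the server, which uses espeak-ng.
  return { where: "server", engine: "espeak-ng" };
}
```

For example, `phonemizerRoute("en-US")` yields a local route, while `phonemizerRoute("fr")` yields a server route handled by espeak-ng.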

Two model precisions are available: q4 (~305MB, recommended — best quality/size balance) and fp32 (~326MB, full precision). Note: q8 (8-bit dynamic quantization) is not available with this model as it produces garbled audio output.
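As a rough sketch of how a page might validate a precision setting, the snippet below keeps the two supported options in a table and falls back to the recommended q4 for anything else, including q8. The names `MODEL_SIZES_MB` and `resolvePrecision` are hypothetical, and the fallback behavior is an assumption for illustration; the sizes are the approximate figures quoted above.

```typescript
// Hypothetical sketch: sizes are the approximate download sizes quoted above.
type Precision = "q4" | "fp32";

const MODEL_SIZES_MB: Record<Precision, number> = {
  q4: 305, // recommended: best quality/size balance
  fp32: 326, // full precision
};

// Map a requested precision string to a supported one.
// Unsupported values such as "q8" fall back to the recommended q4.
function resolvePrecision(requested: string): Precision {
  if (requested === "q4" || requested === "fp32") {
    return requested;
  }
  return "q4";
}

// Extra download cost of choosing fp32 over q4, in MB.
const fp32ExtraMb = MODEL_SIZES_MB.fp32 - MODEL_SIZES_MB.q4;
```

With these figures the gap between the two options is about 21MB, so the choice is mostly about precision rather than download size.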

Compare engines: Kitten TTS (8 voices · Lightest) · Piper TTS (25 voices · Fastest CPU) · Supertonic TTS (5 languages · Local)

Getting Started with Kokoro TTS

Choose Your Model Size

Use q4 (~305MB) for the best quality/size balance. fp32 (~326MB) delivers full precision, but the quality improvement is marginal for most use cases.

Pick a Voice

Heart (A-rated) is the best all-rounder for English. Bella (A-rated) adds more expressiveness. Browse all 54 voices across 9 languages to find the tone that matches your project.
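If you use the underlying model outside this page, voices are addressed by short IDs rather than display names. The sketch below assumes the naming convention used by the published Kokoro voice list, where the first letter encodes the language, the second the gender, and the rest the name (for example `af_heart`, `af_bella`, `am_michael`). The helper `describeVoice` and its types are hypothetical, and the prefix table reflects that assumed convention rather than anything defined on this page.

```typescript
// Assumed convention: <language letter><f|m>_<name>, e.g. "af_heart".
const LANGUAGE_BY_PREFIX: Record<string, string> = {
  a: "American English",
  b: "British English",
  j: "Japanese",
  z: "Mandarin Chinese",
  e: "Spanish",
  f: "French",
  h: "Hindi",
  i: "Italian",
  p: "Brazilian Portuguese",
};

interface VoiceInfo {
  language: string;
  gender: "female" | "male";
  name: string;
}

// Parse a voice ID into its parts; returns null if the ID does not fit the pattern.
function describeVoice(id: string): VoiceInfo | null {
  const match = /^([a-z])([fm])_([a-z0-9]+)$/.exec(id);
  if (match === null) return null;
  const [, prefix = "", genderCode = "", name = ""] = match;
  const language = LANGUAGE_BY_PREFIX[prefix];
  if (language === undefined) return null;
  return { language, gender: genderCode === "f" ? "female" : "male", name };
}
```

The nine prefixes line up with the 9 languages counted above, with American and British English as separate entries.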

Write Your Script

Use proper punctuation — commas add pauses, periods create full stops, question marks raise pitch. Well-punctuated text produces the most natural speech.

Generate & Download

Click generate, then download as WAV (lossless, for editing) or MP3 (compressed, for sharing). All processing happens on your device.
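To show what a lossless WAV export involves, here is a minimal sketch that wraps mono floating-point samples in a 16-bit PCM WAV container. It is not this page's export code: the function name `encodeWav` is hypothetical, and the sample rate is left as a parameter (Kokoro output is assumed here to be 24 kHz mono).

```typescript
// Hypothetical sketch: encode mono samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + audio data
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // bytes per second
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    const scaled = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
    view.setInt16(44 + i * bytesPerSample, scaled, true);
  }
  return new Uint8Array(buffer);
}
```

In a browser, the returned bytes could be wrapped in a `Blob` with type `audio/wav` and offered as a download. MP3 export needs a separate encoder and is not covered by this sketch.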

Tips for Kokoro TTS

Use the right model size for your needs. q4 (~305MB) is the sweet spot, with the best quality-to-size ratio. fp32 (~326MB) offers full precision, though the audible improvement is marginal for most use cases.

English works fully offline. After the initial model download, English TTS never contacts any server. Non-English TTS sends only plain text for phonemization — audio synthesis stays local.

Use WebGPU for speed. Chrome and Edge support WebGPU, which generates speech 3-5x faster than WASM. The tool auto-detects and uses the best available backend.
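Backend auto-detection of this kind usually starts with a feature check on the browser's `navigator` object. The sketch below is an assumption about how such a check might look, not the tool's actual logic; `pickBackend` and `Backend` are hypothetical names, and the navigator is passed in as a parameter so the function can also run outside a browser.

```typescript
type Backend = "webgpu" | "wasm";

// Hypothetical sketch: prefer WebGPU when the navigator exposes a `gpu` entry point,
// otherwise fall back to WASM.
function pickBackend(nav: object | undefined): Backend {
  const gpu =
    nav !== undefined && "gpu" in nav
      ? (nav as { gpu?: unknown }).gpu
      : undefined;
  return gpu !== undefined && gpu !== null ? "webgpu" : "wasm";
}
```

A fuller check would also await `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter is available, and fall back to WASM in that case too.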

Try multiple voices for different content. Heart excels at warm narration, Bella at energetic delivery, Michael at professional reviews. Each of the 54 voices has a distinct character suited to different use cases.