Kitten TTS

Free AI text to speech with Kitten TTS. 8 expression-based voices, ~24MB model, WebGPU + WASM. Runs entirely in your browser — offline and private.

Private generation · WAV + MP3 export · 8 voices · Lightest

~24MB model (lightest) · 8 expression voices · 15M params (compact) · WebGPU + WASM (GPU + CPU)

TTS works best on desktop

Audio generation uses WebGPU/WASM. Desktop Chrome or Edge gives the most reliable results.
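As a rough illustration of how a page might choose between the two backends, here is a minimal sketch. The function name `pickBackend` and the GPU-first order are assumptions for illustration, not the OfflineTTS implementation. The only real browser API involved is the `gpu` property that WebGPU-capable browsers expose on `navigator`.

```typescript
type Backend = "webgpu" | "wasm";

// Hypothetical helper: prefer WebGPU when the browser exposes it,
// otherwise run on the CPU through WASM. It accepts any object so the
// sketch compiles without WebGPU type definitions.
function pickBackend(nav: object): Backend {
  const gpu = (nav as { gpu?: unknown }).gpu;
  // A production check would also await navigator.gpu.requestAdapter(),
  // which can resolve to null even when the property exists.
  return gpu !== undefined && gpu !== null ? "webgpu" : "wasm";
}
```

In a browser the argument would be the global `navigator` object. Passing a plain object keeps the function easy to test outside a browser.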

About Kitten TTS

Kitten TTS is the lightest engine available on OfflineTTS. At just 24MB, it downloads in seconds and runs on a wide range of devices, from desktop browsers to mobile phones and embedded hardware.

The 8 expression-based voices cover a range of tones: cheerful, serious, sad, whisper, excited, gentle, calm, and neutral. Each voice is a compact embedding that shapes the model's output character.

Kitten TTS supports configurable sample rates from 8kHz to 48kHz, making it versatile for different quality and performance needs.

Compare engines: Kokoro TTS (54 voices · Best quality) · Piper TTS (25 voices · Fastest CPU) · Supertonic TTS (5 languages · Local)

Getting Started with Kitten TTS

Instant Model Load

At just 24MB, Kitten TTS downloads in seconds — even on slow connections. The model caches in your browser for instant loading on return visits.
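To put "downloads in seconds" into numbers, the sketch below estimates transfer time from the 24MB figure stated on this page. The link speeds are illustrative assumptions, and the estimate ignores latency and protocol overhead, so real downloads will take somewhat longer.

```typescript
// Estimated transfer time in seconds for a file of `sizeMB` megabytes
// over a link of `mbps` megabits per second.
function downloadSeconds(sizeMB: number, mbps: number): number {
  if (mbps <= 0) throw new RangeError("mbps must be positive");
  return (sizeMB * 8) / mbps; // 8 bits per byte
}

const MODEL_MB = 24; // model size stated on this page

// Example link speeds (assumed, not measured).
for (const mbps of [5, 25, 100]) {
  console.log(`${mbps} Mbps: ~${downloadSeconds(MODEL_MB, mbps).toFixed(1)} s`);
}
```

By this estimate the first load takes well under a minute on a 5 Mbps connection and about two seconds at 100 Mbps.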

Choose an Expression

Select from 8 expressions: cheerful, serious, sad, whisper, excited, gentle, calm, or neutral. Each expression shapes the emotional character of the output.
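If you script against the expressions, it helps to model them as a closed set. The following is a minimal sketch. The expression names come from this page, but `SpeechRequest`, `isExpression`, and `auditionAll` are hypothetical names for illustration and are not part of any Kitten TTS API.

```typescript
// The eight expressions named on this page.
const EXPRESSIONS = [
  "cheerful", "serious", "sad", "whisper",
  "excited", "gentle", "calm", "neutral",
] as const;

type Expression = (typeof EXPRESSIONS)[number];

// Type guard for untrusted input such as a URL parameter or form value.
function isExpression(value: string): value is Expression {
  return (EXPRESSIONS as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical request shape (an assumption, not the real API).
interface SpeechRequest {
  text: string;
  expression: Expression;
}

// Build one request per expression, for auditioning the same text in every tone.
function auditionAll(text: string): SpeechRequest[] {
  return EXPRESSIONS.map((expression) => ({ text, expression }));
}
```

A union type like this lets the compiler reject a misspelled expression before any audio is generated.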

Set Your Sample Rate

Kitten supports 8kHz to 48kHz output. Use 8kHz or 16kHz for telephony, 22kHz for general use, or 48kHz for production quality. Lower rates generate faster.

Generate & Iterate

Click generate to hear your text spoken. Try different expressions for the same text to find the right tone. Download as WAV or MP3 when you're satisfied.
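For readers curious what a WAV export involves, here is a minimal sketch of wrapping raw samples in a standard 44-byte RIFF/WAVE header. It assumes mono audio, float samples in the range -1 to 1, and 16-bit PCM output. These are assumptions for illustration, and `encodeWav` is a hypothetical name, not the site's actual export code.

```typescript
// Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // channel count (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // bytes per second
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample, scale it to the signed 16-bit range, write little-endian.
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    view.setInt16(44 + i * bytesPerSample, Math.round(clamped * 32767), true);
  });

  return new Uint8Array(buffer);
}
```

In a browser, the returned bytes could be wrapped in a `Blob` with type `audio/wav` and offered for download through `URL.createObjectURL`.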

Tips for Kitten TTS

Use expressions creatively. The whisper expression suits ASMR-style content and intimate narration. The serious expression suits business and formal content. The cheerful expression works for welcome messages and children's content.

Adjust sample rate for your use case. 8kHz is fine for phone systems and IVR, 16kHz works for podcast-quality drafts, and 48kHz is for production audio. Lower rates generate faster and produce smaller files.
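The link between sample rate and file size is simple arithmetic for uncompressed audio. The sketch below assumes 16-bit mono PCM, which is an assumption about the WAV export rather than a documented fact. `pcmBytes` is a hypothetical helper name.

```typescript
// Size in bytes of uncompressed PCM audio, excluding the file header.
function pcmBytes(
  sampleRate: number,
  seconds: number,
  bytesPerSample: number = 2, // 16-bit samples (assumed)
  channels: number = 1, // mono (assumed)
): number {
  return Math.round(sampleRate * seconds) * bytesPerSample * channels;
}

// Compare one minute of audio at several common rates.
for (const rate of [8000, 16000, 22050, 48000]) {
  const kilobytes = pcmBytes(rate, 60) / 1000;
  console.log(`${rate} Hz: ${kilobytes.toFixed(0)} kB per minute`);
}
```

Under these assumptions, one minute at 48kHz is six times the size of the same minute at 8kHz. MP3 export compresses the audio, so those files will be smaller than this estimate.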

Great for prototyping. Because Kitten loads in seconds, it's the fastest way to test how your text sounds when spoken. Draft with Kitten, then produce with Kokoro for the highest quality.

Works on virtually any device. At 24MB, Kitten runs on low-end laptops, tablets, and even some mobile devices. If Kokoro or Piper feels too heavy, Kitten is your lightweight alternative.