About OfflineTTS

OfflineTTS is a free, privacy-first AI voice tool that runs in your browser. It offers both text to speech (powered by Kokoro TTS, Kitten TTS, Piper TTS, and Supertonic TTS) and speech to text (powered by OpenAI Whisper). Speech generation and transcription run in your browser; the one exception is non-English Kokoro text, which first goes through the lightweight phonemization path described below.

Our Mission

We believe voice AI should be accessible to everyone, with no API keys and no data collection. By running inference directly in the browser, we eliminate server costs and privacy concerns simultaneously. Whether you need text to speech or speech to text, the same principle applies: your data stays on your device.

How It Works

When you first visit a tool, it downloads the AI model and caches it in your browser. TTS models range from lightweight Kitten and Piper downloads to Kokoro quality variants and Supertonic's multi-file ONNX model stack. STT models range from ~40 MB (Whisper Tiny) to ~240 MB (Whisper Small). After the download, synthesis and transcription happen on your device using WebGPU (or WebAssembly as a fallback). Your audio data never leaves your computer.
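The WebGPU-first, WebAssembly-fallback choice described above can be sketched in TypeScript. This is a minimal illustration, not OfflineTTS's actual code: the `Backend` type, the `NavigatorLike` interface, and the `pickBackend` helper are hypothetical names, and the navigator object is passed in as a parameter so the function can also run outside a browser.

```typescript
// Hypothetical names for illustration; not taken from the OfflineTTS source.
type Backend = "webgpu" | "wasm";

// Minimal shape of the part of `navigator` that this check reads.
interface NavigatorLike {
  gpu?: unknown;
}

// Prefer WebGPU when the browser exposes `navigator.gpu`;
// otherwise fall back to WebAssembly.
function pickBackend(nav: NavigatorLike | undefined): Backend {
  if (nav !== undefined && nav.gpu !== undefined && nav.gpu !== null) {
    return "webgpu";
  }
  return "wasm";
}
```

In a browser, the argument would be the global `navigator`. A stricter check would also await `navigator.gpu.requestAdapter()`, which can resolve to `null` when no suitable adapter is available, and fall back to WebAssembly in that case too.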

Phonemization Architecture

Why server-side phonemization? Kokoro's browser library only supports English phonemization natively. For the other 7 languages (Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese), phonemization requires specialized tools: misaki for Japanese and Chinese, and espeak-ng for the rest. These are too large to bundle in a browser (~50 MB+), so we run them on a lightweight server.

What does the server do? The phonemization server converts your text into IPA phoneme strings, which amount to a few bytes of pronunciation data. It does not generate audio, store your text, or collect any personal information. The server uses minimal resources (~190 MB RAM, no GPU) and each request completes in ~10 ms.

Audio synthesis stays local. After receiving the phoneme string, all audio generation happens entirely in your browser via WebGPU/WASM. No audio data is ever sent to any server. English voices are fully offline after model download: they use kokoro-js's built-in phonemizer and never contact the server. Supertonic uses language-tagged text and runs its synthesis path locally after the model files are downloaded.
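The routing described in this section (English phonemized in the browser, the other seven Kokoro languages phonemized on the server) can be summarized as a small lookup table. The sketch below is an assumption-laden illustration rather than the site's real code: the two-letter language codes, the `PhonemizerRoute` type, and the `routePhonemization` function are all hypothetical, and no actual server endpoint or request format is implied.

```typescript
// Hypothetical types and names; the language codes are assumed ISO 639-1 tags.
type PhonemizerRoute =
  | { where: "browser"; tool: "kokoro-js" }
  | { where: "server"; tool: "misaki" | "espeak-ng" };

// Which phonemizer handles each Kokoro language, per the description above.
const KOKORO_ROUTES = new Map<string, PhonemizerRoute>([
  ["en", { where: "browser", tool: "kokoro-js" }], // built-in, fully offline
  ["ja", { where: "server", tool: "misaki" }],
  ["zh", { where: "server", tool: "misaki" }],
  ["es", { where: "server", tool: "espeak-ng" }],
  ["fr", { where: "server", tool: "espeak-ng" }],
  ["hi", { where: "server", tool: "espeak-ng" }],
  ["it", { where: "server", tool: "espeak-ng" }],
  ["pt", { where: "server", tool: "espeak-ng" }],
]);

// Look up where text in the given language would be phonemized.
// Throws for languages outside the eight listed above.
function routePhonemization(lang: string): PhonemizerRoute {
  const route = KOKORO_ROUTES.get(lang);
  if (route === undefined) {
    throw new Error(`Unsupported Kokoro language: ${lang}`);
  }
  return route;
}
```

Whatever the route, only the phonemization step differs; audio generation from the resulting phoneme string still runs in the browser.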

Technology

  • Kokoro TTS: 82M-parameter open-source TTS model (Apache 2.0)
  • Kitten TTS: lightweight TTS with expression-based voices
  • Piper TTS: fast CPU-optimized TTS with 25+ voices
  • Supertonic TTS: multilingual on-device TTS with 10 preset voice styles
  • Whisper STT: OpenAI's speech recognition model, 99 languages
  • browser-whisper: production-oriented Whisper wrapper with Web Workers and streaming
  • ONNX Runtime Web: browser-based ML inference via WebGPU/WASM
  • Astro: fast, SEO-optimized static site framework
  • Cloudflare Pages: edge CDN for global delivery

Open Source

OfflineTTS builds on browser-friendly AI projects including Kokoro, Kitten, Piper, Supertonic, and Whisper. OfflineTTS is built with love for the open-source community and privacy-conscious users everywhere.

Try It Now

4 TTS engines · Whisper STT · Browser-based · Free to use