Model size: 40-240 MB (tiny/base/small)
Languages supported: 99
Whisper parameters: 39M-244M
Backend: WebGPU with WASM fallback (GPU + CPU)

About Whisper STT

Whisper is OpenAI's state-of-the-art speech recognition model, now running entirely in your browser. It supports 99 languages with automatic language detection and produces highly accurate transcriptions with word-level timestamps.

Choose from three model sizes: Tiny (~40 MB, fastest), Base (~76 MB, good balance), or Small (~240 MB, best quality). The model automatically uses WebGPU when available, falling back to WASM for broad compatibility. Audio is processed in a background thread so the UI stays responsive.
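
The WebGPU-then-WASM fallback described above can be sketched roughly as follows. This is an illustration only, not this tool's actual source: the names `Backend`, `GpuLike`, and `pickBackend` are hypothetical. The one real browser API it relies on is `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable GPU adapter is available.

```typescript
// Hypothetical sketch; names here are illustrative, not taken from the tool.
type Backend = "webgpu" | "wasm";

// Minimal shape of `navigator.gpu` that this sketch depends on.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}

// Prefer WebGPU only when an adapter can actually be obtained;
// in every other case (no API, null adapter, thrown error) use WASM.
async function pickBackend(gpu: GpuLike | undefined): Promise<Backend> {
  if (!gpu) return "wasm"; // browser does not expose WebGPU at all
  try {
    const adapter = await gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm"; // adapter request failed, fall back rather than crash
  }
}
```

In a browser this might be called as `pickBackend((navigator as any).gpu)`. Checking for an adapter, rather than only for the presence of `navigator.gpu`, covers browsers that expose the API but cannot provide a usable GPU.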

Audio never leaves your browser — all processing happens locally. Export transcriptions as plain text, SRT subtitles, or WebVTT captions.
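
As a rough illustration of the SRT and WebVTT exports, the sketch below turns timestamped segments into both formats. The `Segment` shape and the function names are assumptions made for this example, not the tool's actual code. The two formats differ mainly in the millisecond separator (a comma in SRT, a period in WebVTT), the numbered cues in SRT, and the `WEBVTT` header line.

```typescript
// Hypothetical segment shape: start and end times in seconds.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to at least `width` digits.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as HH:MM:SS<sep>mmm, where <sep> is "," (SRT) or "." (WebVTT).
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + msSeparator + pad(ms, 3);
}

// SRT: numbered cues, comma before milliseconds, blank line between cues.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      (i + 1) + "\n" +
      formatTimestamp(seg.start, ",") + " --> " + formatTimestamp(seg.end, ",") + "\n" +
      seg.text.trim() + "\n")
    .join("\n");
}

// WebVTT: "WEBVTT" header, period before milliseconds, cue numbers omitted.
function toVtt(segments: Segment[]): string {
  const cues = segments.map((seg) =>
    formatTimestamp(seg.start, ".") + " --> " + formatTimestamp(seg.end, ".") + "\n" +
    seg.text.trim() + "\n");
  return "WEBVTT\n\n" + cues.join("\n");
}
```

Because everything stays in plain strings, the result could be offered as a download straight from the page, for example through a `Blob`, without the transcript leaving the browser.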

Try our TTS tools: Kokoro TTS (54 voices · Best quality) · Kitten TTS (8 voices · Lightest) · Piper TTS (25 voices · Fastest CPU)