40-240 MB model (tiny/base/small)
99 languages supported
39M-244M Whisper params
WebGPU + WASM fallback (GPU + CPU)
About Whisper STT
Whisper is OpenAI's state-of-the-art speech recognition model, now running entirely in your browser. It supports 99 languages with automatic language detection, and produces highly accurate transcriptions with word-level timestamps.
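Word-level timestamps are most useful once they are grouped into readable caption lines. The sketch below assumes the transcriber returns a list of words, each with a `text` and a `[start, end]` pair in seconds. Some Whisper runtimes use a shape like this, but this page does not specify one. `WordChunk`, `Cue` and `groupWords` are hypothetical names, and the 8-word and 5-second limits are arbitrary values chosen for illustration.

```typescript
// Hypothetical word chunk: text plus [start, end] in seconds.
// This shape is an assumption, not this tool's documented output.
interface WordChunk {
  text: string;
  timestamp: [number, number];
}

// Hypothetical caption cue produced by grouping words.
interface Cue {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Group consecutive words into cues. A new cue starts when the current one
// already holds maxWords words, or when adding the next word would make the
// cue span more than maxDuration seconds.
function groupWords(words: WordChunk[], maxWords = 8, maxDuration = 5): Cue[] {
  const cues: Cue[] = [];
  let current: WordChunk[] = [];

  // Emit the words collected so far as one cue, then reset.
  const flush = (): void => {
    if (current.length === 0) return;
    cues.push({
      start: current[0].timestamp[0],
      end: current[current.length - 1].timestamp[1],
      text: current.map((w) => w.text.trim()).join(" "),
    });
    current = [];
  };

  for (const word of words) {
    const tooManyWords = current.length >= maxWords;
    const tooLong =
      current.length > 0 && word.timestamp[1] - current[0].timestamp[0] > maxDuration;
    if (tooManyWords || tooLong) flush();
    current.push(word);
  }
  flush();
  return cues;
}
```

Splitting on both a word count and a duration keeps cues short enough to read even when the speaker talks quickly or pauses for a long time.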
Choose from three model sizes: Tiny (~40MB, fastest), Base (~76MB, good balance), or Small (~240MB, best quality). The model automatically uses WebGPU when available, falling back to WASM for broad compatibility. Audio is processed in a background thread so the UI stays responsive.
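Here is a minimal sketch of the WebGPU-first, WASM-fallback check. It relies only on the standard WebGPU entry point, `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable GPU adapter is available. `pickBackend` and the small `NavigatorLike` type are hypothetical and exist only so the snippet compiles without WebGPU type definitions. This page does not describe how the chosen backend is handed to the model runtime, and that step depends on the library used.

```typescript
type Backend = "webgpu" | "wasm";

// Hypothetical structural types so the sketch compiles without @webgpu/types.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Prefer WebGPU only if the API exists AND an adapter can actually be
// obtained; otherwise fall back to WASM, which runs on the CPU.
async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  if (!nav.gpu) return "wasm"; // WebGPU API not exposed at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm"; // null means no usable GPU
  } catch {
    return "wasm"; // treat any WebGPU error as "not available"
  }
}
```

In a page or worker you would call it as `await pickBackend(navigator as unknown as NavigatorLike)`. Checking for an adapter, not just for `navigator.gpu`, matters because a browser can expose the API yet still be unable to provide a GPU.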
Audio never leaves your browser: all processing happens locally. Export transcriptions as plain text, SRT subtitles, or WebVTT captions.
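The SRT and WebVTT formats differ mainly in their headers and timestamp punctuation. SRT numbers each cue and writes times as `HH:MM:SS,mmm`, while WebVTT starts with a `WEBVTT` line and writes `HH:MM:SS.mmm`. The sketch below assumes the transcription has already been split into cues with start and end times in seconds. `Cue`, `formatTimestamp`, `toSrt`, `toVtt` and `toPlainText` are hypothetical names, not this tool's actual code.

```typescript
// Hypothetical cue type (an assumption, not this tool's data model).
interface Cue {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Format seconds as HH:MM:SS plus milliseconds, using "," (SRT) or "." (WebVTT)
// before the millisecond part.
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width = 2): string => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}${msSeparator}${pad(ms, 3)}`;
}

// SRT: numbered cues separated by blank lines.
function toSrt(cues: Cue[]): string {
  return cues
    .map(
      (c, i) =>
        `${i + 1}\n${formatTimestamp(c.start, ",")} --> ${formatTimestamp(c.end, ",")}\n${c.text}\n`,
    )
    .join("\n");
}

// WebVTT: a "WEBVTT" header line, a blank line, then unnumbered cues.
function toVtt(cues: Cue[]): string {
  const body = cues
    .map((c) => `${formatTimestamp(c.start, ".")} --> ${formatTimestamp(c.end, ".")}\n${c.text}\n`)
    .join("\n");
  return `WEBVTT\n\n${body}`;
}

// Plain text: just the cue texts, one per line.
function toPlainText(cues: Cue[]): string {
  return cues.map((c) => c.text).join("\n");
}
```

Sharing one `formatTimestamp` helper keeps the two subtitle formats in sync, since only the millisecond separator differs.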
Try our TTS tools:
Kokoro TTS (54 voices · Best quality)
Kitten TTS (8 voices · Lightest)
Piper TTS (25 voices · Fastest CPU)