Browser STT workspace

Whisper Speech to Text

Upload audio or video, record from your microphone, or load a direct media URL. Transcribe privately in your browser and export TXT, SRT, or VTT.

Private transcription · Subtitle exports · Whisper models

Model size: 120-590 MB (tiny/base/small)

Languages: 99 supported

Exports: TXT, SRT, and WebVTT

Backend: WebGPU with WASM fallback (GPU + CPU)

STT works best on desktop

Speech recognition runs on WebGPU, with WASM as the fallback. Desktop Chrome or Edge gives the most reliable results.

About Whisper STT

Whisper is OpenAI's speech recognition model running directly in your browser. Its multilingual model supports 99 languages, and this tool provides a selector for commonly used languages. Use Fast timing for quick transcripts, or select Precise subtitles mode for word-level timestamps and synchronized word highlighting during playback.

Choose from three model sizes: Tiny (~120MB, fastest), Base (~210MB, good balance), or Small (~590MB, best quality). The tool automatically uses WebGPU when available, falling back to WASM for broad compatibility. Audio decoding, waveform analysis, and transcription run in the background, off the main interface thread, so the page stays responsive.
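The WebGPU-then-WASM fallback described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the names Backend, NavigatorLike, and pickBackend are hypothetical, and the navigator object is passed in as a parameter so the logic can run outside a browser.

```typescript
// Hypothetical names for illustration; not taken from the tool's source.
type Backend = "webgpu" | "wasm";

// Minimal shape of the parts of `navigator` this sketch relies on.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<object | null> };
}

async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  // navigator.gpu is undefined in browsers without WebGPU support.
  if (!nav.gpu) return "wasm";
  try {
    // requestAdapter() resolves to null when no suitable GPU adapter exists.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure during detection as "no WebGPU".
    return "wasm";
  }
}
```

In a browser, this would be called as pickBackend(navigator), assuming WebGPU type definitions are available to the compiler.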

Audio never leaves your browser. Review the result with the interactive waveform player, seek by clicking the transcript, and export plain text, readable SRT subtitles, or WebVTT captions.
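Synchronized highlighting and click-to-seek both come down to mapping a playback time to a transcript entry. One plausible way to do that is a binary search over word timestamps, sketched below. TimedWord and activeWordIndex are hypothetical names, and the sketch assumes words are sorted by start time and do not overlap.

```typescript
// Hypothetical shape for a word with timestamps in seconds.
interface TimedWord {
  text: string;
  start: number;
  end: number;
}

// Returns the index of the word whose [start, end) interval contains `t`,
// or -1 when `t` falls in a gap or outside the transcript.
// Assumes `words` is sorted by start time and intervals do not overlap.
function activeWordIndex(words: TimedWord[], t: number): number {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const w = words[mid];
    if (t < w.start) {
      hi = mid - 1;
    } else if (t >= w.end) {
      lo = mid + 1;
    } else {
      return mid;
    }
  }
  return -1;
}
```

Seeking works in the other direction: clicking a word would set the player's current time to that word's start value.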

Try our TTS tools: Kokoro TTS (54 voices · Best quality) · Kitten TTS (8 voices · Lightest) · Piper TTS (25 voices · Fastest CPU) · Supertonic TTS (31 languages · Local)

When a transcript needs more production detail

Move from a private transcript to a structured team workflow

Local Whisper is ideal when audio must stay on your device. For meetings, interviews, or subtitle pipelines that need speaker labels, precise timing, non-speech event tags, or an API, ElevenLabs Scribe is built for that job.

Recommended plan

Free or Starter

Free to test · Starter $6/month

Hosted transcription workflows that need diarization, audio tags, and an API.

What it adds

90+ transcription languages
Speaker diarization
Word-level timestamps and audio tags

Stay with OfflineTTS if privacy, offline transcription, and TXT, SRT, or VTT export cover the job.

Paid affiliate link: OfflineTTS is an independent ElevenLabs affiliate and may earn a commission.

Prices and offers checked July 26, 2026. Verify current terms before upgrading.

Try Scribe transcription

Getting Started with Whisper STT

Choose Model Size

Tiny (~120MB) for quick tests, Base (~210MB) for balanced speed and accuracy, Small (~590MB) for best quality. Download size is approximate and may vary by model variant.
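As a rough illustration of the size trade-off, the sketch below picks the largest model that fits a download budget. The sizes are the approximate figures quoted above; ModelOption, MODELS, and largestModelWithin are hypothetical names, not part of the tool.

```typescript
// Hypothetical names; sizes are the approximate figures from the text.
interface ModelOption {
  name: "tiny" | "base" | "small";
  approxMB: number;
}

const MODELS: ModelOption[] = [
  { name: "tiny", approxMB: 120 },
  { name: "base", approxMB: 210 },
  { name: "small", approxMB: 590 },
];

// Returns the largest model whose approximate download fits the budget,
// or null when even the smallest model is too large.
function largestModelWithin(budgetMB: number): ModelOption | null {
  let best: ModelOption | null = null;
  for (const m of MODELS) {
    if (m.approxMB <= budgetMB && (best === null || m.approxMB > best.approxMB)) {
      best = m;
    }
  }
  return best;
}
```

Because actual download sizes vary by model variant, a real budget check would want some headroom on top of these numbers.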

Upload or Record Audio

Upload an audio file (WAV, MP3, WebM, etc.) or record directly in the browser. The tool decodes audio in a background worker for smooth performance.
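Whisper models take 16 kHz mono audio as input, so decoded audio generally has to be downmixed and resampled before transcription. The sketch below shows one simple way to do both on raw sample arrays. It is an assumption about what a decoding step might look like, not the tool's implementation, and downmixToMono and resampleLinear are hypothetical helper names.

```typescript
const WHISPER_SAMPLE_RATE = 16000; // Whisper expects 16 kHz mono input.

// Averages all channels into one. Assumes every channel has the same length.
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const ch of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += ch[i] / channels.length;
    }
  }
  return mono;
}

// Resamples by linear interpolation between neighbouring input samples.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number = WHISPER_SAMPLE_RATE
): Float32Array {
  if (fromRate === toRate) return input.slice();
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}
```

Linear interpolation applies no low-pass filter, so downsampling this way can introduce aliasing. A production pipeline would more likely rely on the browser's own resampler or a filtered one.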

Transcribe

Choose Fast timing for a quick segment-level transcript or Precise subtitles mode for word-level timing. Streaming output shows draft segments while transcription progresses.
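To show how word-level timing relates to subtitle cues, the sketch below groups timestamped words into cues, starting a new cue when the line would get too long or when there is a pause. The names Word, Cue, and groupWordsIntoCues and the default limits (42 characters, 0.8 seconds) are illustrative assumptions, not values used by the tool.

```typescript
// Hypothetical shapes; times are in seconds.
interface Word {
  text: string;
  start: number;
  end: number;
}

interface Cue {
  text: string;
  start: number;
  end: number;
}

// Groups words into cues. A new cue starts when adding the next word would
// exceed maxChars, or when the silence before it is longer than maxGapSec.
function groupWordsIntoCues(words: Word[], maxChars = 42, maxGapSec = 0.8): Cue[] {
  const cues: Cue[] = [];
  let current: Cue | null = null;
  for (const w of words) {
    const fits =
      current !== null &&
      current.text.length + 1 + w.text.length <= maxChars &&
      w.start - current.end <= maxGapSec;
    if (current !== null && fits) {
      current.text += " " + w.text;
      current.end = w.end;
    } else {
      if (current !== null) cues.push(current);
      current = { text: w.text, start: w.start, end: w.end };
    }
  }
  if (current !== null) cues.push(current);
  return cues;
}
```

Segment-level output skips this step: each segment already carries one start and end time for a whole phrase.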

Export Results

Play the audio against the synchronized transcript, seek from the waveform or text, then download plain text, readable SRT subtitles, or WebVTT captions.
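The SRT and WebVTT exports differ mainly in their header and timestamp separator: SRT numbers each cue and writes milliseconds after a comma, while WebVTT starts with a WEBVTT line and uses a period. A minimal sketch of both formats is below. Segment, formatTimestamp, toSrt, and toVtt are hypothetical names, and the tool's real exporter may format text differently.

```typescript
// Hypothetical shape; times are in seconds.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Formats seconds as HH:MM:SS<sep>mmm, e.g. 00:01:02,500 (SRT) or 00:01:02.500 (WebVTT).
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${msSeparator}${pad(ms, 3)}`;
}

// SRT: numbered cues, comma before milliseconds, blank line between cues.
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${formatTimestamp(seg.start, ",")} --> ${formatTimestamp(seg.end, ",")}\n${seg.text.trim()}\n`
    )
    .join("\n");
}

// WebVTT: "WEBVTT" header, period before milliseconds, cue numbers optional.
function toVtt(segments: Segment[]): string {
  const cues = segments.map(
    (seg) =>
      `${formatTimestamp(seg.start, ".")} --> ${formatTimestamp(seg.end, ".")}\n${seg.text.trim()}\n`
  );
  return "WEBVTT\n\n" + cues.join("\n");
}
```

Plain-text export needs no timing at all: joining the trimmed segment texts with spaces or newlines is enough.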

Tips for Accurate Transcription

Use clear audio. Low background noise and clear speech produce the best results. If possible, use a good microphone and record in a quiet environment.

Choose the right model. Tiny works well for quick drafts. For production use such as subtitles, meeting notes, or accessibility work, use Small for the highest accuracy.

Select the spoken language. Matching the language selector to the recording helps Whisper decode names, punctuation, and multilingual speech more consistently.

Use WebGPU. In Chrome and Edge with WebGPU support, transcription runs significantly faster than on the WASM fallback. The tool auto-selects the best available backend.