TTS works best on desktop

Audio generation requires WebGPU or WASM, which may not work well on mobile browsers. For the best experience, use a desktop or laptop computer.

Languages: 5 (en, es, pt, fr, ko)
Styles: 10 (M1-M5 + F1-F5)
Params: ~99M (public ONNX assets)
GPU+CPU: WebGPU with WASM fallback

About Supertonic TTS

Supertonic 3 is an on-device multilingual text-to-speech model from Supertone. This browser integration exposes the currently supported language set: English, Spanish, Portuguese, French, and Korean.

The browser integration runs the public ONNX assets through ONNX Runtime Web. It loads separate duration predictor, text encoder, vector estimator, and vocoder models, then performs local denoising and waveform generation directly in the browser.
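The stage order described above can be outlined as a plain orchestration function. This is a hypothetical sketch, not the app's code: the `Stages` interface, its argument and return shapes, and the `latentSize` callback are assumptions standing in for the real ONNX Runtime Web sessions and tensors, which are not documented here.

```typescript
// Hypothetical stage signatures standing in for the four ONNX sessions.
// Real tensor names and shapes are NOT represented here.
interface Stages {
  // Assumed: predicts the utterance length in seconds.
  durationPredictor(tokens: number[], style: Float32Array): number;
  // Assumed: encodes tokenized text into a conditioning vector.
  textEncoder(tokens: number[], style: Float32Array): Float32Array;
  // Assumed: one denoising step that returns a refined latent.
  vectorEstimator(latent: Float32Array, cond: Float32Array, step: number, totalSteps: number): Float32Array;
  // Converts the final latent into waveform samples.
  vocoder(latent: Float32Array): Float32Array;
}

function synthesize(
  stages: Stages,
  tokens: number[],
  style: Float32Array,
  steps: number,
  latentSize: (durationSec: number) => number, // hypothetical duration-to-length mapping
  noise: () => number = Math.random, // placeholder; a real sampler would draw Gaussian noise
): Float32Array {
  const duration = stages.durationPredictor(tokens, style);
  const cond = stages.textEncoder(tokens, style);

  // Start from noise sized to the predicted duration.
  let latent = new Float32Array(latentSize(duration));
  for (let i = 0; i < latent.length; i++) latent[i] = noise();

  // Iterative local denoising: one estimator call per step.
  for (let step = 0; step < steps; step++) {
    latent = stages.vectorEstimator(latent, cond, step, steps);
  }
  return stages.vocoder(latent);
}
```

Only the loop over `steps` repeats a model call, which is why the step count is the main speed/quality knob in this outline.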

Supertonic uses language-tagged Unicode text rather than a server-side phonemization API, so synthesis stays local after the initial model and voice-style downloads. The asset base URL is configurable and defaults to Hugging Face, which makes it straightforward to move model delivery to a CDN or Cloudflare R2 later.
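Because the asset base URL is configurable, each model path can be resolved against whichever host is active. In this sketch the relative file paths and the `resolveAssets` / `joinUrl` helpers are hypothetical, and the real default Hugging Face URL is deliberately not reproduced.

```typescript
// Hypothetical relative paths; the real asset layout may differ.
const ASSET_PATHS = {
  durationPredictor: "onnx/duration_predictor.onnx",
  textEncoder: "onnx/text_encoder.onnx",
  vectorEstimator: "onnx/vector_estimator.onnx",
  vocoder: "onnx/vocoder.onnx",
} as const;

type AssetName = keyof typeof ASSET_PATHS;

// Join base and path with exactly one slash, so a trailing slash on the
// base URL (or a leading slash on the path) does not change the result.
function joinUrl(base: string, path: string): string {
  return `${base.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
}

// Resolve every model file against the configured host.
function resolveAssets(baseUrl: string): Record<AssetName, string> {
  const out = {} as Record<AssetName, string>;
  for (const name of Object.keys(ASSET_PATHS) as AssetName[]) {
    out[name] = joinUrl(baseUrl, ASSET_PATHS[name]);
  }
  return out;
}
```

Moving delivery from Hugging Face to a CDN then only changes the `baseUrl` string passed in.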

Compare engines: Kokoro TTS (54 voices · Best quality) · Kitten TTS (8 voices · Lightest) · Piper TTS (25 voices · Fastest CPU)

Getting Started with Supertonic TTS

1. Load the ONNX Models

Supertonic uses multiple ONNX assets for duration, text encoding, denoising, and vocoding. The first load downloads and caches the model files from the configured asset host.

2. Choose Language and Style

Pick the language that matches your text, then choose one of the ten preset voice styles: five male (M1-M5) and five female (F1-F5).

3. Set Steps and Speed

Eight steps is a practical default. Higher step counts can improve quality but take longer. Speed values of roughly 0.9-1.5 are usually the most useful range.

4. Generate and Export

Generate speech locally, preview it in the browser, then download WAV for editing or MP3 for sharing.
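Step 1 downloads several ONNX files. A minimal sketch of sharing those downloads within a page session, assuming an injected `load` function (in the browser it might wrap `fetch` or ONNX Runtime Web's `ort.InferenceSession.create`). `createModelCache` is a hypothetical helper; it only memoizes in memory, and persistent caching across visits is not shown.

```typescript
// Memoize model loads by URL so repeated and concurrent requests share one download.
function createModelCache<T>(load: (url: string) => Promise<T>) {
  const pending = new Map<string, Promise<T>>();
  return (url: string): Promise<T> => {
    let p = pending.get(url);
    if (!p) {
      p = load(url);
      // Forget failed loads so a later call can retry.
      p.catch(() => pending.delete(url));
      pending.set(url, p);
    }
    return p;
  };
}
```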
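Step 3's settings can be normalized once before synthesis. In this sketch only the 8-step default comes from the text; the hard limits, the 1.0 default speed, the helper names, and the assumption that speed divides the predicted duration are illustrative rather than confirmed Supertonic behavior.

```typescript
interface SynthesisSettings {
  steps: number; // denoising steps
  speed: number; // >1 speaks faster (assumed convention)
}

// Hypothetical hard limits; the text only suggests 0.9-1.5 as the useful speed range.
const LIMITS = { minSteps: 1, maxSteps: 32, minSpeed: 0.5, maxSpeed: 2.0 };

function clamp(x: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, x));
}

function normalizeSettings(partial: Partial<SynthesisSettings>): SynthesisSettings {
  const steps = Math.round(partial.steps ?? 8); // 8 = practical default from the text
  const speed = partial.speed ?? 1.0; // assumed default
  return {
    steps: clamp(steps, LIMITS.minSteps, LIMITS.maxSteps),
    speed: clamp(speed, LIMITS.minSpeed, LIMITS.maxSpeed),
  };
}

// Assumption: a higher speed shortens the predicted duration proportionally.
function scaledDuration(predictedSec: number, speed: number): number {
  return predictedSec / speed;
}
```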
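For step 4, WAV export amounts to wrapping the generated samples in a RIFF header. A minimal sketch of 16-bit mono PCM encoding, assuming float samples in [-1, 1]; `encodeWav` is a hypothetical name, the sample rate is a parameter because it is not stated here, and MP3 export (which needs a separate encoder) is not covered.

```typescript
// Encode mono Float32 samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale asymmetrically so -1 maps to -32768 and 1 maps to 32767.
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}
```

In the browser, the bytes could be offered for download with `new Blob([wav], { type: "audio/wav" })` and `URL.createObjectURL`.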

Tips for Supertonic TTS

1. Match the selected language to the input text. Supertonic wraps text in language tags before inference. Selecting the right language gives the model the best pronunciation context.

2. Start with eight denoising steps. Eight steps is the upstream browser demo default. Increase steps for quality checks and reduce them for faster drafts.

3. Use WebGPU when available. Chrome and Edge can run ONNX Runtime Web with WebGPU. The app falls back to WASM when WebGPU is unavailable.

4. Keep long passages structured. The tool chunks long text and inserts short pauses. Paragraphs and punctuation help preserve natural pacing.
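Tip 1 says text is wrapped in language tags before inference. The exact tag syntax is not given here, so the `<lang>…</lang>` form below is an assumption to verify against the upstream preprocessing code; `tagText` is a hypothetical helper.

```typescript
// The currently supported language set.
const LANGS = ["en", "es", "pt", "fr", "ko"] as const;
type Lang = (typeof LANGS)[number];

function isLang(value: string): value is Lang {
  return (LANGS as readonly string[]).includes(value);
}

// Assumed tag syntax: <lang>text</lang>. Verify against the upstream preprocessor.
function tagText(text: string, lang: string): string {
  if (!isLang(lang)) {
    throw new Error(`Unsupported language "${lang}"; expected one of ${LANGS.join(", ")}`);
  }
  return `<${lang}>${text.trim()}</${lang}>`;
}
```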
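Tip 2's trade-off between step count and speed can be illustrated with fixed-step Euler integration. This sketch assumes a flow-matching-style formulation in which the vector estimator returns a velocity that is integrated from t = 0 to t = 1; `velocity` is a toy stand-in for the model and `integrate` is a hypothetical helper.

```typescript
// Toy velocity field: given the current latent and time t in [0, 1),
// return the direction to move. Stands in for the vector estimator (assumed).
type Velocity = (x: Float32Array, t: number) => Float32Array;

// Fixed-step Euler integration from t = 0 to t = 1 in `steps` equal increments.
function integrate(x0: Float32Array, velocity: Velocity, steps: number): { x: Float32Array; calls: number } {
  const dt = 1 / steps;
  let x = Float32Array.from(x0);
  let calls = 0;
  for (let k = 0; k < steps; k++) {
    const v = velocity(x, k * dt); // one model call per step
    calls++;
    x = x.map((xi, i) => xi + dt * v[i]);
  }
  return { x, calls };
}
```

Under this assumption each step costs one estimator call, so 8 steps means 8 calls per chunk. With a toy decay field (`v = -x`), more steps land closer to the exact solution, mirroring the quality gain from higher step counts.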
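Tip 3's WebGPU-first, WASM-fallback behavior maps naturally onto an ordered list of ONNX Runtime Web execution providers. `chooseExecutionProviders` is a hypothetical helper that only checks whether `navigator.gpu` exists; a stricter check would also await `navigator.gpu.requestAdapter()`, which can resolve to `null` when no suitable adapter is available.

```typescript
type ExecutionProvider = "webgpu" | "wasm";

// Minimal shape of the navigator object we inspect; injected so it can be tested outside a browser.
interface NavigatorLike {
  gpu?: unknown;
}

// Prefer WebGPU when the API is exposed, and always keep WASM as the fallback.
function chooseExecutionProviders(nav: NavigatorLike | undefined): ExecutionProvider[] {
  return nav?.gpu != null ? ["webgpu", "wasm"] : ["wasm"];
}

// In the browser (not run here):
// const session = await ort.InferenceSession.create(modelUrl, {
//   executionProviders: chooseExecutionProviders(navigator),
// });
```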
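Tip 4's chunk-and-pause approach can be sketched with two small helpers. The 300-character limit, the 0.3-second pause, the sentence-splitting regex, and the names `chunkText` and `joinWithPauses` are assumptions for illustration, not the tool's actual settings.

```typescript
// Split text into chunks of at most maxChars, preferring paragraph and sentence boundaries.
function chunkText(text: string, maxChars = 300): string[] {
  const chunks: string[] = [];
  for (const para of text.split(/\n\s*\n/)) {
    // Sentence-like pieces: runs of text ending in . ! ? (or the paragraph end).
    const sentences = para.match(/[^.!?]+[.!?]*/g) ?? [];
    let current = "";
    for (const raw of sentences) {
      const s = raw.trim();
      if (!s) continue;
      if (current && current.length + 1 + s.length > maxChars) {
        chunks.push(current);
        current = s;
      } else {
        current = current ? `${current} ${s}` : s;
      }
    }
    if (current) chunks.push(current);
  }
  return chunks;
}

// Concatenate per-chunk audio with pauseSec of silence between chunks.
function joinWithPauses(parts: Float32Array[], sampleRate: number, pauseSec = 0.3): Float32Array {
  const gap = Math.round(pauseSec * sampleRate);
  const total = parts.reduce((n, p) => n + p.length, 0) + gap * Math.max(0, parts.length - 1);
  const out = new Float32Array(total); // zero-initialized, so the gaps are silence
  let offset = 0;
  parts.forEach((p, i) => {
    out.set(p, offset);
    offset += p.length + (i < parts.length - 1 ? gap : 0);
  });
  return out;
}
```

A single sentence longer than `maxChars` is kept whole rather than split mid-sentence, which favors natural pacing over a strict size cap.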