Supertonic TTS Is Now Available on OfflineTTS
Today we are adding Supertonic TTS to OfflineTTS.
This is a new browser-based text-to-speech engine for people who want local speech generation without API keys, accounts, or server-side audio processing. It joins Kokoro, Piper, and Kitten as another option in the OfflineTTS engine selector.
The first public version of our Supertonic integration focuses on five languages:
- English
- Spanish
- Portuguese
- French
- Korean
It also includes 10 preset voice styles, WebGPU acceleration when available, WASM fallback for broader browser support, WAV/MP3 export, and browser caching for the model assets after the first load.
Why Add Supertonic?
OfflineTTS has always been built around a simple idea: text-to-speech should be easy to try, private by default, and useful without a cloud API.
Supertonic fits that direction well. Its browser path uses ONNX Runtime Web, which means the model can run directly inside a modern browser tab. You open the tool, load the model, type text, and generate audio locally.
That makes Supertonic useful for:
- Private scripts and drafts that should not be sent to a TTS API
- Multilingual narration in the currently supported languages
- Fast experiments with voice style, speed, and denoising settings
- Browser-based workflows where installing a desktop app would add friction
- Comparing local TTS engines side by side before choosing one for production
What Is Available Today
The new Supertonic TTS app includes the core controls you need for practical generation:
| Control | What it does |
|---|---|
| Language | Selects the language tag used for the input text |
| Voice Style | Chooses one of 10 preset styles, M1-M5 and F1-F5 |
| Backend | Uses Auto, WebGPU, or WASM |
| Steps | Controls the number of denoising passes |
| Speed | Controls speech rate |
| Export | Downloads generated audio as WAV or MP3 |
The integration currently exposes English, Spanish, Portuguese, French, and Korean. We are keeping the language selector limited to what is supported in this build so users do not waste time trying unsupported language tags.
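The Backend control's Auto option can be pictured as a small decision rule. The sketch below is one plausible reading, not the OfflineTTS source: `BackendChoice`, `resolveBackend`, and the `hasWebGPU` flag are hypothetical names, and falling back to WASM when WebGPU is requested but unavailable is an assumption.

```typescript
// Hypothetical types and function names, for illustration only.
type BackendChoice = "auto" | "webgpu" | "wasm";
type ResolvedBackend = "webgpu" | "wasm";

// Decide which backend to run on, given the user's choice and
// whether the current browser reports WebGPU support.
function resolveBackend(choice: BackendChoice, hasWebGPU: boolean): ResolvedBackend {
  // An explicit WASM request is always honored.
  if (choice === "wasm") {
    return "wasm";
  }
  // Assumption: both "auto" and "webgpu" fall back to WASM
  // when WebGPU is not available.
  return hasWebGPU ? "webgpu" : "wasm";
}
```

In a browser, `hasWebGPU` could come from a check such as `"gpu" in navigator`. It is passed in as a parameter here so the rule can be exercised outside a browser.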
How Steps Work
The Steps setting controls how many denoising iterations the engine runs during generation.
Think of it as a quality and speed trade-off:
| Steps | Best for | Trade-off |
|---|---|---|
| 4-7 | Quick drafts and previews | Fastest, but less refined |
| 8 | Default everyday use | Balanced quality and latency |
| 12-20 | Final exports and careful listening | Smoother, but slower |
| Above 20 | Experiments | Can be slow in the browser |
The default is 8 because it is a practical starting point. If you are testing wording, stay low. If you are exporting final narration, try 12-20 and compare the result.
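To see why more steps cost more time, it helps to picture the denoising stage as a loop that calls the model once per step. The following is a generic sketch of that pattern, not Supertonic's actual code: `runDenoising` and `DenoiseStep` are hypothetical names, and the real model call is replaced by a caller-supplied function.

```typescript
// Hypothetical signature for one denoising pass. In a real engine this
// would wrap a model inference call; here it is supplied by the caller.
type DenoiseStep = (latent: Float32Array, step: number, totalSteps: number) => Float32Array;

// Run the denoising function `totalSteps` times, feeding each result
// into the next pass. Work grows linearly with the step count.
function runDenoising(
  initial: Float32Array,
  totalSteps: number,
  denoiseStep: DenoiseStep,
): Float32Array {
  if (!Number.isInteger(totalSteps) || totalSteps < 1) {
    throw new RangeError("totalSteps must be a positive integer");
  }
  let latent = initial;
  for (let step = 0; step < totalSteps; step++) {
    latent = denoiseStep(latent, step, totalSteps);
  }
  return latent;
}
```

Under this model, going from 8 to 16 steps doubles the number of denoising passes, which is why higher settings feel slower in the browser.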
How Speed Works
The Speed setting controls speech rate.
1.0 is the neutral baseline. Values below 1.0 slow the voice down. Values above 1.0 make it faster.
For most text, the useful range is narrow:
| Speed | Result |
|---|---|
| 0.85-0.95 | Slower narration, clearer pacing |
| 1.0 | Neutral speech rate |
| 1.05-1.25 | Slightly faster delivery |
| 1.5+ | Useful for experiments, but often rushed |
If the output feels too compressed, reduce speed before increasing steps. Pacing problems are often easier to fix with speed than with more denoising.
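One way to reason about the Speed value is as a divisor applied to the predicted duration of each piece of speech. This is an assumption about the mechanism, which the description above does not spell out, and `scaleDurations` is a hypothetical helper name.

```typescript
// Assumption: speed acts as a divisor on predicted durations.
// Values above 1.0 shorten the audio; values below 1.0 lengthen it.
function scaleDurations(durationsSec: number[], speed: number): number[] {
  if (!(speed > 0)) {
    throw new RangeError("speed must be greater than 0");
  }
  return durationsSec.map((d) => d / speed);
}
```

Under that assumption, a passage that takes 10 seconds at speed 1.0 would take 10 / 1.25 = 8 seconds at speed 1.25, which is why values well above 1.0 start to sound rushed.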
Browser Caching
Supertonic uses several model files: a configuration JSON file; a Unicode indexer; ONNX models for duration prediction, text encoding, denoising, and vocoding; and voice style JSON files.
The first load downloads those assets from the configured model host. After that, OfflineTTS stores them in the browser’s IndexedDB cache, the same cache layer used by the other local engines.
That means repeat visits do not need to re-download the same model files unless the model version changes or browser storage is cleared.
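The cache-first behavior described above can be sketched as follows. This is a simplified illustration rather than the OfflineTTS implementation: `AssetStore`, `MemoryStore`, `cacheKey`, and `loadAsset` are hypothetical names, and an in-memory `Map` stands in for IndexedDB so the example runs anywhere.

```typescript
// Hypothetical storage interface. In the browser this would be
// backed by IndexedDB; here a Map stands in for it.
interface AssetStore {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, bytes: Uint8Array): Promise<void>;
}

class MemoryStore implements AssetStore {
  private readonly items = new Map<string, Uint8Array>();

  async get(key: string): Promise<Uint8Array | undefined> {
    return this.items.get(key);
  }

  async put(key: string, bytes: Uint8Array): Promise<void> {
    this.items.set(key, bytes);
  }
}

// Including the model version in the key means a new version
// misses the cache and triggers a fresh download.
function cacheKey(modelVersion: string, fileName: string): string {
  return `supertonic/${modelVersion}/${fileName}`;
}

// Return the cached bytes if present; otherwise download and store them.
async function loadAsset(
  store: AssetStore,
  modelVersion: string,
  fileName: string,
  download: (fileName: string) => Promise<Uint8Array>,
): Promise<Uint8Array> {
  const key = cacheKey(modelVersion, fileName);
  const cached = await store.get(key);
  if (cached !== undefined) {
    return cached;
  }
  const bytes = await download(fileName);
  await store.put(key, bytes);
  return bytes;
}
```

Keying by version is one simple way to get the "re-download only when the model version changes" behavior. Clearing browser storage empties the store, so the next call falls through to `download` again.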
Privacy Model
The Supertonic engine runs text-to-speech inference in the browser.
Your text is normalized, tagged with the selected language, converted into model inputs, and synthesized locally. The generated audio is played in the page and can be exported as WAV or MP3.
There is no account, no API key, and no server-side audio generation in this flow.
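Export can stay local as well, because a WAV file is just a short header followed by raw samples. The sketch below writes the standard 44-byte header for 16-bit mono PCM. `encodeWav` is a hypothetical helper, not the function OfflineTTS uses, and the sample rate is left as a parameter because the engine's output rate is not stated here.

```typescript
// Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample and scale it to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    view.setInt16(44 + i * bytesPerSample, Math.round(clamped * 32767), true);
  }
  return new Uint8Array(buffer);
}
```

In a browser, the returned bytes could be wrapped in a `Blob` with type `audio/wav` and offered as a download, without the audio ever leaving the page.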
How It Compares to Other OfflineTTS Engines
Supertonic is not replacing Kokoro, Piper, or Kitten. It gives you another trade-off profile.
| Engine | Best fit |
|---|---|
| Kokoro | Best all-around quality and a broad curated voice set |
| Piper | Fast CPU generation and many English voices |
| Kitten | Smallest model footprint and quick experiments |
| Supertonic | Local multilingual generation with denoising and style controls |
If you are producing English narration and want the most polished voice selection, start with Kokoro. If you want fast CPU-only English generation, try Piper. If you want a tiny model, try Kitten. If you want local generation across English, Spanish, Portuguese, French, and Korean with Supertonic’s style controls, try the new engine.
Try It
Open Supertonic TTS on OfflineTTS, load the model, select a language and voice style, then start with:
- Steps: 8
- Speed: 1.0 to 1.05
- Backend: Auto
For final audio, compare 8 steps against 12-20 steps and keep the speed close to natural speech.
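For anyone scripting around the tool, the recommended starting values could be captured in a small settings object. Everything here is a hypothetical shape rather than an OfflineTTS API: the field names, the two-letter language codes, and the choice of `F1` as the initial voice are assumptions made for the example. The ten style names and the starting values for steps, speed, and backend come from the text above.

```typescript
// Two-letter codes for the five supported languages (assumed tag format).
const LANGUAGES = ["en", "es", "pt", "fr", "ko"] as const;
// The ten preset styles named in the controls table.
const VOICE_STYLES = ["M1", "M2", "M3", "M4", "M5", "F1", "F2", "F3", "F4", "F5"] as const;

type Language = (typeof LANGUAGES)[number];
type VoiceStyle = (typeof VOICE_STYLES)[number];

// Hypothetical settings shape, for illustration only.
interface Settings {
  language: Language;
  voiceStyle: VoiceStyle;
  backend: "auto" | "webgpu" | "wasm";
  steps: number;
  speed: number;
}

// Starting point suggested above; language and voice are arbitrary picks.
const STARTING_SETTINGS: Settings = {
  language: "en",
  voiceStyle: "F1",
  backend: "auto",
  steps: 8,
  speed: 1.0,
};

// Return a list of human-readable problems; an empty list means valid.
function checkSettings(settings: Settings): string[] {
  const problems: string[] = [];
  if (!Number.isInteger(settings.steps) || settings.steps < 1) {
    problems.push("steps must be a positive integer");
  }
  if (!(settings.speed > 0)) {
    problems.push("speed must be greater than 0");
  }
  return problems;
}
```

A final-export preset would then be a copy with only `steps` changed, for example `{ ...STARTING_SETTINGS, steps: 16 }`.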
This launch is the first step for Supertonic on OfflineTTS. The immediate goal is to make the browser experience stable, transparent, and useful for real text-to-speech work.