What languages does Supertonic TTS support on OfflineTTS?

The current OfflineTTS Supertonic integration supports English, Spanish, Portuguese, French, and Korean.

Does Supertonic TTS run locally?

Yes. The model runs in your browser through ONNX Runtime Web, using WebGPU when available and WASM as a fallback. Model assets are downloaded on the first load and then cached in the browser.
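The sketch below shows one way an Auto backend setting could resolve to WebGPU or WASM. The names `resolveBackend` and `hasWebGPU` are hypothetical and are not OfflineTTS code. The only real browser feature it relies on is the `navigator.gpu` entry point that WebGPU-capable browsers expose.

```typescript
// Backend options offered in the UI, and the concrete backends that run inference.
type BackendChoice = "auto" | "webgpu" | "wasm";
type ResolvedBackend = "webgpu" | "wasm";

// Hypothetical check: true when the browser exposes the WebGPU entry point (navigator.gpu).
function hasWebGPU(): boolean {
  const nav = (globalThis as unknown as { navigator?: { gpu?: unknown } }).navigator;
  return nav?.gpu !== undefined;
}

// "auto" prefers WebGPU and falls back to WASM; explicit choices are honored as-is.
function resolveBackend(choice: BackendChoice, webgpuAvailable: boolean): ResolvedBackend {
  if (choice === "auto") {
    return webgpuAvailable ? "webgpu" : "wasm";
  }
  return choice;
}
```

The resolved string could then be passed as an execution provider, for example `executionProviders: [resolveBackend("auto", hasWebGPU())]` in the options for ONNX Runtime Web's `InferenceSession.create`. A stricter check would also call `navigator.gpu.requestAdapter()` and fall back to WASM if it returns null. Whether OfflineTTS wires it up exactly this way is an assumption.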

What do the Steps and Speed controls do?

Steps controls the number of denoising passes. More steps can improve smoothness but take longer. Speed controls speech rate, where 1.0 is neutral, lower values are slower, and higher values are faster.

Supertonic TTS Is Now Available on OfflineTTS

Today we are adding Supertonic TTS to OfflineTTS.

This is a new browser-based text-to-speech engine for people who want local speech generation without API keys, accounts, or server-side audio processing. It joins Kokoro, Piper, and Kitten as another option in the OfflineTTS engine selector.

The first public version of our Supertonic integration focuses on five languages:

English
Spanish
Portuguese
French
Korean

It also includes 10 preset voice styles, WebGPU acceleration when available, WASM fallback for broader browser support, WAV/MP3 export, and browser caching for the model assets after the first load.

Why Add Supertonic?

OfflineTTS has always been built around a simple idea: text-to-speech should be easy to try, private by default, and useful without a cloud API.

Supertonic fits that direction well. Its browser path uses ONNX Runtime Web, which means the model can run directly inside a modern browser tab. You open the tool, load the model, type text, and generate audio locally.

That makes Supertonic useful for:

Private scripts and drafts that should not be sent to a TTS API
Multilingual narration in the currently supported languages
Fast experiments with voice style, speed, and denoising settings
Browser-based workflows where installing a desktop app would be friction
Comparing local TTS engines side by side before choosing one for production

What Is Available Today

The new Supertonic TTS app includes the core controls you need for practical generation:

Control	What it does
Language	Selects the language tag used for the input text
Voice Style	Chooses one of 10 preset styles, M1-M5 and F1-F5
Backend	Selects Auto, WebGPU, or WASM execution
Steps	Controls the number of denoising passes
Speed	Controls speech rate
Export	Downloads generated audio as WAV or MP3

The integration currently exposes English, Spanish, Portuguese, French, and Korean. We are keeping the language selector limited to what is supported in this build so users do not waste time trying unsupported language tags.
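Here is a minimal sketch of that restriction. It assumes ISO 639-1 style tags (`en`, `es`, `pt`, `fr`, `ko`), because the exact tags this Supertonic build expects are not stated here. `parseSettings` and the other names are hypothetical. The function rejects unsupported languages and voice styles up front and applies the defaults of 8 steps and speed 1.0 described in the sections below.

```typescript
// Assumed language tags for the five supported languages.
const SUPPORTED_LANGUAGES = ["en", "es", "pt", "fr", "ko"] as const;
type Language = (typeof SUPPORTED_LANGUAGES)[number];

// The 10 preset voice styles: M1-M5 and F1-F5.
const VOICE_STYLES = ["M1", "M2", "M3", "M4", "M5", "F1", "F2", "F3", "F4", "F5"] as const;
type VoiceStyle = (typeof VOICE_STYLES)[number];

interface GenerationSettings {
  language: Language;
  voiceStyle: VoiceStyle;
  steps: number; // number of denoising passes
  speed: number; // 1.0 = neutral speech rate
}

interface RawSettings {
  language: string;
  voiceStyle: string;
  steps?: number;
  speed?: number;
}

function isLanguage(tag: string): tag is Language {
  return (SUPPORTED_LANGUAGES as readonly string[]).indexOf(tag) !== -1;
}

function isVoiceStyle(style: string): style is VoiceStyle {
  return (VOICE_STYLES as readonly string[]).indexOf(style) !== -1;
}

// Reject anything outside the supported set instead of passing it to the model.
function parseSettings(raw: RawSettings): GenerationSettings {
  const { language, voiceStyle } = raw;
  if (!isLanguage(language)) {
    throw new Error(`Unsupported language tag: ${language}`);
  }
  if (!isVoiceStyle(voiceStyle)) {
    throw new Error(`Unknown voice style: ${voiceStyle}`);
  }
  const steps = raw.steps ?? 8; // default step count
  const speed = raw.speed ?? 1.0; // neutral speech rate
  if (!Number.isInteger(steps) || steps < 1) {
    throw new RangeError("steps must be a positive integer");
  }
  if (!(speed > 0)) {
    throw new RangeError("speed must be greater than 0");
  }
  return { language, voiceStyle, steps, speed };
}
```

For example, `parseSettings({ language: "fr", voiceStyle: "F2" })` yields 8 steps at speed 1.0, while a `"de"` tag is rejected before any model work starts.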

How Steps Work

The Steps setting controls how many denoising iterations the engine runs during generation.

Think of it as a quality and speed trade-off:

Steps	Best for	Trade-off
4-7	Quick drafts and previews	Fastest, but less refined
8	Default everyday use	Balanced quality and latency
12-20	Final exports and careful listening	Smoother, but slower
Above 20	Experiments	Can be slow in the browser

The default is 8 because it balances quality and latency for everyday use. If you are testing wording, stay low. If you are exporting final narration, try 12-20 steps and compare the result against 8.
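Each step is one more pass through the denoiser, so the denoising part of generation time grows roughly in proportion to Steps. The loop below is a toy sketch of that structure. `DenoiseStep` and `runDenoising` are hypothetical names, and the per-step function stands in for a model call. It is not Supertonic's actual update rule.

```typescript
// One denoising pass: takes the current latent and returns a refined one.
// Hypothetical signature; the real denoiser is an ONNX model with its own inputs.
type DenoiseStep = (latent: Float32Array, stepIndex: number, totalSteps: number) => Float32Array;

// Run `steps` passes in sequence; each pass is one call to the denoiser.
function runDenoising(initial: Float32Array, steps: number, denoise: DenoiseStep): Float32Array {
  if (!Number.isInteger(steps) || steps < 1) {
    throw new RangeError("steps must be a positive integer");
  }
  let latent = initial;
  for (let i = 0; i < steps; i++) {
    latent = denoise(latent, i, steps);
  }
  return latent;
}
```

With a stand-in denoiser that halves every value, `runDenoising(new Float32Array([8]), 3, halve)` makes exactly three calls and returns `[1]`. Going from 8 to 16 steps doubles the number of denoiser calls in the same way.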

How Speed Works

The Speed setting controls speech rate.

1.0 is the neutral baseline. Values below 1.0 slow the voice down. Values above 1.0 make it faster.

For most text, the useful range is narrow:

Speed	Result
0.85-0.95	Slower narration, clearer pacing
1.0	Neutral speech rate
1.05-1.25	Slightly faster delivery
1.5+	Useful for experiments, but often rushed

If the output feels too compressed, reduce speed before increasing steps. Pacing problems are often easier to fix with speed than with more denoising.
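This sketch assumes that Speed works by dividing the predicted durations, a common approach in duration-based TTS that is not confirmed for Supertonic here. Under that assumption, its effect on output length is simple arithmetic. At 1.25, a clip predicted to last 10 seconds becomes 8 seconds, and at 0.9 it becomes about 11.1 seconds. `applySpeed` is a hypothetical helper.

```typescript
// Scale predicted durations (in seconds) by the Speed setting.
// Assumption: speed divides each duration, so values above 1.0 shorten the audio
// and values below 1.0 lengthen it.
function applySpeed(durations: readonly number[], speed: number): number[] {
  if (!(speed > 0)) {
    throw new RangeError("speed must be greater than 0");
  }
  return durations.map((d) => d / speed);
}
```

This also shows why extra denoising steps do not fix pacing. Under this assumption, speed changes the timing before any denoising happens, while steps only refine audio within that timing.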

Browser Caching

Supertonic uses several model files: a configuration JSON file, a Unicode indexer, ONNX models for duration prediction, text encoding, denoising, and vocoding, plus voice style JSON files.

The first load downloads those assets from the configured model host. After that, OfflineTTS stores them in the browser’s IndexedDB cache, the same cache layer used by the other local engines.

That means repeat visits do not need to re-download the same model files unless the model version changes or browser storage is cleared.
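The caching behavior above amounts to a cache-first lookup keyed by model version. Below is a minimal sketch of that idea. `AssetStore`, `Fetcher`, `loadAsset` and the asset path are hypothetical. The store is kept synchronous for readability, while the real IndexedDB API is asynchronous.

```typescript
// Minimal key-value store; in the browser this role is played by IndexedDB.
interface AssetStore {
  get(key: string): Uint8Array | undefined;
  set(key: string, value: Uint8Array): void;
}

// Hypothetical downloader for one asset from the model host.
type Fetcher = (assetPath: string) => Uint8Array;

// Including the model version in the key means a new version misses the cache
// and triggers a fresh download, while the same version is served locally.
function cacheKey(modelVersion: string, assetPath: string): string {
  return `${modelVersion}/${assetPath}`;
}

function loadAsset(store: AssetStore, fetchAsset: Fetcher, modelVersion: string, assetPath: string): Uint8Array {
  const key = cacheKey(modelVersion, assetPath);
  const cached = store.get(key);
  if (cached !== undefined) {
    return cached; // repeat visit: no download
  }
  const bytes = fetchAsset(assetPath); // first load: download once
  store.set(key, bytes);
  return bytes;
}
```

A plain `Map<string, Uint8Array>` satisfies `AssetStore`, which makes it easy to check that two loads of the same version download once and a version bump downloads again.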

Privacy Model

The Supertonic engine runs text-to-speech inference in the browser.

Your text is normalized, tagged with the selected language, converted into model inputs, and synthesized locally. The generated audio is played in the page and can be exported as WAV or MP3.
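The preparation steps in that paragraph can be sketched as three small functions. Everything here is an assumption made for illustration. NFKC plus whitespace collapsing stands in for the real normalization rules. The `<lang>...</lang>` wrapper is a hypothetical tag format. The code-point lookup table plays the role of the Unicode indexer.

```typescript
// Hypothetical normalization: NFKC, collapse whitespace runs, trim the ends.
function normalizeText(text: string): string {
  return text.normalize("NFKC").replace(/\s+/g, " ").trim();
}

// Hypothetical language tagging format.
function tagWithLanguage(text: string, lang: string): string {
  return `<${lang}>${text}</${lang}>`;
}

// Map each Unicode code point to a token id via a lookup table
// (standing in for the unicode indexer). Unknown characters get unknownId.
function toTokenIds(text: string, indexer: ReadonlyMap<number, number>, unknownId = 0): number[] {
  const ids: number[] = [];
  for (const ch of Array.from(text)) {
    const codePoint = ch.codePointAt(0)!;
    ids.push(indexer.get(codePoint) ?? unknownId);
  }
  return ids;
}
```

For example, `normalizeText("  Hola\n\tmundo  ")` gives `"Hola mundo"`, and `tagWithLanguage("Hola mundo", "es")` wraps it before lookup. All of this runs as ordinary string work in the page.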

There is no account, no API key, and no server-side audio generation in this flow.
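Export stays local as well. A WAV file is a 44-byte RIFF header followed by PCM samples, so it can be built in the page from the model's float output. This sketch assumes mono Float32 samples in [-1, 1] and a caller-supplied sample rate. `encodeWav` is a hypothetical name, and MP3 export would need a separate encoder that is not shown here.

```typescript
// Encode mono float samples as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // size of everything after this field
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample to [-1, 1] and scale to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}
```

In a browser, the bytes can be wrapped in `new Blob([bytes], { type: "audio/wav" })` and offered as a download through `URL.createObjectURL`.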

How It Compares to Other OfflineTTS Engines

Supertonic is not replacing Kokoro, Piper, or Kitten. It gives you another trade-off profile.

Engine	Best fit
Kokoro	Best all-around quality and a broad curated voice set
Piper	Fast CPU generation and many English voices
Kitten	Smallest model footprint and quick experiments
Supertonic	Local multilingual generation with denoising and style controls

If you are producing English narration and want the most polished voice selection, start with Kokoro. If you want fast CPU-only English generation, try Piper. If you want a tiny model, try Kitten. If you want local generation across English, Spanish, Portuguese, French, and Korean with Supertonic’s style controls, try the new engine.

Try It

Open Supertonic TTS on OfflineTTS, load the model, select a language and voice style, then start with:

Steps: 8
Speed: 1.0 to 1.05
Backend: Auto

For final audio, compare 8 steps against 12-20 steps and keep the speed close to natural speech.

This launch is the first step for Supertonic on OfflineTTS. The immediate goal is to make the browser experience stable, transparent, and useful for real text-to-speech work.