
Supertonic TTS Is Now Available on OfflineTTS

Tags: supertonic, tts, offline, browser, on-device, announcement

Today we are adding Supertonic TTS to OfflineTTS.

This is a new browser-based text-to-speech engine for people who want local speech generation without API keys, accounts, or server-side audio processing. It joins Kokoro, Piper, and Kitten as another option in the OfflineTTS engine selector.

The first public version of our Supertonic integration focuses on five languages:

  • English
  • Spanish
  • Portuguese
  • French
  • Korean

It also includes 10 preset voice styles, WebGPU acceleration when available, WASM fallback for broader browser support, WAV/MP3 export, and browser caching for the model assets after the first load.

Why Add Supertonic?

OfflineTTS has always been built around a simple idea: text-to-speech should be easy to try, private by default, and useful without a cloud API.

Supertonic fits that direction well. Its browser path uses ONNX Runtime Web, which means the model can run directly inside a modern browser tab. You open the tool, load the model, type text, and generate audio locally.

That makes Supertonic useful for:

  • Private scripts and drafts that should not be sent to a TTS API
  • Multilingual narration in the currently supported languages
  • Fast experiments with voice style, speed, and denoising settings
  • Browser-based workflows where installing a desktop app would be friction
  • Comparing local TTS engines side by side before choosing one for production

What Is Available Today

The new Supertonic TTS app includes the core controls you need for practical generation:

| Control | What it does |
|---|---|
| Language | Selects the language tag used for the input text |
| Voice Style | Chooses one of 10 preset styles, M1-M5 and F1-F5 |
| Backend | Selects Auto, WebGPU, or WASM |
| Steps | Controls the number of denoising passes |
| Speed | Controls speech rate |
| Export | Downloads generated audio as WAV or MP3 |

The integration currently exposes English, Spanish, Portuguese, French, and Korean. We are keeping the language selector limited to what is supported in this build so users do not waste time trying unsupported language tags.
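To make the Backend control concrete, here is a minimal sketch of how an Auto setting could resolve to WebGPU with a WASM fallback. The function and type names are illustrative assumptions, not the actual OfflineTTS code. The strings "webgpu" and "wasm" match the execution provider names ONNX Runtime Web accepts in its session options, and `navigator.gpu` is the standard WebGPU entry point.

```typescript
type BackendChoice = "auto" | "webgpu" | "wasm";
type ExecutionProvider = "webgpu" | "wasm";

// Feature detection: browsers with WebGPU support expose a `gpu` property
// on `navigator`. The navigator-like object is passed in so the helper
// also runs outside a browser. In a page you would call hasWebGPU(navigator).
function hasWebGPU(nav: object | undefined): boolean {
  return nav !== undefined && "gpu" in nav;
}

// Returns the providers to try, in order of preference (hypothetical helper).
function resolveProviders(
  choice: BackendChoice,
  webgpuAvailable: boolean
): ExecutionProvider[] {
  if (choice === "wasm") return ["wasm"];
  if (choice === "webgpu") {
    if (!webgpuAvailable) {
      throw new Error("WebGPU was requested but is not available");
    }
    return ["webgpu"];
  }
  // Auto: prefer WebGPU when present and keep WASM as the fallback.
  return webgpuAvailable ? ["webgpu", "wasm"] : ["wasm"];
}
```

The resulting list is the kind of value you would pass as `executionProviders` when creating an ONNX Runtime Web session. Throwing on an explicit WebGPU request, rather than silently falling back, is a design choice in this sketch so the user can see why the selected backend is not in use.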

How Steps Work

The Steps setting controls how many denoising iterations the engine runs during generation.

Think of it as a quality and speed trade-off:

| Steps | Best for | Trade-off |
|---|---|---|
| 4-8 | Quick drafts and previews | Fastest, but less refined |
| 8 | Default everyday use | Balanced quality and latency |
| 12-20 | Final exports and careful listening | Smoother, but slower |
| 20+ | Experiments | Can be slow in the browser |

The default is 8 because it balances quality and latency. If you are testing wording, keep Steps in the 4-8 range. If you are exporting final narration, try 12-20 and compare the result against the default.
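The reason more steps cost more time is easiest to see as a loop. The sketch below is a toy, assuming each step is one call to a denoising function. In the real engine that call would be a model inference, and `runDenoising` and `toyDenoise` are names invented for this illustration.

```typescript
// One denoising pass: takes the current latent and returns a refined one.
type DenoisePass = (latent: number[], step: number, totalSteps: number) => number[];

// Runs `steps` passes in sequence, feeding each result into the next pass.
function runDenoising(initial: number[], steps: number, pass: DenoisePass): number[] {
  if (!Number.isInteger(steps) || steps < 1) {
    throw new RangeError("steps must be a positive integer");
  }
  let latent = initial;
  for (let i = 0; i < steps; i++) {
    latent = pass(latent, i, steps);
  }
  return latent;
}

// Toy stand-in for the model: halves every value and counts its calls.
let passes = 0;
const toyDenoise: DenoisePass = (latent) => {
  passes += 1;
  return latent.map((x) => x * 0.5);
};

const result = runDenoising([1, -1], 8, toyDenoise);
console.log(passes, result.length); // 8 2
```

Because the passes run one after another, doubling Steps roughly doubles the time spent in this part of generation, which is why low values suit drafts.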

How Speed Works

The Speed setting controls speech rate.

1.0 is the neutral baseline. Values below 1.0 slow the voice down. Values above 1.0 make it faster.

For most text, the useful range is narrow:

| Speed | Result |
|---|---|
| 0.85-0.95 | Slower narration, clearer pacing |
| 1.0 | Neutral speech rate |
| 1.05-1.25 | Slightly faster delivery |
| 1.5+ | Useful for experiments, but often rushed |

If the output sounds rushed or compressed, reduce Speed before increasing Steps. Pacing problems are often easier to fix with Speed than with more denoising passes.
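A quick way to reason about Speed is to estimate how it changes clip length. The sketch below assumes the simplest model, where the duration at 1.0 is divided by the speed value. That inverse relationship is an assumption for illustration, and real output can differ because of pauses and rounding in the duration predictor.

```typescript
// Estimated clip length, assuming duration scales inversely with speed.
function estimateDurationSeconds(baseSeconds: number, speed: number): number {
  if (!(speed > 0)) {
    throw new RangeError("speed must be greater than 0");
  }
  return baseSeconds / speed;
}

// A clip that takes 60 seconds at the neutral 1.0 setting:
console.log(estimateDurationSeconds(60, 1.0));  // 60
console.log(estimateDurationSeconds(60, 1.25)); // 48
console.log(estimateDurationSeconds(60, 1.5));  // 40
```

Under this assumption, moving from 1.0 to 1.25 already removes a fifth of the running time, which is why small adjustments are usually enough.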

Browser Caching

Supertonic uses several model files: a configuration JSON, a Unicode indexer, ONNX models for duration prediction, text encoding, denoising, and vocoding, and the voice style JSON files.

The first load downloads those assets from the configured model host. After that, OfflineTTS stores them in the browser’s IndexedDB cache, the same cache layer used by the other local engines.

That means repeat visits do not need to re-download the same model files unless the model version changes or browser storage is cleared.
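The caching behavior described above follows a common cache-then-network pattern. Here is a minimal sketch of it, assuming a small key-value interface in front of the storage layer. `AssetStore`, `MemoryStore`, `loadAsset`, and the file name are hypothetical. In the browser the store would be backed by IndexedDB, while the in-memory version here just keeps the example runnable anywhere.

```typescript
// Minimal async key-value interface for cached model files.
interface AssetStore {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, value: Uint8Array): Promise<void>;
}

// In-memory stand-in for an IndexedDB-backed store.
class MemoryStore implements AssetStore {
  private items = new Map<string, Uint8Array>();
  async get(key: string): Promise<Uint8Array | undefined> {
    return this.items.get(key);
  }
  async put(key: string, value: Uint8Array): Promise<void> {
    this.items.set(key, value);
  }
}

type FetchFile = (file: string) => Promise<Uint8Array>;

// Returns the cached bytes when present, otherwise downloads and stores them.
// The model version is part of the key, so a new version misses the cache.
async function loadAsset(
  store: AssetStore,
  modelVersion: string,
  file: string,
  fetchFile: FetchFile
): Promise<Uint8Array> {
  const key = `${modelVersion}/${file}`;
  const cached = await store.get(key);
  if (cached !== undefined) return cached;
  const bytes = await fetchFile(file);
  await store.put(key, bytes);
  return bytes;
}
```

Calling `loadAsset(store, "v1", "vocoder.onnx", fetchFile)` twice triggers one download. Calling it again with "v2" triggers another, which mirrors the "unless the model version changes" behavior.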

Privacy Model

The Supertonic engine runs text-to-speech inference in the browser.

Your text is normalized, tagged with the selected language, converted into model inputs, and synthesized locally. The generated audio is played in the page and can be exported as WAV or MP3.
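As a rough illustration of the first stages of that flow, the sketch below normalizes text, attaches a language tag, and turns the characters into code points that an indexer could map to model IDs. Everything here is an assumption for illustration: the language codes, the choice of NFKC normalization, and the `prepareInput` helper are not taken from the Supertonic source.

```typescript
// The five languages exposed in this build, as assumed two-letter codes.
type Language = "en" | "es" | "pt" | "fr" | "ko";

interface PreparedInput {
  language: Language;
  text: string;
  codePoints: number[];
}

function prepareInput(raw: string, language: Language): PreparedInput {
  // Unicode-normalize, then collapse runs of whitespace into single spaces.
  const text = raw.normalize("NFKC").replace(/\s+/g, " ").trim();
  if (text.length === 0) {
    throw new Error("Nothing to synthesize");
  }
  // Array.from iterates by code point, so characters outside the BMP stay whole.
  const codePoints = Array.from(text, (ch) => ch.codePointAt(0)!);
  return { language, text, codePoints };
}

const prepared = prepareInput("  Hello   world ", "en");
console.log(prepared.text); // "Hello world"
```

None of these steps needs a network request, which is the point of the privacy model: the text never has to leave the tab to become model input.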

There is no account, no API key, and no server-side audio generation in this flow.
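Export can stay local as well. The sketch below shows the standard way to wrap floating-point samples in a 16-bit PCM mono WAV container using only typed arrays. It illustrates the file format, not the actual OfflineTTS exporter, and the 44100 Hz sample rate in the usage line is an example value.

```typescript
// Encodes mono float samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);                // file size minus 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // audio format: PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

const wav = encodeWav(new Float32Array([0, 1, -1]), 44100);
console.log(wav.byteLength); // 50
```

In a page, the returned buffer can be wrapped in `new Blob([wav], { type: "audio/wav" })` and offered as a download, again without any server involved.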

How It Compares to Other OfflineTTS Engines

Supertonic is not replacing Kokoro, Piper, or Kitten. It gives you another trade-off profile.

| Engine | Best fit |
|---|---|
| Kokoro | Best all-around quality and a broad curated voice set |
| Piper | Fast CPU generation and many English voices |
| Kitten | Smallest model footprint and quick experiments |
| Supertonic | Local multilingual generation with denoising and style controls |

If you are producing English narration and want the most polished voice selection, start with Kokoro. If you want fast CPU-only English generation, try Piper. If you want a tiny model, try Kitten. If you want local generation across English, Spanish, Portuguese, French, and Korean with Supertonic’s style controls, try the new engine.

Try It

Open Supertonic TTS on OfflineTTS, load the model, select a language and voice style, then start with:

  • Steps: 8
  • Speed: 1.0 to 1.05
  • Backend: Auto

For final audio, compare 8 steps against 12-20 steps and keep the speed close to natural speech.
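If you script your experiments, the starting point above can be written down as a settings object. This is a hypothetical shape for illustration only: the field names, the language codes, and the F1 default voice are assumptions, and the voice style IDs come from the M1-M5 and F1-F5 presets listed earlier.

```typescript
type Backend = "auto" | "webgpu" | "wasm";

interface SupertonicSettings {
  language: string;
  voiceStyle: string;
  steps: number;
  speed: number;
  backend: Backend;
}

const LANGUAGES = ["en", "es", "pt", "fr", "ko"];
const VOICE_STYLES = ["M1", "M2", "M3", "M4", "M5", "F1", "F2", "F3", "F4", "F5"];

// The recommended starting point from this post.
const STARTING_SETTINGS: SupertonicSettings = {
  language: "en",
  voiceStyle: "F1",
  steps: 8,
  speed: 1.0,
  backend: "auto",
};

// Returns a list of problems; an empty list means the settings are usable.
function validateSettings(s: SupertonicSettings): string[] {
  const problems: string[] = [];
  if (!LANGUAGES.includes(s.language)) problems.push(`unsupported language: ${s.language}`);
  if (!VOICE_STYLES.includes(s.voiceStyle)) problems.push(`unknown voice style: ${s.voiceStyle}`);
  if (!Number.isInteger(s.steps) || s.steps < 1) problems.push("steps must be a positive integer");
  if (!(s.speed > 0)) problems.push("speed must be greater than 0");
  return problems;
}

// For a final export, override only what changes.
const finalExport: SupertonicSettings = { ...STARTING_SETTINGS, steps: 16 };
console.log(validateSettings(finalExport).length); // 0
```

Keeping drafts and final exports as two small variations of the same object makes it easy to compare 8 steps against a higher value with everything else held constant.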

This launch is the first step for Supertonic on OfflineTTS. The immediate goal is to make the browser experience stable, transparent, and useful for real text-to-speech work.


Frequently Asked Questions

What languages does Supertonic TTS support on OfflineTTS?

The current OfflineTTS Supertonic integration supports English, Spanish, Portuguese, French, and Korean.

Does Supertonic TTS run locally?

Yes. The model runs in your browser through ONNX Runtime Web, using WebGPU when available and WASM as a fallback. Model assets are downloaded on first load, then cached in the browser.

What do the Steps and Speed controls do?

Steps controls the number of denoising passes. More steps can improve smoothness but take longer. Speed controls speech rate, where 1.0 is neutral, lower values are slower, and higher values are faster.
