Privacy Policy

Last updated: April 2026

The Short Version

Your audio never leaves your browser. All speech synthesis and recognition happens on your device. For non-English TTS, a small text-to-phoneme request is sent to our server (see Phonemization below). We do not collect, store, or have access to any audio or personal data. Anonymized website analytics are described under Third-Party Services.

Data We Collect

Your text and audio: none. We don't collect personal data, text inputs, audio inputs, or audio outputs. There are no accounts and no cookies that identify you personally.

For site improvement, we use Google Analytics (GA4) and Microsoft Clarity to collect anonymized usage data (pages visited, browser type, time on site). See the Third-Party Services section below for details and opt-out options.

Phonemization (Non-English TTS)

Kokoro's browser library natively supports English phonemization, so English voices work fully offline. For Japanese, Chinese, Spanish, French, Hindi, Italian, and Portuguese, phonemization requires specialized models (misaki, espeak-ng) that are too large to run in a browser.

  • What is sent: Your plain text is sent to api.offlinetts.com and converted to IPA phoneme strings (compact pronunciation data)
  • What is received: A phoneme string returned to your browser, where audio synthesis happens locally
  • What is NOT sent: No audio, no personal data, no cookies, no user identifiers
  • Server behavior: The server does not log, store, or retain any text. It processes each request in ~10ms and discards it immediately
  • English voices: Fully offline after the model download, with no server contact at all

How It Works

  • TTS models (30–90 MB) and STT models (40–240 MB) are downloaded to your browser on first use and cached in IndexedDB
  • All speech generation and transcription happens on your device using WebGPU or WebAssembly
  • English TTS voices are fully offline after the model download, with no server contact
  • Non-English TTS voices send plain text to our phonemization server, which returns IPA phoneme strings (pronunciation data). Audio synthesis still happens locally
  • TTS audio is generated in-memory and can be downloaded as WAV files
  • STT audio is decoded in a background Web Worker and transcribed locally
  • No audio data is ever sent to any server

Model Files

AI model files (TTS and STT) are served from Cloudflare R2 (CDN) and Hugging Face. When you load a tool, your browser downloads the required model files. Apart from the phonemization requests for non-English voices described above and the analytics described under Third-Party Services, these are the only network requests that occur. The downloads are standard HTTPS file transfers, and no tracking data is included.

Cookies

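We do not set any tracking cookies of our own. The third-party analytics services listed below (Google Analytics and Microsoft Clarity) do use cookies; see Third-Party Services for details and opt-out options. Beyond that, the only browser storage used is IndexedDB for caching model files and the Origin Private File System (OPFS) for Web Worker communication, both of which are necessary for offline functionality and performance.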

Third-Party Services

We use the following third-party services:

  • Google Analytics (GA4): We use Google Analytics to understand how visitors use our site (pages visited, time on site, browser type). GA4 uses cookies to collect anonymized data. You can opt out by installing the Google Analytics Opt-out Browser Add-on.
  • Microsoft Clarity: We use Microsoft Clarity to analyze user behavior through anonymized heatmaps and session recordings. Clarity uses cookies but does not collect personally identifiable information.
  • Cloudflare Pages: The site is hosted on Cloudflare Pages, which may collect standard web server logs (IP addresses, URLs, timestamps) as part of its infrastructure. These logs are not controlled by OfflineTTS.

Important: These services track website usage patterns only. They never have access to your text inputs, audio outputs, or any content you generate. All TTS and STT processing happens entirely on your device and is invisible to these services.

Changes

If we ever change how data is handled, we will update this policy. Our core commitment will never change: your text and audio stay on your device.

Contact

Questions about privacy? Contact us at [email protected].