# js-tts-wrapper

A JavaScript/TypeScript library that provides a unified API for working with multiple cloud-based Text-to-Speech (TTS) services. Inspired by [py3-TTS-Wrapper](https://github.com/willwade/tts-wrapper), it simplifies the use of services like Azure, Google Cloud, IBM Watson, and ElevenLabs.

## Table of Contents

- [Features](#features)
- [Supported TTS Engines](#supported-tts-engines)
- [Installation](#installation)
- [Installation](#installation-1)
- [Using npm scripts](#using-npm-scripts)
- [Quick Start](#quick-start)
- [Core Functionality](#core-functionality)
- [Voice Management](#voice-management)
- [Text Synthesis](#text-synthesis)
- [Audio Playback](#audio-playback)
- [File Output](#file-output)
- [Event Handling](#event-handling)
- [SSML Support](#ssml-support)
- [Speech Markdown Support](#speech-markdown-support)
- [Engine-Specific Examples](#engine-specific-examples)
- [Browser Support](#browser-support)
- [API Reference](#api-reference)
- [Contributing](#contributing)
- [License](#license)
- [Examples and Demos](#examples-and-demos)

## Features

- **Unified API**: Consistent interface across multiple TTS providers.
- **SSML Support**: Use Speech Synthesis Markup Language to enhance speech synthesis
- **Speech Markdown**: Optional support for easier speech markup
- **Voice Selection**: Easily browse and select from available voices
- **Streaming Synthesis**: Stream audio as it's being synthesized
- **Playback Control**: Pause, resume, and stop audio playback
- **Word Boundaries**: Get callbacks for word timing (where supported)
- **File Output**: Save synthesized speech to audio files
- **Browser Support**: Works in both Node.js (server) and browser environments (see engine support table below)

## Supported TTS Engines

| Factory Name | Class Name | Environment | Provider | Dependencies |
|--------------|------------|-------------|----------|--------------|
| `azure` | `AzureTTSClient` | Both | Microsoft Azure Cognitive Services | `@azure/cognitiveservices-speechservices`, `microsoft-cognitiveservices-speech-sdk` |
| `google` | `GoogleTTSClient` | Both | Google Cloud Text-to-Speech | `@google-cloud/text-to-speech` |
| `gemini` | `GeminiTTSClient` | Both | Gemini Flash TTS | None (uses fetch API) |
| `elevenlabs` | `ElevenLabsTTSClient` | Both | ElevenLabs | `node-fetch@2` (Node.js only) |
| `watson` | `WatsonTTSClient` | Both | IBM Watson | None (uses fetch API) |
| `openai` | `OpenAITTSClient` | Both | OpenAI | `openai` |
| `modelslab` | `ModelsLabTTSClient` | Both | ModelsLab | None (uses fetch API) |
| `upliftai` | `UpliftAITTSClient` | Both | UpLiftAI | None (uses fetch API) |
| `playht` | `PlayHTTTSClient` | Both | PlayHT | `node-fetch@2` (Node.js only) |
| `polly` | `PollyTTSClient` | Both | Amazon Web Services | `@aws-sdk/client-polly` |
| `sherpaonnx` | `SherpaOnnxTTSClient` | Node.js | k2-fsa/sherpa-onnx | `sherpa-onnx-node`, `decompress`, `decompress-bzip2`, `decompress-tarbz2`, `decompress-targz`, `tar-stream` |
| `sherpaonnx-wasm` | `SherpaOnnxWasmTTSClient` | Browser | k2-fsa/sherpa-onnx | None (WASM included) |
| `espeak` | `EspeakNodeTTSClient` | Node.js | eSpeak NG | `text2wav` |
| `espeak-wasm` | `EspeakBrowserTTSClient` | Both | eSpeak NG | `mespeak` (Node.js) or meSpeak.js (browser) |
| `sapi` | `SAPITTSClient` | Node.js | Windows Speech API (SAPI) | None (uses PowerShell) |
| `witai` | `WitAITTSClient` | Both | Wit.ai | None (uses fetch API) |
| `cartesia` | `CartesiaTTSClient` | Both | Cartesia | None (uses fetch API) |
| `deepgram` | `DeepgramTTSClient` | Both | Deepgram | None (uses fetch API) |
| `hume` | `HumeTTSClient` | Both | Hume AI | None (uses fetch API) |
| `xai` | `XAITTSClient` | Both | xAI (Grok) | None (uses fetch API) |
| `fishaudio` | `FishAudioTTSClient` | Both | Fish Audio | None (uses fetch API) |
| `mistral` | `MistralTTSClient` | Both | Mistral AI | None (uses fetch API) |
| `murf` | `MurfTTSClient` | Both | Murf AI | None (uses fetch API) |
| `unrealspeech` | `UnrealSpeechTTSClient` | Both | Unreal Speech | None (uses fetch API) |
| `resemble` | `ResembleTTSClient` | Both | Resemble AI | None (uses fetch API) |

- **Factory Name**: Use with `createTTSClient('factory-name', credentials)`
- **Class Name**: Use with direct import `import { ClassName } from 'js-tts-wrapper'`
- **Environment**: Node.js = server-side only, Browser = browser only, Both = works in both environments

### Important: SherpaONNX is optional and does not affect other engines

- Importing `js-tts-wrapper` does NOT load `sherpa-onnx-node`.
- Cloud engines (Azure, Google, Polly, OpenAI, etc.) work without any SherpaONNX packages installed.
- Only when you instantiate `SherpaOnnxTTSClient` (Node-only) will the library look for `sherpa-onnx-node` and its platform package. If SherpaONNX is not installed, the Sherpa engine logs a warning and falls back gracefully, and other engines remain unaffected.
- See the Installation section below for how to install SherpaONNX dependencies only if you plan to use that engine.
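The deferred loading described in the list above can be sketched roughly as follows. This is a minimal illustration of the general pattern, not the library's actual implementation: `loadOptional` and `LazyEngine` are hypothetical names, and the sketch assumes a CommonJS Node.js environment where `require` is available.

```javascript
// Hypothetical helper: try to load an optional package only when it is needed.
// Returns the module, or null when the package is not installed.
function loadOptional(packageName) {
  try {
    return require(packageName);
  } catch (err) {
    if (err && err.code === 'MODULE_NOT_FOUND') {
      console.warn(`Optional dependency "${packageName}" is not installed.`);
      return null;
    }
    throw err; // Unexpected failures should not be hidden.
  }
}

// Hypothetical engine class: the dependency is resolved in the constructor,
// so merely defining or importing this class loads nothing.
class LazyEngine {
  constructor(packageName) {
    this.backend = loadOptional(packageName);
  }

  // True when the optional backend was found and loaded.
  get available() {
    return this.backend !== null;
  }
}

const engine = new LazyEngine('sherpa-onnx-node');
console.log(engine.available ? 'backend loaded' : 'backend missing, engine disabled');
```

Because the lookup happens at construction time, code paths that never create the engine never pay for, or fail on, the missing package.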
### Timing and Audio Format Capabilities

#### Word Boundary and Timing Support

| Engine | Word Boundaries | Timing Source | Character-Level | Accuracy |
|--------|-----------------|---------------|-----------------|----------|
| **ElevenLabs** | ✅ | **Real API data** | ✅ **NEW!** | **High** |
| **Azure** | ✅ | **Real API data** | ❌ | **High** |
| **Google** | ✅ | Estimated | ❌ | Low |
| **Watson** | ✅ | Estimated | ❌ | Low |
| **UpLiftAI** | ✅ | Estimated | ❌ | Low |
| **OpenAI** | ✅ | Estimated | ❌ | Low |
| **WitAI** | ✅ | Estimated | ❌ | Low |
| **PlayHT** | ✅ | Estimated | ❌ | Low |
| **Polly** | ✅ | Estimated | ❌ | Low |
| **eSpeak** | ✅ | Estimated | ❌ | Low |
| **eSpeak-WASM** | ✅ | Estimated | ❌ | Low |
| **SherpaOnnx** | ✅ | Estimated | ❌ | Low |
| **SherpaOnnx-WASM** | ✅ | Estimated | ❌ | Low |
| **SAPI** | ✅ | Estimated | ❌ | Low |
| **Cartesia** | ✅ | Estimated | ❌ | Low |
| **Deepgram** | ✅ | Estimated | ❌ | Low |
| **Hume** | ✅ | Estimated | ❌ | Low |
| **xAI** | ✅ | Estimated | ❌ | Low |
| **Fish Audio** | ✅ | Estimated | ❌ | Low |
| **Mistral** | ✅ | Estimated | ❌ | Low |
| **Murf** | ✅ | Estimated | ❌ | Low |
| **Unreal Speech** | ✅ | Estimated | ❌ | Low |
| **Resemble** | ✅ | Estimated | ❌ | Low |

**Character-Level Timing**: Only ElevenLabs provides precise character-level timing data via the `/with-timestamps` endpoint, enabling the most accurate word highlighting and speech synchronization.

#### Audio Format Conversion Support

| Engine | Native Format | WAV Support | MP3 Conversion | Conversion Method |
|--------|---------------|-------------|----------------|-------------------|
| **All Engines** | Varies | ✅ | ✅ | Pure JavaScript (lamejs) |

**Format Conversion**: All engines support WAV and MP3 output through automatic format conversion. The wrapper uses pure JavaScript conversion (lamejs) when FFmpeg is not available, ensuring cross-platform compatibility without external dependencies.
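To make the "Estimated" timing source in the table above more concrete, here is one simple way such estimates could be produced when a provider returns no timing data: spread the known audio duration across the words in proportion to their length. This is only a sketch under that assumption; the wrapper's real estimator may work differently, and `estimateWordBoundaries` is a hypothetical name.

```javascript
// Hypothetical estimator: spreads a known audio duration across words,
// weighting each word by its character count.
function estimateWordBoundaries(text, totalSeconds) {
  const words = text.split(/\s+/).filter(Boolean);
  const totalChars = words.reduce((sum, w) => sum + w.length, 0);
  if (totalChars === 0) return [];

  let consumedChars = 0;
  return words.map((word) => {
    const start = (consumedChars / totalChars) * totalSeconds;
    consumedChars += word.length;
    const end = (consumedChars / totalChars) * totalSeconds;
    return { word, start, end };
  });
}

const boundaries = estimateWordBoundaries('Hello world', 1.0);
for (const { word, start, end } of boundaries) {
  console.log(`"${word}": ${start.toFixed(3)}s - ${end.toFixed(3)}s`);
}
```

An estimate like this ignores pauses, punctuation, and speaking rate, which is why the table rates estimated timings as low accuracy compared with real API data.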
## Installation

The library uses a modular approach where TTS engine-specific dependencies are optional. You can install the package and its dependencies as follows:

### npm install (longer route but more explicit)

```bash
# Install the base package
npm install js-tts-wrapper

# Install dependencies for specific engines
npm install @azure/cognitiveservices-speechservices microsoft-cognitiveservices-speech-sdk  # For Azure
npm install @google-cloud/text-to-speech  # For Google Cloud
npm install @aws-sdk/client-polly  # For AWS Polly
npm install node-fetch@2  # For ElevenLabs and PlayHT
npm install openai  # For OpenAI
npm install sherpa-onnx-node decompress decompress-bzip2 decompress-tarbz2 decompress-targz tar-stream  # For SherpaOnnx
npm install text2wav  # For eSpeak NG (Node.js)
npm install mespeak  # For eSpeak NG-WASM (Node.js)
npm install say  # For System TTS (Node.js)
npm install sound-play pcm-convert  # For Node.js audio playback
```

### Using npm scripts

After installing the base package, you can use the npm scripts provided by the package to install specific engine dependencies:

```bash
# Navigate to your project directory where js-tts-wrapper is installed
cd your-project

# Install Azure dependencies
npx js-tts-wrapper@latest run install:azure

# Install SherpaOnnx dependencies
npx js-tts-wrapper@latest run install:sherpaonnx

# Install eSpeak NG dependencies (Node.js)
npx js-tts-wrapper@latest run install:espeak

# Install eSpeak NG-WASM dependencies (Node.js)
npx js-tts-wrapper@latest run install:espeak-wasm

# Install System TTS dependencies (Node.js)
npx js-tts-wrapper@latest run install:system

# Install Node.js audio playback dependencies
npx js-tts-wrapper@latest run install:node-audio

# Install all development dependencies
npx js-tts-wrapper@latest run install:all-dev
```

## Quick Start

### Direct Instantiation

#### ESM (ECMAScript Modules)

```javascript
import { AzureTTSClient } from 'js-tts-wrapper';

// Initialize the client with your credentials
const tts = new AzureTTSClient({
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

// List available voices
const voices = await tts.getVoices();
console.log(voices);

// Set a voice
tts.setVoice('en-US-AriaNeural');

// Speak some text
await tts.speak('Hello, world!');

// Use SSML for more control
const ssml = '<speak>Hello <break time="500ms"/> world!</speak>';
await tts.speak(ssml);
```

#### CommonJS

```javascript
const { AzureTTSClient } = require('js-tts-wrapper');

// Initialize the client with your credentials
const tts = new AzureTTSClient({
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

// Use async/await within an async function
async function runExample() {
  // List available voices
  const voices = await tts.getVoices();
  console.log(voices);

  // Set a voice
  tts.setVoice('en-US-AriaNeural');

  // Speak some text
  await tts.speak('Hello, world!');

  // Use SSML for more control
  const ssml = '<speak>Hello <break time="500ms"/> world!</speak>';
  await tts.speak(ssml);
}

runExample().catch(console.error);
```

### Using the Factory Pattern

The library provides a factory function to create TTS clients dynamically based on the engine name:

#### ESM (ECMAScript Modules)

```javascript
import { createTTSClient } from 'js-tts-wrapper';

// Create a TTS client using the factory function
const tts = createTTSClient('azure', {
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

// Use the client as normal
await tts.speak('Hello from the factory pattern!');
```

#### CommonJS

```javascript
const { createTTSClient } = require('js-tts-wrapper');

// Create a TTS client using the factory function
const tts = createTTSClient('azure', {
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

async function runExample() {
  // Use the client as normal
  await tts.speak('Hello from the factory pattern!');
}

runExample().catch(console.error);
```

The factory supports all engines: `'azure'`, `'google'`, `'gemini'`, `'polly'`, `'elevenlabs'`,
`'openai'`, `'modelslab'`, `'playht'`, `'watson'`, `'witai'`, `'sherpaonnx'`, `'sherpaonnx-wasm'`, `'espeak'`, `'espeak-wasm'`, `'sapi'`, `'cartesia'`, `'deepgram'`, `'hume'`, `'xai'`, `'fishaudio'`, `'mistral'`, `'murf'`, `'unrealspeech'`, `'resemble'`, etc.

## Core Functionality

All TTS engines in js-tts-wrapper implement a common set of methods and features through the AbstractTTSClient class. This ensures consistent behavior across different providers.

### Voice Management

```typescript
// Get all available voices
const voices = await tts.getVoices();

// Get voices for a specific language
const englishVoices = await tts.getVoicesByLanguage('en-US');

// Set the voice to use
tts.setVoice('en-US-AriaNeural');
```

The library includes a robust [Language Normalization](docs/LANGUAGE_NORMALIZATION.md) system that standardizes language codes across different TTS engines. This allows you to:

- Use BCP-47 codes (e.g., 'en-US') or ISO 639-3 codes (e.g., 'eng') interchangeably
- Get consistent language information regardless of the TTS engine
- Filter voices by language using any standard format

### Credential Validation

All TTS engines support standardized credential validation to help you verify your setup before making requests:

```typescript
// Basic validation - returns boolean
const isValid = await tts.checkCredentials();
if (!isValid) {
  console.error('Invalid credentials!');
}

// Detailed validation - returns comprehensive status
const status = await tts.getCredentialStatus();
console.log(status);
/*
{
  valid: true,
  engine: 'openai',
  environment: 'node',
  requiresCredentials: true,
  credentialTypes: ['apiKey'],
  message: 'openai credentials are valid and 10 voices are available'
}
*/
```

**Engine Requirements:**

- **Cloud engines** (OpenAI, Azure, Google, etc.): Require API keys/credentials
- **Local engines** (eSpeak, SAPI, SherpaOnnx): No credentials needed
- **Environment-specific**: Some engines work only in Node.js or browser

See the [Credential Validation
Guide](docs/CREDENTIAL_VALIDATION.md) for detailed requirements and troubleshooting.

### Text Synthesis

```typescript
// Convert text to audio bytes (Uint8Array)
const audioBytes = await tts.synthToBytes('Hello, world!');

// Stream synthesis with word boundary information
const { audioStream, wordBoundaries } = await tts.synthToBytestream('Hello, world!');
```

### Audio Playback

```typescript
// Traditional text synthesis and playback
await tts.speak('Hello, world!');

// NEW: Play audio from different sources without re-synthesizing
// Play from file
await tts.speak({ filename: 'path/to/audio.mp3' });

// Play from audio bytes
const audioBytes = await tts.synthToBytes('Hello, world!');
await tts.speak({ audioBytes: audioBytes });

// Play from audio stream
const { audioStream } = await tts.synthToBytestream('Hello, world!');
await tts.speak({ audioStream: audioStream });

// All input types work with speakStreamed too
await tts.speakStreamed({ filename: 'path/to/audio.mp3' });

// Playback control
tts.pause();  // Pause playback
tts.resume(); // Resume playback
tts.stop();   // Stop playback

// Stream synthesis and play with word boundary callbacks
await tts.startPlaybackWithCallbacks('Hello world', (word, start, end) => {
  console.log(`Word: ${word}, Start: ${start}s, End: ${end}s`);
});
```

#### Benefits of Multi-Source Audio Playback

- **Avoid Double Synthesis**: Use `synthToFile()` to save audio, then play the same file with `speak({ filename })` without re-synthesizing
- **Platform Independent**: Works consistently across browser and Node.js environments
- **Efficient Reuse**: Play the same audio bytes or stream multiple times without regenerating
- **Flexible Input**: Choose the most convenient input source for your use case

> **Note**: Audio playback with `speak()` and `speakStreamed()` methods is supported in both browser environments and Node.js environments with the optional `sound-play` package installed.
To enable Node.js audio playback, install the required packages with `npm install sound-play pcm-convert` or use the npm script `npx js-tts-wrapper@latest run install:node-audio`.

### File Output

```typescript
// Save synthesized speech to a file
await tts.synthToFile('Hello, world!', 'output', 'mp3');
```

### Event Handling

```typescript
// Register event handlers
tts.on('start', () => console.log('Speech started'));
tts.on('end', () => console.log('Speech ended'));
tts.on('boundary', (word, start, end) => {
  console.log(`Word: ${word}, Start: ${start}s, End: ${end}s`);
});

// Alternative event connection
tts.connect('onStart', () => console.log('Speech started'));
tts.connect('onEnd', () => console.log('Speech ended'));
```

### Word Boundary Events and Timing

Word boundary events provide precise timing information for speech synchronization, word highlighting, and interactive applications.

#### Basic Word Boundary Usage

```typescript
// Enable word boundary events
tts.on('boundary', (word, startTime, endTime) => {
  console.log(`"${word}" spoken from ${startTime}s to ${endTime}s`);
});

await tts.speak('Hello world, this is a test.');
// Output:
// "Hello" spoken from 0.000s to 0.300s
// "world," spoken from 0.300s to 0.600s
// "this" spoken from 0.600s to 0.900s
// ...
```

#### Advanced Timing with Character-Level Precision (ElevenLabs)

```typescript
// ElevenLabs: Enable character-level timing for maximum accuracy
const tts = createTTSClient('elevenlabs');

// Method 1: Using synthToBytestream with timestamps
const result = await tts.synthToBytestream('Hello world', {
  useTimestamps: true
});

console.log(`Generated ${result.wordBoundaries.length} word boundaries:`);
result.wordBoundaries.forEach(wb => {
  const startSec = wb.offset / 10000;
  const durationSec = wb.duration / 10000;
  console.log(`"${wb.text}": ${startSec}s - ${startSec + durationSec}s`);
});

// Method 2: Using enhanced callback support
await tts.startPlaybackWithCallbacks('Hello world', (word, start, end) => {
  console.log(`Precise timing: "${word}" from ${start}s to ${end}s`);
});
```

#### Real-Time Word Highlighting Example

```typescript
// Example: Real-time word highlighting for accessibility
const textElement = document.getElementById('text');
const words = 'Hello world, this is a test.'.split(' ');
let wordIndex = 0;

tts.on('boundary', (word, startTime, endTime) => {
  // Highlight current word
  if (wordIndex < words.length) {
    textElement.innerHTML = words.map((w, i) =>
      i === wordIndex ? `<mark>${w}</mark>` : w
    ).join(' ');
    wordIndex++;
  }
});

await tts.speak('Hello world, this is a test.', { useWordBoundary: true });
```

## SSML Support

The library provides comprehensive SSML (Speech Synthesis Markup Language) support with engine-specific capabilities:

### SSML-Supported Engines

The following engines **support SSML**:

- **Google Cloud TTS** - Full SSML support with all elements
- **Microsoft Azure** - Full SSML support with voice-specific features
- **Amazon Polly** - Dynamic SSML support based on voice engine type (standard/long-form: full, neural/generative: limited)
- **WitAI** - Full SSML support
- **SAPI (Windows)** - Full SSML support
- **eSpeak/eSpeak-WASM** - SSML support with subset of elements

### Non-SSML Engines

The following engines **automatically strip SSML tags** and convert to plain text:

- **ElevenLabs** - SSML tags are removed, plain text is synthesized
- **OpenAI** - SSML tags are removed, plain text is synthesized
- **PlayHT** - SSML tags are removed, plain text is synthesized
- **ModelsLab** - SSML tags are removed, plain text is synthesized
- **SherpaOnnx/SherpaOnnx-WASM** - SSML tags are removed, plain text is synthesized
- **Cartesia** - SSML tags removed; audio tags (`[laugh]`, `[sigh]`, etc.)
mapped to `<emotion>` for sonic-3, stripped for others
- **Deepgram** - SSML tags are removed, plain text is synthesized
- **Hume** - SSML tags are removed, plain text is synthesized
- **Gemini** - SSML tags are removed; Gemini audio tags are passed natively
- **xAI** - SSML tags are removed; audio tags passed natively for grok-tts
- **Fish Audio** - SSML tags removed; audio tags passed natively for s2-pro
- **Mistral** - SSML tags are removed, plain text is synthesized
- **Murf** - SSML tags are removed, plain text is synthesized
- **Unreal Speech** - SSML tags are removed, plain text is synthesized
- **Resemble** - SSML tags are removed, plain text is synthesized

### Usage Examples

```typescript
// Use SSML directly (works with supported engines)
const ssml = `
<speak>
  <prosody rate="slow" pitch="low">
    This text will be spoken slowly with a low pitch.
  </prosody>
  <break time="500ms"/>
  <emphasis level="strong">This text is emphasized.</emphasis>
</speak>
`;
await tts.speak(ssml);

// Or use the SSML builder
const ssmlText = tts.ssml
  .prosody({ rate: 'slow', pitch: 'low' }, 'This text will be spoken slowly with a low pitch.')
  .break(500)
  .emphasis('strong', 'This text is emphasized.')
  .toString();

await tts.speak(ssmlText);
```

### Engine-Specific SSML Notes

- **Amazon Polly**: SSML support varies by voice engine type:
  - **Standard voices**: Full SSML support including all tags
  - **Long-form voices**: Full SSML support including all tags
  - **Neural voices**: Limited SSML support (no emphasis, limited prosody)
  - **Generative voices**: Limited SSML support (partial tag support)
  - The library automatically detects voice engine types and handles SSML appropriately
- **Microsoft Azure**: Supports voice-specific SSML elements and custom voice tags
  - Supports MS-specific tags like `<mstts:express-as>` for emotional styles
  - The library automatically injects the required `xmlns:mstts` namespace when needed
- **Google Cloud**: Supports the most comprehensive set of SSML
elements
- **WitAI**: Full SSML support according to W3C specification
- **SAPI**: Windows-native SSML support with system voice capabilities
- **eSpeak**: Supports SSML subset including prosody, breaks, and emphasis elements

### Raw SSML Pass-Through

Speech Markdown and the built-in SSML helpers cover most use cases, but there are times when you need to send hand-crafted SSML: custom namespaces, experimental tags, or markup generated by another tool. In those cases you can use the `rawSSML` flag to bypass Speech Markdown conversion and SSML validation:

```typescript
// Example: Azure multi-speaker dialog, currently easier to author as raw SSML
const azureDialogSSML = `<speak xmlns:mstts="https://www.w3.org/2001/mstts">
  <mstts:dialog>
    <mstts:dialogturn speaker="narrator">
      Welcome to the demo.
    </mstts:dialogturn>
    <mstts:dialogturn speaker="character">
      Hey there! This turn uses a different voice.
    </mstts:dialogturn>
  </mstts:dialog>
</speak>`;

await tts.speak(azureDialogSSML, { rawSSML: true });

// When rawSSML=true the wrapper will:
// 1. Skip Speech Markdown conversion
// 2. Skip SSML validation / normalization
// 3. Pass the SSML through unchanged (aside from ensuring <speak> exists)
```

**Important:** When you opt into `rawSSML`, you are responsible for producing provider-compliant SSML. The wrapper only wraps the payload with `<speak>` if missing and adds obvious namespaces, but it does not attempt to sanitize or validate the markup.

Need to mix-and-match?
Convert Speech Markdown to SSML yourself (using `speechmarkdown-js` directly) and then send it through `rawSSML: true` to avoid duplicate parsing:

```typescript
import { SpeechMarkdown } from "speechmarkdown-js";

const markdown = "(Hello!)[excited:\"1.5\"] with (characters)[character:superhero]";
// toSSML is an instance method in speechmarkdown-js, so create a converter first
const ssml = await new SpeechMarkdown().toSSML(markdown, { platform: "microsoft-azure" });

await tts.speak(ssml, { rawSSML: true });
```

If you hit a Speech Markdown feature gap, consider contributing upstream. The library powers our conversion pipeline, so improvements there benefit every js-tts-wrapper user.

## Speech Markdown Support

The library supports Speech Markdown for easier speech formatting across **all engines**. Speech Markdown is powered by the **[speechmarkdown-js](https://github.com/speechmarkdown/speechmarkdown-js)** library, which provides comprehensive platform-specific support.

### How Speech Markdown Works

- **SSML-supported engines**: Speech Markdown is converted to SSML (with platform-specific optimizations), then processed natively
- **Non-SSML engines**: Speech Markdown is converted to SSML, then SSML tags are stripped to plain text

### Platform-Specific Features

The speechmarkdown-js library ships dedicated formatters for every major provider:

- **Microsoft Azure**: Automatic `mstts` namespace injection, inline `<lang>` sections, and 27 `mstts:express-as` styles (excited, chat, newscaster, customerservice, etc.) with optional `styledegree` intensity. Section modifiers such as `#[excited]` are supported as long as you leave a blank line before the section and close it with `#[defaults]` (or another section tag).
- **Amazon Polly**: Emotional styles and neural/standard voice effects that map cleanly onto Polly’s SSML dialect.
- **Google Cloud**: Google Assistant style tags, multi-language voices, and automatic `<lang>` handling.
- **ElevenLabs**: A formatter that emits ElevenLabs’ prompt markup (`<break time="…">`, IPA phonemes, etc.), so you can feed Speech Markdown directly into ElevenLabs if you bypass the wrapper or use `rawSSML`.
- **WitAI / Microsoft SAPI / IBM Watson / W3C**: Full SSML support for their respective dialects.
- **And more**: See the [speechmarkdown-js README](https://github.com/speechmarkdown/speechmarkdown-js?tab=readme-ov-file#speechmarkdown-js) for the complete, always up-to-date list.

### Usage

```typescript
// Use Speech Markdown with any engine
const markdown = "Hello [500ms] world! ++This text is emphasized++ (slowly)[rate:\"slow\"] (high)[pitch:\"high\"] (loudly)[volume:\"loud\"]";
await tts.speak(markdown, { useSpeechMarkdown: true });
// If you omit useSpeechMarkdown, the wrapper auto-enables it when Speech Markdown syntax is detected.

// Platform-specific Speech Markdown features
// Azure: Section modifiers map to mstts:express-as
const azureMarkdown = `
#[excited]
This entire section is excited! Multiple sentences work too.

#[defaults]
Back to the neutral voice.
`;
await azureTts.speak(azureMarkdown, { useSpeechMarkdown: true });

// Speech Markdown works with all engines
const ttsGoogle = createTTSClient('google', { keyFilename: '/path/to/service-account-key.json' });
const ttsElevenLabs = createTTSClient('elevenlabs', { apiKey: 'your-api-key' });

// Both will handle Speech Markdown appropriately
await ttsGoogle.speak(markdown, { useSpeechMarkdown: true }); // Converts to SSML
await ttsElevenLabs.speak(markdown, { useSpeechMarkdown: true }); // Uses the ElevenLabs formatter (tags are stripped before hitting the API)

// Need ElevenLabs prompt markup?
// Convert directly and pass through rawSSML or your own API client:
import { SpeechMarkdown as SMSpeechMarkdown } from "speechmarkdown-js";
const elevenLabsMarkup = await new SMSpeechMarkdown().toSSML(markdown, { platform: "elevenlabs" });
await elevenLabsClient.speak(elevenLabsMarkup, { rawSSML: true });
```

### Supported Speech Markdown Elements

- `[500ms]` or `[break:"500ms"]` - Pauses/breaks
- `++text++` or `+text+` - Text emphasis
- `(text)[rate:"slow"]` - Speech rate control
- `(text)[pitch:"high"]` - Pitch control
- `(text)[volume:"loud"]` - Volume control
- **Platform-specific**: See [speechmarkdown-js documentation](https://github.com/speechmarkdown/speechmarkdown-js) for platform-specific features like Azure's express-as styles

### Node & CI: Configuring the Speech Markdown Converter

The full **speechmarkdown-js** converter now loads by default in both Node and browser environments. If you need to opt out (for very small lambda bundles or for deterministic tests), you can:

```bash
# Disable globally
SPEECHMARKDOWN_DISABLE=true npm test

# Or force-enable/disable explicitly
SPEECHMARKDOWN_ENABLE=false npm test
SPEECHMARKDOWN_ENABLE=true npm test
```

Or disable/enable programmatically:

```ts
import { SpeechMarkdown } from "js-tts-wrapper";

SpeechMarkdown.configureSpeechMarkdown({ enabled: false }); // fallback-only
SpeechMarkdown.configureSpeechMarkdown({ enabled: true });  // ensure full parser
```

Alternatively, you can import the function directly:

```ts
import { configureSpeechMarkdown } from "js-tts-wrapper";

configureSpeechMarkdown({ enabled: true }); // ensure full parser
```

When disabled, js-tts-wrapper falls back to the lightweight built-in converter (suitable for basic `[break]` patterns). Re-enable it to regain advanced tags (Azure express-as, Polly styles, google:style, etc.).
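To give a feel for what a lightweight fallback limited to basic break patterns might look like, here is a minimal sketch. It is not the wrapper's built-in converter: `basicBreaksToSSML` and `BREAK_PATTERN` are hypothetical names, and the sketch handles only the `[500ms]` and `[break:"500ms"]` pause forms listed under Supported Speech Markdown Elements.

```javascript
// Hypothetical fallback: handles only [500ms], [2s], and [break:"500ms"] pauses.
const BREAK_PATTERN = /\[(?:break:")?(\d+(?:ms|s))"?\]/g;

function basicBreaksToSSML(markdown) {
  // Replace each pause marker with an SSML <break> element.
  const body = markdown.replace(BREAK_PATTERN, '<break time="$1"/>');
  // Wrap the result in a <speak> root element.
  return `<speak>${body}</speak>`;
}

console.log(basicBreaksToSSML('Hello [500ms] world [break:"1s"] again'));
// prints: <speak>Hello <break time="500ms"/> world <break time="1s"/> again</speak>
```

Everything else (emphasis, rate, pitch, provider-specific styles) passes through untouched in this sketch, and the text is not XML-escaped, which is roughly the trade-off to expect from any fallback of this size.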
### Engine Compatibility

| Engine | Speech Markdown Support | Processing Method |
|--------|-------------------------|-------------------|
| Google Cloud TTS | ✅ Full | → SSML → Native processing |
| Microsoft Azure | ✅ Full | → SSML → Native processing |
| Amazon Polly | ✅ Full | → SSML → Dynamic processing (engine-dependent) |
| WitAI | ✅ Full | → SSML → Native processing |
| SAPI | ✅ Full | → SSML → Native processing |
| eSpeak | ✅ Full | → SSML → Native processing |
| ElevenLabs | ✅ Converted | → SSML → Plain text |
| OpenAI | ✅ Converted | → SSML → Plain text |
| PlayHT | ✅ Converted | → SSML → Plain text |
| SherpaOnnx | ✅ Converted | → SSML → Plain text |
| Cartesia | ✅ Converted | → SSML → Plain text |
| Deepgram | ✅ Converted | → SSML → Plain text |
| Hume | ✅ Converted | → SSML → Plain text |
| Gemini | ✅ Converted | → SSML → Plain text |
| xAI | ✅ Converted | → SSML → Plain text |
| Fish Audio | ✅ Converted | → SSML → Plain text |
| Mistral | ✅ Converted | → SSML → Plain text |
| Murf | ✅ Converted | → SSML → Plain text |
| Unreal Speech | ✅ Converted | → SSML → Plain text |
| Resemble | ✅ Converted | → SSML → Plain text |

### Speech Markdown vs Raw SSML: When to Use Each

The library provides two complementary approaches for controlling speech synthesis:

| Approach | Use Case | Example |
|----------|----------|---------|
| **Speech Markdown** | Easy, readable syntax for common features | `(Hello!)[excited:"1.5"]` |
| **Raw SSML** | Direct control, advanced features, provider-specific tags | `<mstts:express-as style="friendly">Hello!</mstts:express-as>` |

**Speech Markdown Flow:**

```
Speech Markdown → speechmarkdown-js → Platform-specific SSML → Provider
```

**Raw SSML Flow:**

```
Raw SSML → Minimal processing → Provider
```

**When to use Speech Markdown:**

- You want readable, maintainable code
- You're using common features (breaks, emphasis, rate, pitch, volume)
- You want platform-specific optimizations automatically applied
- You want the same code
to work across multiple TTS engines

**When to use Raw SSML with `rawSSML: true`:**

- You need advanced provider-specific features (e.g., Azure's mstts:dialog for multi-speaker)
- You're working with SSML generated by other tools
- You need fine-grained control over SSML structure
- You want to bypass validation for experimental features

**Combining both approaches:**

```typescript
// Use speechmarkdown-js directly for advanced features
import { SpeechMarkdown } from 'speechmarkdown-js';

const markdown = "(This is exciting!)[excited:\"1.5\"] with (multi-speaker)[mstts:dialog]";
// toSSML is an instance method and takes an options object
const ssml = await new SpeechMarkdown().toSSML(markdown, { platform: "microsoft-azure" });

// Pass the result with rawSSML to bypass wrapper validation
await tts.speak(ssml, { rawSSML: true });
```

## Engine-Specific Examples

Each TTS engine has its own specific setup. Here are examples for each supported engine in both ESM and CommonJS formats:

### Azure

#### ESM

```javascript
import { AzureTTSClient } from 'js-tts-wrapper';

const tts = new AzureTTSClient({
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

await tts.speak('Hello from Azure!');
```

#### CommonJS

```javascript
const { AzureTTSClient } = require('js-tts-wrapper');

const tts = new AzureTTSClient({
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

// Inside an async function
await tts.speak('Hello from Azure!');
```

### Google Cloud

Note: Google Cloud TTS supports both authentication methods: Service Account (Node SDK) and API key (REST, browser-safe).
#### ESM

```javascript
import { GoogleTTSClient } from 'js-tts-wrapper';

const tts = new GoogleTTSClient({
  keyFilename: '/path/to/service-account-key.json'
});

await tts.speak('Hello from Google Cloud!');
```

#### CommonJS

```javascript
const { GoogleTTSClient } = require('js-tts-wrapper');

const tts = new GoogleTTSClient({
  keyFilename: '/path/to/service-account-key.json'
});

// Inside an async function
await tts.speak('Hello from Google Cloud!');
```

#### API key mode (Node or Browser)

Google Cloud Text-to-Speech also supports an API key over the REST API. This is browser-safe and requires no service account file. Restrict the key in Google Cloud Console (enable only Text-to-Speech API and restrict by HTTP referrer for browser use).

ESM (Node or Browser):

```javascript
import { GoogleTTSClient } from 'js-tts-wrapper';

const tts = new GoogleTTSClient({
  apiKey: process.env.GOOGLECLOUDTTS_API_KEY || 'your-api-key',
  // optional defaults
  voiceId: 'en-US-Wavenet-D',
  lang: 'en-US'
});

await tts.speak('Hello from Google TTS with API key!');
```

CommonJS (Node):

```javascript
const { GoogleTTSClient } = require('js-tts-wrapper');

const tts = new GoogleTTSClient({
  apiKey: process.env.GOOGLECLOUDTTS_API_KEY || 'your-api-key'
});

(async () => {
  await tts.speak('Hello from Google TTS with API key!');
})();
```

Notes:

- REST v1 does not return word timepoints; the wrapper provides estimated timings for boundary events.
- For true timings, use service account credentials (Node) where the beta client can be used.
- Environment variable supported by examples/tests: `GOOGLECLOUDTTS_API_KEY`.

### Gemini Flash TTS

Gemini Flash TTS uses the Gemini API, not Google Cloud Text-to-Speech. Configure `GEMINI_API_KEY` or pass `apiKey` directly.

Enable the **Gemini API** (`generativelanguage.googleapis.com`) in your Google Cloud project. Google Cloud Text-to-Speech (`texttospeech.googleapis.com`) is not used by this engine.
#### ESM

```javascript
import { GeminiTTSClient } from 'js-tts-wrapper';

const tts = new GeminiTTSClient({
  apiKey: process.env.GEMINI_API_KEY,
  model: 'gemini-3.1-flash-tts-preview',
  voice: 'Kore'
});

const audio = await tts.synthToBytes('Say cheerfully: Have a wonderful day!');
```

#### Factory

```javascript
import { createTTSClient } from 'js-tts-wrapper';

const tts = createTTSClient('gemini', {
  apiKey: process.env.GEMINI_API_KEY,
  voice: 'Puck'
});

await tts.speak('[excitedly] Hello from Gemini Flash TTS!');
```

Notes:

- Supported models: `gemini-3.1-flash-tts-preview` (default) and `gemini-2.5-flash-preview-tts`.
- Supported voices: Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, Callirrhoe, Autonoe, Enceladus, Iapetus, Umbriel, Algieba, Despina, Erinome, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Schedar, Gacrux, Pulcherrima, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, Sulafat.
- `getVoices()` returns documented Gemini voice gender, plus `metadata.style` for each voice.
- `languageCodes` includes documented Gemini-TTS language/accent BCP-47 codes; synthesis still uses Gemini's automatic language detection.
- `metadata.languageReadiness` identifies documented language launch readiness as `GA` or `Preview`.
- Gemini TTS does not support SSML; SSML tags are stripped before synthesis.
- Gemini TTS does not provide true streaming; `synthToBytestream()` wraps the completed audio bytes in a stream.
- Output is WAV by default. Use `{ format: 'pcm' }` to return raw PCM.
- Gemini audio tags can be included directly in text, such as `[whispers]`, `[laughs]`, or `[excitedly]`.
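The notes above say that `{ format: 'pcm' }` returns raw PCM instead of WAV. If you later need a playable file from those bytes, one option is to prepend a standard 44-byte RIFF/WAVE header yourself. The following is a minimal sketch, not the wrapper's own code: `pcmToWav` is a hypothetical helper, and the defaults (24 kHz, mono, 16-bit little-endian) are assumptions that should be checked against the Gemini documentation for the model in use.

```javascript
// Hypothetical helper, not part of js-tts-wrapper.
// Assumed defaults (verify for your model): 24 kHz, mono, 16-bit PCM.
function pcmToWav(pcm, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const header = new DataView(new ArrayBuffer(44));
  const writeAscii = (offset, s) => {
    for (let i = 0; i < s.length; i++) header.setUint8(offset + i, s.charCodeAt(i));
  };

  writeAscii(0, 'RIFF');
  header.setUint32(4, 36 + pcm.length, true);          // RIFF chunk size
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  header.setUint32(16, 16, true);                      // fmt chunk size
  header.setUint16(20, 1, true);                       // format 1 = linear PCM
  header.setUint16(22, channels, true);
  header.setUint32(24, sampleRate, true);
  header.setUint32(28, sampleRate * blockAlign, true); // byte rate
  header.setUint16(32, blockAlign, true);
  header.setUint16(34, bitsPerSample, true);
  writeAscii(36, 'data');
  header.setUint32(40, pcm.length, true);              // data chunk size

  // Concatenate header and samples into one buffer
  const wav = new Uint8Array(44 + pcm.length);
  wav.set(new Uint8Array(header.buffer), 0);
  wav.set(pcm, 44);
  return wav;
}
```

In practice the `pcm` argument would be the `Uint8Array` returned by `synthToBytes(text, { format: 'pcm' })`; the default WAV output makes this step unnecessary for most uses.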
### AWS Polly

#### ESM

```javascript
import { PollyTTSClient } from 'js-tts-wrapper';

const tts = new PollyTTSClient({
  region: 'us-east-1',
  accessKeyId: 'your-access-key-id',
  secretAccessKey: 'your-secret-access-key'
});

await tts.speak('Hello from AWS Polly!');
```

#### CommonJS

```javascript
const { PollyTTSClient } = require('js-tts-wrapper');

const tts = new PollyTTSClient({
  region: 'us-east-1',
  accessKeyId: 'your-access-key-id',
  secretAccessKey: 'your-secret-access-key'
});

// Inside an async function
await tts.speak('Hello from AWS Polly!');
```

### ElevenLabs

#### ESM

```javascript
import { ElevenLabsTTSClient } from 'js-tts-wrapper';

const tts = new ElevenLabsTTSClient({
  apiKey: 'your-api-key'
});

await tts.speak('Hello from ElevenLabs!');
```

#### CommonJS

```javascript
const { ElevenLabsTTSClient } = require('js-tts-wrapper');

const tts = new ElevenLabsTTSClient({
  apiKey: 'your-api-key'
});

// Inside an async function
await tts.speak('Hello from ElevenLabs!');
```

### OpenAI

#### ESM

```javascript
import { OpenAITTSClient } from 'js-tts-wrapper';

const tts = new OpenAITTSClient({
  apiKey: 'your-api-key'
});

await tts.speak('Hello from OpenAI!');
```

#### CommonJS

```javascript
const { OpenAITTSClient } = require('js-tts-wrapper');

const tts = new OpenAITTSClient({
  apiKey: 'your-api-key'
});

// Inside an async function
await tts.speak('Hello from OpenAI!');
```

### PlayHT

#### ESM

```javascript
import { PlayHTTTSClient } from 'js-tts-wrapper';

const tts = new PlayHTTTSClient({
  apiKey: 'your-api-key',
  userId: 'your-user-id'
});

await tts.speak('Hello from PlayHT!');
```

#### CommonJS

```javascript
const { PlayHTTTSClient } = require('js-tts-wrapper');

const tts = new PlayHTTTSClient({
  apiKey: 'your-api-key',
  userId: 'your-user-id'
});

// Inside an async function
await tts.speak('Hello from PlayHT!');
```

### IBM Watson

#### ESM

```javascript
import { WatsonTTSClient } from 'js-tts-wrapper';

const tts = new WatsonTTSClient({
  apiKey: 'your-api-key',
  region: 'us-south',
  instanceId: 'your-instance-id'
});

await tts.speak('Hello from IBM Watson!');
```

#### CommonJS

```javascript
const { WatsonTTSClient } = require('js-tts-wrapper');

const tts = new WatsonTTSClient({
  apiKey: 'your-api-key',
  region: 'us-south',
  instanceId: 'your-instance-id'
});

// Inside an async function
await tts.speak('Hello from IBM Watson!');
```

### Wit.ai

#### ESM

```javascript
import { WitAITTSClient } from 'js-tts-wrapper';

const tts = new WitAITTSClient({
  token: 'your-wit-ai-token'
});

await tts.speak('Hello from Wit.ai!');
```

#### CommonJS

```javascript
const { WitAITTSClient } = require('js-tts-wrapper');

const tts = new WitAITTSClient({
  token: 'your-wit-ai-token'
});

// Inside an async function
await tts.speak('Hello from Wit.ai!');
```

### SherpaOnnx (Offline TTS)

#### ESM

```javascript
import { SherpaOnnxTTSClient } from 'js-tts-wrapper';

const tts = new SherpaOnnxTTSClient();
// The client will automatically download models when needed

await tts.speak('Hello from SherpaOnnx!');
```

#### CommonJS

```javascript
const { SherpaOnnxTTSClient } = require('js-tts-wrapper');

const tts = new SherpaOnnxTTSClient();
// The client will automatically download models when needed

// Inside an async function
await tts.speak('Hello from SherpaOnnx!');
```

> **Note**: SherpaOnnx is a server-side only engine and requires specific environment setup. See the [SherpaOnnx documentation](docs/sherpaonnx.md) for details on setup and configuration. For browser environments, use [SherpaOnnx-WASM](docs/sherpaonnx-wasm.md) instead.
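Because the note above splits SherpaOnnx into a Node.js engine and a browser (WASM) engine, code shared between both environments has to pick one at runtime. A minimal sketch of that choice follows. `pickSherpaEngine` is a hypothetical helper, and the `window`/`document` check is an assumed, simplistic way to detect a browser; the returned strings are the factory names from the engine table (`sherpaonnx` and `sherpaonnx-wasm`).

```javascript
// Hypothetical helper, not part of js-tts-wrapper: returns the factory name
// to pass to createTTSClient(), based on a simple browser check.
// The `env` parameter exists only to make the function easy to test.
function pickSherpaEngine(env = globalThis) {
  const inBrowser =
    typeof env.window !== 'undefined' && typeof env.document !== 'undefined';
  return inBrowser ? 'sherpaonnx-wasm' : 'sherpaonnx';
}

console.log(pickSherpaEngine({}));                           // 'sherpaonnx'
console.log(pickSherpaEngine({ window: {}, document: {} })); // 'sherpaonnx-wasm'
```

The result could then be passed as the first argument of `createTTSClient(engine, credentials)`. Environments such as web workers or test runners that emulate a DOM may need a different check.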
### eSpeak NG (Node.js)

#### ESM

```javascript
import { EspeakNodeTTSClient } from 'js-tts-wrapper';

const tts = new EspeakNodeTTSClient();

await tts.speak('Hello from eSpeak NG!');
```

#### CommonJS

```javascript
const { EspeakNodeTTSClient } = require('js-tts-wrapper');

const tts = new EspeakNodeTTSClient();

// Inside an async function
await tts.speak('Hello from eSpeak NG!');
```

> **Note**: This engine uses the `text2wav` package and is designed for Node.js environments only. For browser environments, use the eSpeak NG Browser engine instead.

### eSpeak NG (Browser)

#### ESM

```javascript
import { EspeakBrowserTTSClient } from 'js-tts-wrapper';

const tts = new EspeakBrowserTTSClient();

await tts.speak('Hello from eSpeak NG Browser!');
```

#### CommonJS

```javascript
const { EspeakBrowserTTSClient } = require('js-tts-wrapper');

const tts = new EspeakBrowserTTSClient();

// Inside an async function
await tts.speak('Hello from eSpeak NG Browser!');
```

> **Note**: This engine works in both Node.js (using the `mespeak` package) and browser environments (using meSpeak.js). For browser use, include meSpeak.js in your HTML before using this engine.

#### Backward Compatibility

For backward compatibility, the old class names are still available:

- `EspeakTTSClient` (alias for `EspeakNodeTTSClient`)
- `EspeakWasmTTSClient` (alias for `EspeakBrowserTTSClient`)

However, we recommend using the new, clearer names in new code.
### Windows SAPI (Windows-only)

#### ESM

```javascript
import { SAPITTSClient } from 'js-tts-wrapper';

const tts = new SAPITTSClient();

await tts.speak('Hello from Windows SAPI!');
```

#### CommonJS

```javascript
const { SAPITTSClient } = require('js-tts-wrapper');

const tts = new SAPITTSClient();

// Inside an async function
await tts.speak('Hello from Windows SAPI!');
```

> **Note**: This engine is **Windows-only**

### Cartesia

```javascript
import { CartesiaTTSClient } from 'js-tts-wrapper';

const tts = new CartesiaTTSClient({
  apiKey: 'your-api-key'
});

await tts.setVoice('sonic-3'); // or 'sonic-2'
await tts.speak('Hello from Cartesia!');
```

> Audio tags like `[laugh]`, `[sigh]` are mapped to `<emotion>` SSML for sonic-3, stripped for other models.

### Deepgram

```javascript
import { DeepgramTTSClient } from 'js-tts-wrapper';

const tts = new DeepgramTTSClient({
  apiKey: 'your-api-key'
});

await tts.setVoice('aura-2-asteria-en');
await tts.speak('Hello from Deepgram!');
```

> Uses a static voice list. Model and voice are combined in the URL parameter.

### Hume AI

```javascript
import { HumeTTSClient } from 'js-tts-wrapper';

const tts = new HumeTTSClient({
  apiKey: 'your-api-key'
});

await tts.setVoice('ito'); // or any Hume voice name
await tts.speak('Hello from Hume!');
```

> Supports `octave-2` and `octave-1` models. Streaming uses a separate `/tts/stream/file` endpoint.

### xAI (Grok)

```javascript
import { XAITTSClient } from 'js-tts-wrapper';

const tts = new XAITTSClient({
  apiKey: 'your-api-key'
});

await tts.speak('Hello from xAI!');
```

> Native audio tag passthrough for grok-tts model. Language can be configured via properties.

### Fish Audio

```javascript
import { FishAudioTTSClient } from 'js-tts-wrapper';

const tts = new FishAudioTTSClient({
  apiKey: 'your-api-key'
});

await tts.setVoice('your-voice-reference-id');
await tts.speak('Hello from Fish Audio!');
```

> Model ID is passed as a header. Audio tags passed natively for s2-pro model.
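Several of the notes above mention bracketed audio tags such as `[laugh]` or `[sigh]`, which are passed through for some models and stripped for others. To illustrate what stripping can look like, here is a minimal sketch. It is not the wrapper's actual implementation: `stripAudioTags` and its regular expressions are assumptions made for this example.

```javascript
// Hypothetical helper, not part of js-tts-wrapper: removes bracketed audio
// tags such as [laugh] or [sigh] and tidies the whitespace they leave behind.
function stripAudioTags(text) {
  return text
    .replace(/\[[a-zA-Z][a-zA-Z _-]*\]/g, '') // drop tags made of letters, spaces, _ or -
    .replace(/\s{2,}/g, ' ')                  // collapse runs of whitespace
    .replace(/\s+([.,!?;:])/g, '$1')          // remove space before punctuation
    .trim();
}

console.log(stripAudioTags('[laugh] That was funny [sigh] .')); // 'That was funny.'
```

The pattern only matches tags that start with a letter, so bracketed numbers such as `[1]` are left untouched.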
### Mistral

```javascript
import { MistralTTSClient } from 'js-tts-wrapper';

const tts = new MistralTTSClient({
  apiKey: 'your-api-key'
});

await tts.speak('Hello from Mistral!');
```

> Uses SSE streaming with base64 audio chunks. Non-streaming returns base64 JSON.

### Murf

```javascript
import { MurfTTSClient } from 'js-tts-wrapper';

const tts = new MurfTTSClient({
  apiKey: 'your-api-key'
});

await tts.setVoice('en-US-natalie');
await tts.speak('Hello from Murf!');
```

> Two models: GEN2 (base64 response) and FALCON (binary streaming). Uses static voice list.

### Unreal Speech

```javascript
import { UnrealSpeechTTSClient } from 'js-tts-wrapper';

const tts = new UnrealSpeechTTSClient({
  apiKey: 'your-api-key'
});

await tts.setVoice('Scarlett');
await tts.speak('Hello from Unreal Speech!');
```

> Non-streaming uses two-step URI-based flow. Streaming returns audio directly.

### Resemble

```javascript
import { ResembleTTSClient } from 'js-tts-wrapper';

const tts = new ResembleTTSClient({
  apiKey: 'your-api-key'
});

await tts.setVoice('your-voice-id');
await tts.speak('Hello from Resemble!');
```

> Non-streaming returns base64 JSON. Streaming returns raw binary audio.
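Several engines above are described as returning base64-encoded audio inside a JSON response. The wrapper handles this internally, but the decoding step itself is simple, and a sketch may help when debugging raw provider responses. `decodeBase64Audio` is a hypothetical helper, and the `audio_content` field name is an assumption; each provider uses its own response shape.

```javascript
// Hypothetical helper, not part of js-tts-wrapper. The default field name
// `audio_content` is an assumption; check the provider's response format.
function decodeBase64Audio(json, field = 'audio_content') {
  const b64 = json[field];
  if (typeof b64 !== 'string') {
    throw new TypeError(`Missing base64 field "${field}"`);
  }
  const binary = atob(b64); // available in browsers and recent Node.js versions
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

console.log(decodeBase64Audio({ audio_content: 'AQID' })); // bytes 1, 2, 3
```

The resulting `Uint8Array` has the same type that `synthToBytes()` returns, so it can be written to a file or handed to an audio player in the same way.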
## API Reference

### Factory Function

| Function | Description | Return Type |
|--------|-------------|-------------|
| `createTTSClient(engine, credentials)` | Create a TTS client for the specified engine | `AbstractTTSClient` |

### Common Methods (All Engines)

| Method | Description | Return Type |
|--------|-------------|-------------|
| `getVoices()` | Get all available voices | `Promise<UnifiedVoice[]>` |
| `getVoicesByLanguage(language)` | Get voices for a specific language | `Promise<UnifiedVoice[]>` |
| `setVoice(voiceId, lang?)` | Set the voice to use | `void` |
| `synthToBytes(text, options?)` | Convert text to audio bytes | `Promise<Uint8Array>` |
| `synthToBytestream(text, options?)` | Stream synthesis with word boundaries | `Promise<{audioStream, wordBoundaries}>` |
| `speak(text, options?)` | Synthesize and play audio | `Promise<void>` |
| `speakStreamed(text, options?)` | Stream synthesis and play | `Promise<void>` |
| `synthToFile(text, filename, format?, options?)` | Save synthesized speech to a file | `Promise<void>` |
| `startPlaybackWithCallbacks(text, callback, options?)` | Play with word boundary callbacks | `Promise<void>` |
| `pause()` | Pause audio playback | `void` |
| `resume()` | Resume audio playback | `void` |
| `stop()` | Stop audio playback | `void` |
| `on(event, callback)` | Register event handler | `void` |
| `connect(event, callback)` | Connect to event | `void` |
| `checkCredentials()` | Check if credentials are valid | `Promise<boolean>` |
| `checkCredentialsDetailed()` | Check if credentials are valid with detailed response | `Promise<CredentialsCheckResult>` |
| `getProperty(propertyName)` | Get a property value | `PropertyType` |
| `setProperty(propertyName, value)` | Set a property value | `void` |

The `checkCredentialsDetailed()` method returns a `CredentialsCheckResult` object with the following properties:

```typescript
{
  success: boolean;    // Whether the credentials are valid
  error?: string;      // Error message if credentials are invalid
  voiceCount?: number; // Number of voices available if credentials are valid
}
```

### SSML Builder Methods

The `ssml` property provides a builder for creating SSML:

| Method | Description |
|--------|-------------|
| `prosody(attrs, text)` | Add prosody element |
| `break(time)` | Add break element |
| `emphasis(level, text)` | Add emphasis element |
| `sayAs(interpretAs, text)` | Add say-as element |
| `phoneme(alphabet, ph, text)` | Add phoneme element |
| `sub(alias, text)` | Add substitution element |
| `toString()` | Convert to SSML string |

## Browser Support

The library works in both Node.js and browser environments. In browsers, use the ESM or UMD bundle:

```html
<!-- Using ES modules (recommended) -->
<script type="module">
  import { SherpaOnnxWasmTTSClient } from 'js-tts-wrapper/browser';

  // Create a new SherpaOnnx WebAssembly TTS client
  const ttsClient = new SherpaOnnxWasmTTSClient();

  // Initialize the WebAssembly module
  await ttsClient.initializeWasm('./sherpaonnx-wasm/sherpaonnx.js');

  // Get available voices
  const voices = await ttsClient.getVoices();
  console.log(`Found ${voices.length} voices`);

  // Set the voice
  await ttsClient.setVoice(voices[0].id);

  // Speak some text
  await ttsClient.speak('Hello, world!');
</script>
```

### SherpaOnnx-WASM (Browser) – options and capabilities

- Auto-load WASM: pass either `wasmBaseUrl` (directory with sherpaonnx.js + .wasm) or `wasmPath` (full glue JS URL). The runtime loads the glue and points Module.locateFile to fetch the .wasm.
- Models index: set `mergedModelsUrl` to your hosted merged_models.json (defaults to ./data/merged_models.json when available).
- Capabilities: each client exposes `client.capabilities` to help UIs filter engines.
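As an illustration of how a UI might use `capabilities` to filter engines, here is a small sketch over plain objects. The `browserSupported` and `nodeSupported` flags follow the shape logged in this README's SherpaOnnx-WASM example; `filterEngines`, the descriptor list, and the flag values given for `sherpaonnx` are assumptions made for this example. Real code would read `capabilities` from each client instance.

```javascript
// Hypothetical helper: keeps the names of engines usable in the given target.
function filterEngines(engines, target) {
  const key = target === 'browser' ? 'browserSupported' : 'nodeSupported';
  return engines.filter((e) => e.capabilities[key]).map((e) => e.name);
}

// Illustrative descriptors; the values for 'sherpaonnx' are assumed.
const engines = [
  {
    name: 'sherpaonnx-wasm',
    capabilities: { browserSupported: true, nodeSupported: false, needsWasm: true },
  },
  {
    name: 'sherpaonnx',
    capabilities: { browserSupported: false, nodeSupported: true, needsWasm: false },
  },
];

console.log(filterEngines(engines, 'browser')); // [ 'sherpaonnx-wasm' ]
console.log(filterEngines(engines, 'node'));    // [ 'sherpaonnx' ]
```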
```html
<script type="module">
  import { SherpaOnnxWasmTTSClient } from 'js-tts-wrapper/browser';

  const tts = new SherpaOnnxWasmTTSClient({
    wasmBaseUrl: '/assets/sherpaonnx', // or: wasmPath: '/assets/sherpaonnx/sherpaonnx.js'
    mergedModelsUrl: '/assets/data/merged_models.json',
  });

  console.log(tts.capabilities); // { browserSupported: true, nodeSupported: false, needsWasm: true }

  await tts.speak('Hello from SherpaONNX WASM');
</script>
```

#### Hosted WASM assets (optional)

For convenience, we publish prebuilt SherpaONNX TTS WebAssembly files to a separate assets repository. You can use these as a quick-start base URL, or self-host them for production.

- Default CDN base (via jsDelivr):
  - https://cdn.jsdelivr.net/gh/willwade/js-tts-wrapper-assets@main/sherpaonnx/tts/vocoder-models
- Files included (loader-only build: no .data file):
  - sherpa-onnx-tts.js (glue; sometimes named sherpa-onnx.js depending on upstream tag)
  - sherpa-onnx-wasm-main-tts.wasm (or sherpa-onnx-wasm-main.wasm)
  - sherpa-onnx-wasm-main-tts.js (or sherpa-onnx-wasm-main.js)
- Models index (merged_models.json):
  - Canonical latest: https://cdn.jsdelivr.net/gh/willwade/js-tts-wrapper-assets@main/sherpaonnx/models/merged_models.json
  - Snapshot for this WASM tag: https://cdn.jsdelivr.net/gh/willwade/js-tts-wrapper-assets@main/sherpaonnx/tts/<sherpa_tag>/merged_models.json
- Example (using hosted artifacts):

```html
<script type="module">
  import { SherpaOnnxWasmTTSClient } from 'js-tts-wrapper/browser';

  const base = 'https://cdn.jsdelivr.net/gh/willwade/js-tts-wrapper-assets@main/sherpaonnx/tts/<sherpa_tag>';

  const tts = new SherpaOnnxWasmTTSClient({
    // Prefer explicit glue filename from upstream
    wasmPath: `${base}/sherpa-onnx-tts.js`,
    // Use canonical models index (or the per-tag snapshot URL)
    mergedModelsUrl: 'https://cdn.jsdelivr.net/gh/willwade/js-tts-wrapper-assets@main/sherpaonnx/models/merged_models.json',
  });

  await tts.speak('Hello from SherpaONNX WASM');
</script>
```

#### Hosting on Hugging Face (avoids jsDelivr 50 MB cap)

You can self-host the loader-only WASM on Hugging Face (recommended for large artifacts):

- Create a Dataset or Model repo, e.g. datasets/willwade/js-tts-wrapper-wasm
- Upload these files into a folder like sherpaonnx/tts/vocoder-models:
  - sherpa-onnx-tts.js
  - sherpa-onnx-wasm-main-tts.wasm
  - (optionally) sherpa-onnx-wasm-main-tts.js
- Optional: also upload merged_models.json to sherpaonnx/models/merged_models.json
- Use the Hugging Face raw URLs with the "resolve" path:
  - wasmPath: https://huggingface.co/datasets/your-user/your-repo/resolve/main/sherpaonnx/tts/vocoder-models/sherpa-onnx-tts.js
  - mergedModelsUrl: https://huggingface.co/datasets/your-user/your-repo/resolve/main/sherpaonnx/models/merged_models.json

Example: ```html <script type="module"> import { SherpaOnnxWasmTTSClient } from 'js-tts-wrapper/