# js-tts-wrapper
A JavaScript/TypeScript library that provides a unified API for working with multiple cloud-based Text-to-Speech (TTS) services. Inspired by [py3-TTS-Wrapper](https://github.com/willwade/tts-wrapper), it simplifies the use of services like Azure, Google Cloud, IBM Watson, and ElevenLabs.
## Table of Contents
- [Features](#features)
- [Supported TTS Engines](#supported-tts-engines)
- [Installation](#installation)
  - [Using npm scripts](#using-npm-scripts)
- [Quick Start](#quick-start)
- [Core Functionality](#core-functionality)
- [Voice Management](#voice-management)
- [Text Synthesis](#text-synthesis)
- [Audio Playback](#audio-playback)
- [File Output](#file-output)
- [Event Handling](#event-handling)
- [SSML Support](#ssml-support)
- [Speech Markdown Support](#speech-markdown-support)
- [Engine-Specific Examples](#engine-specific-examples)
- [API Reference](#api-reference)
- [Browser Support](#browser-support)
- [Contributing](#contributing)
- [Optional Dependencies](#optional-dependencies)
- [Node.js Audio Playback](#nodejs-audio-playback)
- [Testing and Troubleshooting](#testing-and-troubleshooting)
- [License](#license)
## Features
- **Unified API**: Consistent interface across multiple TTS providers
- **SSML Support**: Enhance speech with Speech Synthesis Markup Language
- **Speech Markdown**: Optional support for easier speech markup
- **Voice Selection**: Easily browse and select from available voices
- **Streaming Synthesis**: Stream audio as it's being synthesized
- **Playback Control**: Pause, resume, and stop audio playback
- **Word Boundaries**: Get callbacks for word timing (where supported)
- **File Output**: Save synthesized speech to audio files
- **Browser Support**: Works in both Node.js (server) and browser environments (see engine support table below)
## Supported TTS Engines
| Factory Name | Class Name | Environment | Provider | Dependencies |
|--------------|------------|-------------|----------|-------------|
| `azure` | `AzureTTSClient` | Both | Microsoft Azure Cognitive Services | `@azure/cognitiveservices-speechservices`, `microsoft-cognitiveservices-speech-sdk` |
| `google` | `GoogleTTSClient` | Both | Google Cloud Text-to-Speech | `@google-cloud/text-to-speech` |
| `elevenlabs` | `ElevenLabsTTSClient` | Both | ElevenLabs | `node-fetch@2` (Node.js only) |
| `watson` | `WatsonTTSClient` | Both | IBM Watson | None (uses fetch API) |
| `openai` | `OpenAITTSClient` | Both | OpenAI | `openai` |
| `playht` | `PlayHTTTSClient` | Both | PlayHT | `node-fetch@2` (Node.js only) |
| `polly` | `PollyTTSClient` | Both | Amazon Web Services | `@aws-sdk/client-polly` |
| `sherpaonnx` | `SherpaOnnxTTSClient` | Node.js | k2-fsa/sherpa-onnx | `sherpa-onnx-node`, `decompress`, `decompress-bzip2`, `decompress-tarbz2`, `decompress-targz`, `tar-stream` |
| `sherpaonnx-wasm` | `SherpaOnnxWasmTTSClient` | Browser | k2-fsa/sherpa-onnx | None (WASM included) |
| `espeak` | `EspeakNodeTTSClient` | Node.js | eSpeak NG | `text2wav` |
| `espeak-wasm` | `EspeakBrowserTTSClient` | Both | eSpeak NG | `mespeak` (Node.js) or meSpeak.js (browser) |
| `sapi` | `SAPITTSClient` | Node.js | Windows Speech API (SAPI) | None (uses PowerShell) |
| `witai` | `WitAITTSClient` | Both | Wit.ai | None (uses fetch API) |
**Factory Name**: Use with `createTTSClient('factory-name', credentials)`
**Class Name**: Use with direct import `import { ClassName } from 'js-tts-wrapper'`
**Environment**: Node.js = server-side only, Browser = browser-compatible, Both = works in both environments
## Installation
The library uses a modular approach where TTS engine-specific dependencies are optional. You can install the package and its dependencies as follows:
### npm install (longer but more explicit)
```bash
# Install the base package
npm install js-tts-wrapper
# Install dependencies for specific engines
npm install @azure/cognitiveservices-speechservices microsoft-cognitiveservices-speech-sdk # For Azure
npm install @google-cloud/text-to-speech # For Google Cloud
npm install @aws-sdk/client-polly # For AWS Polly
npm install node-fetch@2 # For ElevenLabs and PlayHT
npm install openai # For OpenAI
npm install sherpa-onnx-node decompress decompress-bzip2 decompress-tarbz2 decompress-targz tar-stream # For SherpaOnnx
npm install text2wav # For eSpeak NG (Node.js)
npm install mespeak # For eSpeak NG-WASM (Node.js)
npm install say # For System TTS (Node.js)
npm install sound-play pcm-convert # For Node.js audio playback
```
### Using npm scripts
After installing the base package, you can use the npm scripts provided by the package to install specific engine dependencies:
```bash
# Navigate to your project directory where js-tts-wrapper is installed
cd your-project
# Install Azure dependencies
npx js-tts-wrapper@latest run install:azure
# Install SherpaOnnx dependencies
npx js-tts-wrapper@latest run install:sherpaonnx
# Install eSpeak NG dependencies (Node.js)
npx js-tts-wrapper@latest run install:espeak
# Install eSpeak NG-WASM dependencies (Node.js)
npx js-tts-wrapper@latest run install:espeak-wasm
# Install System TTS dependencies (Node.js)
npx js-tts-wrapper@latest run install:system
# Install Node.js audio playback dependencies
npx js-tts-wrapper@latest run install:node-audio
# Install all development dependencies
npx js-tts-wrapper@latest run install:all-dev
```
## Quick Start
### Direct Instantiation
#### ESM (ECMAScript Modules)
```javascript
import { AzureTTSClient } from 'js-tts-wrapper';
// Initialize the client with your credentials
const tts = new AzureTTSClient({
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});
// List available voices
const voices = await tts.getVoices();
console.log(voices);
// Set a voice
tts.setVoice('en-US-AriaNeural');
// Speak some text
await tts.speak('Hello, world!');
// Use SSML for more control
const ssml = '<speak>Hello <break time="500ms"/> world!</speak>';
await tts.speak(ssml);
```
#### CommonJS
```javascript
const { AzureTTSClient } = require('js-tts-wrapper');
// Initialize the client with your credentials
const tts = new AzureTTSClient({
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});
// Use async/await within an async function
async function runExample() {
  // List available voices
  const voices = await tts.getVoices();
  console.log(voices);

  // Set a voice
  tts.setVoice('en-US-AriaNeural');

  // Speak some text
  await tts.speak('Hello, world!');

  // Use SSML for more control
  const ssml = '<speak>Hello <break time="500ms"/> world!</speak>';
  await tts.speak(ssml);
}
runExample().catch(console.error);
```
### Using the Factory Pattern
The library provides a factory function to create TTS clients dynamically based on the engine name:
#### ESM (ECMAScript Modules)
```javascript
import { createTTSClient } from 'js-tts-wrapper';
// Create a TTS client using the factory function
const tts = createTTSClient('azure', {
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});
// Use the client as normal
await tts.speak('Hello from the factory pattern!');
```
#### CommonJS
```javascript
const { createTTSClient } = require('js-tts-wrapper');
// Create a TTS client using the factory function
const tts = createTTSClient('azure', {
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

async function runExample() {
  // Use the client as normal
  await tts.speak('Hello from the factory pattern!');
}
runExample().catch(console.error);
```
The factory supports all engines: `'azure'`, `'google'`, `'polly'`, `'elevenlabs'`, `'openai'`, `'playht'`, `'watson'`, `'witai'`, `'sherpaonnx'`, `'sherpaonnx-wasm'`, `'espeak'`, `'espeak-wasm'`, and `'sapi'`.
## Core Functionality
All TTS engines in js-tts-wrapper implement a common set of methods and features through the `AbstractTTSClient` class. This ensures consistent behavior across different providers, as the sketch below illustrates.
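Because every engine extends the same abstract class, you can write engine-agnostic helpers. A minimal sketch (the helper itself is illustrative; it assumes `AbstractTTSClient` is importable for the type annotation):

```typescript
import { createTTSClient, AbstractTTSClient } from 'js-tts-wrapper';

// Works with any engine, since all clients share the same surface.
async function speakWithFirstEnglishVoice(tts: AbstractTTSClient, text: string): Promise<void> {
  const voices = await tts.getVoicesByLanguage('en-US');
  if (voices.length === 0) {
    throw new Error('No English voices available for this engine');
  }
  tts.setVoice(voices[0].id);
  await tts.speak(text);
}

// Swap 'azure' for any supported engine name.
const tts = createTTSClient('azure', {
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});
await speakWithFirstEnglishVoice(tts, 'Hello from any engine!');
```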
### Voice Management
```typescript
// Get all available voices
const voices = await tts.getVoices();
// Get voices for a specific language
const englishVoices = await tts.getVoicesByLanguage('en-US');
// Set the voice to use
tts.setVoice('en-US-AriaNeural');
```
The library includes a robust [Language Normalization](docs/LANGUAGE_NORMALIZATION.md) system that standardizes language codes across different TTS engines. This allows you to:
- Use BCP-47 codes (e.g., 'en-US') or ISO 639-3 codes (e.g., 'eng') interchangeably
- Get consistent language information regardless of the TTS engine
- Filter voices by language using any standard format (see the sketch below)
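For instance, both code formats should be accepted interchangeably (a small sketch; the exact voice lists depend on the engine):

```typescript
// 'en-US' (BCP-47) and 'eng' (ISO 639-3) are both valid language filters.
const bcp47Voices = await tts.getVoicesByLanguage('en-US');
const iso6393Voices = await tts.getVoicesByLanguage('eng');

console.log(`en-US matched ${bcp47Voices.length} voices, eng matched ${iso6393Voices.length}`);
```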
### Text Synthesis
```typescript
// Convert text to audio bytes (Uint8Array)
const audioBytes = await tts.synthToBytes('Hello, world!');
// Stream synthesis with word boundary information
const { audioStream, wordBoundaries } = await tts.synthToBytestream('Hello, world!');
```
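If you need a single buffer from the streaming API, you can drain the stream yourself. A sketch assuming `audioStream` is a web `ReadableStream<Uint8Array>` and `wordBoundaries` is an array (verify the concrete types in your target environment):

```typescript
const { audioStream, wordBoundaries } = await tts.synthToBytestream('Hello, world!');

// Drain the stream into a list of chunks.
const reader = audioStream.getReader();
const chunks: Uint8Array[] = [];
for (;;) {
  const { done, value } = await reader.read();
  if (done || !value) break;
  chunks.push(value);
}

// Concatenate the chunks into one Uint8Array.
const totalLength = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
const audioBytes = new Uint8Array(totalLength);
let offset = 0;
for (const chunk of chunks) {
  audioBytes.set(chunk, offset);
  offset += chunk.length;
}

console.log(`Received ${audioBytes.length} bytes and ${wordBoundaries.length} word boundaries`);
```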
### Audio Playback
```typescript
// Traditional text synthesis and playback
await tts.speak('Hello, world!');
// NEW: Play audio from different sources without re-synthesizing
// Play from file
await tts.speak({ filename: 'path/to/audio.mp3' });
// Play from audio bytes
const audioBytes = await tts.synthToBytes('Hello, world!');
await tts.speak({ audioBytes: audioBytes });
// Play from audio stream
const { audioStream } = await tts.synthToBytestream('Hello, world!');
await tts.speak({ audioStream: audioStream });
// All input types work with speakStreamed too
await tts.speakStreamed({ filename: 'path/to/audio.mp3' });
// Playback control
tts.pause(); // Pause playback
tts.resume(); // Resume playback
tts.stop(); // Stop playback
// Stream synthesis and play with word boundary callbacks
await tts.startPlaybackWithCallbacks('Hello world', (word, start, end) => {
  console.log(`Word: ${word}, Start: ${start}s, End: ${end}s`);
});
```
#### Benefits of Multi-Source Audio Playback
- **Avoid Double Synthesis**: Use `synthToFile()` to save audio, then play the same file with `speak({ filename })` without re-synthesizing (see the example below)
- **Platform Independent**: Works consistently across browser and Node.js environments
- **Efficient Reuse**: Play the same audio bytes or stream multiple times without regenerating
- **Flexible Input**: Choose the most convenient input source for your use case
> **Note**: Audio playback with `speak()` and `speakStreamed()` methods is supported in both browser environments and Node.js environments with the optional `sound-play` package installed. To enable Node.js audio playback, install the required packages with `npm install sound-play pcm-convert` or use the npm script `npx js-tts-wrapper@latest run install:node-audio`.
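Putting the first benefit into practice (a sketch that assumes `synthToFile` appends the format as the file extension):

```typescript
// Synthesize once and save to disk...
await tts.synthToFile('This is only synthesized once.', 'greeting', 'mp3');

// ...then replay the saved audio without re-synthesizing (assumed path: 'greeting.mp3').
await tts.speak({ filename: 'greeting.mp3' });
await tts.speak({ filename: 'greeting.mp3' });
```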
### File Output
```typescript
// Save synthesized speech to a file
await tts.synthToFile('Hello, world!', 'output', 'mp3');
```
### Event Handling
```typescript
// Register event handlers
tts.on('start', () => console.log('Speech started'));
tts.on('end', () => console.log('Speech ended'));
tts.on('boundary', (word, start, end) => {
  console.log(`Word: ${word}, Start: ${start}s, End: ${end}s`);
});
// Alternative event connection
tts.connect('onStart', () => console.log('Speech started'));
tts.connect('onEnd', () => console.log('Speech ended'));
```
## SSML Support
All engines support SSML (Speech Synthesis Markup Language) for advanced control over speech synthesis:
```typescript
// Use SSML directly
const ssml = `
<speak>
  <prosody rate="slow" pitch="low">
    This text will be spoken slowly with a low pitch.
  </prosody>
  <break time="500ms"/>
  <emphasis level="strong">This text is emphasized.</emphasis>
</speak>
`;
await tts.speak(ssml);
// Or use the SSML builder
const ssmlText = tts.ssml
  .prosody({ rate: 'slow', pitch: 'low' }, 'This text will be spoken slowly with a low pitch.')
  .break(500)
  .emphasis('strong', 'This text is emphasized.')
  .toString();
await tts.speak(ssmlText);
```
## Speech Markdown Support
The library supports Speech Markdown for easier speech formatting:
```typescript
// Use Speech Markdown
const markdown = "Hello (pause:500ms) world! This is (emphasis:strong) important.";
await tts.speak(markdown, { useSpeechMarkdown: true });
```
## Engine-Specific Examples
Each TTS engine has its own specific setup. Here are examples for each supported engine in both ESM and CommonJS formats:
### Azure
#### ESM
```javascript
import { AzureTTSClient } from 'js-tts-wrapper';
const tts = new AzureTTSClient({
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});
await tts.speak('Hello from Azure!');
```
#### CommonJS
```javascript
const { AzureTTSClient } = require('js-tts-wrapper');
const tts = new AzureTTSClient({
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});
// Inside an async function
await tts.speak('Hello from Azure!');
```
### Google Cloud
#### ESM
```javascript
import { GoogleTTSClient } from 'js-tts-wrapper';
const tts = new GoogleTTSClient({
  keyFilename: '/path/to/service-account-key.json'
});
await tts.speak('Hello from Google Cloud!');
```
#### CommonJS
```javascript
const { GoogleTTSClient } = require('js-tts-wrapper');
const tts = new GoogleTTSClient({
  keyFilename: '/path/to/service-account-key.json'
});
// Inside an async function
await tts.speak('Hello from Google Cloud!');
```
### AWS Polly
#### ESM
```javascript
import { PollyTTSClient } from 'js-tts-wrapper';
const tts = new PollyTTSClient({
  region: 'us-east-1',
  accessKeyId: 'your-access-key-id',
  secretAccessKey: 'your-secret-access-key'
});
await tts.speak('Hello from AWS Polly!');
```
#### CommonJS
```javascript
const { PollyTTSClient } = require('js-tts-wrapper');
const tts = new PollyTTSClient({
  region: 'us-east-1',
  accessKeyId: 'your-access-key-id',
  secretAccessKey: 'your-secret-access-key'
});
// Inside an async function
await tts.speak('Hello from AWS Polly!');
```
### ElevenLabs
#### ESM
```javascript
import { ElevenLabsTTSClient } from 'js-tts-wrapper';
const tts = new ElevenLabsTTSClient({
  apiKey: 'your-api-key'
});
await tts.speak('Hello from ElevenLabs!');
```
#### CommonJS
```javascript
const { ElevenLabsTTSClient } = require('js-tts-wrapper');
const tts = new ElevenLabsTTSClient({
  apiKey: 'your-api-key'
});
// Inside an async function
await tts.speak('Hello from ElevenLabs!');
```
### OpenAI
#### ESM
```javascript
import { OpenAITTSClient } from 'js-tts-wrapper';
const tts = new OpenAITTSClient({
  apiKey: 'your-api-key'
});
await tts.speak('Hello from OpenAI!');
```
#### CommonJS
```javascript
const { OpenAITTSClient } = require('js-tts-wrapper');
const tts = new OpenAITTSClient({
  apiKey: 'your-api-key'
});
// Inside an async function
await tts.speak('Hello from OpenAI!');
```
### PlayHT
#### ESM
```javascript
import { PlayHTTTSClient } from 'js-tts-wrapper';
const tts = new PlayHTTTSClient({
  apiKey: 'your-api-key',
  userId: 'your-user-id'
});
await tts.speak('Hello from PlayHT!');
```
#### CommonJS
```javascript
const { PlayHTTTSClient } = require('js-tts-wrapper');
const tts = new PlayHTTTSClient({
  apiKey: 'your-api-key',
  userId: 'your-user-id'
});
// Inside an async function
await tts.speak('Hello from PlayHT!');
```
### IBM Watson
#### ESM
```javascript
import { WatsonTTSClient } from 'js-tts-wrapper';
const tts = new WatsonTTSClient({
  apiKey: 'your-api-key',
  region: 'us-south',
  instanceId: 'your-instance-id'
});
await tts.speak('Hello from IBM Watson!');
```
#### CommonJS
```javascript
const { WatsonTTSClient } = require('js-tts-wrapper');
const tts = new WatsonTTSClient({
  apiKey: 'your-api-key',
  region: 'us-south',
  instanceId: 'your-instance-id'
});
// Inside an async function
await tts.speak('Hello from IBM Watson!');
```
### Wit.ai
#### ESM
```javascript
import { WitAITTSClient } from 'js-tts-wrapper';
const tts = new WitAITTSClient({
  token: 'your-wit-ai-token'
});
await tts.speak('Hello from Wit.ai!');
```
#### CommonJS
```javascript
const { WitAITTSClient } = require('js-tts-wrapper');
const tts = new WitAITTSClient({
  token: 'your-wit-ai-token'
});
// Inside an async function
await tts.speak('Hello from Wit.ai!');
```
### SherpaOnnx (Offline TTS)
#### ESM
```javascript
import { SherpaOnnxTTSClient } from 'js-tts-wrapper';
const tts = new SherpaOnnxTTSClient();
// The client will automatically download models when needed
await tts.speak('Hello from SherpaOnnx!');
```
#### CommonJS
```javascript
const { SherpaOnnxTTSClient } = require('js-tts-wrapper');
const tts = new SherpaOnnxTTSClient();
// The client will automatically download models when needed
// Inside an async function
await tts.speak('Hello from SherpaOnnx!');
```
> **Note**: SherpaOnnx is a server-side only engine and requires specific environment setup. See the [SherpaOnnx documentation](docs/sherpaonnx.md) for details on setup and configuration. For browser environments, use [SherpaOnnx-WASM](docs/sherpaonnx-wasm.md) instead.
### eSpeak NG (Node.js)
#### ESM
```javascript
import { EspeakNodeTTSClient } from 'js-tts-wrapper';
const tts = new EspeakNodeTTSClient();
await tts.speak('Hello from eSpeak NG!');
```
#### CommonJS
```javascript
const { EspeakNodeTTSClient } = require('js-tts-wrapper');
const tts = new EspeakNodeTTSClient();
// Inside an async function
await tts.speak('Hello from eSpeak NG!');
```
> **Note**: This engine uses the `text2wav` package and is designed for Node.js environments only. For browser environments, use the eSpeak NG Browser engine instead.
### eSpeak NG (Browser)
#### ESM
```javascript
import { EspeakBrowserTTSClient } from 'js-tts-wrapper';
const tts = new EspeakBrowserTTSClient();
await tts.speak('Hello from eSpeak NG Browser!');
```
#### CommonJS
```javascript
const { EspeakBrowserTTSClient } = require('js-tts-wrapper');
const tts = new EspeakBrowserTTSClient();
// Inside an async function
await tts.speak('Hello from eSpeak NG Browser!');
```
> **Note**: This engine works in both Node.js (using the `mespeak` package) and browser environments (using meSpeak.js). For browser use, include meSpeak.js in your HTML before using this engine.
#### Backward Compatibility
For backward compatibility, the old class names are still available:
- `EspeakTTSClient` (alias for `EspeakNodeTTSClient`)
- `EspeakWasmTTSClient` (alias for `EspeakBrowserTTSClient`)
However, we recommend using the new, clearer names in new code.
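Both spellings construct the same client:

```typescript
import { EspeakTTSClient, EspeakNodeTTSClient } from 'js-tts-wrapper';

// The legacy alias and the current name refer to the same engine.
const legacy = new EspeakTTSClient();
const current = new EspeakNodeTTSClient();
```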
### Windows SAPI (Windows-only)
#### ESM
```javascript
import { SAPITTSClient } from 'js-tts-wrapper';
const tts = new SAPITTSClient();
await tts.speak('Hello from Windows SAPI!');
```
#### CommonJS
```javascript
const { SAPITTSClient } = require('js-tts-wrapper');
const tts = new SAPITTSClient();
// Inside an async function
await tts.speak('Hello from Windows SAPI!');
```
> **Note**: This engine is **Windows-only**. It uses PowerShell to drive the Windows Speech API (SAPI), so no additional npm dependencies are required.
## API Reference
### Factory Function
| Function | Description | Return Type |
|--------|-------------|-------------|
| `createTTSClient(engine, credentials)` | Create a TTS client for the specified engine | `AbstractTTSClient` |
### Common Methods (All Engines)
| Method | Description | Return Type |
|--------|-------------|-------------|
| `getVoices()` | Get all available voices | `Promise<UnifiedVoice[]>` |
| `getVoicesByLanguage(language)` | Get voices for a specific language | `Promise<UnifiedVoice[]>` |
| `setVoice(voiceId, lang?)` | Set the voice to use | `void` |
| `synthToBytes(text, options?)` | Convert text to audio bytes | `Promise<Uint8Array>` |
| `synthToBytestream(text, options?)` | Stream synthesis with word boundaries | `Promise<{audioStream, wordBoundaries}>` |
| `speak(text, options?)` | Synthesize and play audio | `Promise<void>` |
| `speakStreamed(text, options?)` | Stream synthesis and play | `Promise<void>` |
| `synthToFile(text, filename, format?, options?)` | Save synthesized speech to a file | `Promise<void>` |
| `startPlaybackWithCallbacks(text, callback, options?)` | Play with word boundary callbacks | `Promise<void>` |
| `pause()` | Pause audio playback | `void` |
| `resume()` | Resume audio playback | `void` |
| `stop()` | Stop audio playback | `void` |
| `on(event, callback)` | Register event handler | `void` |
| `connect(event, callback)` | Connect to event | `void` |
| `checkCredentials()` | Check if credentials are valid | `Promise<boolean>` |
| `checkCredentialsDetailed()` | Check if credentials are valid with detailed response | `Promise<CredentialsCheckResult>` |
| `getProperty(propertyName)` | Get a property value | `PropertyType` |
| `setProperty(propertyName, value)` | Set a property value | `void` |
The `checkCredentialsDetailed()` method returns a `CredentialsCheckResult` object with the following properties:
```typescript
{
  success: boolean;     // Whether the credentials are valid
  error?: string;       // Error message if credentials are invalid
  voiceCount?: number;  // Number of voices available if credentials are valid
}
```
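A typical pre-flight check before synthesizing (sketch):

```typescript
const result = await tts.checkCredentialsDetailed();

if (result.success) {
  console.log(`Credentials OK; ${result.voiceCount ?? 0} voices available`);
} else {
  console.error(`Credential check failed: ${result.error ?? 'unknown error'}`);
}
```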
### SSML Builder Methods
The `ssml` property provides a builder for creating SSML:
| Method | Description |
|--------|-------------|
| `prosody(attrs, text)` | Add prosody element |
| `break(time)` | Add break element |
| `emphasis(level, text)` | Add emphasis element |
| `sayAs(interpretAs, text)` | Add say-as element |
| `phoneme(alphabet, ph, text)` | Add phoneme element |
| `sub(alias, text)` | Add substitution element |
| `toString()` | Convert to SSML string |
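The remaining builder methods chain the same way as `prosody`, `break`, and `emphasis` shown earlier (a sketch; argument order follows the table above):

```typescript
const ssmlText = tts.ssml
  .sayAs('date', '2024-01-01')            // Read as a date
  .break(300)
  .phoneme('ipa', 'təˈmeɪtoʊ', 'tomato')  // Pronounce using an IPA transcription
  .sub('World Wide Web', 'WWW')           // Speak "World Wide Web" for the text "WWW"
  .toString();

await tts.speak(ssmlText);
```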
## Browser Support
The library works in both Node.js and browser environments. In browsers, use the ESM or UMD bundle:
```html
<!-- Using ES modules (recommended) -->
<script type="module">
import { SherpaOnnxWasmTTSClient } from 'js-tts-wrapper/browser';
// Create a new SherpaOnnx WebAssembly TTS client
const ttsClient = new SherpaOnnxWasmTTSClient();
// Initialize the WebAssembly module
await ttsClient.initializeWasm('./sherpaonnx-wasm/sherpaonnx.js');
// Get available voices
const voices = await ttsClient.getVoices();
console.log(`Found ${voices.length} voices`);
// Set the voice
await ttsClient.setVoice(voices[0].id);
// Speak some text
await ttsClient.speak('Hello, world!');
</script>
```
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Optional Dependencies
The library uses a peer dependencies approach to minimize the installation footprint. You can install only the dependencies you need for the engines you plan to use.
```bash
# Install the base package
npm install js-tts-wrapper
# Install dependencies for specific engines
npm install @azure/cognitiveservices-speechservices microsoft-cognitiveservices-speech-sdk # For Azure TTS
npm install @google-cloud/text-to-speech # For Google TTS
npm install @aws-sdk/client-polly # For AWS Polly
npm install openai # For OpenAI TTS
npm install sherpa-onnx-node decompress decompress-bzip2 decompress-tarbz2 decompress-targz tar-stream # For SherpaOnnx TTS
npm install text2wav # For eSpeak NG (Node.js)
npm install mespeak # For eSpeak NG-WASM (Node.js)
# Install dependencies for Node.js audio playback
npm install sound-play speaker pcm-convert # For audio playback in Node.js
```
You can also use the npm scripts provided by the package to install specific engine dependencies:
```bash
# Navigate to your project directory where js-tts-wrapper is installed
cd your-project
# Install specific engine dependencies
npx js-tts-wrapper@latest run install:azure
npx js-tts-wrapper@latest run install:google
npx js-tts-wrapper@latest run install:polly
npx js-tts-wrapper@latest run install:openai
npx js-tts-wrapper@latest run install:sherpaonnx
npx js-tts-wrapper@latest run install:espeak
npx js-tts-wrapper@latest run install:espeak-wasm
npx js-tts-wrapper@latest run install:system
# Install Node.js audio playback dependencies
npx js-tts-wrapper@latest run install:node-audio
# Install all development dependencies
npx js-tts-wrapper@latest run install:all-dev
```
## Node.js Audio Playback
The library supports audio playback in Node.js environments with the optional `sound-play` package. This allows you to use the `speak()` and `speakStreamed()` methods in Node.js applications, just like in browser environments.
To enable Node.js audio playback:
1. Install the required dependencies:
```bash
npm install sound-play pcm-convert
```
Or use the npm script:
```bash
npx js-tts-wrapper@latest run install:node-audio
```
2. Use the TTS client as usual:
```typescript
import { createTTSClient } from 'js-tts-wrapper';

const tts = createTTSClient('azure', {
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

// Play audio in Node.js
await tts.speak('Hello, world!');
```
If the `sound-play` package is not installed, the library will fall back to providing informative messages and suggest installing the package.
## Testing and Troubleshooting
### Unified Test Runner
The library includes a comprehensive unified test runner that supports multiple testing modes and engines:
```bash
# Basic usage - test all engines
node examples/unified-test-runner.js
# Test a specific engine
node examples/unified-test-runner.js [engine-name]
# Test with different modes
node examples/unified-test-runner.js [engine-name] --mode=[MODE]
```
### Available Test Modes
| Mode | Description | Usage |
|------|-------------|-------|
| `basic` | Basic synthesis tests (default) | `node examples/unified-test-runner.js azure` |
| `audio` | Audio-only tests with playback | `PLAY_AUDIO=true node examples/unified-test-runner.js azure --mode=audio` |
| `playback` | Playback control tests (pause/resume/stop) | `node examples/unified-test-runner.js azure --mode=playback` |
| `features` | Comprehensive feature tests | `node examples/unified-test-runner.js azure --mode=features` |
| `example` | Full examples with SSML, streaming, word boundaries | `node examples/unified-test-runner.js azure --mode=example` |
| `debug` | Debug mode for troubleshooting | `node examples/unified-test-runner.js sherpaonnx --mode=debug` |
| `stream` | Streaming tests with real-time playback | `PLAY_AUDIO=true node examples/unified-test-runner.js playht --mode=stream` |
### Testing Audio Playback
To test audio playback with any TTS engine, use the `PLAY_AUDIO` environment variable:
```bash
# Test a specific engine with audio playback
PLAY_AUDIO=true node examples/unified-test-runner.js [engine-name] --mode=audio
# Examples:
PLAY_AUDIO=true node examples/unified-test-runner.js witai --mode=audio
PLAY_AUDIO=true node examples/unified-test-runner.js azure --mode=audio
PLAY_AUDIO=true node examples/unified-test-runner.js polly --mode=audio
PLAY_AUDIO=true node examples/unified-test-runner.js system --mode=audio
```
### SherpaOnnx Specific Testing
SherpaOnnx requires special environment setup. Use the helper script:
```bash
# Test SherpaOnnx with audio playback
PLAY_AUDIO=true node scripts/run-with-sherpaonnx.cjs examples/unified-test-runner.js sherpaonnx --mode=audio
# Debug SherpaOnnx issues
node scripts/run-with-sherpaonnx.cjs examples/unified-test-runner.js sherpaonnx --mode=debug
# Use npm scripts (recommended)
npm run example:sherpaonnx:mac
PLAY_AUDIO=true npm run example:sherpaonnx:mac
```
### Using npm Scripts
The package provides convenient npm scripts for testing specific engines:
```bash
# Test specific engines using npm scripts
npm run example:azure
npm run example:google
npm run example:polly
npm run example:openai
npm run example:elevenlabs
npm run example:playht
npm run example:system
npm run example:sherpaonnx:mac # For SherpaOnnx with environment setup
# With audio playback
PLAY_AUDIO=true npm run example:azure
PLAY_AUDIO=true npm run example:system
PLAY_AUDIO=true npm run example:sherpaonnx:mac
```
### Getting Help
For detailed help and available options:
```bash
# Show help and available engines
node examples/unified-test-runner.js --help
# Show available test modes
node examples/unified-test-runner.js --mode=help
```
### Common Issues
1. **No Audio in Node.js**: Install audio dependencies with `npm install sound-play speaker pcm-convert`
2. **SherpaOnnx Not Working**: Use the helper script and ensure environment variables are set correctly
3. **WitAI Audio Issues**: The library automatically handles WitAI's raw PCM format conversion
4. **Sample Rate Issues**: Different engines use different sample rates (WitAI: 24kHz, Polly: 16kHz) - this is handled automatically
For detailed troubleshooting, see the [docs/](docs/) directory, especially:
- [SherpaOnnx Documentation](docs/sherpaonnx.md)
- [SherpaOnnx Troubleshooting](docs/sherpaonnx-troubleshooting.md)
## License
This project is licensed under the MIT License - see the LICENSE file for details.