# Multi-Voice SDK
A universal Text-to-Speech (TTS) and Speech-to-Text (STT) SDK that supports multiple providers, including Google Gemini, Deepgram, OpenAI, Groq PlayAI, Cartesia, and AssemblyAI. Generate speech, transcribe audio, and merge audio files through a single, unified API.
## Features
- 🎵 **Multi-Provider TTS**: Gemini, Deepgram, OpenAI, Groq PlayAI, and Cartesia TTS
- 🎙️ **Speech-to-Text**: Deepgram and AssemblyAI STT with configurable smart formatting, punctuation, language detection, and speaker diarization
- 🔧 **Audio Merging**: Combine multiple audio files into a single file
- 🎯 **Simple API**: Easy-to-use functions with a consistent interface across providers
- 📦 **ESM Ready**: Native ES module support
## Installation
```bash
npm install multi-voice-sdk
```
## Quick Start
```javascript
import { tts, stt, merge } from "multi-voice-sdk";

// Generate speech with OpenAI
await tts({
  provider: "openai",
  apiKey: "your-api-key",
  text: "Hello, world!",
  voice: "nova",
  outputFile: "output.mp3",
});

// Transcribe audio with Deepgram
const transcript = await stt({
  provider: "deepgram",
  apiKey: "your-deepgram-key",
  audioFile: "https://example.com/audio.wav", // Can be a URL or a local file path
});
console.log(transcript);

// Merge multiple audio files
await merge({
  inputFiles: ["file1.mp3", "file2.mp3"],
  outputFile: "combined.mp3",
});
```
## API Reference
### `tts(options)`
Generate speech from text using various TTS providers.
#### Parameters
| Parameter | Type | Required | Description |
| ------------ | -------- | -------- | ----------------------------------------------------------------------------- |
| `provider` | `string` | ✅ | TTS provider: `"gemini"`, `"deepgram"`, `"openai"`, `"groq"`, or `"cartesia"` |
| `apiKey` | `string` | ✅ | API key for the chosen provider |
| `text` | `string` | ✅ | Text to convert to speech |
| `voice`      | `string` | ✅       | Voice identifier (provider-specific; for Cartesia, use a voice ID)            |
| `outputFile` | `string` | optional | Output file path (default: `"output.mp3"`) |
| `model` | `string` | optional | Model to use (provider-specific) |
| `prompt` | `string` | optional | Additional instructions for speech generation |
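If you call `tts()` from plain JavaScript, a small guard that mirrors the table above can catch a missing or misspelled option before any request is sent. This is a minimal caller-side sketch: `TtsOptions`, `TTS_PROVIDERS`, and `checkTtsOptions` are hypothetical names, not SDK exports, and the SDK may perform its own validation as well.

```javascript
// Hypothetical caller-side guard (not exported by multi-voice-sdk) that mirrors
// the tts() parameter table. The JSDoc typedef also gives editors
// autocompletion for the option names in plain JavaScript projects.

/**
 * @typedef {Object} TtsOptions
 * @property {"gemini"|"deepgram"|"openai"|"groq"|"cartesia"} provider
 * @property {string} apiKey
 * @property {string} text
 * @property {string} voice        Provider-specific name, or a voice ID for Cartesia
 * @property {string} [outputFile] Defaults to "output.mp3"
 * @property {string} [model]
 * @property {string} [prompt]
 */

const TTS_PROVIDERS = ["gemini", "deepgram", "openai", "groq", "cartesia"];

/**
 * Throws if a required option is missing; otherwise returns the options unchanged.
 * @param {TtsOptions} options
 */
function checkTtsOptions(options) {
  if (!TTS_PROVIDERS.includes(options.provider)) {
    throw new Error(`Unsupported TTS provider: ${options.provider}`);
  }
  for (const key of ["apiKey", "text", "voice"]) {
    if (typeof options[key] !== "string" || options[key] === "") {
      throw new Error(`tts() option "${key}" must be a non-empty string`);
    }
  }
  return options;
}
```

Wrapping a call as `await tts(checkTtsOptions({ ... }))` turns, for example, an unset environment variable (which arrives as `undefined`) into an immediate, descriptive error.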
#### Examples
**OpenAI TTS**
```javascript
await tts({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY,
  model: "gpt-4o-mini-tts",
  text: "Hello from OpenAI!",
  voice: "nova",
  prompt: "Speak in a cheerful tone",
  outputFile: "openai_output.mp3",
});
```
**Google Gemini TTS**
```javascript
await tts({
  provider: "gemini",
  apiKey: process.env.GEMINI_API_KEY,
  text: "Hello from Gemini!",
  voice: "iapetus",
  prompt: "In a pleasant and calm tone",
  outputFile: "gemini_output.mp3",
});
```
**Deepgram TTS**
```javascript
await tts({
  provider: "deepgram",
  apiKey: process.env.DEEPGRAM_API_KEY,
  text: "Hello from Deepgram!",
  voice: "aura-2-luna-en",
  outputFile: "deepgram_output.mp3",
});
```
**Groq PlayAI TTS**
```javascript
await tts({
  provider: "groq",
  apiKey: process.env.GROQ_API_KEY,
  text: "Hello from Groq PlayAI!",
  voice: "Arista-PlayAI",
  outputFile: "groq_output.wav",
});
```
**Cartesia TTS**
```javascript
await tts({
  provider: "cartesia",
  apiKey: process.env.CARTESIA_API_KEY,
  text: "Hello from Cartesia!",
  voice: "694f9389-aac1-45b6-b726-9d9369183238", // Voice ID
  outputFile: "cartesia_output.mp3",
});
```
### `stt(options)`
Transcribe audio to text using Speech-to-Text providers.
#### Parameters
| Parameter | Type | Required | Description |
| ----------------- | --------- | -------- | ------------------------------------------------------------------------- |
| `provider` | `string` | ✅ | STT provider: `"deepgram"` or `"assemblyai"` |
| `apiKey` | `string` | ✅ | API key for the chosen provider |
| `audioFile` | `string` | ✅ | Path to local audio file or URL of remote audio file to transcribe |
| `outputFile` | `string` | optional | Output file path for results (default: `"transcription.json"`) |
| `model`           | `string`  | optional | Model to use (default: `"nova-3"`, a Deepgram model)                      |
| `smartFormat` | `boolean` | optional | Enable smart formatting (default: `true`) |
| `detect_language` | `boolean` | optional | Enable automatic language detection (default: `true`)                     |
| `punctuate` | `boolean` | optional | Enable punctuation (default: `true`) |
| `diarize` | `boolean` | optional | Enable speaker diarization (default: `false`) |
| `channels` | `number` | optional | Number of audio channels (default: `1`) |
| `fullResponse` | `boolean` | optional | Return full response object instead of just transcript (default: `false`) |
#### Returns
- **Default**: Returns the transcript as a string
- **With `fullResponse: true`**: Returns an object containing the transcript, confidence, words, and metadata
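Because `stt()` returns either a plain string or an object depending on `fullResponse`, code that toggles that flag needs to handle both shapes. A minimal sketch, assuming the full response exposes the text under a `transcript` property (the list above names a transcript field but not its exact key); `transcriptText` is a hypothetical helper, not an SDK export.

```javascript
// Hypothetical helper (not exported by multi-voice-sdk): normalizes the two
// documented return shapes of stt() into plain transcript text.
// Assumption: the fullResponse object stores the text under `transcript`.
function transcriptText(result) {
  // Default mode: stt() returns the transcript string itself.
  if (typeof result === "string") {
    return result;
  }
  // fullResponse: true: pull the transcript out of the response object.
  if (result && typeof result.transcript === "string") {
    return result.transcript;
  }
  throw new TypeError("Unrecognized stt() result shape");
}
```

With this in place, `transcriptText(await stt(options))` yields the text whether or not `options.fullResponse` is set.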
#### Examples
**Deepgram: Basic Transcription (Remote URL)**
```javascript
const transcript = await stt({
  provider: "deepgram",
  apiKey: process.env.DEEPGRAM_API_KEY,
  audioFile: "https://example.com/audio.wav", // Remote URL
});
```
**Deepgram: Local File Transcription**
```javascript
const transcript = await stt({
  provider: "deepgram",
  apiKey: process.env.DEEPGRAM_API_KEY,
  audioFile: "./my-audio.mp3", // Local file path
  outputFile: "transcription.json",
});
```
**AssemblyAI: Basic Transcription (Remote URL)**
```javascript
const transcript = await stt({
  provider: "assemblyai",
  apiKey: process.env.ASSEMBLYAI_API_KEY,
  audioFile: "https://example.com/audio.wav", // Remote URL
  outputFile: "transcription.json",
});
```
**AssemblyAI: Local File Transcription**
```javascript
const result = await stt({
  provider: "assemblyai",
  apiKey: process.env.ASSEMBLYAI_API_KEY,
  audioFile: "./my-audio.mp3", // Local file path
  outputFile: "transcription.json",
  fullResponse: true, // Get the detailed response object
});
```
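Since `audioFile` accepts either a remote URL or a local path, a typo in a local path only shows up once the transcription call runs. A minimal caller-side sketch, assuming that anything starting with `http://` or `https://` counts as remote (the SDK's own detection rule is not documented here); `classifyAudioSource` is a hypothetical helper, not an SDK export.

```javascript
import { existsSync } from "node:fs";

// Hypothetical pre-flight check (not exported by multi-voice-sdk).
// Assumption: values starting with http:// or https:// are remote URLs;
// anything else is treated as a local path and must exist on disk.
function classifyAudioSource(audioFile) {
  if (/^https?:\/\//i.test(audioFile)) {
    return "remote";
  }
  if (!existsSync(audioFile)) {
    throw new Error(`Local audio file not found: ${audioFile}`);
  }
  return "local";
}
```

Calling `classifyAudioSource(audioFile)` just before `stt()` reports the missing path directly instead of leaving it to whatever error the provider request produces.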
### `merge(options)`
Merge multiple audio files into a single file.
#### Parameters
| Parameter | Type | Required | Description |
| ------------ | ---------- | -------- | ------------------------- |
| `inputFiles` | `string[]` | ✅ | Array of input file paths |
| `outputFile` | `string` | ✅ | Output file path |
#### Example
```javascript
await merge({
  inputFiles: ["intro.mp3", "main.mp3", "outro.mp3"],
  outputFile: "complete_audio.mp3",
});
```
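`merge()` pairs naturally with `tts()` when a long script is generated in pieces. A minimal sketch, assuming `tts()` and `merge()` return promises that settle once their output file is written (the README does not state this) and that every segment uses the same output format; `planNarration` and `renderNarration` are hypothetical helpers, not SDK exports.

```javascript
// Hypothetical pipeline (not exported by multi-voice-sdk): render several
// text segments with tts(), then join the clips with merge(). The SDK
// functions are passed in, so the sketch can also run with stubs.

// Builds one tts() options object per segment plus the matching merge() options.
function planNarration(segments, baseOptions, outputFile, extension = "mp3") {
  const clips = segments.map((text, i) => ({
    ...baseOptions, // provider, apiKey, voice, and any other shared options
    text,
    outputFile: `segment_${String(i + 1).padStart(2, "0")}.${extension}`,
  }));
  return {
    clips,
    mergeOptions: { inputFiles: clips.map((clip) => clip.outputFile), outputFile },
  };
}

// Runs the plan sequentially: each tts() call is awaited before the next starts,
// and merge() runs last (assuming tts() resolves once its file is written).
async function renderNarration(plan, { tts, merge }) {
  for (const clip of plan.clips) {
    await tts(clip);
  }
  await merge(plan.mergeOptions);
  return plan.mergeOptions.outputFile;
}
```

With the real SDK, pass `{ tts, merge }` imported from `multi-voice-sdk` as the second argument to `renderNarration`. Rendering one segment at a time is slower than issuing all requests at once, but it helps stay within provider rate limits and keeps the merge step strictly after the last clip.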
## Supported Voices
### OpenAI
- `alloy`, `ash`, `ballad`, `coral`, `echo`, `fable`, `onyx`, `nova`, `sage`, `shimmer`, `verse`
### Gemini
- `zephyr` (Bright), `puck` (Upbeat), `charon` (Informative), `kore` (Firm), `fenrir` (Excitable), `leda` (Youthful), `orus` (Firm), `aoede` (Breezy), `autonoe` (Bright), `enceladus` (Breathy), `iapetus` (Clear)

For a complete list of available Gemini voices, see: [Gemini Speech Generation Documentation](https://ai.google.dev/gemini-api/docs/speech-generation#voices)
### Deepgram
- `aura-2-luna-en`, `aura-2-stella-en`, `aura-2-arcas-en`, and more

For a complete list of available Deepgram voices, see: [Deepgram TTS Models Documentation](https://developers.deepgram.com/docs/tts-models#featured-voices)
### Groq PlayAI
- `Atlas-PlayAI`, `Arista-PlayAI`, `Basil-PlayAI`, `Briggs-PlayAI`, and more

For a complete list of available Groq PlayAI voices, see: [Groq TTS Documentation](https://console.groq.com/docs/text-to-speech)
### Cartesia
Cartesia uses voice IDs instead of voice names, for example:

- `694f9389-aac1-45b6-b726-9d9369183238` (default voice)

To find the voice IDs available to your account and learn more about Cartesia voices, see the [Cartesia Console](https://play.cartesia.ai/voices).
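Because voice identifiers are provider-specific, code that switches `provider` at runtime also has to switch `voice`. A minimal sketch, assuming you are happy to fall back to the voices used in this README's examples; `SAMPLE_VOICES` and `withSampleVoice` are hypothetical names, not SDK exports.

```javascript
// Hypothetical lookup (not exported by multi-voice-sdk): one voice per TTS
// provider, copied from the examples in this README.
const SAMPLE_VOICES = {
  openai: "nova",
  gemini: "iapetus",
  deepgram: "aura-2-luna-en",
  groq: "Arista-PlayAI",
  cartesia: "694f9389-aac1-45b6-b726-9d9369183238", // Cartesia takes a voice ID
};

// Fills in `voice` (returning a new object) when it is missing;
// options that already name a voice are returned unchanged.
function withSampleVoice(options) {
  if (options.voice) {
    return options;
  }
  const voice = SAMPLE_VOICES[options.provider];
  if (!voice) {
    throw new Error(`No sample voice for provider "${options.provider}"`);
  }
  return { ...options, voice };
}
```

For example, `await tts(withSampleVoice({ provider: "groq", apiKey: process.env.GROQ_API_KEY, text: "Hi!", outputFile: "groq_output.wav" }))` would use `Arista-PlayAI`.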
## Environment Variables
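Create a `.env` file in your project root. Node.js does not read `.env` files on its own, so load it before calling the SDK, for example with the `dotenv` package or, on Node.js 20.6 and later, with `node --env-file=.env`:
```env
OPENAI_API_KEY=your_openai_api_key
GEMINI_API_KEY=your_gemini_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
GROQ_API_KEY=your_groq_api_key
CARTESIA_API_KEY=your_cartesia_api_key
ASSEMBLYAI_API_KEY=your_assemblyai_api_key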
```
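The examples pass `process.env.*_API_KEY` straight to the SDK, so an unset variable is forwarded as `undefined`. A minimal fail-fast lookup, using the variable names that appear in this README's examples; `API_KEY_VARS` and `apiKeyFor` are hypothetical helpers, not SDK exports.

```javascript
// Hypothetical helper (not exported by multi-voice-sdk): maps each provider
// name accepted by tts()/stt() to the environment variable used in the
// README examples, and fails with a clear message when the key is missing.
const API_KEY_VARS = {
  openai: "OPENAI_API_KEY",
  gemini: "GEMINI_API_KEY",
  deepgram: "DEEPGRAM_API_KEY",
  groq: "GROQ_API_KEY",
  cartesia: "CARTESIA_API_KEY",
  assemblyai: "ASSEMBLYAI_API_KEY",
};

function apiKeyFor(provider, env = process.env) {
  const name = API_KEY_VARS[provider];
  if (!name) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  const key = env[name];
  if (!key) {
    throw new Error(`${name} is not set`);
  }
  return key;
}
```

`apiKey: apiKeyFor("deepgram")` can then stand in for `process.env.DEEPGRAM_API_KEY` in any of the examples.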
## Requirements
- Node.js 16.x or higher
## License
ISC