# VoiceAI SDK

> 🎙️ The official Node.js/TypeScript SDK for [SLNG.AI](https://slng.ai) - Simple, powerful voice AI for developers.

[![npm version](https://img.shields.io/npm/v/voiceai-sdk.svg)](https://www.npmjs.com/package/voiceai-sdk) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Quick Start

```bash
npm install voiceai-sdk
```

```typescript
import { VoiceAI, tts, stt, llm } from 'voiceai-sdk';

// Initialize once in your app
new VoiceAI({
  apiKey: 'your-api-key' // Get yours at https://slng.ai/signup
});

// Text to Speech
const audio = await tts.synthesize('Hello world', 'orpheus');

// Speech to Text
const transcript = await stt.transcribe(audioFile, 'whisper-v3');

// LLM Completion
const response = await llm.complete('What is the meaning of life?', 'llama-4-scout');
```

## Why SLNG.AI?

- 🚀 **All Voice AI in One Place** - TTS, STT, and LLMs through a single API
- 🎯 **Best Models** - Access to Orpheus, ElevenLabs, Whisper, and more
- 💳 **Simple Pricing** - Pay-as-you-go with a transparent credit system
- 👩‍💻 **Developer First** - Clean API, great docs, responsive founders
- ⚡ **Fast Integration** - Get started in minutes, not hours
- 🌍 **Multi-language** - Support for 29+ languages across models

## Installation

```bash
npm install voiceai-sdk
# or
yarn add voiceai-sdk
# or
pnpm add voiceai-sdk
```

## Authentication

Get your API key at [https://slng.ai/signup](https://slng.ai/signup).

```typescript
import { VoiceAI } from 'voiceai-sdk';

new VoiceAI({
  apiKey: process.env.VOICEAI_API_KEY,
  timeout: 60000 // Optional: custom timeout in ms (default: 30000)
});
```

## Text-to-Speech (TTS)

### Simple Usage

```typescript
import { tts } from 'voiceai-sdk';

// Quick synthesis with model name
const audio = await tts.synthesize('Hello world', 'orpheus');

// Use convenience methods
const audio = await tts.orpheus('Hello world');
const audio = await tts.vui('Hello world');
const audio = await tts.koroko('Hello world');

// Orpheus Indic for Indian languages (Mumbai region - low latency)
const audio = await tts.orpheusIndic('नमस्ते', { language: 'hi' });
const audio = await tts.orpheusIndic('வணக்கம்', { language: 'ta' });
```

### Advanced Options

```typescript
// With voice and language options
const audio = await tts.orpheus('Bonjour le monde', {
  voice: 'pierre',
  language: 'fr',
  stream: false
});

// ElevenLabs models
const audio = await tts.elevenlabs.multiV2('Hello world', {
  voice: 'Rachel',
  language: 'en',
  stability: 0.5,
  similarity_boost: 0.75
});

// Voice cloning with XTTS
const audio = await tts.xtts('Hello world', {
  speakerVoice: 'base64_encoded_audio', // 6+ seconds of reference audio
  language: 'en'
});

// Voice cloning with MARS6
const audio = await tts.mars6('Hello world', {
  audioRef: 'base64_encoded_audio',
  language: 'en-us',
  refText: 'Reference transcript' // Optional but recommended
});
```

### Streaming

```typescript
const stream = await tts.synthesize('Long text...', 'orpheus', { stream: true });

// Handle streaming response
for await (const chunk of stream) {
  // Process audio chunks
}
```

### Available Models

```typescript
console.log(tts.models);
// ['vui', 'orpheus', 'orpheus-indic', 'koroko', 'xtts-v2', 'mars6', 'elevenlabs/multi-v2', ...]

// Get voices for a model
const voices = tts.getVoices('orpheus');
// ['tara', 'leah', 'jess', 'leo', 'dan', ...]

// Get supported languages
const languages = tts.getLanguages('orpheus');
// ['en', 'fr', 'de', 'ko', 'zh', 'es', 'it', 'hi']

// Orpheus Indic supports 8 major Indian languages
const indicLanguages = tts.getLanguages('orpheus-indic');
// ['hi', 'ta', 'te', 'bn', 'mr', 'gu', 'kn', 'ml']
// Hindi, Tamil, Telugu, Bengali, Marathi, Gujarati, Kannada, Malayalam
```
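### Saving Audio (Node.js)

The snippets above leave the awaited `audio` value opaque. Assuming it resolves to raw encoded audio bytes as a `Buffer` or `ArrayBuffer` (an assumption — check `TTSResult` in the SDK's type definitions for the real shape), a minimal Node.js sketch for writing speech to disk could look like this:

```typescript
import { writeFile } from 'node:fs/promises';
import { tts } from 'voiceai-sdk';

// ASSUMPTION: the awaited value is (or contains) the encoded audio bytes.
// Verify against TTSResult before relying on this.
async function saveSpeech(text: string, path: string): Promise<void> {
  const audio = await tts.orpheus(text, { voice: 'tara', language: 'en' });
  // Normalize ArrayBuffer to a Node Buffer before writing
  const bytes = audio instanceof ArrayBuffer ? Buffer.from(audio) : (audio as Buffer);
  await writeFile(path, bytes);
}

await saveSpeech('Hello from the SDK', 'hello.wav');
```

In the browser you would wrap the same bytes in a `Blob` and hand it to an `Audio` element instead of writing a file.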
## Speech-to-Text (STT)

### Basic Transcription

```typescript
import { stt } from 'voiceai-sdk';

// Transcribe audio file
const result = await stt.transcribe(audioFile, 'whisper-v3');
console.log(result.text);

// Convenience methods
const result = await stt.whisper(audioFile);
const result = await stt.kyutai(audioFile, { language: 'fr' });
```

### With Options

```typescript
// Whisper with options
const result = await stt.whisper(audioFile, {
  language: 'es',
  timestamps: true,
  diarization: true
});

// Kyutai - optimized for French and English (Mumbai region)
const result = await stt.kyutai(audioFile, {
  language: 'fr', // 'en' or 'fr' only
  timestamps: true
});

// Access segments with timestamps
result.segments?.forEach(segment => {
  console.log(`[${segment.start}-${segment.end}]: ${segment.text}`);
});
```

### Supported Input Types

```typescript
// File object (browser)
const file = document.getElementById('audio-input').files[0];
const result = await stt.whisper(file);

// Blob
const blob = new Blob([audioData], { type: 'audio/wav' });
const result = await stt.whisper(blob);

// ArrayBuffer
const buffer = await fetch('audio.mp3').then(r => r.arrayBuffer());
const result = await stt.whisper(buffer);

// Base64 string
const base64Audio = 'data:audio/wav;base64,...';
const result = await stt.whisper(base64Audio);
```

## Language Models (LLM)

### Simple Completion

```typescript
import { llm } from 'voiceai-sdk';

// Single prompt
const result = await llm.complete('Explain quantum computing', 'llama-4-scout');
console.log(result.content);

// Convenience method
const result = await llm.llamaScout('What is the speed of light?');
```

### Chat Format

```typescript
const messages = [
  { role: 'system', content: 'You are a helpful assistant' },
  { role: 'user', content: 'What is the capital of France?' }
];

const result = await llm.llamaScout(messages, {
  temperature: 0.7,
  maxTokens: 500
});
```

### Streaming Responses

```typescript
const stream = await llm.llamaScout('Write a story...', { stream: true });

for await (const chunk of stream) {
  process.stdout.write(chunk);
}
```
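### Multi-turn Conversations

Because the chat format accepts a full message array, a conversation can be kept stateful on the client by appending each exchange to the history before the next call. A minimal sketch, assuming the history also accepts an `assistant` role (the examples above only show `system` and `user`):

```typescript
import { llm, LLMMessage } from 'voiceai-sdk';

const history: LLMMessage[] = [
  { role: 'system', content: 'You are a helpful assistant' }
];

// Append the user's turn, call the model, then record its reply so the
// next call sees the whole conversation so far.
async function ask(question: string): Promise<string> {
  history.push({ role: 'user', content: question });
  const result = await llm.llamaScout(history, { temperature: 0.7 });
  history.push({ role: 'assistant', content: result.content });
  return result.content;
}

await ask('What is the capital of France?');
await ask('And what is its population?'); // resolved against the prior turn
```

For long sessions, trim old turns before each call to keep the prompt inside the model's context window.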
## Handling Cold Starts

Some models may take 60-90 seconds to start up on first use. The SDK handles this automatically with:

- **Smart timeouts**: Models known to be slow get longer timeouts
- **Clear messages**: Timeout errors explain cold starts
- **Warmup utilities**: Pre-warm models before use

### Pre-warming Models

```typescript
import { warmup } from 'voiceai-sdk';

// Warm up a single model
await warmup.tts('orpheus');
await warmup.stt('whisper-v3');
await warmup.llm('llama-4-scout');

// Warm up multiple models in parallel
await warmup.multiple([
  { type: 'tts', model: 'orpheus' },
  { type: 'stt', model: 'whisper-v3' },
  { type: 'llm', model: 'llama-4-scout' }
]);
```

### Custom Timeouts

```typescript
// Global timeout for all requests
new VoiceAI({
  apiKey: 'your-key',
  timeout: 120000 // 2 minutes
});

// The SDK automatically uses longer timeouts for known slow models:
// - Orpheus: 90s
// - Orpheus Indic: 90s
// - XTTS-v2: 90s
// - Whisper-v3: 120s
// - MARS6: 90s
```

## Error Handling

```typescript
try {
  const audio = await tts.synthesize('Hello', 'orpheus');
} catch (error: any) {
  if (error.message.includes('Authentication failed')) {
    // Invalid API key
  } else if (error.message.includes('Insufficient credits')) {
    // Need more credits
  } else if (error.message.includes('Rate limit')) {
    // Too many requests
  } else if (error.message.includes('timed out')) {
    // Model may be cold starting - retry in a moment
  }
}
```

## Examples

### Build a Voice Assistant

```typescript
import { VoiceAI, tts, stt, llm } from 'voiceai-sdk';

new VoiceAI({ apiKey: process.env.VOICEAI_API_KEY });

async function voiceAssistant(audioInput: Blob | ArrayBuffer | string) {
  // 1. Transcribe user's speech
  const transcript = await stt.whisper(audioInput);

  // 2. Generate AI response
  const response = await llm.llamaScout(transcript.text);

  // 3. Convert response to speech
  const audio = await tts.orpheus(response.content, {
    voice: 'tara',
    language: 'en'
  });

  return audio;
}
```

### Multilingual TTS

```typescript
const languages = {
  en: 'Hello world',
  fr: 'Bonjour le monde',
  de: 'Hallo Welt',
  es: 'Hola mundo'
};

for (const [lang, text] of Object.entries(languages)) {
  const audio = await tts.orpheus(text, {
    language: lang,
    voice: getVoiceForLanguage(lang) // your own language-to-voice lookup
  });
  // Save or play audio
}
```

### Voice Cloning

```typescript
// Clone voice with XTTS-v2
const referenceAudio = await loadAudioAsBase64('speaker.wav'); // your own file-to-base64 helper

const clonedSpeech = await tts.xtts('This is my cloned voice', {
  speakerVoice: referenceAudio,
  language: 'en'
});

// Clone with MARS6 (supports prosody)
const clonedWithProsody = await tts.mars6('Excited speech!', {
  audioRef: referenceAudio,
  refText: 'This is how I normally speak',
  language: 'en-us',
  temperature: 0.8
});
```

## TypeScript Support

Full TypeScript support with exported types:

```typescript
import {
  VoiceAI,
  TTSOptions, TTSResult,
  STTOptions, STTResult,
  LLMMessage, LLMOptions, LLMResult
} from 'voiceai-sdk';
```

## Need Help?

- **Documentation**: [https://slng.ai/docs](https://slng.ai/docs)
- **Dashboard**: [https://slng.ai/dashboard](https://slng.ai/dashboard)
- **Pricing**: [https://slng.ai/pricing](https://slng.ai/pricing)

## Feedback & Support

We're building this for developers like you. Your feedback matters!

- 📧 **Email the founders**: hello@slng.ai
- 🤖 **Request models**: Need a specific model? Just ask!
- 🐛 **Report issues**: hello@slng.ai
- 💬 **Discord**: Coming soon!

## Contributing

We welcome contributions! Feel free to:

- Report bugs
- Suggest new features
- Submit pull requests
- Request new models

## License

MIT © SLNG.AI

---

Built with ❤️ by the SLNG.AI team. Making voice AI simple for developers everywhere.