voice-stream

Version:

A powerful React hook for real-time voice streaming, designed for AI-powered applications. Perfect for real-time transcription, voice assistants, and audio processing with features like silence detection and configurable audio processing.

github.com/danieloquelis/voice-stream

danieloquelis/voice-stream

218 lines (170 loc) • 5.7 kB

Markdown

# Voice Stream A powerful TypeScript library for real-time voice streaming in React applications, designed for AI-powered voice applications, real-time transcription, and audio processing. ## Features - 🎙️ Real-time voice streaming with configurable audio processing - 🔊 Automatic silence detection and handling - ⚡ Configurable sample rate and buffer size - 🔄 Base64 encoded audio chunks for easy transmission - 🛠️ TypeScript support with full type definitions - 📦 Zero dependencies (except for React) ## Installation ```bash yarn add voice-stream # or npm install voice-stream ``` ## Requirements - React 18 or higher - Modern browser with Web Audio API support ## Basic Usage ```typescript import { useVoiceStream } from "voice-stream"; function App() { const { startStreaming, stopStreaming, isStreaming } = useVoiceStream({ onStartStreaming: () => { console.log("Streaming started"); }, onStopStreaming: () => { console.log("Streaming stopped"); }, onAudioChunked: (chunkBase64) => { // Handle the audio chunk console.log("Received audio chunk"); }, }); return ( <div> <button onClick={startStreaming} disabled={isStreaming}> Start Recording </button> <button onClick={stopStreaming} disabled={!isStreaming}> Stop Recording </button> </div> ); } ``` ## Advanced Configuration The `useVoiceStream` hook accepts several configuration options for advanced use cases: ```typescript const options = { // Basic callbacks onStartStreaming: () => void, onStopStreaming: () => void, onAudioChunked: (base64Data: string) => void, onError: (error: Error) => void, // Audio processing options targetSampleRate: 16000, // Default: 16000 bufferSize: 4096, // Default: 4096 // Silence detection options enableSilenceDetection: true, // Default: false silenceThreshold: -50, // Default: -50 (dB) silenceDuration: 1000, // Default: 1000 (ms) autoStopOnSilence: true, // Default: false // Audio routing includeDestination: true, // Default: true - routes audio to speakers }; ``` ## Use Cases ### 1. OpenAI Whisper API Integration Real-time speech-to-text using OpenAI's Whisper API: ```typescript function WhisperTranscription() { const [transcript, setTranscript] = useState(""); const { startStreaming, stopStreaming } = useVoiceStream({ targetSampleRate: 16000, // Whisper's preferred sample rate onAudioChunked: async (base64Data) => { const response = await fetch('https://api.openai.com/v1/audio/transcriptions', { method: 'POST', headers: { 'Authorization': `Bearer ${OPENAI_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ audio: base64Data, model: 'whisper-1', response_format: 'text' }) }); const text = await response.text(); setTranscript(text); } }); return ( // ... UI implementation ); } ``` ### 2. ElevenLabs WebSocket Integration Real-time text-to-speech using ElevenLabs' WebSocket API: ```typescript function ElevenLabsStreaming() { const ws = useRef<WebSocket | null>(null); const { startStreaming, stopStreaming } = useVoiceStream({ targetSampleRate: 44100, // ElevenLabs preferred sample rate onAudioChunked: (base64Data) => { if (ws.current?.readyState === WebSocket.OPEN) { ws.current.send(JSON.stringify({ audio: base64Data, voice_settings: { stability: 0.5, similarity_boost: 0.75 } })); } } }); useEffect(() => { ws.current = new WebSocket('wss://api.elevenlabs.io/v1/text-to-speech'); return () => { ws.current?.close(); }; }, []); return ( // ... UI implementation ); } ``` ### 3. Real-time Voice Activity Detection Implement voice activity detection with automatic silence handling: ```typescript function VoiceActivityDetection() { const { startStreaming, stopStreaming } = useVoiceStream({ enableSilenceDetection: true, silenceThreshold: -50, silenceDuration: 1000, autoStopOnSilence: true, onStartStreaming: () => console.log("Voice detected"), onStopStreaming: () => console.log("Silence detected"), }); return ( // ... UI implementation ); } ``` ## API Reference ### useVoiceStream Hook #### Returns - `startStreaming: () => Promise<void>` - Function to start voice streaming - `stopStreaming: () => void` - Function to stop voice streaming - `isStreaming: boolean` - Current streaming status #### Options - `onStartStreaming?: () => void` - Called when streaming starts - `onStopStreaming?: () => void` - Called when streaming stops - `onAudioChunked?: (chunkBase64: string) => void` - Called with each audio chunk - `onError?: (error: Error) => void` - Called when an error occurs - `targetSampleRate?: number` - Target sample rate for audio processing - `bufferSize?: number` - Size of the audio processing buffer - `enableSilenceDetection?: boolean` - Enable silence detection - `silenceThreshold?: number` - Threshold for silence detection in dB - `silenceDuration?: number` - Duration of silence before trigger in ms - `autoStopOnSilence?: boolean` - Automatically stop streaming on silence - `includeDestination?: boolean` - Route audio to speakers ## Contributing We welcome contributions! Whether it's bug reports, feature requests, or code contributions, please feel free to reach out or submit a pull request. ### Development Setup 1. Fork the repository 2. Install dependencies: `yarn install` 3. Run tests: `yarn test` ## License MIT