vocal-call-sdk

A JavaScript SDK that provides a complete voice calling interface with WebSocket communication, audio recording/playback, and automatic UI management.

github.com/Vocallabs/vocal-call-sdk

# VocalCallSDK

A JavaScript SDK for real-time voice calls with intelligent audio processing and WebSocket communication.

## Installation

```javascript
import { VocalCallSDK } from './dist/vocalcallsdk.js';
```

## Basic Usage

```javascript
const sdk = new VocalCallSDK({
  agentId: 'your-agent-uuid',            // Required: Get from vocallabs.ai
  callId: 'unique-call-id',              // Required: Get from arc.vocallabs.ai
  inactiveText: "Start Call",            // Optional: Button text when idle
  activeText: "End Call",                // Optional: Button text when active
  size: 'large',                         // Optional: 'small', 'medium', 'large'
  className: 'custom-button-class',      // Optional: Additional CSS classes
  container: '#call-button-container',   // Required for renderButton()
  config: {
    endpoints: {
      websocket: 'wss://call.vocallabs.ai/ws/'  // Optional: Custom WebSocket URL
    },
    audio: {
      userInputSampleRate: 32000,    // Optional: User microphone sample rate
      agentOutputSampleRate: 24000,  // Optional: Agent audio sample rate (24k recommended)
      echoCancellation: true,        // Optional: Enable echo cancellation
      noiseSuppression: true         // Optional: Enable noise suppression
    }
  }
});

// Render the call button in the specified container
sdk.renderButton();
```

## Configuration Parameters

### Required Parameters

- **`agentId`**: Agent identifier from vocallabs.ai
- **`callId`**: Unique identifier for each call session

### Optional Parameters

- **`inactiveText`**: Button text when idle (default: "Talk to Assistant")
- **`activeText`**: Button text when recording (default: "Listening...")
- **`size`**: Button size: "small", "medium", or "large" (default: "medium")
- **`className`**: Additional CSS classes for the button (default: "")
- **`container`**: DOM container selector for button rendering (required for `renderButton()`)

### Configuration Object

The `config` object supports the following options:

#### `config.endpoints`

- **`websocket`**: Custom WebSocket URL (default: "wss://call.vocallabs.ai/ws/")

#### `config.audio`

- **`userInputSampleRate`**: Microphone sample rate in Hz (default: 32000)
- **`agentOutputSampleRate`**: Agent audio sample rate in Hz; supports 48000, 24000, and 16000 (default: 24000)
- **`echoCancellation`**: Enable echo cancellation (default: true)
- **`noiseSuppression`**: Enable noise suppression (default: true)

## Event Handling

The SDK provides several event hooks for monitoring call status and handling errors:

```javascript
// Keep a reference to the handler so it can be removed later with off()
const callStartHandler = () => {
  console.log('Call started');
};

sdk.on('onCallStart', callStartHandler)
  .on('onCallEnd', (reason) => {
    console.log('Call ended:', reason);
    // Possible reasons: 'user', 'agent', 'server_initiated', 'connection_timeout', etc.
  })
  .on('onStatusChange', (status) => {
    console.log('Status changed:', status);
    // Status object includes: status, isRecording, isConnected, lastDisconnectReason
  })
  .on('onError', (error) => {
    console.error('SDK Error:', error);
  });

// Remove event listeners (pass the same function reference that was registered)
sdk.off('onCallStart', callStartHandler);
```

### Available Events

- **`onCallStart`**: Fired when a call begins and the WebSocket connection is established
- **`onCallEnd`**: Fired when a call ends; receives the reason as its parameter
- **`onStatusChange`**: Fired when the SDK status changes (connecting, connected, error, idle)
- **`onError`**: Fired when an error occurs

## API Methods

### Core Methods

- **`renderButton(container?)`**: Render the call button in the specified container
- **`startCall()`**: Programmatically start a call
- **`endCall()`**: Programmatically end a call (only works if currently recording)
- **`getStatus()`**: Get the current SDK status object
- **`destroy()`**: Clean up resources and remove event listeners

### Status Object

The `getStatus()` method returns an object with the following properties:

```javascript
{
  status: 'idle' | 'connecting' | 'connected' | 'error',
  isRecording: boolean,
  isConnected: boolean,
  lastDisconnectReason: string | null
}
```

### Event Management

- **`on(event, callback)`**: Add an event listener (returns the SDK instance for chaining)
- **`off(event, callback)`**: Remove a specific event listener
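
The status object can drive your own UI text alongside the built-in button. The sketch below is only an illustration: `describeStatus` and the label strings are hypothetical and not part of the SDK. It assumes only the four status fields listed above.

```javascript
// Hypothetical helper (not provided by the SDK): maps a status object
// with the documented fields to a short human-readable label.
function describeStatus(status) {
  if (status.status === 'error') {
    return 'Something went wrong';
  }
  if (status.status === 'connecting') {
    return 'Connecting...';
  }
  if (status.status === 'connected') {
    // isRecording distinguishes an active call from an open but idle connection
    return status.isRecording ? 'In call' : 'Connected';
  }
  // 'idle': mention why the previous call ended, if a reason was recorded
  return status.lastDisconnectReason
    ? `Call ended (${status.lastDisconnectReason})`
    : 'Ready';
}
```

A handler registered with `sdk.on('onStatusChange', ...)` could pass its `status` argument to this function and write the result into a label element. The same function also works on the return value of `getStatus()`.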
## How It Works

The SDK provides a complete real-time voice communication system with intelligent audio processing and WebSocket-based communication.

### Architecture Overview

1. **WebSocket Connection**: Establishes real-time bidirectional communication with the Vocallabs voice service
2. **Audio Capture**: Captures user microphone input with configurable sample rates and audio processing
3. **Real-Time Processing**: Processes and transmits audio data in real-time chunks
4. **Agent Response**: Receives and plays back agent audio responses with automatic buffering
5. **Call Management**: Handles call state, disconnection reasons, and cleanup

### Key Features

**Modern Audio Processing**:

- Uses AudioWorkletNode for modern browsers with automatic fallback to ScriptProcessorNode
- Configurable sample rates (32 kHz user input, 24 kHz agent output by default)
- Built-in echo cancellation and noise suppression
- Automatic audio normalization and gain control

**Intelligent Connection Management**:

- Automatic reconnection handling
- Connection timeout detection (8 seconds)
- Graceful disconnect with reason tracking
- Page unload protection to properly close connections

**Real-Time Audio Streaming**:

- Low-latency audio transmission using WebSocket
- Buffered playback for smooth agent responses
- Automatic audio queue management
- Cross-browser compatibility

**User Experience**:

- Responsive button UI with status indicators
- Visual feedback for connection states
- Configurable button sizes and text
- Accessibility support with ARIA labels

### WebSocket Protocol

The SDK communicates using a structured WebSocket protocol:

- **Connection**: `wss://call.vocallabs.ai/ws/?agent={agentId}_{callId}_web_{sampleRate}`
- **Events**: JSON-based event system for call control and media streaming
- **Audio Format**: Base64-encoded 16-bit PCM audio data
- **Status Tracking**: Real-time call status and hangup source reporting

## Advanced Configuration

### Audio Settings

```javascript
const sdk = new VocalCallSDK({
  agentId: 'your-agent-id',
  callId: 'your-call-id',
  container: '#call-button',
  config: {
    audio: {
      userInputSampleRate: 32000,    // User microphone sample rate
      agentOutputSampleRate: 24000,  // Agent audio sample rate (24k/16k/48k)
      echoCancellation: true,        // Microphone echo cancellation
      noiseSuppression: true         // Microphone noise suppression
    }
  }
});
```

### Custom Styling

The SDK automatically applies Tailwind CSS classes for styling. You can customize the appearance by:

1. **Using the `className` parameter**:

   ```javascript
   const sdk = new VocalCallSDK({
     // ... other options
     className: 'custom-call-button'
   });
   ```

2. **Overriding default styles**:

   ```css
   .vocal-call-wrapper button {
     /* Your custom styles */
   }
   ```

### Button Sizes

Available button sizes with their default styling:

- **`small`**: `px-3 py-1 text-sm rounded-md`
- **`medium`**: `px-4 py-2 text-base rounded-lg` (default)
- **`large`**: `px-6 py-3 text-lg rounded-xl`

## Error Handling

The SDK provides comprehensive error handling:

```javascript
sdk.on('onError', (error) => {
  console.error('VocalCallSDK Error:', error);

  // Handle specific error types
  if (error.type === 'microphone_access_denied') {
    // Show user-friendly message about microphone permissions
  } else if (error.type === 'connection_failed') {
    // Handle connection issues
  }
});

sdk.on('onCallEnd', (reason) => {
  // Handle different disconnect reasons
  switch (reason) {
    case 'user':
      console.log('User ended the call');
      break;
    case 'agent':
      console.log('Agent ended the call');
      break;
    case 'connection_timeout':
      console.log('Connection timed out');
      break;
    case 'page_unload':
      console.log('Page was refreshed/closed during call');
      break;
  }
});
```

## Browser Compatibility

The SDK supports all modern browsers with WebRTC capabilities:

- Chrome 66+ (recommended)
- Firefox 60+
- Safari 12+
- Edge 79+

**Features**:

- Automatic fallback from AudioWorkletNode to ScriptProcessorNode for older browsers
- WebSocket support with automatic reconnection
- MediaDevices API for microphone access

## Best Practices

1. **Always handle errors**: Implement `onError` event handlers for graceful error handling
2. **Check microphone permissions**: Ensure users grant microphone access before starting calls
3. **Provide visual feedback**: Use the status events to show connection state to users
4. **Clean up resources**: Call `destroy()` when the component is unmounted
5. **Test across browsers**: Verify functionality across different browser versions

## Troubleshooting

### Common Issues

**"Microphone access denied"**

- Ensure HTTPS is used (required for microphone access)
- Check browser microphone permissions
- Verify the site isn't blocked from accessing media

**"Connection timeout"**

- Check network connectivity
- Verify the WebSocket URL is accessible
- Ensure firewall doesn't block WebSocket connections

**"No audio from agent"**

- Check audio output devices
- Verify browser audio permissions
- Test with different audio sample rates
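
For the microphone permission check recommended under Best Practices, a minimal sketch using the standard `navigator.mediaDevices.getUserMedia` browser API might look like the following. `ensureMicrophoneAccess` and its `reason` strings are hypothetical names chosen for illustration; they are not part of the SDK and do not describe how the SDK itself requests the microphone.

```javascript
// Hypothetical pre-flight check (not part of the SDK). The mediaDevices
// object is injectable so the function can be exercised outside a browser.
async function ensureMicrophoneAccess(
  mediaDevices = globalThis.navigator && globalThis.navigator.mediaDevices
) {
  if (!mediaDevices || typeof mediaDevices.getUserMedia !== 'function') {
    // Typical on insecure (non-HTTPS) pages or very old browsers
    return { ok: false, reason: 'unsupported' };
  }
  try {
    const stream = await mediaDevices.getUserMedia({ audio: true });
    // Only the permission prompt was needed, so release the microphone again
    stream.getTracks().forEach((track) => track.stop());
    return { ok: true, reason: null };
  } catch (err) {
    // Standard DOMException names raised by getUserMedia
    if (err && err.name === 'NotAllowedError') {
      return { ok: false, reason: 'denied' };
    }
    if (err && err.name === 'NotFoundError') {
      return { ok: false, reason: 'no_device' };
    }
    return { ok: false, reason: 'unknown' };
  }
}
```

One possible use is to await this function before calling `sdk.startCall()`. If `ok` is false, the page can show guidance that matches the reason, such as the "Microphone access denied" steps above.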