vocal-call-sdk
A JavaScript SDK that provides a complete voice calling interface with WebSocket communication, audio recording/playback, and automatic UI management.
A JavaScript SDK for real-time voice calls with intelligent audio processing and WebSocket communication.
```javascript
import { VocalCallSDK } from './dist/vocalcallsdk.js';
```
```javascript
const sdk = new VocalCallSDK({
  agentId: 'your-agent-uuid',          // Required: Get from vocallabs.ai
  callId: 'unique-call-id',            // Required: Get from arc.vocallabs.ai
  inactiveText: "Start Call",          // Optional: Button text when idle
  activeText: "End Call",              // Optional: Button text when active
  size: 'large',                       // Optional: 'small', 'medium', 'large'
  className: 'custom-button-class',    // Optional: Additional CSS classes
  container: '#call-button-container', // Required for renderButton()
  config: {
    endpoints: {
      websocket: 'wss://call.vocallabs.ai/ws/' // Optional: Custom WebSocket URL
    },
    audio: {
      userInputSampleRate: 32000,   // Optional: User microphone sample rate
      agentOutputSampleRate: 24000, // Optional: Agent audio sample rate (24k recommended)
      echoCancellation: true,       // Optional: Enable echo cancellation
      noiseSuppression: true        // Optional: Enable noise suppression
    }
  }
});

// Render the call button in the specified container
sdk.renderButton();
```
## Parameters
- **`agentId`**: Agent identifier from vocallabs.ai
- **`callId`**: Unique identifier for each call session
- **`inactiveText`**: Button text when idle (default: "Talk to Assistant")
- **`activeText`**: Button text when recording (default: "Listening...")
- **`size`**: Button size - "small", "medium", "large" (default: "medium")
- **`className`**: Additional CSS classes for the button (default: "")
- **`container`**: DOM container selector for button rendering (required for `renderButton()`)
### Configuration Object
The `config` object supports the following options:
#### `config.endpoints`
- **`websocket`**: Custom WebSocket URL (default: "wss://call.vocallabs.ai/ws/")
#### `config.audio`
- **`userInputSampleRate`**: Microphone sample rate (default: 32000)
- **`agentOutputSampleRate`**: Agent audio sample rate; supports 48000, 24000, and 16000 Hz (default: 24000)
- **`echoCancellation`**: Enable echo cancellation (default: true)
- **`noiseSuppression`**: Enable noise suppression (default: true)
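To make the defaults concrete, here is a minimal sketch of how a partial `config` could be resolved against the documented default values. The `DEFAULT_CONFIG` constant and the `resolveConfig` helper are hypothetical names used for illustration; the SDK's actual merge logic is not shown in this README and may differ.

```javascript
// Defaults as documented above; how the SDK merges them internally is an assumption.
const DEFAULT_CONFIG = {
  endpoints: { websocket: 'wss://call.vocallabs.ai/ws/' },
  audio: {
    userInputSampleRate: 32000,
    agentOutputSampleRate: 24000,
    echoCancellation: true,
    noiseSuppression: true,
  },
};

// Hypothetical helper: merge a partial config over the documented defaults,
// one section at a time, so unspecified options keep their default values.
function resolveConfig(userConfig = {}) {
  return {
    endpoints: { ...DEFAULT_CONFIG.endpoints, ...userConfig.endpoints },
    audio: { ...DEFAULT_CONFIG.audio, ...userConfig.audio },
  };
}

const resolved = resolveConfig({ audio: { agentOutputSampleRate: 16000 } });
console.log(resolved.audio.agentOutputSampleRate); // 16000
console.log(resolved.audio.userInputSampleRate);   // 32000
```

Merging per section (rather than replacing the whole `audio` object) means that overriding one option does not silently drop the others.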
## Event Handling
The SDK provides several event hooks for monitoring call status and handling errors:
```javascript
// Keep a reference to the handler so it can be removed later
const callStartHandler = () => {
  console.log('Call started');
};

sdk.on('onCallStart', callStartHandler)
  .on('onCallEnd', (reason) => {
    console.log('Call ended:', reason);
    // Possible reasons: 'user', 'agent', 'server_initiated', 'connection_timeout', etc.
  })
  .on('onStatusChange', (status) => {
    console.log('Status changed:', status);
    // Status object includes: status, isRecording, isConnected, lastDisconnectReason
  })
  .on('onError', (error) => {
    console.error('SDK Error:', error);
  });

// Remove an event listener by passing the same handler used with on()
sdk.off('onCallStart', callStartHandler);
```
- **`onCallStart`**: Fired when a call begins and the WebSocket connection is established
- **`onCallEnd`**: Fired when a call ends; the callback receives a `reason` argument
- **`onStatusChange`**: Fired when SDK status changes (connecting, connected, error, idle)
- **`onError`**: Fired when an error occurs
## API Methods
### Core Methods
- **`renderButton(container?)`**: Render the call button in the specified container
- **`startCall()`**: Programmatically start a call
- **`endCall()`**: Programmatically end a call (only works if currently recording)
- **`getStatus()`**: Get current SDK status object
- **`destroy()`**: Clean up resources and remove event listeners
### Status Object
The `getStatus()` method returns an object with the following properties:
```javascript
{
  status: 'idle' | 'connecting' | 'connected' | 'error',
  isRecording: boolean,
  isConnected: boolean,
  lastDisconnectReason: string | null
}
```
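As an illustration of consuming this object, the sketch below maps a status object to a short UI label. `describeStatus` is a hypothetical helper, not part of the SDK, and the label strings are placeholders.

```javascript
// Hypothetical helper: turn a getStatus() result into a short UI label.
function describeStatus({ status, isRecording, lastDisconnectReason }) {
  switch (status) {
    case 'connecting':
      return 'Connecting...';
    case 'connected':
      return isRecording ? 'In call (microphone live)' : 'Connected';
    case 'error':
      return 'Something went wrong';
    case 'idle':
    default:
      return lastDisconnectReason
        ? `Ready (last call ended: ${lastDisconnectReason})`
        : 'Ready';
  }
}

// With a real instance this would be: describeStatus(sdk.getStatus())
console.log(describeStatus({
  status: 'idle',
  isRecording: false,
  isConnected: false,
  lastDisconnectReason: 'user',
})); // Ready (last call ended: user)
```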
### Event Methods
- **`on(event, callback)`**: Add event listener (returns SDK instance for chaining)
- **`off(event, callback)`**: Remove specific event listener
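The chaining behaviour of `on()` and the callback argument of `off()` can be illustrated with a tiny stand-alone emitter. This is only a sketch of the general pattern, assuming listeners are matched by function identity; `TinyEmitter` is a hypothetical class and not the SDK's implementation.

```javascript
// Illustrative emitter, NOT the SDK's implementation.
class TinyEmitter {
  constructor() {
    this.listeners = new Map(); // event name -> Set of callbacks
  }

  on(event, callback) {
    if (!this.listeners.has(event)) this.listeners.set(event, new Set());
    this.listeners.get(event).add(callback);
    return this; // returning the instance is what enables chaining
  }

  off(event, callback) {
    const set = this.listeners.get(event);
    if (set) set.delete(callback); // matches by function identity
    return this;
  }

  emit(event, ...args) {
    const set = this.listeners.get(event);
    if (set) for (const cb of [...set]) cb(...args);
    return this;
  }
}

const emitter = new TinyEmitter();
const calls = [];
const onStart = () => calls.push('start');

emitter.on('onCallStart', onStart).on('onCallEnd', (reason) => calls.push(reason));
emitter.emit('onCallStart').emit('onCallEnd', 'user');
emitter.off('onCallStart', onStart).emit('onCallStart'); // no handler left to run
console.log(calls); // [ 'start', 'user' ]
```

Under this pattern an inline arrow function passed to `on()` cannot be removed later, which is why keeping a named reference to the handler is useful.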
## How It Works
The SDK provides a complete real-time voice communication system with intelligent audio processing and WebSocket-based communication.
### Architecture Overview
1. **WebSocket Connection**: Establishes real-time bidirectional communication with the Vocallabs voice service
2. **Audio Capture**: Captures user microphone input with configurable sample rates and audio processing
3. **Real-Time Processing**: Processes and transmits audio data in real-time chunks
4. **Agent Response**: Receives and plays back agent audio responses with automatic buffering
5. **Call Management**: Handles call state, disconnection reasons, and cleanup
### Key Features
**Modern Audio Processing**:
- Uses AudioWorkletNode for modern browsers with automatic fallback to ScriptProcessorNode
- Configurable sample rates (32kHz user input, 24kHz agent output by default)
- Built-in echo cancellation and noise suppression
- Automatic audio normalization and gain control
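The AudioWorkletNode-to-ScriptProcessorNode fallback mentioned above typically comes down to feature detection on the audio context. The following is a minimal sketch of that check; `pickProcessorKind` and its return values are hypothetical, and the SDK's own detection logic may be different.

```javascript
// Hypothetical helper: decide which audio processing node type to use.
// `ctx` is an AudioContext (or any object shaped like one).
function pickProcessorKind(ctx) {
  // AudioContext.audioWorklet exists only where AudioWorklet is supported
  if (ctx && ctx.audioWorklet && typeof ctx.audioWorklet.addModule === 'function') {
    return 'audio-worklet';
  }
  // Deprecated, but still widely available as a fallback
  if (ctx && typeof ctx.createScriptProcessor === 'function') {
    return 'script-processor';
  }
  return 'unsupported';
}

// In a browser: pickProcessorKind(new AudioContext())
console.log(pickProcessorKind({ createScriptProcessor() {} })); // script-processor
```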
**Intelligent Connection Management**:
- Automatic reconnection handling
- Connection timeout detection (8 seconds)
- Graceful disconnect with reason tracking
- Page unload protection to properly close connections
**Real-Time Audio Streaming**:
- Low-latency audio transmission using WebSocket
- Buffered playback for smooth agent responses
- Automatic audio queue management
- Cross-browser compatibility
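Buffered playback of streamed chunks is commonly done by scheduling each chunk to start exactly when the previous one ends. The sketch below shows that bookkeeping in isolation; `PlaybackScheduler` and its lead-time parameter are assumptions for illustration, not the SDK's actual queue.

```javascript
// Illustrative scheduler for gapless playback; the SDK's own queue may differ.
class PlaybackScheduler {
  constructor(leadTimeSec = 0.05) {
    this.leadTimeSec = leadTimeSec; // safety margin before a chunk may start
    this.nextStartSec = 0;          // when the previously queued audio ends
  }

  // Returns the time (on the AudioContext clock) at which a chunk should start.
  schedule(nowSec, durationSec) {
    const startSec = Math.max(this.nextStartSec, nowSec + this.leadTimeSec);
    this.nextStartSec = startSec + durationSec;
    return startSec;
  }
}

const scheduler = new PlaybackScheduler(0.5);
console.log(scheduler.schedule(0, 1));    // 0.5  (first chunk waits for the lead time)
console.log(scheduler.schedule(0.25, 1)); // 1.5  (queued right after the first chunk)
console.log(scheduler.schedule(10, 1));   // 10.5 (queue ran dry, so start fresh)
```

In a browser, the returned value would be passed to `AudioBufferSourceNode.start()`, with `nowSec` taken from `AudioContext.currentTime` and `durationSec` from `AudioBuffer.duration`.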
**User Experience**:
- Responsive button UI with status indicators
- Visual feedback for connection states
- Configurable button sizes and text
- Accessibility support with ARIA labels
### WebSocket Protocol
The SDK communicates using a structured WebSocket protocol:
- **Connection**: `wss://call.vocallabs.ai/ws/?agent={agentId}_{callId}_web_{sampleRate}`
- **Events**: JSON-based event system for call control and media streaming
- **Audio Format**: Base64-encoded 16-bit PCM audio data
- **Status Tracking**: Real-time call status and hangup source reporting
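The connection URL and audio format above can be sketched as two small helpers. Both function names are hypothetical. Two details are assumptions not stated in this README: which of the two configured sample rates is placed in the URL, and that the PCM samples are little-endian.

```javascript
// Hypothetical helper: build the connection URL in the documented format.
// Which sample rate belongs in the URL is an assumption here.
function buildSocketUrl({ agentId, callId, sampleRate, baseUrl = 'wss://call.vocallabs.ai/ws/' }) {
  return `${baseUrl}?agent=${agentId}_${callId}_web_${sampleRate}`;
}

// Convert Float32 samples in [-1, 1] to Base64-encoded 16-bit PCM.
// Little-endian byte order is assumed; the README does not specify it.
function floatToPcm16Base64(samples) {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to the valid range
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  const bytes = new Uint8Array(view.buffer);
  let binary = '';
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary); // btoa is global in browsers and in Node.js 16+
}

console.log(buildSocketUrl({ agentId: 'agent-1', callId: 'call-1', sampleRate: 24000 }));
// wss://call.vocallabs.ai/ws/?agent=agent-1_call-1_web_24000
```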
## Advanced Configuration
### Audio Settings
```javascript
const sdk = new VocalCallSDK({
  agentId: 'your-agent-id',
  callId: 'your-call-id',
  container: '#call-button',
  config: {
    audio: {
      userInputSampleRate: 32000,   // User microphone sample rate
      agentOutputSampleRate: 24000, // Agent audio sample rate (24k/16k/48k)
      echoCancellation: true,       // Microphone echo cancellation
      noiseSuppression: true        // Microphone noise suppression
    }
  }
});
```
### Styling
The SDK automatically applies Tailwind CSS classes for styling. You can customize the appearance by:
1. **Using the `className` parameter**:
```javascript
const sdk = new VocalCallSDK({
  // ... other options
  className: 'custom-call-button'
});
```
2. **Overriding default styles**:
```css
.vocal-call-wrapper button {
  /* Your custom styles */
}
```
Available button sizes with their default styling:
- **`small`**: `px-3 py-1 text-sm rounded-md`
- **`medium`**: `px-4 py-2 text-base rounded-lg` (default)
- **`large`**: `px-6 py-3 text-lg rounded-xl`
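For reference, the size presets above can be expressed as a lookup table. This is a sketch only: `SIZE_CLASSES` and `buttonClasses` are hypothetical names, and appending `className` after the size classes is an assumption about how the SDK combines them.

```javascript
// Size presets copied from the list above; how the SDK combines them with
// `className` is an assumption.
const SIZE_CLASSES = {
  small: 'px-3 py-1 text-sm rounded-md',
  medium: 'px-4 py-2 text-base rounded-lg',
  large: 'px-6 py-3 text-lg rounded-xl',
};

function buttonClasses(size = 'medium', className = '') {
  // Unknown sizes fall back to the documented default
  const base = Object.hasOwn(SIZE_CLASSES, size) ? SIZE_CLASSES[size] : SIZE_CLASSES.medium;
  return className ? `${base} ${className}` : base;
}

console.log(buttonClasses('large', 'custom-call-button'));
// px-6 py-3 text-lg rounded-xl custom-call-button
```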
## Error Handling
The SDK provides comprehensive error handling:
```javascript
sdk.on('onError', (error) => {
  console.error('VocalCallSDK Error:', error);
  // Handle specific error types
  if (error.type === 'microphone_access_denied') {
    // Show user-friendly message about microphone permissions
  } else if (error.type === 'connection_failed') {
    // Handle connection issues
  }
});

sdk.on('onCallEnd', (reason) => {
  // Handle different disconnect reasons
  switch (reason) {
    case 'user':
      console.log('User ended the call');
      break;
    case 'agent':
      console.log('Agent ended the call');
      break;
    case 'connection_timeout':
      console.log('Connection timed out');
      break;
    case 'page_unload':
      console.log('Page was refreshed/closed during call');
      break;
    default:
      // Other reasons, such as 'server_initiated'
      console.log('Call ended:', reason);
  }
});
```
## Browser Support
The SDK supports all modern browsers with WebRTC capabilities:
- Chrome 66+ (recommended)
- Firefox 60+
- Safari 12+
- Edge 79+
**Features**:
- Automatic fallback from AudioWorkletNode to ScriptProcessorNode for older browsers
- WebSocket support with automatic reconnection
- MediaDevices API for microphone access
## Best Practices
1. **Always handle errors**: Implement `onError` event handlers for graceful error handling
2. **Check microphone permissions**: Ensure users grant microphone access before starting calls
3. **Provide visual feedback**: Use the status events to show connection state to users
4. **Clean up resources**: Call `destroy()` when the component is unmounted
5. **Test across browsers**: Verify functionality across different browser versions
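Several of these practices can be combined in one small wrapper that registers handlers and returns a cleanup function. The sketch below relies only on the documented `on`, `off`, and `destroy` methods; `attachCall` and its option names are hypothetical.

```javascript
// Hypothetical helper: wire up handlers and return a cleanup function
// to call when the host component unmounts.
function attachCall(sdk, { onState = () => {}, onFailure = () => {} } = {}) {
  const handleStatus = (status) => onState(status);
  const handleError = (error) => onFailure(error);

  sdk.on('onStatusChange', handleStatus).on('onError', handleError);

  let cleanedUp = false;
  return function cleanup() {
    if (cleanedUp) return; // safe to call more than once
    cleanedUp = true;
    sdk.off('onStatusChange', handleStatus);
    sdk.off('onError', handleError);
    sdk.destroy();
  };
}
```

In a component framework, the returned function would typically be called from the unmount hook, for example as the return value of a React `useEffect` callback.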
## Troubleshooting
**"Microphone access denied"**
- Ensure HTTPS is used (required for microphone access)
- Check browser microphone permissions
- Verify the site isn't blocked from accessing media
**"Connection timeout"**
- Check network connectivity
- Verify the WebSocket URL is accessible
- Ensure your firewall doesn't block WebSocket connections
**"No audio from agent"**
- Check audio output devices
- Verify browser audio permissions
- Test with different audio sample rates
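Some of the checks above can be automated before a call starts. The following pre-flight sketch inspects standard browser globals; `checkEnvironment` is a hypothetical helper, and it only detects missing capabilities, not denied permissions or network problems.

```javascript
// Hypothetical pre-flight check; pass `window` in a browser.
function checkEnvironment(env) {
  const problems = [];
  if (env.isSecureContext === false) {
    problems.push('Page is not served from a secure context (HTTPS or localhost)');
  }
  const media = env.navigator && env.navigator.mediaDevices;
  if (!media || typeof media.getUserMedia !== 'function') {
    problems.push('MediaDevices.getUserMedia is unavailable');
  }
  if (typeof env.WebSocket !== 'function') {
    problems.push('WebSocket is unavailable');
  }
  if (typeof (env.AudioContext || env.webkitAudioContext) !== 'function') {
    problems.push('Web Audio API is unavailable');
  }
  return problems; // an empty array means no obvious blockers
}

// Browser usage: const problems = checkEnvironment(window);
```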