# @volley/recognition-client-sdk

TypeScript SDK for real-time speech recognition via WebSocket.

## Installation

```bash
npm install @volley/recognition-client-sdk
```

## Quick Start

```typescript
import {
  createClientWithBuilder,
  RecognitionProvider,
  DeepgramModel,
  STAGES
} from '@volley/recognition-client-sdk';

// Create client with builder pattern (recommended)
const client = createClientWithBuilder(builder => builder
  .stage(STAGES.STAGING) // ✨ Simple environment selection using enum
  .provider(RecognitionProvider.DEEPGRAM)
  .model(DeepgramModel.NOVA_2)
  .onTranscript(result => {
    console.log('Final:', result.finalTranscript);
    console.log('Interim:', result.pendingTranscript);
  })
  .onError(error => console.error(error))
);

// Stream audio
await client.connect();
client.sendAudio(pcm16AudioChunk); // Call repeatedly with audio chunks
await client.stopRecording();      // Wait for final transcript

// Check the actual URL being used
console.log('Connected to:', client.getUrl());
```

### Alternative: Direct Client Creation

```typescript
import {
  RealTimeTwoWayWebSocketRecognitionClient,
  RecognitionProvider,
  DeepgramModel,
  Language,
  STAGES
} from '@volley/recognition-client-sdk';

const client = new RealTimeTwoWayWebSocketRecognitionClient({
  stage: STAGES.STAGING, // ✨ Recommended: Use STAGES enum for type safety
  asrRequestConfig: {
    provider: RecognitionProvider.DEEPGRAM,
    model: DeepgramModel.NOVA_2,
    language: Language.ENGLISH_US
  },
  onTranscript: (result) => console.log(result),
  onError: (error) => console.error(error)
});

// Check the actual URL being used
console.log('Connected to:', client.getUrl());
```

## Configuration

### Environment Selection

**Recommended: Use the `stage` parameter with the STAGES enum** for automatic environment configuration:

```typescript
import { RecognitionProvider, DeepgramModel, Language, STAGES } from '@volley/recognition-client-sdk';

builder
  .stage(STAGES.STAGING)                  // STAGES.LOCAL | STAGES.DEV | STAGES.STAGING | STAGES.PRODUCTION
  .provider(RecognitionProvider.DEEPGRAM) // DEEPGRAM, GOOGLE
  .model(DeepgramModel.NOVA_2)            // Provider-specific model enum
  .language(Language.ENGLISH_US)          // Language enum
  .interimResults(true)                   // Enable partial transcripts
```

**Available Stages and URLs:**

| Stage | Enum | WebSocket URL |
|-------|------|---------------|
| **Local** | `STAGES.LOCAL` | `ws://localhost:3101/ws/v1/recognize` |
| **Development** | `STAGES.DEV` | `wss://recognition-service-dev.volley-services.net/ws/v1/recognize` |
| **Staging** | `STAGES.STAGING` | `wss://recognition-service-staging.volley-services.net/ws/v1/recognize` |
| **Production** | `STAGES.PRODUCTION` | `wss://recognition-service.volley-services.net/ws/v1/recognize` |

> 💡 Using the `stage` parameter automatically constructs the correct URL for each environment.

**Automatic Connection Retry:**

The SDK **automatically retries failed connections** with sensible defaults - no configuration needed.

**Default behavior (works out of the box):**

- 4 connection attempts (try once, retry up to 3 times on failure)
- 200ms delay between retries
- Handles temporary service unavailability (503)
- Fast failure (~600ms total on complete failure)
- Timing: `Attempt 1 → FAIL → wait 200ms → Attempt 2 → FAIL → wait 200ms → Attempt 3 → FAIL → wait 200ms → Attempt 4`

```typescript
import { RealTimeTwoWayWebSocketRecognitionClient, STAGES } from '@volley/recognition-client-sdk';

// ✅ Automatic retry - no config needed!
const client = new RealTimeTwoWayWebSocketRecognitionClient({
  stage: STAGES.STAGING,
  // connectionRetry works automatically with defaults
});
```

**Optional: Customize retry behavior** (only if needed):

```typescript
const client = new RealTimeTwoWayWebSocketRecognitionClient({
  stage: STAGES.STAGING,
  connectionRetry: {
    maxAttempts: 2, // Fewer attempts (min: 1, max: 5)
    delayMs: 500    // Longer delay between attempts
  }
});
```

> ⚠️ **Note**: Retry only applies to **initial connection establishment**. If the connection drops during audio streaming, the SDK will not auto-retry; the caller must handle reconnection.

**Advanced: Custom URL** for non-standard endpoints:

```typescript
builder
  .url('wss://custom-endpoint.example.com/ws/v1/recognize') // Custom WebSocket URL
  .provider(RecognitionProvider.DEEPGRAM)
  // ... rest of config
```

> 💡 **Note**: If both `stage` and `url` are provided, `url` takes precedence.

### Event Handlers

```typescript
builder
  .onTranscript(result => {})   // Handle transcription results
  .onError(error => {})         // Handle errors
  .onConnected(() => {})        // Connection established
  .onDisconnected((code) => {}) // Connection closed
  .onMetadata(meta => {})       // Timing information
```

### Optional Parameters

```typescript
builder
  .gameContext({ // Context for better recognition
    gameId: 'session-123',
    prompt: 'Expected responses: yes, no, maybe'
  })
  .userId('user-123')               // User identification
  .platform('web')                  // Platform identifier
  .logger((level, msg, data) => {}) // Custom logging
```

## API Reference

### Client Methods

```typescript
await client.connect();       // Establish connection
client.sendAudio(chunk);      // Send PCM16 audio
await client.stopRecording(); // End and get final transcript
client.getAudioUtteranceId(); // Get session UUID
client.getUrl();              // Get actual WebSocket URL being used
client.getState();            // Get current state
client.isConnected();         // Check connection status
```

### TranscriptionResult

```typescript
{
  type: 'Transcription';                // Message type discriminator
  audioUtteranceId: string;             // Session UUID
  finalTranscript: string;              // Confirmed text (won't change)
  finalTranscriptConfidence?: number;   // Confidence 0-1 for final transcript
  pendingTranscript?: string;           // In-progress text (may change)
  pendingTranscriptConfidence?: number; // Confidence 0-1 for pending transcript
  is_finished: boolean;                 // Transcription complete (last message)
  voiceStart?: number;                  // Voice activity start time (ms from stream start)
  voiceDuration?: number;               // Voice duration (ms)
  voiceEnd?: number;                    // Voice activity end time (ms from stream start)
  startTimestamp?: number;              // Transcription start timestamp (ms)
  endTimestamp?: number;                // Transcription end timestamp (ms)
  receivedAtMs?: number;                // Server receive timestamp (ms since epoch)
  accumulatedAudioTimeMs?: number;      // Total audio duration sent (ms)
}
```
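As a usage sketch (not part of the SDK's API surface), a typical handler combines the confirmed and in-progress fields for live display, then treats the message carrying `is_finished` as the complete result:

```typescript
import {
  createClientWithBuilder,
  RecognitionProvider,
  DeepgramModel,
  STAGES
} from '@volley/recognition-client-sdk';

const client = createClientWithBuilder(builder => builder
  .stage(STAGES.STAGING)
  .provider(RecognitionProvider.DEEPGRAM)
  .model(DeepgramModel.NOVA_2)
  .onTranscript(result => {
    // Confirmed text won't change; the pending tail may be revised later.
    const live = result.finalTranscript + (result.pendingTranscript ?? '');
    console.log(`live [${result.audioUtteranceId}]: ${live}`);

    if (result.is_finished) {
      // Last message for this utterance: take finalTranscript as the result.
      console.log('done:', result.finalTranscript,
        '(confidence:', result.finalTranscriptConfidence, ')');
    }
  })
  .onError(error => console.error(error))
);
```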
## Providers

### Deepgram

```typescript
import { RecognitionProvider, DeepgramModel } from '@volley/recognition-client-sdk';

builder
  .provider(RecognitionProvider.DEEPGRAM)
  .model(DeepgramModel.NOVA_2); // NOVA_2, NOVA_3, FLUX_GENERAL_EN
```

### Google Cloud Speech-to-Text

```typescript
import { RecognitionProvider, GoogleModel } from '@volley/recognition-client-sdk';

builder
  .provider(RecognitionProvider.GOOGLE)
  .model(GoogleModel.LATEST_SHORT); // LATEST_SHORT, LATEST_LONG, TELEPHONY, etc.
```

Available Google models:

- `LATEST_SHORT` - Optimized for short audio (< 1 minute)
- `LATEST_LONG` - Optimized for long audio (> 1 minute)
- `TELEPHONY` - Optimized for phone audio
- `TELEPHONY_SHORT` - Short telephony audio
- `MEDICAL_DICTATION` - Medical dictation (premium)
- `MEDICAL_CONVERSATION` - Medical conversations (premium)

## Audio Format

The SDK expects PCM16 audio:

- Format: Linear PCM (16-bit signed integers)
- Sample Rate: 16kHz recommended
- Channels: Mono

Please reach out to the AI team if there are essential reasons to support other formats.
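Browser and native capture pipelines usually produce Float32 samples, so audio typically needs a conversion step before `sendAudio`. The helper below is an illustrative sketch, not SDK API; it assumes input that is already mono, 16kHz, and in the range [-1, 1] (resampling is out of scope here):

```typescript
// Illustrative helper (not part of the SDK): convert Float32 samples
// (e.g. from the Web Audio API) to 16-bit signed linear PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm16 = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;      // scale to int16 range
  }
  return pcm16;
}

// Pass the converted chunk (or its underlying buffer, depending on the
// type sendAudio expects) to the client; `captureChunk` is hypothetical:
// client.sendAudio(floatTo16BitPCM(captureChunk));
```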
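Chunk sizing follows directly from this format. Assuming the recommended 16kHz mono PCM16, the arithmetic below shows why the 100ms chunks suggested under Troubleshooting come out to 3,200 bytes:

```typescript
// bytes = sampleRate * bytesPerSample * durationMs / 1000
const SAMPLE_RATE = 16_000; // 16kHz
const BYTES_PER_SAMPLE = 2; // 16-bit PCM

const chunkBytes = (durationMs: number) =>
  (SAMPLE_RATE * BYTES_PER_SAMPLE * durationMs) / 1000;

console.log(chunkBytes(100)); // 3200 bytes (1600 samples) per 100ms chunk
console.log(chunkBytes(500)); // 16000 bytes per 500ms chunk
```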
## Error Handling

```typescript
builder.onError(error => {
  console.error(`Error ${error.code}: ${error.message}`);
});

// Check disconnection type
import { isNormalDisconnection } from '@volley/recognition-client-sdk';

builder.onDisconnected((code, reason) => {
  if (!isNormalDisconnection(code)) {
    console.error('Unexpected disconnect:', code);
  }
});
```

## Troubleshooting

### Connection Issues

**WebSocket fails to connect**

- Verify the recognition service is running
- Check the WebSocket URL format: `ws://` or `wss://`
- Ensure the network allows WebSocket connections

**Authentication errors**

- Verify `audioUtteranceId` is provided
- Check if the service requires additional auth headers

### Audio Issues

**No transcription results**

- Confirm the audio format is PCM16, 16kHz, mono
- Check if audio chunks are being sent (use the `onAudioSent` callback)
- Verify audio data is not empty or corrupted

**Poor transcription quality**

- Try different models (e.g., `NOVA_2` vs `NOVA_3`)
- Adjust the language setting to match the audio
- Ensure the audio sample rate matches the configuration

### Performance Issues

**High latency**

- Use smaller audio chunks (e.g., 100ms instead of 500ms)
- Choose a model optimized for real-time (e.g., Deepgram Nova 2)
- Check network latency to the service

**Memory issues**

- Call `disconnect()` when done to clean up resources
- Avoid keeping multiple client instances active

## Publishing

This package uses automated publishing via semantic-release with npm Trusted Publishers (OIDC).

### First-Time Setup (One-time)

After the first manual publish, configure npm Trusted Publishers:

1. Go to https://www.npmjs.com/package/@volley/recognition-client-sdk/access
2. Click "Add publisher" → Select "GitHub Actions"
3. Configure:
   - **Organization**: `Volley-Inc`
   - **Repository**: `recognition-service`
   - **Workflow**: `sdk-release.yml`
   - **Environment**: Leave empty (not required)

### How It Works

- **Automated releases**: Pushing to the `dev` branch triggers semantic-release
- **Version bumping**: Based on conventional commits (feat/fix/BREAKING CHANGE)
- **No tokens needed**: Uses OIDC authentication with npm
- **Provenance**: Automatic supply chain attestation
- **Path filtering**: Only releases when the SDK or libs change

### Manual Publishing (Not Recommended)

If needed for testing:

```bash
cd packages/client-sdk-ts
npm login --scope=@volley
pnpm build
npm publish --provenance --access public
```

## Contributing

This SDK is part of the Recognition Service monorepo. To contribute:

1. Make changes to the SDK or libs
2. Test locally with `pnpm test`
3. Create a PR to the `dev` branch with conventional commit messages (`feat:`, `fix:`, etc.)
4. After merge, the automated workflow will publish a new version to npm

## License

Proprietary