# @volley/recognition-client-sdk
Recognition Service TypeScript/Node.js Client SDK
TypeScript SDK for real-time speech recognition via WebSocket.

## Installation
```bash
npm install @volley/recognition-client-sdk
```
## Quick Start

```typescript
import {
createClientWithBuilder,
RecognitionProvider,
DeepgramModel,
STAGES
} from '@volley/recognition-client-sdk';
// Create client with builder pattern (recommended)
const client = createClientWithBuilder(builder =>
builder
.stage(STAGES.STAGING) // ✨ Simple environment selection using enum
.provider(RecognitionProvider.DEEPGRAM)
.model(DeepgramModel.NOVA_2)
.onTranscript(result => {
console.log('Final:', result.finalTranscript);
console.log('Interim:', result.pendingTranscript);
})
.onError(error => console.error(error))
);
// Stream audio
await client.connect();
client.sendAudio(pcm16AudioChunk); // Call repeatedly with audio chunks
await client.stopRecording(); // Wait for final transcript
// Check the actual URL being used
console.log('Connected to:', client.getUrl());
```
Alternatively, construct the client directly instead of using the builder:

```typescript
import {
RealTimeTwoWayWebSocketRecognitionClient,
RecognitionProvider,
DeepgramModel,
Language,
STAGES
} from '@volley/recognition-client-sdk';
const client = new RealTimeTwoWayWebSocketRecognitionClient({
stage: STAGES.STAGING, // ✨ Recommended: Use STAGES enum for type safety
asrRequestConfig: {
provider: RecognitionProvider.DEEPGRAM,
model: DeepgramModel.NOVA_2,
language: Language.ENGLISH_US
},
onTranscript: (result) => console.log(result),
onError: (error) => console.error(error)
});
// Check the actual URL being used
console.log('Connected to:', client.getUrl());
```
**Recommended: use the `stage` parameter with the `STAGES` enum** for automatic environment configuration:
```typescript
import {
RecognitionProvider,
DeepgramModel,
Language,
STAGES
} from '@volley/recognition-client-sdk';
builder
.stage(STAGES.STAGING) // STAGES.LOCAL | STAGES.DEV | STAGES.STAGING | STAGES.PRODUCTION
.provider(RecognitionProvider.DEEPGRAM) // DEEPGRAM, GOOGLE
.model(DeepgramModel.NOVA_2) // Provider-specific model enum
.language(Language.ENGLISH_US) // Language enum
.interimResults(true) // Enable partial transcripts
```
**Available Stages and URLs:**
| Stage | Enum | WebSocket URL |
|-------|------|---------------|
| **Local** | `STAGES.LOCAL` | `ws://localhost:3101/ws/v1/recognize` |
| **Development** | `STAGES.DEV` | `wss://recognition-service-dev.volley-services.net/ws/v1/recognize` |
| **Staging** | `STAGES.STAGING` | `wss://recognition-service-staging.volley-services.net/ws/v1/recognize` |
| **Production** | `STAGES.PRODUCTION` | `wss://recognition-service.volley-services.net/ws/v1/recognize` |
> 💡 Using the `stage` parameter automatically constructs the correct URL for each environment.
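A quick way to confirm which endpoint a given stage resolves to is to read it back with `getUrl()`. A minimal sketch, with callbacks trimmed to the bare minimum:

```typescript
import {
  createClientWithBuilder,
  RecognitionProvider,
  DeepgramModel,
  STAGES,
} from '@volley/recognition-client-sdk';

const client = createClientWithBuilder(builder =>
  builder
    .stage(STAGES.PRODUCTION)
    .provider(RecognitionProvider.DEEPGRAM)
    .model(DeepgramModel.NOVA_2)
    .onTranscript(result => console.log(result))
    .onError(error => console.error(error))
);

// Expected, per the table above:
// wss://recognition-service.volley-services.net/ws/v1/recognize
console.log(client.getUrl());
```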
**Automatic Connection Retry:**
The SDK **automatically retries failed connections** with sensible defaults - no configuration needed!
**Default behavior (works out of the box):**
- 4 connection attempts (try once, retry 3 times if failed)
- 200ms delay between retries
- Handles temporary service unavailability (503)
- Fast failure (~600ms total on complete failure)
- Timing: `Attempt 1 → FAIL → wait 200ms → Attempt 2 → FAIL → wait 200ms → Attempt 3 → FAIL → wait 200ms → Attempt 4`
```typescript
import { RealTimeTwoWayWebSocketRecognitionClient, STAGES } from '@volley/recognition-client-sdk';
// ✅ Automatic retry - no config needed!
const client = new RealTimeTwoWayWebSocketRecognitionClient({
stage: STAGES.STAGING,
// connectionRetry works automatically with defaults
});
```
**Optional: Customize retry behavior** (only if needed):
```typescript
const client = new RealTimeTwoWayWebSocketRecognitionClient({
stage: STAGES.STAGING,
connectionRetry: {
maxAttempts: 2, // Fewer attempts (min: 1, max: 5)
delayMs: 500 // Longer delay between attempts
}
});
```
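As a rule of thumb, the total time spent waiting between attempts is `(maxAttempts - 1) × delayMs`; the time each attempt itself takes comes on top. A tiny hypothetical helper, not part of the SDK, makes the trade-off explicit:

```typescript
// Hypothetical helper (not part of the SDK): worst-case time spent waiting
// between connection attempts, ignoring how long each attempt itself takes.
function worstCaseRetryWaitMs(maxAttempts: number, delayMs: number): number {
  return (maxAttempts - 1) * delayMs;
}

worstCaseRetryWaitMs(4, 200); // 600 ms -> the default "~600ms total on complete failure"
worstCaseRetryWaitMs(2, 500); // 500 ms with the custom config above
```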
> ⚠️ **Note**: Retry only applies to **initial connection establishment**. If the connection drops during audio streaming, the SDK will not auto-retry (caller must handle this).
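If you need to survive mid-stream drops, the caller has to reconnect itself, for example from `onDisconnected`. A minimal sketch, assuming that calling `connect()` again on the same client is a valid way to re-establish the session (create a fresh client instead if your version requires it):

```typescript
import {
  createClientWithBuilder,
  isNormalDisconnection,
  RecognitionProvider,
  DeepgramModel,
  STAGES,
} from '@volley/recognition-client-sdk';

const client = createClientWithBuilder(builder =>
  builder
    .stage(STAGES.STAGING)
    .provider(RecognitionProvider.DEEPGRAM)
    .model(DeepgramModel.NOVA_2)
    .onTranscript(result => console.log(result))
    .onDisconnected(async code => {
      // Built-in retry only covers the initial connect(); unexpected drops
      // during streaming land here and must be handled by the caller.
      if (!isNormalDisconnection(code)) {
        console.warn('Unexpected disconnect, reconnecting...', code);
        await client.connect(); // assumption: reusing the same client is allowed
      }
    })
);

await client.connect();
```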
**Advanced: Custom URL** for non-standard endpoints:
```typescript
builder
.url('wss://custom-endpoint.example.com/ws/v1/recognize') // Custom WebSocket URL
.provider(RecognitionProvider.DEEPGRAM)
// ... rest of config
```
> 💡 **Note**: If both `stage` and `url` are provided, `url` takes precedence.
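For example, setting both should leave the custom `url` in effect, which `getUrl()` lets you verify. A quick sketch assuming the precedence described above (imports omitted; same as the earlier examples):

```typescript
const client = createClientWithBuilder(builder =>
  builder
    .stage(STAGES.STAGING) // would normally resolve to the staging URL...
    .url('wss://custom-endpoint.example.com/ws/v1/recognize') // ...but url wins
    .provider(RecognitionProvider.DEEPGRAM)
    .model(DeepgramModel.NOVA_2)
    .onTranscript(result => console.log(result))
);

// Expected: wss://custom-endpoint.example.com/ws/v1/recognize
console.log(client.getUrl());
```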
**Event callbacks:**

```typescript
builder
.onTranscript(result => {}) // Handle transcription results
.onError(error => {}) // Handle errors
.onConnected(() => {}) // Connection established
.onDisconnected((code) => {}) // Connection closed
.onMetadata(meta => {}) // Timing information
```
**Context and logging options:**

```typescript
builder
.gameContext({ // Context for better recognition
gameId: 'session-123',
prompt: 'Expected responses: yes, no, maybe'
})
.userId('user-123') // User identification
.platform('web') // Platform identifier
.logger((level, msg, data) => {}) // Custom logging
```
**Client methods:**

```typescript
await client.connect(); // Establish connection
client.sendAudio(chunk); // Send PCM16 audio
await client.stopRecording(); // End and get final transcript
client.getAudioUtteranceId(); // Get session UUID
client.getUrl(); // Get actual WebSocket URL being used
client.getState(); // Get current state
client.isConnected(); // Check connection status
```
**Transcription result** (delivered to `onTranscript`):

```typescript
{
type: 'Transcription'; // Message type discriminator
audioUtteranceId: string; // Session UUID
finalTranscript: string; // Confirmed text (won't change)
finalTranscriptConfidence?: number; // Confidence 0-1 for final transcript
pendingTranscript?: string; // In-progress text (may change)
pendingTranscriptConfidence?: number; // Confidence 0-1 for pending transcript
is_finished: boolean; // Transcription complete (last message)
voiceStart?: number; // Voice activity start time (ms from stream start)
voiceDuration?: number; // Voice duration (ms)
voiceEnd?: number; // Voice activity end time (ms from stream start)
startTimestamp?: number; // Transcription start timestamp (ms)
endTimestamp?: number; // Transcription end timestamp (ms)
receivedAtMs?: number; // Server receive timestamp (ms since epoch)
accumulatedAudioTimeMs?: number; // Total audio duration sent (ms)
}
```
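A sketch of an `onTranscript` handler built on these fields, assuming `finalTranscript` carries the confirmed text and `pendingTranscript` the revisable tail, as described above:

```typescript
// Sketch of a handler for the shape above; pass it to builder.onTranscript(...)
// or the constructor's onTranscript option.
let confirmed = '';

const handleTranscript = (result: {
  audioUtteranceId: string;
  finalTranscript: string;
  pendingTranscript?: string;
  is_finished: boolean;
}) => {
  confirmed = result.finalTranscript; // confirmed text, won't change
  const preview = result.pendingTranscript
    ? `${confirmed} ${result.pendingTranscript}` // in-progress tail, may still change
    : confirmed;
  console.log('Preview:', preview);

  if (result.is_finished) {
    // Last message for this utterance.
    console.log(`Done (${result.audioUtteranceId}):`, confirmed);
  }
};
```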
**Deepgram:**

```typescript
import { RecognitionProvider, DeepgramModel } from '@volley/recognition-client-sdk';
builder
.provider(RecognitionProvider.DEEPGRAM)
.model(DeepgramModel.NOVA_2); // NOVA_2, NOVA_3, FLUX_GENERAL_EN
```
**Google:**

```typescript
import { RecognitionProvider, GoogleModel } from '@volley/recognition-client-sdk';
builder
.provider(RecognitionProvider.GOOGLE)
.model(GoogleModel.LATEST_SHORT); // LATEST_SHORT, LATEST_LONG, TELEPHONY, etc.
```
Available Google models:
- `LATEST_SHORT` - Optimized for short audio (< 1 minute)
- `LATEST_LONG` - Optimized for long audio (> 1 minute)
- `TELEPHONY` - Optimized for phone audio
- `TELEPHONY_SHORT` - Short telephony audio
- `MEDICAL_DICTATION` - Medical dictation (premium)
- `MEDICAL_CONVERSATION` - Medical conversations (premium)
## Audio Format
The SDK expects PCM16 audio:
- Format: Linear PCM (16-bit signed integers)
- Sample Rate: 16kHz recommended
- Channels: Mono
Please reach out to the AI team if there are essential reasons to support other formats.
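If your capture pipeline produces Float32 samples (for example from the Web Audio API), they need to be converted before sending. A minimal conversion sketch, assuming the input is already mono 16kHz (resampling is out of scope here); `floatToPcm16` is a hypothetical helper, not part of the SDK:

```typescript
// Convert Float32 samples in [-1, 1] to 16-bit signed PCM.
// Assumes the input is already mono and sampled at 16kHz.
function floatToPcm16(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid overflow
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;        // scale to int16 range
  }
  return pcm;
}

// client.sendAudio(floatToPcm16(chunk)); // or the underlying buffer, depending on your setup
```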
## Error Handling
```typescript
builder.onError(error => {
console.error(`Error ${error.code}: ${error.message}`);
});
// Check disconnection type
import { isNormalDisconnection } from '@volley/recognition-client-sdk';
builder.onDisconnected((code, reason) => {
if (!isNormalDisconnection(code)) {
console.error('Unexpected disconnect:', code);
}
});
```
## Troubleshooting

**WebSocket fails to connect**
- Verify the recognition service is running
- Check the WebSocket URL format: `ws://` or `wss://`
- Ensure network allows WebSocket connections
**Authentication errors**
- Verify `audioUtteranceId` is provided
- Check if service requires additional auth headers
**No transcription results**
- Confirm audio format is PCM16, 16kHz, mono
- Check if audio chunks are being sent (use `onAudioSent` callback)
- Verify audio data is not empty or corrupted
**Poor transcription quality**
- Try different models (e.g., `NOVA_2` vs `NOVA_3`)
- Adjust language setting to match audio
- Ensure audio sample rate matches configuration
### Performance Issues
**High latency**
- Use smaller audio chunks (e.g., 100ms instead of 500ms; see the sizing sketch after this list)
- Choose a model optimized for real-time (e.g., Deepgram Nova 2)
- Check network latency to service
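For the recommended format (16kHz, 16-bit, mono), one second of audio is 32,000 bytes, so a 100ms chunk is 3,200 bytes (1,600 samples). A hypothetical sketch of slicing an already-recorded PCM16 buffer accordingly; `streamInChunks` is not part of the SDK, and the `sendAudio` parameter type is an assumption (adapt it to whatever your version expects):

```typescript
// 16,000 samples/s * 2 bytes/sample * mono = 32,000 bytes per second,
// so 100ms of audio is 3,200 bytes (1,600 samples).
const SAMPLE_RATE = 16_000;
const BYTES_PER_SAMPLE = 2;
const CHUNK_MS = 100;
const CHUNK_BYTES = (SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS) / 1000; // 3200

// Hypothetical helper: stream an already-recorded PCM16 buffer in 100ms chunks.
function streamInChunks(
  client: { sendAudio(chunk: Uint8Array): void },
  pcm16: Uint8Array
): void {
  for (let offset = 0; offset < pcm16.byteLength; offset += CHUNK_BYTES) {
    client.sendAudio(pcm16.subarray(offset, offset + CHUNK_BYTES));
  }
}
```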
**Memory issues**
- Call `disconnect()` when done to clean up resources
- Avoid keeping multiple client instances active
## Publishing
This package uses automated publishing via semantic-release with npm Trusted Publishers (OIDC).
### First-Time Setup (One-time)
After the first manual publish, configure npm Trusted Publishers:
1. Go to https://www.npmjs.com/package/@volley/recognition-client-sdk/access
2. Click "Add publisher" → Select "GitHub Actions"
3. Configure:
- **Organization**: `Volley-Inc`
- **Repository**: `recognition-service`
- **Workflow**: `sdk-release.yml`
- **Environment**: Leave empty (not required)
### How It Works
- **Automated releases**: Push to `dev` branch triggers semantic-release
- **Version bumping**: Based on conventional commits (feat/fix/BREAKING CHANGE)
- **No tokens needed**: Uses OIDC authentication with npm
- **Provenance**: Automatic supply chain attestation
- **Path filtering**: Only releases when SDK or libs change
### Manual Publishing (Not Recommended)
If needed for testing:
```bash
cd packages/client-sdk-ts
npm login --scope=@volley
pnpm build
npm publish --provenance --access public
```
## Contributing

This SDK is part of the Recognition Service monorepo. To contribute:
1. Make changes to SDK or libs
2. Test locally with `pnpm test`
3. Create PR to `dev` branch with conventional commit messages (`feat:`, `fix:`, etc.)
4. After merge, the automated workflow will publish the new version to npm
## License
Proprietary