@sayna-ai/node-sdk

Node.js SDK for Sayna.ai server-side WebSocket connections

# @sayna/node-sdk

Sayna's Node.js SDK enables real-time voice interactions through WebSocket connections. Send audio for speech recognition, receive synthesized speech, and manage voice sessions from your Node.js applications.

## Features

- WebSocket-based real-time voice communication
- Speech-to-text (STT) with configurable providers
- Text-to-speech (TTS) with audio streaming
- LiveKit integration support
- Type-safe message handling

## Installation

```bash
npm install @sayna/node-sdk
```

## Usage

```ts
import { SaynaClient } from "@sayna/node-sdk";

const client = new SaynaClient(
  "https://api.sayna.ai",
  { provider: "deepgram", model: "nova-2" },
  { provider: "cartesia", voice_id: "example-voice" }
);

client.registerOnSttResult((result) => {
  console.log("Transcription:", result.transcript);
});

client.registerOnTtsAudio((audio) => {
  // Handle audio buffer
});

await client.connect();
await client.speak("Hello, world!");
```

## API

### REST API Methods

These methods use HTTP endpoints and don't require an active WebSocket connection:

### `await client.health()`

Performs a health check on the Sayna server.

**Returns**: `Promise<{ status: string }>` - Status object with "OK" when healthy.

**Example**:

```typescript
const health = await client.health();
console.log(health.status); // "OK"
```

### `await client.getVoices()`

Retrieves the catalogue of text-to-speech voices grouped by provider.

**Returns**: `Promise<Record<string, Voice[]>>` - Object where keys are provider names and values are arrays of voice descriptors.

**Example**:

```typescript
const voices = await client.getVoices();
for (const [provider, voiceList] of Object.entries(voices)) {
  console.log(
    `${provider}:`,
    voiceList.map((v) => v.name)
  );
}
```

### `await client.speakRest(text, ttsConfig)`

Synthesizes text into audio using the REST API. This is a standalone method that doesn't require an active WebSocket connection.

| parameter   | type        | purpose                                 |
| ----------- | ----------- | --------------------------------------- |
| `text`      | `string`    | Text to synthesize (must be non-empty). |
| `ttsConfig` | `TTSConfig` | Text-to-speech provider configuration.  |

**Returns**: `Promise<ArrayBuffer>` - Raw audio data.

**Example**:

```typescript
const audioBuffer = await client.speakRest("Hello, world!", {
  provider: "elevenlabs",
  voice_id: "21m00Tcm4TlvDq8ikWAM",
  model: "eleven_turbo_v2",
  speaking_rate: 1.0,
  audio_format: "mp3",
  sample_rate: 24000,
  connection_timeout: 30,
  request_timeout: 60,
  pronunciations: [],
});
```

### `await client.getLiveKitToken(roomName, participantName, participantIdentity)`

Issues a LiveKit access token for a participant.

| parameter             | type     | purpose                                |
| --------------------- | -------- | -------------------------------------- |
| `roomName`            | `string` | LiveKit room to join or create.        |
| `participantName`     | `string` | Display name for the participant.      |
| `participantIdentity` | `string` | Unique identifier for the participant. |

**Returns**: `Promise<LiveKitTokenResponse>` - Object containing token, room name, participant identity, and LiveKit URL.

**Example**:

```typescript
const tokenInfo = await client.getLiveKitToken(
  "my-room",
  "John Doe",
  "user-123"
);
console.log("Token:", tokenInfo.token);
console.log("LiveKit URL:", tokenInfo.livekit_url);
```

### `await client.getSipHooks()`

Retrieves all configured SIP webhook hooks from the runtime cache.

**Returns**: `Promise<SipHooksResponse>` - Object containing an array of configured hooks.

**Example**:

```typescript
const response = await client.getSipHooks();
for (const hook of response.hooks) {
  console.log(`Host: ${hook.host}, URL: ${hook.url}`);
}
```

### `await client.setSipHooks(hooks)`

Sets or updates SIP webhook hooks in the runtime cache. Hooks with matching hosts will be replaced; new hosts will be added.

| parameter | type        | purpose                                  |
| --------- | ----------- | ---------------------------------------- |
| `hooks`   | `SipHook[]` | Array of SIP hook configurations to set. |

Each `SipHook` object contains:

| field  | type     | description                             |
| ------ | -------- | --------------------------------------- |
| `host` | `string` | SIP domain pattern (case-insensitive).  |
| `url`  | `string` | HTTPS URL to forward webhook events to. |

**Returns**: `Promise<SipHooksResponse>` - Object containing the merged list of all configured hooks.

**Example**:

```typescript
const response = await client.setSipHooks([
  { host: "example.com", url: "https://webhook.example.com/events" },
  { host: "another.com", url: "https://webhook.another.com/events" },
]);
console.log("Total hooks configured:", response.hooks.length);
```

---

### WebSocket API Methods

These methods require an active WebSocket connection:

### `new SaynaClient(url, sttConfig, ttsConfig, livekitConfig?, withoutAudio?)`

| parameter       | type            | purpose                                                 |
| --------------- | --------------- | ------------------------------------------------------- |
| `url`           | `string`        | Sayna server URL (http://, https://, ws://, or wss://). |
| `sttConfig`     | `STTConfig`     | Speech-to-text provider configuration.                  |
| `ttsConfig`     | `TTSConfig`     | Text-to-speech provider configuration.                  |
| `livekitConfig` | `LiveKitConfig` | Optional LiveKit room configuration.                    |
| `withoutAudio`  | `boolean`       | Disable audio streaming (defaults to `false`).          |

### `await client.connect()`

Establishes WebSocket connection and sends initial configuration. Resolves when server sends ready message.

### `client.registerOnSttResult(callback)`

Registers a callback for speech-to-text transcription results.

### `client.registerOnTtsAudio(callback)`

Registers a callback for text-to-speech audio data (ArrayBuffer).

### `client.registerOnError(callback)`

Registers a callback for error messages.

### `client.registerOnMessage(callback)`

Registers a callback for participant messages.

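The registration methods above can be wired up before `connect()` is called, as the Usage example does. The following is a minimal sketch of that ordering, not the SDK's own code: `SttResultLike`, `SttClientLike`, and `startTranscriptLog` are hypothetical names introduced here, and the interfaces only mirror the method names documented in this README. The real SDK types, including the shape of the error payload, may differ.

```typescript
// Hypothetical structural types for illustration only; they mirror the
// method names in this README, not the SDK's actual type definitions.
interface SttResultLike {
  transcript: string;
}

interface SttClientLike {
  registerOnSttResult(callback: (result: SttResultLike) => void): void;
  registerOnError(callback: (error: unknown) => void): void;
  connect(): Promise<void>;
}

// Registers the handlers first and connects afterwards, so the callbacks
// are already in place when the first events arrive.
async function startTranscriptLog(client: SttClientLike): Promise<string[]> {
  const transcripts: string[] = [];

  client.registerOnSttResult((result) => {
    transcripts.push(result.transcript);
  });

  client.registerOnError((error) => {
    // The error payload shape is an assumption, so it is logged as-is.
    console.error("Sayna error:", error);
  });

  await client.connect();
  return transcripts;
}
```

Because the helper depends only on a structural interface, a `SaynaClient` instance can be passed in directly if its method signatures are compatible, and a small fake client is enough for unit tests.
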
### `client.registerOnParticipantDisconnected(callback)`

Registers a callback for participant disconnection events.

### `client.registerOnTtsPlaybackComplete(callback)`

Registers a callback for TTS playback completion events.

### `await client.speak(text, flush?, allowInterruption?)`

Sends text to be synthesized as speech.

| parameter           | type      | default | purpose                          |
| ------------------- | --------- | ------- | -------------------------------- |
| `text`              | `string`  | -       | Text to synthesize.              |
| `flush`             | `boolean` | `true`  | Clear TTS queue before speaking. |
| `allowInterruption` | `boolean` | `true`  | Allow speech to be interrupted.  |

### `await client.onAudioInput(audioData)`

Sends raw audio data (ArrayBuffer) to the server for speech recognition.

### `await client.sendMessage(message, role, topic?, debug?)`

Sends a message to the Sayna session with role and optional metadata.

### `await client.clear()`

Clears the text-to-speech queue.

### `await client.ttsFlush(allowInterruption?)`

Flushes the TTS queue by sending an empty speak command.

### `await client.disconnect()`

Disconnects from the WebSocket server and cleans up resources.

### `client.ready`

Boolean indicating whether the client is ready to send/receive data.

### `client.connected`

Boolean indicating whether the WebSocket connection is active.

### `client.livekitRoomName`

LiveKit room name acknowledged by the server, if available (present when LiveKit is enabled).

### `client.livekitUrl`

LiveKit WebSocket URL configured on the server, if available.

### `client.saynaParticipantIdentity`

Identity assigned to the agent participant when LiveKit is enabled, if available.

### `client.saynaParticipantName`

Display name assigned to the agent participant when LiveKit is enabled, if available.

---

## Webhook Receiver

The SDK includes a `WebhookReceiver` class for securely receiving and verifying cryptographically signed webhooks from Sayna's SIP service.

### Security Features

- **HMAC-SHA256 Signature Verification**: Ensures webhook authenticity
- **Constant-Time Comparison**: Prevents timing attack vulnerabilities
- **Replay Protection**: 5-minute timestamp window prevents replay attacks
- **Strict Validation**: Comprehensive checks on all required fields

### `new WebhookReceiver(secret?)`

Creates a new webhook receiver instance.

| parameter | type     | purpose                                                                     |
| --------- | -------- | --------------------------------------------------------------------------- |
| `secret`  | `string` | HMAC signing secret (min 16 chars). Defaults to `SAYNA_WEBHOOK_SECRET` env. |

**Example**:

```typescript
import { WebhookReceiver } from "@sayna/node-sdk";

// Explicit secret
const receiver = new WebhookReceiver("your-secret-key-min-16-chars");

// Or use the environment variable (the secret must still be at least 16 characters)
process.env.SAYNA_WEBHOOK_SECRET = "your-secret-key-min-16-chars";
const envReceiver = new WebhookReceiver();
```

### `receiver.receive(headers, body)`

Verifies and parses an incoming SIP webhook.

| parameter | type                                              | purpose                                       |
| --------- | ------------------------------------------------- | --------------------------------------------- |
| `headers` | `Record<string, string \| string[] \| undefined>` | HTTP request headers (case-insensitive).      |
| `body`    | `string`                                          | Raw request body as string (not parsed JSON). |

**Returns**: `WebhookSIPOutput` - Parsed and validated webhook payload.

**Throws**: `SaynaValidationError` if signature verification fails or payload is invalid.

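The checks listed under Security Features can be illustrated with Node's built-in `crypto` module. This is a minimal sketch under stated assumptions, not the SDK's implementation: the signed string format (`${timestamp}.${rawBody}`), the hex encoding, and the names `signPayload`, `verifyPayload`, and `MAX_SKEW_SECONDS` are all hypothetical. Only HMAC-SHA256, constant-time comparison, and the 5-minute window come from this README.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// 5-minute replay window, as stated in this README.
const MAX_SKEW_SECONDS = 5 * 60;

// ASSUMPTION: the signature covers `${timestamp}.${rawBody}` and is hex-encoded.
// The actual format used by Sayna is not documented here.
function signPayload(secret: string, timestamp: number, rawBody: string): string {
  return createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`)
    .digest("hex");
}

function verifyPayload(
  secret: string,
  timestamp: number,
  rawBody: string,
  signatureHex: string,
  nowSeconds: number = Math.floor(Date.now() / 1000)
): boolean {
  // Replay protection: reject timestamps outside the allowed window.
  if (Math.abs(nowSeconds - timestamp) > MAX_SKEW_SECONDS) {
    return false;
  }

  const expected = Buffer.from(signPayload(secret, timestamp, rawBody), "hex");
  const received = Buffer.from(signatureHex, "hex");

  // timingSafeEqual throws when lengths differ, so check the length first.
  if (expected.length !== received.length) {
    return false;
  }

  // Constant-time comparison avoids leaking how many leading bytes matched.
  return timingSafeEqual(expected, received);
}
```

The sketch also shows why `receive` needs the raw request body: an HMAC is computed over exact bytes, so a body that was parsed and re-serialized (for example with different whitespace) would no longer match the signature.
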
### Express Example

```typescript
import express from "express";
import { WebhookReceiver } from "@sayna/node-sdk";

const app = express();
const receiver = new WebhookReceiver("your-secret-key-min-16-chars");

app.post(
  "/webhook",
  express.json({
    // Keep the raw body: the signature is verified against the exact bytes received.
    verify: (req, res, buf) => {
      (req as any).rawBody = buf.toString("utf8");
    },
  }),
  (req, res) => {
    try {
      const webhook = receiver.receive(req.headers, (req as any).rawBody);

      console.log("Valid webhook received:");
      console.log("  From:", webhook.from_phone_number);
      console.log("  To:", webhook.to_phone_number);
      console.log("  Room:", webhook.room.name);
      console.log("  SIP Host:", webhook.sip_host);
      console.log("  Participant:", webhook.participant.identity);

      res.status(200).json({ received: true });
    } catch (error) {
      // `error` is `unknown` under strict TypeScript settings, so narrow it first.
      const message = error instanceof Error ? error.message : String(error);
      console.error("Webhook verification failed:", message);
      res.status(401).json({ error: "Invalid signature" });
    }
  }
);
```

### Fastify Example

```typescript
import Fastify from "fastify";
import { WebhookReceiver } from "@sayna/node-sdk";

const fastify = Fastify();
const receiver = new WebhookReceiver();

// Note: `request.rawBody` is not built into Fastify. This example assumes a
// raw-body plugin (such as `fastify-raw-body`) has been registered.
fastify.post(
  "/webhook",
  {
    config: { rawBody: true },
  },
  async (request, reply) => {
    try {
      const webhook = receiver.receive(request.headers, request.rawBody);
      return { received: true };
    } catch (error) {
      // `error` is `unknown` under strict TypeScript settings, so narrow it first.
      const message = error instanceof Error ? error.message : String(error);
      reply.code(401);
      return { error: message };
    }
  }
);
```

### WebhookSIPOutput Type

The `receive` method returns a `WebhookSIPOutput` object with the following structure:

| field                  | type                    | description                                  |
| ---------------------- | ----------------------- | -------------------------------------------- |
| `participant`          | `WebhookSIPParticipant` | SIP participant information.                 |
| `participant.identity` | `string`                | Unique identity assigned to the participant. |
| `participant.sid`      | `string`                | Participant session ID from LiveKit.         |
| `participant.name`     | `string?`               | Display name (optional).                     |
| `room`                 | `WebhookSIPRoom`        | LiveKit room information.                    |
| `room.name`            | `string`                | Name of the LiveKit room.                    |
| `room.sid`             | `string`                | Room session ID from LiveKit.                |
| `from_phone_number`    | `string`                | Caller's phone number (E.164 format).        |
| `to_phone_number`      | `string`                | Called phone number (E.164 format).          |
| `room_prefix`          | `string`                | Room name prefix configured in Sayna.        |
| `sip_host`             | `string`                | SIP domain extracted from the To header.     |

## Development

```bash
bun install
bun run typecheck
bun run build
```

The repository uses Bun for dependency management and builds. The `build` script emits ready-to-publish JavaScript and type definitions in `dist/`.