# SPAvatarKit SDK

Real-time virtual avatar rendering SDK based on 3D Gaussian Splatting, supporting audio-driven animation and high-quality 3D rendering.

## 🚀 Features

- **3D Gaussian Splatting Rendering** - High-quality 3D virtual avatars built on the latest point-cloud rendering technology
- **Audio-Driven Real-Time Animation** - You provide audio data; the SDK receives animation data and renders it
- **WebGPU/WebGL Dual Rendering Backend** - Automatically selects the best rendering backend for compatibility
- **WASM High-Performance Computing** - Uses WebAssembly modules compiled from C++ for geometric calculations
- **TypeScript Support** - Complete type definitions and IntelliSense
- **Modular Architecture** - Clear component separation, easy to integrate and extend

## 📦 Installation

```bash
npm install @spatialwalk/avatarkit
```

## 🎯 Quick Start

### Basic Usage

```typescript
import { AvatarKit, AvatarManager, AvatarView, Configuration, Environment } from '@spatialwalk/avatarkit'

// 1. Initialize SDK
const configuration: Configuration = {
  environment: Environment.test,
}
await AvatarKit.initialize('your-app-id', configuration)

// Set sessionToken (if needed, call separately)
// AvatarKit.setSessionToken('your-session-token')

// 2. Load character
const avatarManager = new AvatarManager()
const avatar = await avatarManager.load('character-id', (progress) => {
  console.log(`Loading progress: ${progress.progress}%`)
})

// 3. Create view (automatically creates Canvas and AvatarController)
// Network mode (default)
const container = document.getElementById('avatar-container')
const avatarView = new AvatarView(avatar, {
  container: container,
  playbackMode: 'network' // Optional, 'network' is the default
})

// 4. Start real-time communication (network mode only)
await avatarView.avatarController.start()

// 5. Send audio data (network mode)
// ⚠️ Important: audio must be 16kHz mono PCM16
// If the audio is a Uint8Array, slice().buffer yields a plain ArrayBuffer
const audioUint8 = new Uint8Array(1024) // Example: 16kHz PCM16 audio data (512 samples = 1024 bytes)
const audioData = audioUint8.slice().buffer // Simple conversion; works whether the source is backed by ArrayBuffer or SharedArrayBuffer
avatarView.avatarController.send(audioData, false) // Playback starts automatically once enough audio has accumulated
avatarView.avatarController.send(audioData, true)  // end=true: return animation data immediately, stop accumulating
```

### External Data Mode Example

```typescript
import { AvatarPlaybackMode } from '@spatialwalk/avatarkit'

// 1-2. Same as network mode (initialize SDK, load character)

// 3. Create view with external data mode
const container = document.getElementById('avatar-container')
const avatarView = new AvatarView(avatar, {
  container: container,
  playbackMode: AvatarPlaybackMode.external
})

// 4. Start playback with initial data (obtained from your service)
// Note: audio and animation data come from your own backend
const initialAudioChunks = [
  { data: audioData1, isLast: false },
  { data: audioData2, isLast: false }
]
const initialKeyframes = animationData1 // Animation keyframes from your service
await avatarView.avatarController.play(initialAudioChunks, initialKeyframes)

// 5. Stream additional data as needed
avatarView.avatarController.sendAudioChunk(audioData3, false)
avatarView.avatarController.sendKeyframes(animationData2)
```
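Since `send()` takes raw PCM16 bytes, audio captured via the Web Audio API (which yields `Float32Array` samples) has to be converted first. A small conversion sketch, not part of the SDK; the samples are assumed to already be mono at 16kHz:

```typescript
// Sketch (not part of the SDK): convert Web Audio Float32 samples in [-1, 1]
// to the little-endian PCM16 ArrayBuffer that send() expects.
function float32ToPcm16(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2)
  const view = new DataView(buffer)
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])) // clamp to [-1, 1]
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true) // true = little-endian
  }
  return buffer
}

// Usage (samples must already be mono at 16kHz; see Audio Format Requirements):
// avatarView.avatarController.send(float32ToPcm16(monoSamples16k), false)
```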
### Complete Examples

Check the example code in the GitHub repository for complete usage flows for both modes.

**Example Project:** [Avatarkit-web-demo](https://github.com/spatialwalk/Avatarkit-web-demo)

This repository contains complete examples for Vanilla JS, Vue 3, and React, demonstrating:

- Network mode: real-time audio input with automatic animation data reception
- External data mode: custom data sources with manual audio/animation data management

## 🏗️ Architecture Overview

### Three-Layer Architecture

The SDK uses a three-layer architecture for clear separation of concerns:

1. **Rendering Layer (AvatarView)** - Responsible for 3D rendering only
2. **Playback Layer (AvatarController)** - Manages audio/animation synchronization and playback
3. **Network Layer (NetworkLayer)** - Handles WebSocket communication (network mode only)

### Core Components

- **AvatarKit** - SDK initialization and management
- **AvatarManager** - Character resource loading and management
- **AvatarView** - 3D rendering view (rendering layer)
- **AvatarController** - Audio/animation playback controller (playback layer)
- **NetworkLayer** - WebSocket communication (network layer, composed automatically in network mode)
- **AvatarCoreAdapter** - WASM module adapter

### Playback Modes

The SDK supports two playback modes, configured when creating `AvatarView`:

#### 1. Network Mode (Default)

- SDK handles WebSocket communication automatically
- Send audio data via `AvatarController.send()`
- SDK receives animation data from the backend and synchronizes playback
- Best for: real-time audio input scenarios

#### 2. External Data Mode

- External components manage their own network/data fetching
- External components provide both audio and animation data
- SDK only handles synchronized playback
- Best for: custom data sources, pre-recorded content, or custom network implementations

### Data Flow

#### Network Mode Flow

```
User audio input (16kHz mono PCM16)
  ↓
AvatarController.send()
  ↓
NetworkLayer → WebSocket → Backend processing
  ↓
Backend returns animation data (FLAME keyframes)
  ↓
NetworkLayer → AvatarController → AnimationPlayer
  ↓
FLAME parameters → AvatarCore.computeFrameFlatFromParams() → Splat data
  ↓
AvatarController (playback loop) → AvatarView.renderRealtimeFrame()
  ↓
RenderSystem → WebGPU/WebGL → Canvas rendering
```

#### External Data Mode Flow

```
External data source (audio + animation)
  ↓
AvatarController.play(initialAudio, initialKeyframes) // Start playback
  ↓
AvatarController.sendAudioChunk() // Stream additional audio
AvatarController.sendKeyframes()  // Stream additional animation
  ↓
AvatarController → AnimationPlayer (synchronized playback)
  ↓
FLAME parameters → AvatarCore.computeFrameFlatFromParams() → Splat data
  ↓
AvatarController (playback loop) → AvatarView.renderRealtimeFrame()
  ↓
RenderSystem → WebGPU/WebGL → Canvas rendering
```

**Note:**

- In network mode, you provide audio data; the SDK handles network communication and animation data reception
- In external data mode, you provide both audio and animation data; the SDK handles synchronized playback only

### Audio Format Requirements

**⚠️ Important:** The SDK requires audio data in **16kHz mono PCM16** format:

- **Sample Rate**: 16kHz (16000 Hz) - a backend requirement
- **Channels**: Mono (single channel)
- **Format**: PCM16 (16-bit signed integer)
- **Byte Order**: Little-endian

**Audio Data Format:**

- Each sample is 2 bytes (16-bit)
- Audio data should be provided as `ArrayBuffer` or `Uint8Array`
- For example: 1 second of audio = 16000 samples × 2 bytes = 32000 bytes

**Resampling:**

- If your audio source is at a different sample rate (e.g., 24kHz, 48kHz), you must resample it to 16kHz before sending it to the SDK
- For high-quality resampling, we recommend the Web Audio API's `OfflineAudioContext` with anti-aliasing filtering; see the sketch below and the example projects
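A minimal sketch of that approach, assuming the input is already a mono `Float32Array` at `inputRate`; resampling quality is browser-dependent, and the example projects contain the complete version:

```typescript
// Sketch: resample mono Float32 audio to 16kHz with OfflineAudioContext.
async function resampleTo16k(samples: Float32Array, inputRate: number): Promise<Float32Array> {
  const targetRate = 16000
  const offline = new OfflineAudioContext(1, Math.ceil(samples.length * targetRate / inputRate), targetRate)
  const buffer = offline.createBuffer(1, samples.length, inputRate)
  buffer.copyToChannel(samples, 0)
  const source = offline.createBufferSource()
  source.buffer = buffer
  source.connect(offline.destination)
  source.start()
  const rendered = await offline.startRendering()
  return rendered.getChannelData(0) // convert to PCM16 before sending (see the conversion sketch in Quick Start)
}
```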
## 📚 API Reference

### AvatarKit

The core management class of the SDK, responsible for initialization and global configuration.

```typescript
// Initialize SDK
await AvatarKit.initialize(appId: string, configuration: Configuration)

// Check initialization status
const isInitialized = AvatarKit.isInitialized

// Get the initialized app ID
const appId = AvatarKit.appId

// Get configuration
const config = AvatarKit.configuration

// Set sessionToken (if needed, call separately)
AvatarKit.setSessionToken('your-session-token')

// Set userId (optional, for telemetry)
AvatarKit.setUserId('user-id')

// Get sessionToken
const sessionToken = AvatarKit.sessionToken

// Get userId
const userId = AvatarKit.userId

// Get SDK version
const version = AvatarKit.version

// Clean up resources (must be called when no longer in use)
AvatarKit.cleanup()
```

### AvatarManager

Character resource manager, responsible for downloading, caching, and loading character data.

```typescript
const manager = new AvatarManager()

// Load character
const avatar = await manager.load(
  characterId: string,
  onProgress?: (progress: LoadProgressInfo) => void
)

// Clear cache
manager.clearCache()
```
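Putting the two classes above together, a minimal loading flow might look like the following sketch; the app id, element id, and progress handling are illustrative placeholders:

```typescript
import { AvatarKit, AvatarManager, Configuration, Environment } from '@spatialwalk/avatarkit'

// Sketch: initialize once, then load a character and surface progress in the page.
async function loadCharacter(characterId: string) {
  if (!AvatarKit.isInitialized) {
    const configuration: Configuration = { environment: Environment.test }
    await AvatarKit.initialize('your-app-id', configuration)
  }
  const manager = new AvatarManager()
  const bar = document.getElementById('load-progress') as HTMLProgressElement | null
  return manager.load(characterId, (info) => {
    if (bar) bar.value = info.progress // progress is reported as a percentage
  })
}
```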
### AvatarView

3D rendering view (rendering layer), responsible for 3D rendering only. Internally creates and manages an `AvatarController` automatically.

**⚠️ Important Limitation:** The SDK currently supports only one AvatarView instance at a time. To switch characters, first call `dispose()` on the current AvatarView, then create a new instance.

**Playback Mode Configuration:**

- The playback mode is fixed when creating `AvatarView` and persists throughout its lifecycle
- It cannot be changed after creation

```typescript
import { AvatarPlaybackMode } from '@spatialwalk/avatarkit'

// Create view (Canvas is automatically added to the container)
// Network mode (default)
const container = document.getElementById('avatar-container')
const avatarView = new AvatarView(avatar: Avatar, {
  container: container,
  playbackMode: AvatarPlaybackMode.network // Optional, default is 'network'
})

// External data mode
const avatarView = new AvatarView(avatar: Avatar, {
  container: container,
  playbackMode: AvatarPlaybackMode.external
})

// Get playback mode
const mode = avatarView.playbackMode // 'network' | 'external'

// Clean up resources (must be called before switching characters)
avatarView.dispose()
```

**Character Switching Example:**

```typescript
// Before switching characters, clean up the old AvatarView first
if (currentAvatarView) {
  currentAvatarView.dispose()
  currentAvatarView = null
}

// Load the new character
const newAvatar = await avatarManager.load('new-character-id')

// Create a new AvatarView (with the same or a different playback mode)
currentAvatarView = new AvatarView(newAvatar, {
  container: container,
  playbackMode: AvatarPlaybackMode.network
})

// Network mode: start the connection
if (currentAvatarView.playbackMode === AvatarPlaybackMode.network) {
  await currentAvatarView.avatarController.start()
}
```

### AvatarController

Audio/animation playback controller (playback layer); manages synchronized playback of audio and animation. Composes a `NetworkLayer` automatically in network mode.

**Two Usage Patterns:**

#### Network Mode Methods

```typescript
// Start the WebSocket service
await avatarView.avatarController.start()

// Send audio data (SDK receives animation data automatically)
avatarView.avatarController.send(audioData: ArrayBuffer, end: boolean)
// audioData: audio data (ArrayBuffer, must be 16kHz mono PCM16)
//   - Sample rate: 16kHz (16000 Hz) - backend requirement
//   - Format: PCM16 (16-bit signed integer, little-endian)
//   - Channels: mono (single channel)
//   - Example: 1 second = 16000 samples × 2 bytes = 32000 bytes
// end: false (default) - normal sending; the server accumulates audio and, once enough
//   has accumulated, returns animation data and starts synchronized playback
// end: true - return animation data immediately and stop accumulating; use when ending
//   the current conversation or when an immediate response is required

// Close the WebSocket service
avatarView.avatarController.close()
```

#### External Data Mode Methods

```typescript
// Start playback with initial audio and animation data
await avatarView.avatarController.play(
  initialAudioChunks?: Array<{ data: Uint8Array, isLast: boolean }>, // Initial audio chunks (16kHz mono PCM16)
  initialKeyframes?: any[] // Initial animation keyframes (obtained from your service)
)

// Stream additional audio chunks (after play() is called)
avatarView.avatarController.sendAudioChunk(
  data: Uint8Array,        // Audio chunk data
  isLast: boolean = false  // Whether this is the last chunk
)

// Stream additional animation keyframes (after play() is called)
avatarView.avatarController.sendKeyframes(
  keyframes: any[] // Additional animation keyframes (obtained from your service)
)
```

#### Common Methods (Both Modes)

```typescript
// Interrupt current playback (stops and clears data)
avatarView.avatarController.interrupt()

// Clear all data and resources
avatarView.avatarController.clear()

// Set event callbacks
avatarView.avatarController.onConnectionState = (state: ConnectionState) => {} // Network mode only
avatarView.avatarController.onAvatarState = (state: AvatarState) => {}
avatarView.avatarController.onError = (error: Error) => {}
```

**Important Notes:**

- `start()` and `close()` are only available in network mode
- `play()`, `sendAudioChunk()`, and `sendKeyframes()` are only available in external data mode
- `interrupt()` and `clear()` are available in both modes
- The playback mode is determined when creating `AvatarView` and cannot be changed
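`interrupt()` is the building block for "barge-in" in voice chat: when the user starts talking over the avatar, drop the current reply before streaming the new utterance. A sketch in network mode; the two `onUserSpeech*` hooks are hypothetical app-side events, not SDK APIs:

```typescript
// Hypothetical hook: fires when your VAD / push-to-talk detects speech onset
function onUserSpeechStart(): void {
  avatarView.avatarController.interrupt() // stop playback, clear queued data
}

// Hypothetical hook: fires per captured audio chunk (16kHz mono PCM16)
function onUserSpeechChunk(chunk: ArrayBuffer, isFinal: boolean): void {
  avatarView.avatarController.send(chunk, isFinal)
}
```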
## 🔧 Configuration

### Configuration

```typescript
interface Configuration {
  environment: Environment
}
```

**Description:**

- `environment`: Specifies the environment (cn/us/test); the SDK automatically uses the corresponding API and WebSocket addresses
- `sessionToken`: Set separately via `AvatarKit.setSessionToken()`, not in `Configuration`

```typescript
enum Environment {
  cn = 'cn',    // China region
  us = 'us',    // US region
  test = 'test' // Test environment
}
```

### AvatarViewOptions

```typescript
interface AvatarViewOptions {
  playbackMode?: AvatarPlaybackMode // Playback mode, default is 'network'
  container?: HTMLElement           // Canvas container element
}
```

**Description:**

- `playbackMode`: The playback mode (`'network'` or `'external'`), default `'network'`
  - `'network'`: SDK handles WebSocket communication; send audio via `send()`
  - `'external'`: External components provide audio and animation data; SDK handles synchronized playback
- `container`: Optional container element for the Canvas; if not provided, the Canvas is created but not added to the DOM

```typescript
enum AvatarPlaybackMode {
  network = 'network',  // Network mode: SDK handles WebSocket communication
  external = 'external' // External data mode: you provide the data, SDK handles playback
}
```

### CameraConfig

```typescript
interface CameraConfig {
  position: [number, number, number] // Camera position
  target: [number, number, number]   // Camera target
  fov: number                        // Field of view angle
  near: number                       // Near clipping plane
  far: number                        // Far clipping plane
  up?: [number, number, number]      // Up direction
  aspect?: number                    // Aspect ratio
}
```

## 📊 State Management

### ConnectionState

```typescript
enum ConnectionState {
  disconnected = 'disconnected',
  connecting = 'connecting',
  connected = 'connected',
  failed = 'failed'
}
```

### AvatarState

```typescript
enum AvatarState {
  idle = 'idle',      // Idle, showing breathing animation
  active = 'active',  // Active, waiting for playable content
  playing = 'playing' // Playing
}
```

## 🎨 Rendering System

The SDK supports two rendering backends:

- **WebGPU** - High-performance rendering for modern browsers
- **WebGL** - Traditional rendering with broader compatibility

The rendering system automatically selects the best backend; no manual configuration is needed.
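The SDK performs this selection internally. The sketch below only shows how an application might probe backend availability itself, for example to warn users on older browsers; `detectBackends` is illustrative, not an SDK API:

```typescript
// Sketch: probe which rendering backends this browser can offer.
async function detectBackends(): Promise<'webgpu' | 'webgl' | 'none'> {
  if ('gpu' in navigator) {
    try {
      const adapter = await (navigator as any).gpu.requestAdapter()
      if (adapter) return 'webgpu'
    } catch {
      // WebGPU unavailable; fall through to the WebGL probe
    }
  }
  const canvas = document.createElement('canvas')
  if (canvas.getContext('webgl2') || canvas.getContext('webgl')) return 'webgl'
  return 'none'
}
```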
## 🔍 Debugging and Monitoring

### Logging System

The SDK has a built-in logging system supporting different levels of log output:

```typescript
import { logger } from '@spatialwalk/avatarkit'

// Set log level
logger.setLevel('verbose') // 'basic' | 'verbose'

// Manual log output
logger.log('Info message')
logger.warn('Warning message')
logger.error('Error message')
```

### Performance Monitoring

The SDK provides interfaces for monitoring rendering performance:

```typescript
// Get rendering performance statistics
const stats = avatarView.getPerformanceStats()
if (stats) {
  console.log(`Render time: ${stats.renderTime.toFixed(2)}ms`)
  console.log(`Sort time: ${stats.sortTime.toFixed(2)}ms`)
  console.log(`Rendering backend: ${stats.backend}`)

  // Calculate frame rate
  const fps = 1000 / stats.renderTime
  console.log(`Frame rate: ${fps.toFixed(2)} FPS`)
}

// Periodic performance monitoring
setInterval(() => {
  const stats = avatarView.getPerformanceStats()
  if (stats) {
    // Send to a monitoring service or display in the UI
    console.log('Performance:', stats)
  }
}, 1000)
```

**Performance Statistics:**

- `renderTime`: Total rendering time in milliseconds, including sorting and GPU rendering
- `sortTime`: Sorting time in milliseconds; a radix sort depth-sorts the point cloud
- `backend`: The rendering backend in use (`'webgpu'` | `'webgl'` | `null`)

## 🚨 Error Handling

### SPAvatarError

The SDK uses a custom error type that provides more detailed error information:

```typescript
import { SPAvatarError } from '@spatialwalk/avatarkit'

try {
  await avatarView.avatarController.start()
} catch (error) {
  if (error instanceof SPAvatarError) {
    console.error('SDK Error:', error.message, error.code)
  } else {
    console.error('Unknown error:', error)
  }
}
```

### Error Callbacks

```typescript
avatarView.avatarController.onError = (error: Error) => {
  console.error('AvatarController error:', error)
  // Handle the error, e.g., reconnect or notify the user
}
```
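Tying the callbacks together, a typical wiring in network mode might look like the following sketch. It assumes `ConnectionState` and `AvatarState` are importable alongside the types documented in State Management above; `statusEl` and the naive retry are illustrative:

```typescript
import { ConnectionState, AvatarState } from '@spatialwalk/avatarkit'

const statusEl = document.getElementById('status')! // hypothetical status element

avatarView.avatarController.onConnectionState = (state: ConnectionState) => {
  statusEl.textContent = `connection: ${state}`
  if (state === ConnectionState.failed) {
    // Naive retry; production code would add backoff and a retry limit
    setTimeout(() => avatarView.avatarController.start(), 2000)
  }
}

avatarView.avatarController.onAvatarState = (state: AvatarState) => {
  statusEl.dataset.avatarState = state // 'idle' | 'active' | 'playing'
}

avatarView.avatarController.onError = (error: Error) => {
  console.error('AvatarController error:', error)
}
```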
## 🔄 Resource Management

### Lifecycle Management

#### Network Mode Lifecycle

```typescript
// Initialize
const container = document.getElementById('avatar-container')
const avatarView = new AvatarView(avatar, {
  container: container,
  playbackMode: AvatarPlaybackMode.network
})
await avatarView.avatarController.start()

// Use
avatarView.avatarController.send(audioData, false)

// Clean up
avatarView.avatarController.close()
avatarView.dispose() // Automatically cleans up all resources
```

#### External Data Mode Lifecycle

```typescript
// Initialize
const container = document.getElementById('avatar-container')
const avatarView = new AvatarView(avatar, {
  container: container,
  playbackMode: AvatarPlaybackMode.external
})

// Use
const initialAudioChunks = [{ data: audioData1, isLast: false }]
await avatarView.avatarController.play(initialAudioChunks, initialKeyframes)
avatarView.avatarController.sendAudioChunk(audioChunk, false)
avatarView.avatarController.sendKeyframes(keyframes)

// Clean up
avatarView.avatarController.clear() // Clear all data and resources
avatarView.dispose() // Automatically cleans up all resources
```

**⚠️ Important Notes:**

- The SDK currently supports only one AvatarView instance at a time
- When switching characters, first call `dispose()` to clean up the old AvatarView, then create a new instance
- Failing to clean up properly may cause resource leaks and rendering errors
- In network mode, call `close()` before `dispose()` to properly close the WebSocket connection
- In external data mode, call `clear()` before `dispose()` to clear all playback data

### Memory Optimization

- The SDK automatically manages WASM memory allocation
- Character and animation resources can be loaded and unloaded dynamically
- A memory usage monitoring interface is provided

### Audio Data Sending

#### Network Mode

The `send()` method accepts audio data as an `ArrayBuffer`.

**Audio Format Requirements:**

- **Sample Rate**: 16kHz (16000 Hz) - **backend requirement, must be exactly 16kHz**
- **Format**: PCM16 (16-bit signed integer, little-endian)
- **Channels**: Mono (single channel)
- **Data Size**: Each sample is 2 bytes, so 1 second of audio = 16000 samples × 2 bytes = 32000 bytes

**Usage:**

- `audioData`: Audio data (ArrayBuffer, must be 16kHz mono PCM16)
- `end=false` (default) - Normal sending; the server accumulates audio and, once enough has accumulated, returns animation data and starts synchronized playback
- `end=true` - Return animation data immediately and stop accumulating; use when ending the current conversation or when an immediate response is required
- **Important**: There is no need to wait for `end=true` to start playing; playback starts automatically once enough audio has accumulated

#### External Data Mode

The `play()` method starts playback with initial data; use `sendAudioChunk()` to stream additional audio afterward.

**Audio Format Requirements:**

- Same as network mode: 16kHz mono PCM16
- Audio data should be provided as `Uint8Array` chunks with an `isLast` flag

**Usage:**

```typescript
// Start playback with initial audio and animation data
// Note: audio and animation data come from your own backend
const initialAudioChunks = [
  { data: audioData1, isLast: false },
  { data: audioData2, isLast: false }
]
await avatarController.play(initialAudioChunks, initialKeyframes)

// Stream additional audio chunks
avatarController.sendAudioChunk(audioChunk, isLast)
```

**Resampling (Both Modes):**

- If your audio source is at a different sample rate (e.g., 24kHz, 48kHz), you **must** resample it to 16kHz before sending
- For high-quality resampling, use the Web Audio API's `OfflineAudioContext` with anti-aliasing filtering
- See the example projects (`vanilla`, `react`, `vue`) for a complete resampling implementation
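In external data mode, once `play()` has been called with the initial chunks, the rest of a pre-decoded recording can be streamed in fixed-size pieces. A sketch; the 3200-byte chunk size (100 ms of 16kHz mono PCM16) is an arbitrary illustrative choice, not an SDK requirement:

```typescript
// Sketch: stream the remainder of a PCM16 recording after play() has started.
function streamRemainder(pcm: Uint8Array): void {
  const CHUNK_BYTES = 3200 // 100 ms of 16kHz mono PCM16
  for (let offset = 0; offset < pcm.length; offset += CHUNK_BYTES) {
    const end = Math.min(offset + CHUNK_BYTES, pcm.length)
    const isLast = end === pcm.length // mark the final chunk
    avatarView.avatarController.sendAudioChunk(pcm.subarray(offset, end), isLast)
  }
}
```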
## 🌐 Browser Compatibility

- **Chrome/Edge** 90+ (WebGPU recommended)
- **Firefox** 90+ (WebGL)
- **Safari** 14+ (WebGL)
- **Mobile** iOS 14+, Android 8+

## 📝 License

MIT License

## 🤝 Contributing

Issues and Pull Requests are welcome!

## 📞 Support

For questions, please contact:

- Email: support@spavatar.com
- Documentation: https://docs.spavatar.com
- GitHub: https://github.com/spavatar/sdk