# SPAvatarKit SDK
Real-time virtual avatar rendering SDK based on 3D Gaussian Splatting, supporting audio-driven animation rendering and high-quality 3D rendering.
## 🚀 Features
- **3D Gaussian Splatting Rendering** - High-quality 3D virtual avatars rendered as Gaussian-splat point clouds
- **Audio-Driven Real-Time Animation** - You provide audio data; the SDK receives animation data and renders it
- **WebGPU/WebGL Dual Rendering Backend** - Automatically selects the best available rendering backend for compatibility
- **WASM High-Performance Computing** - Geometric calculations run in a WebAssembly module compiled from C++
- **TypeScript Support** - Complete type definitions and IntelliSense
- **Modular Architecture** - Clear component separation, easy to integrate and extend
## 📦 Installation
```bash
npm install @spatialwalk/avatarkit
```
## 🎯 Quick Start
### Basic Usage
```typescript
import {
  AvatarKit,
  AvatarManager,
  AvatarView,
  Configuration,
  Environment
} from '@spatialwalk/avatarkit'

// 1. Initialize the SDK
const configuration: Configuration = {
  environment: Environment.test,
}
await AvatarKit.initialize('your-app-id', configuration)

// Set a session token if needed (called separately from initialize)
// AvatarKit.setSessionToken('your-session-token')

// 2. Load a character
const avatarManager = new AvatarManager()
const avatar = await avatarManager.load('character-id', (progress) => {
  console.log(`Loading progress: ${progress.progress}%`)
})

// 3. Create the view (automatically creates the Canvas and AvatarController)
// Network mode (default)
const container = document.getElementById('avatar-container')
const avatarView = new AvatarView(avatar, {
  container: container,
  playbackMode: 'network' // Optional; 'network' is the default
})

// 4. Start real-time communication (network mode only)
await avatarView.avatarController.start()

// 5. Send audio data (network mode)
// ⚠️ Important: audio must be 16kHz mono PCM16
// If the audio is a Uint8Array, slice().buffer copies it into a plain ArrayBuffer
// (this also works when the underlying buffer is a SharedArrayBuffer)
const audioUint8 = new Uint8Array(1024) // Example: 16kHz PCM16 audio (512 samples = 1024 bytes)
const audioData = audioUint8.slice().buffer
avatarView.avatarController.send(audioData, false) // Playback starts automatically once enough data has accumulated
avatarView.avatarController.send(audioData, true)  // end=true: return animation data immediately, stop accumulating
```
### External Data Mode Example
```typescript
import { AvatarPlaybackMode } from '@spatialwalk/avatarkit'

// 1-2. Same as network mode (initialize the SDK, load a character)

// 3. Create the view in external data mode
const container = document.getElementById('avatar-container')
const avatarView = new AvatarView(avatar, {
  container: container,
  playbackMode: AvatarPlaybackMode.external
})

// 4. Start playback with initial data (obtained from your service)
// Note: audio and animation data come from your backend service
const initialAudioChunks = [{ data: audioData1, isLast: false }, { data: audioData2, isLast: false }]
const initialKeyframes = animationData1 // Animation keyframes from your service
await avatarView.avatarController.play(initialAudioChunks, initialKeyframes)

// 5. Stream additional data as needed
avatarView.avatarController.sendAudioChunk(audioData3, false)
avatarView.avatarController.sendKeyframes(animationData2)
```
### Complete Examples
Check the example code in the GitHub repository for complete usage flows for both modes.
**Example Project:** [Avatarkit-web-demo](https://github.com/spatialwalk/Avatarkit-web-demo)
This repository contains complete examples for Vanilla JS, Vue 3, and React, demonstrating:
- Network mode: Real-time audio input with automatic animation data reception
- External data mode: Custom data sources with manual audio/animation data management
## 🏗️ Architecture Overview
### Three-Layer Architecture
The SDK uses a three-layer architecture for clear separation of concerns:
1. **Rendering Layer (AvatarView)** - Responsible for 3D rendering only
2. **Playback Layer (AvatarController)** - Manages audio/animation synchronization and playback
3. **Network Layer (NetworkLayer)** - Handles WebSocket communication (only in network mode)
### Core Components
- **AvatarKit** - SDK initialization and management
- **AvatarManager** - Character resource loading and management
- **AvatarView** - 3D rendering view (rendering layer)
- **AvatarController** - Audio/animation playback controller (playback layer)
- **NetworkLayer** - WebSocket communication (network layer, automatically composed in network mode)
- **AvatarCoreAdapter** - WASM module adapter
### Playback Modes
The SDK supports two playback modes, configured when creating `AvatarView`:
#### 1. Network Mode (Default)
- SDK handles WebSocket communication automatically
- Send audio data via `AvatarController.send()`
- SDK receives animation data from backend and synchronizes playback
- Best for: Real-time audio input scenarios
#### 2. External Data Mode
- External components manage their own network/data fetching
- External components provide both audio and animation data
- SDK only handles synchronized playback
- Best for: Custom data sources, pre-recorded content, or custom network implementations
### Data Flow
#### Network Mode Flow
```
User audio input (16kHz mono PCM16)
↓
AvatarController.send()
↓
NetworkLayer → WebSocket → Backend processing
↓
Backend returns animation data (FLAME keyframes)
↓
NetworkLayer → AvatarController → AnimationPlayer
↓
FLAME parameters → AvatarCore.computeFrameFlatFromParams() → Splat data
↓
AvatarController (playback loop) → AvatarView.renderRealtimeFrame()
↓
RenderSystem → WebGPU/WebGL → Canvas rendering
```
#### External Data Mode Flow
```
External data source (audio + animation)
↓
AvatarController.play(initialAudio, initialKeyframes) // Start playback
↓
AvatarController.sendAudioChunk() // Stream additional audio
AvatarController.sendKeyframes() // Stream additional animation
↓
AvatarController → AnimationPlayer (synchronized playback)
↓
FLAME parameters → AvatarCore.computeFrameFlatFromParams() → Splat data
↓
AvatarController (playback loop) → AvatarView.renderRealtimeFrame()
↓
RenderSystem → WebGPU/WebGL → Canvas rendering
```
**Note:**
- In network mode, users provide audio data, SDK handles network communication and animation data reception
- In external data mode, users provide both audio and animation data, SDK handles synchronized playback only
### Audio Format Requirements
**⚠️ Important:** The SDK requires audio data to be in **16kHz mono PCM16** format:
- **Sample Rate**: 16kHz (16000 Hz) - This is a backend requirement
- **Channels**: Mono (single channel)
- **Format**: PCM16 (16-bit signed integer, little-endian)
- **Byte Order**: Little-endian
**Audio Data Format:**
- Each sample is 2 bytes (16-bit)
- Audio data should be provided as `ArrayBuffer` or `Uint8Array`
- For example: 1 second of audio = 16000 samples × 2 bytes = 32000 bytes (the conversion sketch below illustrates this layout)
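For reference, here is a minimal conversion sketch using only standard Web APIs; it is not part of the SDK, and `floatTo16BitPCM` is an illustrative name. It turns Web Audio `Float32Array` samples into the little-endian PCM16 layout described above:
```typescript
// Illustrative helper, not an SDK export: converts Web Audio Float32 samples
// in [-1, 1] to 16-bit little-endian PCM as described above.
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2) // 2 bytes per sample
  const view = new DataView(buffer)
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])) // clamp to [-1, 1]
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true) // true = little-endian
  }
  return buffer
}
```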
**Resampling:**
- If your audio source is at a different sample rate (e.g., 24kHz, 48kHz), you must resample it to 16kHz before sending to the SDK
- For high-quality resampling, we recommend using Web Audio API's `OfflineAudioContext` with anti-aliasing filtering
- See the example projects for a resampling implementation; a minimal sketch follows below
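As a starting point, the sketch below resamples a decoded `AudioBuffer` to 16kHz mono with `OfflineAudioContext`, the approach recommended above. It uses only standard Web Audio APIs; the example projects may implement this differently:
```typescript
// Sketch: resample a decoded AudioBuffer to 16kHz mono. The browser's
// OfflineAudioContext resampler applies its own (implementation-defined) filtering.
async function resampleTo16kMono(source: AudioBuffer): Promise<Float32Array> {
  const targetRate = 16000
  const offline = new OfflineAudioContext(1, Math.ceil(source.duration * targetRate), targetRate)
  const node = offline.createBufferSource()
  node.buffer = source
  node.connect(offline.destination) // mono destination downmixes multi-channel input
  node.start()
  const rendered = await offline.startRendering()
  return rendered.getChannelData(0) // Float32 samples; convert to PCM16 before sending
}
```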
## 📚 API Reference
### AvatarKit
The core management class of the SDK, responsible for initialization and global configuration.
```typescript
// Initialize SDK
await AvatarKit.initialize(appId: string, configuration: Configuration)
// Check initialization status
const isInitialized = AvatarKit.isInitialized
// Get initialized app ID
const appId = AvatarKit.appId
// Get configuration
const config = AvatarKit.configuration
// Set sessionToken (if needed, call separately)
AvatarKit.setSessionToken('your-session-token')
// Set userId (optional, for telemetry)
AvatarKit.setUserId('user-id')
// Get sessionToken
const sessionToken = AvatarKit.sessionToken
// Get userId
const userId = AvatarKit.userId
// Get SDK version
const version = AvatarKit.version
// Cleanup resources (must be called when no longer in use)
AvatarKit.cleanup()
```
### AvatarManager
Character resource manager, responsible for downloading, caching, and loading character data.
```typescript
const manager = new AvatarManager()
// Load character
const avatar = await manager.load(
  characterId: string,
  onProgress?: (progress: LoadProgressInfo) => void
)
// Clear cache
manager.clearCache()
```
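A possible loading flow with progress reporting and error handling, composed from the documented `load()` and `SPAvatarError` APIs (the log messages are illustrative):
```typescript
import { AvatarManager, SPAvatarError } from '@spatialwalk/avatarkit'

const manager = new AvatarManager()
try {
  const avatar = await manager.load('character-id', (progress) => {
    console.log(`Downloading character: ${progress.progress}%`)
  })
} catch (error) {
  if (error instanceof SPAvatarError) {
    console.error('Character load failed:', error.code, error.message)
  } else {
    throw error
  }
}
```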
### AvatarView
3D rendering view (rendering layer), responsible for 3D rendering only. Internally automatically creates and manages `AvatarController`.
**⚠️ Important Limitation:** Currently, the SDK only supports one AvatarView instance at a time. If you need to switch characters, you must first call the `dispose()` method to clean up the current AvatarView, then create a new instance.
**Playback Mode Configuration:**
- The playback mode is fixed when creating `AvatarView` and persists throughout its lifecycle
- Cannot be changed after creation
```typescript
import { AvatarPlaybackMode } from '@spatialwalk/avatarkit'
// Create view (Canvas is automatically added to container)
// Network mode (default)
const container = document.getElementById('avatar-container')
const avatarView = new AvatarView(avatar: Avatar, {
  container: container,
  playbackMode: AvatarPlaybackMode.network // Optional, default is 'network'
})

// External data mode
const avatarView = new AvatarView(avatar: Avatar, {
  container: container,
  playbackMode: AvatarPlaybackMode.external
})
// Get playback mode
const mode = avatarView.playbackMode // 'network' | 'external'
// Cleanup resources (must be called before switching characters)
avatarView.dispose()
```
**Character Switching Example:**
```typescript
// Before switching characters, the old AvatarView must be cleaned up first
if (currentAvatarView) {
  currentAvatarView.dispose()
  currentAvatarView = null
}

// Load the new character
const newAvatar = await avatarManager.load('new-character-id')

// Create a new AvatarView (with the same or a different playback mode)
currentAvatarView = new AvatarView(newAvatar, {
  container: container,
  playbackMode: AvatarPlaybackMode.network
})

// Network mode: start the connection
if (currentAvatarView.playbackMode === AvatarPlaybackMode.network) {
  await currentAvatarView.avatarController.start()
}
```
### AvatarController
Audio/animation playback controller (playback layer), manages synchronized playback of audio and animation. Automatically composes `NetworkLayer` in network mode.
**Two Usage Patterns:**
#### Network Mode Methods
```typescript
// Start the WebSocket service
await avatarView.avatarController.start()

// Send audio data (the SDK receives animation data automatically)
avatarView.avatarController.send(audioData: ArrayBuffer, end: boolean)
// audioData: audio data (ArrayBuffer, must be 16kHz mono PCM16)
//   - Sample rate: 16kHz (16000 Hz) - backend requirement
//   - Format: PCM16 (16-bit signed integer, little-endian)
//   - Channels: mono (single channel)
//   - Example: 1 second = 16000 samples × 2 bytes = 32000 bytes
// end: false (default) - normal sending; the server accumulates audio and, once enough
//   has arrived, returns animation data and starts synchronized playback automatically
// end: true - return animation data immediately and stop accumulating; use this to end
//   the current conversation or when an immediate response is required

// Close the WebSocket service
avatarView.avatarController.close()
```
#### External Data Mode Methods
```typescript
// Start playback with initial audio and animation data
await avatarView.avatarController.play(
  initialAudioChunks?: Array<{ data: Uint8Array, isLast: boolean }>, // Initial audio chunks (16kHz mono PCM16)
  initialKeyframes?: any[] // Initial animation keyframes (obtained from your service)
)

// Stream additional audio chunks (after play() has been called)
avatarView.avatarController.sendAudioChunk(
  data: Uint8Array,        // Audio chunk data
  isLast: boolean = false  // Whether this is the last chunk
)

// Stream additional animation keyframes (after play() has been called)
avatarView.avatarController.sendKeyframes(
  keyframes: any[] // Additional animation keyframes (obtained from your service)
)
```
#### Common Methods (Both Modes)
```typescript
// Interrupt current playback (stops and clears data)
avatarView.avatarController.interrupt()
// Clear all data and resources
avatarView.avatarController.clear()
// Set event callbacks
avatarView.avatarController.onConnectionState = (state: ConnectionState) => {} // Network mode only
avatarView.avatarController.onAvatarState = (state: AvatarState) => {}
avatarView.avatarController.onError = (error: Error) => {}
```
**Important Notes:**
- `start()` and `close()` are only available in network mode
- `play()`, `sendAudioChunk()`, and `sendKeyframes()` are only available in external data mode
- `interrupt()` and `clear()` are available in both modes
- The playback mode is determined when creating `AvatarView` and cannot be changed (see the mode-dispatch sketch below)
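Because the method sets differ, callers can branch on `playbackMode` before driving the controller. A minimal sketch, where `pcm16Chunk`, `initialAudioChunks`, and `initialKeyframes` are placeholders for your own data:
```typescript
if (avatarView.playbackMode === AvatarPlaybackMode.network) {
  await avatarView.avatarController.start()
  avatarView.avatarController.send(pcm16Chunk, false) // 16kHz mono PCM16 ArrayBuffer
} else {
  // External data mode: your service supplies both audio and keyframes
  await avatarView.avatarController.play(initialAudioChunks, initialKeyframes)
}
```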
## 🔧 Configuration
### Configuration
```typescript
interface Configuration {
  environment: Environment
}
```
**Description:**
- `environment`: Specifies the deployment environment (`cn`/`us`/`test`); the SDK automatically uses the corresponding API and WebSocket endpoints
- `sessionToken`: Set separately via `AvatarKit.setSessionToken()`, not in Configuration
```typescript
enum Environment {
  cn = 'cn',    // China region
  us = 'us',    // US region
  test = 'test' // Test environment
}
```
### AvatarViewOptions
```typescript
interface AvatarViewOptions {
  playbackMode?: AvatarPlaybackMode // Playback mode, default is 'network'
  container?: HTMLElement           // Canvas container element
}
```
**Description:**
- `playbackMode`: Specifies the playback mode (`'network'` or `'external'`), default is `'network'`
- `'network'`: SDK handles WebSocket communication, send audio via `send()`
- `'external'`: External components provide audio and animation data, SDK handles synchronized playback
- `container`: Optional container element for the Canvas; if not provided, the Canvas is created but not attached to the DOM
```typescript
enum AvatarPlaybackMode {
  network = 'network',  // Network mode: SDK handles WebSocket communication
  external = 'external' // External data mode: caller provides data, SDK handles playback
}
```
### CameraConfig
```typescript
interface CameraConfig {
  position: [number, number, number] // Camera position
  target: [number, number, number]   // Camera target
  fov: number                        // Field of view angle
  near: number                       // Near clipping plane
  far: number                        // Far clipping plane
  up?: [number, number, number]      // Up direction
  aspect?: number                    // Aspect ratio
}
```
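An example configuration object; the values are illustrative and should be tuned for your scene, and how the config is passed to the view is not covered here:
```typescript
const camera: CameraConfig = {
  position: [0, 0, 2.5], // 2.5 units in front of the avatar
  target: [0, 0, 0],     // look at the origin
  fov: 45,               // field of view (assumed degrees)
  near: 0.1,
  far: 100,
  up: [0, 1, 0]          // optional, +Y up
}
```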
## 📊 State Management
### ConnectionState
```typescript
enum ConnectionState {
  disconnected = 'disconnected',
  connecting = 'connecting',
  connected = 'connected',
  failed = 'failed'
}
```
### AvatarState
```typescript
enum AvatarState {
  idle = 'idle',      // Idle state, showing a breathing animation
  active = 'active',  // Active, waiting for playable content
  playing = 'playing' // Playing
}
```
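A typical use is driving a UI indicator from the documented `onAvatarState` callback. In this sketch, `statusEl` is assumed to be a DOM element you own:
```typescript
avatarView.avatarController.onAvatarState = (state: AvatarState) => {
  switch (state) {
    case AvatarState.idle:
      statusEl.textContent = 'Idle (breathing animation)'
      break
    case AvatarState.active:
      statusEl.textContent = 'Waiting for content…'
      break
    case AvatarState.playing:
      statusEl.textContent = 'Speaking'
      break
  }
}
```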
## 🎨 Rendering System
The SDK supports two rendering backends:
- **WebGPU** - High-performance rendering for modern browsers
- **WebGL** - Traditional rendering backend with broader compatibility
The rendering system automatically selects the best backend, no manual configuration needed.
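If you want to know ahead of time which backend is likely to be used, an app-side probe with standard Web APIs looks roughly like this; it is independent of the SDK's internal selection logic and may not match it exactly:
```typescript
// App-side capability probe; the cast avoids needing @webgpu/types.
async function probeBackends(): Promise<'webgpu' | 'webgl' | 'none'> {
  if ('gpu' in navigator) {
    const adapter = await (navigator as any).gpu.requestAdapter()
    if (adapter) return 'webgpu'
  }
  const canvas = document.createElement('canvas')
  if (canvas.getContext('webgl2') || canvas.getContext('webgl')) return 'webgl'
  return 'none'
}
```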
## 🔍 Debugging and Monitoring
### Logging System
The SDK includes a built-in logging system with configurable output levels:
```typescript
import { logger } from '@spatialwalk/avatarkit'
// Set log level
logger.setLevel('verbose') // 'basic' | 'verbose'
// Manual log output
logger.log('Info message')
logger.warn('Warning message')
logger.error('Error message')
```
### Performance Monitoring
The SDK exposes an interface for monitoring rendering performance:
```typescript
// Get rendering performance statistics
const stats = avatarView.getPerformanceStats()
if (stats) {
  console.log(`Render time: ${stats.renderTime.toFixed(2)}ms`)
  console.log(`Sort time: ${stats.sortTime.toFixed(2)}ms`)
  console.log(`Rendering backend: ${stats.backend}`)

  // Estimate frame rate
  const fps = 1000 / stats.renderTime
  console.log(`Frame rate: ${fps.toFixed(2)} FPS`)
}

// Periodic performance monitoring
setInterval(() => {
  const stats = avatarView.getPerformanceStats()
  if (stats) {
    // Send to a monitoring service or display in the UI
    console.log('Performance:', stats)
  }
}, 1000)
```
**Performance Statistics Description:**
- `renderTime`: Total rendering time (milliseconds), includes sorting and GPU rendering
- `sortTime`: Sorting time (milliseconds); a radix sort is used to depth-sort the point cloud
- `backend`: Currently used rendering backend (`'webgpu'` | `'webgl'` | `null`)
## 🚨 Error Handling
### SPAvatarError
The SDK throws a custom error type that carries more detailed error information:
```typescript
import { SPAvatarError } from '@spatialwalk/avatarkit'

try {
  await avatarView.avatarController.start()
} catch (error) {
  if (error instanceof SPAvatarError) {
    console.error('SDK Error:', error.message, error.code)
  } else {
    console.error('Unknown error:', error)
  }
}
}
```
### Error Callbacks
```typescript
avatarView.avatarController.onError = (error: Error) => {
  console.error('AvatarController error:', error)
  // Handle the error, e.g. reconnect or notify the user
}
```
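One way to react to failures in network mode is a reconnect with backoff built from the documented `onConnectionState` and `start()` APIs; the retry policy below (three attempts, exponential delay) is an assumption, not SDK behavior:
```typescript
let retries = 0
avatarView.avatarController.onConnectionState = (state: ConnectionState) => {
  if (state === ConnectionState.connected) {
    retries = 0 // reset after a successful (re)connection
  } else if (state === ConnectionState.failed && retries < 3) {
    const delay = 1000 * 2 ** retries++ // 1s, 2s, 4s
    setTimeout(() => avatarView.avatarController.start(), delay)
  }
}
```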
## 🔄 Resource Management
### Lifecycle Management
#### Network Mode Lifecycle
```typescript
// Initialize
const container = document.getElementById('avatar-container')
const avatarView = new AvatarView(avatar, {
  container: container,
  playbackMode: AvatarPlaybackMode.network
})
await avatarView.avatarController.start()

// Use
avatarView.avatarController.send(audioData, false)

// Cleanup
avatarView.avatarController.close()
avatarView.dispose() // Automatically cleans up all resources
```
#### External Data Mode Lifecycle
```typescript
// Initialize
const container = document.getElementById('avatar-container')
const avatarView = new AvatarView(avatar, {
  container: container,
  playbackMode: AvatarPlaybackMode.external
})

// Use
const initialAudioChunks = [{ data: audioData1, isLast: false }]
await avatarView.avatarController.play(initialAudioChunks, initialKeyframes)
avatarView.avatarController.sendAudioChunk(audioChunk, false)
avatarView.avatarController.sendKeyframes(keyframes)

// Cleanup
avatarView.avatarController.clear() // Clear all data and resources
avatarView.dispose() // Automatically cleans up all resources
```
**⚠️ Important Notes:**
- The SDK currently supports only one AvatarView instance at a time
- When switching characters, you must first call `dispose()` on the old AvatarView, then create a new instance
- Failing to clean up properly may cause resource leaks and rendering errors
- In network mode, call `close()` before `dispose()` to properly close the WebSocket connection
- In external data mode, call `clear()` before `dispose()` to clear all playback data (see the teardown sketch below)
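A mode-aware teardown helper following the ordering rules above (the `teardown` wrapper itself is illustrative, not an SDK export):
```typescript
function teardown(view: AvatarView) {
  if (view.playbackMode === AvatarPlaybackMode.network) {
    view.avatarController.close() // network mode: close the WebSocket first
  } else {
    view.avatarController.clear() // external mode: clear pending playback data first
  }
  view.dispose() // then release rendering resources
}
```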
### Memory Optimization
- SDK automatically manages WASM memory allocation
- Supports dynamic loading/unloading of character and animation resources
- Provides memory usage monitoring interface
### Audio Data Sending
#### Network Mode
The `send()` method receives audio data in `ArrayBuffer` format:
**Audio Format Requirements:**
- **Sample Rate**: 16kHz (16000 Hz) - **Backend requirement, must be exactly 16kHz**
- **Format**: PCM16 (16-bit signed integer, little-endian)
- **Channels**: Mono (single channel)
- **Data Size**: Each sample is 2 bytes, so 1 second of audio = 16000 samples × 2 bytes = 32000 bytes
**Usage:**
- `audioData`: Audio data (ArrayBuffer format, must be 16kHz mono PCM16)
- `end=false` (default): normal sending; the server accumulates audio and, once enough has arrived, automatically returns animation data and starts synchronized playback of animation and audio
- `end=true`: return animation data immediately and stop accumulating; use this when ending the current conversation or when an immediate response is required
- **Important**: there is no need to wait for `end=true` to start playback; it begins automatically once enough audio has accumulated (a microphone-capture sketch follows this list)
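As noted above, here is a microphone-capture sketch for network mode. It uses the deprecated `ScriptProcessorNode` for brevity (production code would use an `AudioWorklet`), assumes the browser honors the 16kHz `sampleRate` hint (resample otherwise), and reuses the illustrative `floatTo16BitPCM` helper from the audio-format section:
```typescript
const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
const ctx = new AudioContext({ sampleRate: 16000 }) // hint only; verify ctx.sampleRate
const source = ctx.createMediaStreamSource(stream)
const processor = ctx.createScriptProcessor(4096, 1, 1) // 4096 frames, mono in/out
processor.onaudioprocess = (e) => {
  const floats = e.inputBuffer.getChannelData(0)
  avatarView.avatarController.send(floatTo16BitPCM(floats), false)
}
source.connect(processor)
processor.connect(ctx.destination) // ScriptProcessorNode must be connected to fire
```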
#### External Data Mode
The `play()` method starts playback with initial data, then use `sendAudioChunk()` to stream additional audio:
**Audio Format Requirements:**
- Same as network mode: 16kHz mono PCM16 format
- Audio data should be provided as `Uint8Array` in chunks with `isLast` flag
**Usage:**
```typescript
// Start playback with initial audio and animation data
// Note: audio and animation data come from your backend service
const initialAudioChunks = [
  { data: audioData1, isLast: false },
  { data: audioData2, isLast: false }
]
await avatarController.play(initialAudioChunks, initialKeyframes)

// Stream additional audio chunks
avatarController.sendAudioChunk(audioChunk, isLast)
```
**Resampling (Both Modes):**
- If your audio source is at a different sample rate (e.g., 24kHz, 48kHz), you **must** resample it to 16kHz before sending
- For high-quality resampling, use Web Audio API's `OfflineAudioContext` with anti-aliasing filtering
- See the example projects (`vanilla`, `react`, `vue`) for a complete resampling implementation
## 🌐 Browser Compatibility
- **Chrome/Edge** 90+ (WebGPU recommended)
- **Firefox** 90+ (WebGL)
- **Safari** 14+ (WebGL)
- **Mobile** iOS 14+, Android 8+
## 📝 License
MIT License
## 🤝 Contributing
Issues and Pull Requests are welcome!
## 📞 Support
For questions, please contact:
- Email: support@spavatar.com
- Documentation: https://docs.spavatar.com
- GitHub: https://github.com/spavatar/sdk