web-vad
Version:
Web Voice Activity Detection (VAD)
119 lines (87 loc) • 3.35 kB
Markdown
# Web Voice Activity Detection (VAD)
Adaption of ['s vad library](https://github.com/ricky0123/vad) that slightly shifts the API to only support passing a media stream, addresses some Typescript issues and reduces the codebase where possible. The primary purpose of this adaption is to support realtime voice agents, such as those provided by [Pipecat](https://www.pipecat.ai).
## Getting started
`npm install onnxruntime-web web-vad`
#### Copy Silero model somewhere accessible
Ensure `silero_vad.onnx` (included in this repo [here](https://github.com/jptaylor/web-vad/blob/main/silero_vad.onnx)) is hosted somewhere accessible (e.g. a public / static path.)
#### Ensure audio worker is available globally
Browsers ensure worklets cannot be imported as modules for safety reasons. Either import it with your framework specific syntax (e.g. `import AudioWorkletURL from "web-vad/dist/worklet.js?worker&url";`) or include it manually in a `<script>` declaration (at a higher order.)
#### Example project
An barebones example is included in this repo:
```shell
cd test-site
yarn
yarn run build # Copies onnx wasm to dist directory
yarn run dev
```
Navigate to the URL shown in your terminal
## Usage
```typescript
import { VAD } from "web-vad";
import AudioWorkletURL from "web-vad/dist/worklet.js?worker&url";
const localAudioTrack = ... // Get mic or other audio track
const stream = new MediaStream([localAudioTrack!]);
const vad = new VAD({
workletURL: AudioWorkletURL,
modelUrl: "path-to-silero.onnx",
stream,
onSpeechStart: () => {
console.log("speaking start");
},
onVADMisfire: () => {
console.log("misfire");
},
onSpeechEnd: () => {
console.log("speaking end");
},
});
// Initalize and load models
await vad.init();
// Start when ready
vad.start();
console.log(vad.state);
// > VADState.listening
```
## Next / Vite support
Web VAD uses WASM files provided by ONNX. Whilst these can be loaded at runtime, it is recommended to copy these files to your build / deployment. Here is an example `vite.config.js` that copies these files across at build time:
```js
// vite.config.js
export default defineConfig({
assetsInclude: ["**/*.onnx"],
server: {
headers: {
"Cross-Origin-Embedder-Policy": "require-corp",
"Cross-Origin-Opener-Policy": "same-origin",
},
},
resolve: {
alias: {
"@": path.resolve(__dirname, "./src"),
},
},
plugins: [
viteStaticCopy({
targets: [
{
src: "node_modules/onnxruntime-web/dist/*.wasm",
dest: "./",
},
],
}),
],
});
```
## Precaching models
Both the Silero.onnx and ONNX runtime wasms are quite large in size (~10mb). The VAD class exposes a static method for precaching these:
```typescript
import {VAD} from "web-vad";
async function run() {
console.log("Precaching models");
await VAD.precacheModels("/silero-vad.onnx");
console.log("Download complete!");
//...start()
}
```
## References
[1] Silero Team. (2021). Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier. GitHub, GitHub repository, https://github.com/snakers4/silero-vad, hello@silero.ai.
[2] Ricky Samore. Original code, https://github.com/ricky0123/vad, rickycontact9@gmail.com