# Web Voice Activity Detection (VAD)

Adaptation of [@ricky0123's vad library](https://github.com/ricky0123/vad) that slightly shifts the API to only support passing a media stream, addresses some TypeScript issues and trims the codebase where possible.

The primary purpose of this adaptation is to support realtime voice agents, such as those provided by [Pipecat](https://www.pipecat.ai).

## Getting started

`npm install onnxruntime-web web-vad`

#### Copy Silero model somewhere accessible

Ensure `silero_vad.onnx` (included in this repo [here](https://github.com/jptaylor/web-vad/blob/main/silero_vad.onnx)) is hosted somewhere accessible (e.g. a public / static path).

#### Ensure audio worker is available globally

For security reasons, browsers do not allow worklets to be imported as regular modules. Either import the worklet URL with your framework-specific syntax (e.g. `import AudioWorkletURL from "web-vad/dist/worklet.js?worker&url";`) or include it manually in a `<script>` declaration higher up in the page.

#### Example project

A barebones example is included in this repo:

```shell
cd test-site
yarn
yarn run build # Copies onnx wasm to dist directory
yarn run dev
```

Navigate to the URL shown in your terminal.

## Usage

```typescript
import { VAD } from "web-vad";
import AudioWorkletURL from "web-vad/dist/worklet.js?worker&url";

// Get a mic track (any other audio MediaStreamTrack works too)
const micStream = await navigator.mediaDevices.getUserMedia({ audio: true });
const [localAudioTrack] = micStream.getAudioTracks();
const stream = new MediaStream([localAudioTrack!]);

const vad = new VAD({
  workletURL: AudioWorkletURL,
  modelUrl: "path-to-silero.onnx",
  stream,
  onSpeechStart: () => {
    console.log("speaking start");
  },
  onVADMisfire: () => {
    console.log("misfire");
  },
  onSpeechEnd: () => {
    console.log("speaking end");
  },
});

// Initialize and load models
await vad.init();

// Start when ready
vad.start();

console.log(vad.state);
// > VADState.listening
```

## Next / Vite support

Web VAD uses WASM files provided by ONNX Runtime (`onnxruntime-web`). Whilst these can be loaded at runtime, it is recommended to copy these files to your build / deployment.
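
When the WASM files are served from your own deployment, the runtime has to look for them at the path they were copied to. `onnxruntime-web` exposes `env.wasm.wasmPaths` for this, and a string value is used as a prefix for the file names, so the trailing slash matters. Whether web-vad already sets this for you is not stated here, so treat the following as a sketch under that assumption. `wasmBasePath` is a hypothetical helper, not part of web-vad.

```typescript
// Hypothetical helper (not part of web-vad): turn a directory such as
// "/assets" or "/" into a URL prefix that ends with exactly one slash.
function wasmBasePath(dir: string): string {
  const trimmed = dir.trim().replace(/\/+$/, "");
  return `${trimmed}/`;
}

// Assumed usage, before calling vad.init():
//   import * as ort from "onnxruntime-web";
//   ort.env.wasm.wasmPaths = wasmBasePath("/"); // files copied to the site root
```

If the files end up in a subdirectory of your build output instead, pass that directory to the helper.
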
Here is an example `vite.config.js` that copies these files across at build time:

```js
// vite.config.js
import path from "path";
import { defineConfig } from "vite";
import { viteStaticCopy } from "vite-plugin-static-copy";

export default defineConfig({
  assetsInclude: ["**/*.onnx"],
  server: {
    // Cross-origin isolation headers for the dev server
    headers: {
      "Cross-Origin-Embedder-Policy": "require-corp",
      "Cross-Origin-Opener-Policy": "same-origin",
    },
  },
  resolve: {
    alias: {
      "@": path.resolve(__dirname, "./src"),
    },
  },
  plugins: [
    // Copy the ONNX Runtime WASM files into the build output
    viteStaticCopy({
      targets: [
        {
          src: "node_modules/onnxruntime-web/dist/*.wasm",
          dest: "./",
        },
      ],
    }),
  ],
});
```

## Precaching models

Both the Silero model and the ONNX Runtime WASM files are quite large (~10 MB). The `VAD` class exposes a static method for precaching them:

```typescript
import { VAD } from "web-vad";

async function run() {
  console.log("Precaching models");
  await VAD.precacheModels("/silero-vad.onnx");
  console.log("Download complete!");
  // ...start()
}
```

## References

[1] Silero Team. (2021). Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier. GitHub, GitHub repository, https://github.com/snakers4/silero-vad, hello@silero.ai.

[2] Ricky Samore. Original code, https://github.com/ricky0123/vad, rickycontact9@gmail.com