# @lichess-org/vosk-browser

Fork of [ccoreilly's vosk-browser](https://github.com/ccoreilly/vosk-browser) v0.0.8.
A somewhat opinionated speech recognition library for the browser using a WebAssembly build of [Vosk](https://github.com/alphacep/vosk-api).
Check out the [demo](https://ccoreilly.github.io/vosk-browser/) running in-browser speech recognition of microphone input or audio files in 13 languages.
## Installation

You can install vosk-browser as a module:
```
$ npm i vosk-browser
```
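When installed as a module, you can import the `createModel` helper directly instead of relying on a global. A minimal sketch, assuming your bundler serves the package's worker and WebAssembly assets and that the import name matches the install command above:

```typescript
// Assumes the module build exports the same createModel helper as the browser bundle
import { createModel } from 'vosk-browser';

async function start() {
    // Resolves once the model archive has been downloaded and loaded in the worker
    const model = await createModel('model.tar.gz');

    const recognizer = new model.KaldiRecognizer();
    recognizer.on('result', (message) => {
        console.log(`Result: ${message.result.text}`);
    });
}
```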
You can also use a CDN like jsdelivr to add the library to your page, which will be accessible via the global variable `Vosk`:
```
<script type="application/javascript" src="https://cdn.jsdelivr.net/npm/vosk-browser@0.0.3/dist/vosk.js"></script>
```
## Basic example

The following is one of the simplest examples; it assumes `vosk-browser` is loaded via a `script` tag. It loads the model named `model.tar.gz` located in the same path as the script and starts listening to the microphone. Recognition results are logged to the console.
```typescript
async function init() {
    const model = await Vosk.createModel('model.tar.gz');

    const recognizer = new model.KaldiRecognizer();
    recognizer.on("result", (message) => {
        console.log(`Result: ${message.result.text}`);
    });
    recognizer.on("partialresult", (message) => {
        console.log(`Partial result: ${message.result.partial}`);
    });

    const mediaStream = await navigator.mediaDevices.getUserMedia({
        video: false,
        audio: {
            echoCancellation: true,
            noiseSuppression: true,
            channelCount: 1,
            sampleRate: 16000
        },
    });

    const audioContext = new AudioContext();
    const recognizerNode = audioContext.createScriptProcessor(4096, 1, 1);
    recognizerNode.onaudioprocess = (event) => {
        try {
            // Forward each captured audio chunk to the recognizer in the Web Worker
            recognizer.acceptWaveform(event.inputBuffer);
        } catch (error) {
            console.error('acceptWaveform failed', error);
        }
    };
    const source = audioContext.createMediaStreamSource(mediaStream);
    source.connect(recognizerNode);
}

window.onload = init;
```
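The demo also recognizes audio files. One way to do that with this API is to decode the file into an `AudioBuffer` and pass it to `acceptWaveform` (documented in the reference below). A sketch, assuming a model that expects 16 kHz audio, and keeping in mind that results are only emitted as the recognizer detects utterance boundaries:

```typescript
async function transcribeFile(url: string) {
    const model = await Vosk.createModel('model.tar.gz');
    const recognizer = new model.KaldiRecognizer();
    recognizer.on('result', (message) => {
        console.log(`Result: ${message.result.text}`);
    });

    // Decode the file; a 16 kHz context matches the rate most Vosk models expect (assumption)
    const audioContext = new AudioContext({ sampleRate: 16000 });
    const response = await fetch(url);
    const audioBuffer = await audioContext.decodeAudioData(await response.arrayBuffer());

    // acceptWaveform takes an AudioBuffer (see the reference below)
    recognizer.acceptWaveform(audioBuffer);
}
```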
## Model loading

The library loads models as gzipped tar archives of a model folder, following the model structure expected by Vosk:
- model/am/final.mdl - acoustic model
- model/conf/mfcc.conf - MFCC config file. Make sure you take the mfcc_hires.conf version if you are using a hires model (most external ones)
- model/conf/model.conf - provides default decoding beams and silence phones. You have to create this file yourself; it is not present in the Kaldi model
- model/ivector/final.dubm - take ivector files from the ivector extractor (optional folder, present if the model is trained with ivectors)
- model/ivector/final.ie
- model/ivector/final.mat
- model/ivector/splice.conf
- model/ivector/global_cmvn.stats
- model/ivector/online_cmvn.conf
- model/graph/phones/word_boundary.int - from the graph
- model/graph/HCLG.fst - the decoding graph, if you are not using lookahead
- model/graph/HCLr.fst - use Gr.fst and HCLr.fst instead of one big HCLG.fst if you want to run rescoring
- model/graph/Gr.fst
- model/graph/phones.txt - from the graph
- model/graph/words.txt - from the graph
- model/rescore/G.carpa - carpa rescoring is optional but helpful for big models. Usually located inside data/lang_test_rescore
- model/rescore/G.fst - also optional, only needed if you want to use rescoring
# Reference
## Model
```typescript
new Model(modelUrl: string): Model
```
The Model constructor. Creates a new Model that will spawn a new Web Worker in the background. `modelUrl` can be a relative or absolute URL but be aware that CORS also applies to Workers. As Model initialization can take a while depending on the model size, you should either wait for the `load` event or use the `createModel` function.
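For example, both of the following wait until the model is usable (assuming the `Model` constructor is exposed on the global `Vosk` object alongside `createModel`):

```typescript
async function loadModel() {
    // Option 1: helper that resolves once the model has loaded in its worker
    const model = await Vosk.createModel('model.tar.gz');

    // Option 2: construct directly and wait for the 'load' event (see Events below)
    // Assumption: the global bundle also exposes the Model constructor
    const manual = new Vosk.Model('model.tar.gz');
    manual.on('load', (message) => {
        // message.result is true when loading succeeded; manual.ready is then set
    });

    return model;
}
```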
### Model#ready
```typescript
ready: boolean;
```
A property indicating whether the model is loaded and ready.
### Model#KaldiRecognizer

```typescript
new model.KaldiRecognizer(): KaldiRecognizer
```
Creates a new KaldiRecognizer. Multiple recognizers can be created from the same model.
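For example (a sketch reusing the model from the basic example above):

```typescript
async function createRecognizers() {
    const model = await Vosk.createModel('model.tar.gz');

    // Two independent recognizers backed by the same model and worker
    const commands = new model.KaldiRecognizer();
    const dictation = new model.KaldiRecognizer();

    commands.on('result', (message) => console.log('command:', message.result.text));
    dictation.on('result', (message) => console.log('dictation:', message.result.text));
}
```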
### Model#setLogLevel

```typescript
setLogLevel(level: number): void
```
Sets the log level:
- -2: Error
- -1: Warning
- 0: Info
- 1: Verbose
- 2: More verbose
- 3: Debug
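For example, to keep only warnings and errors (a sketch, assuming a `model` created as in the basic example; the method's placement in this section suggests it is called on the model instance):

```typescript
// Assumption: setLogLevel is called on the Model instance
model.setLogLevel(-1);
```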
### Model#terminate
```typescript
terminate(): void
```
Terminates the Web Worker and frees all allocated memory. Make sure to call this method when you are done using a model in order to avoid memory leaks. It also removes all existing KaldiRecognizer instances.
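For example, releasing everything when the page is closed (a sketch, assuming a `model` created as in the basic example):

```typescript
window.addEventListener('beforeunload', () => {
    // Frees the worker and every recognizer created from this model
    model.terminate();
});
```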
### Events
A Model will also emit some useful events:
```typescript
.on('load', function(message: {
    event: "load";
    result: boolean;
}) {
    // Whether the model load was successful or not
})

.on('error', function(message: {
    event: "error";
    error: string;
}) {
    // An error occurred
})
```
## KaldiRecognizer

### KaldiRecognizer#id

```typescript
id: string;
```
A unique ID for this KaldiRecognizer instance, passed in every event as `recognizerId`.
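This makes it possible to share one handler between several recognizers and tell their results apart; a sketch:

```typescript
// Label recognizers by id so a shared handler can tell their results apart
const labels = new Map();

function register(recognizer, label) {
    labels.set(recognizer.id, label);
    recognizer.on('result', (message) => {
        // recognizerId identifies which recognizer produced this result
        console.log(`[${labels.get(message.recognizerId)}] ${message.result.text}`);
    });
}
```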
### KaldiRecognizer#acceptWaveform

```typescript
acceptWaveform(buffer: AudioBuffer): void;
```
Preprocesses AudioBuffers and transfers them to the Web Worker for inference.
### KaldiRecognizer#setWords

```typescript
setWords(words: boolean): void;
```
If set to `true`, word timestamps are returned in the final recognition message.
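For example (word timings then appear in the `result.result` array of the final result, as shown under Events below):

```typescript
recognizer.setWords(true);
recognizer.on('result', (message) => {
    for (const word of message.result.result) {
        // start/end are word timestamps in seconds, conf is the confidence score
        console.log(`${word.word}: ${word.start}s - ${word.end}s (confidence ${word.conf})`);
    }
});
```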
### KaldiRecognizer#remove

```typescript
remove(): void
```
Removes the recognizer from the model and frees all allocated memory. Make sure to call this method when you are done using a recognizer in order to avoid memory leaks.
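For example, to free one recognizer while keeping the model loaded for others (a sketch reusing the names from the basic example above):

```typescript
// Stop feeding audio to the recognizer, then release it in the worker
source.disconnect(recognizerNode);
recognizer.remove();
```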
### Events
To obtain recognition results, listen to the KaldiRecognizer's events:
```js
.on('result', function(message: {
    event: "result";
    recognizerId: string;
    result: {
        result: Array<{
            conf: number;
            start: number;
            end: number;
            word: string;
        }>;
        text: string;
    };
}) {
    // A final recognition result
})

.on('partialresult', function(message: {
    event: "partialresult";
    recognizerId: string;
    result: {
        partial: string;
    };
}) {
    // A partial recognition result
})
```