
@solyarisoftware/voskjs


NodeJs developers API for Vosk-api speech-to-text engine.

# `voskjshttp` and other server examples

- [`voskjshttp.js` demo speech-to-text HTTP server](#voskjshttpjs-demo-speech-to-text-http-server)
- [`voskjshttp` as RHASSPY speech-to-text remote HTTP Server](#voskjshttp-as-rhasspy-speech-to-text-remote-http-server)
- [SocketIO server pseudocode](#socketio-server-pseudocode)

## `voskjshttp.js` demo speech-to-text HTTP server

[`voskjshttp.js`](voskjshttp.js) is a very simple HTTP API server
able to process concurrent/multi-user transcript requests, using a specific language model.

A dedicated thread is spawned for each transcript processing request,
so latency will be best if your host has multiple cores.

Currently the server supports a single endpoint path, available with two HTTP methods:

- HTTP GET /transcript
- HTTP POST /transcript

Server settings:

```bash
cd examples && node voskjshttp.js
```

or, if you installed this package globally:

```bash
voskjshttp
```

```
Simple demo HTTP JSON server, loading a Vosk engine model to transcript speeches.
package @solyarisoftware/voskjs version 1.1.3, Vosk-api version 0.3.30

The server has two endpoints:

  HTTP GET /transcript
  The request query string arguments contain parameters,
  including a WAV file name already accessible by the server.

  HTTP POST /transcript
  The request query string arguments contain parameters,
  the request body contains the WAV file name to be submitted to the server.

Usage:

    voskjshttp --model=<model directory path> \
              [--port=<server port number. Default: 3000>] \
              [--path=<server endpoint path. Default: /transcript>] \
              [--no-threads] [--debug[=<vosk log level>]]

Server settings examples:

    voskjshttp --model=../models/vosk-model-en-us-aspire-0.2 --port=8086 --debug=2
    # stdout includes the server internal debug logs and Vosk debug logs (log level 2)

    voskjshttp --model=../models/vosk-model-en-us-aspire-0.2 --port=8086 --debug
    # stdout includes the server internal debug logs without Vosk debug logs (log level -1)

    voskjshttp --model=../models/vosk-model-en-us-aspire-0.2 --port=8086
    # stdout includes minimal info, just request and response messages

    voskjshttp --model=../models/vosk-model-small-en-us-0.15
    # stdout includes minimal info, default port number is 3000

Client requests examples:

    1. GET /transcript - query string includes just the speech file argument

    curl -s \
         -X GET \
         -H "Accept: application/json" \
         -G \
         --data-urlencode speech="../audio/2830-3980-0043.wav" \
         http://localhost:3000/transcript

    2. GET /transcript - query string includes arguments: speech, model

    curl -s \
         -X GET \
         -H "Accept: application/json" \
         -G \
         --data-urlencode speech="../audio/2830-3980-0043.wav" \
         --data-urlencode model="vosk-model-en-us-aspire-0.2" \
         http://localhost:3000/transcript

    3. GET /transcript - query string includes arguments: id, speech, model

    curl -s \
         -X GET \
         -H "Accept: application/json" \
         -G \
         --data-urlencode id="1620060067830" \
         --data-urlencode speech="../audio/2830-3980-0043.wav" \
         --data-urlencode model="vosk-model-en-us-aspire-0.2" \
         http://localhost:3000/transcript

    4. GET /transcript - includes arguments: id, speech, model, grammar

    curl -s \
         -X GET \
         -H "Accept: application/json" \
         -G \
         --data-urlencode id="1620060067830" \
         --data-urlencode speech="../audio/2830-3980-0043.wav" \
         --data-urlencode model="vosk-model-en-us-aspire-0.2" \
         --data-urlencode grammar="["experience proves this"]" \
         http://localhost:3000/transcript

    5. POST /transcript - body includes the speech file

    curl -s \
         -X POST \
         -H "Accept: application/json" \
         -H "Content-Type: audio/wav" \
         --data-binary="@../audio/2830-3980-0043.wav" \
         "http://localhost:3000/transcript?id=1620060067830&model=vosk-model-en-us-aspire-0.2"
```

Server run example:

```bash
voskjshttp --model=../models/vosk-model-small-en-us-0.15
```

Client call example:

```bash
curl \
  -s \
  -H "Accept: application/json" \
  -G \
  --data-urlencode id="283039800043" \
  --data-urlencode speech="../audio/2830-3980-0043.wav" \
  --data-urlencode model="vosk-model-small-en-us-0.15" \
  http://localhost:3000/transcript \
  | python3 -m json.tool
```

The JSON returned by the transcript endpoint:

```
{
    "id": "283039800043",
    "latency": 575,
    "vosk": {
        "result": [
            {
                "conf": 1,
                "end": 1.02,
                "start": 0.36,
                "word": "experience"
            },
            {
                "conf": 1,
                "end": 1.35,
                "start": 1.02,
                "word": "proves"
            },
            {
                "conf": 1,
                "end": 1.74,
                "start": 1.35,
                "word": "this"
            }
        ],
        "text": "experience proves this"
    }
}
```

Server side stdout:

```
1621335095393 Model path: ../models/vosk-model-small-en-us-0.15
1621335095395 Model name: vosk-model-small-en-us-0.15
1621335095395 HTTP server port: 3000
1621335095395 internal debug log: false
1621335095395 Vosk log level: -1
1621335095395 wait loading Vosk model: vosk-model-small-en-us-0.15 (be patient)
1621335095710 Vosk model loaded in 314 msecs
1621335095712 server voskjshttp.js running at http://localhost:3000
1621335095712 endpoint http://localhost:3000/transcript
1621335095712 press Ctrl-C to shutdown
1621335095713 ready to listen incoming requests
1621335101648 request 283039800043 ../audio/2830-3980-0043.wav vosk-model-small-en-us-0.15 undefined
1621335102223 response 283039800043 {"id":"283039800043","latency":574,"result":[{"conf":1,"end":1.02,"start":0.36,"word":"experience"},{"conf":1,"end":1.35,"start":1.02,"word":"proves"},{"conf":1,"end":1.74,"start":1.35,"word":"this"}],"text":"experience proves this"}
^[^C1621335336951 SIGINT received
1621335337010 Shutdown done
```

### Client request query string arguments

- `speech`

  This query string argument is mandatory.
  It specifies the speech WAV file that the server has to transcribe.

- `model`

  This argument is optional.
  If specified, the server verifies that it matches the name of the model loaded server-side.
  If not specified, the server performs no check and just uses the loaded model.
  In this case the client call is just:

  ```bash
  curl \
    -H "Accept: application/json" \
    -G \
    --data-urlencode speech="../audio/2830-3980-0043.wav" \
    http://localhost:3000/transcript
  ```

The corresponding HTTP server log is:

```bash
node voskjshttp --model=../models/vosk-model-small-en-us-0.15
```

```
1620312429756 Model path: ../models/vosk-model-small-en-us-0.15
1620312429758 Model name: vosk-model-small-en-us-0.15
1620312429758 HTTP server port: 3000
1620312429758 internal debug log: false
1620312429758 Vosk log level: -1
1620312429758 wait loading Vosk model: vosk-model-small-en-us-0.15 (be patient)
1620312430058 Vosk model loaded in 300 msecs
1620312430060 server voskjshttp.js running at http://localhost:3000
1620312430060 endpoint http://localhost:3000/transcript
1620312430060 press Ctrl-C to shutdown
1620312430060 ready to listen incoming requests
1620312435318 request {"id":1620312435283,"speech":"../audio/2830-3980-0043.wav","model":"vosk-model-small-en-us-0.15","grammar":["experience proves this","why should one hold on the way","your power is sufficient i said"]}
1620312435941 response 1620312435283 {"request":{"id":1620312435283,"speech":"../audio/2830-3980-0043.wav","model":"vosk-model-small-en-us-0.15","grammar":["experience proves this","why should one hold on the way","your power is sufficient i said"]},"id":1620312435283,"latency":623,"result":[{"conf":1,"end":1.02,"start":0.36,"word":"experience"},{"conf":1,"end":1.35,"start":1.02,"word":"proves"},{"conf":1,"end":1.74,"start":1.35,"word":"this"}],"text":"experience proves this"}
```

### Server JSON response

The HTTP response returns a JSON data structure containing:

- `speech` the name of the speech file in the request
- `model` the name of the model (the language) in the request
- `id` a "UUID" (the Unix epoch timestamp) that identifies the incoming request and can be used for debugging
- `latency` the elapsed time, in milliseconds, required to process the request
- `result` the data structure returned by the Vosk transcript function

### HTTP Server Tests

The [`tests/`](../tests/) directory contains some utility bash scripts (`client*.sh`)
to test the server endpoint with the GET and POST methods.

## `voskjshttp` as RHASSPY speech-to-text remote HTTP Server

[RHASSPY](https://rhasspy.readthedocs.io/en/latest/) is an open source,
fully offline set of voice assistant services.

RHASSPY can optionally use a
[Remote HTTP Server](https://rhasspy.readthedocs.io/en/latest/speech-to-text/#remote-http-server)
to transform speech (WAV) to text.
This is typically used in a client/server setup,
where Rhasspy does speech/intent recognition on a home server with decent CPU/RAM available.

You can run `voskjshttp` as a RHASSPY speech-to-text remote HTTP server,
following these specifications:

- https://rhasspy.readthedocs.io/en/latest/speech-to-text/#remote-http-server
- https://rhasspy.readthedocs.io/en/latest/usage/#http-api
- https://rhasspy.readthedocs.io/en/latest/reference/#http-api

1. Install the server

   Install on your home server, as described [here](../README.md#-install):

   - Vosk
   - `npm install -g @solyarisoftware/voskjs`
   - A Vosk language model of your choice

2. Run the server

   Warning: currently, because of a bug in the Node-C++ interface of the Vosk-API lib,
   multithreading causes a crash: https://github.com/solyarisoftware/voskJs/issues/3

   Two alternative temporary workarounds are proposed:

   - Vosk multithreading **enabled**

     Use a Node version earlier than v14.
     See: https://github.com/alphacep/vosk-api/issues/516#issuecomment-833462121

     ```
     voskjshttp \
       --model=models/vosk-model-small-en-us-0.15 \
       --path=/api/speech-to-text \
       --port=12101
     ```

   - Vosk multithreading **disabled**

     Use any Node version later than v13, but disable multithreading in `voskjshttp`
     with the command line flag `--no-threads`.

     This option may look pointless, because the server then serves just one request at a time
     (each request saturates a CPU core for hundreds of milliseconds and also blocks the Node main thread).
     Nevertheless, the lack of multithreading can be acceptable
     when serving a few satellites (clients) in a small (home) environment.

     ```
     voskjshttp \
       --model=models/vosk-model-small-en-us-0.15 \
       --path=/api/speech-to-text \
       --port=12101 \
       --no-threads
     ```

3. Curl client tests

   Two bash scripts are available in the `tests/` directory:

   - [`clientRHASSPYtext.sh`](../tests/clientRHASSPYtext.sh) gets a text/plain response from the server

     ```
     clientRHASSPYtext.sh
     ```
     ```
     experience proves this
     ```

   - [`clientRHASSPYjson.sh`](../tests/clientRHASSPYjson.sh) gets an application/json response from the server

     ```
     clientRHASSPYjson.sh
     ```
     ```
     {
       "id": 1622012841793,
       "latency": 570,
       "vosk": {
         "result": [
           {
             "conf": 1,
             "end": 1.02,
             "start": 0.36,
             "word": "experience"
           },
           {
             "conf": 1,
             "end": 1.35,
             "start": 1.02,
             "word": "proves"
           },
           {
             "conf": 1,
             "end": 1.74,
             "start": 1.35,
             "word": "this"
           }
         ],
         "text": "experience proves this"
       }
     }
     ```

## SocketIO server pseudocode

An HTTP server is not the only way to go!
Consider, for example, a client-server architecture using [socketio](https://socket.io/),
a websocket-based library for real-time, bidirectional, event-based communication.

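On the client side of such an architecture, sending a recorded audio buffer could look roughly like the sketch here. It is only an illustration under stated assumptions: the `audioMessage` event name is the one used by the server pseudocode in this section, while the `transcript` reply event, the `sendAudioMessage` function name and the timeout value are hypothetical and not part of voskjs.

```javascript
// Client-side sketch (not part of voskjs).
// `socket` is assumed to be a connected socket.io client socket,
// or any object exposing emit/once/off with the same semantics.
// ASSUMPTION: the server replies with a 'transcript' event; the server
// pseudocode in this document does not define any reply event.

/**
 * Emit an audio buffer as an 'audioMessage' event and resolve with
 * the payload of the first 'transcript' event received afterwards.
 * Rejects if no reply arrives within timeoutMs milliseconds.
 */
function sendAudioMessage(socket, audioBuffer, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    // give up (and remove the pending listener) if the server stays silent
    const timer = setTimeout(() => {
      socket.off('transcript', onTranscript)
      reject(new Error(`no transcript received within ${timeoutMs} ms`))
    }, timeoutMs)

    function onTranscript(result) {
      clearTimeout(timer)
      resolve(result)
    }

    // register the reply listener before emitting, so a fast reply is not lost
    socket.once('transcript', onTranscript)
    socket.emit('audioMessage', audioBuffer)
  })
}
```

With the `socket.io-client` package, `socket` would typically be created with `io(SERVER_URL)`, and the function called as `const result = await sendAudioMessage(socket, buffer)`.
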
Below is simplified server-side pseudocode that shows how to use the voskJs transcript function:

```javascript
const fs = require('fs')
const https = require('https')
const { transcriptFromBuffer, toPCM } = require('voskjs')
const app = require('express')()

// KEY_FILENAME, CERT_FILENAME, filenameUUID(), msgToAudioFile() and `model`
// (a Vosk model loaded at startup) are placeholders supplied by the application

// get SSL certificate
const credentials = {
  key: fs.readFileSync(KEY_FILENAME, 'utf8'),
  cert: fs.readFileSync(CERT_FILENAME, 'utf8')
}

// create the https server
const server = https.createServer(credentials, app)

// create the socketio channel
const io = require('socket.io')(server)

// a websocket client connected
io.on('connection', (socket) => {

  // the client sent an audio buffer
  // (the handler is async because it awaits file and transcript operations)
  socket.on('audioMessage', async (msg) => {

    // save audio buffer into a local file, giving a unique name
    const audioFileCompressed = filenameUUID()
    await msgToAudioFile(audioFileCompressed, msg)

    // convert the received audio into a PCM buffer
    const buffer = toPCM(audioFileCompressed)

    // voskjs speech to text
    const voskResult = await transcriptFromBuffer(buffer, model)
  })
})
```

---

[top](#) | [back](README.md) | [home](../README.md)
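
Returning to the HTTP server: the `GET /transcript` requests shown with curl can also be issued from Node. The sketch here is a minimal illustration, not part of the package. The query argument names (`speech`, `model`, `id`, `grammar`) are those of the curl examples; the function names, the default base URL and the JSON encoding of `grammar` are assumptions, and `requestTranscript` assumes Node 18 or later for the global `fetch`.

```javascript
// Minimal Node client sketch for the voskjshttp GET /transcript endpoint.
// ASSUMPTIONS: function names and default URL are illustrative only;
// grammar is assumed to travel as a JSON-encoded array of phrases.

/** Build the request URL from the documented query string arguments. */
function buildTranscriptUrl(
  { speech, model, id, grammar },
  baseUrl = 'http://localhost:3000/transcript'
) {
  // speech is the only mandatory argument
  if (!speech) throw new Error('the speech argument is mandatory')

  const url = new URL(baseUrl)
  url.searchParams.set('speech', speech)
  if (model !== undefined) url.searchParams.set('model', model)
  if (id !== undefined) url.searchParams.set('id', String(id))
  if (grammar !== undefined) url.searchParams.set('grammar', JSON.stringify(grammar))
  return url.toString()
}

/** Call the server and return the parsed JSON response. */
async function requestTranscript(args, baseUrl) {
  const response = await fetch(buildTranscriptUrl(args, baseUrl), {
    headers: { Accept: 'application/json' }
  })
  if (!response.ok) throw new Error(`HTTP ${response.status}`)
  return response.json()
}
```

For example, `await requestTranscript({ speech: '../audio/2830-3980-0043.wav' })` corresponds to the first curl example, assuming the server runs on the default port.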