@solyarisoftware/voskjs
NodeJs developers API for Vosk-api speech-to-text engine.
# `voskjshttp` and other server examples
- [`voskjshttp.js` demo speech-to-text HTTP server](#voskjshttpjs-demo-speech-to-text-http-server)
- [`voskjshttp` as RHASSPY speech-to-text remote HTTP Server](#voskjshttp-as-rhasspy-speech-to-text-remote-http-server)
- [SocketIO server pseudocode](#socketio-server-pseudocode)
## `voskjshttp.js` demo speech-to-text HTTP server
[`voskjshttp.js`](voskjshttp.js) is a very simple HTTP API server
that processes concurrent, multi-user transcript requests using a specific language model.
A dedicated thread is spawned for each transcript request,
so latency is best when the host has multiple cores.
Currently the server supports a single endpoint, with two methods:
- HTTP GET /transcript
- HTTP POST /transcript
Server settings:
```bash
cd examples && node voskjshttp.js
```
or, if you installed this package globally:
```bash
voskjshttp
```
```
Simple demo HTTP JSON server, loading a Vosk engine model to transcript speeches.
package @solyarisoftware/voskjs version 1.1.3, Vosk-api version 0.3.30
The server has two endpoints:
HTTP GET /transcript
The request query string arguments contain parameters,
including a WAV file name already accessible by the server.
HTTP POST /transcript
The request query string arguments contain parameters,
the request body contains the WAV file name to be submitted to the server.
Usage:
voskjshttp --model=<model directory path> \
[--port=<server port number. Default: 3000>] \
[--path=<server endpoint path. Default: /transcript>] \
[--no-threads]
[--debug[=<vosk log level>]]
Server settings examples:
voskjshttp --model=../models/vosk-model-en-us-aspire-0.2 --port=8086 --debug=2
# stdout includes the server internal debug logs and Vosk debug logs (log level 2)
voskjshttp --model=../models/vosk-model-en-us-aspire-0.2 --port=8086 --debug
# stdout includes the server internal debug logs without Vosk debug logs (log level -1)
voskjshttp --model=../models/vosk-model-en-us-aspire-0.2 --port=8086
# stdout includes minimal info, just request and response messages
voskjshttp --model=../models/vosk-model-small-en-us-0.15
# stdout includes minimal info, default port number is 3000
Client requests examples:
1. GET /transcript - query string includes just the speech file argument
curl -s \
-X GET \
-H "Accept: application/json" \
-G \
--data-urlencode speech="../audio/2830-3980-0043.wav" \
http://localhost:3000/transcript
2. GET /transcript - query string includes arguments: speech, model
curl -s \
-X GET \
-H "Accept: application/json" \
-G \
--data-urlencode speech="../audio/2830-3980-0043.wav" \
--data-urlencode model="vosk-model-en-us-aspire-0.2" \
http://localhost:3000/transcript
3. GET /transcript - query string includes arguments: id, speech, model
curl -s \
-X GET \
-H "Accept: application/json" \
-G \
--data-urlencode id="1620060067830" \
--data-urlencode speech="../audio/2830-3980-0043.wav" \
--data-urlencode model="vosk-model-en-us-aspire-0.2" \
http://localhost:3000/transcript
4. GET /transcript - includes arguments: id, speech, model, grammar
curl -s \
-X GET \
-H "Accept: application/json" \
-G \
--data-urlencode id="1620060067830" \
--data-urlencode speech="../audio/2830-3980-0043.wav" \
--data-urlencode model="vosk-model-en-us-aspire-0.2" \
--data-urlencode grammar='["experience proves this"]' \
http://localhost:3000/transcript
5. POST /transcript - body includes the speech file
curl -s \
-X POST \
-H "Accept: application/json" \
-H "Content-Type: audio/wav" \
--data-binary "@../audio/2830-3980-0043.wav" \
"http://localhost:3000/transcript?id=1620060067830&model=vosk-model-en-us-aspire-0.2"
```
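The POST request in example 5 can also be issued from Node.js. The following is a minimal sketch, not part of the package: the helper names `buildPostRequest` and `postTranscript` are hypothetical, the endpoint path is assumed to be the default `/transcript`, and the global `fetch()` assumes Node 18 or later.

```javascript
const fs = require('fs')

// Build the URL and the fetch() options for POST /transcript.
// params (e.g. id, model) go into the query string, the WAV data into the body.
function buildPostRequest(baseUrl, params, wavBuffer) {
  const url = new URL('/transcript', baseUrl)
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, String(value))
  }
  return {
    url: url.toString(),
    options: {
      method: 'POST',
      headers: { 'Accept': 'application/json', 'Content-Type': 'audio/wav' },
      body: wavBuffer
    }
  }
}

// Send a WAV file to the server and resolve with the parsed JSON response.
async function postTranscript(baseUrl, wavFile, params = {}) {
  const { url, options } = buildPostRequest(baseUrl, params, fs.readFileSync(wavFile))
  const response = await fetch(url, options)
  if (!response.ok) throw new Error(`HTTP ${response.status}`)
  return response.json()
}
```

A call such as `postTranscript('http://localhost:3000', '../audio/2830-3980-0043.wav', { id: Date.now() })` would then resolve with the server JSON response, provided the server is running.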
Server run example:
```bash
voskjshttp --model=../models/vosk-model-small-en-us-0.15
```
Client call example:
```bash
curl \
-s \
-H "Accept: application/json" \
-G \
--data-urlencode id="283039800043" \
--data-urlencode speech="../audio/2830-3980-0043.wav" \
--data-urlencode model="vosk-model-small-en-us-0.15" \
http://localhost:3000/transcript \
| python3 -m json.tool
```
The JSON returned by the transcript endpoint:
```
{
"id": "283039800043",
"latency": 575,
"vosk": {
"result": [
{
"conf": 1,
"end": 1.02,
"start": 0.36,
"word": "experience"
},
{
"conf": 1,
"end": 1.35,
"start": 1.02,
"word": "proves"
},
{
"conf": 1,
"end": 1.74,
"start": 1.35,
"word": "this"
}
],
"text": "experience proves this"
}
}
```
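A client usually needs only a few values from this structure. The sketch below shows one way to reduce the response to a short summary; `summarizeTranscript` is a hypothetical helper and assumes the `id` / `latency` / `vosk` layout of the sample above.

```javascript
// Reduce a /transcript JSON response to the values a client typically needs.
function summarizeTranscript(response) {
  const words = (response.vosk && response.vosk.result) || []
  return {
    id: response.id,
    text: response.vosk ? response.vosk.text : '',
    wordCount: words.length,
    // seconds between the start of the first word and the end of the last one
    speechSeconds: words.length ? words[words.length - 1].end - words[0].start : 0,
    // lowest per-word confidence, or null when no words were recognized
    minConfidence: words.length ? Math.min(...words.map((w) => w.conf)) : null
  }
}

// the sample response shown above
const sample = {
  id: '283039800043',
  latency: 575,
  vosk: {
    result: [
      { conf: 1, end: 1.02, start: 0.36, word: 'experience' },
      { conf: 1, end: 1.35, start: 1.02, word: 'proves' },
      { conf: 1, end: 1.74, start: 1.35, word: 'this' }
    ],
    text: 'experience proves this'
  }
}

console.log(summarizeTranscript(sample))
```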
Server side stdout:
```
1621335095393 Model path: ../models/vosk-model-small-en-us-0.15
1621335095395 Model name: vosk-model-small-en-us-0.15
1621335095395 HTTP server port: 3000
1621335095395 internal debug log: false
1621335095395 Vosk log level: -1
1621335095395 wait loading Vosk model: vosk-model-small-en-us-0.15 (be patient)
1621335095710 Vosk model loaded in 314 msecs
1621335095712 server voskjshttp.js running at http://localhost:3000
1621335095712 endpoint http://localhost:3000/transcript
1621335095712 press Ctrl-C to shutdown
1621335095713 ready to listen incoming requests
1621335101648 request 283039800043 ../audio/2830-3980-0043.wav vosk-model-small-en-us-0.15 undefined
1621335102223 response 283039800043 {"id":"283039800043","latency":574,"result":[{"conf":1,"end":1.02,"start":0.36,"word":"experience"},{"conf":1,"end":1.35,"start":1.02,"word":"proves"},{"conf":1,"end":1.74,"start":1.35,"word":"this"}],"text":"experience proves this"}
^[^C1621335336951 SIGINT received
1621335337010 Shutdown done
```
### Client request query string arguments
- `speech`
The request query string argument is mandatory.
It specifies the speech WAV file that the server transcribes.
- `model`
The argument is optional.
If specified, the server verifies that it matches the name of the model loaded server-side.
If the argument is not specified, the server performs no check and just uses the loaded model.
In this case the client call is just:
```bash
curl \
-H "Accept: application/json" \
-G \
--data-urlencode speech="../audio/2830-3980-0043.wav" \
http://localhost:3000/transcript
```
The corresponding HTTP server command and log:
```bash
node voskjshttp --model=../models/vosk-model-small-en-us-0.15
```
```
1620312429756 Model path: ../models/vosk-model-small-en-us-0.15
1620312429758 Model name: vosk-model-small-en-us-0.15
1620312429758 HTTP server port: 3000
1620312429758 internal debug log: false
1620312429758 Vosk log level: -1
1620312429758 wait loading Vosk model: vosk-model-small-en-us-0.15 (be patient)
1620312430058 Vosk model loaded in 300 msecs
1620312430060 server voskjshttp.js running at http://localhost:3000
1620312430060 endpoint http://localhost:3000/transcript
1620312430060 press Ctrl-C to shutdown
1620312430060 ready to listen incoming requests
1620312435318 request {"id":1620312435283,"speech":"../audio/2830-3980-0043.wav","model":"vosk-model-small-en-us-0.15","grammar":["experience proves this","why should one hold on the way","your power is sufficient i said"]}
1620312435941 response 1620312435283 {"request":{"id":1620312435283,"speech":"../audio/2830-3980-0043.wav","model":"vosk-model-small-en-us-0.15","grammar":["experience proves this","why should one hold on the way","your power is sufficient i said"]},"id":1620312435283,"latency":623,"result":[{"conf":1,"end":1.02,"start":0.36,"word":"experience"},{"conf":1,"end":1.35,"start":1.02,"word":"proves"},{"conf":1,"end":1.74,"start":1.35,"word":"this"}],"text":"experience proves this"}
```
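The query string can also be assembled programmatically. The sketch below is an illustration only: `buildTranscriptUrl` is a hypothetical helper, the default `/transcript` path is assumed, and passing `grammar` as a JSON-encoded array is inferred from the usage text and the request log above.

```javascript
// Build the GET /transcript URL from the request arguments.
// speech is mandatory; model, id and grammar are optional.
function buildTranscriptUrl(baseUrl, { speech, model, id, grammar } = {}) {
  if (!speech) throw new Error('the speech argument is mandatory')
  const url = new URL('/transcript', baseUrl)
  url.searchParams.set('speech', speech)
  if (model !== undefined) url.searchParams.set('model', model)
  if (id !== undefined) url.searchParams.set('id', String(id))
  // grammar is an array of sentences, sent as a JSON string
  if (grammar !== undefined) url.searchParams.set('grammar', JSON.stringify(grammar))
  return url.toString()
}

console.log(buildTranscriptUrl('http://localhost:3000', {
  speech: '../audio/2830-3980-0043.wav',
  model: 'vosk-model-small-en-us-0.15',
  grammar: ['experience proves this']
}))
```

`URLSearchParams` takes care of percent-encoding, which is the same job `--data-urlencode` does in the curl examples.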
### Server JSON response
The HTTP response returns a JSON data structure containing:
- `speech` the name of the speech file in the request
- `model` the name of the model (the language) in the request
- `id` a unique identifier (a Unix epoch timestamp)
that identifies the incoming request and can be used for debugging.
- `latency` the elapsed time, in milliseconds, required to process the request
- `result` the data structure returned by the Vosk transcript function.
### HTTP Server Tests
The [`tests/`](../tests/) directory contains some utility bash scripts (client*.sh) to test the server endpoint with the GET and POST methods.
## `voskjshttp` as RHASSPY speech-to-text remote HTTP Server
[RHASSPY](https://rhasspy.readthedocs.io/en/latest/) is an open source,
fully offline set of voice assistant services.
RHASSPY can optionally use a [Remote HTTP Server](https://rhasspy.readthedocs.io/en/latest/speech-to-text/#remote-http-server)
to transform speech (WAV) to text. This is typically used in a client/server setup,
where Rhasspy does speech/intent recognition on a home server with decent CPU/RAM available.
You can run `voskjshttp` as a RHASSPY speech-to-text remote HTTP server,
following these specifications:
- https://rhasspy.readthedocs.io/en/latest/speech-to-text/#remote-http-server
- https://rhasspy.readthedocs.io/en/latest/usage/#http-api
- https://rhasspy.readthedocs.io/en/latest/reference/#http-api
1. Install the server
Install on your home server, as described [here](../README.md#-install):
- Vosk
- `npm install -g @solyarisoftware/voskjs`
- A Vosk language model of your choice
2. Run the server
Warning:
currently, because of a bug in the Node-C++ interface of the Vosk-API library, multithreading causes a crash: https://github.com/solyarisoftware/voskJs/issues/3
Two temporary workarounds are proposed:
- Vosk Multithreading **enabled**
Use a Node version earlier than v14.
See: https://github.com/alphacep/vosk-api/issues/516#issuecomment-833462121
```
voskjshttp \
--model=models/vosk-model-small-en-us-0.15 \
--path=/api/speech-to-text \
--port=12101
```
- Vosk Multithreading **disabled**
Use any Node version from v14 onwards, but disable multithreading in `voskjshttp`
with the command line flag `--no-threads`.
This option may seem to defeat the purpose, because the server then serves just one request at a time
(each request saturates a CPU core for hundreds of milliseconds and also blocks the Node main thread).
Nevertheless, the lack of multithreading can be acceptable when serving a few satellites (clients) in a small (home) environment.
```
voskjshttp \
--model=models/vosk-model-small-en-us-0.15 \
--path=/api/speech-to-text \
--port=12101 \
--no-threads
```
3. Curl client tests
Two bash scripts are available in the tests/ directory:
- [`clientRHASSPYtext.sh`](../tests/clientRHASSPYtext.sh) gets a text/plain response from the server
```
clientRHASSPYtext.sh
```
```
experience proves this
```
- [`clientRHASSPYjson.sh`](../tests/clientRHASSPYjson.sh) gets an application/json response from the server
```
clientRHASSPYjson.sh
```
```
{
"id": 1622012841793,
"latency": 570,
"vosk": {
"result": [
{
"conf": 1,
"end": 1.02,
"start": 0.36,
"word": "experience"
},
{
"conf": 1,
"end": 1.35,
"start": 1.02,
"word": "proves"
},
{
"conf": 1,
"end": 1.74,
"start": 1.35,
"word": "this"
}
],
"text": "experience proves this"
}
}
```
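Since the server can answer with either text/plain or application/json, a client may want to handle both. The sketch below is a hedged illustration: `extractTranscriptText` is a hypothetical helper, and the JSON layout is assumed to match the sample response above.

```javascript
// Return the transcript text from a speech-to-text HTTP response,
// whichever of the two representations the server sent back.
function extractTranscriptText(contentType, body) {
  if (/^application\/json/i.test(contentType || '')) {
    const json = JSON.parse(body)
    // layout as in the sample response: { id, latency, vosk: { result, text } }
    return json.vosk && typeof json.vosk.text === 'string' ? json.vosk.text : ''
  }
  // text/plain: the body is the transcript itself
  return body.trim()
}

console.log(extractTranscriptText('text/plain', 'experience proves this\n'))
console.log(extractTranscriptText('application/json', '{"vosk":{"text":"experience proves this"}}'))
// both calls print: experience proves this
```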
## SocketIO server pseudocode
An HTTP server is not the only way to go!
Consider, for example, a client-server architecture using [socketio](https://socket.io/),
a websocket-based library for real-time, bidirectional, event-based communication.
Below is simplified server-side pseudocode that shows how to use the voskJs transcript function:
```javascript
const fs = require('fs')
const https = require('https')
const { transcriptFromBuffer, toPCM } = require('voskjs')
const app = require('express')()

// KEY_FILENAME, CERT_FILENAME, model, filenameUUID() and msgToAudioFile()
// are placeholders, to be defined by the application

// get SSL certificate
const credentials = {
  key: fs.readFileSync(KEY_FILENAME, 'utf8'),
  cert: fs.readFileSync(CERT_FILENAME, 'utf8')
}

// create the https server
const server = https.createServer(credentials, app)

// create the socketio channel
const io = require('socket.io')(server)

// a client connected
io.on('connection', (socket) => {
  // the client sent an audio buffer (async handler, because it awaits)
  socket.on('audioMessage', async (msg) => {
    // save audio buffer into a local file, giving a unique name
    const audioFileCompressed = filenameUUID()
    await msgToAudioFile(audioFileCompressed, msg)
    // convert the received audio into a PCM buffer
    const buffer = await toPCM(audioFileCompressed)
    // voskjs speech to text (model is a Vosk model loaded at startup)
    const voskResult = await transcriptFromBuffer(buffer, model)
  })
})
```
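The `filenameUUID()` and `msgToAudioFile()` helpers used in the pseudocode are not defined there. One possible implementation is sketched below, under the assumptions that the socket message is a `Buffer` (or typed array) holding the raw audio data and that the system temporary directory is an acceptable place for the files; `crypto.randomUUID()` requires Node 14.17 or later.

```javascript
const crypto = require('crypto')
const fs = require('fs')
const os = require('os')
const path = require('path')

// Return a unique file path, by default inside the system temporary directory.
function filenameUUID(directory = os.tmpdir(), extension = '') {
  return path.join(directory, `${crypto.randomUUID()}${extension}`)
}

// Write the received audio message to the given file.
// Assumes msg is a Buffer or typed array containing the audio bytes.
async function msgToAudioFile(filename, msg) {
  await fs.promises.writeFile(filename, msg)
}
```

Random names avoid collisions between concurrent clients; the application is still responsible for deleting each file once the transcript is done.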
---
[top](#) | [back](README.md) | [home](../README.md)