sber-salute-speech-recognition
Version:
A library that produces audio transcriptions using the SBER Salute Speech service.
116 lines (87 loc) • 3.28 kB
Markdown
A library for getting audio transcriptions from the SBER Salute Speech service https://developers.sber.ru/docs/ru/salutespeech/recognition/recognition-sync.
1. Create an account on https://developers.sber.ru/portal/products/smartspeech
2. Generate a client secret
3. Capture the auth key
```bash
npm install sber-salute-speech-recognition
```
```ts
import {
SberSaluteSpeechRecognitionService,
AudioEncoding,
} from 'sber-salute-speech-recognition';
const recognitionService = new SberSaluteSpeechRecognitionService(AUTH_KEY);
const { text, normalizedText } = await recognitionService.speechToText(
pathToAudioFile,
AudioEncoding.MP3,
);
```
The `scope` property is set during the instantiation of the `SberSaluteSpeechRecognitionService` class.
If no value is provided, it defaults to `Scope.Personal`.
Here is an example of how to use the `scope` property:
```typescript
import {
SberSaluteSpeechRecognitionService,
Scope,
} from 'sber-salute-speech-recognition';
const service = new SberSaluteSpeechRecognitionService(
AUTH_KEY,
undefined,
Scope.Corporate
);
```
In this example, the `scope` property is set to `Scope.Corporate` what equals `SALUTE_SPEECH_CORP`.
If you want to use the `SALUTE_SPEECH_PERS` scope,
you can either pass `Scope.Personal` as the third argument or omit the third argument entirely,
as it defaults to `Scope.Personal`.
For improve speech recognition you can pass in `speechToText` method hints param
Here is example of how to use the `hints` param:
```typescript
import {
SberSaluteSpeechRecognitionService,
AudioEncoding,
} from 'sber-salute-speech-recognition';
const recognitionService = new SberSaluteSpeechRecognitionService(AUTH_KEY);
const { text, normalizedText } = await recognitionService.speechToText(
pathToAudioFile,
AudioEncoding.MP3,
{
words: ['card', 'name'],
enable_letters: true,
eou_timeout: "2s"
}
);
```
In this example , we pass object with props `words`, `enable_letters`, `eou_timeout`.
`words` - A list of words or phrases whose recognition we want to strengthen.
Here you can list the words that the user is likely to pronounce
`enable_letters` - A short phrase model that improves recognition of single letters and short words.
Possible values: `true` and `false`
`eou_timeout` - Setting up recognition of the end of a phrase (End of Utterance - eou).
Such recognition will be expected after the end of the phrase for as many seconds as set in this parameter.
Possible values are from 0.5 to 5 seconds.
# Channels count
The channels_count allows to override the number of channels to recognize in the audio file. If not specified, the number of channels will be determined automatically.
Possible values:
undefined - channels count will be determined automatically
1 - mono
2 - stereo
```ts
import {
SberSaluteSpeechRecognitionService,
AudioEncoding,
} from 'sber-salute-speech-recognition';
const recognitionService = new SberSaluteSpeechRecognitionService(AUTH_KEY);
const { text, normalizedText } = await recognitionService.speechToText(
pathToAudioFile,
AudioEncoding.MP3,
1
);
```
In this example we set channels count to 1.