whisper.rn
Version:
React Native binding of whisper.cpp
335 lines (241 loc) • 11.8 kB
Markdown
# whisper.rn
[](https://github.com/mybigday/whisper.rn/actions)
[](https://opensource.org/licenses/MIT)
[](https://www.npmjs.com/package/whisper.rn/)
React Native binding of [whisper.cpp](https://github.com/ggerganov/whisper.cpp).
[whisper.cpp](https://github.com/ggerganov/whisper.cpp): High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisper) automatic speech recognition (ASR) model
## Screenshots
| <img src="https://github.com/mybigday/whisper.rn/assets/3001525/2fea7b2d-c911-44fb-9afc-8efc7b594446" width="300" /> | <img src="https://github.com/mybigday/whisper.rn/assets/3001525/a5005a6c-44f7-4db9-95e8-0fd951a2e147" width="300" /> |
| :------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------: |
| iOS: Tested on iPhone 13 Pro Max | Android: Tested on Pixel 6 |
| (tiny.en, Core ML enabled, release mode + archive) | (tiny.en, armv8.2-a+fp16, release mode) |
## Installation
```sh
npm install whisper.rn
```
#### iOS
Please re-run `npx pod-install` again.
By default, `whisper.rn` will use pre-built `rnwhisper.xcframework` for iOS. If you want to build from source, please set `RNWHISPER_BUILD_FROM_SOURCE` to `1` in your Podfile.
If you want to use `medium` or `large` model, the [Extended Virtual Addressing](https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_developer_kernel_extended-virtual-addressing) capability is recommended to enable on iOS project.
#### Android
Add proguard rule if it's enabled in project (android/app/proguard-rules.pro):
```proguard
# whisper.rn
-keep class com.rnwhisper.** { *; }
```
It's recommended to use `ndkVersion = "24.0.8215888"` (or above) in your root project build configuration for Apple Silicon Macs. Otherwise please follow this trobleshooting [issue](./TROUBLESHOOTING.md#android-got-build-error-unknown-host-cpu-architecture-arm64-on-apple-silicon-macs).
#### Expo
You will need to prebuild the project before using it. See [Expo guide](https://docs.expo.io/guides/using-libraries/#using-a-library-in-a-expo-project) for more details.
## Tips & Tricks
The [Tips & Tricks](docs/TIPS.md) document is a collection of tips and tricks for using `whisper.rn`.
## Usage
```js
import { initWhisper } from 'whisper.rn'
const whisperContext = await initWhisper({
filePath: 'file://.../ggml-tiny.en.bin',
})
const sampleFilePath = 'file://.../sample.wav'
const options = { language: 'en' }
const { stop, promise } = whisperContext.transcribe(sampleFilePath, options)
const { result } = await promise
// result: (The inference text result from audio file)
```
## Voice Activity Detection (VAD)
Voice Activity Detection allows you to detect speech segments in audio data using the Silero VAD model.
#### Initialize VAD Context
```typescript
import { initWhisperVad } from 'whisper.rn'
const vadContext = await initWhisperVad({
filePath: require('./assets/ggml-silero-v6.2.0.bin'), // VAD model file
useGpu: true, // Use GPU acceleration (iOS only)
nThreads: 4, // Number of threads for processing
})
```
#### Detect Speech Segments
##### From Audio Files
```typescript
// Detect speech in audio file (supports same formats as transcribe)
const segments = await vadContext.detectSpeech(require('./assets/audio.wav'), {
threshold: 0.5, // Speech probability threshold (0.0-1.0)
minSpeechDurationMs: 250, // Minimum speech duration in ms
minSilenceDurationMs: 100, // Minimum silence duration in ms
maxSpeechDurationS: 30, // Maximum speech duration in seconds
speechPadMs: 30, // Padding around speech segments in ms
samplesOverlap: 0.1, // Overlap between analysis windows
})
// Also supports:
// - File paths: vadContext.detectSpeech('path/to/audio.wav', options)
// - HTTP URLs: vadContext.detectSpeech('https://example.com/audio.wav', options)
// - Base64 WAV: vadContext.detectSpeech('data:audio/wav;base64,...', options)
// - Assets: vadContext.detectSpeech(require('./assets/audio.wav'), options)
```
##### From Raw Audio Data
```typescript
// Detect speech in base64 encoded float32 PCM data
const segments = await vadContext.detectSpeechData(base64AudioData, {
threshold: 0.5,
minSpeechDurationMs: 250,
minSilenceDurationMs: 100,
maxSpeechDurationS: 30,
speechPadMs: 30,
samplesOverlap: 0.1,
})
```
#### Process Results
```typescript
segments.forEach((segment, index) => {
console.log(
`Segment ${index + 1}: ${segment.t0.toFixed(2)}s - ${segment.t1.toFixed(
2,
)}s`,
)
console.log(`Duration: ${(segment.t1 - segment.t0).toFixed(2)}s`)
})
```
#### Release VAD Context
```typescript
await vadContext.release()
// Or release all VAD contexts
await releaseAllWhisperVad()
```
## Realtime Transcription
The new `RealtimeTranscriber` provides enhanced realtime transcription with features like Voice Activity Detection (VAD), auto-slicing, and memory management.
```js
// If your RN packager is not enable package exports support, use whisper.rn/src/realtime-transcription
import { RealtimeTranscriber } from 'whisper.rn/realtime-transcription'
import { AudioPcmStreamAdapter } from 'whisper.rn/realtime-transcription/adapters'
import RNFS from 'react-native-fs' // or any compatible filesystem
// Dependencies
const whisperContext = await initWhisper({
/* ... */
})
const vadContext = await initWhisperVad({
/* ... */
})
const audioStream = new AudioPcmStreamAdapter() // requires @fugood/react-native-audio-pcm-stream
// Create transcriber
const transcriber = new RealtimeTranscriber(
{ whisperContext, vadContext, audioStream, fs: RNFS },
{
audioSliceSec: 30,
vadPreset: 'default',
autoSliceOnSpeechEnd: true,
transcribeOptions: { language: 'en' },
},
{
onTranscribe: (event) => console.log('Transcription:', event.data?.result),
onVad: (event) => console.log('VAD:', event.type, event.confidence),
onStatusChange: (isActive) =>
console.log('Status:', isActive ? 'ACTIVE' : 'INACTIVE'),
onError: (error) => console.error('Error:', error),
},
)
// Start/stop transcription
await transcriber.start()
await transcriber.stop()
```
**Dependencies:**
- `@fugood/react-native-audio-pcm-stream` for `AudioPcmStreamAdapter`
- Compatible filesystem module (e.g., `react-native-fs`). See [filesystem interface](src/utils/WavFileWriter.ts#L9-L16) for TypeScript definition
**Custom Audio Adapters:**
You can create custom audio stream adapters by implementing the [AudioStreamInterface](src/realtime-transcription/types.ts#L21-L30). This allows integration with different audio sources or custom audio processing pipelines.
**Example:** See [complete example](example/src/RealtimeTranscriber.tsx) for full implementation including file simulation and UI.
Please visit the [Documentation](docs/) for more details.
## Usage with assets
You can also use the model file / audio file from assets:
```js
import { initWhisper } from 'whisper.rn'
const whisperContext = await initWhisper({
filePath: require('../assets/ggml-tiny.en.bin'),
})
const { stop, promise } = whisperContext.transcribe(
require('../assets/sample.wav'),
options,
)
// ...
```
This requires editing the `metro.config.js` to support assets:
```js
// ...
const defaultAssetExts = require('metro-config/src/defaults/defaults').assetExts
module.exports = {
// ...
resolver: {
// ...
assetExts: [
...defaultAssetExts,
'bin', // whisper.rn: ggml model binary
'mil', // whisper.rn: CoreML model asset
],
},
}
```
Please note that:
- It will significantly increase the size of the app in release mode.
- The RN packager is not allowed file size larger than 2GB, so it not able to use original f16 `large` model (2.9GB), you can use quantized models instead.
## Core ML support
**_Platform: iOS 15.0+, tvOS 15.0+_**
To use Core ML on iOS, you will need to have the Core ML model files.
The `.mlmodelc` model files is load depend on the ggml model file path. For example, if your ggml model path is `ggml-tiny.en.bin`, the Core ML model path will be `ggml-tiny.en-encoder.mlmodelc`. Please note that the ggml model is still needed as decoder or encoder fallback.
The Core ML models are hosted here: https://huggingface.co/ggerganov/whisper.cpp/tree/main
If you want to download model at runtime, during the host file is archive, you will need to unzip the file to get the `.mlmodelc` directory, you can use library like [react-native-zip-archive](https://github.com/mockingbot/react-native-zip-archive), or host those individual files to download yourself.
The `.mlmodelc` is a directory, usually it includes 5 files (3 required):
```json5
[
'model.mil',
'coremldata.bin',
'weights/weight.bin',
// Not required:
// 'metadata.json', 'analytics/coremldata.bin',
]
```
Or just use `require` to bundle that in your app, like the example app does, but this would increase the app size significantly.
```js
const whisperContext = await initWhisper({
filePath: require('../assets/ggml-tiny.en.bin')
coreMLModelAsset:
Platform.OS === 'ios'
? {
filename: 'ggml-tiny.en-encoder.mlmodelc',
assets: [
require('../assets/ggml-tiny.en-encoder.mlmodelc/weights/weight.bin'),
require('../assets/ggml-tiny.en-encoder.mlmodelc/model.mil'),
require('../assets/ggml-tiny.en-encoder.mlmodelc/coremldata.bin'),
],
}
: undefined,
})
```
In real world, we recommended to split the asset imports into another platform specific file (e.g. `context-opts.ios.js`) to avoid these unused files in the bundle for Android.
## Run with example
The example app provide a simple UI for testing the functions.
Used Whisper model: `tiny.en` in https://huggingface.co/ggerganov/whisper.cpp
Sample file: `jfk.wav` in https://github.com/ggerganov/whisper.cpp/tree/master/samples
Please follow the [Development Workflow section of contributing guide](./CONTRIBUTING.md#development-workflow) to run the example app.
## Mock `whisper.rn`
We have provided a mock version of `whisper.rn` for testing purpose you can use on Jest:
```js
jest.mock('whisper.rn', () => require('whisper.rn/jest-mock'))
```
## Apps using `whisper.rn`
- [BRICKS](https://bricks.tools): Our product for building interactive signage in simple way. We provide LLM functions as Generator LLM/Assistant.
- ... (Any Contribution is welcome)
## Node.js binding
- [whisper.node](https://github.com/mybigday/whisper.node): An another Node.js binding of `whisper.cpp` but made API same as `whisper.rn`.
## Contributing
See the [contributing guide](CONTRIBUTING.md) to learn how to contribute to the repository and the development workflow.
## Troubleshooting
See the [troubleshooting](docs/TROUBLESHOOTING.md) if you encounter any problem while using `whisper.rn`.
## License
MIT
---
Made with [create-react-native-library](https://github.com/callstack/react-native-builder-bob)
---
<p align="center">
<a href="https://bricks.tools">
<img width="90px" src="https://avatars.githubusercontent.com/u/17320237?s=200&v=4">
</a>
<p align="center">
Built and maintained by <a href="https://bricks.tools">BRICKS</a>.
</p>
</p>