
# whisper.rn

[![Actions Status](https://github.com/mybigday/whisper.rn/workflows/CI/badge.svg)](https://github.com/mybigday/whisper.rn/actions) [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT) [![npm](https://img.shields.io/npm/v/whisper.rn.svg)](https://www.npmjs.com/package/whisper.rn/)

React Native binding of [whisper.cpp](https://github.com/ggerganov/whisper.cpp).

[whisper.cpp](https://github.com/ggerganov/whisper.cpp): High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisper) automatic speech recognition (ASR) model

## Screenshots

| <img src="https://github.com/mybigday/whisper.rn/assets/3001525/2fea7b2d-c911-44fb-9afc-8efc7b594446" width="300" /> | <img src="https://github.com/mybigday/whisper.rn/assets/3001525/a5005a6c-44f7-4db9-95e8-0fd951a2e147" width="300" /> |
| :---: | :---: |
| iOS: Tested on iPhone 13 Pro Max | Android: Tested on Pixel 6 |
| (tiny.en, Core ML enabled, release mode + archive) | (tiny.en, armv8.2-a+fp16, release mode) |

## Installation

```sh
npm install whisper.rn
```

#### iOS

Re-run `npx pod-install` after installing.

By default, `whisper.rn` uses the pre-built `rnwhisper.xcframework` for iOS. To build from source instead, set `RNWHISPER_BUILD_FROM_SOURCE` to `1` in your Podfile.

If you want to use the `medium` or `large` model, it is recommended to enable the [Extended Virtual Addressing](https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_developer_kernel_extended-virtual-addressing) capability in your iOS project.
#### Android

Add a ProGuard rule if ProGuard is enabled in your project (`android/app/proguard-rules.pro`):

```proguard
# whisper.rn
-keep class com.rnwhisper.** { *; }
```

On Apple Silicon Macs, it's recommended to use `ndkVersion = "24.0.8215888"` (or above) in your root project build configuration. Otherwise, please follow this troubleshooting [issue](./TROUBLESHOOTING.md#android-got-build-error-unknown-host-cpu-architecture-arm64-on-apple-silicon-macs).

#### Expo

You will need to prebuild the project before using it. See the [Expo guide](https://docs.expo.io/guides/using-libraries/#using-a-library-in-a-expo-project) for more details.

## Tips & Tricks

The [Tips & Tricks](docs/TIPS.md) document is a collection of tips and tricks for using `whisper.rn`.

## Usage

```js
import { initWhisper } from 'whisper.rn'

const whisperContext = await initWhisper({
  filePath: 'file://.../ggml-tiny.en.bin',
})

const sampleFilePath = 'file://.../sample.wav'
const options = { language: 'en' }
// `stop()` can be called to abort the transcription before it finishes
const { stop, promise } = whisperContext.transcribe(sampleFilePath, options)
const { result } = await promise
// result: (The inference text result from audio file)
```

## Voice Activity Detection (VAD)

Voice Activity Detection allows you to detect speech segments in audio data using the Silero VAD model.
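The `detectSpeech` / `detectSpeechData` calls described below resolve to an array of segments whose `t0`/`t1` fields are printed as seconds in the Process Results snippet. As a minimal sketch, adjacent segments separated by short pauses can be merged before further processing (for example, before slicing audio for transcription). The `SpeechSegment` type and the `mergeSegments` / `totalSpeechSeconds` helpers are hypothetical, not part of `whisper.rn`, and assume `t0`/`t1` are in seconds.

```typescript
// Hypothetical shape of a VAD segment; assumes t0/t1 are start/end times in seconds.
interface SpeechSegment {
  t0: number
  t1: number
}

// Merge segments whose gap to the previous segment is at most maxGapS seconds.
// Segments are sorted by start time first; the input array and its objects are not mutated.
function mergeSegments(
  segments: readonly SpeechSegment[],
  maxGapS: number,
): SpeechSegment[] {
  const sorted = [...segments].sort((a, b) => a.t0 - b.t0)
  const merged: SpeechSegment[] = []
  for (const seg of sorted) {
    const last = merged[merged.length - 1]
    if (last !== undefined && seg.t0 - last.t1 <= maxGapS) {
      // Short pause: extend the previous (copied) segment instead of starting a new one.
      last.t1 = Math.max(last.t1, seg.t1)
    } else {
      // Push a copy so later extension never touches the caller's objects.
      merged.push({ t0: seg.t0, t1: seg.t1 })
    }
  }
  return merged
}

// Total speech time covered by non-overlapping segments, in seconds.
function totalSpeechSeconds(segments: readonly SpeechSegment[]): number {
  return segments.reduce((sum, s) => sum + (s.t1 - s.t0), 0)
}
```

Merging with a small `maxGapS` keeps a sentence with a brief breath in one piece, while `totalSpeechSeconds` gives a quick measure of how much of a recording actually contains speech.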
#### Initialize VAD Context

```typescript
import { initWhisperVad } from 'whisper.rn'

const vadContext = await initWhisperVad({
  filePath: require('./assets/ggml-silero-v6.2.0.bin'), // VAD model file
  useGpu: true, // Use GPU acceleration (iOS only)
  nThreads: 4, // Number of threads for processing
})
```

#### Detect Speech Segments

##### From Audio Files

```typescript
// Detect speech in an audio file (supports the same formats as transcribe)
const segments = await vadContext.detectSpeech(require('./assets/audio.wav'), {
  threshold: 0.5, // Speech probability threshold (0.0-1.0)
  minSpeechDurationMs: 250, // Minimum speech duration in ms
  minSilenceDurationMs: 100, // Minimum silence duration in ms
  maxSpeechDurationS: 30, // Maximum speech duration in seconds
  speechPadMs: 30, // Padding around speech segments in ms
  samplesOverlap: 0.1, // Overlap between analysis windows
})

// Also supports:
// - File paths: vadContext.detectSpeech('path/to/audio.wav', options)
// - HTTP URLs: vadContext.detectSpeech('https://example.com/audio.wav', options)
// - Base64 WAV: vadContext.detectSpeech('data:audio/wav;base64,...', options)
// - Assets: vadContext.detectSpeech(require('./assets/audio.wav'), options)
```

##### From Raw Audio Data

```typescript
// Detect speech in base64-encoded float32 PCM data
const segments = await vadContext.detectSpeechData(base64AudioData, {
  threshold: 0.5,
  minSpeechDurationMs: 250,
  minSilenceDurationMs: 100,
  maxSpeechDurationS: 30,
  speechPadMs: 30,
  samplesOverlap: 0.1,
})
```

#### Process Results

```typescript
segments.forEach((segment, index) => {
  console.log(
    `Segment ${index + 1}: ${segment.t0.toFixed(2)}s - ${segment.t1.toFixed(2)}s`,
  )
  console.log(`Duration: ${(segment.t1 - segment.t0).toFixed(2)}s`)
})
```

#### Release VAD Context

```typescript
import { releaseAllWhisperVad } from 'whisper.rn'

await vadContext.release()
// Or release all VAD contexts
await releaseAllWhisperVad()
```

## Realtime Transcription

The new `RealtimeTranscriber` provides enhanced realtime transcription with features like
Voice Activity Detection (VAD), auto-slicing, and memory management.

```js
// If your RN packager does not have package exports support enabled, import from whisper.rn/src/realtime-transcription instead
import { initWhisper, initWhisperVad } from 'whisper.rn'
import { RealtimeTranscriber } from 'whisper.rn/realtime-transcription'
import { AudioPcmStreamAdapter } from 'whisper.rn/realtime-transcription/adapters'
import RNFS from 'react-native-fs' // or any compatible filesystem

// Dependencies
const whisperContext = await initWhisper({
  /* ... */
})
const vadContext = await initWhisperVad({
  /* ... */
})
const audioStream = new AudioPcmStreamAdapter() // requires @fugood/react-native-audio-pcm-stream

// Create transcriber
const transcriber = new RealtimeTranscriber(
  { whisperContext, vadContext, audioStream, fs: RNFS },
  {
    audioSliceSec: 30,
    vadPreset: 'default',
    autoSliceOnSpeechEnd: true,
    transcribeOptions: { language: 'en' },
  },
  {
    onTranscribe: (event) => console.log('Transcription:', event.data?.result),
    onVad: (event) => console.log('VAD:', event.type, event.confidence),
    onStatusChange: (isActive) =>
      console.log('Status:', isActive ? 'ACTIVE' : 'INACTIVE'),
    onError: (error) => console.error('Error:', error),
  },
)

// Start/stop transcription
await transcriber.start()
await transcriber.stop()
```

**Dependencies:**

- `@fugood/react-native-audio-pcm-stream` for `AudioPcmStreamAdapter`
- A compatible filesystem module (e.g., `react-native-fs`). See the [filesystem interface](src/utils/WavFileWriter.ts#L9-L16) for the TypeScript definition

**Custom Audio Adapters:**

You can create custom audio stream adapters by implementing the [AudioStreamInterface](src/realtime-transcription/types.ts#L21-L30). This allows integration with different audio sources or custom audio processing pipelines.

**Example:** See the [complete example](example/src/RealtimeTranscriber.tsx) for a full implementation, including file simulation and UI.

Please visit the [Documentation](docs/) for more details.
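`vadContext.detectSpeechData` takes base64-encoded float32 PCM, while microphone streams (including custom audio adapters) commonly deliver signed 16-bit PCM. The sketch below shows one way to bridge the two. It assumes mono audio at 16 kHz (the sample rate Whisper models operate on) and a little-endian device; the helper names (`pcm16ToFloat32`, `bytesToBase64`, `float32ToBase64`, `sliceByteLength`) are hypothetical and not part of `whisper.rn`. It also shows the arithmetic behind an `audioSliceSec` buffer: 30 s × 16,000 samples/s × 2 bytes per sample = 960,000 bytes of 16-bit audio.

```typescript
const SAMPLE_RATE = 16000 // Whisper models operate on 16 kHz audio

// Convert signed 16-bit PCM samples to float32 in the range [-1, 1).
function pcm16ToFloat32(samples: Int16Array): Float32Array {
  const out = new Float32Array(samples.length)
  for (let i = 0; i < samples.length; i++) {
    out[i] = samples[i] / 32768
  }
  return out
}

const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

// Standard (RFC 4648) base64 encoding with '=' padding, written by hand so it
// does not depend on Buffer or btoa being available in the JS runtime.
function bytesToBase64(bytes: Uint8Array): string {
  let out = ''
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i]
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0
    const triple = (b0 << 16) | (b1 << 8) | b2
    out += BASE64_ALPHABET[(triple >> 18) & 63]
    out += BASE64_ALPHABET[(triple >> 12) & 63]
    out += i + 1 < bytes.length ? BASE64_ALPHABET[(triple >> 6) & 63] : '='
    out += i + 2 < bytes.length ? BASE64_ALPHABET[triple & 63] : '='
  }
  return out
}

// Encode float32 samples as base64 using the platform's native byte order
// (assumed little-endian, as on typical ARM and x86 devices).
function float32ToBase64(samples: Float32Array): string {
  return bytesToBase64(
    new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength),
  )
}

// Bytes needed to buffer `seconds` of mono audio in the given format.
function sliceByteLength(
  seconds: number,
  sampleRate = SAMPLE_RATE,
  bytesPerSample = 2,
): number {
  return Math.round(seconds * sampleRate * bytesPerSample)
}
```

With these helpers, a 16-bit chunk could be passed as `vadContext.detectSpeechData(float32ToBase64(pcm16ToFloat32(chunk)), options)`, where `chunk` is a hypothetical `Int16Array` from your audio source.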
## Usage with assets

You can also load the model file and audio file from assets:

```js
import { initWhisper } from 'whisper.rn'

const whisperContext = await initWhisper({
  filePath: require('../assets/ggml-tiny.en.bin'),
})

const options = { language: 'en' }
const { stop, promise } = whisperContext.transcribe(
  require('../assets/sample.wav'),
  options,
)

// ...
```

This requires editing `metro.config.js` to support assets:

```js
// ...
const defaultAssetExts = require('metro-config/src/defaults/defaults').assetExts

module.exports = {
  // ...
  resolver: {
    // ...
    assetExts: [
      ...defaultAssetExts,
      'bin', // whisper.rn: ggml model binary
      'mil', // whisper.rn: CoreML model asset
    ],
  },
}
```

Please note that:

- It will significantly increase the size of the app in release mode.
- The RN packager does not allow files larger than 2GB, so you cannot bundle the original f16 `large` model (2.9GB); use a quantized model instead.

## Core ML support

**_Platform: iOS 15.0+, tvOS 15.0+_**

To use Core ML on iOS, you will need the Core ML model files.

The `.mlmodelc` model files are located based on the ggml model file path. For example, if your ggml model path is `ggml-tiny.en.bin`, the Core ML model path will be `ggml-tiny.en-encoder.mlmodelc`. Note that the ggml model is still required, both for the decoder and as an encoder fallback.

The Core ML models are hosted here: https://huggingface.co/ggerganov/whisper.cpp/tree/main

If you want to download the model at runtime and the hosted file is an archive, you will need to unzip it to get the `.mlmodelc` directory. You can use a library like [react-native-zip-archive](https://github.com/mockingbot/react-native-zip-archive), or host the individual files yourself and download them.
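The naming rule above can be expressed as a small helper, for example to decide where a runtime download should place the unzipped `.mlmodelc` directory next to the ggml model, and to verify that the three required files listed in the next paragraph arrived. This is a minimal sketch of the documented convention only: `coreMLModelPathFor` and `missingCoreMLFiles` are hypothetical helpers, not `whisper.rn` exports, and the sketch assumes the ggml model file name ends in `.bin`.

```typescript
// Derive the Core ML encoder directory path from a ggml model path,
// following the documented convention "<name>.bin" -> "<name>-encoder.mlmodelc".
function coreMLModelPathFor(ggmlModelPath: string): string {
  const base = ggmlModelPath.endsWith('.bin')
    ? ggmlModelPath.slice(0, -'.bin'.length)
    : ggmlModelPath
  return `${base}-encoder.mlmodelc`
}

// Required files inside the .mlmodelc directory, as relative paths.
const REQUIRED_MLMODELC_FILES = ['model.mil', 'coremldata.bin', 'weights/weight.bin']

// Return the required files that are missing from a downloaded/unzipped
// .mlmodelc directory, given the relative paths that are present.
function missingCoreMLFiles(presentFiles: readonly string[]): string[] {
  const present = new Set(presentFiles)
  return REQUIRED_MLMODELC_FILES.filter((file) => !present.has(file))
}
```

The relative file list passed to `missingCoreMLFiles` would come from whatever filesystem module you use to inspect the unzipped directory.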
The `.mlmodelc` is a directory that usually includes 5 files (3 required):

```json5
[
  'model.mil',
  'coremldata.bin',
  'weights/weight.bin',
  // Not required:
  // 'metadata.json', 'analytics/coremldata.bin',
]
```

Or just use `require` to bundle them in your app, as the example app does, but this will increase the app size significantly.

```js
import { Platform } from 'react-native'
import { initWhisper } from 'whisper.rn'

const whisperContext = await initWhisper({
  filePath: require('../assets/ggml-tiny.en.bin'),
  coreMLModelAsset:
    Platform.OS === 'ios'
      ? {
          filename: 'ggml-tiny.en-encoder.mlmodelc',
          assets: [
            require('../assets/ggml-tiny.en-encoder.mlmodelc/weights/weight.bin'),
            require('../assets/ggml-tiny.en-encoder.mlmodelc/model.mil'),
            require('../assets/ggml-tiny.en-encoder.mlmodelc/coremldata.bin'),
          ],
        }
      : undefined,
})
```

In real-world apps, we recommend moving the asset imports into a separate platform-specific file (e.g. `context-opts.ios.js`) to keep these unused files out of the Android bundle.

## Run with example

The example app provides a simple UI for testing the functions.

- Whisper model used: `tiny.en` in https://huggingface.co/ggerganov/whisper.cpp
- Sample file: `jfk.wav` in https://github.com/ggerganov/whisper.cpp/tree/master/samples

Please follow the [Development Workflow section of the contributing guide](./CONTRIBUTING.md#development-workflow) to run the example app.

## Mock `whisper.rn`

We provide a mock version of `whisper.rn` for testing purposes that you can use with Jest:

```js
jest.mock('whisper.rn', () => require('whisper.rn/jest-mock'))
```

## Apps using `whisper.rn`

- [BRICKS](https://bricks.tools): Our product for building interactive signage in a simple way. We provide LLM functions as Generator LLM/Assistant.
- ... (Any contribution is welcome)

## Node.js binding

- [whisper.node](https://github.com/mybigday/whisper.node): Another Node.js binding of `whisper.cpp`, with the same API as `whisper.rn`.
## Contributing

See the [contributing guide](CONTRIBUTING.md) to learn how to contribute to the repository and the development workflow.

## Troubleshooting

See the [troubleshooting guide](docs/TROUBLESHOOTING.md) if you encounter any problems while using `whisper.rn`.

## License

MIT

---

Made with [create-react-native-library](https://github.com/callstack/react-native-builder-bob)

---

<p align="center">
  <a href="https://bricks.tools">
    <img width="90px" src="https://avatars.githubusercontent.com/u/17320237?s=200&v=4">
  </a>
  <p align="center">
    Built and maintained by <a href="https://bricks.tools">BRICKS</a>.
  </p>
</p>