# Whisper for Node.js

A Node.js wrapper for OpenAI's Whisper speech recognition model. This package provides an easy-to-use interface for transcribing audio files with word-level timestamps.

## Features

- 🎯 Simple async/await API
- 🔄 Automatic retry with exponential backoff
- 📝 Word-level timestamps
- 🌍 Multi-language support
- 🔧 TypeScript support
- 🚀 Automatic dependency installation
- 💻 CPU and GPU support

## Installation

```bash
npm install @whisper/nodejs
```

The package automatically creates a Python virtual environment and installs its dependencies during `npm install`. This avoids conflicts with system Python packages.

## Quick Start

```javascript
const { whisper } = require('@whisper/nodejs');

// Basic transcription
const result = await whisper.transcribe('audio.mp3');
console.log(result.text);

// With options
const detailed = await whisper.transcribe('audio.mp3', {
  language: 'en',
  modelSize: 'base'
});
```

## TypeScript Usage

```typescript
import { WhisperTranscriber, WhisperOptions, WhisperResult } from '@whisper/nodejs';

const transcriber = new WhisperTranscriber();

const options: WhisperOptions = {
  language: 'en',
  modelSize: 'base',
  verbose: true
};

const result: WhisperResult = await transcriber.transcribe('audio.mp3', options);

// Access word-level timestamps
result.segments.forEach(segment => {
  console.log(`[${segment.start}-${segment.end}] ${segment.text}`);
  segment.words?.forEach(word => {
    console.log(`  ${word.text} (${word.start}-${word.end})`);
  });
});
```

## API Reference

### `WhisperTranscriber`

#### Constructor

```typescript
new WhisperTranscriber(options?: { pythonPath?: string })
```

- `pythonPath` (optional): Path to the Python executable. Auto-detected if not provided.

#### Methods

##### `transcribe(audioPath: string, options?: WhisperOptions): Promise<WhisperResult>`

Transcribe an audio file.

**Parameters:**

- `audioPath`: Path to the audio file
- `options`: Transcription options

##### `transcribeWithRetry(audioPath: string, options?: WhisperOptions, maxRetries?: number): Promise<WhisperResult>`

Transcribe with automatic retry on failure.

**Parameters:**

- `audioPath`: Path to the audio file
- `options`: Transcription options
- `maxRetries`: Maximum number of retry attempts (default: 3)

##### `initialize(): Promise<void>`

Initialize the wrapper and check/install dependencies.

##### `checkDependencies(): Promise<boolean>`

Check whether the Python dependencies are installed.
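#### Example

A minimal sketch combining the methods above. The input file name and option values are placeholders, and the error-handling shape is an assumption: the package does not document a specific error type, so the example treats whatever is thrown as an opaque value. Because `transcribeWithRetry` backs off exponentially between attempts, the `catch` block only runs after all retries are exhausted.

```typescript
import { WhisperTranscriber, WhisperResult } from '@whisper/nodejs';

async function main(): Promise<void> {
  const transcriber = new WhisperTranscriber();

  // Install the Python dependencies if they are not already present.
  if (!(await transcriber.checkDependencies())) {
    await transcriber.initialize();
  }

  try {
    // Retry up to 5 times; the wrapper backs off exponentially between attempts.
    const result: WhisperResult = await transcriber.transcribeWithRetry(
      'audio.mp3', // placeholder input file
      { language: 'en', modelSize: 'base', cpuOnly: true },
      5
    );
    console.log(result.text);
  } catch (err) {
    // Error shape is not documented; log it as-is.
    console.error('Transcription failed after all retries:', err);
  }
}

main();
```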
### Types

#### `WhisperOptions`

```typescript
interface WhisperOptions {
  language?: string;   // Language code (e.g., 'en', 'es', 'fr')
  modelSize?: 'tiny' | 'base' | 'small' | 'medium' | 'large';
  pythonPath?: string; // Custom Python path
  cpuOnly?: boolean;   // Force CPU-only mode
  verbose?: boolean;   // Enable verbose logging
}
```

#### `WhisperResult`

```typescript
interface WhisperResult {
  text: string;               // Full transcribed text
  segments: WhisperSegment[]; // Time-aligned segments
  language?: string;          // Detected language
  duration?: number;          // Total audio duration
}
```

#### `WhisperSegment`

```typescript
interface WhisperSegment {
  text: string;          // Segment text
  start: number;         // Start time in seconds
  end: number;           // End time in seconds
  words?: WhisperWord[]; // Word-level timestamps
}
```

## Model Sizes

| Model  | Parameters | English-only | Multilingual | Required VRAM | Relative Speed |
|--------|------------|--------------|--------------|---------------|----------------|
| tiny   | 39 M       | ✓            | ✓            | ~1 GB         | ~32x           |
| base   | 74 M       | ✓            | ✓            | ~1 GB         | ~16x           |
| small  | 244 M      | ✓            | ✓            | ~2 GB         | ~6x            |
| medium | 769 M      | ✓            | ✓            | ~5 GB         | ~2x            |
| large  | 1550 M     | ✗            | ✓            | ~10 GB        | 1x             |

## Language Support

Supports 100+ languages, including:

- English (`en`)
- Spanish (`es`)
- French (`fr`)
- German (`de`)
- Italian (`it`)
- Portuguese (`pt`)
- Russian (`ru`)
- Chinese (`zh`)
- Japanese (`ja`)
- Korean (`ko`)
- Vietnamese (`vi`)
- And many more...

## Environment Variables

- `WHISPER_CPU_ONLY`: Set to `"1"` to force CPU-only mode
- `WHISPER_VERBOSE`: Set to `"true"` for verbose logging
- `SKIP_WHISPER_SETUP`: Set to `"true"` to skip automatic setup

## Requirements

- Node.js >= 16.0.0
- Python >= 3.7
- FFmpeg (for audio processing)

## Troubleshooting

### Python not found

Make sure Python 3.7+ is installed and available on your PATH:

```bash
python3 --version
```

### Manual dependency installation

If automatic installation fails:

```bash
pip install openai-whisper torch
```

### GPU Support

For GPU acceleration, install CUDA-enabled PyTorch:

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```

## License

MIT

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.