# Whisper for Node.js
A Node.js wrapper for OpenAI's Whisper speech recognition model. This package provides an easy-to-use interface for transcribing audio files with word-level timestamps.
## Features
- 🎯 Simple async/await API
- 🔄 Automatic retry with exponential backoff
- 📝 Word-level timestamps
- 🌍 Multi-language support
- 🔧 TypeScript support
- 🚀 Automatic dependency installation
- 💻 CPU and GPU support
## Installation
```bash
npm install @whisper/nodejs
```
The package automatically creates a Python virtual environment and installs its Python dependencies during `npm install`. This avoids conflicts with system Python packages.
## Quick Start
```javascript
const { whisper } = require('@whisper/nodejs');

// Basic transcription
const result = await whisper.transcribe('audio.mp3');
console.log(result.text);

// With options
const detailed = await whisper.transcribe('audio.mp3', {
  language: 'en',
  modelSize: 'base'
});
```
## TypeScript Usage
```typescript
import { WhisperTranscriber, WhisperOptions, WhisperResult } from '@whisper/nodejs';

const transcriber = new WhisperTranscriber();

const options: WhisperOptions = {
  language: 'en',
  modelSize: 'base',
  verbose: true
};

const result: WhisperResult = await transcriber.transcribe('audio.mp3', options);

// Access word-level timestamps
result.segments.forEach(segment => {
  console.log(`[${segment.start}-${segment.end}] ${segment.text}`);
  segment.words?.forEach(word => {
    console.log(`  ${word.text} (${word.start}-${word.end})`);
  });
});
```
## API Reference
### `WhisperTranscriber`
#### Constructor
```typescript
new WhisperTranscriber(options?: { pythonPath?: string })
```
- `pythonPath` (optional): Path to Python executable. Auto-detects if not provided.
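Auto-detection typically amounts to probing common interpreter names in order. A minimal sketch of that idea (illustrative only; `detectPython` is a hypothetical helper, not the package's actual internals):

```typescript
import { spawnSync } from 'node:child_process';

// Try common interpreter names in order and return the first one that
// responds successfully to `--version`, or null if none is found.
function detectPython(candidates: string[] = ['python3', 'python']): string | null {
  for (const cmd of candidates) {
    const probe = spawnSync(cmd, ['--version'], { encoding: 'utf8' });
    if (probe.status === 0) return cmd;
  }
  return null;
}
```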
#### Methods
##### `transcribe(audioPath: string, options?: WhisperOptions): Promise<WhisperResult>`
Transcribe an audio file.
**Parameters:**
- `audioPath`: Path to the audio file
- `options`: Transcription options
##### `transcribeWithRetry(audioPath: string, options?: WhisperOptions, maxRetries?: number): Promise<WhisperResult>`
Transcribe with automatic retry on failure.
**Parameters:**
- `audioPath`: Path to the audio file
- `options`: Transcription options
- `maxRetries`: Maximum number of retry attempts (default: 3)
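Exponential backoff means each failed attempt waits twice as long as the previous one before retrying. A minimal sketch of the pattern (a generic helper for illustration; the package's internal retry logic may differ):

```typescript
// Retry an async operation, doubling the delay after each failure:
// baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries - 1) {
        await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```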
##### `initialize(): Promise<void>`
Initialize and check/install dependencies.
##### `checkDependencies(): Promise<boolean>`
Check if Python dependencies are installed.
### Types
#### `WhisperOptions`
```typescript
interface WhisperOptions {
  language?: string;   // Language code (e.g., 'en', 'es', 'fr')
  modelSize?: 'tiny' | 'base' | 'small' | 'medium' | 'large';
  pythonPath?: string; // Custom Python path
  cpuOnly?: boolean;   // Force CPU-only mode
  verbose?: boolean;   // Enable verbose logging
}
```
#### `WhisperResult`
```typescript
interface WhisperResult {
  text: string;               // Full transcribed text
  segments: WhisperSegment[]; // Time-aligned segments
  language?: string;          // Detected language
  duration?: number;          // Total audio duration
}
```
#### `WhisperSegment`
```typescript
interface WhisperSegment {
  text: string;          // Segment text
  start: number;         // Start time in seconds
  end: number;           // End time in seconds
  words?: WhisperWord[]; // Word-level timestamps
}
```
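Because `start` and `end` are plain seconds, segments map naturally onto subtitle formats. A small sketch converting segments to SRT (hypothetical helpers, using the same `text`/`start`/`end` shape as above):

```typescript
interface Segment {
  text: string;
  start: number; // seconds
  end: number;   // seconds
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor((ms % 3_600_000) / 60_000);
  const s = Math.floor((ms % 60_000) / 1000);
  const frac = ms % 1000;
  const pad = (n: number, w = 2) => String(n).padStart(w, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(frac, 3)}`;
}

// Render segments as numbered SRT subtitle blocks.
function segmentsToSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${toSrtTime(seg.start)} --> ${toSrtTime(seg.end)}\n${seg.text.trim()}\n`)
    .join('\n');
}
```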
## Model Sizes
| Model | Parameters | English-only | Multilingual | Required VRAM | Relative Speed |
|-------|------------|--------------|--------------|---------------|----------------|
| tiny | 39 M | ✓ | ✓ | ~1 GB | ~32x |
| base | 74 M | ✓ | ✓ | ~1 GB | ~16x |
| small | 244 M | ✓ | ✓ | ~2 GB | ~6x |
| medium| 769 M | ✓ | ✓ | ~5 GB | ~2x |
| large | 1550 M | ✗ | ✓ | ~10 GB | 1x |
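One practical use of this table is picking the largest model that fits available GPU memory. A sketch of that selection logic (illustrative helper; the VRAM figures are the approximate values from the table above):

```typescript
type ModelSize = 'tiny' | 'base' | 'small' | 'medium' | 'large';

// Approximate VRAM requirements in GB, per the table above.
const vramRequired: Record<ModelSize, number> = {
  tiny: 1,
  base: 1,
  small: 2,
  medium: 5,
  large: 10,
};

// Return the largest model that fits in the available VRAM,
// falling back to 'tiny' if nothing fits.
function largestFittingModel(availableGb: number): ModelSize {
  const order: ModelSize[] = ['large', 'medium', 'small', 'base', 'tiny'];
  return order.find(m => vramRequired[m] <= availableGb) ?? 'tiny';
}
```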
## Language Support
Supports 100+ languages including:
- English (`en`)
- Spanish (`es`)
- French (`fr`)
- German (`de`)
- Italian (`it`)
- Portuguese (`pt`)
- Russian (`ru`)
- Chinese (`zh`)
- Japanese (`ja`)
- Korean (`ko`)
- Vietnamese (`vi`)
- And many more...
## Environment Variables
- `WHISPER_CPU_ONLY`: Set to `"1"` to force CPU-only mode
- `WHISPER_VERBOSE`: Set to `"true"` for verbose logging
- `SKIP_WHISPER_SETUP`: Set to `"true"` to skip automatic setup
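The mapping from these variables to option flags can be sketched as follows (a hypothetical `optionsFromEnv` helper for illustration; it is not part of the published API):

```typescript
interface EnvOptions {
  cpuOnly: boolean;
  verbose: boolean;
  skipSetup: boolean;
}

// Parse the environment variables documented above into boolean flags.
// Accepts an env object so the logic is easy to test.
function optionsFromEnv(
  env: Record<string, string | undefined> = process.env
): EnvOptions {
  return {
    cpuOnly: env.WHISPER_CPU_ONLY === '1',
    verbose: env.WHISPER_VERBOSE === 'true',
    skipSetup: env.SKIP_WHISPER_SETUP === 'true',
  };
}
```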
## Requirements
- Node.js >= 16.0.0
- Python >= 3.7
- FFmpeg (for audio processing)
## Troubleshooting
### Python not found
Make sure Python 3.7+ is installed and available in PATH:
```bash
python3 --version
```
### Manual dependency installation
If automatic installation fails:
```bash
pip install openai-whisper torch
```
### GPU Support
For GPU acceleration, install CUDA-enabled PyTorch:
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
## License
MIT
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.