audio-to-text-node
Version:
Backend audio file to text transcription using Web Speech API with Puppeteer
226 lines (151 loc) โข 7.24 kB
Markdown
# ๐ง audio-to-text
A free and robust backend package for transcribing audio files to text using the [Web Speech API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API).
## Features
- โ
Convert audio files to text
- ๐ค Supports multiple languages
- ๐ง Uses **Web Speech API** inside a headless browser (via Puppeteer)
- ๐ Streams audio using a virtual microphone
- ๐พ Supports all audio file formats supported by ffmpeg (e.g., .mp3, .wav, .ogg, .m4a, etc.)
- ๐ช Automatically sets up required audio routing using `pactl` and `paplay`
- โ๏ธ Works in Linux environments with PipeWire or PulseAudio
## ๐ Requirements
Before installing and using this package, please ensure the following dependencies are installed and properly configured on your system:
- [**ffmpeg**](https://ffmpeg.org/) โ for audio format conversion and processing
- [**ffprobe**](https://ffmpeg.org/) โ for audio validation (comes with ffmpeg)
- [**PipeWire**](https://pipewire.org/) โ **RECOMMENDED** modern audio server
- [**PulseAudio**](https://www.freedesktop.org/wiki/Software/PulseAudio/) โ alternative audio server (older systems)
- [**pactl**](https://docs.pipewire.org/) โ Audio control tool
- [**paplay**](https://docs.pipewire.org/) โ Audio playback utility
- [**Microsoft Edge**](https://www.microsoft.com/en-us/edge) โ Microsoft Edge Browser
- [**Google Chrome**](https://www.google.com/chrome/) or [**Chromium**](https://www.chromium.org/) โ browsers
- [**Node.js**](https://nodejs.org/) โ version 18 or higher is recommended
- [**bun**](https://bun.sh/) โ optional, recommended for development and build tasks
- **Internet connection** (required for browser-based speech recognition)
## Install on Ubuntu/Debian:
### PipeWire (Recommended - Modern Audio Server):
```zsh
sudo apt update
sudo apt install ffmpeg pipewire pipewire-pulse wireplumber
```
### PulseAudio (Alternative - Older Systems):
```zsh
sudo apt update
sudo apt install ffmpeg pulseaudio-utils pulseaudio
```
## ๐ Permissions
- Make sure Node.js has permission to run `pactl` and `paplay`
- Puppeteer will launch a headless browser and use your virtual audio devices
## ๐ฆ Installation
To install with Bun:
```bash
bun add audio-to-text-node
```
Or with npm:
```bash
npm install audio-to-text-node
```
## ๐งผ Cleanup
The package creates temporary folders in `/tmp/audio-to-text` and cleans them up automatically after use.
## โจ Usage
```typescript
import { transcribeFromFile } from "audio-to-text-node";
async function main() {
const transcript = await transcribeFromFile("/path/to/audio.wav", {
language: "en-US",
executablePath: "/usr/bin/microsoft-edge",
speakerDevice: "virtual_speaker",
microphoneDevice: "virtual_microphone",
});
console.log(transcript);
}
main();
```
## Tested Distributions
| Distribution | Version | Status |
| ------------ | ------- | ---------------- |
| Ubuntu | 24.10 | โ
Fully Tested |
| MacOS | - | โ Not Supported |
| Windows | - | โ Not Supported |
> **Note:** This package is designed for Linux environments.
## ๐ API Reference
### ๐ง `transcribeFromFile(filePath: string, options?: { language?: string; executablePath?: string; speakerDevice?: string; microphoneDevice?: string }): Promise<string>`
| ๐งฉ Parameter | ๐ Type | ๐ Description | ๐งต Default |
| -------------------------- | -------- | ------------------------------------------------------ | ---------------------- |
| `filePath` | `string` | Path to the audio file (`.wav`, `.mp3`, `.ogg`, etc.) | โ |
| `options.language` | `string` | Language code for transcription | `'en-US'` |
| `options.executablePath` | `string` | Path to browser executable | Auto-detected |
| `options.speakerDevice` | `string` | Virtual speaker device name (PipeWire/PulseAudio) | `'virtual_speaker'` |
| `options.microphoneDevice` | `string` | Virtual microphone device name (PipeWire/PulseAudio) | `'virtual_microphone'` |
#### Browser Detection Priority:
1. **Microsoft Edge** - `/usr/bin/microsoft-edge`
2. **Google Chrome** - `/usr/bin/google-chrome`
3. **Chromium** - `/usr/bin/chromium-browser`
๐ **Returns:** `Promise<string>` โ The transcribed text.
### โ๏ธ How it works:
1. โ
Validates and splits the audio file into 5-second chunks
2. ๐ Sets up virtual audio devices for routing (PipeWire/PulseAudio)
3. ๐งญ Launches a headless browser and uses Web Speech API for transcription
4. ๐งน Cleans up temporary files and restores audio routing
## ๐ต Supported Audio Formats
This package supports all audio formats supported by ffmpeg. For a full list, see:
- [FFmpeg Supported File Formats](https://ffmpeg.org/general.html#File-Formats)
Common formats include: `.wav`, `.mp3`, `.ogg`, `.flac`, `.aac`, `.m4a`, and more.
## ๐ Supported Languages
You can use any language supported by the Web Speech API and Google Speech-to-Text. For a full list, see:
- [Google Speech-to-Text Supported Languages](https://cloud.google.com/speech-to-text/docs/speech-to-text-supported-languages)
Specify the language code (e.g., `en-US`, `fa-IR`, `fr-FR`, etc.) in the `language` option.
## ๐ ๏ธ Troubleshooting
- Ensure all prerequisites are installed and available in your PATH (`which ffmpeg`, `which ffprobe`, `which pactl`, `which paplay`)
- For best audio performance: Use PipeWire (modern) over PulseAudio (legacy)
- For long audio files, ensure enough disk space in `/tmp`
- If you get permission errors, run with appropriate user rights
- For best results, use high-quality audio files (16kHz mono recommended)
- Make sure your connection is stable and not interrupted during transcription
- Only Linux with PipeWire or PulseAudio is supported
- If browser detection fails, explicitly set `executablePath` to your browser location
### Common Browser Paths:
```bash
# Check if browsers are installed
which microsoft-edge
which google-chrome
which chromium-browser
```
### Audio System Check:
```bash
# Check if PipeWire is running (recommended)
systemctl --user status pipewire pipewire-pulse
# Check if PulseAudio is running (alternative)
systemctl --user status pulseaudio
# Test audio commands
which pactl paplay # Should work with both PipeWire and PulseAudio
```
## ๐ Changelog
### Version 0.2.0 (Latest)
- ๐ **BREAKING**: Switched from `puppeteer` to `puppeteer-core` for better control
- โจ Added **multi-browser support** with automatic detection (Edge, Chrome, Chromium)
- โ๏ธ Added `executablePath` option to specify custom browser location
- ๐ต Added **PipeWire support** (recommended audio server)
## ๐ฌ Contributing
Pull requests and issues are welcome!<br>
Please open issues for any bugs or feature requests.
When contributing, please:
- Use clear commit messages
- Follow TypeScript best practices
## ๐ License
MIT ยฉ 2025 [ErfanBahramali](https://github.com/ErfanBahramali)