# 🎧 AI Audiobook Maker (AIABM) v5.1.0
[npm](https://www.npmjs.com/package/aiabm) · [License: MIT](https://opensource.org/licenses/MIT) · [Node.js](https://nodejs.org)
Transform your PDFs and text files into high-quality audiobooks using **OpenAI TTS** (cloud) or **Thorsten-Voice** (native German). Choose between premium cloud voices or run everything locally at no cost!
**New in v5.1:** Redesigned UI/UX, flexible output options, optimized Thorsten-Voice loading, and the Downloads folder as the default output location.
## Features
### **Dual TTS Providers**
- **OpenAI TTS**: Premium cloud voices with 6 voice options (requires API key)
- **🇩🇪 Thorsten-Voice**: Native German TTS with authentic pronunciation (local/free)
### **Core Features**
- **Zero Installation**: Run directly with `npx aiabm`
- **Smart File Handling**: Supports PDF and TXT files with drag & drop
- **Voice Preview**: Listen to voices before choosing (2 Thorsten + 6 OpenAI voices)
- **Enhanced Security**: Input sanitization, API key validation, and secure storage
- **Comprehensive Testing**: 55+ unit tests with 12.6% coverage and growing
- **Resume & Pause**: Continue interrupted conversions anytime
- **Secure API Key Management**: Encrypted local storage
- **Progress Tracking**: Real-time conversion progress with estimates
- **Advanced Controls**: Adjust speed, quality, and output format
- **Cost Transparency**: See exact pricing (OpenAI) or run free (local providers)
- **Smart Installation**: Automatic setup for local TTS providers
## Quick Start
### Method 1: Direct Usage (Recommended)
```bash
# Convert a specific file
npx aiabm mybook.pdf
# Interactive mode
npx aiabm
```
### Method 2: Global Installation
```bash
npm install -g aiabm
aiabm mybook.pdf
```
## Prerequisites
### Required
- **Node.js 16+** (Download from [nodejs.org](https://nodejs.org/))
- **FFmpeg** (for audio combining - auto-installed on most systems)
### Optional (Choose One or Both)
**For OpenAI TTS:**
- OpenAI API key (get from [platform.openai.com](https://platform.openai.com/account/api-keys))
- Costs ~$0.015 per 1,000 characters
**For Thorsten-Voice (German TTS):**
- Python 3.9-3.11 (must be installed on your system; detected automatically)
- Coqui TTS (auto-installed)
- **Completely FREE** - runs locally
## Usage Examples
### CLI Mode
```bash
# Basic conversion
npx aiabm document.pdf
# With specific options (OpenAI)
npx aiabm book.txt --voice nova --speed 1.2 --model tts-1-hd
# Manage API key
npx aiabm --config
```
### Interactive Mode
```bash
npx aiabm
```
Then follow the interactive prompts to:
1. **Select TTS Provider** (OpenAI, Fish Speech, or Thorsten-Voice)
2. **Auto-install local providers** if needed (one-time setup)
3. **Select your file** (browse, drag & drop, or enter path)
4. **Preview and choose a voice**
5. **Configure settings** (speed, quality, output format)
6. **Monitor progress** and resume if needed
## Available Voices
### OpenAI TTS (Cloud)
- **Alloy**: Neutral, versatile
- **Echo**: Clear, professional
- **Fable**: Warm, storytelling
- **Onyx**: Deep, authoritative
- **Nova**: Bright, engaging
- **Shimmer**: Gentle, soothing
### Fish Speech (Local/Multilingual)
- **🇩🇪 German Female (Natural)**: High-quality German synthesis
- **🇩🇪 German Male (Clear)**: Professional German voice
- **🇩🇪 German Female (Expressive)**: Emotional German narration
- **🇺🇸 English Female (Warm)**: Natural English voice
- **🇺🇸 English Male (Professional)**: Business-quality English
- **🇺🇸 English Female (Energetic)**: Dynamic storytelling
- **🇫🇷 French Female (Elegant)**: Sophisticated French accent
- **🇫🇷 French Male (Sophisticated)**: Professional French voice
### 🇩🇪 Thorsten-Voice (Native German)
- **🇩🇪 Thorsten (Authentic German Male)**: High-quality native German voice
- **🇩🇪 Thorsten Emotional (German Male)**: German voice with emotional expression
## Pricing
### OpenAI TTS
**$0.015 per 1,000 characters**
| Content Length | Estimated Cost | Example |
|----------------|----------------|---------|
| 10,000 characters | ~$0.15 | Short article |
| 50,000 characters | ~$0.75 | Small e-book |
| 100,000 characters | ~$1.50 | Average novel |
| 250,000 characters | ~$3.75 | Large book |
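The table is straight arithmetic: characters ÷ 1,000 × $0.015. A minimal TypeScript sketch of that estimate, assuming a flat per-character rate; `estimateCostUSD` is an illustrative name, not part of aiabm, and you can pass a different rate if your model is priced differently:

```typescript
// Illustrative cost estimator, not aiabm's internal code.
// Assumes a flat OpenAI TTS rate of $0.015 per 1,000 characters, as quoted above.
const USD_PER_1K_CHARS = 0.015;

function estimateCostUSD(charCount: number, ratePer1K: number = USD_PER_1K_CHARS): number {
  if (!Number.isFinite(charCount) || charCount < 0) {
    throw new RangeError("charCount must be a non-negative, finite number");
  }
  // Round to whole cents so floating-point noise never shows up in the estimate.
  return Math.round((charCount / 1000) * ratePer1K * 100) / 100;
}

// Reproduce the rows of the pricing table.
for (const chars of [10_000, 50_000, 100_000, 250_000]) {
  console.log(`${chars.toLocaleString("en-US")} characters: ~$${estimateCostUSD(chars).toFixed(2)}`);
}
```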
### Fish Speech & Thorsten-Voice
**100% FREE** - No API costs, runs entirely on your machine!
## Local TTS Setup
Both Fish Speech and Thorsten-Voice run entirely on your machine - no API costs! **Now with fully automated installation!**
### Smart Installation (Recommended)
```bash
npx aiabm
# Select "Fish Speech" or "Thorsten-Voice"
# Choose "Auto Install (recommended)"
# The system then downloads and configures everything automatically
```
### Fish Speech Setup
**What happens automatically:**
1. **Repository Cloning** - Downloads latest Fish Speech
2. **Virtual Environment** - Creates isolated Python environment
3. **PyTorch Installation** - Installs optimized CPU version
4. **Model Download** - Downloads Fish Speech 1.2 models (~1GB)
5. **Dependency Check** - Verifies installation works
**System Requirements:**
- **Python 3.8+** recommended
- **~2GB disk space** for models and dependencies
- **4GB+ RAM** recommended
- **CPU or GPU** (GPU faster but optional)
### 🇩🇪 Thorsten-Voice Setup
**What happens automatically:**
1. **Compatible Python Detection** - Finds Python 3.9-3.11
2. **Virtual Environment** - Creates isolated environment
3. **Coqui TTS Installation** - Installs German TTS framework
4. **Thorsten Model** - Downloads German voice model (~500MB)
5. **Compatibility Check** - Verifies everything works
**System Requirements:**
- **Python 3.9-3.11** (Python 3.12 and newer are not supported)
- **~1GB disk space** for models and dependencies
- **2GB+ RAM** recommended
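Before creating the virtual environment, the installer has to decide whether a given interpreter is usable. A minimal sketch of that check, assuming the interpreter's `--version` output has the usual `Python X.Y.Z` form; `parsePythonVersion` and `isThorstenCompatible` are hypothetical names, not aiabm's API:

```typescript
// Hypothetical helpers: decide whether a Python interpreter is in the
// 3.9-3.11 range that Thorsten-Voice (via Coqui TTS) requires.
interface PythonVersion {
  major: number;
  minor: number;
  patch: number;
}

// Parses output such as "Python 3.11.6" (as printed by `python3 --version`).
function parsePythonVersion(output: string): PythonVersion | null {
  const match = /Python\s+(\d+)\.(\d+)(?:\.(\d+))?/.exec(output);
  if (!match) return null;
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3] ?? "0"),
  };
}

function isThorstenCompatible(versionOutput: string): boolean {
  const v = parsePythonVersion(versionOutput);
  // 3.9, 3.10 and 3.11 are accepted; 3.12 and newer (and 3.8 and older) are rejected.
  return v !== null && v.major === 3 && v.minor >= 9 && v.minor <= 11;
}

console.log(isThorstenCompatible("Python 3.11.6")); // true
console.log(isThorstenCompatible("Python 3.13.0")); // false
```

In practice the input would be the combined stdout/stderr of candidates such as `python3.11 --version` and `python3 --version`, taking the first one that passes.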
**Python Version Issues?**
```bash
# Install compatible Python on macOS
brew install python@3.11
# On Ubuntu/Debian
sudo apt install python3.11 python3.11-venv
```
### Installation Status Tracking
- **Smart Detection**: Avoids re-installation if already installed
- **Version Tracking**: Shows installation date and version
- **Update Suggestions**: Recommends updates after 30+ days
- **Installation Markers**: Persistent installation state
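The marker mechanism can be sketched as a small JSON record plus a date comparison. The field names below (`provider`, `version`, `installedAt`), the sample values, and the helper functions are assumptions for illustration; this README does not document aiabm's actual marker format:

```typescript
// Hypothetical installation marker, e.g. persisted as JSON next to the
// provider's install directory. Field names are illustrative only.
interface InstallMarker {
  provider: string;
  version: string;
  installedAt: string; // ISO 8601 timestamp
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function daysSinceInstall(marker: InstallMarker, now: Date = new Date()): number {
  const installed = Date.parse(marker.installedAt);
  if (Number.isNaN(installed)) {
    throw new Error(`Invalid installedAt timestamp: ${marker.installedAt}`);
  }
  return Math.floor((now.getTime() - installed) / MS_PER_DAY);
}

// Mirrors "Recommends updates after 30+ days".
function shouldSuggestUpdate(marker: InstallMarker, now: Date = new Date(), thresholdDays = 30): boolean {
  return daysSinceInstall(marker, now) >= thresholdDays;
}

const marker: InstallMarker = {
  provider: "thorsten-voice",
  version: "1.0.0",
  installedAt: "2025-08-02T12:00:00Z",
};
console.log(JSON.stringify(marker));
console.log(shouldSuggestUpdate(marker, new Date("2025-09-05T12:00:00Z"))); // true (34 days)
```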
## Advanced Features
### Resume Interrupted Conversions
If conversion stops, simply run the tool again - it will automatically detect and offer to resume your previous session.
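Resuming works because finished chunks are recorded as the conversion proceeds, so only the remainder has to be synthesized again. A minimal sketch of that bookkeeping, assuming a session record that tracks completed chunk indices; the `Session` shape and the helper names are hypothetical:

```typescript
// Hypothetical session record; aiabm's real session file format may differ.
interface Session {
  sourceFile: string;
  totalChunks: number;
  completed: number[]; // indices of chunks whose audio already exists
}

// Returns the chunk indices that still need to be synthesized, in order.
function pendingChunks(session: Session): number[] {
  const done = new Set(session.completed);
  const pending: number[] = [];
  for (let i = 0; i < session.totalChunks; i++) {
    if (!done.has(i)) pending.push(i);
  }
  return pending;
}

// Records a finished chunk without mutating the previous session object.
function markCompleted(session: Session, index: number): Session {
  if (!Number.isInteger(index) || index < 0 || index >= session.totalChunks) {
    throw new RangeError(`Chunk index out of range: ${index}`);
  }
  if (session.completed.includes(index)) return session;
  return { ...session, completed: [...session.completed, index] };
}

// Simulate an interrupted run: chunks 0 and 1 finished before the stop.
let session: Session = { sourceFile: "mybook.pdf", totalChunks: 5, completed: [] };
session = markCompleted(markCompleted(session, 0), 1);
// Persisting is plain JSON; on restart the session is read back and resumed.
const restored: Session = JSON.parse(JSON.stringify(session));
console.log(pendingChunks(restored)); // [ 2, 3, 4 ]
```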
### Multiple Output Formats
- **Single File**: One complete audiobook MP3
- **Chapter Files**: Separate MP3 per chunk
- **Both**: Get both formats
### Voice Preview Caching
Voice previews are cached locally to save API costs and improve performance.
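One way to make such a cache effective is to derive the file name deterministically from everything that affects the audio, so an identical preview request always maps to the same file. A sketch under that assumption; the naming scheme and `previewCachePath` are hypothetical, and only the per-provider subdirectory idea comes from the changelog:

```typescript
import { createHash } from "node:crypto";
import { join } from "node:path";

// Hypothetical cache naming scheme: one subdirectory per provider and a
// file name derived from a hash of (provider, voice, preview text).
function previewCachePath(cacheDir: string, provider: string, voice: string, text: string): string {
  const digest = createHash("sha256")
    .update(`${provider}\u0000${voice}\u0000${text}`) // NUL separators avoid ambiguous concatenation
    .digest("hex")
    .slice(0, 16);
  // Make the voice name filesystem-friendly, e.g. "German Female (Natural)" -> "german-female-natural".
  const safeVoice = voice.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  return join(cacheDir, provider, `${safeVoice}-${digest}.mp3`);
}

const first = previewCachePath("/tmp/aiabm-cache", "openai", "nova", "Hello from AIABM.");
const second = previewCachePath("/tmp/aiabm-cache", "openai", "nova", "Hello from AIABM.");
console.log(first === second); // true: same request, same file, no second API call
```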
### Smart Text Chunking
- Respects sentence boundaries
- Preserves chapter structure for PDFs
- Configurable chunk sizes (default: 4000 characters)
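A simplified TypeScript sketch of sentence-aware chunking with the default 4000-character limit. The regex-based sentence splitter and the hard split for a single oversized sentence are assumptions, and PDF chapter handling is not covered; this is not aiabm's actual implementation:

```typescript
// Simplified sentence-aware chunker (illustrative, not aiabm's code).
// Sentences are packed greedily into chunks of at most maxChars characters.
function chunkText(text: string, maxChars = 4000): string[] {
  const normalized = text.replace(/\s+/g, " ").trim();
  // A "sentence" is any run ending in . ! or ? (plus closing quotes/brackets),
  // or the trailing text that has no terminal punctuation.
  const sentences = normalized.match(/[^.!?]*[.!?]+["')\]]*\s*|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = "";

  for (const raw of sentences) {
    const sentence = raw.trim();
    if (sentence === "") continue;

    // A single sentence longer than the limit is hard-split as a fallback.
    if (sentence.length > maxChars) {
      if (current !== "") {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < sentence.length; i += maxChars) {
        chunks.push(sentence.slice(i, i + maxChars));
      }
      continue;
    }

    const candidate = current === "" ? sentence : `${current} ${sentence}`;
    if (candidate.length > maxChars) {
      chunks.push(current); // close the current chunk at a sentence boundary
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

console.log(chunkText("One. Two. Three.", 9)); // [ 'One. Two.', 'Three.' ]
```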
## File Support
### PDF Files
- ✅ Up to 50MB
- ✅ Text extraction with structure preservation
- ✅ Automatic chapter detection
### Text Files
- ✅ Up to 1M characters
- ✅ UTF-8 encoding
- ✅ Automatic formatting cleanup
## What's New in v5.0
### **Enhanced Security**
- **Input Sanitization**: Prevents code injection and malicious input
- **API Key Validation**: Comprehensive security checks for OpenAI keys
- **Secure Storage**: Encrypted API key storage with multiple layers
- **Environment Assessment**: Automatic security environment analysis
### **Comprehensive Testing**
- **55+ Unit Tests**: Extensive test coverage for core functionality
- **12.6% Code Coverage**: Growing test suite with focus on critical paths
- **Mocked Services**: Fast, reliable tests without external dependencies
- **CI/CD Pipeline**: Automated testing on every commit
### **Better Error Handling**
- **Type-Safe Validation**: Zod schemas for all configuration and data
- **Graceful Failures**: Better error messages and recovery mechanisms
- **Logging & Monitoring**: Detailed error tracking and user feedback
### **Developer Experience**
- **GitHub Actions**: Automated CI/CD with security auditing
- **ESLint Clean**: Zero linting errors with consistent code style
- **Documentation**: Comprehensive inline documentation and examples
---
## Configuration
### API Key Storage
Your OpenAI API key is encrypted and stored locally at:
- **macOS/Linux**: `~/.config/ai-audiobook-maker/config.json`
- **Windows**: `%APPDATA%\ai-audiobook-maker\config.json`
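The platform-specific locations above can be resolved with Node's `os` and `path` modules. A sketch that mirrors the documented paths; `configPaths` is a hypothetical helper, and the fallback used when `%APPDATA%` is unset is an assumption:

```typescript
import { homedir } from "node:os";
import { posix, win32 } from "node:path";

// Hypothetical helper mirroring the documented locations:
//   macOS/Linux: ~/.config/ai-audiobook-maker/config.json
//   Windows:     %APPDATA%\ai-audiobook-maker\config.json
function configPaths(
  platform: string = process.platform,
  env: Record<string, string | undefined> = process.env,
  home: string = homedir(),
): { dir: string; file: string } {
  // Use the path flavor of the target platform, not of the machine running this code.
  const p = platform === "win32" ? win32 : posix;
  const base =
    platform === "win32"
      ? env.APPDATA ?? win32.join(home, "AppData", "Roaming") // assumed fallback
      : posix.join(home, ".config");
  const dir = p.join(base, "ai-audiobook-maker");
  return { dir, file: p.join(dir, "config.json") };
}

console.log(configPaths("linux", {}, "/home/alice").file);
// /home/alice/.config/ai-audiobook-maker/config.json
```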
### Cache Location
Voice previews and temporary files:
- **macOS/Linux**: `~/.config/ai-audiobook-maker/cache/`
- **Windows**: `%APPDATA%\ai-audiobook-maker\cache\`
### Local TTS Installations
Local TTS providers are installed to:
- **Fish Speech**: `~/.aiabm/fish-speech/`
- **Thorsten-Voice**: `~/.aiabm/thorsten-voice/`
## Troubleshooting
### Common Issues
**"FFmpeg not found"**
```bash
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
# Windows
# Download from https://ffmpeg.org/download.html
```
**"API key invalid"**
- Verify your key at [OpenAI Platform](https://platform.openai.com/account/api-keys)
- Use `npx aiabm --config` to update your key
**"File too large"**
- PDFs: Maximum 50MB
- Text: Maximum 1M characters
- Split large files before conversion
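These limits can be checked before any conversion starts. A minimal sketch, assuming "50MB" means 50 × 1024 × 1024 bytes and "1M characters" means 1,000,000 Unicode code points; `checkPdfSize` and `checkTextLength` are hypothetical helpers, not aiabm's API:

```typescript
// Hypothetical pre-flight checks mirroring the documented limits.
const MAX_PDF_BYTES = 50 * 1024 * 1024; // assumption: 50MB = 50 MiB
const MAX_TEXT_CHARS = 1_000_000; // assumption: counted in Unicode code points

type LimitCheck = { ok: true } | { ok: false; reason: string };

function checkPdfSize(sizeBytes: number): LimitCheck {
  return sizeBytes <= MAX_PDF_BYTES
    ? { ok: true }
    : { ok: false, reason: `PDF is ${(sizeBytes / 1024 / 1024).toFixed(1)}MB; the maximum is 50MB` };
}

function checkTextLength(text: string): LimitCheck {
  let count = 0;
  // for...of iterates code points, so emoji and other astral characters count once.
  for (const _ of text) {
    if (++count > MAX_TEXT_CHARS) {
      return { ok: false, reason: "Text exceeds 1,000,000 characters; split it before converting" };
    }
  }
  return { ok: true };
}

console.log(checkPdfSize(12 * 1024 * 1024)); // { ok: true }
console.log(checkTextLength("x".repeat(1_000_001)).ok); // false
```

With real files, the inputs would come from `fs.statSync(path).size` and `fs.readFileSync(path, "utf8")`.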
**"Fish Speech dependencies missing"**
- Check Python version: `python3 --version`
- Try restarting the app
- Virtual environment issues usually resolve on restart
**"Thorsten-Voice requires Python 3.9-3.11"**
- Install compatible Python: `brew install python@3.11`
- App will automatically detect and use it
- Creates separate virtual environment
**Voice preview not playing**
- macOS: Uses built-in `afplay`
- Windows: Uses PowerShell media player
- Linux: Requires `ffplay`, `mpv`, `vlc`, or `mplayer`
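The per-platform fallback above can be sketched as a function that returns a command to spawn. The player arguments and the PowerShell `Start-Process` call (which opens the file in the default media player) are assumptions, and `previewPlayer` / `isOnPath` are hypothetical names, not aiabm's code:

```typescript
import { existsSync } from "node:fs";
import { delimiter, join } from "node:path";

interface PlayerCommand {
  command: string;
  args: string[];
}

// True if an executable with this name exists in one of the PATH directories.
function isOnPath(cmd: string): boolean {
  return (process.env.PATH ?? "")
    .split(delimiter)
    .some((dir) => dir !== "" && existsSync(join(dir, cmd)));
}

// Hypothetical player selection following the rules listed above.
function previewPlayer(
  file: string,
  platform: string = process.platform,
  available: (cmd: string) => boolean = isOnPath,
): PlayerCommand | null {
  if (platform === "darwin") return { command: "afplay", args: [file] };
  if (platform === "win32") {
    // Assumption: hand the file to the default media player via PowerShell.
    const quoted = `'${file.replace(/'/g, "''")}'`; // PowerShell single-quote escaping
    return { command: "powershell", args: ["-NoProfile", "-Command", `Start-Process -FilePath ${quoted}`] };
  }
  // Linux and others: first available of ffplay, mpv, vlc, mplayer.
  const candidates: PlayerCommand[] = [
    { command: "ffplay", args: ["-nodisp", "-autoexit", "-loglevel", "quiet", file] },
    { command: "mpv", args: ["--no-video", "--really-quiet", file] },
    { command: "vlc", args: ["--intf", "dummy", "--play-and-exit", file] },
    { command: "mplayer", args: ["-really-quiet", file] },
  ];
  return candidates.find((c) => available(c.command)) ?? null;
}

const player = previewPlayer("preview.mp3", "linux", (cmd) => cmd === "mpv");
console.log(player?.command); // mpv
```

To actually play the preview, the result can be passed to `child_process.spawn(player.command, player.args, { stdio: "ignore" })`.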
### Performance Tips
- Use `tts-1` model for faster processing
- Use `tts-1-hd` for higher quality (slower)
- Local TTS providers are free but slower than cloud
- Cache clears automatically after 30 days
- Resume feature prevents re-processing completed chunks
## Privacy & Security
- API keys are encrypted locally using AES-192
- No data is sent to servers when using local TTS
- OpenAI TTS sends only text chunks to OpenAI servers
- Cache files are stored locally only
- Session data helps resume interrupted conversions
- Local TTS models run entirely offline
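For illustration, AES-192 encryption of an API key with Node's built-in `crypto` module might look like the sketch below. The README states only the cipher family; the GCM mode, the scrypt key derivation, the passphrase source, and the `EncryptedBlob` layout are all assumptions, not aiabm's actual storage format:

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Hypothetical on-disk layout: every field hex-encoded so it fits in JSON.
interface EncryptedBlob {
  salt: string;
  iv: string;
  tag: string;
  data: string;
}

function encryptApiKey(apiKey: string, passphrase: string): EncryptedBlob {
  const salt = randomBytes(16);
  const key = scryptSync(passphrase, salt, 24); // 24 bytes = 192-bit AES key
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-192-gcm", key, iv);
  const data = Buffer.concat([cipher.update(apiKey, "utf8"), cipher.final()]);
  return {
    salt: salt.toString("hex"),
    iv: iv.toString("hex"),
    tag: cipher.getAuthTag().toString("hex"),
    data: data.toString("hex"),
  };
}

function decryptApiKey(blob: EncryptedBlob, passphrase: string): string {
  const key = scryptSync(passphrase, Buffer.from(blob.salt, "hex"), 24);
  const decipher = createDecipheriv("aes-192-gcm", key, Buffer.from(blob.iv, "hex"));
  decipher.setAuthTag(Buffer.from(blob.tag, "hex"));
  // final() throws if the passphrase is wrong or the data was tampered with.
  return Buffer.concat([decipher.update(Buffer.from(blob.data, "hex")), decipher.final()]).toString("utf8");
}

const blob = encryptApiKey("sk-example-not-a-real-key", "machine-specific-secret");
console.log(decryptApiKey(blob, "machine-specific-secret")); // sk-example-not-a-real-key
```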
## Examples
### Converting a PDF Book with German Voice
```bash
npx aiabm "Mein Roman.pdf"
# Select "Thorsten-Voice"
# Choose German voice
# Enjoy authentic German pronunciation!
```
### Interactive Multilingual Setup
```bash
npx aiabm
# Select "Fish Speech"
# Auto-install if needed
# Preview German, English, and French voices
# Choose your favorite for the content language
```
### Quick OpenAI Conversion
```bash
npx aiabm document.pdf --voice nova --speed 1.1
```
## Contributing
Issues and feature requests are welcome at [GitHub Issues](https://github.com/iamthamanic/AI-Audiobook-Maker/issues).
## License
MIT License - see LICENSE file for details
## Acknowledgments
- Built on OpenAI's TTS API, Fish Speech, and Thorsten-Voice/Coqui TTS
- Fish Speech: https://github.com/fishaudio/fish-speech
- Thorsten-Voice: https://github.com/thorstenMueller/Thorsten-Voice
- Coqui TTS: https://github.com/coqui-ai/TTS
- Uses FFmpeg for audio processing
## Changelog
### v4.0.7 (2025-08-03) - Fish Speech Fully Fixed & Operational
- **Fish Speech 100% Working** - Complete resolution of all Fish Speech TTS issues
- **Fixed tokenizer.tiktoken** - Proper base64 encoding of 32,000 tokens from Fish Speech
- **Model Configuration Fixed** - Created correct firefly_gan_vq.yaml matching model architecture
- **Dimension Mismatch Resolved** - Fixed 512-dim vs 1024-dim PyTorch tensor issues
- **Parameter Validation Fixed** - Corrected ServeTTSRequest use_memory_cache format
- **End-to-End Functionality** - Text-to-semantic and decoder models load perfectly
- **Full Service Availability** - Fish Speech now detected as available and operational
### v4.0.6 (2025-08-03) - Comprehensive Test Coverage & TTS Fixes
- **Major Test Coverage Improvement** - 20% to 45.07% overall coverage (+125% improvement)
- **AudiobookMaker.js Tests** - 0% to 42.58% coverage with integration tests
- **ConfigManager.js Tests** - 0% to 98.03% coverage with security tests
- **FileHandler.js Tests** - 0% to 72.99% coverage with core functionality tests
- **cli.js Tests** - 0% to 75.75% coverage with end-to-end tests
- **Fish Speech Fixed** - Installation detection and availability checking
- 🇩🇪 **Thorsten Voice Fixed** - Python 3.13 compatibility and installation issues
- **207 Total Tests** - 195 passing with comprehensive edge case coverage
- **Integration Tests** - Real-world testing with actual TTS services and PDF processing
- **Robust Error Handling** - Enhanced service availability validation
### v4.0.5 (2025-08-03) - Unified Preview System
- **Unified Preview Texts** - Consistent voice previews across all TTS providers
- **Language-Specific Previews** - German, English, and French preview texts
- **Smart Caching** - Consistent cache filenames prevent preview regeneration
- **Voice Language Detection** - Automatic language detection from voice names
- **Cache Optimization** - Separate preview cache directories for each provider
- **Better Performance** - No more regenerating previews when switching providers
### v4.0.4 (2025-08-03) - Fish Speech Engine Fix
- **Fixed TTSInferenceEngine initialization** - Use proper ModelManager pattern
- **Implemented correct model loading** - Load LLaMA and DAC models separately
- **Auto-device detection** - Support for MPS (Apple Silicon), CUDA, and CPU
- **Better model management** - Use launch_thread_safe_queue for text-to-semantic
- **Improved generation flow** - Proper model initialization before inference
### v4.0.3 (2025-08-03) - Fish Speech Import Fix
- **Fixed MODDED_DAC import** - Changed to correct DAC import from inference_engine
- **Added missing torch import** - Fixed undefined torch reference in generation script
- **Simplified dependency check** - Import DAC directly from inference_engine
- **Better module verification** - Check ServeTTSRequest schema availability
### v4.0.2 (2025-08-03) - Fish Speech API Update
- **Fixed Fish Speech dependency check** - Updated to use current DAC-based architecture
- **Removed deprecated VQGAN imports** - Fish Speech now uses DAC (Descript Audio Codec)
- **Updated generation script** - Uses modern TTSInferenceEngine API
- **Better installation handling** - Auto-removes incomplete installations
- **Improved pip install** - Installs Fish Speech package in development mode
- **Enhanced error reporting** - More detailed debugging information
### v4.0.1 (2025-08-02) - Installation & Compatibility Fixes
- **Fixed Fish Speech virtual environment usage** - Proper dependency checking
- **Enhanced Python version detection** - Blocks Thorsten-Voice on Python 3.13+
- **Smart installation status tracking** - Avoids unnecessary re-installations
- **Installation markers** - Persistent installation state with version info
- **Better error handling** - More informative error messages and recovery
- **Improved user guidance** - Clear instructions for Python compatibility issues
### v4.0.0 (2025-08-02) - Major Refactoring
- **REMOVED**: Kyutai TTS (replaced due to Python 3.13 compatibility issues)
- **NEW**: Fish Speech integration - State-of-the-art multilingual TTS
- 🇩🇪 **NEW**: Thorsten-Voice integration - Native German TTS
- **Enhanced Voice Selection**: 16 total voices across 3 providers
- **Automated Installation**: One-click setup for local TTS providers
- **Improved Architecture**: Better service abstraction and error handling
- **Enhanced Testing**: 80%+ test coverage with Jest
- **Code Quality Tools**: ESLint, Prettier, Snyk integration
- **Backward Compatibility**: 100% compatibility with existing OpenAI workflows
### v3.3.0 (2025-08-01) - Kyutai Integration (Deprecated)
- π Kyutai TTS integration (now removed in v4.0.0)
- ποΈ Automated installation system
- π€ 15+ voice options
- π Provider selection system
---
**Happy listening! 🎧** Turn any text into your personal audiobook library with the best TTS technology available.