# 🎧 AI Audiobook Maker (AIABM) v5.1.0

[![npm version](https://img.shields.io/npm/v/aiabm.svg)](https://www.npmjs.com/package/aiabm)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Node.js Version](https://img.shields.io/node/v/ai-audiobook-maker.svg)](https://nodejs.org)

Transform your PDFs and text files into high-quality audiobooks using **OpenAI TTS** (cloud) or **Thorsten-Voice** (native German). Choose between premium cloud voices or run everything locally at no cost!

**🆕 New in v5.1:** Beautiful UI/UX overhaul, flexible output options, optimized Thorsten-Voice loading, and Downloads folder as default output.

## ✨ Features

### 🎙️ **Dual TTS Providers**

- **☁️ OpenAI TTS**: Premium cloud voices with 6 voice options (requires API key)
- **🇩🇪 Thorsten-Voice**: Native German TTS with authentic pronunciation (local/free)

### 🚀 **Core Features**

- **🚀 Zero Installation**: Run directly with `npx aiabm`
- **📁 Smart File Handling**: Supports PDF and TXT files with drag & drop
- **🎤 Voice Preview**: Listen to voices before choosing (2 Thorsten + 6 OpenAI voices)
- **🔒 Enhanced Security**: Input sanitization, API key validation, and secure storage
- **🧪 Comprehensive Testing**: 55+ unit tests with 12.6% coverage and growing
- **⏸️ Resume & Pause**: Continue interrupted conversions anytime
- **🔐 Secure API Key Management**: Encrypted local storage
- **📊 Progress Tracking**: Real-time conversion progress with estimates
- **🎛️ Advanced Controls**: Adjust speed, quality, and output format
- **💰 Cost Transparency**: See exact pricing (OpenAI) or run free (local providers)
- **🔧 Smart Installation**: Automatic setup for local TTS providers

## 🚀 Quick Start

### Method 1: Direct Usage (Recommended)

```bash
# Convert a specific file
npx aiabm mybook.pdf

# Interactive mode
npx aiabm
```

### Method 2: Global Installation

```bash
npm install -g aiabm
aiabm mybook.pdf
```

## 📋 Prerequisites

### Required

- **Node.js 16+** (Download from [nodejs.org](https://nodejs.org/))
- **FFmpeg** (for audio combining - auto-installed on most systems)

### Optional (Choose One or Both)

**For OpenAI TTS:**

- OpenAI API key (get from [platform.openai.com](https://platform.openai.com/account/api-keys))
- Costs ~$0.015 per 1,000 characters

**For Thorsten-Voice (German TTS):**

- Python 3.9-3.11 (auto-installed)
- Coqui TTS (auto-installed)
- **Completely FREE** - runs locally

## 🎯 Usage Examples

### CLI Mode

```bash
# Basic conversion
npx aiabm document.pdf

# With specific options (OpenAI)
npx aiabm book.txt --voice nova --speed 1.2 --model tts-1-hd

# Manage API key
npx aiabm --config
```

### Interactive Mode

```bash
npx aiabm
```

Then follow the interactive prompts to:

1. **Select TTS Provider** (OpenAI, Fish Speech, or Thorsten-Voice)
2. **Auto-install local providers** if needed (one-time setup)
3. **Select your file** (browse, drag & drop, or enter path)
4. **Preview and choose a voice**
5. **Configure settings** (speed, quality, output format)
6. **Monitor progress** and resume if needed

## 🎤 Available Voices

### 🤖 OpenAI TTS (Cloud)

- **Alloy**: Neutral, versatile
- **Echo**: Clear, professional
- **Fable**: Warm, storytelling
- **Onyx**: Deep, authoritative
- **Nova**: Bright, engaging
- **Shimmer**: Gentle, soothing

### 🐟 Fish Speech (Local/Multilingual)

- **🇩🇪 German Female (Natural)**: High-quality German synthesis
- **🇩🇪 German Male (Clear)**: Professional German voice
- **🇩🇪 German Female (Expressive)**: Emotional German narration
- **🇺🇸 English Female (Warm)**: Natural English voice
- **🇺🇸 English Male (Professional)**: Business-quality English
- **🇺🇸 English Female (Energetic)**: Dynamic storytelling
- **🇫🇷 French Female (Elegant)**: Sophisticated French accent
- **🇫🇷 French Male (Sophisticated)**: Professional French voice

### 🇩🇪 Thorsten-Voice (Native German)

- **🇩🇪 Thorsten (Authentic German Male)**: High-quality native German voice
- **🇩🇪 Thorsten Emotional (German Male)**: German voice with emotional expression

## 💰 Pricing

### OpenAI TTS

**$0.015 per 1,000 characters**

| Content Length | Estimated Cost | Example |
|----------------|----------------|---------|
| 10,000 characters | ~$0.15 | Short article |
| 50,000 characters | ~$0.75 | Small e-book |
| 100,000 characters | ~$1.50 | Average novel |
| 250,000 characters | ~$3.75 | Large book |

### Fish Speech & Thorsten-Voice

**100% FREE** - No API costs, runs entirely on your machine!

## 🔧 Local TTS Setup

Both Fish Speech and Thorsten-Voice run entirely on your machine - no API costs! **Now with fully automated installation!**

### 🚀 Smart Installation (Recommended)

```bash
npx aiabm
# Select "Fish Speech" or "Thorsten-Voice"
# Choose "Auto Install (recommended)"
# → System automatically downloads and configures everything!
```

### 🐟 Fish Speech Setup

**What happens automatically:**

1. **📦 Repository Cloning** - Downloads latest Fish Speech
2. **🐍 Virtual Environment** - Creates isolated Python environment
3. **⚡ PyTorch Installation** - Installs optimized CPU version
4. **🤖 Model Download** - Downloads Fish Speech 1.2 models (~1GB)
5. **✅ Dependency Check** - Verifies installation works

**System Requirements:**

- **Python 3.8+** recommended
- **~2GB disk space** for models and dependencies
- **4GB+ RAM** recommended
- **CPU or GPU** (GPU faster but optional)

### 🇩🇪 Thorsten-Voice Setup

**What happens automatically:**

1. **🐍 Compatible Python Detection** - Finds Python 3.9-3.11
2. **📦 Virtual Environment** - Creates isolated environment
3. **🎤 Coqui TTS Installation** - Installs German TTS framework
4. **🤖 Thorsten Model** - Downloads German voice model (~500MB)
5. **✅ Compatibility Check** - Verifies everything works

**System Requirements:**

- **Python 3.9-3.11** (not 3.12 or newer)
- **~1GB disk space** for models and dependencies
- **2GB+ RAM** recommended

**Python Version Issues?**

```bash
# Install compatible Python on macOS
brew install python@3.11

# On Ubuntu/Debian
sudo apt install python3.11 python3.11-venv
```

### 🔧 Installation Status Tracking

- **✅ Smart Detection**: Avoids re-installation if already installed
- **📅 Version Tracking**: Shows installation date and version
- **🔄 Update Suggestions**: Recommends updates after 30+ days
- **🛠️ Installation Markers**: Persistent installation state

## 🔧 Advanced Features

### Resume Interrupted Conversions

If a conversion stops, simply run the tool again. It automatically detects the previous session and offers to resume it.

### Multiple Output Formats

- **Single File**: One complete audiobook MP3
- **Chapter Files**: Separate MP3 per chunk
- **Both**: Get both formats

### Voice Preview Caching

Voice previews are cached locally to save API costs and improve performance.
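
The caching idea above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the function names `previewCacheKey` and `isCacheEntryExpired`, the `previews/<provider>/<voice>.mp3` layout, and the use of a file's modification time are all hypothetical. Only the 30-day lifetime comes from this README (see Performance Tips).

```javascript
// Hypothetical sketch of a voice-preview cache; names and layout are assumptions.
const CACHE_MAX_AGE_DAYS = 30; // README: "Cache clears automatically after 30 days"
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Build a stable, filesystem-safe key so the same provider/voice pair always
// maps to the same file and its preview only has to be generated once.
function previewCacheKey(provider, voice) {
  const safe = (s) =>
    s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  return `previews/${safe(provider)}/${safe(voice)}.mp3`;
}

// An entry counts as stale once it is older than the maximum age.
function isCacheEntryExpired(modifiedAtMs, nowMs = Date.now()) {
  return nowMs - modifiedAtMs > CACHE_MAX_AGE_DAYS * MS_PER_DAY;
}

console.log(previewCacheKey("OpenAI", "Nova"));
console.log(previewCacheKey("Thorsten-Voice", "Thorsten Emotional"));
```

Keeping one directory per provider mirrors the "separate preview cache directories for each provider" note in the v4.0.5 changelog entry. A deterministic key is what lets a later run find the existing file instead of calling the TTS provider again.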

### Smart Text Chunking

- Respects sentence boundaries
- Preserves chapter structure for PDFs
- Configurable chunk sizes (default: 4000 characters)

## 📂 File Support

### PDF Files

- ✅ Up to 50MB
- ✅ Text extraction with structure preservation
- ✅ Automatic chapter detection

### Text Files

- ✅ Up to 1M characters
- ✅ UTF-8 encoding
- ✅ Automatic formatting cleanup

## 🆕 What's New in v5.0

### 🔒 **Enhanced Security**

- **Input Sanitization**: Prevents code injection and malicious input
- **API Key Validation**: Comprehensive security checks for OpenAI keys
- **Secure Storage**: Encrypted API key storage with multiple layers
- **Environment Assessment**: Automatic security environment analysis

### 🧪 **Comprehensive Testing**

- **55+ Unit Tests**: Extensive test coverage for core functionality
- **12.6% Code Coverage**: Growing test suite with focus on critical paths
- **Mocked Services**: Fast, reliable tests without external dependencies
- **CI/CD Pipeline**: Automated testing on every commit

### 🛡️ **Better Error Handling**

- **Type-Safe Validation**: Zod schemas for all configuration and data
- **Graceful Failures**: Better error messages and recovery mechanisms
- **Logging & Monitoring**: Detailed error tracking and user feedback

### 🎯 **Developer Experience**

- **GitHub Actions**: Automated CI/CD with security auditing
- **ESLint Clean**: Zero linting errors with consistent code style
- **Documentation**: Comprehensive inline documentation and examples

---

## ⚙️ Configuration

### API Key Storage

Your OpenAI API key is encrypted and stored locally at:

- **macOS/Linux**: `~/.config/ai-audiobook-maker/config.json`
- **Windows**: `%APPDATA%\ai-audiobook-maker\config.json`

### Cache Location

Voice previews and temporary files:

- **macOS/Linux**: `~/.config/ai-audiobook-maker/cache/`
- **Windows**: `%APPDATA%\ai-audiobook-maker\cache\`

### Local TTS Installations

Local TTS providers are installed to:

- **Fish Speech**: `~/.aiabm/fish-speech/`
- **Thorsten-Voice**: `~/.aiabm/thorsten-voice/`

## 🛠️ Troubleshooting

### Common Issues

**"FFmpeg not found"**

```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html
```

**"API key invalid"**

- Verify your key at [OpenAI Platform](https://platform.openai.com/account/api-keys)
- Use `npx aiabm --config` to update your key

**"File too large"**

- PDFs: Maximum 50MB
- Text: Maximum 1M characters
- Split large files before conversion

**"Fish Speech dependencies missing"**

- Check Python version: `python3 --version`
- Try restarting the app
- Virtual environment issues usually resolve on restart

**"Thorsten-Voice requires Python 3.9-3.11"**

- Install compatible Python: `brew install python@3.11`
- App will automatically detect and use it
- Creates separate virtual environment

**Voice preview not playing**

- macOS: Uses built-in `afplay`
- Windows: Uses PowerShell media player
- Linux: Requires `ffplay`, `mpv`, `vlc`, or `mplayer`

### Performance Tips

- Use `tts-1` model for faster processing
- Use `tts-1-hd` for higher quality (slower)
- Local TTS providers are free but slower than cloud
- Cache clears automatically after 30 days
- Resume feature prevents re-processing completed chunks

## 🔒 Privacy & Security

- API keys are encrypted locally using AES-192
- No data is sent to servers when using local TTS
- OpenAI TTS sends only text chunks to OpenAI servers
- Cache files are stored locally only
- Session data helps resume interrupted conversions
- Local TTS models run entirely offline

## 📖 Examples

### Converting a PDF Book with German Voice

```bash
npx aiabm "Mein Roman.pdf"
# Select "Thorsten-Voice"
# Choose German voice
# Enjoy authentic German pronunciation!
```

### Interactive Multilingual Setup

```bash
npx aiabm
# Select "Fish Speech"
# Auto-install if needed
# Preview German, English, and French voices
# Choose your favorite for the content language
```

### Quick OpenAI Conversion

```bash
npx aiabm document.pdf --voice nova --speed 1.1
```

## 🤝 Contributing

Issues and feature requests welcome at: [GitHub Issues](https://github.com/iamthamanic/AI-Audiobook-Maker/issues)

## 📄 License

MIT License - see LICENSE file for details

## 🙏 Acknowledgments

- Built on OpenAI's TTS API, Fish Speech, and Thorsten-Voice/Coqui TTS
- Fish Speech: https://github.com/fishaudio/fish-speech
- Thorsten-Voice: https://github.com/thorstenMueller/Thorsten-Voice
- Coqui TTS: https://github.com/coqui-ai/TTS
- Uses FFmpeg for audio processing

## 📝 Changelog

### v4.0.7 (2025-08-03) - 🐟 Fish Speech Fully Fixed & Operational

- 🐟 **Fish Speech 100% Working** - Complete resolution of all Fish Speech TTS issues
- 🔧 **Fixed tokenizer.tiktoken** - Proper base64 encoding of 32,000 tokens from Fish Speech
- ⚙️ **Model Configuration Fixed** - Created correct firefly_gan_vq.yaml matching model architecture
- 📐 **Dimension Mismatch Resolved** - Fixed 512-dim vs 1024-dim PyTorch tensor issues
- ✅ **Parameter Validation Fixed** - Corrected ServeTTSRequest use_memory_cache format
- 🎯 **End-to-End Functionality** - Text-to-semantic and decoder models load perfectly
- 🚀 **Full Service Availability** - Fish Speech now detected as available and operational

### v4.0.6 (2025-08-03) - 🧪 Comprehensive Test Coverage & TTS Fixes

- 🧪 **Major Test Coverage Improvement** - 20% to 45.07% overall coverage (+125% improvement)
- 🎯 **AudiobookMaker.js Tests** - 0% to 42.58% coverage with integration tests
- 🔐 **ConfigManager.js Tests** - 0% to 98.03% coverage with security tests
- 📁 **FileHandler.js Tests** - 0% to 72.99% coverage with core functionality tests
- 🖥️ **cli.js Tests** - 0% to 75.75% coverage with end-to-end tests
- 🐟 **Fish Speech Fixed** - Installation detection and availability checking
- 🇩🇪 **Thorsten Voice Fixed** - Python 3.13 compatibility and installation issues
- 📊 **207 Total Tests** - 195 passing with comprehensive edge case coverage
- 🔧 **Integration Tests** - Real-world testing with actual TTS services and PDF processing
- 🛡️ **Robust Error Handling** - Enhanced service availability validation

### v4.0.5 (2025-08-03) - 🎵 Unified Preview System

- 🎵 **Unified Preview Texts** - Consistent voice previews across all TTS providers
- 🌍 **Language-Specific Previews** - German, English, and French preview texts
- 💾 **Smart Caching** - Consistent cache filenames prevent preview regeneration
- 🎯 **Voice Language Detection** - Automatic language detection from voice names
- 🔄 **Cache Optimization** - Separate preview cache directories for each provider
- ⚙️ **Better Performance** - No more regenerating previews when switching providers

### v4.0.4 (2025-08-03) - 🛠️ Fish Speech Engine Fix

- 🔧 **Fixed TTSInferenceEngine initialization** - Use proper ModelManager pattern
- 🏗️ **Implemented correct model loading** - Load LLaMA and DAC models separately
- 🎯 **Auto-device detection** - Support for MPS (Apple Silicon), CUDA, and CPU
- 📦 **Better model management** - Use launch_thread_safe_queue for text-to-semantic
- 🔄 **Improved generation flow** - Proper model initialization before inference

### v4.0.3 (2025-08-03) - 🔧 Fish Speech Import Fix

- 🔧 **Fixed MODDED_DAC import** - Changed to correct DAC import from inference_engine
- ✅ **Added missing torch import** - Fixed undefined torch reference in generation script
- 🛠️ **Simplified dependency check** - Import DAC directly from inference_engine
- 📦 **Better module verification** - Check ServeTTSRequest schema availability

### v4.0.2 (2025-08-03) - 🐟 Fish Speech API Update

- 🔧 **Fixed Fish Speech dependency check** - Updated to use current DAC-based architecture
- 🗑️ **Removed deprecated VQGAN imports** - Fish Speech now uses DAC (Descript Audio Codec)
- ✅ **Updated generation script** - Uses modern TTSInferenceEngine API
- 🔄 **Better installation handling** - Auto-removes incomplete installations
- 📦 **Improved pip install** - Installs Fish Speech package in development mode
- 🛠️ **Enhanced error reporting** - More detailed debugging information

### v4.0.1 (2025-08-02) - 🔧 Installation & Compatibility Fixes

- 🔧 **Fixed Fish Speech virtual environment usage** - Proper dependency checking
- 🐍 **Enhanced Python version detection** - Blocks Thorsten-Voice on Python 3.13+
- ✅ **Smart installation status tracking** - Avoids unnecessary re-installations
- 📅 **Installation markers** - Persistent installation state with version info
- 🔄 **Better error handling** - More informative error messages and recovery
- 💡 **Improved user guidance** - Clear instructions for Python compatibility issues

### v4.0.0 (2025-08-02) - 🌟 Major Refactoring

- 🗑️ **REMOVED**: Kyutai TTS (replaced due to Python 3.13 compatibility issues)
- 🐟 **NEW**: Fish Speech integration - State-of-the-art multilingual TTS
- 🇩🇪 **NEW**: Thorsten-Voice integration - Native German TTS
- 🎤 **Enhanced Voice Selection**: 16 total voices across 3 providers
- 🏗️ **Automated Installation**: One-click setup for local TTS providers
- 🔧 **Improved Architecture**: Better service abstraction and error handling
- 📊 **Enhanced Testing**: 80%+ test coverage with Jest
- 🛠️ **Code Quality Tools**: ESLint, Prettier, Snyk integration
- 🔄 **Backward Compatibility**: 100% compatibility with existing OpenAI workflows

### v3.3.0 (2025-08-01) - 🚀 Kyutai Integration (Deprecated)

- 🆓 Kyutai TTS integration (now removed in v4.0.0)
- 🏗️ Automated installation system
- 🎤 15+ voice options
- 🔄 Provider selection system

---

**Happy listening! 🎧**

Turn any text into your personal audiobook library with the best TTS technology available.