
# 🎧 AI Audiobook Maker (AIABM) v5.1.0

[![npm version](https://img.shields.io/npm/v/aiabm.svg)](https://www.npmjs.com/package/aiabm) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Node.js Version](https://img.shields.io/node/v/ai-audiobook-maker.svg)](https://nodejs.org)

Transform your PDFs and text files into high-quality audiobooks using **OpenAI TTS** (cloud) or **Thorsten-Voice** (native German). Choose premium cloud voices, or run everything locally at no cost!

**🆕 New in v5.1:** A redesigned UI/UX, flexible output options, optimized Thorsten-Voice loading, and the Downloads folder as the default output location.

## ✨ Features

### 🎙️ **Dual TTS Providers**

- **☁️ OpenAI TTS**: Premium cloud voices with 6 voice options (requires an API key)
- **🇩🇪 Thorsten-Voice**: Native German TTS with authentic pronunciation (local/free)

### 🚀 **Core Features**

- **🚀 Zero Installation**: Run directly with `npx aiabm`
- **📁 Smart File Handling**: Supports PDF and TXT files with drag & drop
- **🎤 Voice Preview**: Listen to voices before choosing (2 Thorsten + 6 OpenAI voices)
- **🔒 Enhanced Security**: Input sanitization, API key validation, and secure storage
- **🧪 Testing**: 55+ unit tests with 12.6% coverage and growing
- **⏸️ Resume & Pause**: Continue interrupted conversions at any time
- **🔐 Secure API Key Management**: Encrypted local storage
- **📊 Progress Tracking**: Real-time conversion progress with estimates
- **🎛️ Advanced Controls**: Adjust speed, quality, and output format
- **💰 Cost Transparency**: See exact pricing (OpenAI) or run for free (local providers)
- **🔧 Smart Installation**: Automatic setup for local TTS providers

## 🚀 Quick Start

### Method 1: Direct Usage (Recommended)

```bash
# Convert a specific file
npx aiabm mybook.pdf

# Interactive mode
npx aiabm
```

### Method 2: Global Installation

```bash
npm install -g aiabm
aiabm mybook.pdf
```

## 📋 Prerequisites

### Required

- **Node.js 16+** (download from [nodejs.org](https://nodejs.org/))
- **FFmpeg** (for combining audio; auto-installed on most systems, see Troubleshooting if it is missing)

### Optional (Choose One or Both)

**For OpenAI TTS:**

- An OpenAI API key (get one from [platform.openai.com](https://platform.openai.com/account/api-keys))
- Costs ~$0.015 per 1,000 characters

**For Thorsten-Voice (German TTS):**

- Python 3.9-3.11 (detected automatically; see the Thorsten-Voice setup below if you need to install it)
- Coqui TTS (auto-installed)
- **Completely FREE** - runs locally

## 🎯 Usage Examples

### CLI Mode

```bash
# Basic conversion
npx aiabm document.pdf

# With specific options (OpenAI)
npx aiabm book.txt --voice nova --speed 1.2 --model tts-1-hd

# Manage your API key
npx aiabm --config
```

### Interactive Mode

```bash
npx aiabm
```

Then follow the interactive prompts to:

1. **Select a TTS provider** (OpenAI, Fish Speech, or Thorsten-Voice)
2. **Auto-install local providers** if needed (one-time setup)
3. **Select your file** (browse, drag & drop, or enter a path)
4. **Preview and choose a voice**
5. **Configure settings** (speed, quality, output format)
6. **Monitor progress** and resume if needed

## 🎤 Available Voices

### 🤖 OpenAI TTS (Cloud)

- **Alloy**: Neutral, versatile
- **Echo**: Clear, professional
- **Fable**: Warm, storytelling
- **Onyx**: Deep, authoritative
- **Nova**: Bright, engaging
- **Shimmer**: Gentle, soothing

### 🐟 Fish Speech (Local/Multilingual)

- **🇩🇪 German Female (Natural)**: High-quality German synthesis
- **🇩🇪 German Male (Clear)**: Professional German voice
- **🇩🇪 German Female (Expressive)**: Emotional German narration
- **🇺🇸 English Female (Warm)**: Natural English voice
- **🇺🇸 English Male (Professional)**: Business-quality English
- **🇺🇸 English Female (Energetic)**: Dynamic storytelling
- **🇫🇷 French Female (Elegant)**: Sophisticated French accent
- **🇫🇷 French Male (Sophisticated)**: Professional French voice

### 🇩🇪 Thorsten-Voice (Native German)

- **🇩🇪 Thorsten (Authentic German Male)**: High-quality native German voice
- **🇩🇪 Thorsten Emotional (German Male)**: German voice with emotional expression

## 💰 Pricing

### OpenAI TTS

**$0.015 per 1,000 characters**

| Content Length | Estimated Cost | Example |
|----------------|----------------|---------|
| 10,000 characters | ~$0.15 | Short article |
| 50,000 characters | ~$0.75 | Small e-book |
| 100,000 characters | ~$1.50 | Average novel |
| 250,000 characters | ~$3.75 | Large book |

### Fish Speech & Thorsten-Voice

**100% FREE** - No API costs; runs entirely on your machine!

## 🔧 Local TTS Setup

Both Fish Speech and Thorsten-Voice run entirely on your machine, so there are no API costs! **Now with fully automated installation!**

### 🚀 Smart Installation (Recommended)

```bash
npx aiabm
# Select "Fish Speech" or "Thorsten-Voice"
# Choose "Auto Install (recommended)"
# → The tool automatically downloads and configures everything!
```

### 🐟 Fish Speech Setup

**What happens automatically:**

1. **📦 Repository Cloning** - Downloads the latest Fish Speech
2. **🐍 Virtual Environment** - Creates an isolated Python environment
3. **⚡ PyTorch Installation** - Installs the optimized CPU version
4. **🤖 Model Download** - Downloads the Fish Speech 1.2 models (~1GB)
5. **✅ Dependency Check** - Verifies that the installation works

**System Requirements:**

- **Python 3.8+** recommended
- **~2GB disk space** for models and dependencies
- **4GB+ RAM** recommended
- **CPU or GPU** (a GPU is faster but optional)

### 🇩🇪 Thorsten-Voice Setup

**What happens automatically:**

1. **🐍 Compatible Python Detection** - Finds Python 3.9-3.11
2. **📦 Virtual Environment** - Creates an isolated environment
3. **🎤 Coqui TTS Installation** - Installs the German TTS framework
4. **🤖 Thorsten Model** - Downloads the German voice model (~500MB)
5. **✅ Compatibility Check** - Verifies that everything works

**System Requirements:**

- **Python 3.9-3.11** (Python 3.12 and newer are not supported)
- **~1GB disk space** for models and dependencies
- **2GB+ RAM** recommended

**Python Version Issues?**

```bash
# Install a compatible Python on macOS
brew install python@3.11

# On Ubuntu/Debian
sudo apt install python3.11 python3.11-venv
```

### 🔧 Installation Status Tracking

- **✅ Smart Detection**: Avoids re-installation if a provider is already installed
- **📅 Version Tracking**: Shows installation date and version
- **🔄 Update Suggestions**: Recommends updates after 30+ days
- **🛠️ Installation Markers**: Persistent installation state

## 🔧 Advanced Features

### Resume Interrupted Conversions

If a conversion stops, simply run the tool again; it automatically detects your previous session and offers to resume it.

### Multiple Output Formats

- **Single File**: One complete audiobook MP3
- **Chapter Files**: A separate MP3 per chunk
- **Both**: Get both formats

### Voice Preview Caching

Voice previews are cached locally to save API costs and improve performance.
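The caching idea can be sketched in a few lines of shell. This is a hypothetical illustration, not the tool's actual implementation: the helper names `preview_cache_file` and `preview_is_cached`, the per-provider sub-directory, and the `<voice>_<checksum>.mp3` file naming are all assumptions, loosely inspired by the changelog's mention of consistent cache filenames and separate preview cache directories per provider. What it shows is that the same provider, voice, and preview text always map to the same file, so a preview only has to be generated (and paid for) once.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a voice-preview cache lookup (NOT the tool's actual code).
# Assumption: one sub-directory per provider under the documented cache folder.
CACHE_ROOT="${CACHE_ROOT:-$HOME/.config/ai-audiobook-maker/cache}"

# Print a deterministic cache path for (provider, voice, preview text).
# Identical inputs always yield the same path, so a preview is generated once.
preview_cache_file() {
  local provider="$1" voice="$2" text="$3" checksum
  # cksum (POSIX CRC) ships with macOS and Linux; awk keeps only the
  # first field (the checksum) and drops the byte count.
  checksum=$(printf '%s' "$text" | cksum | awk '{ print $1 }')
  printf '%s/%s/%s_%s.mp3\n' "$CACHE_ROOT" "$provider" "$voice" "$checksum"
}

# Succeed (exit status 0) only if a non-empty cached preview already exists.
preview_is_cached() {
  [ -s "$(preview_cache_file "$1" "$2" "$3")" ]
}

# Example usage: decide whether a preview can be reused.
text="Hello, this is a short voice preview."
if preview_is_cached openai nova "$text"; then
  echo "Reusing $(preview_cache_file openai nova "$text")"
else
  echo "Would generate a new preview and save it to $(preview_cache_file openai nova "$text")"
fi
```

Hashing the text instead of using it directly keeps file names short and filesystem-safe; a CRC is adequate as a cache key but is not a cryptographic hash.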
### Smart Text Chunking

- Respects sentence boundaries
- Preserves chapter structure for PDFs
- Configurable chunk sizes (default: 4000 characters)

## 📂 File Support

### PDF Files

- ✅ Up to 50MB
- ✅ Text extraction with structure preservation
- ✅ Automatic chapter detection

### Text Files

- ✅ Up to 1M characters
- ✅ UTF-8 encoding
- ✅ Automatic formatting cleanup

## 🆕 What's New in v5.0

### 🔒 **Enhanced Security**

- **Input Sanitization**: Prevents code injection and malicious input
- **API Key Validation**: Comprehensive security checks for OpenAI keys
- **Secure Storage**: Encrypted API key storage with multiple layers
- **Environment Assessment**: Automatic security environment analysis

### 🧪 **Testing**

- **55+ Unit Tests**: Covering core functionality
- **12.6% Code Coverage**: A growing test suite focused on critical paths
- **Mocked Services**: Fast, reliable tests without external dependencies
- **CI/CD Pipeline**: Automated testing on every commit

### 🛡️ **Better Error Handling**

- **Type-Safe Validation**: Zod schemas for all configuration and data
- **Graceful Failures**: Better error messages and recovery mechanisms
- **Logging & Monitoring**: Detailed error tracking and user feedback

### 🎯 **Developer Experience**

- **GitHub Actions**: Automated CI/CD with security auditing
- **ESLint Clean**: Zero linting errors with a consistent code style
- **Documentation**: Comprehensive inline documentation and examples

---

## ⚙️ Configuration

### API Key Storage

Your OpenAI API key is encrypted and stored locally at:

- **macOS/Linux**: `~/.config/ai-audiobook-maker/config.json`
- **Windows**: `%APPDATA%\ai-audiobook-maker\config.json`

### Cache Location

Voice previews and temporary files are stored in:

- **macOS/Linux**: `~/.config/ai-audiobook-maker/cache/`
- **Windows**: `%APPDATA%\ai-audiobook-maker\cache\`

### Local TTS Installations

Local TTS providers are installed to:

- **Fish Speech**: `~/.aiabm/fish-speech/`
- **Thorsten-Voice**: `~/.aiabm/thorsten-voice/`

## 🛠️ Troubleshooting

### Common Issues

**"FFmpeg not found"**

```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html
```

**"API key invalid"**

- Verify your key at [OpenAI Platform](https://platform.openai.com/account/api-keys)
- Use `npx aiabm --config` to update your key

**"File too large"**

- PDFs: Maximum 50MB
- Text: Maximum 1M characters
- Split large files before conversion

**"Fish Speech dependencies missing"**

- Check your Python version: `python3 --version`
- Try restarting the app; virtual environment issues usually resolve on restart

**"Thorsten-Voice requires Python 3.9-3.11"**

- Install a compatible Python, e.g. `brew install python@3.11` on macOS
- The app automatically detects and uses it
- It creates a separate virtual environment for Thorsten-Voice

**Voice preview not playing**

- macOS: Uses the built-in `afplay`
- Windows: Uses the PowerShell media player
- Linux: Requires `ffplay`, `mpv`, `vlc`, or `mplayer`

### Performance Tips

- Use the `tts-1` model for faster processing
- Use `tts-1-hd` for higher quality (slower)
- Local TTS providers are free but slower than the cloud
- The cache is cleared automatically after 30 days
- The resume feature prevents re-processing of completed chunks

## 🔒 Privacy & Security

- API keys are encrypted locally using AES-192
- No data is sent to servers when using local TTS
- OpenAI TTS sends only text chunks to OpenAI servers
- Cache files are stored locally only
- Session data is used to resume interrupted conversions
- Local TTS models run entirely offline

## 📖 Examples

### Converting a PDF Book with a German Voice

```bash
npx aiabm "Mein Roman.pdf"
# Select "Thorsten-Voice"
# Choose a German voice
# Enjoy authentic German pronunciation!
```

### Interactive Multilingual Setup

```bash
npx aiabm
# Select "Fish Speech"
# Auto-install if needed
# Preview German, English, and French voices
# Choose your favorite for the content language
```

### Quick OpenAI Conversion

```bash
npx aiabm document.pdf --voice nova --speed 1.1
```

## 🤝 Contributing

Issues and feature requests are welcome at [GitHub Issues](https://github.com/iamthamanic/AI-Audiobook-Maker/issues).

## 📄 License

MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- Built on OpenAI's TTS API, Fish Speech, and Thorsten-Voice/Coqui TTS
- Fish Speech: https://github.com/fishaudio/fish-speech
- Thorsten-Voice: https://github.com/thorstenMueller/Thorsten-Voice
- Coqui TTS: https://github.com/coqui-ai/TTS
- Uses FFmpeg for audio processing

## 📝 Changelog

### v4.0.7 (2025-08-03) - 🐟 Fish Speech Fully Fixed & Operational

- 🐟 **Fish Speech 100% Working** - Complete resolution of all Fish Speech TTS issues
- 🔧 **Fixed `tokenizer.tiktoken`** - Proper base64 encoding of 32,000 tokens from Fish Speech
- ⚙️ **Model Configuration Fixed** - Created a correct `firefly_gan_vq.yaml` matching the model architecture
- 📐 **Dimension Mismatch Resolved** - Fixed 512-dim vs 1024-dim PyTorch tensor issues
- ✅ **Parameter Validation Fixed** - Corrected the `ServeTTSRequest` `use_memory_cache` format
- 🎯 **End-to-End Functionality** - Text-to-semantic and decoder models load correctly
- 🚀 **Full Service Availability** - Fish Speech is now detected as available and operational

### v4.0.6 (2025-08-03) - 🧪 Comprehensive Test Coverage & TTS Fixes

- 🧪 **Major Test Coverage Improvement** - From 20% to 45.07% overall coverage (+125% improvement)
- 🎯 **AudiobookMaker.js Tests** - From 0% to 42.58% coverage with integration tests
- 🔐 **ConfigManager.js Tests** - From 0% to 98.03% coverage with security tests
- 📁 **FileHandler.js Tests** - From 0% to 72.99% coverage with core functionality tests
- 🖥️ **cli.js Tests** - From 0% to 75.75% coverage with end-to-end tests
- 🐟 **Fish Speech Fixed** - Installation detection and availability checking
- 🇩🇪 **Thorsten Voice Fixed** - Python 3.13 compatibility and installation issues
- 📊 **207 Total Tests** - 195 passing, with comprehensive edge case coverage
- 🔧 **Integration Tests** - Real-world testing with actual TTS services and PDF processing
- 🛡️ **Robust Error Handling** - Enhanced service availability validation

### v4.0.5 (2025-08-03) - 🎵 Unified Preview System

- 🎵 **Unified Preview Texts** - Consistent voice previews across all TTS providers
- 🌍 **Language-Specific Previews** - German, English, and French preview texts
- 💾 **Smart Caching** - Consistent cache filenames prevent preview regeneration
- 🎯 **Voice Language Detection** - Automatic language detection from voice names
- 🔄 **Cache Optimization** - Separate preview cache directories for each provider
- ⚙️ **Better Performance** - No more regenerating previews when switching providers

### v4.0.4 (2025-08-03) - 🛠️ Fish Speech Engine Fix

- 🔧 **Fixed `TTSInferenceEngine` initialization** - Uses the proper `ModelManager` pattern
- 🏗️ **Implemented correct model loading** - Loads the LLaMA and DAC models separately
- 🎯 **Auto-device detection** - Support for MPS (Apple Silicon), CUDA, and CPU
- 📦 **Better model management** - Uses `launch_thread_safe_queue` for text-to-semantic
- 🔄 **Improved generation flow** - Proper model initialization before inference

### v4.0.3 (2025-08-03) - 🔧 Fish Speech Import Fix

- 🔧 **Fixed `MODDED_DAC` import** - Changed to the correct DAC import from `inference_engine`
- ✅ **Added missing `torch` import** - Fixed an undefined `torch` reference in the generation script
- 🛠️ **Simplified dependency check** - Imports DAC directly from `inference_engine`
- 📦 **Better module verification** - Checks `ServeTTSRequest` schema availability

### v4.0.2 (2025-08-03) - 🐟 Fish Speech API Update

- 🔧 **Fixed Fish Speech dependency check** - Updated to use the current DAC-based architecture
- 🗑️ **Removed deprecated VQGAN imports** - Fish Speech now uses DAC (Descript Audio Codec)
- ✅ **Updated generation script** - Uses the modern `TTSInferenceEngine` API
- 🔄 **Better installation handling** - Auto-removes incomplete installations
- 📦 **Improved pip install** - Installs the Fish Speech package in development mode
- 🛠️ **Enhanced error reporting** - More detailed debugging information

### v4.0.1 (2025-08-02) - 🔧 Installation & Compatibility Fixes

- 🔧 **Fixed Fish Speech virtual environment usage** - Proper dependency checking
- 🐍 **Enhanced Python version detection** - Blocks Thorsten-Voice on Python 3.13+
- ✅ **Smart installation status tracking** - Avoids unnecessary re-installations
- 📅 **Installation markers** - Persistent installation state with version info
- 🔄 **Better error handling** - More informative error messages and recovery
- 💡 **Improved user guidance** - Clear instructions for Python compatibility issues

### v4.0.0 (2025-08-02) - 🌟 Major Refactoring

- 🗑️ **REMOVED**: Kyutai TTS (replaced due to Python 3.13 compatibility issues)
- 🐟 **NEW**: Fish Speech integration - State-of-the-art multilingual TTS
- 🇩🇪 **NEW**: Thorsten-Voice integration - Native German TTS
- 🎤 **Enhanced Voice Selection**: 16 total voices across 3 providers
- 🏗️ **Automated Installation**: One-click setup for local TTS providers
- 🔧 **Improved Architecture**: Better service abstraction and error handling
- 📊 **Enhanced Testing**: 80%+ test coverage with Jest
- 🛠️ **Code Quality Tools**: ESLint, Prettier, and Snyk integration
- 🔄 **Backward Compatibility**: 100% compatibility with existing OpenAI workflows

### v3.3.0 (2025-08-01) - 🚀 Kyutai Integration (Deprecated)

- 🆓 Kyutai TTS integration (removed in v4.0.0)
- 🏗️ Automated installation system
- 🎤 15+ voice options
- 🔄 Provider selection system

---

**Happy listening! 🎧** Turn any text into your personal audiobook library with the best TTS technology available.