UNPKG

agentvibes

Version:

Now your AI Agents can finally talk back! Professional TTS voice for Claude Code and Claude Desktop (via MCP) with multi-provider support.

255 lines (207 loc) 8.34 kB
# Voice Registration Fix - Extra Piper Voices ## Problems Identified ### 1. **Download Issues** - The `tracy.onnx` download only got 15 bytes (redirect/error page) - **Root Cause**: The HuggingFace file is actually named `16Speakers.onnx`, not `tracy.onnx` - **Fix**: Updated download script to use correct filename `16Speakers.onnx` ### 2. **Voice Registration Issues** When users download custom Piper voices (jenny, kristin, 16Speakers), they cannot switch to them because: **Problem**: The voice-manager.sh uses two different lookup methods: 1. **ElevenLabs voices**: Uses `voices-config.sh` associative array 2. **Piper voices**: Scans `.onnx` files in the voice storage directory **Current voice-manager.sh logic (line 265-269)**: ```bash # If using Piper and voice name looks like a Piper model (contains underscore and dash) # then skip ElevenLabs voice validation if [[ "$ACTIVE_PROVIDER" == "piper" ]] && [[ "$VOICE_NAME" == *"_"*"-"* ]]; then # This is a Piper model name, use it directly FOUND="$VOICE_NAME" ``` **Issue**: Custom voice names like `jenny`, `kristin`, and `16Speakers` DON'T contain underscore and dash, so they fail the Piper detection and fall through to ElevenLabs lookup, which fails. **Why it fails**: - `jenny` → No underscore/dash → Falls to ElevenLabs lookup → Not found → Error - `kristin` → No underscore/dash → Falls to ElevenLabs lookup → Not found → Error - `16Speakers` → No underscore/dash → Falls to ElevenLabs lookup → Not found → Error **Why standard Piper voices work**: - `en_US-lessac-medium` → Contains `_` and `-` → Passes Piper check → Works! - `en_GB-alan-medium` → Contains `_` and `-` → Passes Piper check → Works! ### 3. **Multi-Speaker Voice Support (16Speakers)** The `16Speakers.onnx` is a special multi-speaker model containing 16 different voices: ```json { "speaker_id_map": { "Cori_Samuel": 0, "Kara_Shallenberg": 1, "Kristin_Hughes": 2, "Maria_Kasper": 3, "Mike_Pelton": 4, "Mark_Nelson": 5, "Michael_Scherer": 6, "James_K_White": 7, "Rose_Ibex": 8, "progressingamerica": 9, "Steve_C": 10, "Owlivia": 11, "Paul_Hampton": 12, "Jennifer_Dorr": 13, "Emily_Cripps": 14, "Martin_Clifton": 15 } } ``` **Usage**: Users should be able to say: - `/agent-vibes:switch Cori_Samuel` → Uses `16Speakers.onnx` with speaker ID 0 - `/agent-vibes:switch Rose_Ibex` → Uses `16Speakers.onnx` with speaker ID 8 **Problem**: Currently no system to: 1. Parse multi-speaker voice names 2. Pass speaker ID to Piper TTS 3. Register individual speaker names from multi-speaker models ## Solution Design ### Phase 1: Fix Basic Custom Voice Registration **Approach**: Enhance `voice-manager.sh` to check for custom Piper voices by scanning the voice directory. **Modified logic**: ```bash # 1. Check if it's a number (for ElevenLabs numbered selection) if [[ "$VOICE_NAME" =~ ^[0-9]+$ ]]; then # ... existing numbered selection code ... # 2. NEW: Check if it's a Piper voice (scan voice directory) elif [[ "$ACTIVE_PROVIDER" == "piper" ]]; then source "$SCRIPT_DIR/piper-voice-manager.sh" VOICE_DIR=$(get_voice_storage_dir) # Check if voice file exists (case-insensitive) FOUND="" for onnx_file in "$VOICE_DIR"/*.onnx; do if [[ -f "$onnx_file" ]]; then voice=$(basename "$onnx_file" .onnx) if [[ "${voice,,}" == "${VOICE_NAME,,}" ]]; then FOUND="$voice" break fi fi done if [[ -z "$FOUND" ]]; then echo "❌ Piper voice not found: $VOICE_NAME" echo "" echo "Available Piper voices:" for onnx_file in "$VOICE_DIR"/*.onnx; do [[ -f "$onnx_file" ]] && echo " - $(basename "$onnx_file" .onnx)" done | sort exit 1 fi # 3. Fall back to ElevenLabs lookup else # ... existing ElevenLabs lookup code ... fi ``` ### Phase 2: Multi-Speaker Support **1. Create Multi-Speaker Registry** New file: `.claude/hooks/piper-multispeaker-registry.sh` ```bash # Registry of multi-speaker models and their speaker names # Format: "SpeakerName:model_file:speaker_id" MULTISPEAKER_VOICES=( "Cori_Samuel:16Speakers:0" "Kara_Shallenberg:16Speakers:1" "Kristin_Hughes:16Speakers:2" "Maria_Kasper:16Speakers:3" "Mike_Pelton:16Speakers:4" "Mark_Nelson:16Speakers:5" "Michael_Scherer:16Speakers:6" "James_K_White:16Speakers:7" "Rose_Ibex:16Speakers:8" "progressingamerica:16Speakers:9" "Steve_C:16Speakers:10" "Owlivia:16Speakers:11" "Paul_Hampton:16Speakers:12" "Jennifer_Dorr:16Speakers:13" "Emily_Cripps:16Speakers:14" "Martin_Clifton:16Speakers:15" ) # Get model and speaker ID for a speaker name get_multispeaker_info() { local speaker_name="$1" for entry in "${MULTISPEAKER_VOICES[@]}"; do name="${entry%%:*}" rest="${entry#*:}" model="${rest%%:*}" speaker_id="${rest#*:}" if [[ "${name,,}" == "${speaker_name,,}" ]]; then echo "$model:$speaker_id" return 0 fi done return 1 } ``` **2. Update voice-manager.sh** ```bash # After checking standard Piper voices, check multi-speaker registry if [[ -z "$FOUND" ]] && [[ "$ACTIVE_PROVIDER" == "piper" ]]; then source "$SCRIPT_DIR/piper-multispeaker-registry.sh" MULTISPEAKER_INFO=$(get_multispeaker_info "$VOICE_NAME") if [[ -n "$MULTISPEAKER_INFO" ]]; then MODEL="${MULTISPEAKER_INFO%%:*}" SPEAKER_ID="${MULTISPEAKER_INFO#*:}" # Store as "SpeakerName" in tts-voice.txt # Store model and speaker ID separately for play-tts-piper.sh to use echo "$VOICE_NAME" > "$VOICE_FILE" echo "$MODEL" > "$CLAUDE_DIR/tts-piper-model.txt" echo "$SPEAKER_ID" > "$CLAUDE_DIR/tts-piper-speaker-id.txt" echo "✅ Multi-speaker voice switched to: $VOICE_NAME" echo "🎤 Model: $MODEL (Speaker ID: $SPEAKER_ID)" exit 0 fi fi ``` **3. Update play-tts-piper.sh** ```bash # Check if this is a multi-speaker voice SPEAKER_ID_FILE="$CLAUDE_DIR/tts-piper-speaker-id.txt" MODEL_FILE="$CLAUDE_DIR/tts-piper-model.txt" if [[ -f "$SPEAKER_ID_FILE" ]] && [[ -f "$MODEL_FILE" ]]; then # Use multi-speaker model PIPER_MODEL=$(cat "$MODEL_FILE") SPEAKER_ID=$(cat "$SPEAKER_ID_FILE") # Get model path VOICE_PATH=$(get_voice_path "$PIPER_MODEL") # Pass speaker ID to Piper echo "$TEXT" | piper \ --model "$VOICE_PATH" \ --speaker "$SPEAKER_ID" \ --output_file "$OUTPUT_FILE" 2>&1 else # Standard single-speaker voice # ... existing code ... fi ``` ## Implementation Plan ### Step 1: Fix download script filename ✅ - [x] Change `tracy.onnx` to `16Speakers.onnx` in download-extra-voices.sh ### Step 2: Fix basic custom voice switching - [ ] Update `voice-manager.sh` switch logic to scan voice directory for Piper voices - [ ] Test switching to `jenny`, `kristin`, `16Speakers` ### Step 3: Implement multi-speaker support - [ ] Create `piper-multispeaker-registry.sh` - [ ] Update `voice-manager.sh` to handle multi-speaker lookups - [ ] Update `play-tts-piper.sh` to pass speaker ID to Piper - [ ] Test switching to individual speakers (e.g., `Cori_Samuel`, `Rose_Ibex`) ### Step 4: Update voice listing - [ ] Make `/agent-vibes:list` show all custom voices (jenny, kristin, 16 speaker names) - [ ] Organize output: "Standard Voices", "Custom Voices", "Multi-Speaker Voices" ### Step 5: MCP Integration - [ ] Update MCP server to expose multi-speaker voice info - [ ] Add `list_multispeaker_voices()` method to show available speakers ## Testing Checklist - [ ] Download extra voices: `mcp__agentvibes__download_extra_voices()` - [ ] Switch to jenny: `/agent-vibes:switch jenny` - [ ] Switch to kristin: `/agent-vibes:switch kristin` - [ ] Switch to 16Speakers: `/agent-vibes:switch 16Speakers` - [ ] Switch to Cori_Samuel: `/agent-vibes:switch Cori_Samuel` - [ ] Switch to Rose_Ibex: `/agent-vibes:switch Rose_Ibex` - [ ] List all voices: `/agent-vibes:list` (should show all custom + multi-speaker) - [ ] Play TTS with each voice to verify audio works ## Future Enhancements 1. **Auto-registration**: When downloading 16Speakers, auto-register all 16 speaker names 2. **Voice preview**: Add preview support for multi-speaker voices 3. **Voice metadata**: Store speaker descriptions (gender, accent, etc.) in registry 4. **Dynamic discovery**: Auto-scan .onnx.json files for speaker_id_map and register dynamically