tree-of-thought-cli

Tree of Thought CLI - Install /tot command for Claude Code

---
name: tot
description: "Tree of Thought - Systematic problem solving through structured exploration"
---

# /tot - Tree of Thought Framework

**CRITICAL: You MUST follow the OUTPUT_FORMAT.md specification exactly. Display ALL thoughts with FULL content at each level.**

Read and strictly follow: `~/.claude/tot/OUTPUT_FORMAT.md`

## 🚀 STEP 0: MODE SELECTION (Execute FIRST)

**Select execution mode based on user request:**

1. **User forced Claude-Only** (`-c` flag)? → Go to **Step 1A**
2. **Otherwise** → Attempt **Hybrid Mode** (Step 1B)
   - Codex MCP call happens automatically in Phase 1
   - Auto-fallback to Claude if Codex fails
   - No pre-check needed!

---

## 🌐 STEP 0.5: LANGUAGE DETECTION (Execute SECOND)

**Automatically detect input language and adapt all outputs accordingly:**

### Language Detection Rules

Analyze the user's problem description:

```python
def detect_language(problem_text):
    """Return "Korean" if the text contains any Hangul syllable, else "English"."""
    # Check for Korean characters (precomposed Hangul syllables, U+AC00..U+D7A3)
    has_korean = any('\uac00' <= char <= '\ud7a3' for char in problem_text)
    if has_korean:
        return "Korean"  # 한국어
    else:
        return "English"
```

### Output Language Adaptation

**If language is Korean (한국어):**
- All thought content → Korean
- All evaluations → Korean
- All conclusions → Korean
- Framework labels → English (📍 Level 0, ✅ Final Conclusion, etc.)

**If language is English:**
- All thought content → English
- All evaluations → English
- All conclusions → English
- Framework labels → English

### Codex Prompt Language Variable

Update the Codex MCP prompt with the detected language:

```markdown
**Korean input detected:**
- Write all text and reasoning in Korean (한국어)

**English input detected:**
- Write all text and reasoning in English
```

**Examples:**

```bash
# Korean input ("Memory leak - grows 50MB per hour")
/tot "메모리 누수 - 1시간에 50MB 증가"
→ Language: Korean → All outputs in 한국어

# English input
/tot "Memory leak - grows 50MB per hour"
→ Language: English → All outputs in English

# Mixed (Korean present; "메모리 문제" means "memory problem")
/tot "Memory leak 메모리 문제"
→ Language: Korean (한글 detected) → 한국어
```

---

## Execution Instructions

### STEP 1A: Claude-Only Mode Execution

When in Claude-Only mode (either forced via `-c` or auto-fallback):

1. **Read OUTPUT_FORMAT.md first** - This defines the exact output structure
2. **Display complete header** with problem description
3. **Generate 5 thoughts** using self-response (all marked as [Claude])
4. **Evaluate all 5 thoughts** independently
5. **Select top 3** for further exploration
6. **Present final solution path** with all steps

→ Skip to "Required Output Structure" section below

---

### STEP 1B: Hybrid Mode Execution (Parallel Optimization Protocol)

When in Hybrid mode (Codex MCP available):

**🚀 CRITICAL: PARALLEL EXECUTION - Start BOTH simultaneously!**

**PHASE 1: Parallel Thought Generation (Claude + Codex simultaneously)**

1. **IMMEDIATELY generate 3 Claude thoughts** using self-response
   - Output them as soon as generated (don't wait for Codex)
   - Mark each as [Claude]

2. **AT THE SAME TIME, call mcp__codex__codex tool** for 2 technical thoughts
   - ⚠️ **MANDATORY**: You MUST actually call the mcp__codex__codex tool
   - Do NOT skip this step
   - Do NOT simulate Codex responses yourself

   **Exact tool call format:**

   ```
   mcp__codex__codex(
     prompt="""You are a technical problem-solving expert. Analyze this problem and generate 2 distinct technical solution approaches.

   # Problem
   [Insert user's actual problem description here]

   # Your Task
   Generate 2 different technical approaches focusing on:
   - Deep technical analysis
   - Algorithm optimization
   - System design perspectives
   - Performance considerations
   - Implementation details

   For each approach, provide:
   1. Approach name
   2. Core idea
   3. Technical details
   4. Expected performance/impact
   5. Implementation complexity considerations

   # Output Requirements
   Return ONLY a JSON object in this exact format:

   {
     "thoughts": [
       {
         "id": "codex_1",
         "text": "First technical approach full explanation (detailed - minimum 5-6 sentences)",
         "reasoning": "Technical rationale and expected impact for this approach"
       },
       {
         "id": "codex_2",
         "text": "Second technical approach full explanation (detailed - minimum 5-6 sentences)",
         "reasoning": "Technical rationale and expected impact for this approach"
       }
     ]
   }

   **CRITICAL**:
   - Return ONLY valid JSON with no additional text before or after
   - **Write all text and reasoning in [DETECTED_LANGUAGE]**:
     - If problem is in Korean → Korean (한국어)
     - If problem is in English → English
   - Provide detailed technical depth in each thought
   """
   )
   ```

3. **When Codex responds**, parse the JSON and output the 2 Codex thoughts
   - Mark as [Codex]
   - If Codex fails: Generate 2 additional Claude thoughts as fallback

**PHASE 2: Evaluation**

4. **Evaluate all 5 thoughts** (3 Claude + 2 Codex/Claude-fallback)
5. **Select top 3** for further exploration
6. **Present final solution path** with all steps

**⚠️ Self-Validation Checkpoint (BEFORE evaluation):**

- [ ] Did I generate 3 Claude thoughts and output them?
- [ ] Did I ACTUALLY CALL mcp__codex__codex tool (not simulate)?
- [ ] Did I receive and parse 2 Codex thoughts (or fallback)?
- [ ] Total thought count = 5?
- [ ] Are thoughts 4 and 5 marked as [Codex] (or [Claude] if fallback)?

**If ANY checkbox is unchecked → STOP and fix before continuing!**

---

## STEP 2: Required Output Structure (Both Modes)

### Required Output Structure

```
┌──────────────────────────────────────────────────────────────┐
│ 🌳 Tree of Thought: [Problem Description]                    │
└──────────────────────────────────────────────────────────────┘

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📍 Level 0: Initial Thoughts (n_generate=5)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Thought 1 [Claude]: [Title]
┌────────────────────────────────────────────────────────────┐
│ [FULL detailed content explaining the approach]            │
│                                                            │
│ [Specific actions or checks to perform]                    │
│ • Point 1                                                  │
│ • Point 2                                                  │
│ • Point 3                                                  │
│                                                            │
│ Verification method: [Command or approach]                 │
└────────────────────────────────────────────────────────────┘

[... Repeat for ALL 5 thoughts with FULL content]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 Level 1: Evaluation (n_evaluate=3)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Evaluating Thought 1 [Claude]...
  Eval 1: 8.5/10 → [Specific reason]
  Eval 2: 9.0/10 → [Specific reason]
  Eval 3: 8.7/10 → [Specific reason]
  ────────────────
  Average: 8.7/10 ⭐ (Confidence: 95%)

[... Repeat for ALL 5 thoughts]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🎯 Level 2: Selection (n_select=3)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Selected Top 3 Thoughts:
  ✓ Thought 5 [Codex] - 9.1/10: [Title]
  ✓ Thought 1 [Claude] - 8.7/10: [Title]
  ✓ Thought 4 [Codex] - 8.3/10: [Title]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Final Conclusion
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Solution Path (3 steps):
  1. [9.1] [Title] ✅
  2. [9.5] [Refined approach] ✅
  3. [9.7] [Final solution] ✅

Overall Score: 9.4/10 ⭐⭐⭐⭐⭐

[Final verdict and recommendation]

Key Findings:
- [Finding 1]
- [Finding 2]
- [Finding 3]

🚀 [Call to action or next steps]
```

## Princeton ToT Methodology

### Default Parameters

```yaml
n_generate: 5      # Generate 5 thoughts per level
n_evaluate: 3      # Evaluate each thought 3 times
n_select: 3        # Keep top 3 for next level
algorithm: BFS     # Breadth-first search
ratio: "3:2"       # Claude:Codex ratio (3 Claude, 2 Codex)
max_depth: 3       # Maximum search depth
confidence: 9.0    # Early stopping threshold
```

### Hybrid Mode (Claude + Codex)

**Generation:**
- Claude thoughts (3): Practical, user-focused, quick solutions
- Codex thoughts (2): Technical depth, algorithm optimization, system design

**Evaluation:**
- Cross-evaluation: Claude evaluates Codex, Codex evaluates Claude
- Each thought gets 3 independent evaluations
- Confidence calculated from evaluation consistency

**When Codex MCP is available:** Use the `mcp__codex__codex` tool for direct Codex integration.
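The response handling described in Step 1B (parse the JSON, mark the thoughts as [Codex], otherwise fall back to Claude) could look roughly like the sketch below. It is illustrative only: `parse_codex_thoughts` and the added `source` key are hypothetical names, not part of the package, and the sketch assumes the raw tool output arrives as a single string holding the JSON object requested in the prompt.

```python
import json


def parse_codex_thoughts(raw, expected=2):
    """Return the Codex thoughts, or None to signal a fallback to Claude.

    Hypothetical helper: checks only the shape requested in the prompt
    (a "thoughts" list whose items carry "id", "text" and "reasoning").
    """
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None  # not valid JSON -> caller generates Claude thoughts

    thoughts = data.get("thoughts") if isinstance(data, dict) else None
    if not isinstance(thoughts, list) or len(thoughts) != expected:
        return None  # wrong shape or wrong number of thoughts

    required = {"id", "text", "reasoning"}
    for thought in thoughts:
        if not isinstance(thought, dict) or not required.issubset(thought.keys()):
            return None  # a thought is missing a required field

    # Tag each thought so the output can label it as [Codex].
    return [{**thought, "source": "Codex"} for thought in thoughts]
```

A `None` result corresponds to the fallback branch: the two missing thoughts are generated by Claude and labelled [Claude] instead.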
See `~/.claude/tot/core/codex-mcp-integration.md`

### Codex MCP Connection Status

**IMPORTANT: Automatic Fallback System**

In Hybrid mode, ToT determines Codex MCP availability from the outcome of the Codex call itself (no separate pre-check) and reports the status:

**✅ Connection Successful:**

```
✅ Hybrid 모드 - Codex MCP 연결됨
Claude 3 + Codex 2 (ratio 3:2)
```

(Status text in English: "Hybrid mode - Codex MCP connected".)

- Full Hybrid mode with both Claude and Codex thoughts
- Direct MCP call for faster execution
- Expected execution time: **30-45 seconds** (optimized!)

**⚠️ Connection Failed:**

```
⚠️ Codex MCP 호출 실패 → Claude로 대체 생성
(3 Claude + 2 Claude-fallback = 5 thoughts)
```

(Status text in English: "Codex MCP call failed → generating replacements with Claude".)

- Automatic fallback to Claude for failed Codex thoughts
- All 5 thoughts generated by Claude
- Execution time: ~25-30 seconds
- No loss of functionality, only reduced technical depth

**Error Recovery:**
- Codex MCP calls have **1 automatic retry with a 3-second delay**
- If the retry fails, those 2 thoughts fall back to Claude immediately
- User is notified with clear status messages
- Execution continues without manual intervention

**Manual Mode Selection:**
- `/tot -c "problem"` → Force Claude-only mode (skip Codex check)
- `/tot -x "problem"` → Force Codex-only mode (fail if unavailable)
- `/tot "problem"` → Auto-detect mode with fallback (recommended)

## Problem Types

The framework automatically detects and handles:

- **Debug**: Bug analysis and root cause identification
- **Refactor**: Code restructuring and improvement strategies
- **Design**: Architecture and system design decisions
- **Optimize**: Performance and efficiency improvements
- **Custom**: Any problem requiring systematic exploration

## Algorithm Selection

### BFS (Breadth-First Search) - Default
- Explores all options at each level before going deeper
- Finds the best-scoring path among the thoughts kept at each level, within the depth limit (because only the top `n_select` thoughts survive each level, a global optimum is not guaranteed)
- Best for: Comprehensive exploration, finding multiple solutions

### DFS (Depth-First Search)
- Dives deep into promising paths with backtracking
- Lower memory usage, faster for deep problems
- Best for: Complex problems requiring deep analysis

**Selection criteria:**
- Use BFS for broad exploration (debugging, design choices)
- Use DFS for deep technical analysis (algorithm optimization)

## Evaluation Criteria

Each thought is evaluated on 4 dimensions, each scored from 1 to 10 (higher is better):

1. **Feasibility** (30%): Implementation difficulty
   - 10: Simple parameter change
   - 5: Complex algorithm implementation
   - 1: Requires human intervention

2. **Impact** (30%): Expected improvement
   - 10: 90-100% improvement
   - 5: 40-50% improvement
   - 1: <10% improvement

3. **Risk** (20%): Potential side effects
   - 10: No side effects
   - 5: Configuration changes needed
   - 1: Breaking changes

4. **Complexity** (20%): Testing/validation difficulty
   - 10: Fully automatable
   - 5: Manual validation required
   - 1: Long-term monitoring needed

**Total Score = (Feasibility × 0.3) + (Impact × 0.3) + (Risk × 0.2) + (Complexity × 0.2)**

## Usage Examples

### Debug a Memory Leak

```
/tot "Production app memory grows 50MB/hour after user logout"
```

### Design System Architecture

```
/tot "Design real-time notification system for 100k concurrent users"
```

### Optimize Database Query

```
/tot "Query takes 5 seconds - SELECT with JOIN on 1M+ rows, no indexes"
```

### Refactor Legacy Code

```
/tot "Refactor 2000-line UserService.js with 15 dependencies and no tests"
```

## Tips for Best Results

1. **Be Specific**: Provide context and constraints
   - ❌ "app is slow"
   - ✅ "API endpoint /users takes 3s - 10k users, no caching"

2. **Include Metrics**: Error messages, performance data, requirements
   - ❌ "fix this bug"
   - ✅ "NullPointerException in UserService.login() after OAuth update"

3. **State Goals**: What success looks like
   - ❌ "improve performance"
   - ✅ "reduce response time from 3s to <500ms without adding servers"

## Technical References

- **Core Algorithms**: `~/.claude/tot/core/bfs-implementation.md`, `dfs-implementation.md`
- **Evaluation Methods**: `~/.claude/tot/core/evaluation-concepts.md`
- **Task System**: `~/.claude/tot/core/task-system.md`
- **Codex Integration**: `~/.claude/tot/core/codex-mcp-integration.md`
- **Output Format**: `~/.claude/tot/OUTPUT_FORMAT.md` **(MUST READ FIRST)**

## Limitations

- Does not execute code or make changes automatically
- Requires clear problem description for best results
- Complex problems may take 2-3 minutes to fully explore
- Limited to text-based analysis (no visual debugging)

## Support

- **Documentation**: https://github.com/youkchansim/tree-of-thought
- **Issues**: https://github.com/youkchansim/tree-of-thought/issues
- **Examples**: See `~/.claude/tot/examples/` for real-world cases

---

**Princeton NLP Research**
[Tree of Thoughts: Deliberate Problem Solving with Large Language Models](https://arxiv.org/abs/2305.10601)
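As a worked illustration of the Evaluation Criteria and the `n_evaluate` / `n_select` defaults, the sketch below computes the weighted Total Score, averages several evaluations per thought, and keeps the top-ranked thoughts. The names `WEIGHTS`, `total_score`, and `select_top` are hypothetical. Combining evaluations with a plain arithmetic mean is an assumption based on the "Average" line in the output template, not something the framework files confirm.

```python
from statistics import mean

# Weights from the Evaluation Criteria section.
WEIGHTS = {"feasibility": 0.3, "impact": 0.3, "risk": 0.2, "complexity": 0.2}


def total_score(scores):
    """Weighted total of one evaluation (each dimension scored 1-10)."""
    return sum(scores[name] * weight for name, weight in WEIGHTS.items())


def select_top(evaluations, n_select=3):
    """Average each thought's evaluations and keep the n_select best.

    `evaluations` maps a thought id to a list of per-dimension score dicts,
    one dict per evaluation (n_evaluate of them under the defaults).
    Assumption: evaluations are combined with an arithmetic mean.
    """
    averaged = {
        thought_id: mean(total_score(e) for e in evals)
        for thought_id, evals in evaluations.items()
    }
    ranked = sorted(averaged.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n_select]
```

For example, a thought scored 10 on Feasibility, 5 on Impact, 10 on Risk, and 5 on Complexity totals 3.0 + 1.5 + 2.0 + 1.0 = 7.5 under these weights.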