# BUG #27 FIX: Validator Default Consensus Pattern

**Status:** RESOLVED
**Date:** 2025-10-22
**Agent:** backend-dev-bug27
**Confidence:** 0.92

---

## Problem Statement

Loop 2 validators (reviewer, tester, security-specialist) consistently report the default 0.70 confidence with zero feedback items, causing infinite iteration loops in CFN workflows.

### Root Cause

Validator agents were not generating structured output with explicit confidence scores and categorized feedback. The output processing skill had:

1. **No structured output template requirement** in agent context
2. **Limited confidence parsing patterns** (only basic regex)
3. **Weak feedback extraction** (couldn't detect implicit issues)
4. **No validation** to reject suspicious default patterns

---

## Solution Overview

### 1. New Skill: `process-validator-output.sh`

Created an enhanced validator output processor, invoked as:

```bash
./.claude/skills/loop2-output-processing/process-validator-output.sh \
  --agent-type reviewer \
  --task-id task-123 \
  --agent-id reviewer-1-1 \
  --context "Validation context..." \
  --iteration 1 \
  --timeout 900
```

**Key Features:**

- Injects structured output template into agent context
- Multi-pattern confidence detection (5 patterns)
- Enhanced feedback extraction with section parsing
- Default output pattern detection (0.70 + zero feedback)
- Detailed validation warnings in stderr

### 2. Enhanced `parse-feedback.sh`

**Confidence Parsing Patterns:**

1. Explicit header format: `## Validation Confidence: 0.87`
2. Generic confidence field: `confidence: 0.82` or `Confidence: 0.82`
3. Percentage: `92%` or `92 percent`
4. Decimal with context: `score 0.87`, `rating 0.85`
5. Qualitative: `high confidence` → 0.90, `medium confidence` → 0.75

**Feedback Extraction Patterns:**

1. Structured sections using awk (stops at next `###` header)
2. Inline mentions: `Critical: error found`
3. Sentence extraction: `Critical: missing validation`

### 3. Structured Output Template

Agents now receive this template in their context:

```markdown
**REQUIRED OUTPUT FORMAT:**

## Validation Confidence: [0.00-1.00]

### CRITICAL Issues
- [List critical issues that must be fixed]

### WARNING Issues
- [List warnings that should be addressed]

### SUGGESTION Items
- [List improvement suggestions]

**Important:**
- Confidence MUST be explicit numeric value
- Categorize ALL feedback by severity
- If no issues, state "No issues found"
- Do NOT use default scores without justification
```

---

## Before & After Examples

### Before (Broken Output)

**Agent Output:**

```
The code looks good.
Confidence: 0.70
```

**Extracted:**

```json
{
  "confidence": 0.70,
  "confidence_source": "default",
  "feedback": {
    "critical": [],
    "warnings": [],
    "suggestions": []
  }
}
```

**Result:** Infinite loop (consensus never reached, no actionable feedback)

---

### After (Structured Output)

**Agent Output:**

```markdown
## Validation Confidence: 0.87

### CRITICAL Issues
- Missing error handling in invoke-gate-ack.sh:88
- Security vulnerability in auth module

### WARNING Issues
- Inconsistent naming convention in test file
- Missing JSDoc comments

### SUGGESTION Items
- Consider adding retry backoff strategy
- Could use Promise.all for parallel operations
```

**Extracted:**

```json
{
  "agent_id": "reviewer-1-1",
  "agent_type": "reviewer",
  "confidence": 0.87,
  "confidence_source": "explicit",
  "feedback": {
    "critical": [
      "Missing error handling in invoke-gate-ack.sh:88",
      "Security vulnerability in auth module"
    ],
    "warnings": [
      "Inconsistent naming convention in test file",
      "Missing JSDoc comments"
    ],
    "suggestions": [
      "Consider adding retry backoff strategy",
      "Could use Promise.all for parallel operations"
    ]
  },
  "feedback_counts": {
    "critical": 2,
    "warnings": 2,
    "suggestions": 2,
    "total": 6
  },
  "validation_warning": "none",
  "iteration": 1
}
```

**Result:** Actionable feedback, accurate confidence, productive iterations

---

## Test Results

**Test Suite:** `test-bug27-fix.sh`

```
==========================================
BUG #27 FIX: Validator Output Processing Tests
==========================================

[TEST 1] Structured output with explicit confidence
✅ PASS: Confidence correctly parsed as 0.87
✅ PASS: Feedback counts correct (2C/2W/2S)

[TEST 2] Default output pattern detection
✅ PASS: Default pattern detected (0.70 + 0 feedback)

[TEST 3] Percentage confidence parsing
✅ PASS: Percentage converted to decimal (0.92)
✅ PASS: Critical issue extracted from percentage output

[TEST 4] Qualitative confidence mapping
✅ PASS: 'high confidence' mapped to 0.90

[TEST 5] Missing confidence detection
✅ PASS: Missing confidence returns 0.0 for detection

[TEST 6] Unstructured feedback extraction
✅ PASS: Confidence parsed from unstructured format
✅ PASS: Feedback extracted from unstructured format (1C/1W/1S)

==========================================
Test Results: 9 passed, 0 failed
==========================================
✅ All tests passed!
```

---

## Validation Warnings

The skill now logs warnings for suspicious patterns:

**Pattern 1: Default Output Detected**

```bash
[Validator] ⚠️ WARNING: Validator produced default output (0.70 confidence, 0 feedback items)
[Validator] This may indicate the validator didn't properly analyze the code
```

**Pattern 2: Feedback Without Explicit Confidence**

```bash
[Validator] ⚠️ WARNING: Feedback found (6 items) but confidence defaulted to 0.70
[Validator] Validator may not be using structured output format
```

---

## Integration Points

### Orchestrator Update Required

To use the new processor, update `orchestrate-cfn-loop.sh`:

**Replace:**

```bash
SKILL_RESULT=$(./.claude/skills/loop2-output-processing/execute-and-extract.sh \
  --agent-type "$VALIDATOR" \
  --task-id "$TASK_ID" \
  --agent-id "$UNIQUE_VALIDATOR_ID" \
  --context "$LOOP2_VALIDATOR_CONTEXT" \
  --iteration "$ITERATION" \
  --timeout "$AGENT_TIMEOUT" 2>&1)
```

**With:**

```bash
SKILL_RESULT=$(./.claude/skills/loop2-output-processing/process-validator-output.sh \
  --agent-type "$VALIDATOR" \
  --task-id "$TASK_ID" \
  --agent-id "$UNIQUE_VALIDATOR_ID" \
  --context "$LOOP2_VALIDATOR_CONTEXT" \
  --iteration "$ITERATION" \
  --timeout "$AGENT_TIMEOUT" 2>&1)
```

**Note:** `process-validator-output.sh` is backward-compatible with the `execute-and-extract.sh` interface.

---

## Files Modified

1. `.claude/skills/loop2-output-processing/process-validator-output.sh` (NEW)
   - Enhanced validator spawner with structured output enforcement
   - Default pattern detection
   - Validation warnings
2. `.claude/skills/loop2-output-processing/parse-feedback.sh` (MODIFIED)
   - 5 confidence parsing patterns (was 3)
   - AWK-based section extraction (precise header boundaries)
   - Enhanced feedback item filtering
3. `.claude/skills/loop2-output-processing/test-bug27-fix.sh` (NEW)
   - Comprehensive test suite (6 tests, 9 assertions)
   - Validates all parsing patterns
   - Validates default detection logic
4. `docs/BUG_27_FIX_VALIDATOR_OUTPUT.md` (NEW)
   - This documentation file

---

## Impact Assessment

### Positive Impacts

- **Eliminates infinite loops** from default validator output
- **100% confidence extraction success** in the test suite (9/9 assertions)
- **Actionable feedback** with categorization (CRITICAL/WARNING/SUGGESTION)
- **Early warning system** for poorly structured agent output
- **Backward compatible** with existing orchestrator

### Potential Concerns

- **Agent compliance** - Validators must adopt the structured format
- **Template injection overhead** - Adds ~500 bytes to agent context
- **Parsing complexity** - AWK dependency (already present in system)

### Migration Path

1. Deploy the `process-validator-output.sh` skill
2. Update the orchestrator to use the new processor
3. Monitor validation warnings in logs
4. Agent personas are expected to adapt to the template over time

---

## Success Criteria

- [x] All 9 test assertions pass
- [x] Confidence parsing handles 5+ patterns
- [x] Feedback extraction precise (no cross-contamination between sections)
- [x] Default pattern detection active
- [x] Validation warnings logged
- [x] Backward-compatible interface
- [x] Post-edit validation passed
- [x] Documentation complete

---

## Related Work

- **BUG #20:** Context injection for deliverables
- **BUG #28:** Missing deliverable extraction
- **PATTERN-009:** Multi-pattern confidence parsing strategy
- **STRAT-014:** Skill interface consistency

---

## Appendix: Parsing Pattern Details

### Confidence Pattern Examples

| Input | Pattern Match | Output |
|-------|--------------|--------|
| `## Validation Confidence: 0.87` | Header format | 0.87 |
| `confidence: 0.82` | Generic field | 0.82 |
| `92%` | Percentage | 0.92 |
| `score 0.88` | Decimal with context | 0.88 |
| `high confidence` | Qualitative | 0.90 |
| `medium confidence` | Qualitative | 0.75 |
| `low confidence` | Qualitative | 0.50 |
| (no match) | Default detection | 0.0 |

### Feedback Section AWK Logic

```awk
# `category` is not defined in this snippet; the caller must supply it,
# e.g. awk -v category="CRITICAL". IGNORECASE is honored by gawk only.
BEGIN { in_section=0; IGNORECASE=1 }

# Detect section headers (###)
/^###/ {
  if ($0 ~ category) {
    in_section=1  # Start capturing
    next          # Skip header line
  } else {
    in_section=0  # Stop at next section
  }
}

# Capture bullets within section
in_section && /^[-*0-9]/ {
  gsub(/^[- *0-9.]+/, "")                  # Remove bullet prefix
  gsub(/^[[:space:]]+|[[:space:]]+$/, "")  # Trim
  if (length($0) > 0) print
}
```

**Key Behavior:**

- Stops at next `###` header (prevents cross-contamination)
- Filters out empty lines and "No issues found"
- Preserves exact issue text without header noise
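
### Confidence Parsing Sketch

The confidence pattern table in this appendix, together with the default-output check (0.70 confidence plus zero feedback items), can be sketched as two small shell functions. This is a minimal illustration, not the actual contents of `parse-feedback.sh` or `process-validator-output.sh`. The function names `parse_confidence` and `is_default_pattern` are hypothetical, as are the exact regular expressions. The only things taken from this document are the pattern order and the mapped values.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the five-pattern confidence cascade described above.
# Names and regexes are assumptions; only the pattern order and output values
# come from the pattern table in this document.

parse_confidence() {
  local text re
  # Lowercase the input so every pattern matches case-insensitively.
  text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')

  # 1. Explicit header format: "## Validation Confidence: 0.87"
  re='validation confidence:[[:space:]]*([01](\.[0-9]+)?)'
  if [[ $text =~ $re ]]; then printf '%s\n' "${BASH_REMATCH[1]}"; return; fi

  # 2. Generic field: "confidence: 0.82"
  #    (edge cases such as "confidence: 100%" are not handled in this sketch)
  re='confidence:[[:space:]]*([01](\.[0-9]+)?)'
  if [[ $text =~ $re ]]; then printf '%s\n' "${BASH_REMATCH[1]}"; return; fi

  # 3. Percentage: "92%" or "92 percent", converted to a decimal
  re='([0-9]{1,3})[[:space:]]*(%|percent)'
  if [[ $text =~ $re ]]; then
    awk -v p="${BASH_REMATCH[1]}" 'BEGIN { printf "%.2f\n", p / 100 }'
    return
  fi

  # 4. Decimal with context: "score 0.87", "rating 0.85"
  re='(score|rating)[[:space:]]+([01]\.[0-9]+)'
  if [[ $text =~ $re ]]; then printf '%s\n' "${BASH_REMATCH[2]}"; return; fi

  # 5. Qualitative phrases, else 0.0 so the caller can detect a missing score
  case "$text" in
    *"high confidence"*)   echo "0.90" ;;
    *"medium confidence"*) echo "0.75" ;;
    *"low confidence"*)    echo "0.50" ;;
    *)                     echo "0.0"  ;;
  esac
}

# Succeeds (exit 0) when the result matches the suspicious default pattern:
# confidence of 0.70 together with zero feedback items.
is_default_pattern() {
  local confidence=$1 feedback_total=$2
  awk -v c="$confidence" -v n="$feedback_total" \
    'BEGIN { exit !(c == 0.70 && n == 0) }'
}
```

A caller might run `conf=$(parse_confidence "$agent_output")` and then `is_default_pattern "$conf" "$total_items" && echo "default output suspected" >&2`, where `agent_output` and `total_items` are placeholder variables. Trying the most explicit pattern first means a structured header takes precedence over any looser mention of a score elsewhere in the output.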