# Adoptable Agent Architecture Patterns

## Implementation Guide

**Quick Reference:** Three proven patterns from QuDAG/daa/claude-flow-novice that can immediately improve agent coordination systems.

---

## Pattern 1: Confidence Gating

### Removes subjective opinions from pass/fail decisions

**Problem it solves:**

- Agents argue about whether work is "good enough"
- Humans can't easily decide between competing implementations
- No objective criteria for moving forward

**The Pattern:**

```
Agent Output → Confidence Score (0.0-1.0) → Compare to THRESHOLD
  ├─ Pass (≥ threshold) → Proceed
  └─ Fail (< threshold) → Retry
```

**Implementation: claude-flow-novice (Current)**

```bash
# cfn-loop-orchestration/orchestrate.sh

# 1. Agents report confidence with their output
LOOP3_AGENT_CONFIDENCE[0]=0.82
LOOP3_AGENT_CONFIDENCE[1]=0.79
LOOP3_AGENT_CONFIDENCE[2]=0.88

# 2. Calculate average confidence
#    (bc handles the floating-point math; bash's $(( )) is integer-only)
avg_confidence=$(echo "scale=4; (${LOOP3_AGENT_CONFIDENCE[0]} + ${LOOP3_AGENT_CONFIDENCE[1]} + ${LOOP3_AGENT_CONFIDENCE[2]}) / 3" | bc -l)

# 3. Gate check (hard threshold, no opinions)
GATE_THRESHOLD=0.75  # Configurable per mode
if (( $(echo "$avg_confidence >= $GATE_THRESHOLD" | bc -l) )); then
  echo "✅ GATE PASSED ($avg_confidence >= $GATE_THRESHOLD)"
  proceed_to_loop_2
else
  echo "❌ GATE FAILED ($avg_confidence < $GATE_THRESHOLD)"
  iterate_loop_3
fi
```

**How to Adopt in daa:**

```rust
// daa-orchestrator/src/workflow.rs

pub struct WorkflowExecutionContext {
    pub confidence_threshold: f64, // Mode-dependent
    pub agent_outputs: Vec<AgentOutput>,
}

pub fn check_confidence_gate(ctx: &WorkflowExecutionContext) -> bool {
    let avg_confidence = ctx.agent_outputs
        .iter()
        .map(|o| o.confidence_score)
        .sum::<f64>() / ctx.agent_outputs.len() as f64;
    avg_confidence >= ctx.confidence_threshold
}

pub async fn execute_step_with_gating(
    step: &WorkflowStep,
    config: &OrchestratorConfig,
) -> Result<WorkflowStepResult> {
    loop {
        // Execute agents for this step
        let outputs = execute_agents_for_step(step).await?;

        // Create execution context
        let ctx = WorkflowExecutionContext {
            confidence_threshold: match config.mode {
                Mode::MVP => 0.70,
                Mode::Standard => 0.75,
                Mode::Enterprise => 0.85,
            },
            agent_outputs: outputs,
        };

        // Gate check
        if check_confidence_gate(&ctx) {
            // Aggregate and return
            return aggregate_outputs(ctx.agent_outputs);
        }
        // Otherwise retry loop (implicit iteration)
    }
}
```

**How to Adopt in QuDAG:**

```rust
// qudag-exchange/src/agent_orchestration.rs

pub fn evaluate_task_completion(
    task_id: &str,
    agent_results: Vec<AgentResult>,
    mode: ExecutionMode,
) -> TaskGateDecision {
    let threshold = match mode {
        ExecutionMode::MVP => 0.70,
        ExecutionMode::Standard => 0.75,
        ExecutionMode::Enterprise => 0.85,
    };

    let avg_confidence = agent_results
        .iter()
        .map(|r| r.confidence)
        .sum::<f64>() / agent_results.len() as f64;

    match avg_confidence >= threshold {
        true => TaskGateDecision::Proceed,
        false => TaskGateDecision::Retry,
    }
}

// In task coordinator
match evaluate_task_completion(&task_id, results, mode) {
    TaskGateDecision::Proceed => {
        // Merge results into ledger
        integration_agent.merge_results(results)?;
    }
    TaskGateDecision::Retry => {
        // Spawn a retry for the task
        coordinator.spawn_task_retry(&task_id)?;
    }
}
```

**Configuration Presets:**

```yaml
# Per-mode confidence thresholds
modes:
  mvp:
    gate_threshold: 0.70
    consensus_threshold: 0.80
    max_iterations: 5
    time_budget: "5 minutes"
  standard:
    gate_threshold: 0.75
    consensus_threshold: 0.90
    max_iterations: 10
    time_budget: "30 minutes"
  enterprise:
    gate_threshold: 0.85
    consensus_threshold: 0.95
    max_iterations: 15
    time_budget: "2 hours"
```

**Key Benefits:**

- ✅ Completely objective (no opinions in the gate)
- ✅ Automatic retry logic
- ✅ Mode-based risk profiles
- ✅ Easy to debug (inspect confidence scores)
- ✅ Scales to any number of agents

**Anti-patterns to Avoid:**

- ❌ "Agent thinks it's done" (too subjective)
- ❌ "Majority vote" (2 out of 3 isn't good enough)
- ❌ Fixed thresholds (doesn't scale with risk)
- ❌ Averaging with outliers (use median + IQR)

---

## Pattern 2: Blind Validator Review

### Prevents bias from agent reputation/type

**Problem it solves:**

- Validators give favorable reviews to specialized agents
- "This is good because Agent X did it" (credential bias)
- Cursory reviews of new agents that have no reputation yet
- Groupthink among familiar agent combinations

**The Pattern:**

```
Loop 3 Agent Output (with metadata)
        ↓
[Remove: agent_id, agent_type, agent_name]
        ↓
Loop 2 Validators Review (don't know who created it)
        ↓
[Vote yes/no/iterate]
        ↓
Unbiased Consensus
```

**Implementation: claude-flow-novice (Current)**

```bash
# cfn-loop-orchestration/orchestrate.sh - Loop 2 Setup

# When spawning validators, strip agent metadata from work
prepare_anonymized_work() {
  local agent_output_file="$1"
  local task_id="$2"

  # Read original output
  local output
  output=$(cat "$agent_output_file")

  # Create anonymized version (keep content, remove metadata)
  jq '{
    "content": .content,
    "deliverables": .deliverables,
    "reasoning": .reasoning
  }' <<< "$output" > "/tmp/anonymized_${task_id}.json"

  # bash `return` only takes status codes, so emit the path on stdout
  echo "/tmp/anonymized_${task_id}.json"
}

# Before spawning Loop 2, prepare work
for agent_output in $(find /tmp/loop3_outputs -name "*.json"); do
  anonymized=$(prepare_anonymized_work "$agent_output" "$TASK_ID")

  # Store for validator review (without agent metadata)
  redis-cli HSET \
    "swarm:${TASK_ID}:validator-work" \
    "work_$(uuidgen)" \
    "$(cat "$anonymized")"
done

# Validators review without seeing: agent_id, agent_type, agent_name
# So they can't have affinity bias
```

**How to Adopt in daa:**

```rust
// daa-orchestrator/src/validation.rs

pub struct AgentOutput {
    pub agent_id: String,
    pub agent_type: String,
    pub content: String,
    pub deliverables: Vec<String>,
    pub reasoning: String,
    pub confidence: f64,
}

pub struct AnonymizedOutput {
    // Remove these fields to enforce blind review:
    // pub agent_id: String,
    // pub agent_type: String,
    pub content: String,
    pub deliverables: Vec<String>,
    pub reasoning: String,
    pub confidence: f64, // Keep confidence (objective metric)
}

impl AgentOutput {
    pub fn anonymize_for_review(&self) -> AnonymizedOutput {
        AnonymizedOutput {
            content: self.content.clone(),
            deliverables: self.deliverables.clone(),
            reasoning: self.reasoning.clone(),
            confidence: self.confidence,
            // agent_id and agent_type are intentionally dropped
        }
    }
}

pub async fn validate_with_blind_review(
    outputs: Vec<AgentOutput>,
    validators: &[ValidatorAgent],
) -> Result<ConsensusResult> {
    // Anonymize all outputs before sending to validators
    let anonymized = outputs.iter()
        .map(|o| o.anonymize_for_review())
        .collect::<Vec<_>>();

    // Each validator reviews without knowing who created it
    let votes = futures::stream::iter(validators)
        .then(|validator| {
            let anonymized = &anonymized; // borrow, so all validators can see it
            async move { validator.review(anonymized).await }
        })
        .collect::<Vec<_>>()
        .await;

    // Tally votes (no bias)
    consensus_from_votes(votes)
}
```

**How to Adopt in QuDAG:**

```rust
// qudag-exchange/src/integration_agent.rs

pub fn create_blind_merge_request(
    task_id: &str,
    agent_results: Vec<AgentResult>,
) -> BlindMergeRequest {
    let anonymized_submissions = agent_results
        .iter()
        .map(|result| BlindSubmission {
            // These are removed:
            // agent_id: result.agent_id.clone(),
            // agent_specialty: result.specialty.clone(),

            // These are kept:
            code: result.code.clone(),
            tests_passing: result.tests_passing,
            test_count: result.test_count,
            performance_metrics: result.metrics.clone(),
        })
        .collect();

    BlindMergeRequest {
        task_id: task_id.to_string(),
        submissions: anonymized_submissions,
        reviewers: select_integration_agents(3),
    }
}

// Integration agents review without seeing which agent did the work
// They can't favor the "security specialist" over the "new agent"
pub fn review_blind_submission(submission: &BlindSubmission) -> ReviewVote {
    let metrics = ReviewMetrics {
        code_quality: analyze_code(&submission.code),
        test_coverage: submission.test_count,
        test_passing: submission.tests_passing,
        performance: submission.performance_metrics.latency_ms,
    };

    // Vote based purely on metrics, not agent reputation
    match metrics {
        m if m.code_quality > 0.8 && m.test_passing == m.test_coverage => ReviewVote::Approve,
        _ => ReviewVote::RequestChanges,
    }
}
```

**Implementation Checklist:**

- [ ] Define what metadata to strip (agent_id, agent_type, agent_name)
- [ ] Define what metrics to keep (confidence, test results, performance)
- [ ] Update validator interfaces to accept only anonymized data
- [ ] Add an audit trail (record that blind review was performed)
- [ ] Document the removal rationale in validator instructions

**Key Benefits:**

- ✅ Removes credential bias
- ✅ Encourages critical evaluation
- ✅ Protects new agents from prejudice
- ✅ Prevents groupthink
- ✅ Makes consensus truly objective

**Anti-patterns to Avoid:**

- ❌ Leaving agent_type visible (validators know specialization)
- ❌ Removing confidence scores (objective metrics should remain)
- ❌ Telling validators who created work (defeats the blind purpose)
- ❌ Using original filenames (often encode agent names)

---

## Pattern 3: Test-Driven Convergence

### Uses test results as objective completion criteria

**Problem it solves:**

- "Is this correct?" → Agents disagree
- No objective way to validate completed work
- Agents claim a feature is complete but tests fail
- Difficult to score confidence without an objective baseline

**The Pattern:**

```
Agent completes work
        ↓
Run tests against deliverables
        ↓
Tests pass? (Objective, not opinion)
  ├─ Yes → Confidence = 0.9 (tests = proof)
  └─ No  → Confidence = 0.3 (tests = requirement)
```

**Implementation: claude-flow-novice Enhancement**

```bash
# cfn-loop-orchestration/orchestrate.sh - Add test validation

validate_with_tests() {
  local agent_output_dir="$1"
  local agent_id="$2"

  # Run validation tests against the agent's deliverables
  echo "🧪 Running validation tests for $agent_id..." >&2

  # Tests should cover:
  # 1. Does the deliverable exist?
  # 2. Is it syntactically valid?
  # 3. Do acceptance criteria pass?
  # 4. Does it integrate with other components?
  if npm run test -- "$agent_output_dir" 2>/dev/null; then
    TEST_RESULT=0
  else
    TEST_RESULT=$?
  fi

  # Base confidence on test results (objective)
  if [[ $TEST_RESULT -eq 0 ]]; then
    CONFIDENCE=0.90  # Tests pass = high confidence
    echo "  ✅ All tests passed" >&2
  else
    CONFIDENCE=0.30  # Tests fail = low confidence
    echo "  ❌ Tests failed (exit code: $TEST_RESULT)" >&2
  fi

  # Report with test-based confidence
  report-completion.sh \
    --task-id "$TASK_ID" \
    --agent-id "$agent_id" \
    --confidence "$CONFIDENCE" \
    --result "{\"tests_passed\": $([[ $TEST_RESULT -eq 0 ]] && echo true || echo false)}" \
    --evidence "test_results.xml"

  return $TEST_RESULT
}

# In main loop
for agent in "${LOOP3_AGENTS[@]}"; do
  execute_agent_task "$agent" "$TASK_ID"
  AGENT_OUTPUT="/tmp/${TASK_ID}_${agent}_output"

  # Validate with tests (not opinions)
  validate_with_tests "$AGENT_OUTPUT" "$agent"
done
```

**How to Adopt in daa:**

```rust
// daa-orchestrator/src/validation.rs

pub struct AgentDeliverable {
    pub file_path: String,
    pub content: String,
    pub metadata: serde_json::Value,
}

pub struct TestResult {
    pub passed: usize,
    pub failed: usize,
    pub total: usize,
}

pub async fn validate_deliverable(
    deliverable: &AgentDeliverable,
) -> Result<(f64, TestResult)> {
    // Run tests against the deliverable
    let test_result = run_validation_tests(&deliverable.file_path).await?;

    // Confidence based on test results (objective);
    // match on a reference so test_result can still be returned below
    let confidence = match &test_result {
        r if r.passed == r.total => 0.95,     // All tests pass = high confidence
        r if r.passed >= r.total / 2 => 0.60, // 50%+ pass = medium confidence
        _ => 0.25,                            // Most fail = low confidence
    };

    Ok((confidence, test_result))
}

pub async fn execute_workflow_with_test_validation(
    workflow: &Workflow,
    orchestrator: &DaaOrchestrator,
) -> Result<WorkflowResult> {
    for step in &workflow.steps {
        // Execute agents
        let outputs = execute_agents_for_step(step).await?;

        // Validate each output with tests (not opinions)
        let mut validated_outputs = Vec::new();
        for output in outputs {
            let (confidence, test_result) = validate_deliverable(&output).await?;
            validated_outputs.push(ValidatedOutput {
                content: output.content,
                confidence, // Based on tests, not agent opinion
                test_result,
                metadata: output.metadata,
            });
        }

        // Gate check using test-based confidence
        let avg_confidence = validated_outputs
            .iter()
            .map(|o| o.confidence)
            .sum::<f64>() / validated_outputs.len() as f64;

        if avg_confidence < 0.75 {
            // Tests failed → retry with a different agent configuration
            return Err(anyhow::anyhow!("Tests failed (avg confidence: {})", avg_confidence));
        }
    }
    Ok(WorkflowResult::Success)
}
```

**How to Adopt in QuDAG:**

```rust
// qudag-exchange/src/task_validation.rs

pub fn evaluate_task_with_tests(
    task_id: &str,
    agent_results: Vec<AgentResult>,
) -> TaskEvaluation {
    let mut test_based_scores = Vec::new();

    for result in &agent_results {
        // Run the test suite against deliverables
        let test_output = run_tests_on_deliverables(
            &result.code,
            &result.deliverables,
        );

        // Score based on tests (objective)
        let score = match test_output {
            TestOutput { passed, total } if passed == total => 0.95,
            TestOutput { passed, total } if passed > total / 2 => 0.65,
            _ => 0.25,
        };

        test_based_scores.push((result.agent_id.clone(), score));
    }

    // Gate decision based on test results
    let avg_score = test_based_scores
        .iter()
        .map(|(_, s)| s)
        .sum::<f64>() / test_based_scores.len() as f64;

    TaskEvaluation {
        task_id: task_id.to_string(),
        test_scores: test_based_scores,
        average_score: avg_score,
        passed: avg_score >= 0.75,
        evidence: "test_results.json", // Objective evidence
    }
}

// Use test results for merge decisions
pub fn should_integrate_task(evaluation: &TaskEvaluation) -> bool {
    // No subjective opinions - only test results
    evaluation.passed && !evaluation.test_scores.is_empty()
}
```

**Test Suite Design:**

```yaml
# Example test categories for agent validation
syntax_validation:
  - File parses without errors
  - No syntax errors in code
  - Matches expected format

acceptance_criteria:
  - Feature X works as specified
  - Output matches expected schema
  - Performance within bounds

integration_tests:
  - Works with existing modules
  - No breaking changes
  - API compatibility

security_tests:
  - No unsafe operations
  - Credentials not logged
  - Input validation present

performance_tests:
  - Response time < threshold
  - Memory usage acceptable
  - No memory leaks
```

**Key Benefits:**

- ✅ Completely objective (test = proof)
- ✅ No subjective opinions in scoring
- ✅ Automatic retry for failed tests
- ✅ Clear pass/fail criteria
- ✅ Prevents "done" claims without evidence

**Anti-patterns to Avoid:**

- ❌ Only running "happy path" tests (need edge cases)
- ❌ Allowing manual test overrides (tests are the final word)
- ❌ Ignoring test failures (always retry with changes)
- ❌ Using the agent's own tests (need independent validation)

---

## Quick Adoption Matrix

| Pattern | Difficulty | Time to Implement | Impact | Dependencies |
|---------|-----------|-------------------|--------|--------------|
| Confidence Gating | Easy | 1-2 days | High | Agents must report confidence |
| Blind Validator Review | Medium | 3-5 days | High | Need anonymization library |
| Test-Driven Convergence | Medium | 5-7 days | Very High | Need test framework setup |

---

## Implementation Order

**Recommended sequence for maximum impact:**

1. **Start with Confidence Gating** (1-2 days)
   - Easiest to implement
   - Immediate impact on pass/fail decisions
   - Foundation for the other patterns

2. **Add Test-Driven Validation** (5-7 days, after #1)
   - Feeds objective scores into confidence gating
   - Provides evidence for decisions
   - Increases trust in the system

3. **Implement Blind Review** (3-5 days, after #2)
   - Prevents bias in the validation step
   - Improves consensus quality
   - Best done after tests are in place

---

## Success Metrics

**After implementing these patterns, measure:**

| Metric | Before | Target | How to Measure |
|--------|--------|--------|----------------|
| Decision Clarity | 60% clear | 95%+ clear | Audit logs: conflicts/ambiguity |
| Iteration Count | 8-10 avg | 4-5 avg | Task metadata: iterations |
| Consensus Quality | 78% agree | 92%+ agree | Validator votes on the same work |
| Bug Escape Rate | 12% | <2% | Bugs found post-delivery |
| Confidence Calibration | Uncalibrated | ±0.05 error | Actual vs. reported confidence |

---

## References

**Full Documentation:**

- `/home/user/claude-flow-novice/docs/ARCHITECTURAL_COMPARISON_QUDAG_DAA.md` (sections 5-7)

**Implementation Files:**

- claude-flow-novice: `.claude/skills/cfn-loop-orchestration/orchestrate.sh` (gates)
- claude-flow-novice: `.claude/skills/cfn-redis-coordination/` (coordination)
- daa: `crates/daa-ai/src/agent.rs` (agent structure)
- daa: `daa-orchestrator/src/lib.rs` (orchestrator)
- QuDAG: `qudag-exchange/plans/swarm-orchestration.md` (TDD convergence)

**Testing:** See `.claude/skills/cfn-loop-validation/` for the validation framework that can be extended with these patterns.
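---

## Appendix: Robust Confidence Aggregation

Pattern 1's anti-pattern list recommends median + IQR over a plain mean when confidence scores contain outliers. A minimal sketch of such an aggregator (function and quantile scheme are illustrative, not taken from any of the codebases above):

```rust
/// Robust confidence aggregation: discard scores outside 1.5×IQR of the
/// quartiles, then average what remains.
fn robust_confidence(scores: &[f64]) -> f64 {
    let mut sorted = scores.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Simple nearest-rank quantile; adequate for small agent counts
    let q = |p: f64| sorted[((sorted.len() - 1) as f64 * p).round() as usize];
    let (q1, q3) = (q(0.25), q(0.75));
    let iqr = q3 - q1;
    let (lo, hi) = (q1 - 1.5 * iqr, q3 + 1.5 * iqr);
    let kept: Vec<f64> = sorted.into_iter().filter(|s| *s >= lo && *s <= hi).collect();
    kept.iter().sum::<f64>() / kept.len() as f64
}

fn main() {
    // A single bad score (0.10) drags the plain mean to ~0.65, failing a
    // 0.75 gate; the IQR filter discards it and the gate passes.
    let scores = [0.82, 0.79, 0.88, 0.10];
    let robust = robust_confidence(&scores);
    assert!(robust > 0.75);
    println!("{robust:.3}");
}
```

Whether an outlier should be filtered or should block the gate is a policy choice: filtering protects good work from one misreporting agent, but a genuinely failing agent's low score is also a signal worth surfacing in the audit trail.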
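---

## Appendix: Measuring Confidence Calibration

The Success Metrics table targets a calibration error of ±0.05 (actual vs. reported confidence). One way to compute that number is to bucket historical (reported confidence, validation outcome) pairs and compare each bucket's mean confidence to its observed pass rate; the sketch below assumes such a history is available and all names are hypothetical:

```rust
/// Mean absolute calibration error: average gap between reported confidence
/// and the observed pass rate, per confidence bucket, weighted by samples.
fn calibration_error(samples: &[(f64, bool)]) -> f64 {
    // Bucket reported confidences into tenths (0.0-0.1, ..., 0.9-1.0)
    let mut buckets: Vec<Vec<(f64, bool)>> = vec![Vec::new(); 10];
    for &(conf, passed) in samples {
        let idx = ((conf * 10.0) as usize).min(9);
        buckets[idx].push((conf, passed));
    }

    let mut total_err = 0.0;
    let mut n = 0usize;
    for bucket in buckets.iter().filter(|b| !b.is_empty()) {
        let mean_conf = bucket.iter().map(|(c, _)| c).sum::<f64>() / bucket.len() as f64;
        let pass_rate = bucket.iter().filter(|(_, p)| *p).count() as f64 / bucket.len() as f64;
        // Weight each bucket's error by its sample count
        total_err += (mean_conf - pass_rate).abs() * bucket.len() as f64;
        n += bucket.len();
    }
    total_err / n as f64
}

fn main() {
    // Hypothetical history: (reported confidence, did validation pass?)
    let history = [(0.9, true), (0.9, true), (0.9, false), (0.3, false)];
    let err = calibration_error(&history);
    // Agents here report 0.9 but pass only 2/3 of the time: overconfident
    assert!(err > 0.2 && err < 0.3);
}
```

A well-calibrated agent population keeps this value near zero; values above the ±0.05 target suggest adjusting how agents derive their reported confidence (for example, tying it to test results as in Pattern 3).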