# Test-Driven CFN Loop Guide

**Version:** 3.0
**Date:** 2025-11-16
**Status:** Production Ready

---

## Table of Contents

1. [Overview](#overview)
2. [Key Concepts](#key-concepts)
3. [Quick Start](#quick-start)
4. [Success Criteria Format](#success-criteria-format)
5. [Writing Effective Tests](#writing-effective-tests)
6. [Execution Modes](#execution-modes)
7. [Loop Architecture](#loop-architecture)
8. [Best Practices](#best-practices)
9. [Troubleshooting](#troubleshooting)
10. [Advanced Topics](#advanced-topics)

---

## Overview

### What is the Test-Driven CFN Loop?

The CFN (Claude Flow Novice) Loop is a three-phase, self-correcting AI development workflow that uses **objective test execution** to validate deliverables instead of subjective confidence scoring.

**Problem Solved:** Traditional confidence-based validation had 55% accuracy due to:

- Subjective self-assessment
- No executable validation
- "Consensus on vapor" (high confidence, broken code)

**Solution:** Test-driven gates provide 95%+ accuracy through:

- Executable test validation
- Objective pass/fail metrics
- Automated quality gates

### Architecture Overview

```
Loop 3 (Implementation)
├─ Implementer agents write code
├─ Agent-authored tests (15-20 min TDD phase)
├─ Test execution → pass rate
└─ Gate check: ≥ threshold?
        ↓ Pass to Loop 2
Loop 2 (Validation)
├─ Validator agents review code
├─ Test quality analysis
├─ Security/performance audits
└─ Consensus calculation
        ↓ Pass to Product Owner
Product Owner (Decision)
├─ Evaluates consensus + deliverables
├─ Validates no "consensus on vapor"
└─ Decision: PROCEED / ITERATE / ABORT
```

### Measured Improvements

| Metric | Confidence-Based | Test-Driven | Improvement |
|--------|-----------------|-------------|-------------|
| Accuracy | 55% | 95%+ | +73% |
| Defect Escape Rate | 40% | <5% | -88% |
| Iteration Efficiency | 3.2 avg | 1.8 avg | -44% |
| False Positives | 22% | <2% | -91% |

---

## Key Concepts

### 1. Success Criteria

**Definition:** A JSON specification defining the deliverables and test requirements for a task.

**Structure:**

```json
{
  "task_description": "Implement JWT authentication middleware",
  "deliverables": [
    "src/middleware/auth.ts",
    "tests/middleware/auth.test.ts"
  ],
  "tests": [
    {
      "name": "JWT Authentication Tests",
      "command": "npm test -- tests/middleware/auth.test.ts",
      "pass_threshold": 1.0
    }
  ],
  "quality_gates": {
    "test_coverage": 0.95,
    "security_scan": "zero_high_vulnerabilities"
  }
}
```

**Purpose:**

- Define objective completion criteria
- Enable automated validation
- Prevent "consensus on vapor"
- Guide agent implementation

### 2. Test Pass Rate

**Definition:** The percentage of tests that pass, on a 0.0-1.0 scale.

**Calculation:**

```
pass_rate = passing_tests / total_tests
```

**Example:**

```bash
# Test Results
18 tests passed
2 tests failed
Total: 20 tests

Pass Rate = 18/20 = 0.90 (90%)
```

**Usage:**

- Loop 3 gate threshold (Standard: ≥0.95)
- Loop 2 consensus calculation
- Product Owner decision criteria

### 3. Gate Checks

**Definition:** Automated quality thresholds that determine whether work progresses to the next phase.

**Loop 3 Gate (Self-Validation):**

```bash
# Pass rates are decimals, so use awk for the comparison
# (bash's -ge operator only handles integers).
if awk "BEGIN { exit !($LOOP3_PASS_RATE >= $GATE_THRESHOLD) }"; then
  echo "✅ Gate PASSED: Proceed to Loop 2"
else
  echo "❌ Gate FAILED: Iterate Loop 3"
fi
```

**Thresholds by Mode:**

| Mode | Loop 3 Gate | Loop 2 Consensus | Max Iterations |
|------|-------------|------------------|----------------|
| MVP | ≥0.70 | ≥0.80 | 5 |
| Standard | ≥0.95 | ≥0.90 | 10 |
| Enterprise | ≥0.98 | ≥0.95 | 15 |

### 4. Consensus

**Definition:** The aggregate validation score from Loop 2 validators.
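The pass-rate, gate, and consensus mechanics above can be sketched in a few lines of TypeScript. This is an illustrative sketch only: the helper names (`passRate`, `consensus`, `GATE`) are not part of the CFN API, and the threshold values simply mirror the mode table.

```typescript
// Illustrative sketch of the pass-rate, gate, and consensus math.
// GATE values mirror the "Thresholds by Mode" table; all names are hypothetical.
const GATE: Record<string, number> = { mvp: 0.70, standard: 0.95, enterprise: 0.98 };

function passRate(passed: number, total: number): number {
  // pass_rate = passing_tests / total_tests
  return total === 0 ? 0 : passed / total;
}

function consensus(scores: Record<string, number>): number {
  // Plain average of the individual validator scores
  const values = Object.values(scores);
  return values.reduce((sum, s) => sum + s, 0) / values.length;
}

// Pass-rate example from the text: 18/20 tests passing.
const rate = passRate(18, 20);        // 0.9
console.log(rate >= GATE.standard);   // false: 0.90 < 0.95, so the Standard gate fails

// Consensus over five validator scores (averages to ≈ 0.922).
console.log(consensus({
  reviewer: 0.93,
  "security-specialist": 0.93,
  "contract-tester": 0.95,
  "integration-tester": 0.92,
  "mutation-tester": 0.88,
}));
```

Note that the same 0.90 pass rate clears the MVP gate (≥0.70) but not the Standard one, which is why mode selection matters before a run starts.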
**Calculation:**

```
consensus = average(validator_scores)
```

**Example:**

```json
{
  "reviewer": 0.93,
  "security-specialist": 0.93,
  "contract-tester": 0.95,
  "integration-tester": 0.92,
  "mutation-tester": 0.88
}
```

```
Consensus = (0.93 + 0.93 + 0.95 + 0.92 + 0.88) / 5 = 0.922
```

**Standard Mode:** ≥0.90 required for PROCEED

---

## Quick Start

### 1. Task Mode (Debugging, Full Visibility)

**Use When:** Learning, debugging, short tasks (<5 min)

**Command:**

```bash
/cfn-loop-task "Implement JWT authentication with tests" --mode=standard
```

**Workflow:**

1. Main Chat spawns all agents via the Task() tool
2. Loop 3 agents implement + test (visible in chat)
3. Test execution in Main Chat (full output)
4. Loop 2 validators review (visible feedback)
5. Product Owner decides (rationale shown)

**Cost:** $0.150/iteration (100% visibility)

### 2. CLI Mode (Production, Cost-Optimized)

**Use When:** Production tasks, long workflows, cost-sensitive projects

**Command:**

```bash
/cfn-loop-cli "Implement JWT authentication with tests" --mode=standard
```

**Workflow:**

1. Main Chat spawns cfn-v3-coordinator
2. Coordinator spawns workers via CLI (background)
3. Progress reports shown periodically
4. Final results displayed

**Cost:** $0.054/iteration (64% savings vs Task Mode)

### 3. Docker Mode (Containerized Execution)

**Use When:** Isolated environments, reproducible builds

**Command:**

```bash
/cfn-docker:CFN_DOCKER_TASK "Implement JWT auth" --mode=standard
```

**Workflow:**

1. Docker coordinator spawns containerized agents
2. Test execution in isolated containers
3. Results aggregated via Redis
4. Clean container teardown

**Cost:** Similar to CLI mode, plus Docker overhead

---

## Success Criteria Format

### Inline JSON (Environment Variable)

**Simple tasks (<5 deliverables):**

```bash
export CFN_SUCCESS_CRITERIA='{
  "task_description": "Add input validation to user API",
  "deliverables": [
    "src/api/user.ts",
    "tests/api/user.test.ts"
  ],
  "tests": [
    {
      "name": "User API Validation Tests",
      "command": "npm test -- tests/api/user.test.ts",
      "pass_threshold": 1.0
    }
  ]
}'
```

### File-Based (Complex Tasks)

**For tasks with 5+ deliverables or multi-suite testing:**

**File:** `success-criteria/jwt-auth.json`

```json
{
  "task_description": "Implement JWT authentication system",
  "deliverables": [
    "src/middleware/auth.ts",
    "src/services/jwt.ts",
    "src/config/auth-config.ts",
    "tests/middleware/auth.test.ts",
    "tests/services/jwt.test.ts",
    "tests/integration/auth-flow.test.ts",
    "docs/AUTH_IMPLEMENTATION.md"
  ],
  "tests": [
    {
      "name": "Unit Tests - Auth Middleware",
      "command": "npm test -- tests/middleware/auth.test.ts",
      "pass_threshold": 1.0,
      "weight": 0.3
    },
    {
      "name": "Unit Tests - JWT Service",
      "command": "npm test -- tests/services/jwt.test.ts",
      "pass_threshold": 1.0,
      "weight": 0.3
    },
    {
      "name": "Integration Tests - Auth Flow",
      "command": "npm test -- tests/integration/auth-flow.test.ts",
      "pass_threshold": 0.95,
      "weight": 0.4
    }
  ],
  "quality_gates": {
    "test_coverage": 0.95,
    "security_scan": "zero_high_vulnerabilities",
    "eslint": "zero_errors"
  }
}
```

**Usage:**

```bash
export CFN_SUCCESS_CRITERIA="/workspace/success-criteria/jwt-auth.json"
```

### Weighted Test Suites

**Use When:** Different test suites have different importance.
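Concretely, a weighted aggregate reduces to a weighted sum of per-suite pass rates. A minimal TypeScript sketch follows; the `Suite` shape and function name are illustrative, not part of the CFN tooling, though the shape mirrors the `tests` entries in a success-criteria file.

```typescript
// Illustrative sketch of weighted aggregation across test suites.
// Suite mirrors a success-criteria "tests" entry plus its measured result.
interface Suite {
  name: string;
  weight: number;    // fraction of the aggregate score; weights assumed to sum to 1.0
  passRate: number;  // measured pass rate for this suite, 0.0-1.0
}

function aggregatePassRate(suites: Suite[]): number {
  // total = Σ (weight_i × pass_rate_i)
  return suites.reduce((total, s) => total + s.weight * s.passRate, 0);
}

// With the 0.4 / 0.4 / 0.2 weights used in the example below, fully passing
// unit and integration suites plus E2E at 0.90 give 0.4 + 0.4 + 0.18 = 0.98.
console.log(aggregatePassRate([
  { name: "Unit Tests", weight: 0.4, passRate: 1.0 },
  { name: "Integration Tests", weight: 0.4, passRate: 1.0 },
  { name: "E2E Tests", weight: 0.2, passRate: 0.9 },
]));
```

Because E2E carries only 0.2 of the weight here, a partially failing E2E suite drags the aggregate down far less than a unit-test failure would.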
**Example:**

```json
{
  "tests": [
    {
      "name": "Unit Tests",
      "command": "npm test -- tests/unit/",
      "pass_threshold": 1.0,
      "weight": 0.4
    },
    {
      "name": "Integration Tests",
      "command": "npm test -- tests/integration/",
      "pass_threshold": 0.95,
      "weight": 0.4
    },
    {
      "name": "E2E Tests",
      "command": "npm run test:e2e",
      "pass_threshold": 0.90,
      "weight": 0.2
    }
  ]
}
```

**Aggregate Calculation:**

```
total_pass_rate = (0.4 × unit_pass_rate)
                + (0.4 × integration_pass_rate)
                + (0.2 × e2e_pass_rate)
```

---

## Writing Effective Tests

### Phase 1: Test-Driven Development (15-20 min)

**Loop 3 agents MUST write tests BEFORE implementation.**

**TDD Workflow:**

1. Write failing tests first:

```typescript
describe('JWT Authentication', () => {
  it('should validate JWT token format', () => {
    const token = 'invalid.token.format';
    expect(() => validateToken(token)).toThrow('Invalid JWT format');
  });

  it('should reject expired tokens', () => {
    const expiredToken = generateExpiredToken();
    expect(() => validateToken(expiredToken)).toThrow('Token expired');
  });

  it('should accept valid tokens', () => {
    const validToken = generateValidToken();
    expect(validateToken(validToken)).toBe(true);
  });
});
```

2. Run the tests (all should FAIL):

```bash
npm test -- tests/middleware/auth.test.ts
# ❌ 0/3 tests passed (expected - no implementation yet)
```

3. Implement code to make the tests pass:

```typescript
function validateToken(token: string): boolean {
  // Implementation...
}
```

4. Run the tests again (all should PASS):

```bash
npm test -- tests/middleware/auth.test.ts
# ✅ 3/3 tests passed (100%)
```

### Test Quality Checklist

**Good Tests:**

- Test behavior, not implementation
- Cover the happy path + edge cases
- Independent (no shared state)
- Fast execution (<100ms per test)
- Descriptive names (what + why)
- Arrange-Act-Assert pattern

**Bad Tests:**

- Test implementation details
- Only happy-path coverage
- Tests depend on execution order
- Slow tests (network calls, file I/O)
- Unclear test names
- Multiple unrelated assertions

### Example: Strong Test Suite

```typescript
// GOOD: Comprehensive test coverage
describe('User Authentication', () => {
  // Happy path
  it('should authenticate valid credentials', async () => {
    const user = { username: 'alice', password: 'secret123' };
    const result = await authenticate(user);
    expect(result.authenticated).toBe(true);
    expect(result.token).toBeDefined();
  });

  // Edge cases
  it('should reject empty username', async () => {
    const user = { username: '', password: 'secret123' };
    await expect(authenticate(user)).rejects.toThrow('Username required');
  });

  it('should reject empty password', async () => {
    const user = { username: 'alice', password: '' };
    await expect(authenticate(user)).rejects.toThrow('Password required');
  });

  it('should reject invalid credentials', async () => {
    const user = { username: 'alice', password: 'wrong' };
    const result = await authenticate(user);
    expect(result.authenticated).toBe(false);
  });

  it('should handle database connection errors', async () => {
    mockDatabase.disconnect();
    const user = { username: 'alice', password: 'secret123' };
    await expect(authenticate(user)).rejects.toThrow('Database error');
  });

  // Security
  it('should not reveal whether username or password is wrong', async () => {
    const user1 = { username: 'nonexistent', password: 'secret123' };
    const user2 = { username: 'alice', password: 'wrong' };
    const error1 = await authenticate(user1).catch(e => e.message);
    const error2 = await authenticate(user2).catch(e => e.message);
    expect(error1).toBe(error2); // Same error message
  });
});
```

### Test Organization

**Directory Structure:**

```
tests/
├── unit/
│   ├── middleware/
│   │   └── auth.test.ts
│   ├── services/
│   │   └── jwt.test.ts
│   └── utils/
│       └── validation.test.ts
├── integration/
│   ├── auth-flow.test.ts
│   └── user-api.test.ts
├── e2e/
│   └── authentication.e2e.test.ts
└── fixtures/
    ├── users.json
    └── tokens.json
```

**Naming Convention:**

- Unit tests: `<module>.test.ts`
- Integration tests: `<feature>-flow.test.ts`
- E2E tests: `<feature>.e2e.test.ts`

---

## Execution Modes

### Task Mode Deep Dive

**Architecture:**

```
Main Chat
├─ Task(backend-developer)    → Loop 3 Agent 1
├─ Task(qa-tester)            → Loop 3 Agent 2
├─ Bash(npm test)             → Test Execution
├─ Task(reviewer)             → Loop 2 Validator 1
├─ Task(security-specialist)  → Loop 2 Validator 2
└─ Task(product-owner)        → Decision
```

**Advantages:**

- Full visibility (every agent output visible)
- Debugging-friendly (see exact errors)
- Learning tool (understand agent reasoning)
- Fast iteration feedback

**Disadvantages:**

- Higher cost ($0.150/iteration)
- Context window usage (all output in Main Chat)
- Not suitable for production workflows

**When to Use:**

- Learning the test-driven CFN Loop
- Debugging failed iterations
- Short tasks (<10 min)
- Development/prototyping

### CLI Mode Deep Dive

**Architecture:**

```
Main Chat
└─ Task(cfn-v3-coordinator)
   └─ Enhanced Orchestrator (orchestrate.sh)
      ├─ npx claude-flow-novice agent-spawn backend-developer
      ├─ npx claude-flow-novice agent-spawn qa-tester
      ├─ Monitor progress (enhanced v3.0)
      ├─ Test execution (background)
      ├─ npx claude-flow-novice agent-spawn reviewer
      ├─ npx claude-flow-novice agent-spawn security-specialist
      └─ npx claude-flow-novice agent-spawn product-owner
```

**Advantages:**

- 64% cost reduction ($0.054/iteration)
- Scalable (no context window limits)
- Production-ready (background execution)
- Enhanced monitoring v3.0 (automatic recovery)

**Disadvantages:**

- Limited visibility (progress reports only)
- Harder to debug (agent logs in files)
- Requires coordinator setup

**When to Use:**

- Production workflows
- Long tasks (>10 min)
- Cost-sensitive projects
- Scalable multi-iteration tasks

**Enhanced Features (v3.0):**

- Real-time progress tracking
- Stuck agent detection
- Automatic recovery
- Process health monitoring

### Docker Mode Deep Dive

**Architecture:**

```
Main Chat
└─ Task(cfn-docker-coordinator)
   └─ Docker Orchestrator
      ├─ Docker Container (backend-developer)
      ├─ Docker Container (qa-tester)
      ├─ Redis (coordination)
      ├─ Test execution (isolated)
      └─ Container cleanup
```

**Advantages:**

- Complete isolation (no host pollution)
- Reproducible builds
- Security (containerized execution)
- MCP server integration

**Disadvantages:**

- Higher resource usage (Docker overhead)
- Slower startup (container init)
- Requires a Docker daemon

**When to Use:**

- Security-sensitive tasks
- Reproducible builds required
- MCP server dependencies
- CI/CD integration

---

## Loop Architecture

### Loop 3: Implementation Phase

**Agents:** Implementers (backend-developer, frontend-developer, qa-tester)

**Protocol:**

1. **Phase 1: Test-Driven Development (15-20 min)**
   - Agents write tests BEFORE implementation
   - Tests define expected behavior
   - All tests should initially FAIL
2. **Phase 2: Implementation (30-40 min)**
   - Write code to make the tests pass
   - Follow the TDD red-green-refactor cycle
   - Create the deliverables from the success criteria
3. **Phase 3: Validation (5 min)**
   - Execute the test suite
   - Calculate the pass rate
   - Report results + metadata

**Success Criteria:**

```bash
# Standard Mode Gate (awk handles the decimal comparison;
# bash's -ge operator only compares integers)
if awk "BEGIN { exit !($LOOP3_PASS_RATE >= 0.95) }"; then
  echo "✅ Loop 3 PASSED: $LOOP3_PASS_RATE ≥ 0.95"
  signal_loop2_start
else
  echo "❌ Loop 3 FAILED: $LOOP3_PASS_RATE < 0.95"
  iterate_loop3
fi
```

**Output Format:**

```json
{
  "agent_id": "backend-developer-1732001234",
  "pass_rate": 0.97,
  "tests_passed": 29,
  "tests_failed": 1,
  "tests_total": 30,
  "deliverables_created": [
    "src/middleware/auth.ts",
    "tests/middleware/auth.test.ts"
  ],
  "test_output": "..."
}
```

### Loop 2: Validation Phase

**Agents:** Validators (reviewer, security-specialist, contract-tester, integration-tester, mutation-tester)

**Protocol:**

1. **Wait for Gate Pass Signal**

   ```bash
   coordination-wait "swarm:${TASK_ID}:gate-passed"
   ```

2. **Review Implementation**
   - Code quality analysis
   - Security audit
   - Test quality validation
   - Contract compliance

3. **Calculate Consensus Score**

   ```
   # Individual validator scores
   reviewer:    0.93
   security:    0.93
   contract:    0.95
   integration: 0.92
   mutation:    0.88

   # Average
   consensus = 0.922
   ```

4. **Report Validation Results**

   ```json
   {
     "validator": "reviewer",
     "score": 0.93,
     "findings": [
       "✅ Code follows TypeScript best practices",
       "✅ Error handling comprehensive",
       "⚠️ Consider adding JSDoc comments"
     ],
     "recommendation": "APPROVE"
   }
   ```

**Consensus Threshold:**

- Standard Mode: ≥0.90
- Enterprise Mode: ≥0.95

### Product Owner: Decision Phase

**Agent:** product-owner (GOAP-based autonomous decision maker)

**Protocol:**

1. **Evaluate Deliverables**

   ```bash
   # Check all files exist
   for file in "${DELIVERABLES[@]}"; do
     if [[ ! -f "$file" ]]; then
       echo "❌ Missing deliverable: $file"
       ABORT_REASON="Consensus on vapor"
     fi
   done
   ```

2. **Analyze Consensus**

   ```bash
   # awk handles the decimal comparison
   if awk "BEGIN { exit !($CONSENSUS >= $THRESHOLD) }"; then
     if [ "$DELIVERABLES_COMPLETE" = true ]; then
       DECISION="PROCEED"
     else
       DECISION="ITERATE"  # High consensus but missing files
     fi
   else
     DECISION="ITERATE"  # Low consensus
   fi
   ```

3. **Output Decision**

   ```markdown
   # DECISION: PROCEED

   ## RATIONALE:
   Consensus score of 0.93 exceeds the 0.90 threshold, with all 5 validators
   recommending approval. All deliverables are complete and tested. Loop 3
   pass rate of 0.97 (29/30 tests) meets Standard mode requirements.

   ## NEXT STEPS:
   1. Mark task complete
   2. Update changelog
   3. Commit changes
   ```

**Decision Matrix:**

| Consensus | Deliverables | Gate Pass | Decision |
|-----------|--------------|-----------|----------|
| ≥0.90 | Complete | Yes | PROCEED |
| ≥0.90 | Missing | Yes | ITERATE |
| <0.90 | Complete | Yes | ITERATE |
| <0.90 | Missing | Yes | ITERATE |
| Any | Any | No | ABORT |

---

## Best Practices

### 1. Writing Success Criteria

**DO:**

- Define specific, measurable deliverables
- Include test requirements with thresholds
- Specify quality gates (coverage, security)
- Use file paths relative to the project root
- Set realistic pass thresholds (0.95 for unit, 0.90 for E2E)

**DON'T:**

- Use vague deliverables ("improve auth")
- Omit test requirements
- Set a 100% pass rate for integration tests
- Use absolute paths (/home/user/...)
- Mix unrelated tasks in one criteria file

### 2. Test-Driven Development

**DO:**

- Write tests BEFORE implementation
- Start with the simplest test cases
- Test edge cases and error handling
- Use descriptive test names
- Keep tests independent

**DON'T:**

- Write tests after implementation
- Skip edge-case testing
- Test implementation details
- Create interdependent tests
- Mix unit and integration tests

### 3. Mode Selection

**Task Mode When:**

- Learning the system
- Debugging failed iterations
- Tasks <5 minutes
- Full visibility is needed

**CLI Mode When:**

- Production workflows
- Tasks >10 minutes
- Cost optimization is required
- Scalable execution is needed

**Docker Mode When:**

- Security isolation is required
- Reproducible builds
- MCP server dependencies
- CI/CD integration

### 4. Iteration Management

**DO:**

- Trust Product Owner decisions
- Iterate on specific issues (not a full rewrite)
- Analyze failure patterns
- Adjust thresholds if consistently failing
- Use the backlog for deferred improvements

**DON'T:**

- Override the Product Owner without analysis
- Rewrite everything on ITERATE
- Ignore repeated failures
- Lower thresholds to "pass" bad code
- Mix iteration fixes with new features

### 5. Security and Quality

**DO:**

- Include security-specialist in Loop 2
- Set a "zero_high_vulnerabilities" gate
- Run static analysis (ESLint, SonarQube)
- Test security edge cases
- Document security decisions

**DON'T:**

- Skip security validation
- Allow HIGH vulnerabilities to pass
- Defer security fixes to the backlog
- Test only happy paths
- Ignore security warnings

---

## Troubleshooting

### Issue: Gate Consistently Fails (Pass Rate <0.95)

**Symptoms:**

```
Iteration 1: 0.73 (22/30 tests)
Iteration 2: 0.80 (24/30 tests)
Iteration 3: 0.87 (26/30 tests)
```

**Diagnosis:**

1. Analyze failing tests:

   ```bash
   grep "FAIL" test-output.log
   ```

2. Check test quality:
   - Are tests flaky (random failures)?
   - Are tests too strict (unrealistic expectations)?
   - Are tests testing implementation (not behavior)?

**Solutions:**

**A) Tests are correct, implementation incomplete:**

```
# Product Owner should ITERATE with specific feedback
DECISION: ITERATE
Issues:
1. 4 tests fail due to missing error handling
2. Edge case validation incomplete
3. Database transaction rollback not implemented

Next iteration: Focus on these 3 areas only
```

**B) Tests are too strict:**

```json
// Adjust test expectations (via a success criteria update)
// Example: relax the integration test threshold
{
  "tests": [
    {
      "name": "Integration Tests",
      "pass_threshold": 0.90  // Was 0.95 (too strict)
    }
  ]
}
```

**C) Tests are flaky:**

```typescript
// Fix test isolation issues
// Example: Reset the database between tests
beforeEach(async () => {
  await database.reset();
  await seedTestData();
});
```

### Issue: "Consensus on Vapor" (High Consensus, Missing Deliverables)

**Symptoms:**

```
Consensus: 0.93
Deliverables: 3/5
Missing:
- tests/middleware/auth.test.ts
- docs/AUTH_IMPLEMENTATION.md
```

**Diagnosis:** The Product Owner detected high consensus but incomplete work.

**Root Cause:** Validators reviewed what exists and didn't notice the missing files.

**Solution:** The Product Owner automatically returns ITERATE:

```markdown
# DECISION: ITERATE

## RATIONALE:
High consensus (0.93) but 2 deliverables missing. This indicates
"consensus on vapor" - validators approved partial work. All success
criteria deliverables must be present before PROCEED.

## NEXT STEPS:
1. Create missing test file: tests/middleware/auth.test.ts
2. Write documentation: docs/AUTH_IMPLEMENTATION.md
3. Re-run Loop 2 validation
```

### Issue: Security Specialist Blocks PROCEED

**Symptoms:**

```
reviewer:            0.95
security-specialist: 0.62 (4 HIGH vulnerabilities)
contract-tester:     0.93
integration-tester:  0.90

Consensus: 0.85 (below 0.90 threshold)
```

**Diagnosis:** The security audit found critical issues.

**Solution:** The Product Owner returns ITERATE with specific security fixes:

```markdown
# DECISION: ITERATE

## RATIONALE:
Security audit found 4 HIGH vulnerabilities:
1. SQL injection in user query
2. Unvalidated JWT tokens
3. Missing rate limiting
4. Plaintext password logging

Consensus 0.85 < 0.90 threshold due to security concerns.

## NEXT STEPS:
Iteration 2: Apply security fixes ONLY
1. Parameterize SQL queries
2. Validate JWT signature + expiration
3. Add rate limiting middleware
4. Remove password from logs

Re-validation expected: security score 0.85 → 0.95
```

### Issue: Iteration Limit Reached (10 iterations, no PROCEED)

**Symptoms:**

```
Iteration 10: Consensus 0.88 (still < 0.90)
Max iterations reached
```

**Diagnosis:** The task is too complex for a single sprint, or the success criteria are unrealistic.

**Solutions:**

**A) Break into smaller tasks:**

```
# Original (too broad)
"Implement complete authentication system"

# Split into 3 sprints
Sprint 1: "JWT token generation and validation"
Sprint 2: "Authentication middleware integration"
Sprint 3: "Session management and logout"
```

**B) Adjust the mode to Enterprise (15 iterations):**

```bash
/cfn-loop-cli "Complex auth system" --mode=enterprise
```

**C) Review the success criteria (they may be too strict):**

```json
{
  "tests": [
    {
      "pass_threshold": 1.0  // Too strict? Consider 0.95
    }
  ],
  "quality_gates": {
    "test_coverage": 0.99  // Too strict? Consider 0.95
  }
}
```

### Issue: Docker Mode Fails (Security Audit)

**Symptoms:**

```
Path traversal vulnerability detected
Docker socket over-exposure
JSON DoS risk
```

**Diagnosis:** The Docker orchestration has security issues.

**Solution:** Apply security hardening (Phase 4 fixes):

1. Add path validation (whitelist /workspace, /etc/cfn)
2. Remove docker.sock from non-coordinator containers
3. Add JSON size validation (10MB limit)
4. Enhance shell sanitization

**See:** `tests/security/phase4-docker-integration/SECURITY_AUDIT_REPORT.md`

---

## Advanced Topics

### Custom Validators

**Adding New Loop 2 Validators:**

1. Create an agent profile:

   ```markdown
   ---
   name: accessibility-testing-specialist
   description: MUST BE USED for WCAG compliance validation
   tools: [Read, Write, Edit, Bash, Grep, Glob, TodoWrite]
   model: sonnet
   ---

   # Accessibility Testing Specialist

   ## Role: Loop 2 Validator

   Validates WCAG 2.1 Level AA compliance...
   ```

2. Update the coordinator configuration:

   ```markdown
   # cfn-v3-coordinator.md

   **Software Development Tasks:**

   - loop2_agents: [
       "reviewer",
       "security-specialist",
       "contract-tester",
       "integration-tester",
       "mutation-tester",
       "accessibility-testing-specialist"  # NEW
     ]
   ```

3. Add a test suite:

   ```typescript
   // tests/accessibility/wcag-compliance.test.ts
   import { AxePuppeteer } from '@axe-core/puppeteer';

   describe('WCAG 2.1 Compliance', () => {
     it('should have zero accessibility violations', async () => {
       const results = await new AxePuppeteer(page).analyze();
       expect(results.violations).toHaveLength(0);
     });
   });
   ```

### Multi-Suite Weighted Testing

**Complex projects with multiple test types:**

```json
{
  "tests": [
    {
      "name": "Unit Tests",
      "command": "npm test -- tests/unit/",
      "pass_threshold": 1.0,
      "weight": 0.3,
      "required": true
    },
    {
      "name": "Integration Tests",
      "command": "npm test -- tests/integration/",
      "pass_threshold": 0.95,
      "weight": 0.3,
      "required": true
    },
    {
      "name": "E2E Tests",
      "command": "npm run test:e2e",
      "pass_threshold": 0.90,
      "weight": 0.2,
      "required": false
    },
    {
      "name": "Performance Tests",
      "command": "npm run test:perf",
      "pass_threshold": 0.85,
      "weight": 0.2,
      "required": false
    }
  ]
}
```

**Aggregate Calculation:**

```
# If all tests run
total = (0.3 × 1.0) + (0.3 × 0.95) + (0.2 × 0.90) + (0.2 × 0.85)
      = 0.30 + 0.285 + 0.18 + 0.17
      = 0.935 (above 0.90)

# If the optional E2E/Perf suites fail
total = (0.3 × 1.0) + (0.3 × 0.95) + 0 + 0
      = 0.585 (below 0.90; all "required" suites must still pass)
```

### Quality Gate Customization

**Enterprise Mode with Strict Gates:**

```json
{
  "quality_gates": {
    "test_coverage": 0.98,
    "branch_coverage": 0.95,
    "mutation_score": 0.85,
    "security_scan": "zero_high_vulnerabilities",
    "eslint": "zero_errors",
    "complexity": 10,
    "duplicate_code": 0.03
  }
}
```

**Gate Validation:**

```bash
# Each gate must pass (awk handles the decimal comparisons)
if awk "BEGIN { exit !($TEST_COVERAGE < 0.98) }"; then
  echo "❌ Test coverage $TEST_COVERAGE < 0.98"
  GATE_PASS=false
fi

if awk "BEGIN { exit !($MUTATION_SCORE < 0.85) }"; then
  echo "❌ Mutation score $MUTATION_SCORE < 0.85"
  GATE_PASS=false
fi

# etc.
```

### Continuous Integration

**GitHub Actions Integration:**

```yaml
# .github/workflows/cfn-loop.yml
name: CFN Loop CI

on:
  pull_request:
    branches: [main]

jobs:
  cfn-loop:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Node
        uses: actions/setup-node@v3
        with:
          node-version: 18

      - name: Run CFN Loop (Docker Mode)
        run: |
          # Load success criteria from the PR description
          export CFN_SUCCESS_CRITERIA=$(cat .github/success-criteria/pr-${{ github.event.pull_request.number }}.json)

          # Execute CFN Loop
          /cfn-docker:CFN_DOCKER_CLI "${{ github.event.pull_request.title }}" \
            --mode=standard \
            --max-iterations=10

      - name: Upload Test Results
        uses: actions/upload-artifact@v3
        with:
          name: test-results
          path: test-output/

      - name: Comment PR with Results
        uses: actions/github-script@v6
        with:
          script: |
            const fs = require('fs');
            const results = fs.readFileSync('test-output/summary.json', 'utf8');
            const data = JSON.parse(results);

            const comment = `
            ## CFN Loop Results

            **Decision:** ${data.decision}
            **Consensus:** ${data.consensus}
            **Pass Rate:** ${data.pass_rate}

            ${data.decision === 'PROCEED' ? '✅ Ready to merge' : '❌ Requires iteration'}
            `;

            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: comment
            });
```

---

## Appendix A: Success Criteria Template

```json
{
  "task_description": "<Clear description of what needs to be implemented>",
  "deliverables": [
    "<file1.ts>",
    "<file2.test.ts>",
    "<docs/FEATURE.md>"
  ],
  "tests": [
    {
      "name": "<Test Suite Name>",
      "command": "<npm test command>",
      "pass_threshold": 0.95,
      "weight": 1.0,
      "required": true
    }
  ],
  "quality_gates": {
    "test_coverage": 0.95,
    "security_scan": "zero_high_vulnerabilities"
  }
}
```

## Appendix B: Mode Comparison Matrix

| Feature | Task Mode | CLI Mode | Docker Mode |
|---------|-----------|----------|-------------|
| **Visibility** | Full | Progress reports | Container logs |
| **Cost/Iteration** | $0.150 | $0.054 | $0.060 |
| **Debugging** | Excellent | Moderate | Moderate |
| **Scalability** | Limited | High | High |
| **Isolation** | None | Process-level | Container-level |
| **Recovery** | Manual | Automatic (v3.0) | Automatic |
| **CI/CD Ready** | No | Yes | Yes |
| **Best For** | Learning, debugging | Production | Reproducible builds |

## Appendix C: Validator Roster

| Validator | Focus Area | Key Checks |
|-----------|------------|------------|
| reviewer | Code quality | Best practices, readability, maintainability |
| security-specialist | Security | OWASP Top 10, vulnerabilities, injection |
| contract-tester | API contracts | Pact verification, schema validation |
| integration-tester | E2E workflows | Transaction flows, data consistency |
| mutation-tester | Test quality | Mutation score, weak tests, survivors |
| accessibility-tester | WCAG compliance | Screen readers, keyboard nav, contrast |
| performance-tester | Performance | Load times, memory usage, bottlenecks |

## Appendix D: Glossary

- **CFN Loop:** Three-phase self-correcting AI development workflow
- **Loop 3:** Implementation phase (agents write code + tests)
- **Loop 2:** Validation phase (validators review + score)
- **Product Owner:** Autonomous decision agent (PROCEED/ITERATE/ABORT)
- **Gate Check:** Automated threshold validation before a phase transition
- **Pass Rate:** Percentage of tests passing (0.0-1.0 scale)
- **Consensus:** Average validation score from Loop 2 validators
- **Success Criteria:** JSON specification defining deliverables + test requirements
- **TDD:** Test-Driven Development (write tests before code)
- **Consensus on Vapor:** High consensus but missing/broken deliverables

---

**Document Version:** 3.0
**Last Updated:** 2025-11-16
**Next Review:** After Phase 7 completion

**See Also:**

- `docs/guides/SUCCESS_CRITERIA_EXAMPLES.md` - 20+ example success criteria
- `docs/migration/CONFIDENCE_TO_TEST_DRIVEN_MIGRATION.md` - Migration guide
- `planning/cli-improvements/COMPREHENSIVE_TDD_GATE_IMPLEMENTATION_PLAN.md` - Full implementation plan