aiwg

Version:

Deployment tool and support utility for AI context. Copies agents, skills, commands, rules, and behaviors into the paths each AI platform reads (Claude Code, Codex, Copilot, Cursor, Warp, OpenClaw, and 6 more) so one source of truth works across 10 platfo

aiwg.io

jmagly/aiwg

344 lines (268 loc) • 7.64 kB

Markdown

# Executable Feedback Loop Rules **Enforcement Level**: HIGH **Scope**: All code-generating agents **Research Basis**: REF-013 MetaGPT **Issue**: #101 ## Overview These rules require all code-generating agents to execute tests before returning results, implementing MetaGPT's executable feedback pattern. Research shows +4.2% HumanEval improvement and -63% human revision cost. ## Research Foundation | Finding | Impact | |---------|--------| | +4.2% HumanEval improvement | Execution feedback catches errors code review misses | | -63% human revision cost (2.25 → 0.83) | Early execution prevents cascading failures | | Debug memory pattern | Tracking execution history enables learning | ## Mandatory Rules ### Rule 1: Execute Before Return **FORBIDDEN**: ``` Agent generates code → Returns to user ``` **REQUIRED**: ``` Agent generates code → Executes tests → Passes? → Returns to user → Fails? → Debug & retry ``` ### Rule 2: Track Execution History All code agents MUST maintain debug memory: ```yaml debug_memory: executions: - attempt: 1 timestamp: "2026-01-25T15:00:00Z" code_hash: "abc123" test_results: passed: 5 failed: 2 errors: 0 failure_analysis: - test: "test_null_input" error: "TypeError: Cannot read property 'length' of null" root_cause: "Missing null check in validateInput()" fix_applied: "Added null check at line 42" - attempt: 2 timestamp: "2026-01-25T15:01:00Z" code_hash: "def456" test_results: passed: 7 failed: 0 errors: 0 ``` ### Rule 3: Retry on Failure When tests fail, agent MUST: 1. Analyze failure 2. Identify root cause 3. Apply fix 4. Re-execute tests 5. Repeat until pass OR max attempts ```yaml retry_policy: max_attempts: 3 backoff: none # Immediate retry escalation_on_max: true escalation_action: human_review ``` ### Rule 4: Failure Analysis is REQUIRED **FORBIDDEN**: ``` Test failed → Retry with random changes ``` **REQUIRED**: ```yaml failure_analysis: - test: "test_name" error_type: "TypeError" error_message: "Cannot read property 'x' of undefined" stack_trace_snippet: "at line 42 in function.ts" root_cause: "Variable 'user' is undefined when accessed" fix_strategy: "Add null check before accessing user.x" ``` ### Rule 5: Test Coverage Requirements Code agents MUST ensure minimum test coverage before returning: | Code Type | Minimum Coverage | Required Tests | |-----------|------------------|----------------| | New function | 80% | Unit tests for happy path + edge cases | | Bug fix | 100% of fix | Regression test for the bug | | Refactor | Match original | Existing tests must pass | | API endpoint | 90% | Integration + error cases | ### Rule 6: Debug Memory Persistence Debug memory MUST persist across sessions: ```yaml persistence: location: ".aiwg/ralph/debug-memory/" format: json retention: 30 days indexing: by_file: true by_error_type: true by_test: true ``` ### Rule 7: Learning from History Before generating code, agents SHOULD check debug memory: ```yaml pre_generation: check_history: true lookback_window: 10 # Recent executions patterns_to_check: - similar_file_edits - same_test_failures - recurring_error_types apply_learnings: true ``` ## Debug Memory Schema ```yaml debug_memory: session_id: "session-uuid" file_path: "src/module/file.ts" executions: - attempt: 1 timestamp: "ISO-8601" code_hash: "SHA-256" environment: node_version: "20.x" test_framework: "jest" test_results: total: 10 passed: 8 failed: 2 errors: 0 skipped: 0 duration_ms: 1500 failures: - test_name: "test description" test_file: "test/file.test.ts" error_type: "AssertionError" error_message: "Expected X but got Y" stack_trace: "..." analysis: root_cause: "Description of underlying issue" fix_strategy: "How to fix" confidence: 0.8 fix_applied: description: "What was changed" diff_summary: "+5/-2 lines" learnings: patterns_identified: - pattern: "Null check missing" frequency: 3 fix_template: "Add null check before access" recurring_failures: - test: "test_edge_case" occurrences: 2 resolution: "pending" ``` ## Integration Patterns ### With Test Engineer Agent ```yaml test_engineer_protocol: on_code_generation: 1. generate_tests: coverage_target: 80% include_edge_cases: true 2. execute_tests: fail_fast: false verbose: true 3. analyze_failures: depth: root_cause suggest_fixes: true 4. iterate: max_attempts: 3 track_in_debug_memory: true ``` ### With Code Reviewer Agent ```yaml code_reviewer_protocol: review_after_execution: true check_debug_memory: true flag_recurring_issues: true require_execution_proof: true ``` ### With Agent Loop ```yaml ralph_integration: execution_gate: require_passing_tests: true allow_skip: false debug_memory: persist_per_iteration: true cross_iteration_learning: true progress_metric: include_test_pass_rate: true ``` ## Execution Protocol When a code agent generates code: ``` 1. GENERATE code based on requirements 2. GENERATE tests if not present - Happy path tests - Edge case tests - Error handling tests 3. EXECUTE tests - Capture all output - Record in debug memory 4. IF tests pass: - Record success in debug memory - Return code to user 5. IF tests fail: a. ANALYZE failures - Parse error messages - Identify root cause - Check debug memory for patterns b. APPLY fix - Generate targeted fix - Update code c. INCREMENT attempt counter d. IF attempts < max_attempts: - GOTO step 3 e. ELSE: - ESCALATE to human - Include debug memory context ``` ## Escalation Protocol When max attempts reached: ```yaml escalation: include: - original_code - all_test_results - failure_analyses - fix_attempts - debug_memory_summary notification: channel: issue_comment template: | ## Execution Feedback Escalation **File**: {{file_path}} **Attempts**: {{attempt_count}} / {{max_attempts}} ### Failures {{failure_summary}} ### Analysis {{root_cause_analysis}} ### Attempted Fixes {{fix_attempts}} **Human review required** ``` ## Metrics Track these metrics for continuous improvement: | Metric | Target | Purpose | |--------|--------|---------| | First-attempt pass rate | >70% | Code generation quality | | Average attempts to pass | <2 | Iteration efficiency | | Escalation rate | <10% | Self-sufficiency | | Debug memory reuse | >30% | Learning effectiveness | ## Checklist Before returning code: - [ ] Tests generated for new code - [ ] Tests executed (not skipped) - [ ] All tests passing - [ ] Debug memory updated - [ ] Failures analyzed (if any occurred) - [ ] Coverage meets minimum - [ ] Learnings recorded ## References - @.aiwg/research/findings/REF-013-metagpt.md - Research paper - @$AIWG_ROOT/agentic/code/addons/ralph/schemas/debug-memory.yaml - Debug memory schema - @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/agents/test-engineer.md - Test Engineer agent - #101 - Implementation issue --- **Rule Status**: ACTIVE **Last Updated**: 2026-01-25