# Ralph External Epic #26: Intelligent Control System Architecture

**Version:** 1.0.0
**Date:** 2026-02-03
**Status:** Active
**Epic Issue:** [#26](https://git.integrolabs.net/roctinam/ai-writing-guide/issues/26)

## Table of Contents

1. [Overview](#overview)
2. [Four-Layer Architecture](#four-layer-architecture)
3. [Layer 1: PID Control Layer](#layer-1-pid-control-layer)
4. [Layer 2: Claude Intelligence Layer](#layer-2-claude-intelligence-layer)
5. [Layer 3: Memory Layer](#layer-3-memory-layer)
6. [Layer 4: Overseer Layer](#layer-4-overseer-layer)
7. [Data Flow](#data-flow)
8. [Integration Points](#integration-points)
9. [Configuration](#configuration)
10. [Performance Characteristics](#performance-characteristics)

---

## Overview

### What Epic #26 Adds to Ralph External

Epic #26 transforms Ralph External from a simple iteration loop into an **intelligent self-managing control system**. It adds four layers of coordinated intelligence that work together to provide:

- **Adaptive behavior** through PID-inspired control feedback
- **Strategic planning** via AI-powered prompt generation
- **Cross-loop learning** through persistent semantic memory
- **Autonomous oversight** with intervention and escalation

### The Intelligent Self-Managing Control System Concept

Traditional Ralph External operated as a fixed-parameter loop: run Claude session → analyze output → repeat. Epic #26 introduces a **closed-loop control system** that continuously monitors performance, adjusts behavior, accumulates knowledge, and intervenes when pathological patterns emerge.

**Key Insight:** By treating the Ralph loop as a control problem (similar to how a thermostat maintains temperature), we can dynamically adjust parameters, strategies, and focus based on real-time feedback. This enables the system to:

1. **Self-correct** when stuck or oscillating
2. **Learn** from past successes and failures across loops
3. **Adapt** prompts and strategies to current progress state
4. **Intervene** before wasting resources on unproductive iterations

### Benefits

| Benefit | Without Epic #26 | With Epic #26 |
|---------|------------------|---------------|
| **Stuck Detection** | Human must notice and abort | PID detects no progress, adjusts gains automatically |
| **Learning Transfer** | Each loop starts from scratch | Semantic memory provides relevant past learnings |
| **Prompt Quality** | Static templates | Claude generates context-aware prompts per iteration |
| **Safety** | Runs until max iterations | Overseer monitors and pauses/aborts on pathology |
| **Resource Efficiency** | Fixed iteration budget | Early stopping when confidence high |

---

## Four-Layer Architecture

The four layers form a **hierarchical control system** with increasing levels of abstraction:

```
┌─────────────────────────────────────────────────────────────┐
│  Layer 4: OVERSEER                                          │
│  • Health monitoring                                        │
│  • Intervention system                                      │
│  • Escalation to humans                                     │
└─────────────────────────────────────────────────────────────┘
                               ▲
                               │ Health Reports
                               │
┌─────────────────────────────────────────────────────────────┐
│  Layer 3: MEMORY                                            │
│  • Semantic knowledge store                                 │
│  • Learning extraction                                      │
│  • Cross-loop retrieval                                     │
└─────────────────────────────────────────────────────────────┘
                               ▲
                               │ Learnings
                               │
┌─────────────────────────────────────────────────────────────┐
│  Layer 2: CLAUDE INTELLIGENCE                               │
│  • Dynamic prompt generation                                │
│  • Pre/post iteration validation                            │
│  • Strategy planning                                        │
└─────────────────────────────────────────────────────────────┘
                               ▲
                               │ Strategic Guidance
                               │
┌─────────────────────────────────────────────────────────────┐
│  Layer 1: PID CONTROL                                       │
│  • Metrics extraction (P/I/D)                               │
│  • Gain scheduling                                          │
│  • Control signal generation                                │
│  • Alarm monitoring                                         │
└─────────────────────────────────────────────────────────────┘
                               ▲
                               │ Iteration Results
                               │
                         ┌─────┴─────┐
                         │  Claude   │
                         │  Session  │
                         └───────────┘
```

### Layer Responsibilities

| Layer | Primary Function | Key Output |
|-------|-----------------|------------|
| **PID Control** | Measure and adjust | Control signals, alarms |
| **Claude Intelligence** | Plan and validate | Optimized prompts, strategy |
| **Memory** | Learn and recall | Relevant knowledge |
| **Overseer** | Monitor and protect | Health status, interventions |

---

## Layer 1: PID Control Layer

### Purpose

Implements **adaptive loop behavior** using control theory principles. Continuously measures the "error" (completion gap), accumulates history of issues (integral), and tracks rate of change (derivative) to provide feedback for parameter adjustment.

### Components

#### 1. MetricsCollector

**Responsibility:** Extract P/I/D values from iteration data.

**Metrics Extraction:**

```javascript
// From tools/ralph-external/metrics-collector.mjs
extractIterationMetrics(iteration) {
  return {
    // Proportional: Current completion gap
    completionPercentage: normalize(iteration.analysis.completionPercentage),

    // Quality indicators
    qualityScore: normalize(iteration.analysis.qualityScore),
    errorCount: iteration.errorCount || 0,
    testsPassing: iteration.testsPassing || 0,

    // Integral inputs: accumulated issues
    learnings: extractLearnings(iteration),
    blockers: extractBlockers(iteration),

    // Derivative inputs: velocity indicators
    duration: iteration.duration,
    filesModified: iteration.filesModified,
    toolCallCount: iteration.toolCallCount,
  };
}
```

**PID Metric Computation:**

| Metric | Formula | Meaning |
|--------|---------|---------|
| **Proportional (P)** | `1.0 - completionPercentage` | How far from done (0.0 = complete, 1.0 = not started) |
| **Integral (I)** | `Σ(accumulated_learnings * decay)` | Memory of past issues and learnings |
| **Derivative (D)** | `Δ(completionPercentage) / Δt` | Rate of progress (positive = improving, negative = regressing) |

**Example:**

```javascript
// Iteration 3: 60% complete
P = 1.0 - 0.60 = 0.40  // 40% gap remaining

// Accumulated 5 learnings over 3 iterations
I = 5 * 0.9 = 4.5  // With 0.9 decay factor

// Progress increased from 50% to 60% (10% gain)
D = (0.60 - 0.50) / 1 = +0.10  // Positive velocity
```

#### 2. GainScheduler

**Responsibility:** Adapt PID gains based on task phase and volatility.

**Gain Profiles:**

```javascript
const GAIN_PROFILES = {
  exploration: {
    kp: 0.3,  // Low proportional (allow exploration)
    ki: 0.1,  // Low integral (don't over-react to early issues)
    kd: 0.2,  // Moderate derivative (track trends)
  },
  focused: {
    kp: 0.5,  // Medium proportional (balanced response)
    ki: 0.2,  // Medium integral (learn from issues)
    kd: 0.3,  // Higher derivative (detect stalls quickly)
  },
  aggressive: {
    kp: 0.7,  // High proportional (strong correction)
    ki: 0.3,  // Higher integral (accumulate learnings)
    kd: 0.4,  // High derivative (rapid trend detection)
  },
};
```

**Adaptive Scheduling:**

```javascript
adjustGains({ phase, volatility, errorMagnitude }) {
  if (phase === 'early') {
    return GAIN_PROFILES.exploration;
  }
  if (volatility > 0.3 || errorMagnitude > 0.7) {
    return GAIN_PROFILES.aggressive;
  }
  return GAIN_PROFILES.focused;
}
```

#### 3. PIDController

**Responsibility:** Integrate metrics, gains, and alarms to produce control decisions.

**Control Output Calculation:**

```javascript
compute(proportional) {
  // Update integral with decay
  this.integralAccumulator =
    (this.integralAccumulator * this.integralDecay) + proportional;

  // Calculate derivative (rate of change)
  const derivative = this.history.length > 0 ?
    proportional - this.history[this.history.length - 1].proportional
    : 0;

  // PID formula
  const output =
    (this.kp * proportional) +
    (this.ki * this.integralAccumulator) +
    (this.kd * derivative);

  // Clamp to [0, 1]
  return Math.max(0, Math.min(1, output));
}
```

**Control Signal Interpretation:**

| Signal Range | Meaning | Suggested Action |
|--------------|---------|------------------|
| `0.0 - 0.3` | Near completion | Continue current approach |
| `0.3 - 0.5` | Moderate gap | Maintain focus, monitor progress |
| `0.5 - 0.7` | Significant gap | Consider strategy adjustment |
| `0.7 - 1.0` | Critical gap | Pivot or escalate |

#### 4. ControlAlarms

**Responsibility:** Monitor for pathological behaviors and recommend interventions.

**Alarm Types:**

```javascript
const ALARM_TYPES = {
  STUCK: {
    threshold: 3,  // 3 iterations with <5% progress
    severity: 'warning',
    interventions: ['pivot_strategy', 'simplify_scope'],
  },
  OSCILLATING: {
    threshold: 3,  // 3 sign changes in progress direction
    severity: 'critical',
    interventions: ['stabilize', 'constrain_scope'],
  },
  REGRESSING: {
    threshold: 2,  // 2 consecutive negative deltas
    severity: 'critical',
    interventions: ['review_changes', 'consider_rollback'],
  },
  BUDGET_EXCEEDED: {
    threshold: 0.9,  // 90% of max iterations
    severity: 'emergency',
    interventions: ['pause', 'escalate'],
  },
};
```

### PID Control Flow

```
PRE-ITERATION:
┌─────────────────────────────────────────────────────────┐
│  1. MetricsCollector extracts P/I/D from last iteration │
│     • Proportional: 1 - completionPercentage            │
│     • Integral: Accumulated learnings + issues          │
│     • Derivative: ΔcompletionPercentage / Δt            │
└────────────────────┬────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────────┐
│  2. GainScheduler adapts gains based on phase/volatility│
│     • Early phase → exploration gains                   │
│     • High volatility → aggressive gains                │
│     • Normal → focused gains                            │
└────────────────────┬────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────────┐
│  3. PIDController computes control output               │
│     output = (kp × P) + (ki × I) + (kd × D)             │
└────────────────────┬────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────────┐
│  4. ControlAlarms check for pathologies                 │
│     • Stuck? Oscillating? Regressing?                   │
│     • Generate alarms with interventions                │
└────────────────────┬────────────────────────────────────┘
                     ▼
Control Decision (signals + alarms → Layer 2)
```

---

## Layer 2: Claude Intelligence Layer

### Purpose

Replaces static prompt templates with **AI-powered iteration guidance**. Uses Claude itself to analyze loop state, plan strategy, generate optimized prompts, and validate results.

### Components

#### 1. ClaudePromptGenerator

**Responsibility:** Generate dynamic, context-aware prompts for each iteration.
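
One context-aware input to prompt generation is the Layer 1 control signal. The following is a minimal sketch, assuming the ranges in the Control Signal Interpretation table map one-to-one onto the `low|normal|high|critical` urgency values; the function name `urgencyFromSignal` and the mapping itself are illustrative assumptions, not taken from the Ralph External source.

```javascript
// Hypothetical helper: convert a PID control signal into an urgency label.
// Thresholds mirror the "Control Signal Interpretation" table; treating them
// as urgency boundaries is an assumption made for illustration.
function urgencyFromSignal(signal) {
  if (!Number.isFinite(signal)) {
    throw new TypeError('signal must be a finite number');
  }
  // The controller already clamps to [0, 1]; clamp again defensively.
  const clamped = Math.max(0, Math.min(1, signal));
  if (clamped < 0.3) return 'low';      // near completion
  if (clamped < 0.5) return 'normal';   // moderate gap
  if (clamped < 0.7) return 'high';     // significant gap
  return 'critical';                    // critical gap
}

console.log(urgencyFromSignal(0.65));
```

Keeping the thresholds in one small function would make it easy to tune them alongside the gain profiles.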
**Two-Phase Approach:**

**Phase 1: Analysis Prompt Generation**

```javascript
_buildAnalysisPrompt(context) {
  // Build analysis request for Claude
  return `
# Generate Dynamic Claude Code Session Prompt

## Task Information
**Objective**: ${context.objective}
**Completion Criteria**: ${context.completionCriteria}
**Iteration**: ${context.iteration} / ${context.maxIterations}

## Current State
**Progress**: ${context.stateAssessment.estimatedProgress}%
**Phase**: ${context.stateAssessment.phase}
**Blockers**: ${context.stateAssessment.blockers.join(', ')}

## Control Metrics
**Completion Gap**: ${(context.pidMetrics.proportional * 100).toFixed(1)}%
**Trend**: ${context.pidMetrics.trend}
**Velocity**: ${(context.pidMetrics.velocity * 100).toFixed(1)}%/iteration

## Recommended Strategy
**Approach**: ${context.strategyPlan.approach}
**Reasoning**: ${context.strategyPlan.reasoning}
**Priorities**: ${context.strategyPlan.priorities.join(', ')}

## Output Format
Provide JSON with:
{
  "prompt": "Main prompt text",
  "systemContext": "Context injection",
  "focusAreas": ["Priority 1", "Priority 2", "Priority 3"],
  "urgency": "low|normal|high|critical"
}
  `.trim();
}
```

**Phase 2: Prompt Invocation**

```javascript
// Assumes: import { spawnSync } from 'node:child_process';
async generate(context) {
  const analysisPrompt = this._buildAnalysisPrompt(context);

  // Invoke Claude to analyze and generate prompt
  const result = spawnSync('claude', [
    '--dangerously-skip-permissions',
    '--print',
    '--output-format', 'json',
    '--model', 'sonnet',
    analysisPrompt,
  ]);

  const generated = this._parseClaudeResponse(result.stdout);

  return {
    prompt: generated.prompt,
    systemContext: generated.systemContext,
    focusAreas: generated.focusAreas,
    toolSuggestions: this._suggestTools(context, generated),
    metadata: {
      urgency: generated.urgency,
      generatedAt: Date.now(),
    },
  };
}
```

**Adaptive Prompt Features:**

- **Blocker-aware:** If blockers are detected, prioritizes debugging
- **Velocity-aware:** If regressing, suggests review/rollback
- **Phase-aware:** Early iterations get exploration prompts; late iterations get a completion focus
- **PID-aware:** A high control signal increases the urgency of the prompt's tone

#### 2. ValidationAgent

**Responsibility:** Pre- and post-iteration validation.

**Pre-Iteration Validation:**

```javascript
async validatePre({ objective, strategy, assessment }) {
  const issues = [];

  // Check strategy coherence
  if (strategy.approach === 'pivot' && !strategy.reasoning) {
    issues.push({
      severity: 'high',
      message: 'Pivot strategy lacks reasoning',
    });
  }

  // Check for unresolved blockers
  if (assessment.blockers.length > 0) {
    issues.push({
      severity: 'medium',
      message: `${assessment.blockers.length} blockers remain unaddressed`,
    });
  }

  // Check iteration budget
  if (assessment.iteration > assessment.maxIterations * 0.9) {
    issues.push({
      severity: 'critical',
      message: 'Approaching iteration limit',
    });
  }

  return {
    valid: issues.filter(i => i.severity === 'critical').length === 0,
    issues,
    warnings: issues.filter(i => i.severity !== 'critical'),
  };
}
```

**Post-Iteration Validation:**

```javascript
async validatePost({ analysis, sessionResult, completionCriteria }) {
  const errors = [];

  // Check for completion signal
  if (!sessionResult.completionSignal && analysis.completionPercentage >= 95) {
    errors.push({
      type: 'missing_completion_signal',
      message: 'Near completion but no signal received',
    });
  }

  // Check for tool errors
  if (sessionResult.errorCount > 5) {
    errors.push({
      type: 'excessive_errors',
      message: `${sessionResult.errorCount} errors during session`,
    });
  }

  return {
    valid: errors.length === 0,
    errors,
  };
}
```

#### 3. StrategyPlanner

**Responsibility:** Analyze iteration history and recommend strategy adjustments.
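
Strategy selection works from a summary of the current situation: whether blockers exist, whether the loop is stuck, and whether it is near completion. The sketch below shows one way such a summary might be derived from iteration history. The function name `assessSituation`, the history entry fields (`completionPercentage` on a 0-100 scale, `blockers`), and the `minProgress` default are assumptions for illustration; only the 80% near-completion figure and the notion of a stuck threshold come from this document.

```javascript
// Hypothetical sketch: derive the situation flags consumed by strategy
// selection. Field names on history entries are assumed, not confirmed.
function assessSituation(history, { stuckThreshold = 3, minProgress = 5 } = {}) {
  const latest = history[history.length - 1];
  if (!latest) {
    // No iterations yet: nothing to flag.
    return { hasBlockers: false, stuck: false, nearCompletion: false };
  }

  // Compare the latest entry with the one `stuckThreshold` iterations back.
  const window = history.slice(-(stuckThreshold + 1));
  const stuck =
    window.length > stuckThreshold &&
    Math.abs(latest.completionPercentage - window[0].completionPercentage) < minProgress;

  return {
    hasBlockers: (latest.blockers || []).length > 0,
    stuck,
    nearCompletion: latest.completionPercentage > 80,
  };
}

const exampleHistory = [40, 41, 42, 42].map((p) => ({ completionPercentage: p, blockers: [] }));
console.log(assessSituation(exampleHistory));
```

Requiring a full window before reporting `stuck` avoids flagging a loop that has only just started.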
**Strategy Selection:**

```javascript
_selectStrategy(situation, history, metrics) {
  // Critical: Has blockers - debug first
  if (situation.hasBlockers) {
    return {
      approach: 'pivot',
      reasoning: 'Blockers preventing progress - address root causes',
      adjustments: { focus: 'blocker_resolution' },
    };
  }

  // Stuck pattern - try different approach
  if (situation.stuck) {
    return {
      approach: 'pivot',
      reasoning: `No progress for ${this.stuckThreshold}+ iterations`,
      adjustments: { temperature: 0.8, reframeProblem: true },
    };
  }

  // Near completion - persist to finish
  if (situation.nearCompletion) {
    return {
      approach: 'persist',
      reasoning: 'Near completion (>80%) - finish current approach',
      adjustments: { focusOnCompletion: true },
    };
  }

  // Default: continue
  return {
    approach: 'persist',
    reasoning: 'Normal progress - continue',
    adjustments: {},
  };
}
```

### Claude Intelligence Flow

```
PRE-ITERATION:
┌─────────────────────────────────────────────────────────┐
│  1. StrategyPlanner analyzes history + PID metrics      │
│     → Recommends: persist|pivot|escalate                │
│     → Provides priorities and adjustments               │
└────────────────────┬────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────────┐
│  2. ValidationAgent runs pre-iteration checks           │
│     → Validates strategy coherence                      │
│     → Flags unresolved blockers                         │
│     → Warns about iteration budget                      │
└────────────────────┬────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────────┐
│  3. ClaudePromptGenerator builds analysis prompt        │
│     Includes: state, PID metrics, strategy, validation  │
└────────────────────┬────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────────┐
│  4. Invoke Claude to generate optimized prompt          │
│     → Context-aware prompt text                         │
│     → Focus areas prioritized                           │
│     → Urgency level set                                 │
└────────────────────┬────────────────────────────────────┘
                     ▼
           Launch Claude Session
                     ▼
POST-ITERATION:
┌─────────────────────────────────────────────────────────┐
│  5. ValidationAgent validates session results           │
│     → Check completion signal                           │
│     → Check error counts                                │
│     → Validate quality metrics                          │
└─────────────────────────────────────────────────────────┘
```

---

## Layer 3: Memory Layer

### Purpose

Provides **cross-loop persistent knowledge** that accumulates successful strategies, anti-patterns, project conventions, and time estimates across multiple loop executions.

### Three-Tier Memory Architecture

```
┌──────────────────────────────────────────────────────────┐
│  L1: Working Memory (Claude Session)                     │
│  • Temporary conversation context                        │
│  • Lost when session ends                                │
│  • Scope: Single iteration                               │
└──────────────────────────────────────────────────────────┘
                             ▼
┌──────────────────────────────────────────────────────────┐
│  L2: Episodic Memory (Loop State)                        │
│  • Iteration history within current loop                 │
│  • Accumulated learnings for this task                   │
│  • Scope: Single loop execution                          │
└──────────────────────────────────────────────────────────┘
                             ▼
┌──────────────────────────────────────────────────────────┐
│  L3: Semantic Memory (Persistent Knowledge)              │
│  • Proven strategies across all loops                    │
│  • Anti-patterns to avoid                                │
│  • Project conventions                                   │
│  • Time/iteration estimates by task type                 │
│  • Scope: All loops, all time                            │
└──────────────────────────────────────────────────────────┘
```

### Components

#### 1. SemanticMemory

**Responsibility:** Persistent cross-loop knowledge store.
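
Conceptually, querying such a store is a filter, rank, and truncate pass over learning records. The following is a minimal sketch, assuming records carry `type`, `taskType`, `confidence`, and `successRate` fields and live in a plain array. `queryLearnings` is a hypothetical standalone function, not the actual `SemanticMemory.query` implementation, and ranking by confidence is an assumption.

```javascript
// Hypothetical in-memory query: filter by criteria, rank by confidence,
// then keep at most `limit` results. Missing numeric fields count as 0.
function queryLearnings(learnings, criteria = {}) {
  const {
    type,
    taskType,
    minConfidence = 0,
    minSuccessRate = 0,
    limit = Infinity,
  } = criteria;

  return learnings
    .filter((l) => type === undefined || l.type === type)
    .filter((l) => taskType === undefined || l.taskType === taskType)
    .filter((l) => (l.confidence ?? 0) >= minConfidence)
    .filter((l) => (l.successRate ?? 0) >= minSuccessRate)
    .sort((a, b) => (b.confidence ?? 0) - (a.confidence ?? 0)) // sorts the filtered copy
    .slice(0, limit);
}

const sampleLearnings = [
  { id: 'a', type: 'strategy', taskType: 'test-fix', confidence: 0.85, successRate: 0.8 },
  { id: 'b', type: 'strategy', taskType: 'test-fix', confidence: 0.65, successRate: 0.9 },
  { id: 'c', type: 'antipattern', taskType: 'refactor', confidence: 0.9, successRate: 0.0 },
  { id: 'd', type: 'strategy', taskType: 'test-fix', confidence: 0.95, successRate: 0.7 },
];

const hits = queryLearnings(sampleLearnings, {
  type: 'strategy',
  taskType: 'test-fix',
  minConfidence: 0.7,
  minSuccessRate: 0.6,
  limit: 5,
});
console.log(hits.map((l) => l.id));
```

Because `filter` returns a new array, the sort does not reorder the caller's data.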
**Learning Types:**

```javascript
const LEARNING_TYPES = {
  strategy: {
    // Proven approaches for specific task types
    example: {
      taskType: 'test-fix',
      content: {
        approach: 'Start with test file, work backward to implementation',
        toolSequence: ['Grep', 'Read', 'Edit', 'Bash'],
        successRate: 0.85,
      },
    },
  },
  antipattern: {
    // Approaches that consistently fail
    example: {
      taskType: 'refactor',
      content: {
        pattern: 'Large multi-file refactors in single iteration',
        consequence: 'High regression risk, difficult rollback',
        alternative: 'Incremental refactor with tests per step',
      },
    },
  },
  estimate: {
    // Time/iteration predictions
    example: {
      taskType: 'feature',
      content: {
        estimatedIterations: 3.5,
        confidence: 0.75,
        basedOnSamples: 12,
      },
    },
  },
  convention: {
    // Project-specific patterns
    example: {
      taskType: 'general',
      content: {
        convention: 'Always update CHANGELOG.md before commit',
        source: 'Project guidelines',
      },
    },
  },
};
```

**Storage Format:**

```yaml
# .aiwg/knowledge/ralph-learnings.json
{
  "version": "1.0.0",
  "checksum": "a1b2c3d4...",  # SHA-256 for corruption detection
  "lastUpdated": "2026-02-03T10:30:00Z",
  "learnings": [
    {
      "id": "learn-a1b2c3",
      "type": "strategy",
      "taskType": "test-fix",
      "content": {
        "approach": "Start with failing test, trace to root cause",
        "toolSequence": ["Bash", "Grep", "Read", "Edit", "Bash"],
        "avgIterations": 2.3
      },
      "confidence": 0.85,
      "sourceLoops": ["loop-001", "loop-005", "loop-012"],
      "useCount": 5,
      "successRate": 0.8,
      "createdAt": "2026-01-15T08:00:00Z",
      "updatedAt": "2026-02-01T14:00:00Z"
    }
  ],
  "stats": {
    "totalLearnings": 42,
    "byType": {
      "strategy": 15,
      "antipattern": 8,
      "estimate": 12,
      "convention": 7
    }
  }
}
```

**Query API:**

```javascript
// Retrieve relevant strategies for current task
const relevant = semanticMemory.query({
  type: 'strategy',
  taskType: 'test-fix',
  minConfidence: 0.7,
  minSuccessRate: 0.6,
  limit: 5,
});
```

#### 2. LearningExtractor

**Responsibility:** Extract actionable learnings from iteration outcomes.
**Extraction Logic:**

```javascript
async extract({ iteration, analysis, strategy, outcome }) {
  const learnings = [];

  // Success pattern extraction
  if (outcome === 'success' && analysis.completionPercentage >= 90) {
    learnings.push({
      type: 'strategy',
      content: {
        approach: strategy.approach,
        toolsUsed: this._extractToolSequence(iteration),
        blockers: analysis.blockers,
        resolution: analysis.resolutionSteps,
      },
      confidence: 0.9,
    });
  }

  // Anti-pattern extraction
  if (outcome === 'failure' && iteration.repeatedIssues.length > 0) {
    learnings.push({
      type: 'antipattern',
      content: {
        pattern: iteration.repeatedIssues[0],
        consequence: 'Failed after multiple attempts',
        detectedIn: iteration.number,
      },
      confidence: 0.7,
    });
  }

  return learnings;
}
```

#### 3. MemoryPromotion

**Responsibility:** Promote valuable episodic memories (L2) to semantic memory (L3).

**Promotion Criteria:**

```javascript
promote({ learnings, source }) {
  for (const learning of learnings) {
    // Check if learning meets promotion threshold
    if (learning.confidence < 0.6) continue;
    if (learning.useCount < 2) continue;

    // Check if similar learning exists
    const existing = this.semanticMemory.query({
      type: learning.type,
      taskType: learning.taskType,
      limit: 1,
    });

    if (existing.length > 0) {
      // Merge: increase confidence, update success rate
      this.semanticMemory.update(existing[0].id, {
        confidence: (existing[0].confidence + learning.confidence) / 2,
        successRate: this._updateSuccessRate(existing[0], learning),
        sourceLoops: [...existing[0].sourceLoops, source],
        useCount: existing[0].useCount + 1,
      });
    } else {
      // Store new learning
      this.semanticMemory.store(
        learning.type,
        learning.taskType,
        learning.content,
        { confidence: learning.confidence, sourceLoops: [source] }
      );
    }
  }
}
```

#### 4. MemoryRetrieval

**Responsibility:** Retrieve relevant cross-loop knowledge for current task.
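
Retrieval hinges on mapping a free-text objective to a task type. The sketch below shows one simple way that classification might work, using ordered keyword rules. `classifyTaskType`, `TASK_TYPE_RULES`, and the patterns are illustrative assumptions, and the real `_classifyTaskType` may use a different technique entirely; only the task-type labels are taken from the examples in this document.

```javascript
// Hypothetical keyword rules, checked in order; the first match wins.
const TASK_TYPE_RULES = [
  {
    taskType: 'test-fix',
    pattern: /\b(failing|fix|broken)\b.*\btests?\b|\btests?\b.*\b(fail|failing|broken)\b/i,
  },
  { taskType: 'refactor', pattern: /\b(refactor|restructure|clean ?up)\b/i },
  { taskType: 'feature', pattern: /\b(add|implement|build|create)\b/i },
];

// Classify an objective string; fall back to 'general' when no rule matches.
function classifyTaskType(query) {
  const match = TASK_TYPE_RULES.find((rule) => rule.pattern.test(query));
  return match ? match.taskType : 'general';
}

console.log(classifyTaskType('Fix the failing tests in the auth module'));
```

Ordering matters here: a query mentioning both fixing tests and adding code is classified by whichever rule appears first.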
**Retrieval Strategy:**

```javascript
async retrieve({ query, context, maxResults }) {
  // Semantic similarity search (simplified example)
  const taskType = this._classifyTaskType(query);

  // Get strategies
  const strategies = this.semanticMemory.query({
    type: 'strategy',
    taskType,
    minConfidence: 0.7,
    limit: 3,
  });

  // Get anti-patterns to avoid
  const antipatterns = this.semanticMemory.query({
    type: 'antipattern',
    taskType,
    limit: 2,
  });

  // Get time estimates
  const estimates = this.semanticMemory.query({
    type: 'estimate',
    taskType,
    limit: 1,
  });

  return {
    strategies,
    antipatterns,
    estimates,
    contextSummary: this._buildContextSummary(strategies, antipatterns),
  };
}
```

### Memory Layer Flow

```
POST-ITERATION:
┌─────────────────────────────────────────────────────────┐
│  1. LearningExtractor analyzes iteration outcome        │
│     → Successful? Extract strategy                      │
│     → Failed? Extract anti-pattern                      │
│     → Timing data? Extract estimate                     │
└────────────────────┬────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────────┐
│  2. MemoryPromotion evaluates learnings                 │
│     → Check promotion threshold (confidence, useCount)  │
│     → Merge with existing or store new                  │
│     → Update L3 semantic memory                         │
└─────────────────────────────────────────────────────────┘

PRE-ITERATION (Next Loop):
┌─────────────────────────────────────────────────────────┐
│  3. MemoryRetrieval queries semantic memory             │
│     → Classify current task type                        │
│     → Retrieve relevant strategies                      │
│     → Get anti-patterns to avoid                        │
│     → Include in prompt context                         │
└─────────────────────────────────────────────────────────┘
```

---

## Layer 4: Overseer Layer

### Purpose

Provides **autonomous health monitoring and intervention** to prevent pathological loop behaviors: stuck loops, oscillation, objective deviation, resource burn, and quality regression.
### Five Intervention Levels

| Level | Trigger | Action | Example |
|-------|---------|--------|---------|
| **LOG** | Informational | Record to log | "Progress slightly slower than average" |
| **WARN** | Minor issue | Display warning | "Approaching iteration budget (70%)" |
| **REDIRECT** | Moderate issue | Adjust strategy | "Oscillation detected - stabilize" |
| **PAUSE** | Serious issue | Pause loop, request human guidance | "Stuck for 5 iterations" |
| **ABORT** | Critical issue | Abort loop immediately | "Budget exhausted, no progress" |

### Components

#### 1. BehaviorDetector

**Responsibility:** Detect pathological patterns in iteration history.

**Detection Patterns:**

```javascript
const BEHAVIOR_PATTERNS = {
  stuck: {
    // No meaningful progress for N iterations
    detect: (history, threshold = 3) => {
      if (history.length < threshold) return false; // not enough data yet
      const recent = history.slice(-threshold);
      const progressChanges = recent.map((h, i) =>
        i === 0 ? 0 : h.completionPercentage - recent[i - 1].completionPercentage
      );
      return progressChanges.every(c => Math.abs(c) < 5); // <5% change
    },
    severity: 'high',
  },
  oscillating: {
    // Back-and-forth changes in progress direction
    detect: (history, threshold = 3) => {
      // Deltas between consecutive iterations; no artificial leading 0,
      // which would otherwise be counted as a sign change
      const deltas = history.slice(1).map((h, i) =>
        h.completionPercentage - history[i].completionPercentage
      );
      let signChanges = 0;
      for (let i = 1; i < deltas.length; i++) {
        if (Math.sign(deltas[i]) !== Math.sign(deltas[i - 1])) signChanges++;
      }
      return signChanges >= threshold;
    },
    severity: 'critical',
  },
  regressing: {
    // Consecutive negative progress
    detect: (history, threshold = 2) => {
      if (history.length < threshold) return false; // not enough data yet
      const recent = history.slice(-threshold);
      return recent.every((h, i) =>
        i === 0 ? true : h.completionPercentage < recent[i - 1].completionPercentage
      );
    },
    severity: 'critical',
  },
  objective_deviation: {
    // Working on unrelated tasks
    // Note: arrow functions take `this` from the enclosing scope, so
    // `_calculateRelevance` resolves only if this table is built inside
    // a method of the class that defines it.
    detect: (history, objective) => {
      const lastIteration = history[history.length - 1];
      const relevance = this._calculateRelevance(lastIteration.filesModified, objective);
      return relevance < 0.3; // <30% relevance
    },
    severity: 'medium',
  },
  resource_burn: {
    // High tool-call volume (a proxy for token/time usage) with low progress
    detect: (history) => {
      const totalToolCalls = history.reduce((sum, h) => sum + h.toolCallCount, 0);
      const avgProgress =
        history.reduce((sum, h) => sum + h.completionPercentage, 0) / history.length;
      return totalToolCalls > 5000 && avgProgress < 30;
    },
    severity: 'high',
  },
};
```

#### 2. InterventionSystem

**Responsibility:** Apply interventions based on detections.

**Intervention Logic:**

```javascript
intervene(detection) {
  const { type, severity } = detection;

  // Map severity to intervention level
  const level = this._mapSeverityToLevel(severity, type);

  // Build intervention
  const intervention = {
    level,
    detection,
    reason: this._buildReason(detection),
    recommendations: this._buildRecommendations(detection),
    timestamp: Date.now(),
  };

  // Execute intervention
  switch (level) {
    case 'LOG':
      this._log(intervention);
      break;
    case 'WARN':
      this._warn(intervention);
      break;
    case 'REDIRECT':
      this._redirect(intervention);
      break;
    case 'PAUSE':
      this._pause(intervention);
      break;
    case 'ABORT':
      this._abort(intervention);
      break;
  }

  return intervention;
}
```

**REDIRECT Example:**

```javascript
_redirect(intervention) {
  const { detection } = intervention;

  if (detection.type === 'oscillating') {
    // Inject stabilization instructions
    this.injectedInstructions = [
      'Focus on incremental changes',
      'Commit progress after each step',
      'Avoid large refactors mid-iteration',
    ];

    // Adjust PID gains to dampen oscillation
    this.pidController.updateGains({
      kp: 0.3,  // Reduce proportional (less reactive)
      kd: 0.5,  // Increase derivative (dampen oscillation)
    });
  }
}
```
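
The `intervene` method above delegates to `_mapSeverityToLevel`, which is not shown. The following is a minimal sketch of how such a mapping might look, assuming a severity lookup table with per-type overrides. Both tables are hypothetical: the only constraint taken from this document is that oscillation, although detected with `critical` severity, is handled by a REDIRECT.

```javascript
// Hypothetical default mapping from detection severity to intervention level.
const LEVEL_BY_SEVERITY = new Map([
  ['low', 'LOG'],
  ['medium', 'WARN'],
  ['high', 'REDIRECT'],
  ['critical', 'PAUSE'],
  ['emergency', 'ABORT'],
]);

// Hypothetical per-type overrides that take precedence over severity.
const LEVEL_BY_TYPE = new Map([
  ['oscillating', 'REDIRECT'], // stabilize first rather than pausing
]);

// Resolve the level: type override, then severity default, then LOG.
function mapSeverityToLevel(severity, type) {
  return LEVEL_BY_TYPE.get(type) ?? LEVEL_BY_SEVERITY.get(severity) ?? 'LOG';
}

console.log(mapSeverityToLevel('critical', 'oscillating'));
console.log(mapSeverityToLevel('critical', 'regressing'));
```

Falling back to `LOG` for unknown severities keeps an unrecognized detection visible without interrupting the loop.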
**PAUSE Example:**

```javascript
_pause(intervention) {
  this.isPaused = true;

  console.log(`
╔═══════════════════════════════════════════════════════╗
║                 OVERSEER: LOOP PAUSED                 ║
╠═══════════════════════════════════════════════════════╣
║  Reason: ${intervention.reason}                       ║
║                                                       ║
║  Detected: ${intervention.detection.type}             ║
║  Severity: ${intervention.detection.severity}         ║
║                                                       ║
║  Recommendations:                                     ║
${intervention.recommendations.map(r => `║  • ${r}`).join('\n')} ║
║                                                       ║
║  Awaiting human guidance to continue...               ║
╚═══════════════════════════════════════════════════════╝
  `);
}
```

#### 3. EscalationHandler

**Responsibility:** Escalate to humans via Gitea issues.

**Escalation Levels:**

```javascript
const ESCALATION_LEVELS = {
  INFO: {
    priority: 'low',
    labels: ['ralph-external', 'info'],
  },
  WARNING: {
    priority: 'medium',
    labels: ['ralph-external', 'warning'],
  },
  CRITICAL: {
    priority: 'high',
    labels: ['ralph-external', 'critical', 'needs-review'],
  },
  EMERGENCY: {
    priority: 'urgent',
    labels: ['ralph-external', 'emergency', 'blocking'],
  },
};
```

**Escalation Process:**

```javascript
async escalate(level, context) {
  const { loopId, iterationNumber, reason, detection } = context;

  // Build issue title
  const title = `[Ralph External] ${level}: ${detection.type} in ${loopId}`;

  // Build issue body
  const body = `
## Overseer Escalation

**Loop ID:** ${loopId}
**Iteration:** ${iterationNumber}
**Level:** ${level}
**Detection:** ${detection.type}

### Reason
${reason}

### Context
${JSON.stringify(context, null, 2)}

### Recommendations
${context.recommendations.map(r => `- ${r}`).join('\n')}

### Next Steps
Please review and provide guidance to continue.
  `.trim();

  // Create Gitea issue
  const issue = await this.giteaClient.createIssue({
    title,
    body,
    labels: ESCALATION_LEVELS[level].labels,
  });

  this.escalationLog.push({
    level,
    issueNumber: issue.number,
    issueUrl: issue.html_url,
    context,
    timestamp: Date.now(),
  });

  return issue;
}
```

#### 4. Overseer (Coordinator)

**Responsibility:** Coordinate detection, intervention, and escalation.

**Health Check Flow:**

```javascript
async check(iterationResult) {
  // 1. Add to history
  this.iterationHistory.push(iterationResult);

  // 2. Run detections
  const detections = this.behaviorDetector.detect(this.iterationHistory);

  // 3. Process interventions
  const interventions = [];
  for (const detection of detections) {
    const intervention = this.interventionSystem.intervene(detection);
    interventions.push(intervention);

    // 4. Auto-escalate if needed
    if (this.shouldEscalate(intervention)) {
      await this.escalate(intervention, iterationResult);
    }
  }

  // 5. Determine overall status
  const status = this._determineStatus(detections, interventions);

  // 6. Create health check record
  const healthCheck = {
    iterationNumber: iterationResult.number,
    timestamp: Date.now(),
    detections,
    interventions,
    status,
    metrics: this._computeMetrics(),
  };

  // 7. Log and persist
  this.healthCheckLog.push(healthCheck);
  this.saveLog();

  return healthCheck;
}
```

### Overseer Flow

```
POST-ITERATION:
┌─────────────────────────────────────────────────────────┐
│  1. BehaviorDetector scans iteration history            │
│     → Check for: stuck, oscillating, regressing         │
│     → Check for: objective deviation, resource burn     │
│     → Generate detections with severity                 │
└────────────────────┬────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────────┐
│  2. InterventionSystem processes detections             │
│     → Map severity to intervention level                │
│     → Execute: LOG, WARN, REDIRECT, PAUSE, or ABORT     │
│     → Record intervention in log                        │
└────────────────────┬────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────────┐
│  3. Check if escalation needed                          │
│     → PAUSE/ABORT always escalate                       │
│     → Critical detections escalate                      │
│     → Prolonged issues escalate                         │
└────────────────────┬────────────────────────────────────┘
                     ▼ (if escalation needed)
┌─────────────────────────────────────────────────────────┐
│  4. EscalationHandler creates Gitea issue               │
│     → Build context summary                             │
│     → Set priority and labels                           │
│     → Provide recommendations                           │
│     → Await human guidance                              │
└─────────────────────────────────────────────────────────┘
```

---

## Data Flow

### Complete Iteration Data Flow

```
═══════════════════════════════════════════════════════════
PRE-ITERATION
═══════════════════════════════════════════════════════════

1. StateAssessor → Assess current situation
   ├─ Parse last iteration output
   ├─ Compute estimated progress
   └─ Identify blockers

2. MemoryRetrieval → Get relevant knowledge (L3)
   ├─ Query semantic memory for task type
   ├─ Retrieve proven strategies
   └─ Get anti-patterns to avoid

3. StrategyPlanner → Plan iteration strategy (Layer 2)
   ├─ Analyze situation from StateAssessor
   ├─ Consider iteration history
   ├─ Recommend: persist | pivot | escalate
   └─ Build priority list

4. MetricsCollector → Extract PID metrics (Layer 1)
   ├─ P = 1 - completionPercentage
   ├─ I = Σ(accumulated_learnings × decay)
   └─ D = Δ(completionPercentage) / Δt

5. GainScheduler → Adapt PID gains (Layer 1)
   ├─ Check phase: early | mid | late
   ├─ Check volatility: low | medium | high
   └─ Select gain profile: exploration | focused | aggressive

6. PIDController → Compute control signal (Layer 1)
   ├─ output = (kp × P) + (ki × I) + (kd × D)
   └─ Clamp to [0, 1]

7. ControlAlarms → Check for pathologies (Layer 1)
   ├─ Stuck? Oscillating? Regressing?
   └─ Generate alarms with interventions

8. ValidationAgent → Pre-iteration validation (Layer 2)
   ├─ Validate strategy coherence
   ├─ Check for unresolved blockers
   └─ Warn about iteration budget

9. ClaudePromptGenerator → Generate dynamic prompt (Layer 2)
   ├─ Build analysis prompt with all context
   ├─ Invoke Claude to analyze and generate
   ├─ Parse JSON response
   └─ Add tool suggestions

─────────────────────────────────────────────────────────
DURING ITERATION
─────────────────────────────────────────────────────────

10. SessionLauncher → Execute Claude session
    ├─ Launch with generated prompt
    ├─ Monitor for completion signal
    └─ Capture output and errors

═══════════════════════════════════════════════════════════
POST-ITERATION
═══════════════════════════════════════════════════════════

11. ValidationAgent → Post-iteration validation (Layer 2)
    ├─ Check completion signal
    ├─ Validate error counts
    └─ Check quality metrics

12. BehaviorDetector → Detect pathologies (Layer 4)
    ├─ Analyze iteration history
    ├─ Check for: stuck, oscillating, regressing
    └─ Generate detections with severity

13. InterventionSystem → Apply interventions (Layer 4)
    ├─ Map severity to intervention level
    ├─ Execute: LOG, WARN, REDIRECT, PAUSE, ABORT
    └─ Record in intervention log

14. Overseer → Health check (Layer 4)
    ├─ Coordinate detection and intervention
    ├─ Determine overall health status
    ├─ Escalate to human if needed
    └─ Persist health check log

15. LearningExtractor → Extract learnings (Layer 3)
    ├─ Success? Extract strategy
    ├─ Failure? Extract anti-pattern
    └─ Record timing estimates

16. MemoryPromotion → Promote to L3 (Layer 3)
    ├─ Check promotion threshold
    ├─ Merge with existing or store new
    └─ Update semantic memory

17. StateManager → Update loop state
    ├─ Record iteration result
    ├─ Update accumulated learnings
    └─ Persist state to disk
```

### Simplified Flow Diagram

```
┌─────────────┐
│    Last     │
│  Iteration  │
│   Result    │
└──────┬──────┘
       │
       ▼
┌──────────────────────────────────────────┐
│        PRE-ITERATION INTELLIGENCE        │
│                                          │
│  ┌────────────┐    ┌────────────┐        │
│  │   Memory   │───▶│  Strategy  │        │
│  │ Retrieval  │    │  Planner   │        │
│  │    (L3)    │    │    (L2)    │        │
│  └────────────┘    └─────┬──────┘        │
│                          │               │
│  ┌────────────┐    ┌─────▼──────┐        │
│  │    PID     │───▶│   Claude   │        │
│  │ Controller │    │   Prompt   │        │
│  │    (L1)    │    │ Generator  │        │
│  └────────────┘    │    (L2)    │        │
│                    └─────┬──────┘        │
│                          │               │
│  ┌────────────┐          │               │
│  │ Validation │◀─────────┘               │
│  │ Agent (L2) │                          │
│  └────────────┘                          │
└──────────────┬───────────────────────────┘
               │
               ▼
       ┌───────────────┐
       │    Claude     │
       │    Session    │
       │   Execution   │
       └───────┬───────┘
               │
               ▼
┌──────────────────────────────────────────┐
│         POST-ITERATION OVERSIGHT         │
│                                          │
│  ┌────────────┐    ┌────────────┐        │
│  │ Validation │───▶│  Behavior  │        │
│  │   Agent    │    │  Detector  │        │
│  │    (L2)    │    │    (L4)    │        │
│  └────────────┘    └─────┬──────┘        │
│                          │               │
│  ┌────────────┐    ┌─────▼──────┐        │
│  │  Learning  │───▶│  Overseer  │        │
│  │ Extractor  │    │    (L4)    │        │
│  │    (L3)    │    │            │        │
│  └─────┬──────┘    └────────────┘        │
│        │                                 │
│        ▼                                 │
│  ┌────────────┐                          │
│  │   Memory   │                          │
│  │ Promotion  │                          │
│  │    (L3)    │                          │
│  └────────────┘                          │
└──────────────────────────────────────────┘
```

---

## Integration Points

### Layer Communication

| From | To | Data Type | Example |
|------|----|-----------|---------|
| PID Controller (L1) | Claude Prompt (L2) | Control signals | `{ output: 0.65, urgency: 'high' }` |
| Memory Retrieval (L3) | Strategy Planner (L2) | Relevant knowledge | `{ strategies: [...], antipatterns: [...] }` |
| Strategy Planner (L2) | Prompt Generator (L2) | Strategy plan | `{ approach: 'pivot', priorities: [...] }` |
| Validation Agent (L2) | Overseer (L4) | Validation issues | `{ errors: [...], warnings: [...] 
}` | | Behavior Detector (L4) | Intervention System (L4) | Detections | `{ type: 'stuck', severity: 'high' }` | | Learning Extractor (L3) | Memory Promotion (L3) | Learnings | `{ type: 'strategy', content: {...} }` | ### State Object Structure The central state object flows through all layers: ```javascript const state = { // Identification loopId: 'loop-abc123', sessionId: 'session-def456', // Progress currentIteration: 3, maxIterations: 10, // Task objective: 'Fix failing authentication tests', completionCriteria: 'All tests in test/auth/** pass', // Configuration config: { enablePIDControl: true, enableOverseer: true, enableSemanticMemory: true, // ... }, // Layer 1: PID State pidMetrics: { proportional: 0.35, integral: 2.1, derivative: -0.05, }, // Layer 2: Strategy State currentStrategy: { approach: 'persist', reasoning: 'Making good progress, continue', priorities: ['Complete test fixes', 'Validate coverage'], }, // Layer 3: Memory State relevantKnowledge: { strategies: [...], antipatterns: [...], }, // Layer 4: Health State overseerStatus: { status: 'healthy', detections: [], interventions: [], }, // Iteration history iterations: [ { number: 1, ... }, { number: 2, ... }, { number: 3, ... 
}, ], }; ``` ### Error Propagation Errors are handled hierarchically: ``` Layer 4 (Overseer) ├─ Catches: Critical failures, intervention failures └─ Action: Escalate to human, abort loop Layer 3 (Memory) ├─ Catches: Memory corruption, promotion failures └─ Action: Use backup, continue without memory Layer 2 (Claude Intelligence) ├─ Catches: Prompt generation failures, validation errors └─ Action: Use fallback prompt, log warning Layer 1 (PID Control) ├─ Catches: Metric calculation errors, gain adjustment failures └─ Action: Use default gains, continue with warning ``` --- ## Configuration ### Three Main Flags ```javascript const config = { // Layer 1: PID Control enablePIDControl: true, // Default: true pidOptions: { windowSize: 5, integralDecay: 0.9, noiseThreshold: 0.05, }, // Layer 4: Overseer enableOverseer: true, // Default: true overseerOptions: { stuckThreshold: 3, oscillationThreshold: 3, autoEscalate: true, }, // Layer 3: Semantic Memory enableSemanticMemory: true, // Default: true memoryOptions: { storagePath: '.aiwg/knowledge', minConfidence: 0.7, }, }; ``` ### Layer-Specific Configuration #### PID Control Configuration ```javascript pidConfig: { // Metrics collection windowSize: 5, // Iterations for derivative calculation integralDecay: 0.9, // Decay factor for integral noiseThreshold: 0.05, // Deadband for noise filtering // Gain scheduling adaptiveGains: true, // Enable adaptive gain scheduling initialProfile: 'standard', // exploration | standard | aggressive transitionSmoothing: 0.3, // Gain transition smoothing // Alarms stuckThreshold: 3, // Iterations before stuck alarm budgetWarning: 0.9, // Warn at 90% iterations autoIntervene: false, // Auto-apply alarm interventions } ``` #### Claude Intelligence Configuration ```javascript intelligenceConfig: { // Prompt generation model: 'sonnet', // Claude model for prompt generation timeout: 90000, // Generation timeout (ms) verbose: false, // Enable verbose output // Strategy planning stuckThreshold: 3, 
// Iterations to consider stuck oscillationThreshold: 3, // Sign changes for oscillation escalationThreshold: 7, // Iterations before escalation // Validation strictValidation: true, // Strict pre/post validation } ``` #### Memory Configuration ```javascript memoryConfig: { // Storage storagePath: '.aiwg/knowledge', backupEnabled: true, // Query minConfidence: 0.7, minSuccessRate: 0.6, maxResults: 5, // Promotion promotionThreshold: 0.6, // Min confidence to promote minUseCount: 2, // Min uses before promotion } ``` #### Overseer Configuration ```javascript overseerConfig: { // Detection stuckThreshold: 3, oscillationThreshold: 3, regressionThreshold: 2, // Intervention autoEscalate: true, // Auto-escalate critical issues pauseOnCritical: true, // Pause on critical detections // Escalation giteaRepo: 'roctinam/ai-writing-guide', escalationLevels: { info: { priority: 'low', labels: ['ralph-external', 'info'] }, warning: { priority: 'medium', labels: ['ralph-external', 'warning'] }, critical: { priority: 'high', labels: ['ralph-external', 'critical'] }, }, } ``` ### Default Behaviors | Flag | Default | Disabled Behavior | |------|---------|-------------------| | `enablePIDControl` | `true` | Uses static template prompts, no adaptive gains | | `enableOverseer` | `true` | No health monitoring, runs until max iterations | | `enableSemanticMemory` | `true` | No cross-loop learning, each loop starts fresh | --- ## Performance Characteristics ### Layer Overhead | Layer | Overhead per Iteration | Bottleneck | |-------|------------------------|------------| | **PID Control** | ~50ms | Metric extraction and gain calculation | | **Claude Intelligence** | ~5-10s | Prompt generation via Claude invocation | | **Memory** | ~100ms | Semantic memory query and retrieval | | **Overseer** | ~200ms | Behavior detection across history | | **Total Epic #26 Overhead** | ~6-11s | Dominated by prompt generation | ### Scaling Characteristics | Metric | Scaling | Notes | 
|--------|---------|-------|
| **PID History** | O(windowSize) | Typically 5 iterations, bounded |
| **Memory Query** | O(log n) | Efficient indexing in semantic memory |
| **Behavior Detection** | O(history length) | Linear scan, but history is bounded |
| **Prompt Generation** | O(1) | Fixed cost per invocation |

### Memory Footprint

| Component | Memory Usage |
|-----------|--------------|
| PID History | ~1 KB per iteration (5 iter = ~5 KB) |
| Semantic Memory | ~50-500 KB (depends on learning count) |
| Overseer Log | ~10 KB per iteration |
| Total Epic #26 | ~100-600 KB additional |

### Trade-offs

| Feature | Benefit | Cost |
|---------|---------|------|
| **Claude-Generated Prompts** | Higher quality, context-aware | +5-10s per iteration |
| **Semantic Memory** | Cross-loop learning | +100ms query, +50-500 KB storage |
| **Overseer Monitoring** | Safety and intervention | +200ms detection |
| **PID Adaptive Gains** | Better convergence | +50ms metrics |

---

## References

### Implementation Files

- **Orchestrator:** `tools/ralph-external/orchestrator.mjs`
- **PID Controller:** `tools/ralph-external/pid-controller.mjs`
- **Metrics Collector:** `tools/ralph-external/metrics-collector.mjs`
- **Gain Scheduler:** `tools/ralph-external/gain-scheduler.mjs`
- **Control Alarms:** `tools/ralph-external/control-alarms.mjs`
- **Claude Prompt Generator:** `tools/ralph-external/lib/claude-prompt-generator.mjs`
- **Validation Agent:** `tools/ralph-external/lib/validation-agent.mjs`
- **Strategy Planner:** `tools/ralph-external/lib/strategy-planner.mjs`
- **Semantic Memory:** `tools/ralph-external/lib/semantic-memory.mjs`
- **Learning Extractor:** `tools/ralph-external/lib/learning-extractor.mjs`
- **Memory Promotion:** `tools/ralph-external/lib/memory-promotion.mjs`
- **Memory Retrieval:** `tools/ralph-external/lib/memory-retrieval.mjs`
- **Overseer:** `tools/ralph-external/lib/overseer.mjs`
- **Behavior Detector:** `tools/ralph-external/