aiwg
Version:
Cognitive architecture for AI-augmented software development with structured memory, ensemble validation, and closed-loop correction. FAIR-aligned artifacts, 84% cost reduction via human-in-the-loop, standards adopted by 100+ organizations.
1,617 lines (1,246 loc) • 43.1 kB
Markdown
# Ralph External Epic #26: Best Practices Guide
**Version:** 1.0.0
**Date:** 2026-02-03
**Status:** Active
**Epic:** [#26](https://git.integrolabs.net/roctinam/ai-writing-guide/issues/26)
## Table of Contents
1. [Philosophy](#philosophy)
2. [Getting Started Best Practices](#getting-started-best-practices)
3. [Task Definition Best Practices](#task-definition-best-practices)
4. [PID Control Best Practices](#pid-control-best-practices)
5. [Claude Intelligence Best Practices](#claude-intelligence-best-practices)
6. [Memory Layer Best Practices](#memory-layer-best-practices)
7. [Oversight Best Practices](#oversight-best-practices)
8. [Anti-Patterns to Avoid](#anti-patterns-to-avoid)
9. [Team Usage Patterns](#team-usage-patterns)
10. [Performance Optimization](#performance-optimization)
## Philosophy
### "Iteration Beats Perfection"
Epic #26 embodies the principle that **intelligent iteration is more valuable than upfront perfection**. The four-layer architecture is designed to:
1. **Learn from mistakes** (Memory Layer) rather than avoid them
2. **Adapt dynamically** (PID Control) rather than plan exhaustively
3. **Self-correct autonomously** (Overseer) rather than abort on issues
4. **Improve prompts iteratively** (Claude Intelligence) rather than craft perfect templates
**Key insight:** The system treats each iteration as an experiment. Failures are data points that improve future iterations, not reasons to stop.
### When to Trust the Intelligent System
Epic #26 is most effective when you **trust its automation** in these areas:
| Area | Trust Level | Why |
|------|-------------|-----|
| **Prompt generation** | HIGH | Claude generates better context-aware prompts than static templates |
| **Gain scheduling** | HIGH | PID adapts to task dynamics better than fixed parameters |
| **Memory retrieval** | MEDIUM | Relevant learnings improve with corpus size over time |
| **Intervention timing** | MEDIUM | Overseer catches stuck/oscillating patterns humans might miss |
| **Strategy pivoting** | MEDIUM | Strategy planner detects when to change approach |
**Trust, but verify:** Enable verbose mode for the first few loops to understand how the system makes decisions. Once confident, let it run autonomously.
### When Manual Control Is Better
Override automation for:
1. **Highly novel tasks** - First-time task types (no memory to draw from)
2. **Security-critical work** - Human review trumps automation for breaking changes
3. **Rapidly changing requirements** - Frequent objective shifts confuse the system
4. **Very short tasks** - Overhead dominates benefit for 1-2 iteration tasks
5. **Known bad patterns** - If semantic memory has low-quality learnings, disable temporarily
### Finding the Right Balance
**Start automated, tune when needed:**
```bash
# First run: Full automation
ralph-external "Implement OAuth2" --completion "tests pass"
# Second run: Same task type with learnings
ralph-external "Implement SAML" --completion "tests pass"
# → Memory provides OAuth2 strategy automatically
# Third run: Tune if needed
ralph-external "Implement LDAP" --completion "tests pass" \
--pid-profile conservative # Override if last run was unstable
```
**The 80/20 rule:** 80% of the value comes from default configuration. Only tune the 20% that matters for your specific task.
## Getting Started Best Practices
### Start with Default Configuration
**DO:**
```bash
# First time using Epic #26 - use defaults
ralph-external "Fix authentication bugs" \
--completion "npm test -- auth passes"
```
**Rationale:** Default configuration is tuned for typical development tasks. Starting simple lets you understand baseline behavior before customizing.
### Enable Layers Incrementally
If unfamiliar with Epic #26, enable layers one at a time:
**Week 1: PID Control Only**
```bash
export AIWG_RALPH_OVERSEER_ENABLED=false
export AIWG_RALPH_MEMORY_ENABLED=false
ralph-external "Task 1" --completion "criteria"
# Observe: How do control signals change? When does it pivot?
```
**Week 2: Add Overseer**
```bash
export AIWG_RALPH_OVERSEER_ENABLED=true
export AIWG_RALPH_MEMORY_ENABLED=false
ralph-external "Task 2" --completion "criteria"
# Observe: What interventions trigger? Are thresholds appropriate?
```
**Week 3: Add Memory**
```bash
export AIWG_RALPH_OVERSEER_ENABLED=true
export AIWG_RALPH_MEMORY_ENABLED=true
ralph-external "Task 3" --completion "criteria"
# Observe: What learnings are retrieved? Are they helpful?
```
**Week 4: Full System**
```bash
# All layers enabled (default)
ralph-external "Task 4" --completion "criteria"
```
### Monitor Before Tuning
**DO:**
```bash
# Enable verbose mode for first 5 loops
AIWG_RALPH_VERBOSE=true ralph-external "Task" --completion "criteria"
# Review logs after each iteration
tail -f .aiwg/ralph-external/iterations/001/stdout.log
```
**DON'T:**
```bash
# Don't tune blindly without understanding baseline
ralph-external "Task" --completion "criteria" \
--pid-profile aggressive \
--overseer-threshold 5 \
--memory-min-confidence 0.9 # Too many changes at once
```
**Monitoring checklist:**
- [ ] Read completion report after first loop
- [ ] Check PID control signals (stdout shows `Control signal: 0.XX`)
- [ ] Review overseer health checks (`.aiwg/ralph-external/overseer/health-log.json`)
- [ ] Examine retrieved learnings (`.aiwg/knowledge/ralph-learnings.json`)
- [ ] Identify any repeated interventions (signals a tuning opportunity)
## Task Definition Best Practices
### Writing Clear, Measurable Completion Criteria
**GOOD:**
```bash
# Objective: Verifiable with automated check
ralph-external "Fix failing authentication tests" \
--completion "npm test -- --testPathPattern=auth exits 0"
# Objective: Specific metric threshold
ralph-external "Improve test coverage" \
--completion "npx nyc report | grep 'Statements.*: 80'"
# Objective: Multiple verifiable conditions
ralph-external "Migrate to TypeScript" \
--completion "npx tsc --noEmit && npm test"
```
**BAD:**
```bash
# Too vague - what does "better" mean?
ralph-external "Make code better" \
--completion "code is improved"
# Not verifiable - requires human judgment
ralph-external "Refactor authentication" \
--completion "architecture is clean"
# Ambiguous - what is "most"?
ralph-external "Fix bugs" \
--completion "most bugs are fixed"
```
**Criteria quality checklist:**
- [ ] Can be verified by running a command (test, lint, build, etc.)
- [ ] Returns clear success/failure (exit code 0/1)
- [ ] Doesn't require human interpretation
- [ ] Specific enough to guide iteration (not too broad)
- [ ] Achievable within iteration budget
### Providing Sufficient Context
**GOOD:**
```bash
# Context in objective, clear success criteria
ralph-external \
"Migrate authentication from session-based to JWT, maintaining backward compatibility for 30 days" \
--completion "npm test -- auth && npm run migration-check"
```
**BETTER:**
```bash
# Context + key files to monitor
ralph-external \
"Migrate authentication to JWT (backward compatible)" \
--completion "npm test -- auth && npm run migration-check" \
--key-files "src/auth/session.ts,src/auth/jwt.ts,test/auth"
```
**BEST:**
```bash
# Context + key files + Claude assessment for complex migration
ralph-external \
"Migrate authentication to JWT (backward compatible)" \
--completion "npm test -- auth && npm run migration-check" \
--key-files "src/auth/session.ts,src/auth/jwt.ts,test/auth" \
--use-claude-assessment # Claude evaluates progress beyond test pass/fail
```
### Scoping Tasks Appropriately
**Too Small** (overhead dominates):
```bash
# Single-line fix - use internal Ralph or do manually
ralph-external "Fix typo in README" --completion "spelling correct"
```
**Just Right** (sweet spot):
```bash
# 3-5 iterations expected, clear scope
ralph-external "Implement password reset flow" \
--completion "npm test -- password-reset passes"
```
**Too Large** (scope creep risk):
```bash
# Too broad - break into sub-tasks
ralph-external "Build complete user management system" \
--completion "all user features working"
```
**Scoping guidelines:**
| Task Type | Expected Iterations | Max Iterations | Budget/Iteration |
|-----------|---------------------|----------------|------------------|
| Bug fix (simple) | 1-2 | 3 | $1.0 |
| Bug fix (complex) | 3-5 | 8 | $2.0 |
| Feature (small) | 2-4 | 6 | $1.5 |
| Feature (medium) | 4-8 | 12 | $2.0 |
| Feature (large) | 8-15 | 20 | $3.0 |
| Refactoring | 5-10 | 15 | $2.5 |
| Migration | 10-20 | 30 | $4.0 |
### Examples: Good vs Bad Task Definitions
**Example 1: Authentication Feature**
❌ **Bad:**
```bash
ralph-external "Add login" --completion "login works"
```
**Issues:**
- Too vague objective
- Unverifiable completion
- No context about requirements
✅ **Good:**
```bash
ralph-external \
"Implement email/password login with bcrypt hashing and session management" \
--completion "npm test -- --testPathPattern=auth/login passes && npm run security-check"
```
**Why better:**
- Specific technical approach
- Verifiable success criteria
- Security validation included
**Example 2: Refactoring Task**
❌ **Bad:**
```bash
ralph-external "Clean up code" --completion "code is clean"
```
✅ **Good:**
```bash
ralph-external \
"Extract authentication logic from controllers into dedicated auth service module" \
--completion "npm test && npx eslint src/auth/ --max-warnings 0" \
--key-files "src/controllers/,src/services/auth/"
```
**Example 3: Migration Task**
❌ **Bad:**
```bash
ralph-external "Update dependencies" --completion "packages updated"
```
✅ **Good:**
```bash
ralph-external \
"Migrate from Express 4.x to Express 5.x while maintaining API compatibility" \
--completion "npm test && npm run integration-tests && node scripts/verify-express5.js" \
--max-iterations 15 \
--budget 3.0 \
--use-claude-assessment
```
## PID Control Best Practices
### When to Use Different Gain Profiles
**Conservative Profile** (`kp=0.3, ki=0.05, kd=0.4`):
**Use for:**
- Security-sensitive changes (auth, encryption, access control)
- Breaking changes (API contract changes, schema migrations)
- Production deployments
- Tasks with high rollback cost
**Example:**
```bash
ralph-external "Migrate to new encryption algorithm" \
--completion "all encrypted data readable && security-scan passes" \
--pid-profile conservative
```
**Why:** High derivative gain (`kd=0.4`) dampens oscillation, low integral gain (`ki=0.05`) prevents overreaction to noise. Perfect for high-stakes tasks.
**Standard Profile** (`kp=0.5, ki=0.15, kd=0.25`) - **DEFAULT**:
**Use for:**
- Typical development tasks
- Feature implementation
- Bug fixes
- Refactoring
**Example:**
```bash
# No need to specify - standard is default
ralph-external "Implement user profile page" \
--completion "npm test -- profile passes"
```
**Why:** Balanced gains for normal task dynamics. Works for 80% of use cases.
**Aggressive Profile** (`kp=0.8, ki=0.25, kd=0.1`):
**Use for:**
- Documentation updates
- Configuration changes
- Simple fixes with clear solutions
- Tasks where fast iteration is valuable
**Example:**
```bash
ralph-external "Update README with new API examples" \
--completion "markdownlint README.md passes" \
--pid-profile aggressive
```
**Why:** High proportional gain (`kp=0.8`) responds quickly to completion gap. Low derivative (`kd=0.1`) allows rapid changes without damping.
**Recovery Profile** (`kp=1.0, ki=0.4, kd=-0.1`):
**Use for:**
- Stuck loops (manually switched)
- Oscillating tasks
- When default profiles aren't making progress
**Example:**
```bash
# Detected stuck pattern, switching to recovery
ralph-external "Complex state machine refactor" \
--completion "tests pass" \
--pid-profile recovery
```
**Why:** Maximum proportional/integral gains force aggressive action. Negative derivative (`kd=-0.1`) amplifies momentum changes to break stuck patterns.
**Cautious Profile** (`kp=0.2, ki=0.02, kd=0.5`):
**Use for:**
- Near-completion fine-tuning (auto-selected by gain scheduler)
- Tasks requiring precision over speed
- Situations where overcorrection is risky
**Auto-selected when:** `completionPercentage > 80%`
**Why:** Low gains prevent overshooting when close to goal. High derivative detects when to stop.
### Interpreting Control Signals
Control output is a value from 0.0 to 1.0 indicating "urgency of action":
| Signal Range | Interpretation | Suggested Behavior |
|--------------|----------------|-------------------|
| `0.0 - 0.3` | **Near completion** | Continue current approach, focus on finishing |
| `0.3 - 0.5` | **Moderate gap** | Maintain pace, monitor for blockers |
| `0.5 - 0.7` | **Significant gap** | Consider strategy adjustment if no progress in 2 iterations |
| `0.7 - 0.9` | **Critical gap** | Likely stuck or wrong approach - expect pivot |
| `0.9 - 1.0` | **Emergency** | Immediate intervention needed (overseer likely triggering) |
**Example control signal progression:**
```
Iteration 1: 0.85 (Critical gap - starting from scratch)
Iteration 2: 0.72 (Improving - made some progress)
Iteration 3: 0.55 (Moderate - steady progress)
Iteration 4: 0.38 (Near completion - almost done)
Iteration 5: 0.15 (Finishing - final touches)
```
**Healthy pattern:** Decreasing signal over iterations (converging).
**Unhealthy patterns:**
```
# Stuck pattern
Iteration 1: 0.80
Iteration 2: 0.78 (Minimal change)
Iteration 3: 0.79 (No progress)
Iteration 4: 0.81 (Slight regression)
→ Expect: Overseer REDIRECT or PAUSE
# Oscillating pattern
Iteration 1: 0.70
Iteration 2: 0.45 (Good progress)
Iteration 3: 0.68 (Regressed)
Iteration 4: 0.42 (Improved again)
→ Expect: PID dampening + Overseer WARNING
```
### When to Disable PID Control
**Disable when:**
1. **Task is deterministic** - Single clear path, no adaptation needed
```bash
AIWG_RALPH_PID_ENABLED=false ralph-external \
"Update version numbers in package.json" \
--completion "grep '\"version\": \"2.0.0\"' package.json"
```
2. **Debugging Epic #26** - Isolate PID behavior
```bash
AIWG_RALPH_PID_ENABLED=false ralph-external "Task" --completion "..."
# See how loop behaves without adaptive gains
```
3. **Extremely simple tasks** (1-2 iterations max)
```bash
# PID overhead not worth it
AIWG_RALPH_PID_ENABLED=false ralph-external \
"Fix typo in docs" \
--completion "docs updated"
```
**Keep enabled for:**
- Tasks with uncertain iteration count
- Tasks where progress is non-linear
- Long-running tasks (>5 iterations)
- Tasks with potential for getting stuck
### Signs of Misconfigured PID
**Problem:** Overshooting (completing too fast with quality issues)
**Symptoms:**
- Tests pass but logic is wrong
- Coverage drops
- Regressions introduced
**Fix:**
```bash
# Use conservative profile for more careful approach
ralph-external "Task" --completion "..." \
--pid-profile conservative
```
**Problem:** Oscillation (back-and-forth changes)
**Symptoms:**
- Derivative shows frequent sign changes
- Files modified then reverted
- Control signal oscillates
**Fix:**
```bash
# Increase derivative gain to dampen oscillation
# (Usually auto-handled by gain scheduler, but can force conservative)
ralph-external "Task" --completion "..." \
--pid-profile conservative # Higher kd=0.4
```
**Problem:** Slow convergence (too many iterations)
**Symptoms:**
- Control signal decreases very slowly
- Progress is steady but inefficient
- Hitting max iteration limit
**Fix:**
```bash
# Use aggressive profile for faster response
ralph-external "Task" --completion "..." \
--pid-profile aggressive
```
**Problem:** Stuck detection too sensitive
**Symptoms:**
- Frequent "stuck" alarms when actually making progress
- PID signals suggest movement but alarms trigger
**Fix:**
```javascript
// In programmatic usage
await orchestrator.execute({
objective: "...",
completionCriteria: "...",
pidConfig: {
thresholds: {
stagnationWindow: 5, // Increase from default 3
stagnationThreshold: 0.01, // Decrease sensitivity
},
},
});
```
## Claude Intelligence Best Practices
### When AI-Generated Prompts Help vs Hurt
**AI prompts HELP when:**
1. **Task context is complex** - Multiple files, intricate dependencies
```bash
ralph-external "Refactor authentication across 12 files" \
--completion "tests pass" \
--use-claude-assessment
# Claude generates context-aware prompts better than templates
```
2. **Task evolves during execution** - Blockers appear, scope shifts
- Claude adapts prompt based on iteration history
- Incorporates learnings from semantic memory
- Adjusts strategy based on PID signals
3. **Cross-loop learnings available** - Similar tasks executed before
- Memory layer provides relevant strategies
- Claude integrates learnings into prompt context
**AI prompts HURT when:**
1. **Task is extremely simple** - Overhead not worth it
```bash
# For trivial tasks, static template is fine
AIWG_RALPH_INTELLIGENCE_ENABLED=false ralph-external \
"Update version in package.json" \
--completion "grep '2.0.0' package.json"
```
2. **Requirements are rigid** - No room for interpretation
- Example: Compliance tasks with exact steps
- Better to use explicit instructions in objective
3. **Debugging/isolation needed** - Want to control exact prompts
```bash
# Disable intelligence to test specific prompt
AIWG_RALPH_INTELLIGENCE_ENABLED=false ralph-external "..." --completion "..."
```
### Balancing Validation Strictness
**Strict validation** (more checks, slower iterations):
```javascript
await orchestrator.execute({
objective: "Security-critical feature",
completionCriteria: "...",
validation: {
preIteration: {
checkBuild: true,
checkGit: true,
checkDependencies: true,
checkLint: true, // Enable for security tasks
},
postIteration: {
checkTests: true,
checkRegression: true,
checkCoverage: true, // Ensure no coverage drop
},
},
});
```
**Relaxed validation** (fewer checks, faster iterations):
```javascript
await orchestrator.execute({
objective: "Update documentation",
completionCriteria: "...",
validation: {
preIteration: {
checkBuild: false, // No build for docs
checkGit: true,
checkDependencies: false,
},
postIteration: {
checkTests: false, // No tests for docs
checkRegression: false,
},
},
});
```
**Balanced validation** (default):
```javascript
// Default validation is reasonable for most tasks
await orchestrator.execute({
objective: "Implement feature",
completionCriteria: "tests pass",
// Uses default validation (build, tests, git)
});
```
### Strategy Planner Tuning
**Default strategy planner** works well for most tasks. Tune only if:
1. **Escalation too early** - Planner gives up before truly stuck
```javascript
await orchestrator.execute({
objective: "Complex refactoring",
completionCriteria: "...",
strategyPlanner: {
escalationThreshold: 0.2, // Lower = escalate later (default: 0.3)
confidenceThreshold: 0.4, // Lower = more willing to try (default: 0.5)
},
});
```
2. **Not pivoting when stuck** - Planner persists too long
```javascript
await orchestrator.execute({
objective: "Task prone to getting stuck",
completionCriteria: "...",
strategyPlanner: {
historyDepth: 3, // Shorter history = faster pivot detection (default: 5)
},
});
```
### When to Use Simpler Prompts
**Disable Claude-generated prompts when:**
1. **Budget constrained** - Saves 5-10 seconds per iteration
```bash
AIWG_RALPH_INTELLIGENCE_ENABLED=false ralph-external "..." --completion "..."
```
2. **Task is straightforward** - Static template sufficient
```bash
# Simple fix - no need for AI prompt
AIWG_RALPH_INTELLIGENCE_ENABLED=false ralph-external \
"Fix linting errors" \
--completion "npx eslint . exits 0"
```
3. **Testing/debugging** - Want deterministic prompts
```bash
# Isolate other layers, disable intelligence
AIWG_RALPH_INTELLIGENCE_ENABLED=false ralph-external "..." --completion "..."
```
**Keep Claude prompts enabled for:**
- Complex tasks with uncertain paths
- Tasks requiring strategy adaptation
- Tasks benefiting from semantic memory integration
- Long-running loops where prompt quality matters
## Memory Layer Best Practices
### Building a Useful Knowledge Base Over Time
**Phase 1: Initial Corpus (Weeks 1-2)**
Start with empty memory, let Epic #26 extract learnings:
```bash
# Enable memory from day 1
export AIWG_RALPH_MEMORY_ENABLED=true
export AIWG_KNOWLEDGE_DIR=.aiwg/knowledge
# Run normal tasks
ralph-external "Implement feature A" --completion "..."
ralph-external "Fix bug B" --completion "..."
ralph-external "Refactor module C" --completion "..."
```
After 10-15 loops, inspect corpus:
```bash
cat .aiwg/knowledge/ralph-learnings.json | jq '.stats'
# Should show:
# {
# "totalLearnings": 15-25,
# "byType": {
# "strategy": 8,
# "antipattern": 3,
# "estimate": 7,
# "convention": 4
# }
# }
```
**Phase 2: Quality Review (Week 3)**
Review learnings for quality:
```bash
# List all strategies
cat .aiwg/knowledge/ralph-learnings.json | \
jq '.learnings[] | select(.type == "strategy") | .content.description'
```
Look for:
- ✅ Specific, actionable strategies
- ✅ High confidence (>0.7) and success rate (>0.6)
- ❌ Vague/generic advice
- ❌ Low success rate (<0.5) strategies
**Phase 3: Maintenance (Ongoing)**
Monthly tasks:
1. **Prune stale learnings** - Remove outdated patterns
2. **Promote high-value learnings** - Manually boost confidence for proven strategies
3. **Tag learnings** - Add task type tags for better retrieval
```javascript
// Example: Prune stale learnings
import { SemanticMemory } from './lib/semantic-memory.mjs';
const memory = new SemanticMemory('.aiwg/knowledge');
const learnings = await memory.query({ type: 'all' });
// Remove learnings with:
// - Low success rate (<0.4)
// - Not used in 60 days
// - Confidence <0.5
for (const learning of learnings) {
const daysSinceUpdate = (Date.now() - new Date(learning.updatedAt)) / (1000 * 60 * 60 * 24);
if (learning.successRate < 0.4 || daysSinceUpdate > 60 || learning.confidence < 0.5) {
console.log(`Removing stale learning: ${learning.id}`);
await memory.remove(learning.id);
}
}
```
### Pruning Stale Learnings
**When to prune:**
| Condition | Action |
|-----------|--------|
| Success rate <0.4 | Remove (unreliable) |
| Not used in 90 days | Remove (stale) |
| Confidence <0.5 and useCount <3 | Remove (unvalidated) |
| Task type deprecated | Remove (obsolete) |
**How to prune:**
```bash
# Manual prune via script
node tools/prune-learnings.mjs --threshold 0.4 --days 90
# Or programmatic
import { SemanticMemory } from './lib/semantic-memory.mjs';
const memory = new SemanticMemory('.aiwg/knowledge');
// Prune learnings
await memory.prune({
minSuccessRate: 0.4,
maxStaleDays: 90,
minConfidence: 0.5,
minUseCount: 3,
});
```
### Manual Promotion for High-Value Insights
When you discover a particularly valuable learning manually, promote it:
```javascript
import { SemanticMemory } from './lib/semantic-memory.mjs';
const memory = new SemanticMemory('.aiwg/knowledge');
// Manually add high-value learning
await memory.store({
type: 'strategy',
taskType: 'security',
content: {
description: "Always run OWASP ZAP scan before deploying auth changes",
reasoning: "Prevents 90% of auth vulnerabilities (company policy)",
applicability: ["auth", "security", "deployment"],
},
confidence: 0.95, // High confidence for policy
sourceLoops: ['manual-entry'],
metadata: {
priority: 'critical',
source: 'security-team',
},
});
```
**High-value learning characteristics:**
- Directly prevents known failures
- Enforces company policies/conventions
- Proven across multiple projects
- Time-saving shortcuts discovered by team
### When Cross-Loop Learning Doesn't Help
**Disable memory when:**
1. **Task is highly novel** - No similar tasks in history
```bash
# First time working with new technology
AIWG_RALPH_MEMORY_ENABLED=false ralph-external \
"Initial setup of GraphQL server" \
--completion "..."
```
2. **Memory has low-quality learnings** - Need to reset corpus
```bash
# Temporarily disable while cleaning memory
AIWG_RALPH_MEMORY_ENABLED=false ralph-external "..." --completion "..."
# Clean memory
rm .aiwg/knowledge/ralph-learnings.json
# Re-enable
AIWG_RALPH_MEMORY_ENABLED=true ralph-external "..." --completion "..."
```
3. **Task is one-off** - Won't repeat, learning not valuable
```bash
# One-time migration - no future value
AIWG_RALPH_MEMORY_ENABLED=false ralph-external \
"Migrate legacy DB schema (deprecated system)" \
--completion "..."
```
**Keep memory enabled for:**
- Recurring task types (auth, testing, deployment)
- Tasks with reusable patterns
- Team tasks where learnings transfer across developers
## Oversight Best Practices
### Setting Appropriate Thresholds for Your Project
**Default thresholds** (good for most projects):
```javascript
{
stuckThreshold: 3, // 3 iterations without progress
oscillationWindow: 5, // Check last 5 iterations
regressionSensitivity: 0.1, // 10% quality drop
resourceBurnThreshold: 0.8, // 80% budget with <50% progress
}
```
**Tight thresholds** (for critical projects):
```javascript
// Detect issues faster, intervene earlier
await orchestrator.execute({
objective: "Production deployment",
completionCriteria: "...",
overseer: {
stuckThreshold: 2, // Only 2 iterations without progress
oscillationWindow: 3, // Smaller window
regressionSensitivity: 0.05, // 5% quality drop triggers alarm
},
});
```
**Loose thresholds** (for experimental work):
```javascript
// Allow more exploration, fewer interventions
await orchestrator.execute({
objective: "Exploratory refactoring",
completionCriteria: "...",
overseer: {
stuckThreshold: 5, // Allow 5 iterations of exploration
oscillationWindow: 10, // Wider tolerance
regressionSensitivity: 0.2, // 20% quality drop acceptable
},
});
```
### Responding to Interventions
**LOG** (informational):
- **Action:** None required, just awareness
- **Example:** "Progress slower than average"
- **Response:** Monitor next iteration
**WARN** (minor issue):
- **Action:** Review current approach
- **Example:** "Approaching iteration budget (70%)"
- **Response:** Consider increasing `maxIterations` or simplifying scope
**REDIRECT** (moderate issue):
- **Action:** Overseer injects guidance
- **Example:** "Oscillation detected - stabilizing"
- **Response:** Trust injected instructions, monitor for stabilization
**PAUSE** (serious issue):
- **Action:** Loop paused, awaiting human decision
- **Example:** "Stuck for 5 iterations"
- **Response:** Review progress, provide guidance:
```bash
# Option 1: Resume with adjusted scope
ralph-external-resume --simplify-scope
# Option 2: Resume with strategy change
ralph-external-resume --pivot "try test-first approach"
# Option 3: Abort and restart
ralph-external-abort
```
**ABORT** (critical issue):
- **Action:** Loop terminated immediately
- **Example:** "Budget exhausted, no progress"
- **Response:** Review logs, adjust configuration for retry
### When to Trust ABORT vs Override
**Trust ABORT when:**
1. **Budget exhausted with <30% progress** - Unlikely to complete
2. **Objective deviation >70%** - Working on wrong thing
3. **Regression for 3+ consecutive iterations** - Digging deeper hole
4. **Critical errors detected** (syntax errors, build failures)
**Override ABORT when:**
1. **Near completion** - 80%+ progress, just needs one more iteration
```bash
# Override abort, force one more iteration
ralph-external-resume --force
```
2. **Known temporary issue** - Flaky test, transient error
```bash
# Acknowledge issue, retry
ralph-external-resume --ignore-last-error
```
3. **Overseer misconfigured** - Threshold too strict
```bash
# Adjust threshold, resume
export AIWG_RALPH_STUCK_THRESHOLD=5
ralph-external-resume
```
**Decision tree:**
```
ABORT triggered
├─ Progress >80%? → OVERRIDE (likely transient issue)
├─ Budget >90% used? → TRUST ABORT (won't finish)
├─ Regressing for 3+ iter? → TRUST ABORT (wrong approach)
└─ Unclear? → REVIEW LOGS
├─ Blocker identified? → OVERRIDE with blocker fix
└─ No clear blocker? → TRUST ABORT (something fundamentally wrong)
```
### Escalation Channel Setup
**Desktop notifications** (default):
```javascript
// Enabled by default
overseer: {
escalation: {
desktop: true,
},
}
```
**Gitea issues** (recommended for teams):
```bash
# Configure Gitea escalation
export AIWG_GITEA_TOKEN=$(cat ~/.config/gitea/token)
export AIWG_GITEA_REPO=roctinam/ai-writing-guide
ralph-external "Critical task" --completion "..." --gitea-issue
```
Result: Critical interventions create Gitea issues with:
- Loop ID
- Iteration number
- Detection details
- Recommended actions
- Full context for review
**Webhook** (for custom integrations):
```javascript
overseer: {
escalation: {
webhook: 'https://hooks.example.com/ralph',
headers: {
'Authorization': 'Bearer token123',
},
},
}
```
**Slack** (planned, not yet implemented):
```javascript
// Future
overseer: {
escalation: {
slack: {
webhook: 'https://hooks.slack.com/services/...',
channel: '#ralph-alerts',
},
},
}
```
## Anti-Patterns to Avoid
### Over-Tuning: Constantly Adjusting Parameters
**ANTI-PATTERN:**
```bash
# Iteration 1
ralph-external "Task" --completion "..." --pid-profile standard
# Iteration 2 (saw oscillation)
ralph-external "Task" --completion "..." --pid-profile conservative
# Iteration 3 (too slow)
ralph-external "Task" --completion "..." --pid-profile aggressive
# Iteration 4 (stuck)
ralph-external "Task" --completion "..." --pid-profile recovery
```
**Problem:** Constant tuning prevents learning stable configuration. PID needs consistent parameters to adapt.
**SOLUTION:**
```bash
# Start with default, let adaptive gains handle it
ralph-external "Task" --completion "..."
# Only tune if pattern persists across 3+ runs
ralph-external "Task A" --completion "..." # Run 1: oscillation
ralph-external "Task B" --completion "..." # Run 2: oscillation
ralph-external "Task C" --completion "..." # Run 3: oscillation
# Now justified to tune
ralph-external "Task D" --completion "..." --pid-profile conservative
```
### Under-Monitoring: Not Checking Oversight Reports
**ANTI-PATTERN:**
```bash
# Fire and forget
ralph-external "Long task" --completion "..." &
# Never check:
# - .aiwg/ralph-external/overseer/health-log.json
# - .aiwg/ralph-external/iterations/*/analysis.json
# - Completion report
# Discover issues after loop aborts
```
**Problem:** Miss early warning signs of issues.
**SOLUTION:**
```bash
# Monitor during execution
ralph-external "Long task" --completion "..." &
# Check health periodically
watch -n 60 'cat .aiwg/ralph-external/overseer/health-log.json | jq ".[-1]"'
# Review after completion
cat .aiwg/ralph-external/completion-report.md
```
**Monitoring checklist:**
- [ ] Check health status after each iteration (verbose mode)
- [ ] Review completion report after loop finishes
- [ ] Examine intervention log for repeated patterns
- [ ] Verify memory promotions are sensible
### Ignoring Interventions
**ANTI-PATTERN:**
```bash
# Overseer: "WARN - Approaching iteration budget"
# Human: (Ignores)
# Overseer: "REDIRECT - Oscillation detected"
# Human: (Ignores)
# Overseer: "PAUSE - Stuck for 5 iterations"
# Human: (Forces resume without addressing root cause)
# Overseer: "ABORT - Budget exhausted"
# Human: "Why did it abort?!" (Ignored all warnings)
```
**Problem:** Interventions exist for a reason. Ignoring them leads to wasted resources.
**SOLUTION:**
When intervention triggers:
1. **Read the reason** - Don't blindly override
2. **Check logs** - Understand what overseer detected
3. **Address root cause** - Fix the underlying issue
4. **Resume or adjust** - Then continue
```bash
# Overseer: "PAUSE - Stuck for 5 iterations"
# Good response:
cat .aiwg/ralph-external/iterations/*/analysis.json
# Identify blocker: "Missing dependency X"
npm install X
ralph-external-resume --with-fix "Installed missing dependency X"
# Bad response:
ralph-external-resume --force # Ignore root cause
```
### Trusting Memory Blindly
**ANTI-PATTERN:**
```bash
# Memory says: "For auth tasks, always disable tests temporarily to make progress"
# (Bad learning from one-off situation)
# Loop retrieves this learning
# Applies it blindly
# Introduces security bug
# Memory never questioned
```
**Problem:** Semantic memory can learn bad patterns if not curated.
**SOLUTION:**
1. **Review learnings monthly**
```bash
cat .aiwg/knowledge/ralph-learnings.json | \
jq '.learnings[] | select(.successRate < 0.6)'
# Inspect low-success learnings
```
2. **Remove bad patterns**
```javascript
await memory.remove('learn-abc123');
```
3. **Override bad retrieval**
```bash
# If memory suggests bad approach, disable for this run
AIWG_RALPH_MEMORY_ENABLED=false ralph-external "..." --completion "..."
```
4. **Manually curate high-stakes learnings**
```javascript
// Ensure critical learnings are high quality
await memory.update('learn-xyz789', {
confidence: 0.95,
metadata: { validated: true, source: 'team-lead' },
});
```
### Running Without Oversight on Critical Tasks
**ANTI-PATTERN:**
```bash
# Production deployment without oversight
AIWG_RALPH_OVERSEER_ENABLED=false ralph-external \
"Deploy to production" \
--completion "deployment successful"
```
**Problem:** No safety net for critical operations.
**SOLUTION:**
```bash
# Always enable oversight for critical tasks
# Use strict thresholds
ralph-external "Deploy to production" \
--completion "deployment successful && health-check passes" \
--overseer-stuck-threshold 2 \
--overseer-regression-sensitivity 0.05 \
--gitea-issue # Escalate any issues
```
**Critical task checklist:**
- [ ] Oversight ENABLED
- [ ] Strict thresholds configured
- [ ] Escalation channel tested (Gitea/webhook)
- [ ] Conservative PID profile
- [ ] Comprehensive validation checks
- [ ] Budget headroom for retries
## Team Usage Patterns
### Shared Memory Stores
**Setup shared knowledge directory:**
```bash
# Team convention: shared knowledge in project
mkdir -p .aiwg/knowledge
# All team members use same dir
export AIWG_KNOWLEDGE_DIR=.aiwg/knowledge
# Commit learnings to version control
git add .aiwg/knowledge/ralph-learnings.json
git commit -m "Update shared learnings"
```
**Benefits:**
- New team members inherit proven strategies
- Cross-developer learning transfer
- Consistent approaches across team
**Considerations:**
- Occasional conflicts in `ralph-learnings.json`
- Resolve by merging `learnings` arrays
- Re-compute `stats` section
- May need pruning as team grows
**Merge script:**
```bash
#!/bin/bash
# tools/merge-learnings.sh
# Merge learnings from two branches
OURS=.aiwg/knowledge/ralph-learnings.json
THEIRS=$1
jq -s '.[0].learnings + .[1].learnings | unique_by(.id) | {learnings: .}' \
$OURS $THEIRS > merged.json
mv merged.json $OURS
echo "Merged learnings from $THEIRS"
```
### Consistent Configurations
**Team conventions file:**
```yaml
# .aiwg/ralph/team-config.yaml
pidControl:
defaultProfile: standard
conservativeFor:
- security
- auth
- encryption
- production
overseer:
stuckThreshold: 3
escalateVia: gitea
giteaRepo: team/project
memory:
sharedPath: .aiwg/knowledge
minConfidence: 0.7
pruneInterval: monthly
validation:
alwaysRunTests: true
alwaysCheckBuild: true
requireLintForPR: true
```
**Usage:**
```bash
# Load team config (planned feature)
ralph-external "Task" --completion "..." --config .aiwg/ralph/team-config.yaml
```
**Until config files supported, use shell aliases:**
```bash
# .bashrc or .zshrc
alias ralph-team='ralph-external --pid-profile standard --overseer-stuck-threshold 3 --use-claude-assessment'
# Usage
ralph-team "Task" --completion "..."
```
### Learning from Team Loops
**Pattern: Weekly learning review**
```bash
# Every Friday, team reviews new learnings
cat .aiwg/knowledge/ralph-learnings.json | \
jq '.learnings[] | select(.createdAt > "2026-01-27")' | \
jq -s 'group_by(.type) | map({type: .[0].type, count: length, items: .})'
# Discuss:
# - Which learnings are valuable?
# - Any bad patterns to remove?
# - Promote any to higher confidence?
```
**Pattern: Tagging conventions**
```javascript
// Add team member tags to learnings
await memory.update('learn-abc123', {
metadata: {
discoveredBy: 'alice',
validated: true,
team: 'backend',
priority: 'high',
},
});
// Query by tag
const backendStrategies = await memory.query({
type: 'strategy',
metadata: { team: 'backend' },
});
```
**Pattern: Pair programming with Ralph**
```bash
# Developer A starts task
ralph-external "Implement feature X" --completion "tests pass" --gitea-issue
# Loop gets stuck, creates Gitea issue
# Developer B sees issue, reviews context
# Developer B provides guidance in issue
# Developer A resumes with guidance
ralph-external-resume --with-guidance "Use test-first approach from issue #123"
```
## Performance Optimization
### Reducing Overhead for Simple Tasks
Epic #26 adds ~6-11 seconds per iteration (dominated by Claude prompt generation). For simple tasks, reduce overhead:
**Disable Claude Intelligence:**
```bash
# Simple task - static prompt is fine
AIWG_RALPH_INTELLIGENCE_ENABLED=false ralph-external \
"Update version number" \
--completion "grep '2.0.0' package.json"
# Savings: ~5-10 seconds per iteration
```
**Disable Semantic Memory:**
```bash
# One-off task - no learning value
AIWG_RALPH_MEMORY_ENABLED=false ralph-external \
"One-time migration" \
--completion "..."
# Savings: ~100ms per iteration
```
**Reduce Validation:**
```javascript
await orchestrator.execute({
objective: "Simple docs update",
completionCriteria: "...",
validation: {
preIteration: {
checkBuild: false,
checkDependencies: false,
checkLint: false,
},
postIteration: {
checkTests: false,
checkRegression: false,
},
},
});
// Savings: 5-30 seconds per iteration (depending on build/test time)
```
### When to Disable Layers
**PID Control:**
- **Disable for:** Deterministic 1-2 iteration tasks
- **Keep for:** Anything >3 iterations or uncertain
**Claude Intelligence:**
- **Disable for:** Simple tasks, tight budgets
- **Keep for:** Complex tasks, adaptive prompts valuable
**Semantic Memory:**
- **Disable for:** One-off tasks, novel work
- **Keep for:** Recurring patterns, team knowledge
**Overseer:**
- **Disable for:** Trusted simple tasks
- **Keep for:** Long-running tasks, critical work
**Combinations:**
```bash
# Minimal Epic #26 (fastest)
AIWG_RALPH_PID_ENABLED=false \
AIWG_RALPH_INTELLIGENCE_ENABLED=false \
AIWG_RALPH_MEMORY_ENABLED=false \
AIWG_RALPH_OVERSEER_ENABLED=false \
ralph-external "Simple task" --completion "..."
# Moderate (PID + Overseer only)
AIWG_RALPH_INTELLIGENCE_ENABLED=false \
AIWG_RALPH_MEMORY_ENABLED=false \
ralph-external "Medium task" --completion "..."
# Full intelligence (all layers)
ralph-external "Complex task" --completion "..."
```
### Monitoring Resource Usage
**Track token usage per iteration:**
```bash
# Check iteration analysis for token counts
cat .aiwg/ralph-external/iterations/001/analysis.json | jq '.tokenUsage'
# {
# "input": 12000,
# "output": 3000,
# "total": 15000,
# "cost": 0.45
# }
```
**Track total loop cost:**
```bash
# In completion report
cat .aiwg/ralph-external/completion-report.md | grep "Total Cost"
# Total Cost: $3.20 (5 iterations × ~$0.64 avg)
```
**Budget alerts:**
```javascript
await orchestrator.execute({
objective: "Task",
completionCriteria: "...",
budgetPerIteration: 2.0, # $2 per iteration
maxIterations: 10, # Max $20 total
onIterationComplete: (iteration, state) => {
const spent = state.iterations.reduce((sum, i) => sum + i.cost, 0);
const budget = state.maxIterations * state.budgetPerIteration;
if (spent > budget * 0.8) {
console.warn(`Budget warning: ${spent}/${budget} spent (${(spent/budget*100).toFixed(0)}%)`);
}
},
});
```
**Resource usage dashboard (planned):**
```bash
# Future feature
ralph-external-dashboard --loop ralph-001
# Shows:
# - Token usage per iteration
# - Cost breakdown (prompt gen, session, validation)
# - Time per layer
# - Memory retrieval efficiency
```
## Summary
### Quick Reference
| Scenario | Recommended Configuration |
|----------|--------------------------|
| **Simple task (1-2 iter)** | Disable intelligence, memory, validation |
| **Typical feature (3-8 iter)** | Default configuration (all layers enabled) |
| **Complex refactoring (8-15 iter)** | Default + increased budget, use Claude assessment |
| **Security-critical** | Conservative PID, strict validation, Gitea escalation |
| **Experimental work** | Aggressive PID, relaxed overseer, no memory |
| **Team collaboration** | Shared memory, Gitea issues, consistent config |
### Decision Tree
```
New task
├─ Simple (<3 iter)? → Disable intelligence + memory
├─ Critical (security/production)? → Conservative + strict validation
├─ Novel (first-time)? → Disable memory, standard PID
├─ Recurring (similar to past)? → Enable all layers (leverage memory)
└─ Experimental? → Aggressive PID, loose overseer
```
### Key Takeaways
1. **Start with defaults** - Epic #26 defaults work for 80% of tasks
2. **Monitor first, tune later** - Understand baseline before customizing
3. **Trust the system** - Let intelligence layers do their job
4. **Intervene when needed** - Override when system is clearly wrong
5. **Maintain memory** - Monthly pruning keeps knowledge base healthy
6. **Escalate critical issues** - Use Gitea/webhooks for team visibility
7. **Optimize selectively** - Disable layers only when overhead dominates value
## References
- **Epic #26 Architecture:** @docs/ralph-epic-26-architecture.md
- **Epic #26 Configuration:** @docs/ralph-epic-26-configuration.md
- **Ralph External Command:** @.claude/commands/ralph-external.md
- **Issue #22:** Claude Intelligence Layer
- **Issue #23:** PID Control Layer
- **Issue #24:** Memory Layer
- **Issue #25:** Overseer Layer
**Document Status:** Complete
**Last Updated:** 2026-02-03
**Maintainer:** AIWG Team