
# RLM Context Management Rules

**Enforcement Level**: HIGH
**Scope**: All agents operating on large codebases or document corpora
**Addon**: rlm (Recursive Language Model patterns)
**Research Basis**: REF-089 Recursive Language Models (Zhang et al., 2026)
**Issue**: #322 (Core RLM Addon)

## Overview

These rules enforce Recursive Language Model (RLM) patterns for context management when working with large codebases, documentation corpora, or multi-file operations. Research (REF-089, GRADE: LOW) reports that treating context as an external environment, accessed programmatically through code, outperforms loading entire contexts into the conversation window: up to 3x cheaper than a summarization agent that ingests the full input, while maintaining stronger performance.

## Research Foundation

These rules synthesize five references in the AIWG research corpus. Each finding is hedged according to its GRADE quality assessment.

| REF | Source | GRADE | Used For |
|-----|--------|-------|----------|
| REF-089 (Zhang et al., MIT CSAIL, 2026) | arXiv 2512.24601v2 | LOW (peer-review pending) | Core RLM paradigm — Rules 1-5, Rule 10 |
| REF-086 (Kim et al., Google DeepMind, 2025) | arXiv 2512.08296 | LOW (peer-review pending) | Coordination topology — Rule 6, Rule 7 |
| REF-088 (Wexford, DEV blog, 2026) | dev.to | VERY LOW (practitioner synthesis) | Sub-agent count cap — Rule 8 |
| REF-127 (Zylos Research, 2026) | Industry report | VERY LOW (aggregated industry data) | Long-running degradation — Rule 9 |
| REF-169 (Evans et al., Google PoI, 2026) | arXiv 2603.20639 | MODERATE (preprint, established institution) | Centaur-mode design direction (forward-looking, not yet enforced) |

### Core findings

From **REF-089** (Recursive Language Models — Zhang et al., 2026):

> "The key insight is that arbitrarily long user prompts should not be fed into the neural network directly but should instead be treated as *part of the environment* that the LLM is tasked to *symbolically and recursively interact with*." (p. 1)

> "Compared to the summarization agent which ingests the entire input context, RLMs are up to 3× cheaper while maintaining stronger performance across all tasks because the RLM is able to selectively view context." (p. 6)

> "Unfortunately, compaction is rarely expressive enough for tasks that require dense access throughout the prompt. It presumes that *some* details that appear early in the prompt can safely be forgotten to make room for new content." (p. 1)

From **REF-086** (Towards a Science of Scaling Agent Systems — Kim et al., DeepMind, 2025):

> "Independent multi-agent systems amplify errors at 17.2x the rate of single agents, while centralized coordination reduces this to 4.4x magnification." (Kim et al., 2025)

> "Multi-agent coordination produces diminishing or negative returns once single-agent baselines exceed approximately 45% task performance." (Kim et al., 2025)

> "Sequential reasoning tasks degraded by 39-70% across all multi-agent variants." (Kim et al., 2025)

From **REF-088** (How to Build Multi-Agent Systems — Wexford, 2026 — *practitioner synthesis, not primary research*):

> "Beyond 7 agents, coordination overhead begins to dominate actual productive work. The cognitive complexity of managing agent interactions grows faster than the capabilities gained." (Wexford, 2026)

From **REF-127** (Long-Running AI Agents and Task Decomposition — Zylos Research, 2026 — *industry report, aggregated data*):

> "Industry reports suggest agent success rate degrades after approximately 35 minutes of operation, and that doubling task duration quadruples the failure rate." (Zylos Research, 2026; primary citation not provided in source)

From **REF-169** (Agentic AI and the Next Intelligence Explosion — Evans et al., 2026):

> "A recursive descent into collective deliberation that expands when complexity demands and collapses when the problem resolves." (Evans, Bratton, & Agüera y Arcas, 2026)

## Problem Statement

When working with large codebases or document sets, agents frequently:

- Load entire files into conversation context unnecessarily
- Process all content sequentially instead of filtering first
- Exhaust context windows with raw text rather than using programmatic access
- Lose information through compaction/summarization when full details are needed
- Fail to leverage recursive decomposition for complex multi-file tasks

This produces:

- Context window overflow and truncation
- Degraded output quality as details are lost to compaction
- Expensive token costs from processing irrelevant content
- Inability to handle codebases larger than the context window
- Loss of information through lossy summarization

## Mandatory Rules

### Rule 1: Symbolic Handles Over Raw Text

**Research Basis**: REF-089 Design Choice 1 — Treat context as external environment

When working with large files or multiple documents, reference them by symbolic handle (file path) rather than loading their full content into the conversation.

**FORBIDDEN**:
```
Agent task: Check authentication logic across the codebase
Agent: Let me read all 47 files in src/auth/ into context
*Loads 15,000 lines of code*
*Context window 80% full*
*Compaction kicks in, loses details*
```

**REQUIRED**:
```
Agent task: Check authentication logic across the codebase
Agent: I'll work with the codebase as an external environment
*Uses Grep to find password handling patterns*
*Identifies 8 relevant functions*
*Reads only those specific sections with line ranges*
*Preserves full detail, uses <10% of context window*

Symbolic handles maintained: src/auth/login.ts, src/auth/validate.ts
Full content never loaded — accessed programmatically via Read when needed
```

### Rule 2: Programmatic Access Over Full-Context Loading

**Research Basis**: REF-089 Observation 4 — "Even without explicit training, RLMs exhibit interesting context decomposition and problem decomposition behavior." (p. 7)

Before loading a large file, use code-based filtering (Grep, targeted Read with line ranges) rather than reading the entire content.

**FORBIDDEN**:
```
Task: Find all functions that handle user input validation
Agent: Let me read the entire 3,000-line utils.ts file
*Loads full file*
*Scans through 3,000 lines for validation functions*
*Context now contains 2,800 lines of irrelevant code*
```

**REQUIRED**:
```
Task: Find all functions that handle user input validation
Agent: I'll use programmatic access to filter before loading

Step 1: Grep for validation patterns
  grep -n "validate.*input" utils.ts
  → Lines 42, 156, 892, 1240 contain matches

Step 2: Read only relevant sections
  Read utils.ts (lines 35-65)     # First function
  Read utils.ts (lines 150-180)   # Second function
  ...

Result: Context contains only 120 lines of relevant code (4% of file)
Full details preserved, no compaction needed
```

**Emergent Strategies** (from REF-089 Section 4.1, pp. 7-8):

- **Chunk by structure**: Use headers, function boundaries, class definitions
- **Keyword filtering**: Grep for relevant patterns before reading
- **Incremental aggregation**: Build understanding progressively via targeted reads
- **Model priors**: Use domain knowledge to narrow search space first

### Rule 3: Recursive Sub-Calls for Dense Tasks

**Research Basis**: REF-089 pp. 5-6 — "On information-dense tasks like OOLONG or OOLONG-Pairs, we observed several cases where recursive LM sub-calling is necessary... Across all information-dense tasks, RLMs outperform the ablation without sub-calling by 10%-59%."

When a task requires processing information distributed across many files, delegate to sub-agents via the Task tool rather than trying to process everything in one context.
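
The example below splits 30 endpoint files across 3 sub-agents. A minimal Python sketch of that partitioning step, assuming the file list has already been produced by a glob: it cuts the list into contiguous, near-equal batches, one per sub-agent. The `partition` helper is hypothetical and not part of AIWG; the actual dispatch through the Task tool is not shown.

```python
from typing import List, Sequence


def partition(items: Sequence[str], n_batches: int) -> List[List[str]]:
    """Split items into at most n_batches contiguous batches.

    Batch sizes differ by at most one and order is preserved, so each
    sub-agent gets a predictable slice (e.g. endpoints 1-10, 11-20, 21-30).
    """
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    # Never create more batches than items, but always return at least one.
    n_batches = max(1, min(n_batches, len(items)))
    base, extra = divmod(len(items), n_batches)
    batches: List[List[str]] = []
    start = 0
    for i in range(n_batches):
        # The first `extra` batches each absorb one leftover item.
        size = base + (1 if i < extra else 0)
        batches.append(list(items[start:start + size]))
        start += size
    return batches


if __name__ == "__main__":
    # Hypothetical file list standing in for the result of glob "src/api/**/*.ts".
    files = [f"src/api/endpoint{i:02d}.ts" for i in range(1, 31)]
    for n, batch in enumerate(partition(files, 3), start=1):
        print(f"sub-agent {n}: {batch[0]} .. {batch[-1]} ({len(batch)} files)")
```

Contiguous slices tend to keep neighbouring files (often related ones in a directory listing) with the same sub-agent; a round-robin split would balance sizes just as well but scatter them.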

**FORBIDDEN**:
```
Task: Analyze error handling patterns across 30 API endpoints
Agent: Let me read all 30 endpoint files into context
*Attempts to load 30 files × 500 lines = 15,000 lines*
*Context overflow*
*Compaction loses critical error handling details*
```

**REQUIRED**:
```
Task: Analyze error handling patterns across 30 API endpoints
Agent: This is an information-dense task requiring distributed access.
       I'll use recursive sub-calls.

Step 1: Identify all endpoint files
  glob "src/api/**/*.ts"
  → 30 endpoint files identified

Step 2: Spawn parallel sub-agents (via Task tool)
  Sub-agent 1: Analyze error handling in endpoints 1-10
  Sub-agent 2: Analyze error handling in endpoints 11-20
  Sub-agent 3: Analyze error handling in endpoints 21-30

  Each sub-agent:
  - Uses Grep to find error handling code
  - Reads only try/catch blocks and error returns
  - Summarizes patterns found (not raw code)

Step 3: Aggregate sub-agent findings
  Combine 3 summaries (total: ~500 tokens)
  Identify common patterns and gaps

Result: Full coverage, low context usage, no information loss
```

**When to Use Recursive Sub-Calls**:

| Task Type | Use Sub-Calls? | Reason |
|-----------|----------------|--------|
| Single file analysis | No | Read with line ranges sufficient |
| 2-5 related files | Maybe | Depends on total size |
| 6-20 files | Yes | Parallel sub-agents more efficient |
| >20 files | Definitely | Rarely fits in one context |
| Cross-cutting concerns | Yes | Information distributed across codebase |

### Rule 4: Cost-Aware Sub-Call Management

**Research Basis**: REF-089 Figure 3, p. 6 — RLM median cost comparable to base model, up to 3x cheaper than summarization, but high variance exists.

Track the sub-call count and estimated token cost. When the total cost approaches the budget, switch to more targeted strategies rather than broad scanning.
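
The bookkeeping this rule asks for can be sketched as a small tracker. This is a minimal Python sketch, assuming the caller supplies per-call token estimates; `SubCallBudget` and its methods are hypothetical names rather than an AIWG API, and real token counts would come from whatever usage reporting the platform exposes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubCallBudget:
    """Hypothetical tracker for sub-call count and token spend against a budget."""

    total_tokens: int
    spent: List[int] = field(default_factory=list)  # tokens used per finished sub-call

    def __post_init__(self) -> None:
        if self.total_tokens <= 0:
            raise ValueError("total_tokens must be positive")

    @property
    def sub_calls(self) -> int:
        return len(self.spent)

    @property
    def used_fraction(self) -> float:
        """Share of the budget already consumed."""
        return sum(self.spent) / self.total_tokens

    def record(self, tokens: int) -> None:
        """Log the (estimated or reported) cost of one finished sub-call."""
        self.spent.append(tokens)

    def projected_fraction(self, n_calls: int, tokens_per_call: int) -> float:
        """Share of the budget used if n_calls more sub-calls of this size run."""
        return (sum(self.spent) + n_calls * tokens_per_call) / self.total_tokens


if __name__ == "__main__":
    budget = SubCallBudget(total_tokens=100_000)
    # Broad scan: 30 sub-calls x 3k tokens.
    print(f"broad:    {budget.projected_fraction(30, 3_000):.0%}")
    # After Grep narrows the scope to 8 files.
    print(f"filtered: {budget.projected_fraction(8, 3_000):.0%}")
```

The two printed projections (90% and 24%) correspond to the estimates in this rule's worked example; checking the projection before dispatch, not after, is what leaves room to filter first.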

**FORBIDDEN**:
```
Agent spawns 100 sub-agents to analyze every file in repository
*Each sub-agent costs 5,000 tokens*
*Total cost: 500,000 tokens*
*Budget exhausted on preliminary analysis*
*No tokens left for actual implementation*
```

**REQUIRED**:
```
Agent task: Find security vulnerabilities in authentication module

Cost awareness protocol:
1. Estimate task scope:
   - 30 files in auth module
   - Average 300 lines per file
   - Potential cost: 30 sub-calls × 3k tokens = 90k tokens

2. Check budget:
   - Total budget: 100k tokens
   - Estimated usage: 90k tokens (90% of budget)
   - Decision: Budget is tight, use targeted strategy

3. Apply filtering first:
   - Grep for sensitive operations (password, token, session)
   - Reduces scope to 8 critical files
   - New estimate: 8 sub-calls × 3k tokens = 24k tokens (24%)
   - Decision: Proceed with filtered approach

Result: Task completed in 26k tokens (26% of budget)
Buffer maintained for implementation work
```

**Cost Thresholds**:

| Budget Usage | Action |
|--------------|--------|
| <30% | Safe to proceed with current strategy |
| 30-50% | Monitor usage, prepare to filter more aggressively |
| 50-70% | Apply additional filtering before sub-calls |
| 70-90% | Switch to highly targeted approach or escalate |
| >90% | Abort current strategy, request human guidance |

### Rule 5: Emergent Decomposition Patterns

**Research Basis**: REF-089 Section 4.1 — "Even without explicit training, RLMs exhibit interesting context decomposition and problem decomposition behavior." Lists emergent strategies: chunking by structure, keyword filtering, incremental aggregation.

Use the decomposition patterns RLMs discover naturally. These are not prescribed strategies but observed best practices.

**Pattern 1: Structural Chunking**

**FORBIDDEN**:
```
Agent: Let me read the entire API documentation sequentially from top to bottom
*Loads all 50 pages*
*Loses details through compaction*
```

**REQUIRED**:
```
Agent: I'll chunk by document structure
1. Read table of contents (identifies 8 sections)
2. Determine which sections are relevant (3 of 8)
3. Read only those sections (15 pages instead of 50)

Structural boundary: Section headers
```

**Pattern 2: Keyword Filtering**

**FORBIDDEN**:
```
Agent: I need to understand the database schema
*Reads entire 50-file ORM model directory*
```

**REQUIRED**:
```
Agent: I'll filter by keywords first
1. Grep for "@Entity" decorator (finds 15 model files)
2. Grep for "userId" field (reduces to 5 relevant models)
3. Read only those 5 model definitions

Keyword filters: @Entity, userId
```

**Pattern 3: Incremental Aggregation**

**FORBIDDEN**:
```
Agent: Let me collect all API endpoint signatures at once
*Attempts to extract from 30 files in parallel*
*Context overflow from trying to hold all results*
```

**REQUIRED**:
```
Agent: I'll aggregate incrementally via intermediate storage

Iteration 1: Process endpoints 1-10
  → Write results to .aiwg/working/endpoints-batch1.json
Iteration 2: Process endpoints 11-20
  → Write results to .aiwg/working/endpoints-batch2.json
Iteration 3: Process endpoints 21-30
  → Write results to .aiwg/working/endpoints-batch3.json

Final step: Read all 3 batch files (compact summaries, not raw content)
  → Synthesize complete API surface

Incremental storage: Files as REPL variables
```

**Pattern 4: Model Priors (Domain Knowledge)**

**FORBIDDEN**:
```
Agent: I need to find the authentication logic
*Searches entire codebase alphabetically*
*Wastes tokens scanning irrelevant directories*
```

**REQUIRED**:
```
Agent: I'll leverage domain knowledge to narrow search

Domain prior: Authentication logic typically in:
- src/auth/
- src/middleware/
- Filenames containing "auth", "login", "session"

1. Glob for these patterns first
2. Only then examine matched files

Search space reduced from 500 files to 12 files via priors
```

### Rule 6: RLM Is Centralized Coordination — Aggregate, Don't Bag-of-Agents

**Research Basis**: REF-086 — independent multi-agent systems amplify errors at 17.2x; centralized coordination reduces this to 4.4x. (GRADE: LOW, peer-review pending.)

RLM's recursive sub-call architecture is **centralized by design**: the root LLM is the controller; it dispatches the sub-agents and aggregates their outputs. This puts RLM in the 4.4x error-magnification bucket, not 17.2x. However, `rlm-batch` parallel fan-out can degrade into "bag of agents" behavior if results are silently merged without active reconciliation.

**FORBIDDEN**:
```
Agent dispatches /rlm-batch with 5 sub-agents
Each sub-agent produces a finding
Agent concatenates the 5 outputs and returns "here's the report"
  ↑ no reconciliation, no conflict detection, no aggregation logic
```

**REQUIRED**:
```
Agent dispatches /rlm-batch with 5 sub-agents
Each sub-agent produces a structured finding
Agent (or aggregator strategy):
  - Reconciles conflicts between sub-agent outputs
  - Detects contradictions and flags them
  - Produces a coherent synthesis with provenance
  - Returns a single integrated result, not a concatenation
```

The `--aggregate` strategy on `rlm-batch` (e.g., `concat`, `summarize`) is the reconciliation layer. Choose it deliberately — `concat` is appropriate only when sub-agent outputs are guaranteed independent (one file each, no cross-cutting concerns).

### Rule 7: Don't Use RLM When a Single Agent Already Works

**Research Basis**: REF-086 — multi-agent coordination produces diminishing or negative returns once single-agent baselines exceed approximately 45% task performance. Sequential reasoning tasks degrade 39-70% across all multi-agent variants. (GRADE: LOW, peer-review pending.)

RLM is most valuable for tasks where a single agent struggles: long context, distributed information across many files, multi-file synthesis. For tasks where a single agent already performs well — focused queries, small files, single-file analysis — RLM adds coordination overhead without benefit.

**Decision threshold**: If a single Read+Grep would resolve the task in <50% context utilization, do not escalate to RLM.

**Sequential dependency warning**: If each step of the task depends on the prior step's result, use a single agent. Splitting into sub-agents breaks the chain.

**FORBIDDEN**:
```
Task: Read this 200-line config file and tell me the database URL
Agent: Let me dispatch /rlm-query against this file
  ↑ overkill — single Read suffices
```

**REQUIRED**:
```
Task: Read this 200-line config file and tell me the database URL
Agent: Reading the file directly
*Read with line range; extract the URL*
```

Reserve RLM for tasks where the single-agent baseline genuinely struggles. Once single-agent performance exceeds roughly 45%, the coordination tax is paid in diminishing or *negative* returns.

### Rule 8: Concurrent Sub-Agent Cap — 3-7 Sweet Spot, Hard Cap at 7

**Research Basis**: REF-088 — practitioner synthesis reports 3-7 agents as the optimal range; n*(n-1)/2 communication paths cause coordination overhead to dominate beyond 7. (GRADE: VERY LOW — practitioner blog, no primary research; corroborated by REF-086 LOW-grade primary research on coordination tax.)

The concurrent sub-agent count from a single RLM dispatch must respect the multi-agent coordination sweet spot:

| Concurrent count | Coordination state |
|---|---|
| 1-2 | Trivial, but loses parallelism benefits |
| 3-5 | Optimal for most tasks |
| 6-7 | Peak for complex tasks |
| 8+ | Coordination overhead dominates; auto-batch into waves of ≤7 |

**`rlm-batch` defaults**: `--max-parallel=4` is the default — mid-sweet-spot, n*(n-1)/2 = 6 paths, fits all `AIWG_CONTEXT_WINDOW` tiers ≥65k.
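
The path counts quoted in this rule follow from n*(n-1)/2, the number of distinct pairs among n concurrent sub-agents. A minimal Python sketch (the function name is illustrative only) shows how quickly that count grows: 6 paths at the default of 4, 21 at the hard cap of 7, and 28 at 8.

```python
def communication_paths(n: int) -> int:
    """Distinct pairwise channels among n concurrent sub-agents: n*(n-1)/2."""
    if n < 0:
        raise ValueError("agent count cannot be negative")
    return n * (n - 1) // 2


if __name__ == "__main__":
    for n in range(1, 9):
        print(f"{n} agents -> {communication_paths(n)} paths")
```

Paths grow quadratically while parallelism grows only linearly, which is the intuition behind capping concurrency instead of scaling it with the file count.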

**Hard cap**: Never spawn more than 7 concurrent sub-agents from a single RLM dispatch. If `--max-parallel` requests >7, auto-batch into sequential waves of ≤7.

**Cross-reference**: When `AIWG_CONTEXT_WINDOW` is declared in the project context, the `context-budget` rule provides additional caps based on context-window tier. The smaller of the two limits applies. See `@$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/context-budget.md`.

**Composes with the provider parallelism cap (#1359)**: When `.aiwg/aiwg.config` declares a `parallelism.max_parallel_subagents` cap, that value composes with the RLM 7-agent hard cap and the context-budget cap. The effective limit is the **minimum** of all applicable caps:

```
effective_rlm_parallel = min(
  parallelism.max_parallel_subagents,  // provider rate-limit cap
  context_budget_tier_cap,             // from AIWG_CONTEXT_WINDOW
  7                                    // RLM hard cap (this rule)
)
```

For example, a Claude small-plan project with `parallelism.max_parallel_subagents=4` and an `AIWG_CONTEXT_WINDOW=512000` (whose context-budget tier would otherwise allow 8-12 parallel) is capped at 4 for RLM dispatches. The provider cap wins because it reflects the actual rate-limit ceiling. See `@$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/subagent-scoping.md` Rule 8 for the full composition formula.

### Rule 9: Long-Running RLM Operations Must Checkpoint

**Research Basis**: REF-127 — industry reports suggest agent success rate degrades after ~35 minutes of operation; doubling task duration quadruples the failure rate. (GRADE: VERY LOW — aggregated industry data, no primary citation given. Treat as warning signal, not hard limit.)

For any RLM operation expected to exceed 30 minutes of wall-clock time:

1. **Externalize state to filesystem** at regular intervals — intermediate result files, progress checkpoints under `.aiwg/working/rlm-runs/{id}/`
2. **Make state recoverable** — agent must be able to resume from last checkpoint, not start over
3. **Prefer split-into-loops over one-long-run** — if the task is shaped as "process N items, each takes M minutes," split into multiple `aiwg ralph`-style iterations with persistent state
4. **Surface elapsed-time warning** — `rlm-status` should display elapsed wall-clock time and warn at 25 minutes that the operation is approaching the practitioner-reported degradation threshold

**Why this matters**: REF-127's quadratic failure scaling (industry-reported, not peer-reviewed) implies a 60-minute run is roughly 4x more likely to fail than a 30-minute run. For RLM operations on large corpora, this is the difference between successful completion and partial failure with no recovery path.

**Hedging**: The 35-minute threshold is *not* primary research. It is a practitioner heuristic from an industry report (REF-127, GRADE: VERY LOW). Treat the rule as a defensive checkpoint discipline, not a precise ceiling.

### Rule 10: Coding-Capable Models for the RLM Root

**Research Basis**: REF-089 Appendix B — "Qwen3-8B (non-coder) struggled without sufficient coding capabilities." (GRADE: LOW, peer-review pending.)

RLM relies on the root LLM emitting code (regex, glob, REPL operations) to filter and decompose context. Models without strong coding ability underperform as RLM root agents.

**Required defaults**:

- RLM root agents (the agent invoking `/rlm-query` or `/rlm-batch`): **sonnet or opus**, never haiku
- RLM sub-agents performing simple extraction (single-file pattern matching, count, yes/no): **haiku is appropriate**
- RLM sub-agents performing analysis or synthesis: **sonnet**

**Output token limits matter** (REF-089 Appendix B): RLM root agents emit code, which can be verbose. Models with restrictive output token limits (<4k) cap RLM effectiveness. If the configured root model has a lower output limit, surface a warning before dispatch.

**Synchronous LM calls are slow** (REF-089 Appendix B): For deep recursive trees, synchronous sub-calls become prohibitive.
Prefer `rlm-batch` (parallel fan-out) over chains of `rlm-query` (sequential) when recursion depth >1.

## Integration Patterns

### With Agent Loops

RLM patterns integrate naturally with Al's TAO loop:

**Al TAO + RLM**:
```yaml
ralph_rlm_integration:
  thought_phase:
    - assess_context_needs
    - estimate_total_tokens_required
    - check_budget_vs_estimate
    - select_access_strategy (direct | filtered | recursive)
  action_phase:
    - if_strategy_direct:
        - Read with line ranges
        - Grep for patterns
    - if_strategy_filtered:
        - Grep first to identify targets
        - Read only matched sections
    - if_strategy_recursive:
        - Spawn sub-agents via Task tool
        - Each sub-agent uses filtered access
        - Aggregate results in intermediate files
  observation_phase:
    - capture_tokens_used
    - update_budget_remaining
    - check_information_completeness
    - decide_next_iteration_strategy
```

**Al State as RLM Variables**:
```
.aiwg/ralph/loop-{id}/
├── state/
│   ├── endpoints_analyzed.json    # RLM equivalent of REPL variable
│   ├── error_patterns_found.json  # Persistent intermediate results
│   └── coverage_summary.json      # Aggregated from sub-calls
```

These files act as REPL state variables — persistent across iterations, preventing context bloat.

### With Agent Supervisor

The Agent Supervisor can orchestrate RLM-style recursive delegation:

```yaml
agent_supervisor_rlm:
  on_complex_task:
    - estimate_task_complexity
    - if_complexity_high:
        - decompose_into_subtasks
        - spawn_specialist_agents (recursive sub-calls)
        - each_agent_uses_programmatic_access
        - supervisor_aggregates_results
    - if_complexity_moderate:
        - single_agent_with_filtered_access
    - if_complexity_low:
        - direct_processing
```

### With Research Before Decision

RLM context management complements research-before-decision:

- **Research-before-decision**: Know what to look for
- **RLM context management**: Know how to access it efficiently

**Combined Pattern**:
```
1. Research phase: Identify what needs to be known
   "I need to find password hashing configuration"

2. RLM filtering: Narrow search space before loading
   grep -r "hash.*password" config/
   → Found in config/security.ts line 42

3. RLM targeted access: Read only relevant section
   Read config/security.ts lines 35-55

4. Decision: Act on complete, targeted information
   Use bcrypt cost factor 12 as configured
```

### With Subagent Scoping

RLM patterns strengthen subagent scoping rules:

**Subagent Scoping Rule 2** + **RLM Rule 2**:
```
Before delegating to subagent:
1. Filter context programmatically (RLM)
2. Pass only filtered results to subagent (scoping)
3. Subagent receives minimal, relevant context
4. No context overflow, no compaction loss
```

**Example**:
```
Parent agent task: Analyze test coverage for auth module

Parent agent (RLM filtering):
- Globs for test files: test/**/*auth*.test.ts
- Finds 8 test files
- Greps each for coverage gaps: "// TODO", "skip", "xit"
- Extracts summary metadata (not full test code)

Delegation to subagent:
  Context: Summary metadata (200 tokens)
  NOT: All 8 test files (4,000 tokens)

Subagent: Analyzes gaps, suggests improvements
Uses <10% of context window vs 70% if given full files
```

## Detection Patterns

### Signs of Missing RLM Patterns

| Symptom | Indicates | RLM Solution |
|---------|-----------|--------------|
| Context window repeatedly at 90%+ usage | Loading full content | Use symbolic handles + programmatic access |
| Compaction losing critical details | Too much raw text in context | Filter with Grep before loading |
| Agent reports "cannot process all files" | Single-context limitation | Use recursive sub-calls |
| High token costs on analysis tasks | Inefficient context usage | Apply keyword filtering first |
| Agent provides superficial multi-file analysis | Context overflow | Delegate to parallel sub-agents |
| Repeated re-reading of same files | No persistent state | Use intermediate files as REPL variables |

### Warning Signs Before Context Overload

| Check | Red Flag | RLM Mitigation |
|-------|----------|----------------|
| File count | >10 files needed | Use sub-agents |
| Total lines | >5,000 lines | Apply structural chunking |
| Context estimate | >50% of window | Filter with Grep first |
| Information density | High detail needed throughout | Recursive sub-calls |
| Cross-cutting concern | Logic spread across many files | Parallel sub-agents with aggregation |

## Best Practices

### Good RLM Patterns

**Pattern 1: Environment-First Mindset**
```
Agent receives task: "Update all API endpoints to use new auth middleware"

Good RLM approach:
THOUGHT: The codebase is my environment. I don't need to load
everything into my context. I'll interact with it programmatically.

ACTION 1: Glob for endpoint files
→ 24 endpoints found

ACTION 2: Grep each for current auth middleware usage
→ 18 of 24 use old middleware

ACTION 3: For each of 18:
- Read only the middleware registration lines
- Generate targeted edit
- Write updated version

Result: Updated 18 files using <5% of context window
Full details preserved, no compaction
```

**Pattern 2: Filter → Read → Process**
```
Task: Find all TODO comments related to performance

Bad: Read all files looking for TODOs (context overflow)

Good RLM:
1. FILTER: grep -r "// TODO.*performance" src/
   → 8 matches found
2. READ: For each match, read surrounding 10 lines
   → Context contains only 80 lines
3. PROCESS: Categorize and prioritize TODOs
   → Output summary

Total tokens: <2,000 (vs 50,000+ for full codebase read)
```

**Pattern 3: Recursive Aggregation**
```
Task: Generate API documentation from 40 endpoint files

Bad: Load all 40 files and try to document in one pass

Good RLM:
1. Spawn 4 sub-agents, each handles 10 endpoints
2. Each sub-agent:
   - Uses Grep to extract route + handler signature
   - Generates doc snippet
   - Writes to intermediate file
3. Parent agent:
   - Reads 4 intermediate files (summaries, not raw code)
   - Combines into final documentation
   - Total context: ~5,000 tokens (vs 40,000+ direct)

Recursive structure:
Root Agent
├── Sub-agent 1 (endpoints 1-10) → docs-batch-1.md
├── Sub-agent 2 (endpoints 11-20) → docs-batch-2.md
├── Sub-agent 3 (endpoints 21-30) → docs-batch-3.md
└── Sub-agent 4 (endpoints 31-40) → docs-batch-4.md

Final aggregation from 4 batch files
```

## Cost Model

### RLM vs Full-Context Processing

**From REF-089**: RLMs are up to 3x cheaper than summarization agents while maintaining stronger performance.

**Cost Comparison**:

| Strategy | Token Cost | Information Loss | When Better |
|----------|-----------|------------------|-------------|
| **Full-context loading** | High (all content) | None (initially) | Files <2,000 tokens |
| **Context compaction** | Medium (compression) | High (lossy) | Not recommended for dense tasks |
| **RLM programmatic** | Low-Medium (targeted) | None (lossless) | Files >2,000 tokens, distributed info |
| **RLM recursive** | Variable (sub-calls) | None (lossless) | Very large codebases, cross-cutting |

**Median RLM Cost**: Comparable to or lower than base model (REF-089 Figure 3).

**Variance**: High — some tasks are cheap (simple filtering), others expensive (deep recursion). Use percentile-based cost tracking (p25, p50, p75, p95) to capture the distribution.
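
A minimal Python sketch of percentile-based tracking using the standard library's `statistics.quantiles`. The run costs are made-up numbers chosen only to show how a single expensive outlier inflates the mean while the median stays put; `cost_percentiles` is an illustrative helper, not part of AIWG.

```python
import statistics
from typing import Dict, Sequence


def cost_percentiles(tokens_per_run: Sequence[float]) -> Dict[str, float]:
    """Summarize per-run token costs as p25/p50/p75/p95, plus the mean for contrast."""
    if len(tokens_per_run) < 2:
        raise ValueError("need at least two runs to estimate percentiles")
    # n=100 yields 99 cut points; cuts[k - 1] is the k-th percentile.
    # method="inclusive" interpolates between observed values (no extrapolation).
    cuts = statistics.quantiles(tokens_per_run, n=100, method="inclusive")
    summary = {f"p{k}": cuts[k - 1] for k in (25, 50, 75, 95)}
    summary["mean"] = statistics.fmean(tokens_per_run)
    return summary


if __name__ == "__main__":
    # Illustrative data: nine cheap filtering runs and one deep-recursion outlier.
    runs = [4_000] * 9 + [100_000]
    print(cost_percentiles(runs))
    # {'p25': 4000.0, 'p50': 4000.0, 'p75': 4000.0, 'p95': 56800.0, 'mean': 13600.0}
```

Here the mean (13.6k) describes no actual run, while p50 and p95 separate the typical cost from the tail; those two figures are the ones the median and P95 targets in the Metrics table are stated against.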

### Cost Optimization Strategies

**Strategy 1: Depth-Based Budgeting**
```yaml
cost_by_recursion_depth:
  depth_0: 5,000 tokens    # Root agent direct processing
  depth_1: 15,000 tokens   # Root + sub-agents (no sub-sub-agents)
  depth_2: 40,000 tokens   # Root + sub + sub-sub (rare)

guideline: Prefer depth 0-1, avoid depth 2 unless truly necessary
```

**Strategy 2: Batch Size Tuning**
```yaml
batch_sizing:
  small_batch: 3-5 files per sub-agent     # Higher parallelism, more sub-calls
  medium_batch: 8-12 files per sub-agent   # Balanced
  large_batch: 15-20 files per sub-agent   # Lower parallelism, fewer sub-calls

rule: Tune batch size based on file complexity
by_file_complexity:
  simple_files: large_batch    # models, configs
  complex_files: small_batch   # business logic
```

**Strategy 3: Incremental vs Parallel**
```yaml
aggregation_strategy:
  incremental:
    pattern: Process sequentially, save intermediate results to files
    cost: Lower (one agent active at a time)
    latency: Higher (sequential)
    when: Cost-constrained, not time-sensitive
  parallel:
    pattern: Spawn sub-agents simultaneously, up to the Rule 8 cap
    cost: Higher (N agents active)
    latency: Lower (parallel execution)
    when: Time-sensitive, budget available
```

## Metrics

Track these metrics for RLM effectiveness:

| Metric | Target | Indicates |
|--------|--------|-----------|
| Context window utilization | <50% | Efficient programmatic access |
| Sub-call count per task | <10 (depth 1) | Appropriate decomposition |
| Cost ratio (RLM vs direct) | <1.5x | RLM efficiency maintained |
| Information completeness | >95% | No critical loss through filtering |
| Compaction rate | <10% | Minimal lossy summarization |
| Median tokens per task | <20k | Sustainable token usage |
| P95 tokens per task | <100k | Outlier control |

## Platform Applicability

These rules apply universally across all AI coding platforms:

- Claude Code, Codex, Copilot, Cursor, Warp, Factory, OpenCode, Windsurf
- Any agent working with large codebases or document corpora

RLM patterns are platform-agnostic — they depend on tool access (Read, Grep, Glob, Task) rather than on a particular platform, although the root model still needs adequate coding ability (see Rule 10).

## Checklist

Before processing large context:

- [ ] Estimated total tokens if loaded directly (would it exceed 50% of context window?)
- [ ] Applied keyword filtering via Grep before loading
- [ ] Used line ranges in Read for targeted access
- [ ] Maintained symbolic file handles rather than loading full content
- [ ] Checked if recursive sub-calls would be more efficient (>10 files)
- [ ] Budget allocated for sub-calls if using delegation
- [ ] Intermediate results saved to files (REPL variables) if iterative
- [ ] Cost tracking enabled to monitor token usage

Before spawning sub-agents for RLM recursion:

- [ ] Task requires distributed information access (>5 files)
- [ ] Budget allocated (estimated cost < 70% of total budget)
- [ ] Each sub-task has clear scope (subagent-scoping rules followed)
- [ ] Aggregation strategy defined (parallel or incremental)
- [ ] Parent agent will receive summaries, not raw content from sub-agents

## Limitations

From REF-089 Appendix B, important limitations to be aware of:

1. **Synchronous sub-calls are slow**: Production systems should use async/parallel execution (AIWG already supports via Task tool parallelism)
2. **Models need coding ability**: Non-coder models struggle with programmatic context access (AIWG agents run in coding-capable environments by design)
3. **Output token limits matter**: Models with limited output tokens underperform as RLMs (provider model selection should consider this)
4. **High variance in costs**: Some RLM runs are expensive outliers (use percentile-based cost tracking, not just averages)

## References

- @.aiwg/research/findings/REF-089-recursive-language-models.md - Complete research analysis
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md - Complementary research patterns
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/subagent-scoping.md - Context limits for delegation
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/tao-loop.md - TAO loop integration with RLM patterns
- @$AIWG_ROOT/tools/ralph-external/ - Agent loop implementation
- @$AIWG_ROOT/tools/daemon/agent-supervisor.mjs - Agent orchestration
- @$AIWG_ROOT/tools/daemon/task-store.mjs - Persistent state (REPL variables)

---

**Rule Status**: ACTIVE
**Last Updated**: 2026-05-08
**Research Basis**: REF-089 (Zhang, Kraska, & Khattab, 2026, GRADE: LOW); REF-086 (Kim et al., DeepMind, 2025, GRADE: LOW); REF-088 (Wexford, 2026, GRADE: VERY LOW); REF-127 (Zylos Research, 2026, GRADE: VERY LOW); REF-169 (Evans et al., 2026, GRADE: MODERATE)
**Issue**: #322 (Core RLM Addon); #1196 (research-corpus update epic); #1197, #1198, #1199 (this update)