# RLM Context Management Rules

**Enforcement Level**: HIGH
**Scope**: All agents operating on large codebases or document corpora
**Addon**: rlm (Recursive Language Model patterns)
**Research Basis**: REF-089 Recursive Language Models (Zhang et al., 2026)
**Issue**: #322 (Core RLM Addon)

## Overview

These rules enforce Recursive Language Model (RLM) patterns for context management when working with large codebases, documentation corpora, or multi-file operations. Research shows that treating context as an external environment, accessed programmatically through code, outperforms loading entire contexts into the conversation window. In REF-089, RLMs were up to 3x cheaper than a summarization agent that ingests the full input, while also maintaining stronger performance.

## Research Foundation

From REF-089 Recursive Language Models (Zhang et al., 2026):

> "The key insight is that arbitrarily long user prompts should not be fed into the neural network directly but should instead be treated as *part of the environment* that the LLM is tasked to *symbolically and recursively interact with*." (p. 1)

> "Compared to the summarization agent which ingests the entire input context, RLMs are up to 3× cheaper while maintaining stronger performance across all tasks because the RLM is able to selectively view context." (p. 6)

> "Unfortunately, compaction is rarely expressive enough for tasks that require dense access throughout the prompt. It presumes that *some* details that appear early in the prompt can safely be forgotten to make room for new content." (p. 1)

## Problem Statement

When working with large codebases or document sets, agents frequently:

- Load entire files into conversation context unnecessarily
- Process all content sequentially instead of filtering first
- Exhaust context windows with raw text rather than using programmatic access
- Lose information through compaction/summarization when full details are needed
- Fail to leverage recursive decomposition for complex multi-file tasks

This produces:

- Context window overflow and truncation
- Degraded output quality as details are lost to compaction
- Expensive token costs from processing irrelevant content
- Inability to handle codebases larger than the context window
- Loss of information through lossy summarization

## Mandatory Rules

### Rule 1: Symbolic Handles Over Raw Text

**Research Basis**: REF-089 Design Choice 1: treat context as an external environment.

When working with large files or multiple documents, refer to them by symbolic handle (file path) rather than loading their full content into the conversation.

**FORBIDDEN**:
```
Agent task: Check authentication logic across the codebase

Agent: Let me read all 47 files in src/auth/ into context
*Loads 15,000 lines of code*
*Context window 80% full*
*Compaction kicks in, loses details*
```

**REQUIRED**:
```
Agent task: Check authentication logic across the codebase

Agent: I'll work with the codebase as an external environment
*Uses Grep to find password handling patterns*
*Identifies 8 relevant functions*
*Reads only those specific sections with line ranges*
*Preserves full detail, uses <10% of context window*

Symbolic handles maintained: src/auth/login.ts, src/auth/validate.ts
Full content never loaded; accessed programmatically via Read when needed
```

### Rule 2: Programmatic Access Over Full-Context Loading

**Research Basis**: REF-089 Observation 4

> "Even without explicit training, RLMs exhibit interesting context decomposition and problem decomposition behavior." (p. 7)

Before loading a large file, use code-based filtering (Grep, targeted Read with line ranges). Do not read the entire content.

**FORBIDDEN**:
```
Task: Find all functions that handle user input validation

Agent: Let me read the entire 3,000-line utils.ts file
*Loads full file*
*Scans through 3,000 lines for validation functions*
*Context now contains 2,800 lines of irrelevant code*
```

**REQUIRED**:
```
Task: Find all functions that handle user input validation

Agent: I'll use programmatic access to filter before loading

Step 1: Grep for validation patterns
  grep -n "validate.*input" utils.ts
  → Lines 42, 156, 892, 1240 contain matches

Step 2: Read only relevant sections
  Read utils.ts (lines 35-65)    # First function
  Read utils.ts (lines 150-180)  # Second function
  ...

Result: Context contains only 120 lines of relevant code (4% of file)
Full details preserved, no compaction needed
```

**Emergent Strategies** (from REF-089 Section 4.1, pp. 7-8):

- **Chunk by structure**: Use headers, function boundaries, class definitions
- **Keyword filtering**: Grep for relevant patterns before reading
- **Incremental aggregation**: Build understanding progressively via targeted reads
- **Model priors**: Use domain knowledge to narrow the search space first

### Rule 3: Recursive Sub-Calls for Dense Tasks

**Research Basis**: REF-089 pp. 5-6

> "On information-dense tasks like OOLONG or OOLONG-Pairs, we observed several cases where recursive LM sub-calling is necessary... Across all information-dense tasks, RLMs outperform the ablation without sub-calling by 10%-59%."

When a task requires processing information distributed across many files, delegate to sub-agents via the Task tool. Do not try to process everything in one context.
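
The delegation shape described above (partition the file handles, one sub-call per batch, aggregate only the summaries) can be sketched in TypeScript. This is a minimal illustration, not the Task tool's interface. `SubAgent` is a hypothetical synchronous callback that stands in for a real sub-agent spawn. `partition`, `analyzeRecursively`, and `BatchSummary` are illustrative names.

```typescript
// Summary a sub-agent hands back: findings only, never raw file content.
interface BatchSummary {
  batch: number;
  files: string[];
  patterns: string[];
}

// Hypothetical stand-in for spawning a sub-agent via the Task tool.
type SubAgent = (batch: number, files: string[]) => BatchSummary;

// Split file handles (paths) into contiguous batches of at most `size` files.
function partition<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError("batch size must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// One sub-call per batch; the parent aggregates only the compact summaries,
// counting how many batches reported each pattern.
function analyzeRecursively(
  files: string[],
  batchSize: number,
  subAgent: SubAgent
): { subCalls: number; patternCounts: Record<string, number> } {
  const summaries = partition(files, batchSize).map((b, i) => subAgent(i + 1, b));
  const patternCounts: Record<string, number> = {};
  for (const s of summaries) {
    // Deduplicate within a batch so each batch counts a pattern at most once.
    const unique = s.patterns.filter((p, idx) => s.patterns.indexOf(p) === idx);
    for (const p of unique) {
      patternCounts[p] = (patternCounts[p] || 0) + 1;
    }
  }
  return { subCalls: summaries.length, patternCounts };
}

// Usage: 30 endpoint files in batches of 10 gives 3 sub-calls,
// matching the 30-endpoint example in this rule.
const endpoints: string[] = [];
for (let i = 1; i <= 30; i++) endpoints.push(`src/api/endpoint${i}.ts`);

const result = analyzeRecursively(endpoints, 10, (batch, files) => ({
  batch,
  files,
  // A real sub-agent would Grep its files; this fake reports fixed findings.
  patterns: batch === 2 ? ["try/catch", "missing-status-code"] : ["try/catch"],
}));
console.log(result.subCalls, result.patternCounts);
```

The callback is synchronous to keep the sketch self-contained. With real sub-agents, each call would be asynchronous and the batches would run in parallel.
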

**FORBIDDEN**:
```
Task: Analyze error handling patterns across 30 API endpoints

Agent: Let me read all 30 endpoint files into context
*Attempts to load 30 files × 500 lines = 15,000 lines*
*Context overflow*
*Compaction loses critical error handling details*
```

**REQUIRED**:
```
Task: Analyze error handling patterns across 30 API endpoints

Agent: This is an information-dense task requiring distributed access.
I'll use recursive sub-calls.

Step 1: Identify all endpoint files
  glob "src/api/**/*.ts"
  → 30 endpoint files identified

Step 2: Spawn parallel sub-agents (via Task tool)
  Sub-agent 1: Analyze error handling in endpoints 1-10
  Sub-agent 2: Analyze error handling in endpoints 11-20
  Sub-agent 3: Analyze error handling in endpoints 21-30

  Each sub-agent:
  - Uses Grep to find error handling code
  - Reads only try/catch blocks and error returns
  - Summarizes patterns found (not raw code)

Step 3: Aggregate sub-agent findings
  Combine 3 summaries (total: ~500 tokens)
  Identify common patterns and gaps

Result: Full coverage, low context usage, no information loss
```

**When to Use Recursive Sub-Calls**:

| Task Type | Use Sub-Calls? | Reason |
|-----------|----------------|--------|
| Single file analysis | No | Read with line ranges sufficient |
| 2-5 related files | Maybe | Depends on total size |
| 6-20 files | Yes | Parallel sub-agents more efficient |
| >20 files | Definitely | Impossible to process in one context |
| Cross-cutting concerns | Yes | Information distributed across codebase |

### Rule 4: Cost-Aware Sub-Call Management

**Research Basis**: REF-089 Figure 3, p. 6. Median RLM cost is comparable to the base model and up to 3x lower than summarization, but cost variance is high.

Track sub-call count and estimated token cost. When total cost approaches the budget, switch from broad scanning to more targeted strategies.
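
A small budget helper is one way to make this tracking concrete. The TypeScript sketch below estimates cost as sub-calls × tokens per call, as in this rule's worked example, and maps projected usage onto the Cost Thresholds bands. The action names are assumptions. So is the classification of boundary values (exactly 30, 50, 70, or 90%), because the table's ranges do not define their edges. `SubCallBudget` and `budgetAction` are hypothetical names.

```typescript
// Actions from the Cost Thresholds table, lowest to highest urgency.
type BudgetAction =
  | "proceed"              // <30%: safe to proceed
  | "monitor"              // 30-50%: monitor, prepare to filter
  | "filter-more"          // 50-70%: extra filtering before sub-calls
  | "targeted-or-escalate" // 70-90%: highly targeted approach or escalate
  | "abort";               // >90%: abort, request human guidance

// Classify projected usage as a percentage of the budget.
// Multiplying before dividing keeps round numbers exact (90000 of 100000 -> 90).
function budgetAction(projectedTokens: number, budgetTokens: number): BudgetAction {
  const pct = (projectedTokens * 100) / budgetTokens;
  if (pct > 90) return "abort";
  if (pct > 70) return "targeted-or-escalate";
  if (pct >= 50) return "filter-more";
  if (pct >= 30) return "monitor";
  return "proceed";
}

// Tracks actual spend and sub-call count, and checks a planned next step.
class SubCallBudget {
  private usedTokens = 0;
  private calls = 0;

  constructor(private readonly budgetTokens: number) {}

  record(tokens: number): void {
    this.usedTokens += tokens;
    this.calls += 1;
  }

  get subCallCount(): number {
    return this.calls;
  }

  // Action if `plannedTokens` more were spent on top of current usage.
  check(plannedTokens: number): BudgetAction {
    return budgetAction(this.usedTokens + plannedTokens, this.budgetTokens);
  }
}

// Worked example from this rule: 30 x 3k unfiltered vs 8 x 3k after Grep filtering.
const unfiltered = budgetAction(30 * 3000, 100000); // 90% of budget
const filtered = budgetAction(8 * 3000, 100000);    // 24% of budget

const budget = new SubCallBudget(100000);
for (let i = 0; i < 8; i++) budget.record(3000);
console.log(unfiltered, filtered, budget.subCallCount, budget.check(2000));
```

The unfiltered estimate falls in the 70-90% band. That matches the worked example's decision to switch to a targeted strategy. The filtered run, plus roughly 2k tokens of Grep overhead, stays in the safe band.
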

**FORBIDDEN**:
```
Agent spawns 100 sub-agents to analyze every file in repository
*Each sub-agent costs 5,000 tokens*
*Total cost: 500,000 tokens*
*Budget exhausted on preliminary analysis*
*No tokens left for actual implementation*
```

**REQUIRED**:
```
Agent task: Find security vulnerabilities in authentication module

Cost awareness protocol:
1. Estimate task scope:
   - 30 files in auth module
   - Average 300 lines per file
   - Potential cost: 30 sub-calls × 3k tokens = 90k tokens

2. Check budget:
   - Total budget: 100k tokens
   - Estimated usage: 90k tokens (90% of budget)
   - Decision: Budget is tight, use targeted strategy

3. Apply filtering first:
   - Grep for sensitive operations (password, token, session)
   - Reduces scope to 8 critical files
   - New estimate: 8 sub-calls × 3k tokens = 24k tokens (24%)
   - Decision: Proceed with filtered approach

Result: Task completed in 26k tokens (26% of budget)
Buffer maintained for implementation work
```

**Cost Thresholds**:

| Budget Usage | Action |
|--------------|--------|
| <30% | Safe to proceed with current strategy |
| 30-50% | Monitor usage, prepare to filter more aggressively |
| 50-70% | Apply additional filtering before sub-calls |
| 70-90% | Switch to highly targeted approach or escalate |
| >90% | Abort current strategy, request human guidance |

### Rule 5: Emergent Decomposition Patterns

**Research Basis**: REF-089 Section 4.1

> "Even without explicit training, RLMs exhibit interesting context decomposition and problem decomposition behavior."

The section lists emergent strategies: chunking by structure, keyword filtering, and incremental aggregation.

Use the decomposition patterns that RLMs discover naturally. These are observed best practices, not prescribed strategies.

**Pattern 1: Structural Chunking**

**FORBIDDEN**:
```
Agent: Let me read the entire API documentation sequentially from top to bottom
*Loads all 50 pages*
*Loses details through compaction*
```

**REQUIRED**:
```
Agent: I'll chunk by document structure
1. Read table of contents (identifies 8 sections)
2. Determine which sections are relevant (3 of 8)
3. Read only those sections (15 pages instead of 50)

Structural boundary: Section headers
```

**Pattern 2: Keyword Filtering**

**FORBIDDEN**:
```
Agent: I need to understand the database schema
*Reads entire 50-file ORM model directory*
```

**REQUIRED**:
```
Agent: I'll filter by keywords first
1. Grep for "@Entity" decorator (finds 15 model files)
2. Grep for "userId" field (reduces to 5 relevant models)
3. Read only those 5 model definitions

Keyword filters: @Entity, userId
```

**Pattern 3: Incremental Aggregation**

**FORBIDDEN**:
```
Agent: Let me collect all API endpoint signatures at once
*Attempts to extract from 30 files in parallel*
*Context overflow from trying to hold all results*
```

**REQUIRED**:
```
Agent: I'll aggregate incrementally via intermediate storage

Iteration 1: Process endpoints 1-10
  Write results to .aiwg/working/endpoints-batch1.json

Iteration 2: Process endpoints 11-20
  Write results to .aiwg/working/endpoints-batch2.json

Iteration 3: Process endpoints 21-30
  Write results to .aiwg/working/endpoints-batch3.json

Final step: Read all 3 batch files (compact summaries, not raw content)
  Synthesize complete API surface

Incremental storage: Files as REPL variables
```

**Pattern 4: Model Priors (Domain Knowledge)**

**FORBIDDEN**:
```
Agent: I need to find the authentication logic
*Searches entire codebase alphabetically*
*Wastes tokens scanning irrelevant directories*
```

**REQUIRED**:
```
Agent: I'll leverage domain knowledge to narrow search

Domain prior: Authentication logic typically in:
- src/auth/
- src/middleware/
- Filenames containing "auth", "login", "session"

1. Glob for these patterns first
2. Only then examine matched files

Search space reduced from 500 files to 12 files via priors
```

## Integration Patterns

### With Ralph Loops

RLM patterns integrate naturally with Ralph's TAO loop:

**Ralph TAO + RLM**:
```yaml
ralph_rlm_integration:
  thought_phase:
    - assess_context_needs
    - estimate_total_tokens_required
    - check_budget_vs_estimate
    - select_access_strategy (direct | filtered | recursive)
  action_phase:
    - if_strategy_direct:
        - Read with line ranges
        - Grep for patterns
    - if_strategy_filtered:
        - Grep first to identify targets
        - Read only matched sections
    - if_strategy_recursive:
        - Spawn sub-agents via Task tool
        - Each sub-agent uses filtered access
        - Aggregate results in intermediate files
  observation_phase:
    - capture_tokens_used
    - update_budget_remaining
    - check_information_completeness
    - decide_next_iteration_strategy
```

**Ralph State as RLM Variables**:
```
.aiwg/ralph/loop-{id}/
└── state/
    ├── endpoints_analyzed.json     # RLM equivalent of REPL variable
    ├── error_patterns_found.json   # Persistent intermediate results
    └── coverage_summary.json       # Aggregated from sub-calls
```

These files act as REPL state variables. They persist across iterations, which prevents context bloat.

### With Agent Supervisor

The Agent Supervisor can orchestrate RLM-style recursive delegation:

```yaml
agent_supervisor_rlm:
  on_complex_task:
    - estimate_task_complexity
    - if_complexity_high:
        - decompose_into_subtasks
        - spawn_specialist_agents (recursive sub-calls)
        - each_agent_uses_programmatic_access
        - supervisor_aggregates_results
    - if_complexity_moderate:
        - single_agent_with_filtered_access
    - if_complexity_low:
        - direct_processing
```

### With Research Before Decision

RLM context management complements research-before-decision:

- **Research-before-decision**: Know what to look for
- **RLM context management**: Know how to access it efficiently

**Combined Pattern**:
```
1. Research phase: Identify what needs to be known
   "I need to find password hashing configuration"

2. RLM filtering: Narrow search space before loading
   grep -r "hash.*password" config/
   → Found in config/security.ts line 42

3. RLM targeted access: Read only relevant section
   Read config/security.ts lines 35-55

4. Decision: Act on complete, targeted information
   Use bcrypt cost factor 12 as configured
```

### With Subagent Scoping

RLM patterns strengthen subagent scoping rules:

**Subagent Scoping Rule 2** + **RLM Rule 2**:
```
Before delegating to subagent:
1. Filter context programmatically (RLM)
2. Pass only filtered results to subagent (scoping)
3. Subagent receives minimal, relevant context
4. No context overflow, no compaction loss
```

**Example**:
```
Parent agent task: Analyze test coverage for auth module

Parent agent (RLM filtering):
- Globs for test files: test/**/*auth*.test.ts
- Finds 8 test files
- Greps each for coverage gaps: "// TODO", "skip", "xit"
- Extracts summary metadata (not full test code)

Delegation to subagent:
  Context: Summary metadata (200 tokens)
  NOT: All 8 test files (4,000 tokens)

Subagent: Analyzes gaps, suggests improvements
Uses <10% of context window vs 70% if given full files
```

## Detection Patterns

### Signs of Missing RLM Patterns

| Symptom | Indicates | RLM Solution |
|---------|-----------|--------------|
| Context window repeatedly at 90%+ usage | Loading full content | Use symbolic handles + programmatic access |
| Compaction losing critical details | Too much raw text in context | Filter with Grep before loading |
| Agent reports "cannot process all files" | Single-context limitation | Use recursive sub-calls |
| High token costs on analysis tasks | Inefficient context usage | Apply keyword filtering first |
| Agent provides superficial multi-file analysis | Context overflow | Delegate to parallel sub-agents |
| Repeated re-reading of same files | No persistent state | Use intermediate files as REPL variables |

### Warning Signs Before Context Overload

| Check | Red Flag | RLM Mitigation |
|-------|----------|----------------|
| File count | >10 files needed | Use sub-agents |
| Total lines | >5,000 lines | Apply structural chunking |
| Context estimate | >50% of window | Filter with Grep first |
| Information density | High detail needed throughout | Recursive sub-calls |
| Cross-cutting concern | Logic spread across many files | Parallel sub-agents with aggregation |

## Best Practices

### Good RLM Patterns

**Pattern 1: Environment-First Mindset**
```
Agent receives task: "Update all API endpoints to use new auth middleware"

Good RLM approach:

THOUGHT: The codebase is my environment. I don't need to load
everything into my context. I'll interact with it programmatically.

ACTION 1: Glob for endpoint files
  → 24 endpoints found

ACTION 2: Grep each for current auth middleware usage
  → 18 of 24 use old middleware

ACTION 3: For each of 18:
  - Read only the middleware registration lines
  - Generate targeted edit
  - Write updated version

Result: Updated 18 files using <5% of context window
Full details preserved, no compaction
```

**Pattern 2: Filter → Read → Process**
```
Task: Find all TODO comments related to performance

Bad: Read all files looking for TODOs (context overflow)

Good RLM:
1. FILTER: grep -r "// TODO.*performance" src/
   → 8 matches found

2. READ: For each match, read surrounding 10 lines
   → Context contains only 80 lines

3. PROCESS: Categorize and prioritize TODOs
   → Output summary

Total tokens: <2,000 (vs 50,000+ for full codebase read)
```

**Pattern 3: Recursive Aggregation**
```
Task: Generate API documentation from 40 endpoint files

Bad: Load all 40 files and try to document in one pass

Good RLM:
1. Spawn 4 sub-agents, each handling 10 endpoints
2. Each sub-agent:
   - Uses Grep to extract route + handler signature
   - Generates doc snippet
   - Writes to intermediate file
3. Parent agent:
   - Reads 4 intermediate files (summaries, not raw code)
   - Combines into final documentation
   - Total context: ~5,000 tokens (vs 40,000+ direct)

Recursive structure:
Root Agent
├── Sub-agent 1 (endpoints 1-10)  → docs-batch-1.md
├── Sub-agent 2 (endpoints 11-20) → docs-batch-2.md
├── Sub-agent 3 (endpoints 21-30) → docs-batch-3.md
└── Sub-agent 4 (endpoints 31-40) → docs-batch-4.md

Final aggregation from 4 batch files
```

## Cost Model

### RLM vs Full-Context Processing

**From REF-089**: RLMs are up to 3x cheaper than summarization agents while maintaining stronger performance.

**Cost Comparison**:

| Strategy | Token Cost | Information Loss | When Better |
|----------|-----------|------------------|-------------|
| **Full-context loading** | High (all content) | None (initially) | Files <2,000 tokens |
| **Context compaction** | Medium (compression) | High (lossy) | Not recommended for dense tasks |
| **RLM programmatic** | Low-Medium (targeted) | None (lossless) | Files >2,000 tokens, distributed info |
| **RLM recursive** | Variable (sub-calls) | None (lossless) | Very large codebases, cross-cutting |

**Median RLM Cost**: Comparable to or lower than the base model (REF-089 Figure 3).

**Variance**: High. Some tasks are cheap (simple filtering) and others are expensive (deep recursion). Use percentile-based cost tracking (p25, p50, p75, p95) to capture the distribution.
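
Percentile tracking can be as simple as sorting per-task token counts. The TypeScript sketch below uses the nearest-rank percentile definition. This is one common choice, and interpolating definitions give slightly different values. It then checks the median and P95 targets from the Metrics table. The sample token counts are made up for illustration, and `summarizeCosts` and `withinTokenTargets` are hypothetical names.

```typescript
interface CostSummary {
  p25: number;
  p50: number;
  p75: number;
  p95: number;
  mean: number;
}

// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are <= it. Expects samples sorted ascending.
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) throw new RangeError("no samples");
  const rank = Math.max(1, Math.ceil((p * sortedAsc.length) / 100));
  return sortedAsc[Math.min(rank, sortedAsc.length) - 1];
}

function summarizeCosts(tokensPerTask: number[]): CostSummary {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = tokensPerTask.slice().sort((a, b) => a - b);
  let total = 0;
  for (const t of sorted) total += t;
  return {
    p25: percentile(sorted, 25),
    p50: percentile(sorted, 50),
    p75: percentile(sorted, 75),
    p95: percentile(sorted, 95),
    mean: total / sorted.length,
  };
}

// Targets from the Metrics table: median < 20k and P95 < 100k tokens per task.
function withinTokenTargets(s: CostSummary): boolean {
  return s.p50 < 20000 && s.p95 < 100000;
}

// Illustrative per-task token counts with one expensive outlier.
const samples = [2500, 1200, 9000, 1800, 48000, 3000, 1500, 5200, 2100, 4200];
const summary = summarizeCosts(samples);
console.log(summary, withinTokenTargets(summary));
```

Here the single 48k-token run pulls the mean (7,850) to roughly three times the median (2,500). That gap is why percentiles describe a high-variance cost distribution better than an average does.
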

### Cost Optimization Strategies

**Strategy 1: Depth-Based Budgeting**
```yaml
cost_by_recursion_depth:
  depth_0: 5,000 tokens    # Root agent direct processing
  depth_1: 15,000 tokens   # Root + sub-agents (no sub-sub-agents)
  depth_2: 40,000 tokens   # Root + sub + sub-sub (rare)

guideline: Prefer depth 0-1, avoid depth 2 unless truly necessary
```

**Strategy 2: Batch Size Tuning**
```yaml
batch_sizing:
  small_batch: 3-5 files per sub-agent      # Higher parallelism, more sub-calls
  medium_batch: 8-12 files per sub-agent    # Balanced
  large_batch: 15-20 files per sub-agent    # Lower parallelism, fewer sub-calls
  rule: Tune batch size based on file complexity
  by_complexity:
    simple_files: large_batch     # models, configs
    complex_files: small_batch    # business logic
```

**Strategy 3: Incremental vs Parallel**
```yaml
aggregation_strategy:
  incremental:
    pattern: Process sequentially, save intermediate results to files
    cost: Lower (one agent active at a time)
    latency: Higher (sequential)
    when: Cost-constrained, not time-sensitive
  parallel:
    pattern: Spawn all sub-agents simultaneously
    cost: Higher (N agents active)
    latency: Lower (parallel execution)
    when: Time-sensitive, budget available
```

## Metrics

Track these metrics for RLM effectiveness:

| Metric | Target | Indicates |
|--------|--------|-----------|
| Context window utilization | <50% | Efficient programmatic access |
| Sub-call count per task | <10 (depth 1) | Appropriate decomposition |
| Cost ratio (RLM vs direct) | <1.5x | RLM efficiency maintained |
| Information completeness | >95% | No critical loss through filtering |
| Compaction rate | <10% | Minimal lossy summarization |
| Median tokens per task | <20k | Sustainable token usage |
| P95 tokens per task | <100k | Outlier control |

## Platform Applicability

These rules apply universally across all AI coding platforms:

- Claude Code, Codex, Copilot, Cursor, Warp, Factory, OpenCode, Windsurf
- Any agent working with large codebases or document corpora

RLM patterns are platform-agnostic. They depend on tool access (Read, Grep, Glob, Task), not on specific model capabilities.

## Checklist

Before processing large context:

- [ ] Estimated total tokens if loaded directly (would it exceed 50% of context window?)
- [ ] Applied keyword filtering via Grep before loading
- [ ] Used line ranges in Read for targeted access
- [ ] Maintained symbolic file handles rather than loading full content
- [ ] Checked if recursive sub-calls would be more efficient (>10 files)
- [ ] Budget allocated for sub-calls if using delegation
- [ ] Intermediate results saved to files (REPL variables) if iterative
- [ ] Cost tracking enabled to monitor token usage

Before spawning sub-agents for RLM recursion:

- [ ] Task requires distributed information access (>5 files)
- [ ] Budget allocated (estimated cost < 70% of total budget)
- [ ] Each sub-task has clear scope (subagent-scoping rules followed)
- [ ] Aggregation strategy defined (parallel or incremental)
- [ ] Parent agent will receive summaries, not raw content, from sub-agents

## Limitations

REF-089 Appendix B describes the following limitations:

1. **Synchronous sub-calls are slow**: Production systems should use async/parallel execution (AIWG already supports this via Task tool parallelism)
2. **Models need coding ability**: Non-coder models struggle with programmatic context access (AIWG agents run in coding-capable environments by design)
3. **Output token limits matter**: Models with limited output tokens underperform as RLMs (provider model selection should consider this)
4. **High variance in costs**: Some RLM runs are expensive outliers (use percentile-based cost tracking, not just averages)

## References

- @.aiwg/research/findings/REF-089-recursive-language-models.md - Complete research analysis
- @.claude/rules/research-before-decision.md - Complementary research patterns
- @.claude/rules/subagent-scoping.md - Context limits for delegation
- @.claude/rules/tao-loop.md - TAO loop integration with RLM patterns
- @tools/ralph-external/ - Ralph loop implementation
- @tools/daemon/agent-supervisor.mjs - Agent orchestration
- @tools/daemon/task-store.mjs - Persistent state (REPL variables)

---

**Rule Status**: ACTIVE
**Last Updated**: 2026-02-09
**Research Basis**: REF-089 (Zhang, Kraska, & Khattab, 2026)
**Issue**: #322 (Core RLM Addon)