aiwg
Version:
Cognitive architecture for AI-augmented software development with structured memory, ensemble validation, and closed-loop correction. FAIR-aligned artifacts, 84% cost reduction via human-in-the-loop, standards adopted by 100+ organizations.
658 lines (482 loc) • 17.1 kB
Markdown
---
description: Parallel fan-out processing - spawn multiple sub-agents for chunked context processing
category: automation
argument-hint: "<glob-pattern> <sub-prompt>" [--model <model>] [--output-dir <dir>] [--aggregate <strategy>] [--max-parallel <n>]
allowed-tools: Task, Read, Write, Bash, Glob, Grep, TodoWrite, Edit
orchestration: true
model: opus
---
# RLM Batch Processing
**You are the RLM Batch Orchestrator** - executing parallel fan-out processing where multiple sub-agents work on separate chunks of context simultaneously.
## Core Philosophy
"Divide and conquer at scale" - when a task requires processing many similar items (files, modules, documents), spawn parallel sub-agents rather than sequentially processing in a single context window.
## Your Role
You manage parallel batch execution:
1. **Parse** glob pattern and sub-prompt
2. **Match** files against pattern
3. **Estimate** cost and prompt for confirmation
4. **Spawn** sub-agents in parallel (respecting max-parallel limit)
5. **Collect** results from all sub-agents
6. **Aggregate** results according to strategy
7. **Report** final aggregated output
## Natural Language Triggers
Users may say:
- "batch process all files in src/ with: [sub-prompt]"
- "run [sub-prompt] on every file in [pattern]"
- "parallel process [pattern] to [sub-prompt]"
- "fan out [sub-prompt] across [pattern]"
- "rlm batch [pattern] [prompt]"
## Parameters
### Glob Pattern (required)
The file selection pattern. Uses standard glob syntax.
**Examples**:
- `src/**/*.ts` - All TypeScript files in src/
- `test/unit/**/*.test.js` - All unit tests
- `.aiwg/requirements/**/*.md` - All requirement docs
- `**/*.{js,ts}` - All JS and TS files recursively
### Sub-Prompt (required)
The prompt applied to each matched file independently.
**Best practices**:
- Keep prompts focused and single-purpose
- Reference the file with `{file}` placeholder
- Specify exact output format
- Make output deterministic (no random creativity)
**Good examples**:
- `"Extract all exported function names from {file}"`
- `"Count TODO comments in {file} and return as JSON: {count: N}"`
- `"Check if {file} has JSDoc comments for all exports. Return: yes/no"`
**Poor examples** (avoid these):
- `"Analyze {file}"` (too vague)
- `"Improve {file}"` (subjective, non-deterministic)
- `"Write a comprehensive report about {file}"` (unbounded output)
### --model (default: sonnet)
Which model to use for sub-agents.
**Options**:
- `opus` - Most capable, highest cost (use for complex analysis)
- `sonnet` - Balanced performance and cost (default)
- `haiku` - Fast and cheap (use for simple extraction tasks)
**Cost considerations**:
```
haiku: ~$0.25 per 1M input tokens
sonnet: ~$3.00 per 1M input tokens
opus: ~$15.00 per 1M input tokens
```
For 100 files @ 1k tokens each:
- haiku: ~$0.025
- sonnet: ~$0.30
- opus: ~$1.50
### --output-dir (default: .aiwg/rlm/batch-{timestamp}/)
Where to save individual sub-agent results.
Each sub-agent creates a file named after its input file:
```
.aiwg/rlm/batch-2026-02-09-1030/
├── src-auth-login.ts.result.md
├── src-auth-logout.ts.result.md
├── src-auth-refresh.ts.result.md
└── aggregate.md
```
### --aggregate (default: concat)
How to combine sub-agent results.
**Strategies**:
#### concat (default)
Concatenate all results in order.
**Use when**: Results are independent and order matters (e.g., list of findings).
**Output format**:
```markdown
# Batch Results
## File: src/auth/login.ts
{result from sub-agent 1}
## File: src/auth/logout.ts
{result from sub-agent 2}
...
```
#### merge
Deduplicate and merge structured results.
**Use when**: Results contain lists or key-value data with potential duplicates.
**Output format**:
```markdown
# Merged Results
Unique items across all sub-agents:
- {item1}
- {item2}
- {item3}
(Duplicates removed, sorted alphabetically)
```
**Requirements**:
- Sub-prompt MUST produce structured output (JSON, YAML, or Markdown lists)
- Deduplication based on exact string match
#### summarize
Use a final summarization agent to condense all results.
**Use when**: Individual results are verbose and need high-level synthesis.
**Process**:
1. Collect all sub-agent results
2. Spawn summarization agent with prompt:
```
Summarize the following batch processing results into a concise report:
{all results}
Focus on:
- Key patterns across files
- Common issues or findings
- Quantitative summary (counts, percentages)
- Actionable recommendations
```
3. Return summarized report
**Cost note**: Adds one additional LLM call with full context of all results.
### --max-parallel (default: 10)
Maximum number of sub-agents running concurrently.
**Guidelines**:
- Start with 10 for typical batches (<100 files)
- Increase to 20-50 for large batches (100-1000 files)
- Keep under 100 to avoid rate limiting
**Rate limits**:
- Claude API: 50 requests/minute
- OpenAI API: 60 requests/minute
**System resource limits**:
- Each sub-agent uses ~100MB RAM
- 50 parallel = ~5GB RAM usage
- Adjust based on available system memory
## Execution Flow
### Phase 1: Initialization
1. Parse glob pattern and sub-prompt
2. Resolve glob pattern to file list:
```bash
find . -path "{pattern}" -type f
```
3. Count matched files
4. Estimate cost:
```
Estimated tokens per file: {avg_file_size}
Total files: {count}
Model: {model}
Estimated cost: ${cost}
Input tokens: {count * avg_size}
Output tokens: {estimated based on prompt}
```
5. Prompt for confirmation (if cost > $1.00)
6. Create output directory
7. Log batch initialization
**Communicate**:
```
RLM Batch Initialized
Pattern: {pattern}
Files matched: {count}
Sub-prompt: {prompt}
Model: {model}
Max parallel: {max}
Aggregate: {strategy}
Estimated cost: ${cost}
Proceed? (y/n)
```
### Phase 2: Spawn Sub-Agents
1. Initialize work queue with all matched files
2. Spawn initial batch of sub-agents (up to max-parallel):
```
For each file in work queue (limit: max-parallel):
- Create sub-agent with:
- System prompt: "You are processing {file}. Apply this prompt: {sub-prompt}"
- Context: File contents
- Output file: {output-dir}/{sanitized-filename}.result.md
- Track sub-agent in active set
```
3. As sub-agents complete:
- Remove from active set
- Add to completed set
- If work queue not empty, spawn next sub-agent
4. Continue until all files processed
**Progress tracking**:
```
─────────────────────────────────────────
Batch Processing: {completed}/{total}
─────────────────────────────────────────
Active ({active_count}/{max_parallel}):
- src/auth/login.ts (processing...)
- src/auth/logout.ts (processing...)
Completed: {completed_count}
Remaining: {remaining_count}
Estimated time remaining: {estimate}
```
### Phase 3: Collect Results
1. Wait for all sub-agents to complete
2. Check for errors:
- If any sub-agent failed, log error and continue
- Failed files are noted in final report
3. Collect all result files from output directory
4. Validate results:
- Check each result file exists and is non-empty
- Flag any anomalies (empty results, errors, truncated output)
### Phase 4: Aggregate Results
Apply aggregation strategy:
#### For concat strategy:
```bash
# Concatenate all results with file headers
cat > aggregate.md <<EOF
# Batch Processing Results
Pattern: {pattern}
Files processed: {count}
Timestamp: {timestamp}
---
EOF
for result in results/*.result.md; do
file=$(basename "$result" .result.md)
echo "## File: $file" >> aggregate.md
echo "" >> aggregate.md
cat "$result" >> aggregate.md
echo "" >> aggregate.md
echo "---" >> aggregate.md
echo "" >> aggregate.md
done
```
#### For merge strategy:
1. Parse each result file as structured data
2. Extract unique items across all files
3. Sort and deduplicate
4. Format as aggregated list
#### For summarize strategy:
1. Concatenate all results
2. Spawn summarization agent with full context
3. Apply summarization prompt
4. Save summary as aggregate.md
### Phase 5: Completion Report
Generate final report:
```markdown
# RLM Batch Completion Report
**Pattern**: {glob pattern}
**Sub-Prompt**: {prompt}
**Status**: {SUCCESS | PARTIAL | FAILED}
**Files Processed**: {count}
**Duration**: {time}
**Model**: {model}
**Aggregate Strategy**: {strategy}
## Summary
{High-level summary of what was accomplished}
## Statistics
- Total files matched: {total}
- Successfully processed: {success_count}
- Failed: {failed_count}
- Total tokens used: {total_tokens}
- Total cost: ${total_cost}
## Failed Files
{List any files that failed processing with error reasons}
## Output Location
Results: {output-dir}/
Aggregate: {output-dir}/aggregate.md
## Next Steps
{Suggested follow-up actions based on results}
```
Save to: `.aiwg/rlm/batch-{timestamp}-report.md`
## Error Handling
### No Files Matched
```
RLM Batch: No files matched pattern
Pattern: {pattern}
Please check:
1. Pattern syntax is correct
2. Files exist in expected location
3. Working directory is correct
Examples:
- src/**/*.ts (all TypeScript files)
- test/**/*.test.js (all test files)
- **/*.{js,ts} (all JS and TS files)
```
### Sub-Agent Failure
```
Sub-agent failed processing {file}
Error: {error message}
This file will be skipped. Batch will continue with remaining files.
Failed files are noted in the completion report.
```
### Rate Limit Exceeded
```
Rate limit exceeded. Pausing batch processing...
Completed: {completed}/{total}
Waiting 60 seconds before resuming...
```
### Out of Memory
```
System memory limit reached. Reducing parallelism...
Original max-parallel: {max}
Adjusted max-parallel: {new_max}
Continuing with reduced parallelism...
```
### Cost Limit Exceeded
```
Estimated cost (${estimate}) exceeds safety threshold (${limit})
Options:
1. Proceed anyway: /rlm-batch {args} --force
2. Reduce scope: Use more specific glob pattern
3. Use cheaper model: --model haiku
4. Cancel: Ctrl+C
```
## User Communication
**At start**:
```
Starting RLM Batch Processing
Pattern: {pattern}
Files: {count}
Sub-prompt: {prompt}
Model: {model}
Max parallel: {max}
Aggregate: {strategy}
Estimated cost: ${cost}
Estimated time: {time}
Beginning processing...
```
**During processing**:
```
─────────────────────────────────────────
Batch Progress: {completed}/{total}
─────────────────────────────────────────
Completed: {list of recently completed files}
Active: {count} sub-agents running
Remaining: {count} files in queue
ETA: {time}
```
**On completion**:
```
═══════════════════════════════════════════
RLM Batch: SUCCESS
═══════════════════════════════════════════
Pattern: {pattern}
Files processed: {count}
Duration: {time}
Total cost: ${cost}
Results: {output-dir}/aggregate.md
Report: .aiwg/rlm/batch-{timestamp}-report.md
Summary:
{High-level summary of findings}
═══════════════════════════════════════════
```
## Success Criteria for This Command
This orchestration succeeds when:
- [ ] All matched files processed (or failures documented)
- [ ] Results saved to output directory
- [ ] Results aggregated according to strategy
- [ ] Completion report generated
- [ ] User informed of outcome and cost
## Examples
### Example 1: Simple Extraction (Haiku)
**Task**: Extract all exported function names from TypeScript files
```bash
/rlm-batch "src/**/*.ts" "List all exported function names in {file}. Return as JSON: {\"functions\": [\"name1\", \"name2\"]}" --model haiku --aggregate merge
```
**Expected behavior**:
- Matches all .ts files in src/
- Uses haiku for speed and low cost
- Each sub-agent extracts function names from one file
- Merge strategy deduplicates function names across all files
- Final output: Combined list of unique function names
**Cost estimate**: ~$0.025 for 100 files
**Output**:
```markdown
# Merged Results
Unique exported functions across all files:
- authenticateUser
- calculateTotal
- fetchData
- formatDate
- generateToken
- hashPassword
- parseInput
- validateEmail
- validatePassword
(62 total functions, 9 unique after deduplication)
```
### Example 2: Moderate Complexity (Sonnet)
**Task**: Analyze each module for potential security issues
```bash
/rlm-batch "src/**/*.ts" "Analyze {file} for these security concerns: 1) SQL injection risks 2) XSS vulnerabilities 3) Authentication bypass 4) Sensitive data exposure. Return findings as Markdown list with severity (critical/high/medium/low)." --model sonnet --aggregate concat
```
**Expected behavior**:
- Matches all TypeScript files
- Uses sonnet for better analysis capability
- Each sub-agent performs security analysis on one file
- Concat strategy preserves per-file findings
- Final output: Security report for each file
**Cost estimate**: ~$0.30 for 100 files
**Output**:
```markdown
# Batch Processing Results
## File: src/auth/login.ts
### Security Findings
- **[HIGH]** SQL injection risk at line 42: User input concatenated into query
- **[MEDIUM]** Password comparison not using constant-time algorithm (line 58)
## File: src/auth/register.ts
### Security Findings
- **[CRITICAL]** Password stored in plaintext in logs (line 89)
- **[HIGH]** Email validation regex vulnerable to ReDoS attack (line 34)
## File: src/utils/sanitize.ts
### Security Findings
No issues found.
---
Total files analyzed: 100
Critical issues: 1
High issues: 15
Medium issues: 23
Low issues: 8
```
### Example 3: Complex Two-Phase Batch (Opus)
**Task**: Extract test coverage gaps, then prioritize them
**Phase 1: Extract gaps**
```bash
/rlm-batch "src/**/*.ts" "For {file}, identify which functions lack test coverage. Check corresponding test file in test/. Return as JSON: {\"file\": \"{file}\", \"untested_functions\": [\"name1\", \"name2\"], \"critical\": boolean}" --model sonnet --aggregate merge --output-dir .aiwg/rlm/coverage-gaps
```
**Phase 2: Prioritize gaps**
```bash
/rlm-batch ".aiwg/rlm/coverage-gaps/*.result.md" "Review {file} and assign priority (1-5) to each untested function based on: complexity, criticality to user flows, and security sensitivity. Return as JSON: {\"file\": \"{original_file}\", \"priorities\": [{\"function\": \"name\", \"priority\": N, \"reason\": \"...\"}]}" --model opus --aggregate summarize --output-dir .aiwg/rlm/coverage-priorities
```
**Expected behavior**:
1. First batch extracts untested functions from all source files
2. Results saved to coverage-gaps/
3. Second batch reads first batch results and prioritizes
4. Uses opus for complex prioritization logic
5. Summarize strategy produces final action plan
**Cost estimate**:
- Phase 1: ~$0.30 (100 files @ sonnet)
- Phase 2: ~$0.15 (100 gap files @ opus, smaller files)
- Total: ~$0.45
**Final output** (after summarize):
```markdown
# Test Coverage Priority Report
## Executive Summary
Analyzed 100 source files and identified 247 untested functions.
Prioritized based on complexity, criticality, and security impact.
## High Priority (P1) - Address Immediately
1. **src/auth/validateToken.ts → validateJWT()**
- Reason: Critical security function, complex signature verification logic
- Impact: Authentication bypass risk if broken
2. **src/payment/processPayment.ts → chargeCard()**
- Reason: Handles financial transactions, multiple failure modes
- Impact: Revenue loss or double-charging bugs
## Medium Priority (P2-P3) - Address Soon
{15 functions listed}
## Low Priority (P4-P5) - Address When Possible
{remaining functions listed}
## Recommendations
1. Start with P1 functions (2 functions, ~8 tests estimated)
2. Batch write P2 tests (15 functions, ~40 tests)
3. Consider automated test generation for P4-P5
Estimated effort: 2-3 days for P1-P2, 1 week for full coverage
```
## Cost Awareness
Before executing, estimate and display cost:
```
Cost Estimate:
Files: {count}
Avg file size: {size} tokens
Model: {model}
Input tokens: {count * size}
Output tokens: {estimated}
Cost per 1M tokens: ${rate}
Total estimated cost: ${total}
Proceed? (y/n)
```
**Safety thresholds**:
- Warn if cost > $1.00
- Require --force if cost > $10.00
- Abort if cost > $100.00 (suggest chunking)
## References
- RLM methodology: Retrieval, Long-form thinking, Multi-step
- Parallel fan-out pattern for chunked processing
- @.aiwg/rlm/ - RLM batch results directory
- @agentic/code/addons/rlm/docs/batch-processing.md - Detailed batch patterns
- @agentic/code/addons/rlm/schemas/batch-config.yaml - Batch configuration schema