aiwg
Version:
Cognitive architecture for AI-augmented software development with structured memory, ensemble validation, and closed-loop correction. FAIR-aligned artifacts, 84% cost reduction via human-in-the-loop, standards adopted by 100+ organizations.
711 lines (523 loc) • 17.5 kB
Markdown
# External Ralph Loop: Security and Safety Guide
> **WARNING**: External Ralph spawns headless Claude Code sessions with `--dangerously-skip-permissions`. This bypasses all permission prompts, enabling unsupervised file system access, command execution, and network requests. Use with extreme caution.
## Table of Contents
1. [Security Model](#security-model)
2. [Threat Analysis](#threat-analysis)
3. [Prerequisites and Access Control](#prerequisites-and-access-control)
4. [Environment Isolation](#environment-isolation)
5. [Cost and Resource Controls](#cost-and-resource-controls)
6. [Monitoring and Observability](#monitoring-and-observability)
7. [Incident Response](#incident-response)
8. [Best Practices Checklist](#best-practices-checklist)
9. [Operational Runbook](#operational-runbook)
## Security Model
### What External Ralph Does
External Ralph is a crash-resilient supervisor that:
1. Spawns Claude Code sessions as detached background processes
2. Monitors progress via heartbeats and state files
3. Automatically resumes after crashes
4. Iterates until completion criteria are met
### The `--dangerously-skip-permissions` Flag
**This is the critical security consideration.** When Claude Code runs with this flag:
| Capability | Normal Mode | Dangerous Mode |
|------------|-------------|----------------|
| Read files | Prompts user | **Automatic** |
| Write files | Prompts user | **Automatic** |
| Execute shell commands | Prompts user | **Automatic** |
| Network requests | Prompts user | **Automatic** |
| Install packages | Prompts user | **Automatic** |
| Modify system files | Prompts user | **Automatic** |
| Access environment variables | Prompts user | **Automatic** |
### Why This Flag Is Used
External Ralph runs as a background daemon without an interactive terminal. Since there's no user to respond to permission prompts, the flag is required for autonomous operation.
**This is a deliberate trade-off**: autonomous operation in exchange for bypassed safety prompts.
## Threat Analysis
### Threat Categories
#### 1. Unintended Destructive Actions
**Risk Level**: HIGH
Claude may misinterpret instructions and:
- Delete important files (`rm -rf`)
- Overwrite critical configurations
- Break working code while "improving" it
- Remove tests to make tests "pass"
**Mitigations**:
- Git-based recovery (ensure clean working tree before starting)
- Explicit completion criteria that validate correctness
- Regular checkpoints for rollback
- Anti-laziness rules prevent test deletion
#### 2. Excessive Resource Consumption
**Risk Level**: MEDIUM
Unsupervised sessions may:
- Consume excessive API tokens (cost)
- Fill disk with logs/artifacts
- Create infinite loops of file generation
- Exhaust memory with large context
**Mitigations**:
- Budget caps per iteration (`--budget`)
- Iteration limits (`--max-iterations`)
- Timeout per session (`--timeout`)
- Disk space monitoring
- Early stopping on high confidence
#### 3. Sensitive Data Exposure
**Risk Level**: MEDIUM-HIGH
The session can access:
- `.env` files with secrets
- Private keys and certificates
- Database credentials
- API tokens
**Mitigations**:
- Run in isolated environments (containers, VMs)
- Use `.gitignore` and `.claudeignore` patterns
- Audit file access in session transcripts
- Never run against production data
#### 4. Supply Chain Attacks
**Risk Level**: LOW-MEDIUM
Malicious prompts could instruct Claude to:
- Install compromised packages
- Add malicious dependencies
- Modify build scripts
**Mitigations**:
- Review all `package.json` changes
- Lock file integrity (`package-lock.json`)
- Run in network-isolated environments for sensitive code
- Audit all commits before merging
#### 5. Prompt Injection
**Risk Level**: LOW
If processing untrusted input, malicious content could:
- Override objectives
- Exfiltrate data via file writes
- Modify behavior through injected instructions
**Mitigations**:
- Never use untrusted input as objectives
- Sanitize any external data in prompts
- Review prompts before execution
## Prerequisites and Access Control
### User Authorization
External Ralph should only be run by authorized users who:
1. **Understand the risks** - Read this entire document
2. **Have appropriate permissions** - Admin/developer access to the target codebase
3. **Accept responsibility** - For all actions taken by the autonomous session
### System Requirements
```yaml
requirements:
claude_code: ">=1.0.0" # Claude Code CLI must be installed
node: ">=18.0.0" # Node.js for orchestrator
git: ">=2.0.0" # Git for state tracking
permissions:
file_system: "read/write to project directory"
network: "outbound to api.anthropic.com"
process: "spawn child processes"
```
### Pre-Flight Checks
Before starting an external Ralph loop:
```bash
# 1. Verify clean git state
git status
# Should show clean working tree or only expected changes
# 2. Create safety branch
git checkout -b ralph-session-$(date +%Y%m%d-%H%M%S)
# 3. Verify no sensitive files are accessible
cat .gitignore # Should include .env, *.pem, etc.
# 4. Check Claude Code installation
claude --version
# 5. Verify completion criteria is testable
# Example: "npm test" should be runnable
```
### Access Control Recommendations
| Environment | Recommendation |
|-------------|----------------|
| Local development | Acceptable with caution |
| Shared dev server | Requires team notification |
| CI/CD pipeline | Requires security review |
| Production-adjacent | NOT RECOMMENDED |
| Production | PROHIBITED |
## Environment Isolation
### Recommended: Container Isolation
Run External Ralph in an isolated container:
```dockerfile
# Dockerfile.ralph-sandbox
FROM node:20-bookworm
# Install Claude Code
RUN npm install -g @anthropic-ai/claude-code
# Create non-root user
RUN useradd -m ralph
USER ralph
WORKDIR /workspace
# Copy only necessary project files
COPY --chown=ralph:ralph . .
# Limit capabilities
# No network access except Anthropic API
# No access to host filesystem
# Limited memory and CPU
```
```yaml
# docker-compose.ralph.yml
version: '3.8'
services:
ralph-sandbox:
build:
context: .
dockerfile: Dockerfile.ralph-sandbox
volumes:
- ./workspace:/workspace
environment:
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
# Network policy: only allow api.anthropic.com
networks:
- ralph-isolated
deploy:
resources:
limits:
memory: 4G
cpus: '2'
```
### VM Isolation
For higher security requirements:
```bash
# Create isolated VM (example with multipass)
multipass launch --name ralph-sandbox --memory 4G --disk 20G
# Transfer project (sanitized)
multipass transfer ./project ralph-sandbox:/home/ubuntu/
# Run inside VM
multipass exec ralph-sandbox -- npm run ralph-external
```
### Network Isolation
Restrict network access to only required endpoints:
```yaml
# Required outbound connections
allowed_endpoints:
- api.anthropic.com:443 # Claude API
- registry.npmjs.org:443 # npm packages (if needed)
- github.com:443 # git operations (if needed)
# Block everything else
blocked:
- internal networks
- localhost services
- file transfer protocols
```
### File System Sandboxing
Limit file system access:
```yaml
# .claudeignore - Files Claude should not access
.env*
*.pem
*.key
credentials/
secrets/
~/.ssh/
~/.aws/
~/.config/
```
## Cost and Resource Controls
### Budget Controls
Always set explicit budget limits:
```bash
# Start with conservative limits
ralph-external "Fix tests" \
--completion "npm test passes" \
--budget 2.0 \ # $2 per iteration max
--max-iterations 5 \ # 5 iterations max ($10 total worst case)
--timeout 30 # 30 minutes per iteration
```
### Cost Estimation
| Model | Approx. Cost/Hour | 8-Hour Session |
|-------|-------------------|----------------|
| Haiku | $0.50 - $2 | $4 - $16 |
| Sonnet | $2 - $8 | $16 - $64 |
| Opus | $10 - $40 | $80 - $320 |
**Recommendation**: Start with Sonnet, use Opus only for complex tasks.
### Resource Limits Configuration
```javascript
// Orchestrator configuration
const config = {
// Cost controls
budgetPerIteration: 2.0, // USD
maxIterations: 5, // Hard limit
// Time controls
timeoutMinutes: 60, // Per iteration
maxTotalHours: 8, // Total session
// Progress controls
enableEarlyStopping: true, // Stop at high confidence
stallThreshold: 3, // Iterations without progress
// Resource controls
memory: 3, // Reflection memory capacity
checkpointIntervalMinutes: 15, // Checkpoint frequency
};
```
### Early Stopping
Enable early stopping to prevent runaway sessions:
```javascript
// EarlyStopping configuration
{
minConfidence: 0.95, // Stop if 95% confident
minIterations: 2, // But run at least 2 iterations
progressThreshold: 0.1, // Require 10% progress per iteration
stallLimit: 3, // Stop after 3 iterations without progress
}
```
## Monitoring and Observability
### Real-Time Monitoring
Monitor running sessions:
```bash
# Check status
aiwg ralph-status
# Watch session output (if attached)
tail -f .aiwg/ralph-external/loops/{loop-id}/daemon-output.log
# Monitor heartbeats
watch -n 5 'cat .aiwg/ralph-external/loops/{loop-id}/state.json | jq .lastHeartbeat'
```
### Heartbeat Monitoring
External Ralph records heartbeats every 30 seconds:
```json
{
"loopId": "ralph-fix-tests-abc123",
"status": "running",
"currentIteration": 3,
"lastHeartbeat": "2026-02-03T12:34:56Z",
"currentPid": 12345
}
```
**Alert conditions**:
- No heartbeat for > 2 minutes (stale)
- PID no longer alive (crashed)
- Iteration stuck for > timeout duration
### Log Analysis
Key logs to monitor:
```bash
# Daemon output (main log)
.aiwg/ralph-external/loops/{loop-id}/daemon-output.log
# Per-iteration logs
.aiwg/ralph-external/iterations/{N}/
├── stdout.log # Claude's output
├── stderr.log # Errors
├── pre-snapshot.json # State before
├── post-snapshot.json # State after
└── analysis.json # Iteration analysis
```
### Metrics to Track
| Metric | Normal Range | Alert Threshold |
|--------|--------------|-----------------|
| Iteration duration | 10-60 min | > 90 min |
| Tool calls per iteration | 20-100 | > 500 |
| Error count per iteration | 0-5 | > 20 |
| Files modified per iteration | 1-20 | > 100 |
| Progress per iteration | 5-30% | 0% for 2+ iterations |
### Alerting Configuration
```yaml
# Example monitoring configuration
alerts:
heartbeat_stale:
condition: "no heartbeat for > 2 minutes"
action: "notify + investigate"
excessive_errors:
condition: "error_count > 20 in iteration"
action: "pause + review"
no_progress:
condition: "0% progress for 3 iterations"
action: "abort + review"
budget_exceeded:
condition: "iteration cost > budget"
action: "auto-abort"
excessive_file_changes:
condition: "files_modified > 100"
action: "pause + human review"
```
## Incident Response
### Immediate Actions
#### Session Running Amok
```bash
# 1. Abort immediately
aiwg ralph-abort
# 2. Verify process killed
ps aux | grep claude
pkill -9 -f "claude.*dangerously" # Force kill if needed
# 3. Assess damage
git status
git diff
# 4. Revert if necessary
git checkout -b backup-before-revert
git reset --hard HEAD~N # Or specific commit
```
#### Cost Runaway
```bash
# 1. Abort session
aiwg ralph-abort
# 2. Check actual usage
# Visit Anthropic console for actual costs
# 3. Review iteration history
cat .aiwg/ralph-external/session-state.json | jq '.iterations | length'
```
#### Suspected Security Issue
```bash
# 1. Abort immediately
aiwg ralph-abort
# 2. Isolate environment
# Disconnect network if in VM/container
# 3. Preserve evidence
tar -czf ralph-incident-$(date +%s).tar.gz \
.aiwg/ralph-external/ \
~/.claude/projects/
# 4. Review session transcripts
cat .aiwg/ralph-external/iterations/*/session-transcript.jsonl | \
jq 'select(.tool_use) | .tool_use.name' | \
sort | uniq -c | sort -rn
# 5. Check for suspicious activity
git log --oneline -20
git diff HEAD~10..HEAD
```
### Recovery Procedures
#### From Crash
```bash
# 1. Check crash state
aiwg ralph-status
# 2. Review last iteration
cat .aiwg/ralph-external/iterations/*/analysis.json | jq '.shouldContinue'
# 3. Resume if appropriate
aiwg ralph-resume
# Or start fresh
aiwg ralph-abort
git reset --hard
```
#### From Bad Changes
```bash
# 1. Identify good state
git log --oneline -20
# 2. Create backup branch
git checkout -b recovery-backup
# 3. Reset to good state
git checkout main
git reset --hard {good-commit-hash}
# 4. Cherry-pick good changes if any
git cherry-pick {good-change-hash}
```
## Best Practices Checklist
### Before Starting
- [ ] **Read this document** in full
- [ ] **Understand the risks** of `--dangerously-skip-permissions`
- [ ] **Clean git state** - commit or stash all changes
- [ ] **Create safety branch** for easy recovery
- [ ] **Set budget limits** appropriate for the task
- [ ] **Set iteration limits** (start conservative)
- [ ] **Verify completion criteria** is testable
- [ ] **Check `.gitignore`** excludes sensitive files
- [ ] **Notify team** if on shared infrastructure
- [ ] **Have rollback plan** ready
### During Execution
- [ ] **Monitor periodically** - check status every 15-30 minutes
- [ ] **Watch for stalls** - no progress for multiple iterations
- [ ] **Review git changes** - `git diff` between iterations
- [ ] **Check logs for errors** - excessive errors indicate problems
- [ ] **Be ready to abort** - don't let bad sessions continue
### After Completion
- [ ] **Review all changes** - `git diff` before committing
- [ ] **Run tests** - verify completion criteria actually met
- [ ] **Check for anti-patterns** - test deletion, feature removal
- [ ] **Review costs** - check actual API usage
- [ ] **Clean up** - remove working files if not needed
- [ ] **Document learnings** - what worked, what didn't
### For Production-Adjacent Work
- [ ] **Use isolated environment** (container, VM)
- [ ] **No production credentials** in environment
- [ ] **No production data** accessible
- [ ] **Network isolation** from production systems
- [ ] **Security review** of completion criteria
- [ ] **Human review** before merging any changes
## Operational Runbook
### Starting a Session
```bash
# 1. Pre-flight checks
git status # Verify clean state
git checkout -b ralph-$(date +%Y%m%d) # Safety branch
# 2. Review objective and criteria
echo "Objective: Fix all failing tests"
echo "Criteria: npm test exits 0"
# 3. Start with conservative limits
aiwg ralph-external "Fix all failing tests" \
--completion "npm test exits 0" \
--max-iterations 5 \
--budget 2.0 \
--timeout 30
# 4. Note the loop ID
# Output: Started loop ralph-fix-tests-abc123
```
### Monitoring a Session
```bash
# Quick status
aiwg ralph-status
# Detailed state
cat .aiwg/ralph-external/session-state.json | jq
# Watch progress
watch -n 30 'aiwg ralph-status'
# Tail logs
tail -f .aiwg/ralph-external/loops/*/daemon-output.log
```
### Aborting a Session
```bash
# Graceful abort
aiwg ralph-abort
# Verify stopped
aiwg ralph-status
# Force kill if needed
pkill -9 -f "ralph-external"
pkill -9 -f "claude.*dangerously"
```
### Resuming a Session
```bash
# Check if resumable
aiwg ralph-status
# Resume with increased iterations if needed
aiwg ralph-resume --max-iterations 10
```
### Post-Session Review
```bash
# Review all changes
git diff origin/main
# Check test status
npm test
# Review iteration history
for f in .aiwg/ralph-external/iterations/*/analysis.json; do
echo "=== $f ==="
jq '.completionPercentage, .learnings' "$f"
done
# Clean up if satisfied
git checkout main
git merge ralph-$(date +%Y%m%d)
git branch -d ralph-$(date +%Y%m%d)
```
## Summary
External Ralph is a powerful tool for autonomous, long-running task execution. The `--dangerously-skip-permissions` flag makes this possible but introduces significant security considerations.
**Key Takeaways**:
1. **Understand the risks** - All permission checks are bypassed
2. **Isolate when possible** - Containers/VMs provide defense in depth
3. **Set limits** - Budget, iterations, and timeouts are mandatory
4. **Monitor actively** - Don't leave sessions completely unattended
5. **Have recovery plans** - Git branches and rollback procedures ready
6. **Review before merging** - Human oversight of all changes
**When NOT to use External Ralph**:
- Against production systems or data
- With production credentials accessible
- On untrusted or shared infrastructure
- Without understanding the risks
- Without ability to monitor and abort
**When External Ralph is appropriate**:
- Local development with clean git state
- Isolated test environments
- Well-defined, testable objectives
- Tasks that benefit from persistence across crashes
- When human is available to monitor and intervene
## References
- `tools/ralph-external/README.md` - Technical documentation
- `tools/ralph-external/session-launcher.mjs` - Session spawning implementation
- `.claude/rules/anti-laziness.md` - Safeguards against destructive shortcuts
- Claude Code documentation - `--dangerously-skip-permissions` flag details