---
id: no-time-estimates
severity: HIGH
applies_to: [all-agents, orchestrators, planners, task-decomposers]
tags: [estimation, planning, scope, agents, parallelism]
---

# No Time Estimates: Agent-Oriented Estimation Instead

## Rule

**Never produce wall-clock time estimates.** Human+AI development velocity is unknowable and varies non-linearly with operator skill, model quality, task decomposability, and tool configuration. Time estimates are noise that pollutes context and creates false expectations.

Instead, express all effort in **agent-oriented units**: scope, agent count, parallelism potential, and pass count.

---

## Why Time Estimates Fail in AI-Assisted Work

### The Velocity Problem

Traditional estimates assume a known baseline: "1 developer = N hours per story point." This assumption breaks down completely in human-AI centaur configurations (REF-169) where:

- One operator can direct 1–20+ specialized agents simultaneously
- Parallel execution can compress sequential timelines by 60–80% (REF-088)
- Model capability and prompt quality contribute more to speed than headcount
- The same task can complete in minutes or hours depending on tool configuration

### The Non-Linearity Problem

More agents ≠ faster output. The DeepMind scaling research (REF-086) shows:

- Coordination overhead grows as n*(n-1)/2 communication paths
- Above 4 concurrent agents: 17.2× error amplification in "bag of agents" architectures
- Agent quantity, coordination topology, model capability, and task properties interact; no simple multiplier exists

### The Duration Problem

Longer runs don't produce proportionally better results. REF-127 documents:

- 35-minute agent degradation threshold: performance degrades measurably beyond this point
- Doubling agent run duration quadruples failure rate
- Time spent ≠ progress made

### The Variance Problem

AI-assisted developer productivity studies show enormous variance across operators, tasks, and domains.
Any single estimate will be wrong by an order of magnitude for some combination of factors. Publishing a time estimate is publishing noise.

---

## What to Estimate Instead

### 1. Scope Units

Decompose the work into atomic, independently-deliverable items. Count them. Each scope unit should be:

- Independently verifiable (testable or demonstrable)
- Completable in one agent loop cycle
- Named precisely ("add JWT validation to auth middleware" not "auth work")

**Output format**:

```
Scope: 7 atomic items
- UC-001: Add JWT validation to POST /auth/login
- UC-002: Add refresh token endpoint
- UC-003: Add token blacklist on logout
...
```

### 2. Agent Count and Roles

State how many specialized agents are needed and what each does. Use the 3–7 sweet spot (REF-088). Above 7, establish hierarchical sub-teams with explicit coordinators.

**Output format**:

```
Agents required: 4
- Planner (orchestrator): decomposes task graph, resolves blockers
- Security Auditor: validates auth implementation against OWASP
- Test Engineer: writes and runs integration tests
- Code Reviewer: checks patterns and consistency
```

### 3. Parallelism Map

Classify each scope unit as parallel-ready or sequential-dependent. This is the most actionable planning output: it determines actual throughput.

**Output format**:

```
Parallel batch 1 (can run simultaneously):
- UC-001, UC-002, UC-003 (no dependencies)

Sequential gate: integration test suite must pass before:

Parallel batch 2:
- UC-004, UC-005 (depend on batch 1)
```

### 4. Pass Estimate

Estimate how many agent loop iterations will be needed to reach the quality gate.
Base this on:

- Complexity of the verification command
- Number of interacting systems
- How tight the quality gate is

**Output format**:

```
Quality gate: npx tsc --noEmit && npm test passes with 0 failures
Estimated passes: 2–4
Pass 1: Initial implementation
Pass 2: Fix type errors and test failures
Pass 3–4: Edge cases if integration reveals interaction bugs
```

**Never say**: "This will take 2–3 days."

**Always say**: "This is 3 scope units, 2 passes to quality gate, parallelizable with 2 agents."

### 5. Quality Gate Clarity

Before any estimate, verify that completion criteria are measurable and agent-executable. If they aren't, getting that clarity is the first deliverable.

Per vague-discretion rule: "zero bugs," "code looks good," and "thorough" are not quality gates. Provide commands that exit 0 on success.

---

## Prohibited Phrases

The following patterns are banned from agent output in planning and estimation contexts:

| Banned | Replace With |
|--------|-------------|
| "This will take N days/hours/weeks" | "This is N scope units" |
| "Expected duration: X minutes" | "Estimated passes: N" |
| "This should be quick" | "This is 1 scope unit, 1 pass" |
| "This is a large task" | "This is N scope units, requires batching" |
| "Complex, may take a while" | "Sequential dependency chain: 3 gates before parallelism" |
| "Approximately N hours of work" | N/A (drop entirely) |
| "Estimated completion: [timestamp]" | N/A (drop entirely) |

The one exception: if the user explicitly asks for a time estimate and acknowledges the variance, you may offer a range with explicit caveats about the sources of variance. Even then, anchor the range to agent count and pass count, not to assumed human hours.

---

## Application

This rule applies whenever an agent is:

- Planning a task or sprint
- Decomposing work for another agent
- Responding to "how long will this take?"
- Writing completion reports
- Generating ADRs or phase plans
- Responding to scope questions in issues or PRs

---

## References

- REF-086: DeepMind multi-agent scaling (coordination tax, 17.2× error amplification above 4 agents, n*(n-1)/2 path overhead)
- REF-088: DEV multi-agent guide (3–7 agent sweet spot, 60–80% time compression from parallelism)
- REF-127: Long-running agents (35-min degradation threshold, doubling duration quadruples failure)
- REF-169: Evans et al. 2026 (centaur configurations, one human directing many agents, velocity is non-scalar)
- vague-discretion rule: measurable completion criteria requirement
- subagent-scoping rule: parallel vs sequential decomposition patterns
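
---

## Example Sketch: Deriving a Parallelism Map

The parallelism map in section 3 and the n*(n-1)/2 coordination figure cited from REF-086 can both be computed mechanically. The following is a minimal sketch, not part of the rule itself. It assumes scope units are described by a plain dependency mapping. The function names `parallel_batches` and `communication_paths` are hypothetical, and so are the specific dependencies assigned to UC-004 and UC-005 (the rule only states that they depend on batch 1).

```python
def parallel_batches(deps):
    """Group scope units into ordered batches that can each run in parallel.

    deps maps a scope unit ID to the set of unit IDs it depends on
    (an assumed input shape, chosen for illustration).
    """
    unknown = set().union(*deps.values()) - set(deps)
    if unknown:
        raise ValueError(f"dependencies on undeclared units: {sorted(unknown)}")

    # Work on a copy so the caller's mapping is not mutated.
    remaining = {unit: set(d) for unit, d in deps.items()}
    batches = []
    while remaining:
        # A unit is ready once every dependency sits in an earlier batch.
        ready = sorted(u for u, d in remaining.items() if not d)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        batches.append(ready)
        for unit in ready:
            del remaining[unit]
        for d in remaining.values():
            d.difference_update(ready)
    return batches


def communication_paths(n):
    """Pairwise communication paths among n agents: n*(n-1)/2."""
    return n * (n - 1) // 2


# Hypothetical dependency data mirroring the UC-001..UC-005 example.
deps = {
    "UC-001": set(),
    "UC-002": set(),
    "UC-003": set(),
    "UC-004": {"UC-001", "UC-002"},
    "UC-005": {"UC-003"},
}

batches = parallel_batches(deps)
for i, batch in enumerate(batches, start=1):
    print(f"Parallel batch {i}: {', '.join(batch)}")
print(f"Scope: {len(deps)} units in {len(batches)} batches")
print(f"Coordination paths for 4 agents: {communication_paths(4)}")
```

Each boundary between batches corresponds to a sequential gate, so the batch count gives a lower bound on the number of gated stages without implying any wall-clock duration.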