# Error Handling Tree
## Metadata
- ID: DES-EHT-`id`
- Owner: `name/role/team`
- Contributors: `list`
- Reviewers: `list`
- Team: `team`
- Stakeholders: `list`
- Status: `draft/in-progress/blocked/approved/done`
- Dates: created `YYYY-MM-DD` / updated `YYYY-MM-DD` / due `YYYY-MM-DD`
- Related: UC-`id`, REQ-`id`, DES-`id`, BS-`id`, IC-`id`, PSC-`id`, CODE-`module`, TEST-`id`
## Related Templates
- agentic/code/frameworks/sdlc-complete/templates/analysis-design/pseudocode-spec-template.md
- agentic/code/frameworks/sdlc-complete/templates/analysis-design/method-interface-contract-template.md
- agentic/code/frameworks/sdlc-complete/templates/analysis-design/activity-diagram-spec-template.md
- agentic/code/frameworks/sdlc-complete/templates/analysis-design/state-machine-spec-template.md
## Traceability
- Parent Use Case: UC-`id` — `title`
- Behavioral Spec: BS-`id`
- Interface Contracts: DES-MIC-`id`, DES-MIC-`id`
- Pseudo-Code Specs: DES-PSC-`id`, DES-PSC-`id`
## Error Context
- Component / Service: `fully qualified name of the component this tree covers`
- Scope: `single method / module / service boundary / cross-service flow`
- Error Philosophy: `fail-fast / fail-safe / retry-first / circuit-breaker`
- Caller Expectations: `caller receives typed error / HTTP status / event / nothing (fire-and-forget)`
## Error Propagation Diagram
```mermaid
flowchart TD
Operation([Operation]) --> Decision{Success?}
Decision -->|Yes| Success([Return Result])
Decision -->|No| Classify{Error Type?}
Classify -->|Transient| Retry[Retry with Backoff]
Retry --> RetryDecision{Max Retries Reached?}
RetryDecision -->|No| Operation
RetryDecision -->|Yes| CircuitBreaker[Open Circuit Breaker]
CircuitBreaker --> Escalate([Escalate to Caller])
Classify -->|Validation| Reject([Return Validation Error])
Classify -->|Fatal| Log[Log + Alert]
Log --> Abort([Abort with Error])
```
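
The decision tree above can be read as a small control loop. The following is a minimal sketch, not a prescribed implementation: `run_with_error_tree`, `ErrorKind`, `CircuitOpenError`, and the `classify` callback are hypothetical names introduced for illustration, and the circuit breaker is reduced to raising an escalation error once the retry ceiling is hit.

```python
import time
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class ErrorKind(Enum):
    TRANSIENT = "transient"
    VALIDATION = "validation"
    FATAL = "fatal"


class CircuitOpenError(Exception):
    """Hypothetical error escalated to the caller once retries are exhausted."""


def run_with_error_tree(
    operation: Callable[[], T],
    classify: Callable[[Exception], ErrorKind],
    max_retries: int = 3,
    base_delay_s: float = 0.0,
) -> T:
    retries = 0
    while True:
        try:
            return operation()  # Success? Yes: return the result
        except Exception as exc:
            kind = classify(exc)
            if kind is ErrorKind.VALIDATION:
                raise  # validation errors are rejected, never retried
            if kind is ErrorKind.FATAL:
                print(f"ALERT: fatal error: {exc!r}")  # stand-in for Log + Alert
                raise  # abort with the original error
            # Transient: retry with exponential backoff until the ceiling.
            if retries >= max_retries:
                # Stand-in for opening the circuit breaker and escalating.
                raise CircuitOpenError("retries exhausted") from exc
            time.sleep(base_delay_s * (2 ** retries))
            retries += 1
```

With `max_retries=3` the operation runs at most four times: the initial attempt plus three retries.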
## Exception Catalog
Every exception this component can raise or receive. Each row must trace to an interface contract exception specification (DES-MIC).
| ID | Exception | Type | Source | Severity | Transient | Notes |
| -- | --------- | ---- | ------ | -------- | --------- | ----- |
| E01 | `ExceptionName` | checked/unchecked | `originating component or operation` | `fatal/degraded/warning` | yes/no | `additional context` |
## Error Handling Matrix
Map each exception to its handler and recovery strategy.
| Exception (ID) | Detection Point | Handler Action | Recovery Strategy | Fallback | Caller Notification |
| --------------- | --------------- | -------------- | ----------------- | -------- | ------------------- |
| E01 | `where in the flow this is caught` | `what the handler does` | `retry / compensate / abort / degrade` | `fallback behavior or none` | `error code / HTTP status / event` |
## Retry Specifications
For each transient exception that is retried, document the retry policy.
| Exception (ID) | Max Retries | Backoff Strategy | Initial Delay | Max Delay | Jitter | Circuit Breaker Threshold |
| --------------- | ----------- | ---------------- | ------------- | --------- | ------ | ------------------------- |
| E01 | `count` | `constant/linear/exponential` | `ms` | `ms` | `yes/no` | `N failures in M seconds` |
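
To make the columns above concrete, here is one possible way to turn a retry row into a delay. It is a sketch under assumptions: `backoff_delay_ms` is a hypothetical helper, attempts are 1-based, exponential backoff doubles each time, and jitter is added uniformly on top of the capped delay (so the result can exceed Max Delay by at most the jitter range).

```python
import random


def backoff_delay_ms(
    attempt: int,
    strategy: str,
    initial_delay_ms: int,
    max_delay_ms: int,
    jitter_ms: int = 0,
) -> float:
    """Delay before retry number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    if strategy == "constant":
        delay = initial_delay_ms
    elif strategy == "linear":
        delay = initial_delay_ms * attempt
    elif strategy == "exponential":
        delay = initial_delay_ms * 2 ** (attempt - 1)
    else:
        raise ValueError(f"unknown backoff strategy: {strategy}")
    delay = float(min(delay, max_delay_ms))  # apply the Max Delay cap
    if jitter_ms > 0:
        # Assumed scheme: additive uniform jitter in [0, jitter_ms].
        delay += random.uniform(0, jitter_ms)
    return delay
```

For example, `backoff_delay_ms(3, "exponential", 200, 5000)` gives 800.0, and the sixth attempt with the same settings is capped at 5000.0.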
## Compensation Actions
For operations that must be undone on failure (saga pattern), document the compensation chain.
| Failed Operation | Compensation Action | Idempotent | Timeout | Owner |
| ---------------- | ------------------- | ---------- | ------- | ----- |
| `what succeeded before the failure` | `what must be undone` | yes/no | `ms` | `component responsible` |
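
A compensation chain like the one tabulated above is typically executed in reverse order of the steps that succeeded. The sketch below illustrates that idea only; `run_saga`, `SagaFailed`, and the step tuple layout are hypothetical, and timeouts and ownership from the table are left out.

```python
from typing import Callable, List, Tuple

# (name, action, compensation) -- a hypothetical step layout for illustration
Step = Tuple[str, Callable[[], None], Callable[[], None]]


class SagaFailed(Exception):
    def __init__(self, failed_step: str, compensated: List[str]) -> None:
        super().__init__(f"step {failed_step!r} failed; compensated {compensated}")
        self.failed_step = failed_step
        self.compensated = compensated


def run_saga(steps: List[Step]) -> None:
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            compensated: List[str] = []
            for done_name, _, compensate in reversed(completed):
                compensate()  # should be idempotent so a re-run is safe
                compensated.append(done_name)
            # Escalate instead of swallowing the failure.
            raise SagaFailed(name, compensated) from exc
        completed.append(step)
```

The failed step itself is not compensated here, on the assumption that a failed action left nothing to undo; a real flow may need to handle partial effects of the failing step as well.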
## Error Propagation Rules
Define how errors flow between layers. Every boundary crossing must have an explicit mapping.
| Source Layer | Source Error | Target Layer | Target Error | Transformation | Information Lost |
| ------------ | ----------- | ------------ | ------------ | -------------- | ---------------- |
| `service / module` | `internal error` | `API / caller` | `external error` | `mapping rule` | `what details are stripped (security)` |
## Logging and Observability
| Exception (ID) | Log Level | Log Fields | Alert Rule | Dashboard |
| --------------- | --------- | ---------- | ---------- | --------- |
| E01 | `error/warn/info` | `fields to capture (no PII in plaintext)` | `threshold for paging` | `link to dashboard or panel` |
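
One way to honor the "no PII in plaintext" constraint is to redact sensitive fields before the log line is built. This is a minimal sketch using Python's standard `logging` and `json` modules; `SENSITIVE_FIELDS`, `redact`, and `log_exception` are hypothetical names, and the field list is an assumption to be replaced per component.

```python
import json
import logging
from typing import Any, Dict

# Assumed set of PII field names; adjust per component.
SENSITIVE_FIELDS = {"card_number", "email"}


def redact(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Replace values of sensitive fields with a fixed marker."""
    return {
        key: ("[REDACTED]" if key in SENSITIVE_FIELDS else value)
        for key, value in fields.items()
    }


def log_exception(
    logger: logging.Logger, exception_id: str, level: int, fields: Dict[str, Any]
) -> str:
    """Emit one structured log line for a cataloged exception and return it."""
    line = json.dumps({"exception_id": exception_id, **redact(fields)}, sort_keys=True)
    logger.log(level, line)
    return line
```

Returning the rendered line makes the redaction easy to assert in tests without capturing logger output.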
## Completeness Checklist
- [ ] Every exception in the Exception Catalog has a row in the Error Handling Matrix
- [ ] Every transient exception has a Retry Specification
- [ ] Every multi-step operation with side effects has a Compensation Action chain
- [ ] Error Propagation Rules cover every layer boundary (service → API, module → module)
- [ ] No exception is silently swallowed — every handler has an explicit recovery or escalation
- [ ] Logging captures enough context to diagnose without exposing PII
- [ ] Retry policies have both max-retries and circuit-breaker thresholds
- [ ] The Error Propagation Diagram matches the Exception Catalog
- [ ] Fatal errors have alerting rules defined
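
The first two checklist items are mechanical and could be checked automatically. The sketch below assumes the tables have already been parsed into plain Python structures; `check_completeness` and its argument shapes are hypothetical.

```python
from typing import Dict, List, Set


def check_completeness(
    catalog: Dict[str, bool],  # exception ID -> is it transient?
    matrix_ids: Set[str],      # IDs with an Error Handling Matrix row
    retry_spec_ids: Set[str],  # IDs with a Retry Specification row
) -> List[str]:
    """Return a list of human-readable gaps; empty means these checks pass."""
    problems: List[str] = []
    for exc_id, transient in sorted(catalog.items()):
        if exc_id not in matrix_ids:
            problems.append(f"{exc_id}: no Error Handling Matrix row")
        if transient and exc_id not in retry_spec_ids:
            problems.append(f"{exc_id}: transient but no Retry Specification")
    for exc_id in sorted(retry_spec_ids - set(catalog)):
        problems.append(f"{exc_id}: retry spec for unknown exception")
    return problems
```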
## How to Fill This Template
1. **Inventory Exceptions**: Start from the interface contracts (DES-MIC) for all methods in scope. Every exception spec becomes a row in the Exception Catalog.
2. **Classify**: Mark each exception as transient or permanent. Transient errors get retry specs; permanent errors get immediate handling.
3. **Draw the Diagram**: Sketch the error flow using MermaidJS. Show the decision tree: success → return, transient → retry → circuit breaker, validation → reject, fatal → abort.
4. **Fill the Handling Matrix**: For each exception, document where it's detected, what the handler does, and how the system recovers.
5. **Define Retry Policies**: For transient errors, specify backoff strategy, max retries, and circuit breaker thresholds. Avoid unbounded retries.
6. **Define Compensations**: For saga-style flows, document what must be undone when a later step fails. Every compensation must be idempotent.
7. **Map Propagation Rules**: At every layer boundary, document how internal errors are translated to external errors. Strip internal details for security.
8. **Add Observability**: Every exception needs a log level and captured fields. Fatal errors need alerting rules.
9. **Validate**: Walk the completeness checklist. No silent swallowing; no unbounded retries; no unlogged fatals.
## Example
### Component: PaymentService
**Scope**: All operations in the `PaymentService` module.
**Error Philosophy**: Retry-first for transient failures; fail-fast for validation; compensate for partial success.
```mermaid
flowchart TD
Authorize([authorize payment]) --> AuthDecision{Success?}
AuthDecision -->|Yes| AuthSuccess([Return authorizationId])
AuthDecision -->|No| AuthClassify{Error Type?}
AuthClassify -->|Network timeout| RetryAuth[Retry with exponential backoff]
RetryAuth --> RetryCheck{Retries < 3?}
RetryCheck -->|Yes| Authorize
RetryCheck -->|No| CBOpen[Open Circuit Breaker]
CBOpen --> AuthFail([PaymentUnavailableException])
AuthClassify -->|Invalid card| CardReject([InvalidPaymentMethodException])
AuthClassify -->|Insufficient funds| FundsReject([InsufficientFundsException])
AuthClassify -->|Provider 5xx| ProviderFail[Log + Alert]
ProviderFail --> RetryAuth
```
**Exception Catalog**:
| ID | Exception | Type | Source | Severity | Transient | Notes |
| -- | --------- | ---- | ------ | -------- | --------- | ----- |
| E01 | NetworkTimeoutException | unchecked | HTTP client → payment provider | degraded | yes | Provider API latency spike |
| E02 | InvalidPaymentMethodException | checked | Payment provider response | warning | no | Card expired, invalid number, etc. |
| E03 | InsufficientFundsException | checked | Payment provider response | warning | no | Customer has insufficient balance |
| E04 | PaymentProviderException | unchecked | Payment provider 5xx | fatal | yes | Provider-side outage |
| E05 | PaymentUnavailableException | checked | Circuit breaker open | degraded | no | Surfaced to caller after retries exhausted |
**Error Handling Matrix**:
| Exception (ID) | Detection Point | Handler Action | Recovery Strategy | Fallback | Caller Notification |
| --------------- | --------------- | -------------- | ----------------- | -------- | ------------------- |
| E01 | HTTP client timeout | catch, increment retry counter | retry with backoff | open circuit breaker after max retries | PaymentUnavailableException (503) |
| E02 | Provider response code `card_invalid` | map to domain exception | none — immediate rejection | none | InvalidPaymentMethodException (400) |
| E03 | Provider response code `insufficient_funds` | map to domain exception | none — immediate rejection | none | InsufficientFundsException (402) |
| E04 | Provider HTTP 5xx | log error, increment retry counter | retry with backoff | open circuit breaker | PaymentUnavailableException (503) |
| E05 | Circuit breaker state check | skip provider call | none — circuit is open | none | PaymentUnavailableException (503) |
**Retry Specifications**:
| Exception (ID) | Max Retries | Backoff Strategy | Initial Delay | Max Delay | Jitter | Circuit Breaker Threshold |
| --------------- | ----------- | ---------------- | ------------- | --------- | ------ | ------------------------- |
| E01 | 3 | exponential | 200ms | 5000ms | yes (0-100ms) | 5 failures in 60 seconds |
| E04 | 3 | exponential | 500ms | 10000ms | yes (0-200ms) | 3 failures in 30 seconds |
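
A threshold such as "5 failures in 60 seconds" can be tracked with a sliding window of failure timestamps. The class below is a sketch of that idea only: `SlidingWindowBreaker` is a hypothetical name, the clock is injectable for testing, and reset or half-open behavior is omitted because the table does not specify it.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowBreaker:
    """Opens after `threshold` failures within `window_s` seconds."""

    def __init__(
        self,
        threshold: int,
        window_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self.clock = clock
        self.failures: Deque[float] = deque()
        self.open = False

    def record_failure(self) -> None:
        now = self.clock()
        self.failures.append(now)
        # Drop failures that have aged out of the window.
        while self.failures and now - self.failures[0] > self.window_s:
            self.failures.popleft()
        if len(self.failures) >= self.threshold:
            self.open = True


# E01 row: 5 failures in 60 seconds
e01_breaker = SlidingWindowBreaker(threshold=5, window_s=60)
```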
**Error Propagation Rules**:
| Source Layer | Source Error | Target Layer | Target Error | Transformation | Information Lost |
| ------------ | ----------- | ------------ | ------------ | -------------- | ---------------- |
| PaymentService | NetworkTimeoutException | OrderService (caller) | PaymentUnavailableException | wrap with generic message | provider URL, timeout duration |
| PaymentService | PaymentProviderException | OrderService (caller) | PaymentUnavailableException | wrap with generic message | provider error body, trace ID (logged internally) |
| OrderService | PaymentUnavailableException | API Gateway | 503 Service Unavailable | map to HTTP status | exception stack trace, internal error ID |
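
The propagation rows above amount to two small translation functions, one per boundary. The following sketch reuses the exception names from the example; the functions `to_caller_error` and `to_http_status`, the generic message text, and the 500 default for unmapped errors are assumptions for illustration.

```python
class NetworkTimeoutException(Exception):
    pass


class PaymentProviderException(Exception):
    pass


class PaymentUnavailableException(Exception):
    pass


def to_caller_error(exc: Exception) -> Exception:
    """PaymentService -> OrderService boundary."""
    if isinstance(exc, (NetworkTimeoutException, PaymentProviderException)):
        # Generic message only: provider URL, timeout duration, and provider
        # error body are dropped here and should be logged internally instead.
        return PaymentUnavailableException("payment is temporarily unavailable")
    return exc


def to_http_status(exc: Exception) -> int:
    """OrderService -> API Gateway boundary."""
    if isinstance(exc, PaymentUnavailableException):
        return 503
    return 500  # assumption: unmapped errors surface as 500
```

The external exception is created without chaining to the internal one, so the original message and traceback do not travel across the boundary.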
## Agent Notes
- Create one DES-EHT per component or service boundary; do not mix error trees from unrelated components.
- Every exception must trace back to a DES-MIC exception specification — if an exception has no interface contract, it's undocumented behavior.
- Retry policies must always have a ceiling (max retries + circuit breaker). Unbounded retries are a reliability hazard.
- Compensation actions must be idempotent: running a compensation again after a failed or partial attempt must leave the system in the same state as a single successful run.
- Error propagation rules should strip internal details at every trust boundary to prevent information leakage.
- Generate negative test cases directly: one per exception, one per retry exhaustion, one per circuit breaker trip, one per compensation chain.
- Save finalized spec to `.aiwg/architecture/error-handling/DES-EHT-{id}.md`.
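
The negative-test rule in the notes above (one per exception, one per retry exhaustion, one per circuit breaker trip, one per compensation chain) can be enumerated mechanically. This is a sketch only; `negative_test_names` and the naming scheme are hypothetical.

```python
from typing import List


def negative_test_names(
    exception_ids: List[str],
    retried_ids: List[str],
    compensation_chains: List[str],
) -> List[str]:
    """List the negative test cases implied by a DES-EHT spec."""
    names = [f"test_{exc_id}_is_handled" for exc_id in exception_ids]
    for exc_id in retried_ids:
        names.append(f"test_{exc_id}_retry_exhaustion")
        names.append(f"test_{exc_id}_circuit_breaker_trip")
    names += [f"test_{chain}_compensation_chain" for chain in compensation_chains]
    return names
```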