aiwg
Version:
Deployment tool and support utility for AI context. Copies agents, skills, commands, rules, and behaviors into the paths each AI platform reads (Claude Code, Codex, Copilot, Cursor, Warp, OpenClaw, and 6 more) so one source of truth works across 10 platfo
207 lines (151 loc) • 7.23 kB
Markdown
# Pipeline Patterns
Six canonical patterns for LLM inference pipelines. Start with the simplest one that meets your requirements.
---
## Decision Guide
Apply in order — stop at the first match:
```
Tool use + dynamic branching?
→ Embedded Agent
Explicit states + error recovery + compliance auditability?
→ State Machine
External document retrieval required?
→ RAG Pipeline
Runtime prompt assembly (multi-tenant, feature flags)?
→ Dynamic Prompt
Quality gate on output only (no multi-step pipeline)?
→ Eval Loop (standalone)
Everything else:
→ Simple Chain ← DEFAULT
```
**Bias toward Simple Chain.** It handles ≥70% of standard use cases. Complexity is a cost — in code, in ops, in debugging.
---
## Pattern 1: Simple Chain
**When to use**: Single-responsibility tasks, high-volume inference, latency-sensitive paths, cost-constrained deployments.
**Structure**:
```
Input → [Prompt A] → LLM → Parse → [Prompt B] → LLM → Output
```
**Generated artifacts**:
- `prompts/step-a.prompt.md`, `prompts/step-b.prompt.md`
- `pipeline.config.yaml`
- `src/pipeline.py` or `src/pipeline.ts`
- `eval/cases.jsonl`, `eval/eval.py`
- `cost-estimate.md`
**Anti-patterns to avoid**:
- Adding agent loop "just in case"
- Using sonnet when haiku passes eval at >85%
- Adding framework dependencies not needed for the task
---
## Pattern 2: Embedded Agent
**When to use**: Routing decisions, structured extraction with ambiguity, tool-gated tasks needing retry logic but not full autonomy.
**The embedded agent is a component in a flow — not the flow itself.** Key constraints:
- ≤5 tools
- Bounded iterations (max_iterations required — no infinite loops)
- Deterministic exit conditions
- Falls back to `escalate` if exit condition not met
**Structure**:
```
Flow → [Agent: classify + route] → Flow
Flow → [Agent: extract structured data] → Flow
```
**Generated artifacts**:
- `prompts/system.prompt.md` (scoped system prompt)
- `tools/` (typed tool definitions)
- `pipeline.config.yaml` (with `agent_config`)
- `src/agent.py` or `src/agent.ts`
**When NOT to use**:
- More than 5 tools needed → redesign as State Machine or pipeline steps
- Unbounded iteration required → use Ralph loop (development) or State Machine (production)
- Full autonomy needed → use AIWG-style agents, not embedded agent
---
## Pattern 3: State Machine
**When to use**: Document processing pipelines, multi-stage classification, workflows with explicit retry/escalation logic, compliance-critical flows where state must be auditable.
**Structure**:
```
INIT → EXTRACT → VALIDATE → [PASS → ENRICH → OUTPUT] | [FAIL → RETRY(n) → ESCALATE]
```
**Generated artifacts**:
- `fsm.config.yaml` (states, transitions, guards)
- `prompts/` (one prompt file per LLM state)
- `src/pipeline.py` or `src/pipeline.ts` (FSM runtime)
- `audit/transitions.jsonl` (append-only audit log)
**Key principles**:
- Every state has a defined type (`llm`, `transform`, `decision`, `terminal`, `escalate`)
- Every transition has a guard condition
- Terminal states have explicit outcomes: `accept`, `reject`, `escalate`
- Max retries defined — no infinite loops
- Audit log captures all transitions
**When NOT to use**:
- Simple sequential steps with no branching → Simple Chain
- Need for unbounded tool use → Embedded Agent or full agent system
---
## Pattern 4: RAG Pipeline
**When to use**: Knowledge base Q&A, document-grounded generation, any case where the LLM needs external context it cannot have in the system prompt.
**Structure**:
```
Query → Embed → Retrieve(k) → Rerank(optional) → [Context + Query → Prompt] → LLM → Response
```
**Generated artifacts**:
- `retrieval.config.yaml` (chunk size, overlap, k, embedding model)
- `prompts/rag.prompt.md` (with `{{context}}` injection)
- `src/retrieval.py`, `src/pipeline.py`
- `eval/rag-eval.py` (RAGAS-compatible eval harness)
**Key parameters**:
- `k` = number of chunks to retrieve (default: 5; increase if recall is low)
- `chunk_size` = 512 words (default; decrease for precise retrieval)
- `chunk_overlap` = 64 words (prevents context boundary splits)
- `rerank` = false by default (enable if recall is important; adds latency)
**When NOT to use**:
- "The context might be long" → use prompt caching on simple chain instead
- Context fits in system prompt → use simple chain with direct injection
- Knowledge changes per-request → dynamic prompt may be more appropriate
---
## Pattern 5: Eval Loop
**When to use**: Quality gate over generated output where a single-pass generation isn't reliable enough. Standalone or composed with any other pattern.
**Structure**:
```
GENERATE(prompt, input)
→ output
→ EVAL(eval_prompt, input, output) ← isolated call
→ {score, pass, feedback}
→ if pass: ACCEPT
→ if fail and attempts < max: REFINE(feedback) → GENERATE again
→ if fail and attempts >= max: ESCALATE
```
**The most important property: strict isolation.** The evaluator has NO knowledge of the generator's internals, chain-of-thought, or intermediate steps.
**Generated artifacts**:
- `prompts/generator.prompt.md`
- `prompts/evaluator.prompt.md` (separate file — never mixed with generator)
- `eval/loop.py` or `eval/loop.ts` (configurable: max_attempts, pass_threshold)
**Configuration**:
- `pass_threshold`: 0.85 (default)
- `max_attempts`: 3 (default)
- `eval_model`: haiku (cheaper than generator; sufficient for scoring)
**Anti-patterns**:
- Evaluator and generator in the same prompt file → isolation violation
- Evaluator receiving chain-of-thought or intermediate steps → isolation violation
- Using the same model as generator for evaluation → increases cost with no benefit
---
## Pattern 6: Dynamic Prompt
**When to use**: Personalized generation, multi-tenant prompts, feature-flagged prompt variants, systematic prompt iteration.
**Structure**:
```
Config + Context → PromptBuilder → Rendered Prompt → [Eval Loop] → Accepted Prompt
```
**Generated artifacts**:
- `prompts/builder.config.yaml` (template blocks, variable schema)
- `prompts/template.prompt.md.j2` (Jinja2 or Handlebars template)
- `src/prompt_builder.py` or `src/prompt_builder.ts`
- `eval/prompt-eval.py`
**Key principle**: The template is code. Version it, test it, review it. A/B testing prompt variants is an explicit use case — not a side effect.
---
## Anti-Pattern Reference
| Anti-Pattern | Signal | Recommended Pattern |
|-------------|--------|---------------------|
| Agentic overkill | "I need an agent that..." for single-step extraction | Simple Chain |
| Tool proliferation | >5 tools in embedded agent | State Machine or split pipeline |
| Framework cargo-cult | "We're using LangChain, so..." | Evaluate if load-bearing; default to clean stub |
| Missing eval | No mention of quality measurement | Add eval loop to any pattern |
| Eval contamination | Evaluator knows generator's reasoning | Strict isolation protocol |
| Infinite loop risk | No exit condition or max_iterations | Embedded Agent with bounds or State Machine |
| RAG for everything | "Context might be large" | Check if prompt caching handles it first |