# Pipeline Patterns

Six canonical patterns for LLM inference pipelines. Start with the simplest one that meets your requirements.

---

## Decision Guide

Apply in order and stop at the first match:

```
Tool use + dynamic branching?                                → Embedded Agent
Explicit states + error recovery + compliance auditability?  → State Machine
External document retrieval required?                        → RAG Pipeline
Runtime prompt assembly (multi-tenant, feature flags)?       → Dynamic Prompt
Quality gate on output only (no multi-step pipeline)?        → Eval Loop (standalone)
Everything else:                                             → Simple Chain  ← DEFAULT
```

**Bias toward Simple Chain.** It handles ≥70% of standard use cases. Complexity is a cost: in code, in ops, in debugging.

---

## Pattern 1: Simple Chain

**When to use**: Single-responsibility tasks, high-volume inference, latency-sensitive paths, cost-constrained deployments.

**Structure**:
```
Input → [Prompt A] → LLM → Parse → [Prompt B] → LLM → Output
```

**Generated artifacts**:
- `prompts/step-a.prompt.md`, `prompts/step-b.prompt.md`
- `pipeline.config.yaml`
- `src/pipeline.py` or `src/pipeline.ts`
- `eval/cases.jsonl`, `eval/eval.py`
- `cost-estimate.md`

**Anti-patterns to avoid**:
- Adding agent loop "just in case"
- Using sonnet when haiku passes eval at >85%
- Adding framework dependencies not needed for the task

---

## Pattern 2: Embedded Agent

**When to use**: Routing decisions, structured extraction with ambiguity, tool-gated tasks needing retry logic but not full autonomy.

**The embedded agent is a component in a flow, not the flow itself.**

Key constraints:
- ≤5 tools
- Bounded iterations (max_iterations required; no infinite loops)
- Deterministic exit conditions
- Falls back to `escalate` if exit condition not met

**Structure**:
```
Flow → [Agent: classify + route] → Flow
Flow → [Agent: extract structured data] → Flow
```

**Generated artifacts**:
- `prompts/system.prompt.md` (scoped system prompt)
- `tools/` (typed tool definitions)
- `pipeline.config.yaml` (with `agent_config`)
- `src/agent.py` or `src/agent.ts`

**When NOT to use**:
- More than 5 tools needed → redesign as State Machine or pipeline steps
- Unbounded iteration required → use Ralph loop (development) or State Machine (production)
- Full autonomy needed → use AIWG-style agents, not embedded agent

---

## Pattern 3: State Machine

**When to use**: Document processing pipelines, multi-stage classification, workflows with explicit retry/escalation logic, compliance-critical flows where state must be auditable.

**Structure**:
```
INIT → EXTRACT → VALIDATE → [PASS → ENRICH → OUTPUT]
                          | [FAIL → RETRY(n) → ESCALATE]
```

**Generated artifacts**:
- `fsm.config.yaml` (states, transitions, guards)
- `prompts/` (one prompt file per LLM state)
- `src/pipeline.py` or `src/pipeline.ts` (FSM runtime)
- `audit/transitions.jsonl` (append-only audit log)

**Key principles**:
- Every state has a defined type (`llm`, `transform`, `decision`, `terminal`, `escalate`)
- Every transition has a guard condition
- Terminal states have explicit outcomes: `accept`, `reject`, `escalate`
- Max retries defined; no infinite loops
- Audit log captures all transitions

**When NOT to use**:
- Simple sequential steps with no branching → Simple Chain
- Need for unbounded tool use → Embedded Agent or full agent system

---

## Pattern 4: RAG Pipeline

**When to use**: Knowledge base Q&A, document-grounded generation, any case where the LLM needs external context it cannot have in the system prompt.

**Structure**:
```
Query → Embed → Retrieve(k) → Rerank(optional) → [Context + Query → Prompt] → LLM → Response
```

**Generated artifacts**:
- `retrieval.config.yaml` (chunk size, overlap, k, embedding model)
- `prompts/rag.prompt.md` (with `{{context}}` injection)
- `src/retrieval.py`, `src/pipeline.py`
- `eval/rag-eval.py` (RAGAS-compatible eval harness)

**Key parameters**:
- `k` = number of chunks to retrieve (default: 5; increase if recall is low)
- `chunk_size` = 512 words (default; decrease for precise retrieval)
- `chunk_overlap` = 64 words (prevents context boundary splits)
- `rerank` = false by default (enable if recall is important; adds latency)

**When NOT to use**:
- "The context might be long" → use prompt caching on simple chain instead
- Context fits in system prompt → use simple chain with direct injection
- Knowledge changes per-request → dynamic prompt may be more appropriate

---

## Pattern 5: Eval Loop

**When to use**: Quality gate over generated output where a single-pass generation isn't reliable enough. Standalone or composed with any other pattern.

**Structure**:
```
GENERATE(prompt, input) → output
  → EVAL(eval_prompt, input, output)   ← isolated call
  → {score, pass, feedback}
  → if pass: ACCEPT
  → if fail and attempts < max: REFINE(feedback) → GENERATE again
  → if fail and attempts >= max: ESCALATE
```

**The most important property: strict isolation.** The evaluator has NO knowledge of the generator's internals, chain-of-thought, or intermediate steps.

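The loop and its isolation rule can be sketched in a few lines of Python. This is an illustrative sketch, not the generated `eval/loop.py`: the function name `run_eval_loop`, the callable signatures, and the stub generator and evaluator are all hypothetical, and applying `pass_threshold` inside the loop rather than inside the evaluator is an assumption. The defaults follow the configuration values listed in this section.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures: in a real pipeline each callable would wrap its own LLM call.
Generator = Callable[[str, Optional[str]], str]      # (input, feedback) -> output
Evaluator = Callable[[str, str], Tuple[float, str]]  # (input, output) -> (score, feedback)


def run_eval_loop(
    task_input: str,
    generate: Generator,
    evaluate: Evaluator,
    pass_threshold: float = 0.85,
    max_attempts: int = 3,
) -> dict:
    """Generate, score in an isolated call, refine on failure, escalate when out of attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    feedback: Optional[str] = None
    output, score = "", 0.0
    for attempt in range(1, max_attempts + 1):
        output = generate(task_input, feedback)
        # Isolation: the evaluator receives only the input and the final output,
        # never the generator's prompt, reasoning, or intermediate steps.
        score, feedback = evaluate(task_input, output)
        if score >= pass_threshold:
            return {"status": "accept", "output": output, "score": score, "attempts": attempt}
    return {"status": "escalate", "output": output, "score": score, "attempts": max_attempts}


# Stub generator and evaluator (assumptions for demonstration only, no model calls).
def stub_generate(task_input: str, feedback: Optional[str]) -> str:
    text = task_input.upper()
    return text if feedback is None else text + "."


def stub_evaluate(task_input: str, output: str) -> Tuple[float, str]:
    if output.endswith("."):
        return 0.9, "ok"
    return 0.4, "end the sentence with a period"


result = run_eval_loop("hello", stub_generate, stub_evaluate)
print(result["status"], result["attempts"])
```

With these stubs the first attempt fails the threshold, the feedback is passed back to the generator, and the second attempt is accepted. The only channel from evaluator to generator is the feedback string, which keeps the two calls separable.
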
**Generated artifacts**:
- `prompts/generator.prompt.md`
- `prompts/evaluator.prompt.md` (separate file, never mixed with generator)
- `eval/loop.py` or `eval/loop.ts` (configurable: max_attempts, pass_threshold)

**Configuration**:
- `pass_threshold`: 0.85 (default)
- `max_attempts`: 3 (default)
- `eval_model`: haiku (cheaper than generator; sufficient for scoring)

**Anti-patterns**:
- Evaluator and generator in the same prompt file → isolation violation
- Evaluator receiving chain-of-thought or intermediate steps → isolation violation
- Using the same model as generator for evaluation → increases cost with no benefit

---

## Pattern 6: Dynamic Prompt

**When to use**: Personalized generation, multi-tenant prompts, feature-flagged prompt variants, systematic prompt iteration.

**Structure**:
```
Config + Context → PromptBuilder → Rendered Prompt → [Eval Loop] → Accepted Prompt
```

**Generated artifacts**:
- `prompts/builder.config.yaml` (template blocks, variable schema)
- `prompts/template.prompt.md.j2` (Jinja2 or Handlebars template)
- `src/prompt_builder.py` or `src/prompt_builder.ts`
- `eval/prompt-eval.py`

**Key principle**: The template is code. Version it, test it, review it. A/B testing prompt variants is an explicit use case, not a side effect.

---

## Anti-Pattern Reference

| Anti-Pattern | Signal | Recommended Pattern |
|-------------|--------|---------------------|
| Agentic overkill | "I need an agent that..." for single-step extraction | Simple Chain |
| Tool proliferation | >5 tools in embedded agent | State Machine or split pipeline |
| Framework cargo-cult | "We're using LangChain, so..." | Evaluate if load-bearing; default to clean stub |
| Missing eval | No mention of quality measurement | Add eval loop to any pattern |
| Eval contamination | Evaluator knows generator's reasoning | Strict isolation protocol |
| Infinite loop risk | No exit condition or max_iterations | Embedded Agent with bounds or State Machine |
| RAG for everything | "Context might be large" | Check if prompt caching handles it first |
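
As a recap, the first-match ordering from the Decision Guide can be written as a small function. This is a sketch only: the `Requirements` class, its field names, and `choose_pattern` are hypothetical and not part of any generated artifact. The rule order and the pattern names are taken from the guide.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Hypothetical flags, one per question in the Decision Guide.
    tool_use_with_dynamic_branching: bool = False
    explicit_states_recovery_and_audit: bool = False
    external_document_retrieval: bool = False
    runtime_prompt_assembly: bool = False
    output_quality_gate_only: bool = False


def choose_pattern(req: Requirements) -> str:
    """Return the first matching pattern; fall through to Simple Chain."""
    rules = [
        (req.tool_use_with_dynamic_branching, "Embedded Agent"),
        (req.explicit_states_recovery_and_audit, "State Machine"),
        (req.external_document_retrieval, "RAG Pipeline"),
        (req.runtime_prompt_assembly, "Dynamic Prompt"),
        (req.output_quality_gate_only, "Eval Loop (standalone)"),
    ]
    for matched, pattern in rules:
        if matched:
            return pattern  # stop at the first match
    return "Simple Chain"  # the default


print(choose_pattern(Requirements()))
print(choose_pattern(Requirements(external_document_retrieval=True, runtime_prompt_assembly=True)))
```

Because the rules are checked in order, a request that needs both retrieval and runtime prompt assembly resolves to RAG Pipeline, and a request with no flags set resolves to the Simple Chain default.
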