agent-contracts-runtime

Runtime bridge for executing agent-contracts workflows on Agent SDKs

implement-llm-feature: description: | Implement LLM-powered commands in a target CLI tool using the agent-contracts + agent-contracts-runtime + cli-contracts stack. ## Implementation Procedure ### Step 1: Project Analysis Read these files from the target project: - cli-contract.yaml existing commands, components/schemas, x-agent usage - package.json dependencies, scripts, bin entries, type:"module" check - tsconfig.json module resolution, paths, outDir - src/ directory structure understand existing command handler pattern Determine: - Which LLM commands to add (audit / propose / explain) - What domain-specific context each command needs - What structured output schema each command returns - Whether the project already has agent-contracts integration ### Step 2: Install Dependencies Add to package.json devDependencies: - agent-contracts: ^0.32.0 - agent-contracts-runtime: ^0.31.0 - zod: ^4.0.0 Add optional peerDependencies for SDK packages: - @anthropic-ai/claude-agent-sdk: >=0.2.0 - @openai/agents: >=0.10.0 - @google/genai: >=2.0.0 ### Step 3: Create DSL Definitions Create directory structure: dsl/ ├── {project}-dsl.yaml ├── agent-runtime.config.yaml ├── agents/ └── {agent-name}.yaml ├── tasks.yaml └── handoff-types.yaml **Main DSL entry** (`dsl/{project}-dsl.yaml`): version: 1 system: id: {project} name: "{Project} Agent System" default_workflow_order: - {workflow-ids} agents: { $ref: "./agents/" } tasks: { $ref: "./tasks.yaml" } handoff_types: { $ref: "./handoff-types.yaml" } guardrails: output-schema-conformance: description: ... scope: { agents: [...], tasks: [...] } rationale: ... guardrail_policies: {domain}-safety-policy: rules: - guardrail: output-schema-conformance severity: critical action: block workflow: {workflow-id}: description: ... trigger: cli-command entry_conditions: [...] 
steps: - type: delegate task: {task-id} from_agent: {agent-id} **Agent definition** (`dsl/agents/{agent-name}.yaml`): {agent-name}: role_name: "{Role Name}" purpose: >- Domain-specific purpose describing what the agent knows and what it can do that static rules cannot. mode: read-only can_read_artifacts: [] can_write_artifacts: [] can_invoke_agents: [] can_execute_tools: [] can_return_handoffs: [{result-handoff-ids}] responsibilities: [...] constraints: [...] rules: - id: "R-{PREFIX}-001" description: ... severity: mandatory escalation_criteria: - condition: ... action: stop_and_report **Tasks** (`dsl/tasks.yaml`): {task-id}: description: ... target_agent: {agent-name} allowed_from_agents: [{agent-name}] workflow: {workflow-id} input_artifacts: [] invocation_handoff: {request-handoff} result_handoff: {result-handoff} responsibilities: [...] completion_criteria: [...] **Handoff types** (`dsl/handoff-types.yaml`): Use AgentAuditResult shape as the base for result types: {result-handoff}: version: 1 description: ... 
schema: type: object required: [summary, riskLevel, findings] properties: summary: { type: string } riskLevel: type: string enum: [low, medium, high, critical] findings: type: array items: type: object required: [severity, category, message] properties: id: { type: string } severity: type: string enum: [info, warning, error, critical] category: { type: string } target: { type: string } location: { type: string } message: { type: string } recommendation: { type: string } confidence: { type: number, minimum: 0, maximum: 1 } recommendedActions: type: array items: type: object required: [kind, title] properties: kind: type: string enum: [run_command, edit_file, review, confirm, block, ignore] title: { type: string } command: { type: string } target: { type: string } rationale: { type: string } metadata: type: object properties: tool: { type: string } command: { type: string } version: { type: string } generatedAt: { type: string } adapter: { type: string } model: { type: string } Domain-specific result types add properties alongside the base fields. 
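The base result shape above can be mirrored in plain TypeScript for quick local checks. This is a minimal sketch and not the generated handoffs.ts (which, per Step 4, is produced as Zod schemas); the names `BaseFinding`, `BaseAuditResult`, and `isBaseAuditResult` are hypothetical, and only the required fields and the `confidence` range from the schema are checked.

```typescript
// Hypothetical hand-written mirror of the base result shape.
// The generated handoffs.ts remains the source of truth.
type FindingSeverity = "info" | "warning" | "error" | "critical";
type ResultRiskLevel = "low" | "medium" | "high" | "critical";

interface BaseFinding {
  id?: string;
  severity: FindingSeverity;
  category: string;
  target?: string;
  location?: string;
  message: string;
  recommendation?: string;
  confidence?: number; // schema range: 0..1
}

interface BaseAuditResult {
  summary: string;
  riskLevel: ResultRiskLevel;
  findings: BaseFinding[];
}

const SEVERITIES: readonly string[] = ["info", "warning", "error", "critical"];
const RISK_LEVELS: readonly string[] = ["low", "medium", "high", "critical"];

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isBaseFinding(v: unknown): v is BaseFinding {
  if (!isRecord(v)) return false;
  const severity = v["severity"];
  const confidence = v["confidence"];
  if (typeof severity !== "string" || SEVERITIES.indexOf(severity) === -1) return false;
  if (typeof v["category"] !== "string" || typeof v["message"] !== "string") return false;
  if (confidence !== undefined) {
    if (typeof confidence !== "number" || confidence < 0 || confidence > 1) return false;
  }
  return true;
}

// Checks the three required top-level fields: summary, riskLevel, findings.
function isBaseAuditResult(v: unknown): v is BaseAuditResult {
  if (!isRecord(v)) return false;
  const riskLevel = v["riskLevel"];
  const findings = v["findings"];
  if (typeof v["summary"] !== "string") return false;
  if (typeof riskLevel !== "string" || RISK_LEVELS.indexOf(riskLevel) === -1) return false;
  return Array.isArray(findings) && findings.every((f) => isBaseFinding(f));
}
```

A domain-specific result type would extend `BaseAuditResult` with its own properties rather than replace these fields.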
    **Runtime config** (`dsl/agent-runtime.config.yaml`):

      dsl: ./{project}-dsl.yaml
      generated_dir: ../src/generated/dsl

    ### Step 4: Generate TypeScript Contracts

    Add to package.json scripts:

      "dsl:generate": "agent-runtime generate --config dsl/agent-runtime.config.yaml"

    Run: npm run dsl:generate

    This produces src/generated/dsl/ containing:
    - agents.ts (AgentContract interface + agentRegistry)
    - tasks.ts (TaskContract interface + taskRegistry)
    - workflows.ts (WorkflowContract + workflowRegistry)
    - handoffs.ts (Zod schemas + handoffSchemas + factory functions)
    - index.ts (barrel re-exports)

    ### Step 5: Implement src/agents/ Module

    **src/agents/types.ts**:

      import type { {ResultType} } from "../generated/dsl/handoffs.js";

      export type TaskId = "{task-1}" | "{task-2}" | ...;
      export type WorkflowId = "{workflow-1}" | "{workflow-2}" | ...;

      export interface AgentConfig {
        adapter?: string;
        model?: string;
        temperature?: number;
        cwd?: string; // set by command handlers; needed by agentic adapters
      }

      export interface AgentOptions {
        dryRun?: boolean;
        failOn?: "warning" | "error" | "critical";
      }

      export interface AgentRunResult {
        workflowId: WorkflowId;
        data: {ResultType} | null;
        raw: string;
        prompt: string;
        dryRun: boolean;
        status: "success" | "error" | "escalation" | "validation_error";
        errorMessage?: string;
        followUpsUsed: number;
        retriesUsed: number;
      }

    **src/agents/orchestrator.ts**:

      import type { AgentConfig, AgentOptions, AgentRunResult, WorkflowId } from "./types.js";

      export const EXIT_RUNTIME_MISSING = 11;
      export const EXIT_ADAPTER_ERROR = 12;

      // Adapters are imported dynamically so the SDK packages stay optional.
      async function createAdapter(pkg: string, name: string, config: AgentConfig) {
        switch (name) {
          case "mock": {
            const m = await import(`${pkg}/adapters/mock`);
            return new m.MockAdapter();
          }
          case "claude": {
            const m = await import(`${pkg}/adapters/claude-agent-sdk`);
            // cwd is passed so the agent's file tools run in the project directory.
            return new m.ClaudeAgentSdkAdapter({ model: config.model, cwd: config.cwd });
          }
          case "openai": {
            const m = await import(`${pkg}/adapters/openai-agents-sdk`);
            return new m.OpenAIAgentsSdkAdapter({ model: config.model });
          }
          case "gemini": {
            const m = await import(`${pkg}/adapters/gemini-sdk`);
            return new m.GeminiSdkAdapter({ apiKey: process.env.GEMINI_API_KEY, model: config.model });
          }
          default:
            throw new Error(`Unknown adapter: ${name}`);
        }
      }

      export async function runAgentWorkflow(
        userRequest: string,
        workflowId: WorkflowId,
        config: AgentConfig,
        options: AgentOptions,
      ): Promise<AgentRunResult> {
        // Dry run: return the prompt without loading the runtime or an adapter.
        if (options.dryRun) {
          return {
            workflowId, data: null, raw: "", prompt: userRequest, dryRun: true,
            status: "success", followUpsUsed: 0, retriesUsed: 0,
          };
        }

        const PKG = "agent-contracts-runtime";
        let runWorkflow;
        try {
          ({ runWorkflow } = await import(PKG));
        } catch {
          throw Object.assign(new Error("agent-contracts-runtime not installed"),
            { exitCode: EXIT_RUNTIME_MISSING });
        }

        let registries;
        try {
          const dsl = await import("../generated/dsl/index.js");
          registries = {
            agentRegistry: dsl.agentRegistry,
            taskRegistry: dsl.taskRegistry,
            handoffSchemas: dsl.handoffSchemas,
            workflowRegistry: dsl.workflowRegistry,
          };
        } catch {
          registries = {};
        }

        // Adapter creation failures map to exit code 12.
        let adapter;
        try {
          adapter = await createAdapter(PKG, config.adapter ?? "mock", config);
        } catch (err) {
          throw Object.assign(new Error(`adapter creation failed: ${String(err)}`),
            { exitCode: EXIT_ADAPTER_ERROR });
        }

        const result = await runWorkflow(adapter, {
          workflow: workflowId,
          user_request: userRequest,
          runtime: { maxFollowUps: 3, maxRetries: 1 },
        }, registries);
        // map result to AgentRunResult
      }

    **src/agents/context-builder.ts**:

    CRITICAL: context-builder builds the USER-SIDE prompt only.
    Agent instructions (purpose, responsibilities, constraints, rules,
    evaluation criteria, anti-patterns, output format) are defined in the
    DSL (dsl/agents/*.yaml) and injected by the runtime via
    buildTaskPrompt / runWorkflow. NEVER hardcode these in context-builder.
Context-builder SHOULD contain: - Domain-specific DATA: file contents, configuration, scan results - Project metadata: target paths, dialect, directory structure - Deterministic pre-analysis: lint results, schema excerpts - Operational context: cwd, timestamps - Brief task framing: "# Migration Safety Audit Request" Context-builder MUST NOT contain: - "## Instructions" / "## Required Patterns" sections - Output format specifications (handled by handoff schema) - Evaluation criteria or checklists - Anti-pattern definitions or rule sets - API reference material - Anything that duplicates DSL agent purpose/responsibilities/rules export function build{Command}Context(target, config): string { const sections = []; sections.push("# {Command} Request"); sections.push(`## Target\n\n${target}`); // Domain data only no instructions here. // Agent instructions come from DSL via buildTaskPrompt. // Cap total context at 16KB return sections.join("\n\n"); } **src/agents/formatter.ts**: export function computeExitCode(result, options): number { if (result.dryRun) return 0; if (result.status !== "success") return 1; const order = ["info","warning","error","critical"]; const threshold = order.indexOf(options.failOn ?? "error"); return result.data.findings.some(f => order.indexOf(f.severity) >= threshold) ? 
10 : 0; } export function formatResultText(result): string { /* severity icons, structured output */ } export function formatResultJson(result): string { return JSON.stringify(result.data, null, 2); } ### Step 6: Create CLI Command Handlers For each LLM command, create src/commands/{command}.ts: export async function command{Name}(target, opts) { const cwd = process.cwd(); const context = build{Name}Context(target, config); const agentConfig = { adapter: opts.adapter, model: opts.model, cwd }; const result = await runAgentWorkflow(context, "{workflow-id}", agentConfig, agentOpts); if (opts.reportFormat === "json") console.log(formatResultJson(result)); else console.log(formatResultText(result)); process.exit(computeExitCode(result, opts)); } Wire handlers through cli-contracts' generated program: import { createProgram, type CommandHandlers } from './generated/program.js'; const handlers: CommandHandlers = { {commandName}: async (target, opts) => { await command{Name}(target, opts); }, // ... other handlers }; createProgram(handlers, version).parse(); ### Step 7: Update cli-contract.yaml Add each LLM command with standard options and x-agent metadata: {command-name}: summary: "{Command summary}" arguments: [...] options: - name: adapter schema: { type: string, enum: [mock, claude, openai, gemini] } - name: model schema: { type: string } - name: show-prompt schema: { type: boolean, default: false } - name: fail-on schema: { type: string, enum: [warning, error, critical], default: error } - name: output aliases: [o] schema: { type: string } file: { mode: write } - name: report-format schema: { type: string, enum: [json, text, yaml], default: json } x-agent: riskLevel: low requiresConfirmation: false idempotent: true sideEffects: [network] sideEffectNote: >- Network calls to LLM provider when adapter is not mock. Filesystem write only when --output is specified. 
safeDryRunOption: show-prompt expectedDurationMs: 120000 retryableExitCodes: [1, 12] exits: '0': { description: "No blocking findings.", stdout: { format: json, schema: { $ref: '...' } } } '1': { description: "General error.", stderr: { format: text } } '3': { description: "Input validation failed.", stderr: { format: json } } '10': { description: "Blocking findings detected.", stdout: { format: json, schema: { $ref: '...' } } } '11': { description: "Runtime not installed.", stderr: { format: json } } '12': { description: "Adapter error.", stderr: { format: json } } Add AgentAuditResult, AgentFinding, AgentRecommendedAction, AgentEvidence to components/schemas if not already present. ### Step 8: Update README Add "Agent-Native Toolchain" section after mechanical features: - Intro: tool encapsulates domain-specific semantic reasoning - Deterministic checks first, then semantic audit - Structured findings (AgentAuditResult/AgentFinding) - LLM Adapter Configuration table - Add agent-contracts-runtime, agent-contracts, cli-contracts to Technology Stack target_agent: llm-feature-implementer allowed_from_agents: - llm-feature-implementer workflow: implement-llm-feature input_artifacts: [] invocation_handoff: implement-request result_handoff: implementation-result responsibilities: - Analyze the target project and determine the right LLM commands to add - Create all DSL definition files with correct cross-references - Generate TypeScript contracts via agent-runtime generate - Implement the complete src/agents/ module following the exact patterns - Create CLI command handlers integrating with the project's existing CLI structure - Update cli-contract.yaml with proper x-agent metadata and component schemas - Ensure the implementation compiles and --show-prompt mode works completion_criteria: - dsl/ directory contains valid DSL definitions that pass agent-runtime generate - agent-runtime.config.yaml points to the DSL with correct generated_dir - src/generated/dsl/ contains 
generated contracts (agentRegistry, taskRegistry, handoffSchemas, workflowRegistry) - src/agents/orchestrator.ts uses runWorkflow from agent-contracts-runtime (never runTask directly, no direct adapter.send calls) - src/agents/orchestrator.ts dynamic-imports the runtime and adapters with correct exit codes (11, 12) - src/agents/context-builder.ts contains only domain data (file contents, config, scan results) no hardcoded agent instructions, evaluation criteria, or output format specs - src/agents/context-builder.ts caps input at 16KB - src/agents/formatter.ts implements computeExitCode with the standard exit code mapping - src/commands/ has a handler for each new LLM command, passing cwd to AgentConfig - CLI entry point uses cli-contracts' createProgram(handlers, version) with CommandHandlers interface - cli-contract.yaml has all new commands with --adapter, --model, --show-prompt, --fail-on, --output, --report-format options - cli-contract.yaml has x-agent metadata on each new command (riskLevel, safeDryRunOption, expectedDurationMs) - components/schemas includes AgentAuditResult and related schemas - All generated TypeScript compiles without type errors - --show-prompt returns the constructed prompt without calling the LLM # ============================================================================= # Audit tasks — parallelized by layer # ============================================================================= audit-dsl: description: | Audit the DSL layer of an LLM command integration. 
Verify these files exist and are well-formed: - dsl/{project}-dsl.yaml must have version:1, system:{id,name}, agents/tasks/handoff_types via $ref - dsl/agents/*.yaml each agent has role_name, purpose, mode, can_read_artifacts:[], can_write_artifacts:[], responsibilities, constraints, rules with R-PREFIX-NNN - dsl/tasks.yaml target_agent, workflow, result_handoff all cross-reference valid IDs - dsl/handoff-types.yaml version:1, schema with JSON Schema, result types follow AgentAuditResult shape - dsl/agent-runtime.config.yaml dsl and generated_dir fields Severity classification: - critical: missing version:1 or system block (generator will fail) - critical: task references non-existent agent or workflow - critical: agent mode contradicts its purpose (see below) - error: agent missing constraints or rules - error: handoff schema not following AgentAuditResult shape - warning: missing guardrails or guardrail_policies - warning: agent has no escalation_criteria CRITICAL: agent mode-purpose coherence (category: dsl-agent): - Agent with mode: read-only has purpose or responsibilities that describe file writing, editing, creation, or code generation the agent's adapter tools won't include Write/Edit so the stated purpose is impossible to fulfil - Agent with mode: read-write has purpose that is purely analytical (audit, review, explain) with no file output unnecessarily broad permissions Category vocabulary: dsl-structure, dsl-agent, dsl-task, dsl-handoff, dsl-workflow, dsl-guardrail. Output a partial audit-result with findings only from the DSL layer. 
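The cross-reference rule above (a task that references a non-existent agent or workflow is critical) can be sketched as a small pure function. This is an illustration under assumptions: the `DslIndex` shape and the `checkTaskReferences` helper are hypothetical stand-ins for whatever parsed representation the auditor works from. Only the field names `target_agent` and `workflow`, the `critical` severity, and the `dsl-task` category come from the text above.

```typescript
// Hypothetical, already-parsed view of the DSL files.
interface DslTaskDef {
  target_agent: string;
  workflow: string;
}

interface DslIndex {
  agents: string[];    // agent ids found under dsl/agents/
  workflows: string[]; // workflow ids found in the main DSL entry
  tasks: Record<string, DslTaskDef>;
}

interface DslFinding {
  severity: "critical";
  category: "dsl-task";
  target: string;
  message: string;
}

// Reports every task whose target_agent or workflow is not a known id.
function checkTaskReferences(dsl: DslIndex): DslFinding[] {
  const agents = new Set(dsl.agents);
  const workflows = new Set(dsl.workflows);
  const findings: DslFinding[] = [];

  for (const taskId of Object.keys(dsl.tasks)) {
    const task = dsl.tasks[taskId];
    if (!agents.has(task.target_agent)) {
      findings.push({
        severity: "critical",
        category: "dsl-task",
        target: taskId,
        message: `target_agent "${task.target_agent}" is not a defined agent`,
      });
    }
    if (!workflows.has(task.workflow)) {
      findings.push({
        severity: "critical",
        category: "dsl-task",
        target: taskId,
        message: `workflow "${task.workflow}" is not a defined workflow`,
      });
    }
  }
  return findings;
}
```

The mode-purpose coherence check is semantic and is left to the agent; only the mechanical cross-reference part lends itself to a deterministic pre-check like this.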
target_agent: implementation-auditor allowed_from_agents: - implementation-auditor workflow: audit-implementation input_artifacts: [] invocation_handoff: audit-request result_handoff: audit-result responsibilities: - Check all DSL files for structural correctness - Validate cross-references (agent↔task↔workflow↔handoff) - Verify guardrails and guardrail_policies - Check agent mode-purpose coherence (read-only agents must not have write-oriented purposes; read-write agents must justify their write permissions in their purpose) completion_criteria: - Every dsl/* file checked for required fields - Cross-references validated - Agent mode is consistent with stated purpose and responsibilities - Findings use dsl-* category vocabulary audit-runtime: description: | Audit the runtime integration layer of an LLM command integration. Scan src/agents/orchestrator.ts (or equivalent runtime module): CRITICAL violations (category: runtime-orchestrator): - adapter.send() or adapter.sendExecution() called directly - adapter.followUp() called directly - Custom adapter class defined (not imported from runtime) - runWorkflow not imported from agent-contracts-runtime - runTask used instead of runWorkflow consumer projects must use runWorkflow() which orchestrates through the DSL workflow DAG, activates workflow-level plugin hooks (beforeWorkflow/ afterWorkflow), handles gate steps and step dependencies. 
runTask() is a low-level internal API; calling it directly bypasses workflow orchestration entirely ERROR violations (category: runtime-registry): - Generated registries not imported from src/generated/dsl/ - Registries not passed to runWorkflow options - Hard-coded agent/task definitions instead of registry lookup ERROR violations (category: runtime-adapter): - Adapter packages imported statically (not dynamic import) - Missing exit code 11 when runtime import fails - Missing exit code 12 when adapter creation fails ERROR violations (category: runtime-handler): - Project has cli-contracts generated code (src/generated/program.ts or src/generated/cli/program.ts exporting CommandHandlers and createProgram) but CLI entry point manually creates Commander commands instead of using createProgram(handlers, version) - LLM command handlers are not wired through the generated CommandHandlers interface handler signatures drift from the cli-contract.yaml definition without type errors - CLI entry point calls runTask directly instead of going through the command handler context-builder orchestrator chain WARNING violations (category: runtime-plugin): - Plugin hooks explicitly bypassed or ignored Category vocabulary: runtime-orchestrator, runtime-registry, runtime-adapter, runtime-handler, runtime-plugin. Output a partial audit-result with findings only from the runtime layer. 
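A first pass over the runtime-orchestrator checks above could be a line-by-line text scan. The sketch below is an assumption-laden illustration, not how the auditor agent is implemented: `scanOrchestratorSource` is a hypothetical helper, and a plain regex scan will also match occurrences inside comments or strings, so its hits are candidates for review rather than confirmed violations.

```typescript
interface RuntimeFinding {
  severity: "critical";
  category: "runtime-orchestrator";
  location: string; // "<file>:<line>"
  message: string;
}

// Patterns taken from the CRITICAL list above; no global flag, so test() is stateless.
const FORBIDDEN_PATTERNS: { pattern: RegExp; message: string }[] = [
  {
    pattern: /\badapter\.(send|sendExecution|followUp)\s*\(/,
    message: "adapter method called directly",
  },
  {
    pattern: /\brunTask\b/,
    message: "runTask used instead of runWorkflow",
  },
];

// Returns one finding per (line, pattern) match.
function scanOrchestratorSource(file: string, source: string): RuntimeFinding[] {
  const findings: RuntimeFinding[] = [];
  const lines = source.split("\n");
  lines.forEach((line, index) => {
    for (const rule of FORBIDDEN_PATTERNS) {
      if (rule.pattern.test(line)) {
        findings.push({
          severity: "critical",
          category: "runtime-orchestrator",
          location: `${file}:${index + 1}`,
          message: rule.message,
        });
      }
    }
  });
  return findings;
}
```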
target_agent: implementation-auditor allowed_from_agents: - implementation-auditor workflow: audit-implementation input_artifacts: [] invocation_handoff: audit-request result_handoff: audit-result responsibilities: - Scan orchestrator code for direct adapter usage - Verify runWorkflow is used (not runTask) for LLM command execution - Verify registry imports and passing - Check adapter creation patterns and exit codes - Verify cli-contracts handler wiring (CommandHandlers + createProgram used in CLI entry point) completion_criteria: - All orchestrator files scanned for adapter.send / adapter.followUp - No runTask usage found (runWorkflow must be used instead) - Registry usage verified - Exit code 11/12 patterns checked - cli-contracts handler integration verified (CommandHandlers interface used, createProgram called) audit-cli-contract: description: | Audit the CLI contract layer of an LLM command integration. Parse cli-contract.yaml and check each LLM command: ERROR: missing standard options (category: cli-contract-options): - --adapter, --model, --show-prompt, --fail-on, --output, --report-format - Using --format instead of --report-format for LLM commands ERROR: missing x-agent metadata (category: cli-contract-xagent): - No x-agent block on LLM command - Missing riskLevel, safeDryRunOption, or expectedDurationMs - Using snake_case instead of camelCase for x-agent fields ERROR: wrong exit codes (category: cli-contract-exits): - Missing exit 0/1/3/10/11/12 definitions - Exit 10 not defined for finding-based commands WARNING: missing schemas (category: cli-contract-schema): - AgentAuditResult not in components/schemas - AgentFinding, AgentRecommendedAction, AgentEvidence not defined Category vocabulary: cli-contract-options, cli-contract-xagent, cli-contract-exits, cli-contract-schema. Output a partial audit-result with findings only from the CLI contract layer. 
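The option, x-agent, and exit-code checks above are mechanical enough to sketch as a function over one parsed command entry. The `CommandDef` shape and the `checkLlmCommand` helper are hypothetical; they assume the command has already been parsed from cli-contract.yaml into an object with `options`, `x-agent`, and `exits` keys as shown in Step 7. The schema check (cli-contract-schema) is omitted.

```typescript
// Hypothetical parsed view of one command entry from cli-contract.yaml.
interface CommandDef {
  options?: { name: string }[];
  "x-agent"?: Record<string, unknown>;
  exits?: Record<string, unknown>;
}

interface ContractFinding {
  severity: "error";
  category: "cli-contract-options" | "cli-contract-xagent" | "cli-contract-exits";
  target: string;
  message: string;
}

const STANDARD_OPTIONS = ["adapter", "model", "show-prompt", "fail-on", "output", "report-format"];
const REQUIRED_XAGENT_FIELDS = ["riskLevel", "safeDryRunOption", "expectedDurationMs"];
const REQUIRED_EXIT_CODES = ["0", "1", "3", "10", "11", "12"];

function checkLlmCommand(name: string, cmd: CommandDef): ContractFinding[] {
  const findings: ContractFinding[] = [];
  const add = (category: ContractFinding["category"], message: string): void => {
    findings.push({ severity: "error", category, target: name, message });
  };

  // Six standard options must be present.
  const optionNames = (cmd.options ?? []).map((o) => o.name);
  for (const opt of STANDARD_OPTIONS) {
    if (optionNames.indexOf(opt) === -1) add("cli-contract-options", `missing --${opt}`);
  }

  // x-agent block: required fields present, keys in camelCase.
  const xAgent = cmd["x-agent"];
  if (xAgent === undefined) {
    add("cli-contract-xagent", "no x-agent block");
  } else {
    for (const field of REQUIRED_XAGENT_FIELDS) {
      if (!(field in xAgent)) add("cli-contract-xagent", `x-agent missing ${field}`);
    }
    for (const key of Object.keys(xAgent)) {
      if (key.indexOf("_") !== -1) add("cli-contract-xagent", `x-agent field ${key} is snake_case`);
    }
  }

  // Exit code definitions 0/1/3/10/11/12.
  const exits = cmd.exits ?? {};
  for (const code of REQUIRED_EXIT_CODES) {
    if (!(code in exits)) add("cli-contract-exits", `missing exit ${code}`);
  }
  return findings;
}
```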
target_agent: implementation-auditor allowed_from_agents: - implementation-auditor workflow: audit-implementation input_artifacts: [] invocation_handoff: audit-request result_handoff: audit-result responsibilities: - Check each LLM command for standard options - Verify x-agent metadata fields and casing - Validate exit code definitions - Check components/schemas for canonical types completion_criteria: - Every LLM command checked for all 6 standard options - x-agent metadata validated on every LLM command - Exit codes verified against convention - Schema completeness checked audit-architecture: description: | Audit the architecture and module structure of an LLM command integration. Verify module structure (category: architecture-modules): - src/agents/types.ts exists with TaskId, AgentConfig, AgentOptions, AgentRunResult - src/agents/orchestrator.ts exists - src/agents/context-builder.ts exists - src/agents/formatter.ts exists with computeExitCode Verify behavior (category: architecture-dry-run): - --show-prompt path returns the constructed prompt without calling the adapter - buildTaskPrompt is used (not a placeholder string) CRITICAL: context-builder SSoT violations (category: architecture-context-builder): The context-builder's role is to assemble domain-specific DATA (file contents, configuration, scan results, schema context) into the user prompt. Agent INSTRUCTIONS (how to evaluate, what rules to apply, what patterns to look for, output format) belong in the DSL agent definition and are injected by buildTaskPrompt / runWorkflow. 
Detect any of the following in context-builder source code: - Sections titled "## Instructions", "## Required Patterns", "## Audit Instructions", "## Output Format", or similar behavioral directive headings these are agent instructions that belong in DSL (agent purpose, responsibilities, constraints, rules) and must not be hardcoded in context-builder - Output format specifications (field lists, JSON/YAML schema descriptions, example output structures) buildTaskPrompt already generates "Required Output Format" from the handoff Zod schema; duplicating this in context-builder causes conflicts - Domain-specific evaluation criteria, checklists, anti-pattern definitions, rule sets, or API reference material these belong in the DSL agent definition's purpose field - Any string that duplicates or paraphrases content already present in the DSL agent's purpose, responsibilities, constraints, or rules fields What IS allowed in context-builder: - File contents read from disk (migration SQL, schema, config) - Project metadata (dialect, directories, file lists) - Deterministic pre-analysis results (lint violations, scan output) - Operational context (target path, cwd, timestamp) - Brief task framing ("# Migration Safety Audit Request") ERROR: mode-purpose coherence (category: architecture-agent-coherence): - DSL agent with mode: read-only has purpose or responsibilities that describe file writing, editing, creation, or modification - DSL agent with mode: read-write but no cwd propagated to adapter the agent's file tools won't operate in the project directory - Command handler doesn't pass cwd to AgentConfig when using an agentic adapter (claude) ERROR: adapter cwd propagation (category: architecture-adapter-cwd): - AgentConfig type has no cwd field - Command handlers don't set config.cwd = process.cwd() or resolve(projectDir) - createAdapter doesn't pass cwd to adapter constructor/factory - Agentic adapters (claude) instantiated without cwd their file tools will operate in an 
undefined directory CRITICAL: runTask usage (category: architecture-handler): - runTask() imported or called anywhere in the consumer project consumer projects must use runWorkflow() exclusively; runTask() is a low-level internal API that bypasses workflow DAG orchestration, workflow-level plugin hooks, and gate steps ERROR: cli-contracts handler wiring (category: architecture-handler): - Project has cli-contracts generated code (src/generated/program.ts or src/generated/cli/program.ts with CommandHandlers) but CLI entry point manually creates Commander commands instead of calling createProgram(handlers, version) - Command handlers bypass the generated CommandHandlers interface Scan for direct SDK imports (category: architecture-modules): - @anthropic-ai/sdk or @anthropic-ai/claude-agent-sdk imported outside src/agents/orchestrator.ts or adapter files - openai or @openai/agents imported outside adapters - @google/genai imported outside adapters Category vocabulary: architecture-modules, architecture-dry-run, architecture-context-builder, architecture-agent-coherence, architecture-adapter-cwd, architecture-handler, architecture-formatter. Output a partial audit-result with findings only from the architecture layer. 
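The literal heading check for context-builder SSoT violations can be approximated with a simple text scan. This is a minimal sketch with a hypothetical helper name (`findDirectiveHeadings`); it only looks for the four headings quoted above and does not attempt the "or similar" and paraphrase detection, which the description leaves to the agent's judgment.

```typescript
interface HeadingHit {
  heading: string;
  line: number; // 1-based line number in the scanned source
}

// The four directive headings named in the audit description.
const DIRECTIVE_HEADINGS = [
  "## Instructions",
  "## Required Patterns",
  "## Audit Instructions",
  "## Output Format",
];

// Returns every occurrence of a directive heading in context-builder source text.
function findDirectiveHeadings(source: string): HeadingHit[] {
  const hits: HeadingHit[] = [];
  source.split("\n").forEach((text, index) => {
    for (const heading of DIRECTIVE_HEADINGS) {
      if (text.indexOf(heading) !== -1) {
        hits.push({ heading, line: index + 1 });
      }
    }
  });
  return hits;
}
```

Each hit would map to a critical finding in the architecture-context-builder category; brief task framing such as "# Migration Safety Audit Request" is not matched.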
  target_agent: implementation-auditor
  allowed_from_agents:
    - implementation-auditor
  workflow: audit-implementation
  input_artifacts: []
  invocation_handoff: audit-request
  result_handoff: audit-result
  responsibilities:
    - Verify module structure
    - Check dry-run behavior
    - Audit context-builder for SSoT violations (hardcoded instructions,
      output format specs, domain rules that belong in DSL)
    - Check mode-purpose coherence across DSL definitions and runtime code
    - Verify cwd propagation from command handler through adapter config
    - Verify cli-contracts handler wiring (CommandHandlers + createProgram)
    - Scan for direct SDK imports
  completion_criteria:
    - Module structure verified
    - Dry-run behavior confirmed
    - Context-builder contains only domain data, not agent instructions
    - Agent mode is consistent with purpose and tool usage
    - cwd is propagated to agentic adapters
    - cli-contracts handler integration verified
    - No direct SDK imports outside adapter files

# =============================================================================
# Audit merge - aggregates parallel audit results
# =============================================================================

audit-merge:
  description: |
    Merge the partial audit results from the four parallel layer audits
    (DSL, runtime, CLI contract, architecture) into a single unified
    audit-result.

    The prior_context contains the JSON output from each completed audit
    step. Parse each one, combine all findings into a single array,
    deduplicate by (target + message), sort by severity, assign sequential
    IDs (F-001, F-002, ...), derive the overall riskLevel from the worst
    finding severity, and produce prioritized recommendedActions.

    Merging rules:
    - Collect all findings from all layer results
    - Deduplicate: if two findings have the same target and message, keep one
    - Sort: critical first, then error, warning, info
    - Assign sequential IDs after sorting: F-001, F-002, ...
    - Derive riskLevel: any critical -> critical, any error -> high,
      any warning -> medium, info only -> low
    - Merge recommendedActions from all layers, deduplicate by title
    - Set metadata: { tool: "agent-runtime", command: "audit-implementation" }
  target_agent: implementation-auditor
  allowed_from_agents:
    - implementation-auditor
  workflow: audit-implementation
  input_artifacts: []
  invocation_handoff: audit-request
  result_handoff: audit-result
  responsibilities:
    - Parse prior_context JSON from each layer audit step
    - Merge and deduplicate findings
    - Derive overall riskLevel
    - Produce a single coherent summary paragraph
  completion_criteria:
    - All findings from all layers included
    - No duplicate findings
    - riskLevel correctly derived from worst finding
    - Summary paragraph covers all audited layers
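The merging rules for audit-merge can be sketched as a deterministic function. This is an illustration, not the runtime's implementation: `mergeAuditResults` and the `Partial*` type names are hypothetical, the summary paragraph is left out because it is prose written by the agent, and treating an empty findings list as riskLevel `low` is an assumption. Deduplication runs before ID assignment so the IDs stay gap-free.

```typescript
type MergeSeverity = "info" | "warning" | "error" | "critical";
type MergeRiskLevel = "low" | "medium" | "high" | "critical";

interface MergeFinding {
  id?: string;
  severity: MergeSeverity;
  category: string;
  target?: string;
  message: string;
}

interface MergeAction {
  kind: string;
  title: string;
}

interface PartialAuditResult {
  findings: MergeFinding[];
  recommendedActions?: MergeAction[];
}

const SEVERITY_RANK: Record<MergeSeverity, number> = { critical: 0, error: 1, warning: 2, info: 3 };
const RISK_BY_WORST: Record<MergeSeverity, MergeRiskLevel> = {
  critical: "critical", error: "high", warning: "medium", info: "low",
};

function formatId(n: number): string {
  const s = String(n);
  return "F-" + (s.length >= 3 ? s : ("00" + s).slice(-3));
}

function mergeAuditResults(parts: PartialAuditResult[]) {
  // Collect and deduplicate findings by (target + message).
  const seenFindings = new Set<string>();
  const findings: MergeFinding[] = [];
  for (const part of parts) {
    for (const f of part.findings) {
      const key = (f.target ?? "") + "\u0000" + f.message;
      if (seenFindings.has(key)) continue;
      seenFindings.add(key);
      findings.push({ ...f });
    }
  }

  // Sort critical first, then assign sequential IDs.
  findings.sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
  findings.forEach((f, i) => { f.id = formatId(i + 1); });

  // After sorting, the first finding is the worst one. Empty input -> "low" (assumption).
  const riskLevel: MergeRiskLevel = findings.length > 0 ? RISK_BY_WORST[findings[0].severity] : "low";

  // Merge recommended actions, deduplicated by title.
  const seenTitles = new Set<string>();
  const recommendedActions: MergeAction[] = [];
  for (const part of parts) {
    for (const action of part.recommendedActions ?? []) {
      if (seenTitles.has(action.title)) continue;
      seenTitles.add(action.title);
      recommendedActions.push(action);
    }
  }

  return {
    riskLevel,
    findings,
    recommendedActions,
    metadata: { tool: "agent-runtime", command: "audit-implementation" },
  };
}
```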