Runtime bridge for executing agent-contracts workflows on Agent SDKs

# agent-contracts-runtime

Runtime bridge for executing [`agent-contracts`](https://github.com/foo-log-inc/agent-contracts) agent teams from TypeScript programs.

## Built-in Agents: Implement and Audit

`agent-contracts-runtime` ships with two built-in agents that add LLM-powered commands to any CLI tool built on the three-layer stack (`agent-contracts` + `agent-contracts-runtime` + `cli-contracts`).

### `agent-runtime implement` — Let the agent build it

Instead of writing the integration code by hand, describe what you want and let the implementation agent write it:

```bash
agent-runtime implement \
  --adapter claude \
  --project-dir ./my-tool \
  "Add an audit command for migration safety and a propose command for SQL optimization"
```

The agent reads your project and then generates:

- DSL definitions (`dsl/agents/`, `dsl/tasks.yaml`, `dsl/handoff-types.yaml`, workflows, guardrails)
- TypeScript contracts via `agent-runtime generate`
- Runtime integration (`src/agents/orchestrator.ts`, `context-builder.ts`, `formatter.ts`)
- CLI command handlers
- An updated `cli-contract.yaml` with `x-agent` metadata and `components/schemas`

The agent is specialized for the three-layer stack. Its definition encodes the correct patterns, exit code conventions, adapter usage, and registry wiring. The code it generates does not call LLM APIs directly: all invocations go through `runTask`/`runWorkflow`.

```bash
agent-runtime implement --dry-run "Add an audit command"          # Preview the prompt
agent-runtime implement --report-format json "..."                # JSON output
agent-runtime implement --model claude-sonnet-4 "..."             # Specify model
```

### `agent-runtime audit-implementation` — Verify the integration

After implementation, whether by the agent or by hand, audit the result:

```bash
agent-runtime audit-implementation --adapter claude ./my-tool
```

The auditor checks the three layers and the overall architecture in parallel:

| Layer | What it checks |
|-------|----------------|
| **DSL** | `version: 1`, system block, agent/task/workflow cross-references, guardrails |
| **Runtime** | `runTask`/`runWorkflow` usage (no `adapter.send()`), dynamic imports, exit codes 11/12, registry passing |
| **CLI contract** | Standard options (`--adapter`, `--model`, `--dry-run`, `--fail-on`, `--output`, `--report-format`), `x-agent` metadata, exit codes |
| **Architecture** | Module structure, dry-run behavior, context size cap, no direct SDK imports |

The DAG workflow scheduler runs the four checks concurrently and merges their results into a single structured report:

```bash
agent-runtime audit-implementation --focus dsl ./my-tool            # Check one layer
agent-runtime audit-implementation --fail-on warning ./my-tool      # Strict mode
agent-runtime audit-implementation --report-format json ./my-tool   # JSON output
agent-runtime audit-implementation --output report.json ./my-tool   # Write to file
```

Exit codes: `0` = no findings above threshold, `10` = findings detected, `12` = adapter error.

### Why use the built-in agents?

Writing the integration code for `agent-contracts` + `agent-contracts-runtime` + `cli-contracts` by hand is error-prone. Common mistakes include:

- Calling `adapter.send()` directly instead of `runTask()`/`runWorkflow()`
- Missing standard CLI options or exit codes
- Forgetting `x-agent` metadata
- Incorrect adapter construction (e.g., `new ClaudeAgentSdkAdapter()` instead of `createAdapter("claude")`)
- Static imports instead of dynamic imports for graceful degradation

The implementation agent has all of these patterns embedded in its DSL definition. The audit agent detects violations.
Together they form a closed loop: implement correctly, then verify.

---

## Overview

`agent-contracts` defines business-level agent behavior in DSL: agent roles, task boundaries, workflows, handoff schemas, validations, and guardrails.

`agent-contracts-runtime` makes those DSL-defined workflows callable from ordinary TypeScript code. It sits between your program and Agent SDKs:

```
TypeScript program
        ↓
WorkflowInvocation
        ↓
agent-contracts-runtime
        ↓
DSL-derived contracts
        ↓
Agent SDK adapter
        ↓
Claude / OpenAI Agents SDK / Google ADK / custom runner
```

Agent SDKs execute LLM agents. `agent-contracts-runtime` lets application code invoke reusable, DSL-defined agent workflows without hand-writing SDK-specific orchestration for each workflow.

## Install

```bash
npm install agent-contracts-runtime
npm install -D agent-contracts
```

## Quick start

```bash
# 1. Add LLM commands to your CLI tool (the agent does the work)
agent-runtime implement --adapter claude --project-dir ./my-tool \
  "Add an audit command for code quality"

# 2. Audit the generated integration
agent-runtime audit-implementation --adapter claude ./my-tool

# 3. Or set up manually: initialize, generate, run
agent-runtime init
agent-runtime generate
agent-runtime run feature-implement "Add login endpoint with JWT"
agent-runtime doctor
```

## How it works

```
agent-contracts.yaml (DSL)          bindings/runtime.yaml (guardrail impl)
        │                                     │
        ▼                                     ▼
agent-runtime generate ───────────────────────┘
        │
        ├── agent/generated/agents.ts              (AgentContract + registry)
        ├── agent/generated/tasks.ts               (TaskContract + registry)
        ├── agent/generated/workflows.ts           (WorkflowContract + registry)
        ├── agent/generated/handoffs.ts            (Zod schemas + registry + factories)
        ├── agent/generated/index.ts               (barrel re-export)
        ├── agent/generated/hooks/guardrails.ts    (check functions)
        ├── agent/generated/hooks/index.ts         (unified hook adapter)
        └── agent/generated/.manifest.json         (DSL hash, metadata)

agent/src/ (user plugins, project guardrails)
        │
        ▼
agent-runtime run <workflow> <request>
        │
        ├── Plugin: beforeWorkflow
        ├── For each step:
        │   ├── Plugin: beforeTask
        │   ├── Plugin: contextEnhancer (enrich structured context)
        │   ├── Plugin: promptBuilder (full override) or default buildTaskPrompt
        │   ├── Plugin: promptEnhancer (post-process)
        │   ├── SDK Adapter: send prompt to LLM
        │   ├── Extract structured result (YAML/JSON)
        │   ├── Validate against Zod handoff schema
        │   ├── followUp (lightweight, same session) on validation error
        │   ├── retry (heavyweight, new session) on persistent failure
        │   ├── Plugin: afterTask
        │   └── decideRetryStrategy callback for custom recovery logic
        └── Plugin: afterWorkflow
```

### Layer separation

| Layer | Path | Owner | Description |
|-------|------|-------|-------------|
| **Generated** | `agent/generated/` | Auto-generated | DSL-derived contracts, handoff factories, and hooks. Never edit manually. |
| **User code** | `agent/src/` | You | Plugins, project guardrails, interceptors. |
| **Runtime** | `node_modules/agent-contracts-runtime/` | npm package | Workflow runner, task runner, SDK adapters, generator. |

## Generated contracts

The `generate` command reads your `agent-contracts.yaml` DSL and any configured binding YAML files, then renders TypeScript contracts and guardrail hooks from Handlebars templates.

### Two-phase generation

| Phase | Input | Output | Description |
|-------|-------|--------|-------------|
| **1. Contracts** | DSL (`agents`, `tasks`, `workflow`, `handoff_types`) | `agents.ts`, `tasks.ts`, `workflows.ts`, `handoffs.ts`, `index.ts` | Typed contract interfaces, registries, and handoff factories |
| **2. Guardrails** | DSL guardrails + binding `guardrail_impl` + active policy | `hooks/guardrails.ts`, `hooks/index.ts` | Check functions for commands, file paths, and file content |

Phase 2 requires a binding YAML file that follows the `agent-contracts` `SoftwareBinding` schema. If no bindings are configured, guardrail hooks are not generated.

### Handoff factories

In addition to Zod schemas and type aliases, `handoffs.ts` exports a type-safe factory function for each handoff type:

```typescript
import { handoffs } from "./agent/generated";

// Builds a handoff envelope; the payload is validated against the generated Zod schema
const envelope = handoffs.featureImplementationRequest({
  objective: "Add login endpoint with JWT",
  inputs: { repository: "." },
  expectedOutputs: ["implementation-diff"],
  completionCriteria: ["tests_passed"],
});
// => { type: "feature-implementation-request", version: 1, payload: { ... } }
```

Because the factory validates the payload against the Zod schema at construction time, malformed input is caught before the workflow starts.

Note the naming convention: YAML invocation files use the DSL-defined field names (snake_case), while the generated TypeScript factories and APIs use camelCase. For example, `expected_outputs` in YAML corresponds to `expectedOutputs` in TypeScript.

### Custom templates

Override any built-in Handlebars template by placing a file with the same name in a custom templates directory:

```bash
agent-runtime generate --templates ./my-templates
```

Or set `templates_dir` in `agent-runtime.config.yaml`.
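The "same file name wins" override rule amounts to a merge keyed by file name. The sketch below is a hypothetical illustration, not the generator's code. The `resolveTemplates` helper, the `*.ts.hbs` file names, and the choice to ignore custom files that shadow no built-in template are all assumptions.

```typescript
// Hypothetical sketch of "same file name wins" template resolution.
// Not the generator's actual code: resolveTemplates, the file names,
// and the rule for unknown names are assumptions.
type TemplateSet = Record<string, string>; // file name -> Handlebars source

function resolveTemplates(builtIn: TemplateSet, custom: TemplateSet): TemplateSet {
  const resolved: TemplateSet = { ...builtIn }; // never mutate the built-in set
  for (const fileName of Object.keys(custom)) {
    // Only a custom file whose name matches a built-in template overrides it.
    if (Object.prototype.hasOwnProperty.call(builtIn, fileName)) {
      resolved[fileName] = custom[fileName];
    }
  }
  return resolved;
}

const builtIn: TemplateSet = {
  "agents.ts.hbs": "// built-in agents template",
  "tasks.ts.hbs": "// built-in tasks template",
};
const custom: TemplateSet = {
  "tasks.ts.hbs": "// project-specific tasks template",
};

const templates = resolveTemplates(builtIn, custom);
console.log(templates["tasks.ts.hbs"]); // prints "// project-specific tasks template"
console.log(templates["agents.ts.hbs"]); // prints "// built-in agents template"
```

Here the override is a plain shallow copy followed by replacement. The built-in set stays untouched, so the same defaults can be reused across projects.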
### Programmatic generation API

```typescript
import { generate, checkFreshness } from "agent-contracts-runtime/generator";

// Regenerate contracts and hooks from the runtime config
const result = await generate({
  configPath: "./agent-runtime.config.yaml",
  clean: true,
});
console.log(result.files_generated);

// Check whether the generated files are up to date with the DSL
const isFresh = await checkFreshness("./agent-runtime.config.yaml");
```

## Programmatic API

### High-level APIs (`executeTask` / `executeWorkflow`)

These one-call convenience APIs handle adapter creation, DSL loading, guardrails, progress logging, and retry policy:

```typescript
import { executeTask, executeWorkflow } from "agent-contracts-runtime";

// `resolvedDsl` is your resolved agent-contracts DSL, loaded elsewhere in your program
const result = await executeTask("audit-code", {
  adapter: "claude",
  dsl: resolvedDsl,
  request: "Audit the authentication module",
  logFile: "./logs/audit.log",
});

const wfResult = await executeWorkflow("feature-implement", {
  adapter: "openai",
  model: "gpt-4.1",
  dsl: resolvedDsl,
  request: "Add login endpoint with JWT",
  context: { cwd: process.cwd() },
});
```

### `createAdapter` factory

```typescript
import { createAdapter } from "agent-contracts-runtime";

// `guardrailHooks` is optional; omit it if you have no runtime guardrails
const adapter = await createAdapter("claude", {
  model: "claude-sonnet-4-20250514",
  cwd: process.cwd(),
  guardrailHooks,
});
```

Accepted adapter names: `mock`, `claude`, `openai`, `gemini`.
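A name-based factory like `createAdapter` pairs naturally with lazy loading, so a missing optional SDK only fails when that adapter is requested (the "dynamic imports for graceful degradation" pattern mentioned earlier). The sketch below is a hypothetical illustration and does not reproduce the library's implementation. `SketchAdapter`, `createAdapterSketch`, and the error messages are assumptions. The package names come from the SDK adapters table.

```typescript
// Hypothetical sketch of name-based adapter creation with lazy SDK loading.
// Not the library's code: SketchAdapter, createAdapterSketch, and the error texts are assumptions.
interface SketchAdapter {
  readonly name: string;
  send(prompt: string): Promise<string>;
}

type AdapterName = "mock" | "claude" | "openai" | "gemini";

// Optional SDK package behind each non-mock adapter name.
const sdkPackages: Record<Exclude<AdapterName, "mock">, string> = {
  claude: "@anthropic-ai/claude-agent-sdk",
  openai: "@openai/agents",
  gemini: "@google/adk",
};

function isAdapterName(name: string): name is AdapterName {
  return name === "mock" || Object.prototype.hasOwnProperty.call(sdkPackages, name);
}

async function createAdapterSketch(name: string): Promise<SketchAdapter> {
  if (!isAdapterName(name)) {
    throw new Error(`Unknown adapter "${name}"; expected one of: mock, ${Object.keys(sdkPackages).join(", ")}`);
  }
  if (name === "mock") {
    // The mock adapter needs no SDK, so it always works (useful in tests and CI).
    return { name, send: async (prompt: string) => `[mock] ${prompt}` };
  }
  const pkg = sdkPackages[name];
  try {
    // Dynamic import: a missing optional SDK fails here, not at program start-up.
    await import(pkg);
  } catch {
    throw new Error(`Adapter "${name}" requires the optional package ${pkg}`);
  }
  // A real adapter would wrap the loaded SDK; this sketch stops after the import check.
  return {
    name,
    send: async () => {
      throw new Error(`${name}: SDK wiring omitted in this sketch`);
    },
  };
}
```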
The runtime provides three API levels:

| API | Use case | Input |
|-----|----------|-------|
| **Simple API** | Ad-hoc CLI-style invocation | `user_request: string` |
| **Structured API** | Application integration, CI | `WorkflowInvocation` with typed handoff |
| **Builder API** | Fluent programmatic usage | `createRuntime` → chained methods |

### WorkflowInvocation

All API levels resolve to the same internal model:

```typescript
type WorkflowInvocation = {
  workflow: string;
  handoff: {
    type: string;
    version?: number;
    payload: unknown;
  };
  runtime?: {
    maxFollowUps?: number;
    maxRetries?: number;
    dryRun?: boolean;
    readonly?: boolean;
  };
  hooks?: {
    onStepComplete?: (event: StepCompleteEvent) => void;
    onGate?: (gateKind: string, description: string) => Promise<boolean>;
    decideRetryStrategy?: (
      outcome: TaskOutcome,
      attempt: number
    ) => Promise<"follow_up" | "retry" | "abort">;
  };
  context?: {
    variables?: Record<string, unknown>;
    artifacts?: Record<string, string>;
  };
};
```

### Simple API

For quick scripts and CLI-compatible usage:

```typescript
import { runWorkflow, createAdapter } from "agent-contracts-runtime";

const adapter = await createAdapter("claude", {
  model: "claude-sonnet-4-20250514",
  cwd: process.cwd(),
});

const result = await runWorkflow(adapter, "feature-implement", {
  user_request: "Add login endpoint with JWT",
  maxFollowUps: 3,
  maxRetries: 1,
});

console.log(`${result.workflow_id}: ${result.status} (${result.total_elapsed_ms}ms)`);
```

Or construct the Claude adapter directly:

```typescript
import { runWorkflow } from "agent-contracts-runtime";
import { ClaudeAgentSdkAdapter } from "agent-contracts-runtime/adapters/claude-agent-sdk";

const adapter = new ClaudeAgentSdkAdapter({
  cwd: process.cwd(),
});

const result = await runWorkflow(adapter, "feature-implement", {
  user_request: "Add login endpoint with JWT",
  maxFollowUps: 3,
  maxRetries: 1,
});
```

Or use the OpenAI Agents SDK adapter:

```typescript
import { runWorkflow } from "agent-contracts-runtime";
import { OpenAIAgentsSdkAdapter } from "agent-contracts-runtime/adapters/openai-agents-sdk";

const adapter = new OpenAIAgentsSdkAdapter({
  model: "gpt-4.1",
});

const result = await runWorkflow(adapter, "feature-implement", {
  user_request: "Add login endpoint with JWT",
  maxFollowUps: 3,
  maxRetries: 1,
});
```

For production integration, prefer the Structured API or the Builder API so that input handoffs are validated before execution.

### Structured API

Use `WorkflowInvocation` for typed handoff input, with Zod validation on both input and output:

```typescript
import { runWorkflow } from "agent-contracts-runtime";
import { handoffs } from "./agent/generated";

// `adapter` comes from createAdapter(), as in the Simple API example
const result = await runWorkflow(adapter, {
  workflow: "feature-implement",
  handoff: handoffs.featureImplementationRequest({
    objective: "Add login endpoint with JWT",
    inputs: {
      repository: ".",
      apiSpec: "./docs/openapi.yaml",
    },
    constraints: {
      allowedPaths: ["src/**", "test/**"],
      deniedPaths: ["infra/**"],
    },
    expectedOutputs: ["implementation-diff", "test-report"],
    completionCriteria: ["tests_passed", "review_approved"],
  }),
  runtime: {
    maxFollowUps: 3,
    maxRetries: 1,
  },
  hooks: {
    // Approve every gate automatically; ask a human here if the gate needs review
    onGate: async (gateKind, description) => true,
  },
});
```

The `WorkflowInvocation` envelope is SDK-independent. SDK-specific options belong on the adapter constructor:

```typescript
import { createAdapter } from "agent-contracts-runtime";

const adapter = await createAdapter("claude", {
  model: "claude-sonnet-4-20250514",
  cwd: process.cwd(),
});
```

### Builder API

For fluent programmatic usage:

```typescript
import { createRuntime, createAdapter } from "agent-contracts-runtime";
import { handoffs } from "./agent/generated";

const adapter = await createAdapter("claude", {
  model: "claude-sonnet-4-20250514",
  cwd: process.cwd(),
});

const runtime = createRuntime({ adapter });

const result = await runtime
  .workflow("feature-implement")
  .handoff(handoffs.featureImplementationRequest({
    objective: "Add login endpoint with JWT",
    inputs: { repository: "." },
    expectedOutputs: ["implementation-diff"],
    completionCriteria: ["tests_passed"],
  }))
  .maxFollowUps(3)
  .maxRetries(1)
  .onStepComplete((event) => {
    console.log(event.task_id, event.outcome_status);
  })
  .onGate(async () => true)
  .run();
```

The builder also accepts a plain string request for quick usage:

```typescript
const result = await runtime
  .workflow("feature-implement")
  .request("Add login endpoint with JWT")
  .run();
```

### Run a single task

```typescript
import { runTask } from "agent-contracts-runtime";

// `adapter` comes from createAdapter(), as above
const result = await runTask(adapter, "run-tests", {
  user_request: "Run all tests and report results",
});

if (result.outcome.status === "success") {
  console.log("Completed:", result.outcome.data);
} else if (result.outcome.status === "validation_error") {
  console.error("Schema mismatch — followUps used:", result.follow_ups_used);
} else if (result.outcome.status === "escalation") {
  console.warn("Escalation:", result.outcome.reason);
}
```

### Workflow result

```typescript
type WorkflowResult = {
  workflow_id: string;
  status: "success" | "failed" | "escalation" | "cancelled";
  steps: StepResult[];
  final_handoff?: HandoffEnvelope;
  total_elapsed_ms: number;
};
```

Each step result includes the task ID, outcome status, validation errors, follow-up count, retry count, and elapsed time.
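A common next step is to condense a `WorkflowResult` into a short log or CI summary. The sketch below is a hypothetical illustration. The per-step field names (`task_id`, `outcome_status`, `follow_ups_used`, `retries_used`, `elapsed_ms`) are assumptions loosely modeled on names used elsewhere in this README, so check them against the runtime's exported types. The example values are made up.

```typescript
// Hypothetical sketch: summarizing a workflow result for logs or CI output.
// The StepLike field names are assumptions, not the runtime's StepResult type.
type StepLike = {
  task_id: string;
  outcome_status: string; // e.g. "success", "validation_error", "escalation"
  follow_ups_used: number;
  retries_used: number;
  elapsed_ms: number;
};

type WorkflowResultLike = {
  workflow_id: string;
  status: "success" | "failed" | "escalation" | "cancelled";
  steps: StepLike[];
  total_elapsed_ms: number;
};

function summarize(result: WorkflowResultLike): string {
  const notOk = result.steps.filter((s) => s.outcome_status !== "success").map((s) => s.task_id);
  const followUps = result.steps.reduce((n, s) => n + s.follow_ups_used, 0);
  const retries = result.steps.reduce((n, s) => n + s.retries_used, 0);
  return [
    `${result.workflow_id}: ${result.status} in ${result.total_elapsed_ms}ms`,
    `steps=${result.steps.length} followUps=${followUps} retries=${retries}`,
    notOk.length > 0 ? `non-success steps: ${notOk.join(", ")}` : "all steps succeeded",
  ].join("\n");
}

// Made-up example data
const example: WorkflowResultLike = {
  workflow_id: "feature-implement-run-1",
  status: "success",
  steps: [
    { task_id: "plan", outcome_status: "success", follow_ups_used: 1, retries_used: 0, elapsed_ms: 1200 },
    { task_id: "implement", outcome_status: "success", follow_ups_used: 0, retries_used: 1, elapsed_ms: 5400 },
  ],
  total_elapsed_ms: 6600,
};
console.log(summarize(example));
```

Tracking follow-up and retry totals per run makes it easy to see when a task's prompt or handoff schema is causing repeated format corrections.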
## CLI

| Command | Description |
|---------|-------------|
| `agent-runtime implement <description>` | Add LLM-powered commands to a CLI tool (agentic) |
| `agent-runtime audit-implementation [dir]` | Audit an LLM integration for pattern conformance (agentic) |
| `agent-runtime init` | Initialize runtime scaffolding |
| `agent-runtime generate` | Generate contracts and hooks from DSL |
| `agent-runtime run <workflow> <request>` | Execute a workflow |
| `agent-runtime run --file <path>` | Execute from a YAML invocation file |
| `agent-runtime list <resource>` | List workflows, tasks, or agents |
| `agent-runtime show-prompt <task>` | Display the generated prompt for a task |
| `agent-runtime doctor` | Verify configuration and connectivity |
| `agent-runtime agents [--format]` | Output the full embedded resolved DSL (YAML/JSON) |
| `agent-runtime extract [--all] [commands...]` | Extract embedded CLI contract specification |

### `agent-runtime init`

Scaffolds a new project with configuration and user code templates:

```bash
agent-runtime init                      # Scaffold in current directory with mock adapter
agent-runtime init --adapter claude     # Configure for Claude adapter
agent-runtime init --output ./my-proj   # Scaffold in a specific directory
agent-runtime init --force              # Overwrite existing files
```

Creates the following structure:

```
<output-dir>/
├── agent-runtime.config.yaml            # Runtime configuration
├── agent/
│   └── src/
│       ├── plugins/
│       │   └── example-plugin.ts        # AgentPlugin skeleton with hook examples
│       └── guardrails/
│           └── custom-guardrails.ts     # Project-specific guardrail checks
└── .gitignore                           # Adds agent/generated/ entry
```

The generated `agent-runtime.config.yaml` points to `./agent-contracts.yaml` for DSL input and `./agent/generated/` for generated output. Edit this file to configure bindings, guardrail policies, and custom templates.
Key options for `generate`, `run`, `show-prompt`, and `doctor`:

```bash
agent-runtime generate --check                                   # CI: verify generated files are up to date
agent-runtime generate --clean                                   # Delete and regenerate
agent-runtime generate -t ./my-tpl                               # Use custom Handlebars templates
agent-runtime run ... --dry-run                                  # Simulate without calling SDK
agent-runtime run ... --adapter mock                             # Use mock adapter for testing
agent-runtime run --file ./invocation.yaml                       # Run from YAML manifest
agent-runtime show-prompt plan-and-implement                     # Preview prompt
agent-runtime show-prompt audit-tests -u "Check test quality"    # With custom request
agent-runtime show-prompt plan-and-implement --format json       # JSON output
agent-runtime show-prompt plan-and-implement --with-plugins      # Apply plugin enhancers
agent-runtime doctor                                             # Run all diagnostic checks
```

### `agent-runtime doctor`

Runs diagnostic checks and reports a pass/fail/warn status for each:

| Check | What it verifies |
|-------|------------------|
| `config_exists` | `agent-runtime.config.yaml` is found and parses correctly |
| `dsl_exists` | `agent-contracts.yaml` (per config) exists and is valid YAML |
| `manifest_fresh` | Generated files match the current DSL hash |
| `adapter_configured` | At least one SDK adapter has its API key env var set |
| `plugin_entrypoint` | All plugin files listed in config exist on disk |
| `bindings_valid` | All binding YAML files listed in config parse correctly |

```bash
$ agent-runtime doctor
✓ config_exists        PASS  Loaded ./agent-runtime.config.yaml
✓ dsl_exists           PASS  Parsed OK — 3 agent(s), 4 task(s), 2 workflow(s)
✓ manifest_fresh       PASS  Generated files are up to date
! adapter_configured   WARN  No SDK adapter API keys found. Set one of: ...
✓ plugin_entrypoint    PASS  No plugins configured
✓ bindings_valid       PASS  No bindings configured

All checks passed.
```

Outputs a `DoctorResult` JSON object to stdout. Exits with code 0 when all checks pass (warnings are OK) and with code 1 when any check fails.
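The exit-code rule above is easy to mirror when post-processing doctor results in CI. Below is a minimal sketch. The `DoctorCheck` shape and the lowercase status values are assumptions made for illustration and are not the actual `DoctorResult` type.

```typescript
// Hypothetical sketch of the documented doctor exit-code rule:
// 0 when no check fails (warnings allowed), 1 when any check fails.
// The DoctorCheck shape is an assumption, not the real DoctorResult type.
type CheckStatus = "pass" | "warn" | "fail";

interface DoctorCheck {
  name: string;
  status: CheckStatus;
  message: string;
}

function doctorExitCode(checks: readonly DoctorCheck[]): 0 | 1 {
  return checks.some((c) => c.status === "fail") ? 1 : 0;
}

const checks: DoctorCheck[] = [
  { name: "config_exists", status: "pass", message: "Loaded ./agent-runtime.config.yaml" },
  { name: "adapter_configured", status: "warn", message: "No SDK adapter API keys found." },
];
console.log(doctorExitCode(checks)); // prints 0: a warning alone does not fail the run
```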
### YAML invocation file

For CI or programmatic invocation, define the full request as a YAML file:

```yaml
# invocation.yaml
workflow: feature-implement
handoff:
  type: feature-implementation-request
  payload:
    objective: Add login endpoint with JWT
    inputs:
      repository: .
    expected_outputs:
      - implementation-diff
    completion_criteria:
      - tests_passed
runtime:
  max_follow_ups: 3
  max_retries: 1
```

```bash
agent-runtime run --file ./invocation.yaml --adapter claude
# or: --adapter gemini, --adapter mock
```

For the full CLI reference, including exit codes and output schemas, see [docs/cli-reference.md](docs/cli-reference.md). The CLI specification is managed contract-first via [`cli-contracts`](https://www.npmjs.com/package/cli-contracts) in [`cli-contract.yaml`](cli-contract.yaml).

## SDK adapters

| Adapter | SDK | Status |
|---------|-----|--------|
| `claude` | Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`) | Implemented |
| `gemini` | Google ADK (`@google/adk`) | Implemented |
| `openai` | OpenAI Agents SDK (`@openai/agents`) | Implemented |
| `mock` | Simulated responses for testing/demo | Implemented |

### Adapter interface

The minimal adapter interface requires only `send`:

```typescript
interface SdkAdapter {
  send(prompt: string, options: AdapterSendOptions): Promise<string>;
  followUp?(message: string): Promise<string>;
}
```

Adapters that need full contract context can also implement the optional `sendExecution` method:

```typescript
interface SdkAdapter {
  send(prompt: string, options: AdapterSendOptions): Promise<string>;
  followUp?(message: string): Promise<string>;
  sendExecution?(request: AgentExecutionRequest): Promise<string>;
}
```

When `sendExecution` is implemented, the runtime prefers it over `send` and passes the full `AgentExecutionRequest`, which contains agent/task IDs, handoff info, Zod schema metadata, and task context.
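The "prefer `sendExecution`, fall back to `send`" rule can be sketched as a small dispatch function. This is a hypothetical illustration, not the runtime's code. The `Sketch*` types are simplified stand-ins for `SdkAdapter`, `AdapterSendOptions`, and `AgentExecutionRequest`, and their field names (`agentId`, `taskId`, `prompt`, `options`) are assumptions.

```typescript
// Hypothetical sketch of "prefer sendExecution over send".
// The Sketch* types are simplified stand-ins; field names are assumptions.
interface SketchSendOptions {
  readonly?: boolean;
}

interface SketchExecutionRequest {
  agentId: string;
  taskId: string;
  prompt: string;
  options: SketchSendOptions;
}

interface SketchAdapter {
  send(prompt: string, options: SketchSendOptions): Promise<string>;
  followUp?(message: string): Promise<string>;
  sendExecution?(request: SketchExecutionRequest): Promise<string>;
}

// Use the richer entry point when the adapter provides it; otherwise fall back to send().
async function dispatch(adapter: SketchAdapter, request: SketchExecutionRequest): Promise<string> {
  if (adapter.sendExecution) {
    return adapter.sendExecution(request);
  }
  return adapter.send(request.prompt, request.options);
}

// A send-only adapter: enough for SDKs that just take a prompt string.
const promptOnly: SketchAdapter = {
  send: async (prompt) => `send:${prompt.length}`,
};

// An upgraded adapter: it also sees agent/task IDs, not just the prompt.
const contractAware: SketchAdapter = {
  send: async () => "unused",
  sendExecution: async (req) => `sendExecution:${req.agentId}/${req.taskId}`,
};
```

Because the choice happens inside the dispatcher, an adapter can move from `send` to `sendExecution` without any change to the code that runs workflows.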
### Choosing adapter methods

| Method | Use when |
|--------|----------|
| `send(prompt, options)` | Your SDK only needs a prompt string |
| `sendExecution(request)` | Your SDK needs agent/task IDs, handoff metadata, schema info, tools, or context |
| `followUp(message)` | Your SDK supports continuing the same session after validation errors |

An adapter can start with `send` and later upgrade to `sendExecution` without changing workflow code.

SDK-specific options belong on the adapter constructor, not on the shared interface:

```typescript
import { ClaudeAgentSdkAdapter } from "agent-contracts-runtime/adapters/claude-agent-sdk";
import { OpenAIAgentsSdkAdapter } from "agent-contracts-runtime/adapters/openai-agents-sdk";

// `guardrailHooks`: optional runtime guardrail hooks, defined elsewhere

// Claude Agent SDK
const claudeAdapter = new ClaudeAgentSdkAdapter({
  model: "claude-sonnet-4-20250514",
  cwd: process.cwd(),
  guardrailHooks,
});

// OpenAI Agents SDK
const openaiAdapter = new OpenAIAgentsSdkAdapter({
  model: "gpt-4.1",
  maxTurns: 20,
  guardrailHooks,
});
```

### Claude Agent SDK adapter

The Claude adapter wraps `@anthropic-ai/claude-agent-sdk`, which runs Claude as a stateful coding agent with built-in tool execution (Read, Edit, Bash, etc.).

```typescript
import { ClaudeAgentSdkAdapter } from "agent-contracts-runtime/adapters/claude-agent-sdk";

const adapter = new ClaudeAgentSdkAdapter({
  cwd: process.cwd(),
  model: "claude-sonnet-4-20250514",    // optional, uses SDK default
  permissionMode: "bypassPermissions",  // default for automated workflows
  maxTurns: 20,                         // optional turn limit
  guardrailHooks,                       // optional guardrail enforcement
});
```

Internally, the adapter calls the SDK's `query()` function, which returns an AsyncGenerator of SDK events. The adapter iterates over the stream and extracts the final result text. `followUp()` resumes the same session via the SDK's `resume` option, so the agent keeps the full conversation context while it corrects its output format.
| Config option | Description | Default |
|---------------|-------------|---------|
| `cwd` | Working directory | `process.cwd()` |
| `model` | Claude model identifier | SDK default |
| `tools` | Available tools (string array or `{ type: 'preset', preset: 'claude_code' }`) | Auto-selected based on `readonly` |
| `permissionMode` | `"default"` / `"acceptEdits"` / `"bypassPermissions"` / `"plan"` | `"bypassPermissions"` |
| `maxTurns` | Maximum conversation turns | No limit |
| `guardrailHooks` | Runtime guardrail hooks (mapped to SDK `PreToolUse` hooks) | None |

> **Note:** `@anthropic-ai/claude-agent-sdk` requires `zod@^4.0.0` as a peer dependency. This runtime uses `zod@^4.0.0` natively.

### OpenAI Agents SDK adapter

The OpenAI adapter wraps `@openai/agents`, which provides a lightweight agent framework with built-in tool execution, guardrails, handoffs, and tracing.

```typescript
import { OpenAIAgentsSdkAdapter } from "agent-contracts-runtime/adapters/openai-agents-sdk";

const adapter = new OpenAIAgentsSdkAdapter({
  model: "gpt-4.1",   // optional, uses SDK default
  maxTurns: 20,       // optional turn limit (default: 10)
  guardrailHooks,     // optional guardrail enforcement
});
```

Internally, the adapter creates a fresh `Agent` for each `send()` call, with the contract prompt as `instructions`. It then calls the SDK's `run()` function and extracts `finalOutput` as the result text. `followUp()` resumes the same conversation via the SDK's `previousResponseId` option, so the model keeps the full context while it corrects its output format.

| Config option | Description | Default |
|---------------|-------------|---------|
| `model` | Model identifier (e.g. `"gpt-4.1"`, `"gpt-5.5"`) | SDK default |
| `maxTurns` | Maximum agent loop turns | `10` (SDK default) |
| `tools` | Additional tools to pass to the Agent | None |
| `agentName` | Name for the Agent instance | `"contract-agent"` |
| `guardrailHooks` | Runtime guardrail hooks (mapped to SDK `InputGuardrail`) | None |
| `signal` | `AbortSignal` for cancellation | None |

> **Note:** `@openai/agents` requires `zod@^4.0.0` as a peer dependency. This runtime uses `zod@^4.0.0` natively.

### Google Gemini adapter

The Gemini adapter wraps `@google/genai`, Google's TypeScript SDK for Gemini models.

```typescript
import { GeminiSdkAdapter } from "agent-contracts-runtime/adapters/gemini-sdk";

const adapter = new GeminiSdkAdapter({
  model: "gemini-2.5-flash",            // optional, defaults to gemini-2.5-flash
  apiKey: process.env.GEMINI_API_KEY,   // optional, falls back to env var
  temperature: 0.7,                     // optional
  maxOutputTokens: 8192,                // optional
  guardrailHooks,                       // optional guardrail enforcement
});
```

Internally, the adapter creates a chat session via `ai.chats.create()` for each `send()` call and then calls `chat.sendMessage()`. This keeps conversation state, so `followUp()` continues within the same chat session.

| Config option | Description | Default |
|---------------|-------------|---------|
| `apiKey` | Gemini API key (or set `GEMINI_API_KEY` env var) | Env var |
| `model` | Model identifier (e.g. `"gemini-2.5-flash"`, `"gemini-2.5-pro"`) | `"gemini-2.5-flash"` |
| `systemInstruction` | System instruction prepended to conversations | None |
| `temperature` | Temperature for generation (0.0–2.0) | SDK default |
| `maxOutputTokens` | Maximum output tokens | SDK default |
| `guardrailHooks` | Runtime guardrail hooks (evaluated locally on responses) | None |

> **Note:** The `gemini` adapter name routes to the ADK adapter (see below). `GeminiSdkAdapter` (`agent-contracts-runtime/adapters/gemini-sdk`) remains available as a legacy alias.
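The Claude, OpenAI, and Gemini adapters share one session rule: each `send()` starts a fresh session, and `followUp()` continues the most recent one. The ADK adapter does not support `followUp()`. The in-memory sketch below illustrates that rule without calling any SDK. `SessionSketchAdapter` and its stand-in `reply` function are hypothetical.

```typescript
// Hypothetical sketch of the shared session rule: send() opens a fresh
// session and followUp() continues the latest one. No SDK is called;
// the `reply` function stands in for the model.
type Turn = { role: "user" | "assistant"; text: string };

class SessionSketchAdapter {
  private history: Turn[] | null = null; // null until the first send()
  private readonly reply: (history: readonly Turn[]) => string;

  constructor(reply: (history: readonly Turn[]) => string) {
    this.reply = reply;
  }

  async send(prompt: string): Promise<string> {
    this.history = []; // fresh session, like a new chat or a new SDK query
    return this.turn(prompt);
  }

  async followUp(message: string): Promise<string> {
    return this.turn(message); // same session: earlier turns stay in context
  }

  private turn(text: string): string {
    const history = this.history;
    if (history === null) {
      throw new Error("followUp() called before send(): no session to resume");
    }
    history.push({ role: "user", text });
    const answer = this.reply(history);
    history.push({ role: "assistant", text: answer });
    return answer;
  }
}

// The stand-in model only reports how many turns it can see.
const adapter = new SessionSketchAdapter((h) => `turns visible: ${h.length}`);
```

In this sketch, a correction sent with `followUp()` sees the earlier prompt and answer. After the next `send()`, the model sees only the new prompt, which is why a retry is more expensive than a follow-up.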
### Google ADK adapter

The ADK adapter wraps `@google/adk` (Google Agent Development Kit), which provides native sub-agent routing via `LlmAgent.subAgents`. The `gemini` adapter name routes to this implementation.

```typescript
import { AdkSdkAdapter } from "agent-contracts-runtime/adapters/adk-sdk";

const adapter = new AdkSdkAdapter({
  model: "gemini-2.5-flash",
  apiKey: process.env.GEMINI_API_KEY,
  guardrailHooks,
});
```

| Config option | Description | Default |
|---------------|-------------|---------|
| `apiKey` | Gemini API key (or set `GEMINI_API_KEY` env var) | Env var |
| `model` | Model identifier (e.g. `"gemini-2.5-flash"`) | `"gemini-2.5-flash"` |
| `rootAgentName` | Name for the root agent | `"root_agent"` |
| `guardrailHooks` | Runtime guardrail hooks | None |
| `cacheConfig` | Prompt caching config `{ enabled: boolean }` | `{ enabled: true }` |

> **Note:** The ADK adapter does not support `followUp()` (session resume). Recovery uses a fresh `send()`.

## Handoff validation

The runtime validates both input and output handoffs against the generated Zod schemas.

Input validation happens when a `WorkflowInvocation` includes a typed handoff (Structured API or Builder API). The handoff factory validates the payload at construction time.

Output validation happens after each SDK execution. The runtime extracts the structured result from the agent's output and validates it against the expected handoff schema for that workflow step. If validation fails, the runtime first attempts a followUp (lightweight, same session) to correct the output format, and falls back to a full retry only if that does not succeed.
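The validate, followUp, then retry sequence can be sketched as a small loop. This is a hypothetical illustration, not the runtime's implementation. `extractJson`, the `check` callback standing in for a Zod schema, and the correction message are assumptions, and the sketch handles only fenced JSON output, not YAML. The default limits follow the documented `maxFollowUps: 2` and `maxRetries: 0`.

```typescript
// Hypothetical sketch of output validation with followUp-then-retry recovery.
// Not the runtime's code: extractJson, `check` (a stand-in for a Zod schema),
// and the correction message are assumptions.
interface RecoveryAdapter {
  send(prompt: string): Promise<string>;
  followUp?(message: string): Promise<string>;
}

type Outcome =
  | { status: "success"; data: unknown; followUps: number; retries: number }
  | { status: "validation_error"; error: string; followUps: number; retries: number };

// Pull the first fenced json block out of free-form agent output (`{3} = three backticks).
function extractJson(output: string): unknown {
  const match = /`{3}json\s*([\s\S]*?)`{3}/.exec(output);
  if (!match) throw new Error("no fenced json block found");
  return JSON.parse(match[1]);
}

async function runWithRecovery(
  adapter: RecoveryAdapter,
  prompt: string,
  check: (data: unknown) => string | null, // error message, or null when valid
  maxFollowUps = 2,
  maxRetries = 0,
): Promise<Outcome> {
  let followUps = 0;
  let lastError = "";
  for (let retries = 0; retries <= maxRetries; retries++) {
    let output = await adapter.send(prompt); // every retry starts a new session
    for (let used = 0; ; used++) {
      try {
        const data = extractJson(output);
        const problem = check(data);
        if (problem === null) return { status: "success", data, followUps, retries };
        lastError = problem;
      } catch (err) {
        lastError = err instanceof Error ? err.message : String(err);
      }
      if (!adapter.followUp || used >= maxFollowUps) break; // escalate to a retry
      followUps++;
      output = await adapter.followUp(`Output invalid (${lastError}). Reply with a corrected json block.`);
    }
  }
  return { status: "validation_error", error: lastError, followUps, retries: maxRetries };
}
```

An adapter without `followUp()`, such as ADK, goes straight from a failed validation to the next `send()`.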
## Follow-up and retry

| | followUp | retry |
|---|---|---|
| **Method** | `adapter.followUp()` | `adapter.send()` |
| **Cost** | Lightweight | Heavy |
| **Use case** | Output format correction | Full task re-execution |
| **Default limit** | `maxFollowUps: 2` | `maxRetries: 0` (opt-in) |
| **DSL mapping** | Independent (always available) | `step.max_retries` in workflow DSL |
| **Trigger** | Zod schema validation error | Empty output, or `decideRetryStrategy` |
| **Session** | Same session continues (Claude: `resume` with session ID; OpenAI: `previousResponseId`) | New session |

The prompt includes the full handoff schema field table and a YAML example, so the agent can usually produce valid output on the first attempt without guessing the format.

Inject custom recovery logic via `decideRetryStrategy`:

```typescript
import { runTask } from "agent-contracts-runtime";

// `adapter` comes from createAdapter()
const result = await runTask(adapter, "plan-and-implement", {
  user_request: "Add login",
  decideRetryStrategy: async (outcome, attempt) => {
    if (attempt >= 2) return "abort";                               // stop once attempt reaches 2
    if (outcome.status === "validation_error") return "follow_up";  // cheap fix in the same session
    return "retry";                                                 // otherwise re-run in a new session
  },
});
```

## Runtime hooks and plugins

Plugins let project code customize execution around DSL-defined workflows without changing the DSL or the runtime core.
```typescript
interface AgentPlugin {
  readonly id: string;
  beforeTask?(taskId: string, context: TaskContext): Promise<TaskContext | null>;
  contextEnhancer?(taskId: string, context: TaskContext): TaskContext;
  afterTask?(taskId: string, outcome: TaskOutcome): Promise<TaskOutcome>;
  promptEnhancer?(taskId: string, prompt: string, context: TaskContext): string;
  promptBuilder?(args: PromptBuilderArgs): string | null;
  customGuardrails?: { /* evaluateCommand, evaluateFilePath, evaluateFileContent */ };
  beforeWorkflow?(workflowId: string, userRequest: string): Promise<void>;
  afterWorkflow?(workflowId: string, result: WorkflowResult): Promise<void>;
}
```

### Hook execution order

```
beforeWorkflow
  ↓
beforeTask        ← skip task (return null) or modify context
  ↓
contextEnhancer   ← enrich structured context (variables, handoff_input, etc.)
  ↓
promptBuilder     ← full prompt override, or null to use default
  ↓
promptEnhancer    ← lightweight post-processing on prompt string
  ↓
SDK Adapter send
  ↓
afterTask
  ↓
afterWorkflow
```

### beforeTask vs contextEnhancer

| Hook | Purpose | Side effects |
|------|---------|--------------|
| `beforeTask` | Skip the task (return `null`), replace the context entirely, or make gating decisions | May have side effects |
| `contextEnhancer` | Add structured variables, paths, or metadata to the context | Should be side-effect free |

### Context vs prompt

The runtime treats the prompt as a **derived artifact** of the structured context.
When the data is structured, plugins should modify `TaskContext` via `contextEnhancer` rather than manipulate strings in `promptEnhancer`:

```typescript
import type { AgentPlugin } from "agent-contracts-runtime";

const contextPlugin: AgentPlugin = {
  id: "context-enricher",
  contextEnhancer(taskId, context) {
    // Return a new context object; contextEnhancer should be side-effect free
    return {
      ...context,
      relevant_paths: ["src/auth/**", "src/middleware/**"],
      variables: {
        ...context.variables,
        dbType: "PostgreSQL",
        orm: "TypeORM",
      },
    };
  },
};
```

### Prompt customization hooks

| Hook | Purpose | Execution order |
|------|---------|-----------------|
| `promptBuilder` | Full prompt override. Return a string to replace the default prompt, or `null` to use the default. | 1st (before `promptEnhancer`) |
| `promptEnhancer` | Lightweight post-processor. Receives the built prompt and returns a modified version. | 2nd (after `promptBuilder`) |

### Register a plugin

```typescript
import { pluginRegistry, type AgentPlugin } from "agent-contracts-runtime";

const myPlugin: AgentPlugin = {
  id: "my-plugin",
  async beforeTask(taskId, context) {
    return context; // return null to skip this task
  },
  contextEnhancer(taskId, context) {
    return {
      ...context,
      variables: { ...context.variables, projectFramework: "NestJS" },
    };
  },
  async afterTask(taskId, outcome) {
    return outcome;
  },
  promptEnhancer(taskId, prompt) {
    return prompt + "\n\nAlways write tests first.";
  },
};

pluginRegistry.register(myPlugin);
```

## Guardrails

Guardrails evaluate commands, file paths, and file content before execution.
Generated guardrail hooks (from DSL + binding) and plugin custom guardrails are merged and evaluated together, producing one of four actions: | Action | Effect | |--------|--------| | `block` | Deny the operation | | `warn` | Allow with warning (can fail with `fail_on_guardrail_warning`) | | `info` | Allow with informational context to user/agent | | `shadow` | Report only, no effect on execution | Guardrail checks are defined in binding YAML files (not in the DSL directly), following the `agent-contracts` `SoftwareBinding` schema with `guardrail_impl` entries. Three matcher types are supported: - `command_regex` — Match shell commands against regex patterns - `file_glob` — Match file paths against glob patterns - `content_regex` — Match file content against regex patterns ### Binding file ```yaml # bindings/runtime.yaml software: agent-runtime version: 1 guardrail_impl: no-force-push: checks: - matcher: type: command_regex pattern: "(^|[|;&]\\s*)git\\s+push\\s.*(--force|-f)\\b" message: "Force push is forbidden." block-env-files: checks: - matcher: type: file_glob pattern: "**/{.env,.env.*}" message: "Writing to .env file is blocked." 
```

## Configuration

Configuration lives in `agents.conf.yaml` (the legacy `agent-runtime.config.yaml` is still supported as a fallback):

```yaml
dsl: ./agent-contracts.yaml
generated_dir: ./agent/generated      # default: ./agent/generated
bindings:                              # optional, for guardrail hook generation
  - ./bindings/runtime.yaml
active_guardrail_policy: default       # which guardrail_policy to activate
templates_dir: ./custom-templates      # optional, overrides built-in templates
model_mapping:                         # optional, per-task LLM routing
  fast:
    adapter: gemini
    model: gemini-2.5-flash
  standard:
    adapter: claude
    model: claude-sonnet-4-20250514
  thinking:
    adapter: claude
    model: claude-opus-4-6
logging:
  progress_log:
    destination: stderr                # stderr | file | both | none
    file: ./logs/agent.log
    naming: single                     # single | per-invocation | daily
```

### Per-task LLM selection

Tasks in the DSL can declare a `model_class` (`fast`, `standard`, or `thinking`) to express the LLM capability level they need. The runtime resolves each `model_class` to a concrete adapter + model pair using a 4-layer priority chain:

| Priority | Source | Example |
|----------|--------|---------|
| 1 (highest) | `AGENT_RUNTIME_MODEL_{FAST\|STANDARD\|THINKING}` env var | `AGENT_RUNTIME_MODEL_THINKING=claude:claude-opus-4-6` |
| 2 | `AGENT_RUNTIME_MODEL` env var (catch-all) | `AGENT_RUNTIME_MODEL=claude:claude-sonnet-4-20250514` |
| 3 | `model_mapping.<class>` in config | `model_mapping.thinking.adapter: claude` |
| 4 (lowest) | CLI `--adapter` + adapter default model | `--adapter claude` |

Environment variables use the format `adapter:model` (colon-separated). Omitting the model part (`claude:`) selects the adapter's default model. Omitting the adapter part (`claude-haiku`, no colon) uses the `--adapter` value.

This enables cross-provider routing within a single workflow execution, e.g. fast tasks on Gemini and thinking tasks on Claude.
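The priority chain and the `adapter:model` parsing rules can be modeled as one small pure function. This is a sketch under stated assumptions, not the implementation behind `createModelResolver`: `resolveModel`, `parseModelSpec`, and the `MappingEntry`/`Resolution` types are hypothetical, an empty environment variable is treated as unset, and `model: undefined` stands for "use the adapter's default model".

```typescript
// Hypothetical sketch of the 4-layer resolution; not the runtime's resolver.
type ModelClass = "fast" | "standard" | "thinking";

interface MappingEntry {
  adapter: string;
  model?: string;
}

interface Resolution {
  adapter: string;
  model?: string; // undefined = adapter's default model
  source: "env-class" | "env-catch-all" | "config" | "cli";
}

// Parse "adapter:model". "claude:" -> adapter default model;
// "claude-haiku" (no colon) -> model only, adapter taken from --adapter.
function parseModelSpec(spec: string, cliAdapter: string): { adapter: string; model?: string } {
  const colon = spec.indexOf(":");
  if (colon === -1) return { adapter: cliAdapter, model: spec };
  const adapter = spec.slice(0, colon) || cliAdapter;
  const model = spec.slice(colon + 1) || undefined;
  return { adapter, model };
}

function resolveModel(
  modelClass: ModelClass,
  env: Record<string, string | undefined>,
  mapping: Partial<Record<ModelClass, MappingEntry>>,
  cliAdapter: string,
): Resolution {
  // 1. Per-class env var, e.g. AGENT_RUNTIME_MODEL_THINKING
  const perClass = env[`AGENT_RUNTIME_MODEL_${modelClass.toUpperCase()}`];
  if (perClass) return { ...parseModelSpec(perClass, cliAdapter), source: "env-class" };
  // 2. Catch-all env var
  const catchAll = env["AGENT_RUNTIME_MODEL"];
  if (catchAll) return { ...parseModelSpec(catchAll, cliAdapter), source: "env-catch-all" };
  // 3. model_mapping.<class> from agents.conf.yaml
  const mapped = mapping[modelClass];
  if (mapped) return { adapter: mapped.adapter, model: mapped.model, source: "config" };
  // 4. CLI --adapter with the adapter's default model
  return { adapter: cliAdapter, source: "cli" };
}

const mapping = {
  fast: { adapter: "gemini", model: "gemini-2.5-flash" },
  thinking: { adapter: "claude", model: "claude-opus-4-6" },
};
const env = { AGENT_RUNTIME_MODEL_THINKING: "claude:" };

for (const cls of ["fast", "standard", "thinking"] as const) {
  console.log(cls, resolveModel(cls, env, mapping, "claude"));
}
```

With only `AGENT_RUNTIME_MODEL_THINKING=claude:` set, thinking tasks are overridden to Claude's default model, fast tasks still follow `model_mapping.fast`, and standard tasks, which have no mapping in this example, fall back to the CLI adapter.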
## Exports

| Import path | Description |
|-------------|-------------|
| `agent-contracts-runtime` | Core runtime API (`runWorkflow`, `runTask`, `executeTask`, `executeWorkflow`, `createAdapter`, `createRuntime`, `buildTaskPrompt`, `pluginRegistry`, `createModelResolver`, `zodSchemaToPromptDescription`) |
| `agent-contracts-runtime` | Types (`WorkflowInvocation`, `HandoffInput`, `AgentExecutionRequest`, `WorkflowRegistries`, `SdkAdapter`, `ModelAwareSdkAdapterFactory`, `ModelResolver`, `ModelClass`, `TaskContext`, `TaskOutcome`, etc.) |
| `agent-contracts-runtime/generator` | Generator API (`generate`, `checkFreshness`, `buildContractContext`) |
| `agent-contracts-runtime/adapters/claude-agent-sdk` | Claude Agent SDK adapter |
| `agent-contracts-runtime/adapters/adk-sdk` | Google ADK adapter |
| `agent-contracts-runtime/adapters/gemini-sdk` | Google Gemini adapter (legacy alias) |
| `agent-contracts-runtime/adapters/openai-agents-sdk` | OpenAI Agents SDK adapter |
| `agent-contracts-runtime/adapters/mock` | Mock adapter for testing |

## LLM Features (Shared Runtime)

### Environment Variables

| Variable | Adapter | Description |
|----------|---------|-------------|
| `ANTHROPIC_API_KEY` | `claude` | Anthropic Claude API key |
| `OPENAI_API_KEY` | `openai` | OpenAI API key |
| `GEMINI_API_KEY` | `gemini` | Google Gemini/ADK API key |

### Model Configuration

The runtime resolves models using a 4-layer priority chain:

1. `AGENT_RUNTIME_MODEL_{FAST|STANDARD|THINKING}` env var (highest)
2. `AGENT_RUNTIME_MODEL` env var (catch-all)
3. `model_mapping.<class>` in config
4. CLI `--adapter` + adapter default model (lowest)

### DSL Extension

Tools that integrate with `agent-contracts-runtime` define their LLM agents, tasks, and workflows in DSL YAML files under a `dsl/` directory. The runtime loads this DSL at build time via `loadDslContext()` and generates typed registries. This lets any CLI tool add LLM-powered commands by:

1. Defining agents/tasks/workflows in `dsl/*.yaml`
2. Running `agent-runtime generate` to produce typed contracts
3. Calling `executeTask()` / `executeWorkflow()` in command handlers
4. Declaring the commands in `cli-contract.yaml` with the standard LLM options (`--adapter`, `--model`, `--show-prompt`, `--fail-on`, `--output`, `--report-format`, `--log-file`)

## Requirements

- Node.js 20+
- TypeScript 5.x (ESM, strict mode)
- `tsx` (bundled as a dependency; the CLI uses it to load generated `.ts` contracts and plugins at runtime)
- `agent-contracts` (optional peer dependency for DSL resolution)
- `@anthropic-ai/claude-agent-sdk` (optional, for the Claude adapter; requires zod ^4.0.0)
- `@google/adk` (optional, for the Gemini/ADK adapter)
- `@google/genai` (optional, for the legacy Gemini adapter)
- `@openai/agents` (optional, for the OpenAI adapter; requires zod ^4.0.0)

## License

MIT