UNPKG

claude-flow

Version:

Ruflo - Enterprise AI agent orchestration for Claude Code. Deploy 60+ specialized agents in coordinated swarms with self-learning, fault-tolerant consensus, vector memory, and MCP integration

123 lines 5.47 kB
/** * GAIA Adversarial Critic — ADR-135 Track D * * After the main agent produces a candidate answer, a Sonnet pass reviews it. * If the critic verdict is "fail", the orchestrator re-runs the agent with the * critique injected as additional context. * * Motivation (iter 29 finding): tool quality is the bottleneck on L1 (~20.8%). * The critic catches wrong-because-of-bad-tool-result answers BEFORE submission, * without requiring better search backends. Expected L1 lift: +3-5pp. * * Design: * - NEW file only; NOT wired into gaia-bench.ts yet to avoid merge conflicts * with in-flight iter 29/31/34 branches. Wiring is a small follow-up PR. * - `criticReview()` — single Sonnet call, returns structured verdict. * - `runGaiaAgentWithCritic()` — orchestration wrapper: runs agent, calls * critic, retries once on "fail". "uncertain" is treated as "pass" (don't * burn retries on borderline cases). * - API errors and malformed JSON from the critic are caught; original * candidate is returned with an error-flagged verdict. * - Default opt-in: enableCritic=false. Set true via RunWithCriticOptions. * * Cost discipline: * - Critic uses claude-sonnet-4-6 (separate from the agent's default Haiku). * - One critic call + one optional retry = max 2 extra Sonnet calls per Q. * - Approximate extra cost per question: ~$0.003-0.005 (well within budget). * * Plugin sync TODO (follow-up PR after gaia-bench wiring): * - Update plugins/ruflo-workflows/commands/gaia-run.md with --enable-critic flag. * - Update plugins/ruflo-workflows/skills/gaia-debugging/SKILL.md: add critic * as a recommended diagnostic step for wrong-answer analysis. * * Refs: ADR-135, ADR-133, iter 29 finding, #2156 */ import { GaiaQuestion } from './gaia-loader.js'; import { GaiaAgentResult, GaiaAgentOptions } from './gaia-agent.js'; /** * Structured verdict returned by the adversarial critic. * * - "pass" → critic agrees with the candidate; no retry needed. * - "fail" → critic found a flaw; suggestedRevision provided if possible. * - "uncertain" → critic is unsure; treated as "pass" to avoid burning retries. */ export type CriticVerdictType = 'pass' | 'fail' | 'uncertain'; export interface CriticVerdict { /** Critic's assessment of the candidate answer. */ verdict: CriticVerdictType; /** Short explanation of the reasoning (1-3 sentences). */ reasoning: string; /** Suggested correction when verdict is "fail"; may be empty string. */ suggestedRevision?: string; /** Estimated USD cost of this critic call. */ costUsd: number; /** True when the critic call itself failed (API error, malformed JSON). */ error?: boolean; /** Original raw response text when parse failed. */ rawResponse?: string; } export interface CriticOptions { /** * Model to use for the critic pass. * Defaults to 'claude-sonnet-4-6' (intentionally separate from agent model). */ model?: string; /** * Maximum number of agent retries on "fail" verdict. * Default: 1 (one retry). Set to 0 to disable retries (observe-only mode). */ maxRetries?: number; /** Anthropic API key (resolved via env/gcloud if omitted). */ apiKey?: string; } export interface GaiaAgentResultWithCritic extends GaiaAgentResult { /** All critic verdicts collected (one per agent attempt). */ criticVerdicts: CriticVerdict[]; /** How many retries were actually attempted (0 if critic passed first time). */ retriesAttempted: number; } /** Options for runGaiaAgentWithCritic, extending core agent options. */ export interface RunWithCriticOptions extends GaiaAgentOptions { /** * Enable the adversarial critic pass. Default: false. * When false, runGaiaAgentWithCritic behaves identically to runGaiaAgent. */ enableCritic?: boolean; /** Critic-specific configuration. */ criticOptions?: CriticOptions; } /** * Run the adversarial critic against a candidate answer. * * @param question - The GAIA question being evaluated. * @param candidateAnswer - The agent's proposed final answer. * @param trajectory - Lightweight trajectory summary from the agent run. * @param options - Critic configuration (model, apiKey). * @returns CriticVerdict with verdict, reasoning, cost. */ export declare function criticReview(question: GaiaQuestion, candidateAnswer: string, trajectory: { steps: Array<{ tool?: string; result?: string; reasoning?: string; }>; turns: number; }, options?: CriticOptions): Promise<CriticVerdict>; /** * Run the GAIA agent with an optional adversarial critic pass. * * When enableCritic=false (default), this is a thin pass-through to * runGaiaAgent with an empty criticVerdicts array. * * When enableCritic=true: * 1. Run runGaiaAgent normally. * 2. If the agent produced a finalAnswer, call criticReview. * 3. If verdict is "fail" and retriesAttempted < maxRetries: * a. Re-run the agent with the critique injected as additional context. * b. Call criticReview again on the new answer. * 4. Return the final result with all critic verdicts attached. * * Note on "uncertain": treated as "pass" (no retry triggered). */ export declare function runGaiaAgentWithCritic(question: GaiaQuestion, options?: RunWithCriticOptions): Promise<GaiaAgentResultWithCritic>; //# sourceMappingURL=gaia-critic.d.ts.map