UNPKG

claude-flow

Version:

Ruflo - Enterprise AI agent orchestration for Claude Code. Deploy 60+ specialized agents in coordinated swarms with self-learning, fault-tolerant consensus, vector memory, and MCP integration

40 lines 1.72 kB
/** * V3 CLI gaia-bench Command — ADR-133-PR8 + ADR-135 Tracks A/B/D/E/Q + ADR-136 Track Q * * Runs GAIA benchmark questions through the claude-flow agent loop and * reports pass-rate, cost, and per-question results. * * Contract (matches gaia-benchmark.yml workflow expectations): * node bin/cli.js gaia-bench run \ * --level <1|2|3> \ * --limit <N> \ * --models <csv> \ * --output json * * JSON output shape: * { * level: number, * model: string, * summary: { total, passed, passRate, estCostUsd, hardnessDist? }, * results: [{ task_id, question, model, correct, answer, expected_output, error }] * } * * Integration (iter 39 — ADR-135): * Wires standalone track modules into the CLI so they are usable end-to-end. * - Track A (--voting-attempts N) : multi-attempt self-consistency voting * - Track B (--planning-interval N) : periodic planning checkpoints in gaia-agent * - Track D (--enable-critic) : adversarial critic review after agent answer * - Track E (--decompose) : question decomposition for multi-step Qs * - Track Q (--hardness-routing) : hardness-based compute allocation * * Precedence when flags combine: * --hardness-routing overrides --max-turns and --voting-attempts per question. * --voting-attempts > 1 takes precedence over --enable-critic (cost containment). * --decompose works independently; sub-question answers feed into voting/critic/plain. * * Refs: ADR-133, ADR-135, ADR-136, #2165, iter 28/34/36/37/39 */ import type { Command } from '../types.js'; export declare const gaiaBenchCommand: Command; export default gaiaBenchCommand; //# sourceMappingURL=gaia-bench.d.ts.map