claude-flow
Version:
Ruflo - Enterprise AI agent orchestration for Claude Code. Deploy 60+ specialized agents in coordinated swarms with self-learning, fault-tolerant consensus, vector memory, and MCP integration
40 lines • 1.72 kB
TypeScript
/**
* V3 CLI gaia-bench Command — ADR-133-PR8 + ADR-135 Tracks A/B/D/E/Q + ADR-136 Track Q
*
* Runs GAIA benchmark questions through the claude-flow agent loop and
* reports pass-rate, cost, and per-question results.
*
* Contract (matches gaia-benchmark.yml workflow expectations):
* node bin/cli.js gaia-bench run \
* --level <1|2|3> \
* --limit <N> \
* --models <csv> \
* --output json
*
* JSON output shape:
* {
* level: number,
* model: string,
* summary: { total, passed, passRate, estCostUsd, hardnessDist? },
* results: [{ task_id, question, model, correct, answer, expected_output, error }]
* }
*
* Integration (iter 39 — ADR-135):
* Wires standalone track modules into the CLI so they are usable end-to-end.
* - Track A (--voting-attempts N) : multi-attempt self-consistency voting
* - Track B (--planning-interval N) : periodic planning checkpoints in gaia-agent
* - Track D (--enable-critic) : adversarial critic review after agent answer
* - Track E (--decompose) : question decomposition for multi-step Qs
* - Track Q (--hardness-routing) : hardness-based compute allocation
*
* Precedence when flags combine:
* --hardness-routing overrides --max-turns and --voting-attempts per question.
* --voting-attempts > 1 takes precedence over --enable-critic (cost containment).
* --decompose works independently; sub-question answers feed into voting/critic/plain.
*
* Refs: ADR-133, ADR-135, ADR-136, #2165, iter 28/34/36/37/39
*/
import type { Command } from '../types.js';
export declare const gaiaBenchCommand: Command;
export default gaiaBenchCommand;
//# sourceMappingURL=gaia-bench.d.ts.map