claude-flow
Ruflo - Enterprise AI agent orchestration for Claude Code. Deploy 60+ specialized agents in coordinated swarms with self-learning, fault-tolerant consensus, vector memory, and MCP integration
/**
 * GAIA End-to-End Smoke (ADR-133)
 *
 * Wires gaia-agent.ts + gaia-judge.ts into a single end-to-end pipeline:
 *
 *   for each question in SMOKE_FIXTURE:
 *     1. runGaiaAgent(question): Haiku agent loop, ≤8 turns
 *     2. judgeAnswer(question, result.finalAnswer): exact-match fast-path,
 *        Sonnet LLM-judge only if exact-match misses
 *
 * Reports: pass rate, total cost, mean turn count.
 * Asserts: ≥ 3/5 questions pass (lenient; smoke fixture is not trivial).
 *
 * Cost discipline:
 *   - Agent: claude-haiku-4-5 at $0.25/$1.25 per M tokens
 *   - Judge: claude-sonnet-4-6 at $3/$15 per M tokens (only when needed)
 *   - Expected total for 5 questions × ~2 turns × Haiku + 1-2 Sonnet
 *     judge calls ≈ $0.02
 *
 * Usage:
 *   ANTHROPIC_API_KEY=sk-ant-... npx tsx src/benchmarks/gaia-e2e-smoke.ts
 *
 * Refs: ADR-133, #2156
 */
declare function runE2ESmoke(): Promise<void>;
export { runE2ESmoke };
//# sourceMappingURL=gaia-e2e-smoke.d.ts.map
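
The declaration file only exposes `runE2ESmoke()`, so the pipeline described in the header comment is not visible here. The following is a minimal sketch of that flow under stated assumptions, not the package's implementation: the `Question`, `AgentResult`, `Verdict`, and `SmokeReport` shapes, the `normalize` rule, and the helper names `judgeWithFastPath`, `runSmoke`, and `tokenCostUsd` are all hypothetical. The agent and the LLM judge are passed in as plain functions so the sketch runs without an API key.

```typescript
// Hypothetical shapes: the real gaia-agent.ts / gaia-judge.ts exports are not shown in this file.
interface Question { id: string; prompt: string; expected: string; }
interface AgentResult { finalAnswer: string; turns: number; costUsd: number; }
interface Verdict { pass: boolean; usedLlmJudge: boolean; costUsd: number; }
interface SmokeReport { passed: number; total: number; totalCostUsd: number; meanTurns: number; }

type Agent = (q: Question) => Promise<AgentResult>;
type LlmJudge = (q: Question, answer: string) => Promise<{ pass: boolean; costUsd: number }>;

// Cost in USD for a call, given per-million-token input and output rates.
function tokenCostUsd(inputTokens: number, outputTokens: number, inRate: number, outRate: number): number {
  return (inputTokens * inRate + outputTokens * outRate) / 1_000_000;
}

// Assumed normalization for the exact-match fast path:
// trim, lowercase, and drop trailing periods or whitespace.
function normalize(s: string): string {
  return s.trim().toLowerCase().replace(/[.\s]+$/, "");
}

// Exact match first; fall back to the (paid) LLM judge only on a miss.
async function judgeWithFastPath(q: Question, answer: string, llmJudge: LlmJudge): Promise<Verdict> {
  if (normalize(answer) === normalize(q.expected)) {
    return { pass: true, usedLlmJudge: false, costUsd: 0 };
  }
  const r = await llmJudge(q, answer);
  return { pass: r.pass, usedLlmJudge: true, costUsd: r.costUsd };
}

// Runs every question through agent then judge, aggregates the report,
// and throws if fewer than minPass questions pass.
async function runSmoke(fixture: Question[], agent: Agent, llmJudge: LlmJudge, minPass: number): Promise<SmokeReport> {
  let passed = 0;
  let totalCostUsd = 0;
  let totalTurns = 0;
  for (const q of fixture) {
    const result = await agent(q);
    const verdict = await judgeWithFastPath(q, result.finalAnswer, llmJudge);
    if (verdict.pass) passed += 1;
    totalCostUsd += result.costUsd + verdict.costUsd;
    totalTurns += result.turns;
  }
  const report: SmokeReport = {
    passed,
    total: fixture.length,
    totalCostUsd,
    meanTurns: fixture.length > 0 ? totalTurns / fixture.length : 0,
  };
  if (passed < minPass) {
    throw new Error(`smoke failed: ${passed}/${fixture.length} passed, need ${minPass}`);
  }
  return report;
}
```

With the real modules, the wiring would presumably look like `runSmoke(SMOKE_FIXTURE, runGaiaAgent, sonnetJudge, 3)`, where `sonnetJudge` stands in for the Sonnet-backed part of `judgeAnswer`. The rates passed to `tokenCostUsd` would be the per-million-token figures quoted in the header comment; those are worth checking against current pricing before relying on the $0.02 estimate.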