claude-flow
Version:
Ruflo - Enterprise AI agent orchestration for Claude Code. Deploy 60+ specialized agents in coordinated swarms with self-learning, fault-tolerant consensus, vector memory, and MCP integration
30 lines • 1.29 kB
TypeScript
/**
* V3 CLI Performance Capability Benchmark
*
* Runs a small verifiable-answer corpus through the Anthropic API and reports
* pass-rate, latency, and cost. Closes the capability-evaluation gap that
* `performance benchmark --suite agent` does NOT cover — that suite measures
* the agent control plane (router, memory, hooks) without LLM calls; this
* subcommand measures the actual model's ability to solve agent-style tasks.
*
* Features:
* - Parallel execution with configurable concurrency
* - Multi-model comparison in a single run (`--models a,b,c`)
* - Per-task max-tokens overrides (declared in the fixture)
* - Configurable corpus via `--questions <path>`
*
* Inspired by GAIA / SWE-bench / GSM8K but text-only and scoreable via
* substring / exact match — no web browsing, no file attachments, no
* Hugging Face dataset download.
*
* API key resolution (in order):
* 1. $ANTHROPIC_API_KEY env var
* 2. `gcloud secrets versions access latest --secret=ANTHROPIC_API_KEY`
* 3. Fail with a clear error
*
* Refs: #2156 (Dream Cycle 2026-05-27 capabilities scan)
*/
import type { Command } from '../types.js';
declare const capabilityCommand: Command;
export default capabilityCommand;
//# sourceMappingURL=performance-capability.d.ts.map