claude-flow

Version:

Ruflo - Enterprise AI agent orchestration for Claude Code. Deploy 60+ specialized agents in coordinated swarms with self-learning, fault-tolerant consensus, vector memory, and MCP integration

github.com/ruvnet/claude-flow

ruvnet/claude-flow

30 lines • 1.29 kB

TypeScript

View Raw

/** * V3 CLI Performance Capability Benchmark * * Runs a small verifiable-answer corpus through the Anthropic API and reports * pass-rate, latency, and cost. Closes the capability-evaluation gap that * `performance benchmark --suite agent` does NOT cover — that suite measures * the agent control plane (router, memory, hooks) without LLM calls; this * subcommand measures the actual model's ability to solve agent-style tasks. * * Features: * - Parallel execution with configurable concurrency * - Multi-model comparison in a single run (`--models a,b,c`) * - Per-task max-tokens overrides (declared in the fixture) * - Configurable corpus via `--questions <path>` * * Inspired by GAIA / SWE-bench / GSM8K but text-only and scoreable via * substring / exact match — no web browsing, no file attachments, no * Hugging Face dataset download. * * API key resolution (in order): * 1. $ANTHROPIC_API_KEY env var * 2. `gcloud secrets versions access latest --secret=ANTHROPIC_API_KEY` * 3. Fail with a clear error * * Refs: #2156 (Dream Cycle 2026-05-27 capabilities scan) */ import type { Command } from '../types.js'; declare const capabilityCommand: Command; export default capabilityCommand; //# sourceMappingURL=performance-capability.d.ts.map