@clawdcc/cvm-benchmark
# @clawd/cvm-benchmark
Comprehensive benchmarking and performance analysis tools for Claude Code versions managed by CVM.
## Features
- **--version Spawn Benchmarks**: Measure Claude startup time using `--version` flag
- **Interactive PTY Benchmarks**: Measure full interactive startup time with terminal signals
- **Multi-Run Comparison**: Compare multiple benchmark runs to verify consistency
- **HTML Reports**: Beautiful Chart.js-powered performance visualization
- **Session Cleanup**: Automatic cleanup of test sessions for fair benchmarking
- **Viability Detection**: Identify minimum viable Claude Code versions
## Installation
### Method 1: NPM + Symlink (Recommended)
```bash
# Install via NPM
npm install -g @clawd/cvm-benchmark
# Link to CVM plugins directory
ln -s "$(npm root -g)/@clawd/cvm-benchmark/index.js" ~/.cvm/plugins/benchmark.js
# Verify installation
cvm plugins
```
### Method 2: Direct Clone
```bash
# Clone into a local directory
git clone https://github.com/clawd/cvm-benchmark.git
cd cvm-benchmark
npm install
# Link to CVM
ln -s "$(pwd)/index.js" ~/.cvm/plugins/benchmark.js
# Verify
cvm plugins
```
## Usage
### As CVM Plugin (Primary Method)
```bash
# Benchmark a specific version (interactive startup test)
cvm benchmark 2.0.42
# Benchmark all installed versions
cvm benchmark --all
# Compare multiple benchmark runs
cvm benchmark --compare 1 2
# Check loaded plugins
cvm plugins
```
### Standalone Usage
```bash
# Run comprehensive benchmark suite
node index.js all
# Compare benchmark runs
node index.js compare 1 2
# Run specific benchmarks
node lib/benchmark-version.js
node lib/benchmark-interactive.js 2.0.42
node lib/benchmark-interactive-all.js
node lib/comprehensive-suite.js 3
```
## Benchmark Types
### 1. --version Spawn Test
- Spawns Claude with `--version` flag
- Measures process spawn and execution time
- Fast, simple performance indicator
- Ideal for quick version comparison
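
As a rough illustration of the spawn test described above, the sketch below times a `--version` invocation over several runs. It is not the plugin's actual implementation: `timeSpawn` is a hypothetical helper, and the current Node binary is used as a stand-in target so the snippet runs without a Claude installation.

```javascript
const { spawnSync } = require('node:child_process');

// Time `command args...` over several runs; returns elapsed milliseconds per run.
function timeSpawn(command, args, runs = 3) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    const child = spawnSync(command, args, { stdio: 'ignore' });
    const end = process.hrtime.bigint();
    if (child.error) throw child.error; // e.g. ENOENT when the binary is missing
    times.push(Number(end - start) / 1e6); // nanoseconds -> milliseconds
  }
  return times;
}

// Stand-in target: the current Node binary, so the sketch runs anywhere.
// Swap in the path to a CVM-managed Claude binary to measure a real version.
const times = timeSpawn(process.execPath, ['--version'], 3);
const average = times.reduce((sum, t) => sum + t, 0) / times.length;
console.log(`avg ${average.toFixed(1)} ms over ${times.length} runs`);
```

The measurement covers process creation plus execution, which matches the "spawn and execution time" wording above; the first run is often slower because of cold file-system caches.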
### 2. Interactive PTY Test
- Spawns Claude in pseudo-terminal (PTY)
- Detects ready state via terminal signals:
  - `ESC[?2004h` - Bracketed paste mode
  - `ESC[?1004h` - Focus events
  - `> ` - Prompt character
- No timeout-based detection (signal-based only)
- Measures real interactive startup time
- Cleans up session files after each run
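
The signal-based detection above can be sketched as a small accumulator that scans PTY output for the three markers. This is a minimal sketch under assumptions, not the plugin's code: `createReadyDetector` and its return shape are hypothetical, and only the escape sequences and the prompt string come from the list above.

```javascript
// Escape sequences and prompt named in the list above.
const SIGNALS = {
  bracketedPaste: '\x1b[?2004h',
  focusEvents: '\x1b[?1004h',
  prompt: '> ',
};

// Accumulates PTY output and reports which signals have appeared so far.
function createReadyDetector() {
  let buffer = '';
  return {
    feed(chunk) {
      buffer += chunk;
      const seen = {};
      for (const [name, marker] of Object.entries(SIGNALS)) {
        seen[name] = buffer.includes(marker);
      }
      return { signals: seen, ready: Object.values(seen).every(Boolean) };
    },
  };
}

const detector = createReadyDetector();
const first = detector.feed('\x1b[?2004h\x1b[?10'); // escape sequence split across chunks
const second = detector.feed('04h\r\n> ');
console.log(first.ready, second.ready); // false true
```

Output is buffered rather than matched per chunk because a terminal may deliver an escape sequence in pieces. With `node-pty`, `feed` could be called from the spawned terminal's `onData` callback.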
**Trust Prompt Handling:**
- The benchmark runs with `cwd: process.cwd()`, i.e. the directory the script is launched from
- Older versions (0.2.x, 1.0.x) show a "Do you trust the files" security prompt
- The benchmark auto-accepts the prompt by sending the Enter key
- Each version is spawned fresh in the benchmark directory
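
One way the auto-accept step could look is sketched below. `createTrustPromptHandler` is a hypothetical helper, and the `write` argument is a stand-in for a PTY's write method; only the prompt text is taken from the description above.

```javascript
// Prompt text quoted in the section above; matching is case-insensitive here.
const TRUST_PROMPT = /do you trust the files/i;

// Returns a handler that sends Enter once, the first time the prompt appears.
function createTrustPromptHandler(write) {
  let buffer = '';
  let answered = false;
  return function onData(chunk) {
    buffer += chunk;
    if (!answered && TRUST_PROMPT.test(buffer)) {
      answered = true;
      write('\r'); // carriage return is what a terminal sends for Enter
    }
    return answered;
  };
}

// Fake writer standing in for a PTY's write method.
const sent = [];
const onData = createTrustPromptHandler((data) => sent.push(data));
onData('Do you trust the fi'); // prompt split across chunks
onData('les in this folder?');
onData('more output');
console.log(sent.length); // 1
```

The `answered` flag keeps the handler from sending Enter again if the prompt text stays in the buffer. If the real prompt interleaves styling escape sequences between words, the pattern would need to strip them first.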
**Version Requirement Detection:**
- Detects versions < 1.0.24 that show "needs update" error
- Extracts minimum version requirement (expected: 1.0.24)
- Returns `result: 'error_detected'` with full error message
- Warns if minimum version changes from 1.0.24
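
The extraction step might be approximated as follows. This is a sketch under stated assumptions: the sample error text is invented, and the `error`, `minimumVersion`, and `minimumChanged` field names are hypothetical; only `result: 'error_detected'` and the expected minimum of 1.0.24 come from the description above.

```javascript
const EXPECTED_MINIMUM = '1.0.24';

// Pull the first x.y.z version number out of an error message, or null.
function extractMinimumVersion(message) {
  const match = message.match(/\b\d+\.\d+\.\d+\b/);
  return match ? match[0] : null;
}

function checkVersionRequirement(message) {
  const minimum = extractMinimumVersion(message);
  return {
    result: 'error_detected',
    error: message, // full error message, kept verbatim
    minimumVersion: minimum,
    minimumChanged: minimum !== null && minimum !== EXPECTED_MINIMUM,
  };
}

// Invented sample text; the real error wording may differ.
const sample = 'Your version needs an update. Please update to 1.0.24 or higher.';
const outcome = checkVersionRequirement(sample);
if (outcome.minimumChanged) {
  console.warn(`minimum version is now ${outcome.minimumVersion}`);
}
console.log(outcome.minimumVersion); // 1.0.24
```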
## Output Structure
All benchmark data is stored in `~/.cvm/benchmarks/`:
```
~/.cvm/benchmarks/
├── benchmarks-all-3run.json # --version benchmarks for all versions
├── benchmark-startup-{version}.json # Individual interactive benchmarks
├── STARTUP_COMPARISON.html # Generated performance report
├── run-1/ # Multi-run comparison data
│ ├── version/
│ │ └── benchmarks-all-3run.json
│ ├── interactive/
│ │ ├── benchmark-startup-0-2-9.json
│ │ ├── benchmark-startup-2-0-42.json
│ │ └── ...
│ └── metadata.json
└── run-2/
└── ...
```
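
To read results back from this layout, a path helper along these lines may be useful. `interactiveResultPath` is a hypothetical name; the directory names and the dots-to-dashes file naming are inferred from the tree above.

```javascript
const os = require('node:os');
const path = require('node:path');

const BENCHMARK_ROOT = path.join(os.homedir(), '.cvm', 'benchmarks');

// Path of the interactive result for `version` inside run directory `run-N`.
// File names in the tree above use dashes in place of dots (2.0.42 -> 2-0-42).
function interactiveResultPath(runNumber, version) {
  const slug = version.replace(/\./g, '-');
  return path.join(
    BENCHMARK_ROOT,
    `run-${runNumber}`,
    'interactive',
    `benchmark-startup-${slug}.json`
  );
}

console.log(interactiveResultPath(1, '2.0.42'));
```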
## Reports
### Performance Report
Generated from individual benchmark runs, showing:
- --version spawn times across all versions
- Interactive startup times across all versions
- Version viability markers (1.0.24+)
- Performance trends and outliers
### Comparison Report
Overlays multiple benchmark runs to show:
- Measurement consistency across runs
- Performance variance
- Reliability of benchmark data
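
Consistency across runs can be quantified with the coefficient of variation (standard deviation divided by mean). The sketch below is illustrative only: `summarize` is a hypothetical helper, the input times are made up, and the report itself may use a different statistic.

```javascript
// Mean, sample standard deviation, and coefficient of variation (stddev / mean).
function summarize(times) {
  const n = times.length;
  const mean = times.reduce((sum, t) => sum + t, 0) / n;
  const variance =
    n > 1 ? times.reduce((sum, t) => sum + (t - mean) ** 2, 0) / (n - 1) : 0;
  const stddev = Math.sqrt(variance);
  return { mean, stddev, cv: stddev / mean };
}

// Made-up startup times for one version across three runs.
const stats = summarize([980, 1000, 1020]);
console.log(stats.mean, stats.stddev, stats.cv.toFixed(3)); // 1000 20 0.020
```

A low coefficient of variation for a version suggests its measurements are repeatable; a high one hints at noise from background load or cold caches.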
## Version States
The benchmark tool detects three version states:
1. **error_detected**: Pre-0.2.103 versions that show an error before the UI appears
2. **ui_then_exit**: Versions 0.2.103-1.0.23 that show the UI with an error, then exit immediately
3. **ready**: Versions 1.0.24+ that are actually interactive
Minimum viable version: **1.0.24**
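
The version ranges above map to states as in this sketch. `compareVersions` and `expectedState` are hypothetical helpers, and the tool itself detects states from observed behaviour rather than from the version number; the snippet only encodes the documented boundaries.

```javascript
// Compare two x.y.z version strings numerically; negative when a < b.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// State a version is expected to end up in, per the ranges listed above.
function expectedState(version) {
  if (compareVersions(version, '0.2.103') < 0) return 'error_detected';
  if (compareVersions(version, '1.0.24') < 0) return 'ui_then_exit';
  return 'ready';
}

console.log(expectedState('0.2.9'), expectedState('1.0.23'), expectedState('2.0.42'));
// error_detected ui_then_exit ready
```

The comparison is numeric per component because a plain string comparison would rank `0.2.9` above `0.2.103`.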
## Performance Data
Example benchmark results:
```json
{
  "version": "2.0.42",
  "results": [
    {
      "time": 980,
      "result": "ready",
      "reason": "all terminal signals received and process stable",
      "signals": {
        "bracketedPaste": true,
        "focusEvents": true,
        "prompt": true
      }
    }
  ]
}
```
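
A result object of this shape could be summarized as in the sketch below. `summarizeResults` is a hypothetical helper, the extra sample entries are invented, and `time` is assumed to be in milliseconds, which the example does not state.

```javascript
// Count the runs that reached `ready` and report the median of their times.
function summarizeResults(benchmark) {
  const readyTimes = benchmark.results
    .filter((r) => r.result === 'ready')
    .map((r) => r.time)
    .sort((a, b) => a - b);
  if (readyTimes.length === 0) {
    return { version: benchmark.version, readyCount: 0, median: null };
  }
  const mid = Math.floor(readyTimes.length / 2);
  const median =
    readyTimes.length % 2 === 1
      ? readyTimes[mid]
      : (readyTimes[mid - 1] + readyTimes[mid]) / 2;
  return { version: benchmark.version, readyCount: readyTimes.length, median };
}

// Invented sample data in the same shape as the example above.
const sampleBenchmark = {
  version: '2.0.42',
  results: [
    { time: 980, result: 'ready' },
    { time: 1010, result: 'ready' },
    { time: 995, result: 'ready' },
    { time: 120, result: 'error_detected' },
  ],
};
const summary = summarizeResults(sampleBenchmark);
console.log(summary.readyCount, summary.median); // 3 995
```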
## API
### Plugin API
```javascript
module.exports = {
  name: 'benchmark',
  version: '0.1.0',
  description: 'Benchmark and analyze Claude Code performance',
  commands: [...],
  hooks: {
    afterInstall: (version) => { /* ... */ }
  }
};
```
### Module API
```javascript
const benchmarkVersion = require('@clawd/cvm-benchmark/lib/benchmark-version');
const benchmarkInteractive = require('@clawd/cvm-benchmark/lib/benchmark-interactive');
const compareRuns = require('@clawd/cvm-benchmark/lib/compare-runs');
const comprehensiveSuite = require('@clawd/cvm-benchmark/lib/comprehensive-suite');
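// Run benchmarks (in CommonJS, `await` must sit inside an async function)
async function main() {
  await benchmarkVersion.run({ runs: 3 });
  await benchmarkInteractive.run('2.0.42', 3);
  await comprehensiveSuite.runAll({ runNumber: 3 });
  compareRuns.compare(['1', '2', '3']);
}

main().catch(console.error);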
```
## Requirements
- Node.js >= 14.0.0
- CVM installed with at least one Claude Code version
- `node-pty` for interactive benchmarks
## Development
```bash
# Clone the repo
git clone https://github.com/clawd/cvm-benchmark.git
cd cvm-benchmark
# Install dependencies
npm install
# Run tests
npm test
# Run benchmarks
node lib/benchmark-interactive.js 2.0.42
```
## License
MIT
## Related Projects
- [@clawd/cvm](https://github.com/clawd/cvm) - Claude Version Manager
- [Claude Code](https://claude.com/code) - Official CLI for Claude
## Credits
Built by the CVM team to enable comprehensive performance testing across all Claude Code versions.
---
**Status**: Production-ready, actively used for benchmarking 249 Claude Code versions (0.2.x → 2.0.x)