# Performance Benchmarks
A comprehensive benchmarking suite for the Glassbox CLI tool, measuring execution speed, memory and network efficiency, startup overhead, cache behavior, and VS Code extension responsiveness.
## Overview
The performance benchmark suite provides detailed measurements across six key areas:
1. **Test Execution Performance** - Measures execution time for different suite sizes
2. **Memory Usage** - Monitors memory consumption during large test runs
3. **Network Efficiency** - Analyzes bandwidth usage and API call efficiency
4. **VS Code Extension Responsiveness** - Tests extension performance and UI responsiveness
5. **Startup Time** - Measures initialization overhead and startup performance
6. **Cache Performance** - Evaluates cache hit rates and storage efficiency
## Quick Start
### Run All Benchmarks
```bash
# Run complete benchmark suite
node src/benchmarks/index.js all
# Run with verbose logging
node src/benchmarks/index.js all --verbose
# Run with garbage collection enabled
node src/benchmarks/index.js all --gc
```
### Run Specific Categories
```bash
# Run only test execution benchmarks
node src/benchmarks/index.js category testExecution
# Run only memory benchmarks
node src/benchmarks/index.js category memory
# Run only network benchmarks
node src/benchmarks/index.js category network
```
### Run Individual Benchmarks
```bash
# Run specific benchmark
node src/benchmarks/index.js benchmark testExecution "Small Suite (5 tests)"
# Run VS Code extension responsiveness
node src/benchmarks/index.js benchmark vscode "Extension Command Responsiveness"
```
### List Available Benchmarks
```bash
# List all categories
node src/benchmarks/index.js list
# List benchmarks in a category
node src/benchmarks/index.js list testExecution
```
## Benchmark Categories
### 1. Test Execution Benchmarks
Measures execution time and performance for different test suite sizes and configurations.
**Available Benchmarks:**
- `Small Suite (5 tests)` - Tests with 5 test cases
- `Medium Suite (25 tests)` - Tests with 25 test cases
- `Large Suite (100 tests)` - Tests with 100 test cases
- `Extra Large Suite (500 tests)` - Tests with 500 test cases
- `Cached Execution` - Compares cached vs non-cached performance
- `Optimized Runner` - Tests optimized runner performance
- `Parallel vs Sequential` - Compares parallel and sequential execution
- `Model Performance Comparison` - Tests different AI models
**Metrics Measured** (see the sketch after this list):
- Execution time (average, min, max, standard deviation)
- Success rate
- Memory usage
- Network requests and bandwidth
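The timing statistics above can be derived from raw samples collected with `performance.now()`. A minimal sketch; the helpers shown here are illustrative and not part of the suite's API:
```javascript
import { performance } from 'node:perf_hooks';
// Derive average, min, max, and standard deviation from timing samples (ms).
function timingStats(samples) {
  const n = samples.length;
  const average = samples.reduce((a, b) => a + b, 0) / n;
  const variance = samples.reduce((a, b) => a + (b - average) ** 2, 0) / n;
  return {
    average,
    min: Math.min(...samples),
    max: Math.max(...samples),
    stdDev: Math.sqrt(variance),
  };
}
// Time an async workload over several iterations.
async function measure(run, iterations = 3) {
  const samples = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await run();
    samples.push(performance.now() - start);
  }
  return timingStats(samples);
}
```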
### 2. Memory Benchmarks
Monitors memory usage patterns, detects memory leaks, and measures memory efficiency.
**Available Benchmarks:**
- `Large Test Memory Usage` - Memory usage during large test runs
- `Optimized Runner Memory` - Memory usage with optimized runner
- `Cached Memory Usage` - Memory impact of caching
- `Memory Leak Detection` - Detects potential memory leaks
- `Concurrency Memory Usage` - Memory usage with different concurrency levels
- `Streaming Memory Usage` - Memory usage with streaming responses
- `Response Size Memory Usage` - Memory usage with different response sizes
**Metrics Measured** (see the sketch after this list):
- RSS (Resident Set Size)
- Heap usage (used, total, external)
- Memory growth patterns
- Memory leak detection
- Garbage collection impact
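As a rough illustration of how such sampling can work (not the suite's internal implementation), Node's standard `process.memoryUsage()` can be polled on an interval; `global.gc` is only available when Node is started with `--expose-gc`:
```javascript
// Poll process.memoryUsage() on a fixed interval and collect samples.
function startMemorySampler(intervalMs = 100) {
  const samples = [];
  const timer = setInterval(() => {
    const { rss, heapUsed, heapTotal, external } = process.memoryUsage();
    samples.push({ ts: Date.now(), rss, heapUsed, heapTotal, external });
  }, intervalMs);
  return { stop: () => { clearInterval(timer); return samples; } };
}
// Naive leak heuristic: heapUsed keeps growing even after a forced GC.
// Requires node --expose-gc for global.gc to exist.
function looksLikeLeak(samples, thresholdBytes = 10 * 1024 * 1024) {
  if (typeof global.gc === 'function') global.gc();
  const first = samples[0]?.heapUsed ?? 0;
  const last = samples.at(-1)?.heapUsed ?? 0;
  return last - first > thresholdBytes;
}
```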
### 3. Network Benchmarks
Analyzes network performance, latency, throughput, and connection efficiency.
**Available Benchmarks:**
- `Network Latency` - Measures API call latency
- `Network Throughput` - Tests data transfer rates
- `Connection Pooling` - Tests connection reuse efficiency
- `Batch Size Efficiency` - Tests different batch sizes
- `Cached Network Efficiency` - Network efficiency with caching
- `Network Error Handling` - Tests error handling and retries
- `Streaming Efficiency` - Compares streaming vs non-streaming
- `Model Network Performance` - Network performance with different models
**Metrics Measured** (see the sketch after this list):
- Request latency (average, min, max, p95, p99)
- Network throughput (MB/s)
- Request count and error rates
- Data transfer efficiency
- Connection pooling effectiveness
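Percentile latencies such as p95 and p99 are computed from a sorted set of samples. A minimal nearest-rank sketch (illustrative, not the suite's aggregation code; the sample values are hypothetical):
```javascript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, Math.min(sorted.length - 1, rank))];
}
const latencies = [120, 95, 310, 88, 102, 99, 450, 97]; // hypothetical samples
console.log({ p95: percentile(latencies, 95), p99: percentile(latencies, 99) });
```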
### 4. VS Code Extension Benchmarks
Tests VS Code extension performance, command responsiveness, and UI updates.
**Available Benchmarks:**
- `Extension Command Responsiveness` - Tests command execution speed
- `UI Update Responsiveness` - Measures UI update performance
- `Extension Startup Time` - Tests extension activation time
- `File Validation Performance` - Tests file validation speed
- `Extension Memory Usage` - Monitors extension memory usage
**Metrics Measured** (see the sketch after this list):
- Command execution time
- UI update latency
- Extension startup time
- File validation performance
- Memory usage patterns
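Command latency can be measured inside the extension host with the standard `vscode.commands.executeCommand` API. A minimal sketch, for use in an extension or its integration tests; the command ID shown is hypothetical:
```javascript
import * as vscode from 'vscode';
import { performance } from 'node:perf_hooks';
// Measure the round-trip time of a VS Code command in milliseconds.
export async function timeCommand(commandId, ...args) {
  const start = performance.now();
  await vscode.commands.executeCommand(commandId, ...args);
  return performance.now() - start;
}
// Usage (hypothetical command ID):
// const ms = await timeCommand('glassbox.runTests');
```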
### 5. Startup Benchmarks
Measures startup time, initialization overhead, and system boot performance.
**Available Benchmarks:**
- `CLI Startup Time` - Tests CLI initialization
- `Module Loading Performance` - Tests module loading speed
- `Configuration Parsing` - Tests config parsing performance
- `Cache Initialization` - Tests cache setup time
- `Optimized Runner Initialization` - Tests runner setup
- `File System Initialization` - Tests file system setup
- `Validation Initialization` - Tests validation system setup
- `Cold vs Warm Startup` - Compares cold and warm startup times
- `Startup Memory Usage` - Monitors memory during startup
**Metrics Measured** (see the sketch after this list):
- Startup time (average, min, max)
- Module loading times
- Initialization step timing
- Memory usage during startup
- Cold vs warm startup performance
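Cold startup can be approximated by spawning a fresh Node process and timing a cheap command such as `list`. A minimal sketch using Node's standard `child_process` module (illustrative, not the suite's internal measurement):
```javascript
import { spawnSync } from 'node:child_process';
import { performance } from 'node:perf_hooks';
// One cold run: fresh process, full module loading and initialization.
const start = performance.now();
spawnSync('node', ['src/benchmarks/index.js', 'list'], { stdio: 'ignore' });
console.log(`Cold startup + list: ${(performance.now() - start).toFixed(1)} ms`);
```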
### 6. Cache Benchmarks
Evaluates cache performance, hit rates, storage efficiency, and cache management.
**Available Benchmarks:**
- `Cache Hit Rate Performance` - Tests cache hit rates
- `Cache Storage Efficiency` - Tests storage optimization
- `Cache Invalidation Performance` - Tests cache clearing speed
- `Cache TTL Performance` - Tests time-to-live functionality
- `Cache Memory Usage` - Monitors cache memory usage
- `Cache Key Distribution` - Tests key distribution efficiency
- `Cache Compression Efficiency` - Tests compression ratios
- `Cache Concurrent Access` - Tests concurrent access patterns
- `Cache Persistence Performance` - Tests cache persistence
**Metrics Measured** (see the sketch after this list):
- Cache hit rates
- Storage efficiency
- Compression ratios
- Invalidation performance
- Memory usage patterns
- Concurrent access performance
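Hit rate is the fraction of lookups served from the cache. A minimal bookkeeping sketch (the counter names are illustrative, not the suite's internal fields):
```javascript
// Track cache lookups and report the hit rate as a percentage.
const stats = { hits: 0, misses: 0 };
function recordLookup(found) {
  found ? stats.hits++ : stats.misses++;
}
function hitRate() {
  const total = stats.hits + stats.misses;
  return total === 0 ? 0 : (stats.hits / total) * 100;
}
```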
## Output and Reports
### JSON Reports
Benchmark results are saved as JSON files in the `benchmarks/results/` directory:
```
benchmarks/results/
├── benchmark-2024-01-15T10-30-00-000Z.json
├── comprehensive-report.json
└── performance-report.html
```
### HTML Reports
Automatically generated HTML reports provide a visual representation of benchmark results:
- **Summary Dashboard** - Overview of all benchmark categories
- **Category Performance** - Detailed breakdown by category
- **Recommendations** - Performance improvement suggestions
- **Interactive Charts** - Visual performance metrics
### Report Structure
```json
{
"timestamp": "2024-01-15T10:30:00.000Z",
"platform": {
"platform": "darwin",
"arch": "x64",
"nodeVersion": "v18.17.0"
},
"summary": {
"totalBenchmarks": 48,
"successfulBenchmarks": 45,
"failedBenchmarks": 3,
"averageExecutionTime": 1250.5,
"peakMemoryUsage": 256000000,
"totalNetworkUsage": 15.2,
"cacheHitRate": 78.5
},
"categories": {
"testExecution": { /* benchmark results */ },
"memory": { /* benchmark results */ },
"network": { /* benchmark results */ },
"vscode": { /* benchmark results */ },
"startup": { /* benchmark results */ },
"cache": { /* benchmark results */ }
},
"recommendations": [
{
"category": "memory",
"benchmark": "Large Test Memory Usage",
"type": "warning",
"message": "High memory usage detected. Consider optimizing memory allocation."
}
]
}
```
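Saved reports can also be consumed programmatically, for example to surface the summary in other tooling. A minimal sketch reading the comprehensive report listed above, assuming it follows the structure shown:
```javascript
import { readFile } from 'node:fs/promises';
const report = JSON.parse(
  await readFile('./benchmarks/results/comprehensive-report.json', 'utf8')
);
console.log(`Passed: ${report.summary.successfulBenchmarks}/${report.summary.totalBenchmarks}`);
console.log(`Cache hit rate: ${report.summary.cacheHitRate}%`);
```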
## Configuration Options
### Benchmark Runner Options
```javascript
const suite = new PerformanceBenchmarkSuite({
outputDir: './benchmarks/results', // Output directory
detailedLogging: false, // Enable verbose logging
enableGarbageCollection: false, // Enable GC during benchmarks
iterations: 3, // Number of benchmark iterations
warmupRuns: 2, // Number of warmup runs
memorySampling: 100 // Memory sampling interval (ms)
});
```
### Environment Variables
```bash
# Enable garbage collection and set the memory limit.
# Combine flags in a single NODE_OPTIONS value; a second export
# would overwrite the first.
export NODE_OPTIONS="--expose-gc --max-old-space-size=4096"
# Enable detailed logging
export DEBUG="glassbox:benchmarks"
```
## Performance Recommendations
The benchmark suite automatically generates recommendations based on performance thresholds:
### Memory Recommendations
- **High Memory Usage** (>500MB): Consider optimizing memory allocation
- **Memory Leaks**: Review object lifecycle management
- **Poor Memory Recovery**: Check garbage collection patterns
### Network Recommendations
- **High Latency** (>1s): Consider connection pooling or caching
- **Low Throughput**: Optimize batch sizes or use streaming
- **High Error Rates**: Implement better retry strategies
### Cache Recommendations
- **Low Hit Rate** (<60%): Adjust cache strategy or TTL
- **High Storage Usage**: Consider compression or cleanup
- **Slow Invalidation**: Optimize cache clearing mechanisms
### Performance Recommendations
- **Slow Execution** (>5s): Consider parallelization or optimization
- **Slow Startup** (>500ms): Optimize initialization sequence
- **Poor UI Responsiveness** (>33ms, i.e. exceeding a 30 fps frame budget): Optimize UI update patterns
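The suite applies these checks automatically, but the idea can be sketched as a simple mapping from summary metrics to recommendation objects in the shape shown under Report Structure (the exact rules inside the suite may differ):
```javascript
// Map summary metrics to recommendations; thresholds mirror the lists above.
function checkThresholds(summary) {
  const recommendations = [];
  if (summary.peakMemoryUsage > 500 * 1024 * 1024) { // bytes
    recommendations.push({
      category: 'memory',
      type: 'warning',
      message: 'High memory usage detected. Consider optimizing memory allocation.',
    });
  }
  if (summary.averageExecutionTime > 5000) { // ms
    recommendations.push({
      category: 'testExecution',
      type: 'warning',
      message: 'Slow execution detected. Consider parallelization or optimization.',
    });
  }
  if (summary.cacheHitRate < 60) { // percent
    recommendations.push({
      category: 'cache',
      type: 'warning',
      message: 'Low cache hit rate. Adjust the cache strategy or TTL.',
    });
  }
  return recommendations;
}
```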
## Best Practices
### Running Benchmarks
1. **Consistent Environment**: Run benchmarks in the same environment for comparable results
2. **Idle System**: Ensure system is idle to avoid interference
3. **Multiple Runs**: Run benchmarks multiple times to account for variance
4. **Garbage Collection**: Enable GC for accurate memory measurements
5. **Network Stability**: Ensure stable network connection for network benchmarks
### Interpreting Results
1. **Baseline Comparison**: Compare against previous benchmark runs
2. **Threshold Analysis**: Use performance thresholds to identify issues
3. **Trend Analysis**: Look for performance trends over time
4. **Recommendation Review**: Pay attention to generated recommendations
5. **Category Correlation**: Consider relationships between different benchmark categories
### Performance Optimization
1. **Memory Optimization**: Monitor memory usage patterns and optimize allocations
2. **Network Optimization**: Use connection pooling and caching strategies
3. **Cache Optimization**: Adjust cache strategies based on hit rates
4. **Startup Optimization**: Optimize initialization sequences
5. **UI Optimization**: Ensure responsive UI updates
## Troubleshooting
### Common Issues
**Benchmark Failures:**
```bash
# Check for missing dependencies
npm install
# Verify API configuration
export OPENAI_API_KEY="your-api-key"
# Check file permissions
chmod +x src/benchmarks/index.js
```
**Memory Issues:**
```bash
# Increase the memory limit and enable garbage collection
# (combine both flags in one NODE_OPTIONS value)
export NODE_OPTIONS="--max-old-space-size=8192 --expose-gc"
```
**Network Issues:**
```bash
# Check network connectivity
ping api.openai.com
# Verify API keys
echo $OPENAI_API_KEY
```
### Debug Mode
Enable detailed logging for troubleshooting:
```bash
# Run with verbose logging
node src/benchmarks/index.js all --verbose
# Check specific benchmark
node src/benchmarks/index.js benchmark testExecution "Small Suite (5 tests)" --verbose
```
## Integration
### CI/CD Integration
Add benchmarks to your CI/CD pipeline:
```yaml
# GitHub Actions example
- name: Run Performance Benchmarks
  run: |
    node src/benchmarks/index.js all
    # Upload results as artifacts
    cp -r benchmarks/results/ ${{ github.workspace }}/benchmark-results/
```
### Automated Monitoring
Set up automated performance monitoring:
```javascript
// Automated benchmark runner
import { PerformanceBenchmarkSuite } from './src/benchmarks/index.js';
const suite = new PerformanceBenchmarkSuite();
const results = await suite.runAllBenchmarks();
// Check performance thresholds (averageExecutionTime is in ms)
if (results.summary.averageExecutionTime > 5000) {
console.error('Performance degradation detected');
process.exit(1);
}
```
## Contributing
### Adding New Benchmarks
1. Create a benchmark class in the appropriate category (see the skeleton sketch after this list)
2. Implement benchmark methods
3. Add to benchmark suite
4. Update documentation
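A hypothetical skeleton for a new benchmark; the method names and metric fields are illustrative, since the suite's plugin API is not documented here:
```javascript
// Hypothetical benchmark skeleton; adapt to the suite's actual API.
export class CacheWarmupBenchmark {
  name = 'Cache Warmup Performance'; // descriptive, consistent naming

  async setup() {
    // Allocate any resources the benchmark needs.
  }

  async run() {
    // Perform the measured work and return relevant metrics.
    return { executionTimeMs: 0, rssBytes: process.memoryUsage().rss };
  }

  async teardown() {
    // Release resources so later benchmarks start from a clean state.
  }
}
```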
### Benchmark Guidelines
- **Consistent Naming**: Use descriptive benchmark names
- **Proper Metrics**: Include relevant performance metrics
- **Error Handling**: Implement proper error handling
- **Resource Cleanup**: Clean up resources after benchmarks
- **Documentation**: Document benchmark purpose and metrics
## License
This benchmark suite is part of the Glassbox CLI tool and follows the same license terms.