@wundr.io/analysis-engine
Version:
Advanced code analysis engine with AST parsing, duplicate detection, complexity metrics, and AI integration
753 lines (581 loc) • 22.3 kB
Markdown
# @wundr.io/analysis-engine
[](https://www.npmjs.com/package/@wundr.io/analysis-engine)
[](https://opensource.org/licenses/MIT)
[](https://www.typescriptlang.org/)
[](https://nodejs.org)
Enterprise-grade code analysis engine with advanced AST parsing, intelligent duplicate detection,
comprehensive complexity metrics, and high-performance optimizations. Built for analyzing
large-scale codebases with memory efficiency and blazing-fast execution.
## Overview
The Analysis Engine is a sophisticated TypeScript/JavaScript code analysis toolkit that combines six
powerful analysis engines with cutting-edge performance optimizations. Designed to handle massive
codebases with ease, it delivers actionable insights while maintaining minimal memory footprint and
maximum throughput.
## Key Features
- **Six Advanced Analysis Engines**: Comprehensive code quality assessment
- **High-Performance Architecture**: 15,000+ files/second processing speed
- **Memory Efficient**: <250MB memory usage for large codebases
- **Concurrent Processing**: 30+ concurrent workers with intelligent load balancing
- **Streaming Analysis**: 60-80% memory reduction with streaming processors
- **Real-time Monitoring**: Built-in memory and performance monitoring
- **Enterprise Ready**: Production-grade error handling and resilience
- **Rich Reporting**: JSON, HTML, Markdown, and CSV output formats
## Performance Highlights
| Metric | Performance |
| ---------------------- | -------------------------- |
| **Processing Speed** | 15,000+ files/second |
| **Memory Usage** | <250MB for large codebases |
| **Concurrent Workers** | 30+ with auto-scaling |
| **Memory Reduction** | 60-80% with streaming |
| **Throughput** | 4.4x faster than baseline |
| **Cache Hit Rate** | 85%+ with object pooling |
## Installation
```bash
npm install @wundr.io/analysis-engine
```
### Peer Dependencies
```bash
npm install typescript@^5.5.0
```
## Quick Start
### Basic Analysis
```typescript
import { AnalysisEngine, analyzeProject } from '@wundr.io/analysis-engine';
// Simple analysis
const report = await analyzeProject('/path/to/project', {
outputFormats: ['json', 'html'],
includeTests: false,
});
console.log(`Analyzed ${report.summary.totalFiles} files`);
console.log(`Found ${report.duplicates.clusters.length} duplicate clusters`);
console.log(`Average complexity: ${report.complexity.averageCyclomaticComplexity}`);
```
### Advanced Usage with Progress Tracking
```typescript
import { AnalysisEngine, analyzeProjectWithProgress } from '@wundr.io/analysis-engine';
const report = await analyzeProjectWithProgress(
'/path/to/project',
progress => {
console.log(`[${progress.type}] ${progress.message}`);
if (progress.percentage) {
console.log(`Progress: ${progress.percentage}%`);
}
},
{
performance: {
maxConcurrency: 30,
enableStreaming: true,
enableMemoryOptimization: true,
},
duplicateDetection: {
minSimilarity: 0.8,
enableSemanticAnalysis: true,
},
complexity: {
maxCyclomaticComplexity: 10,
maxCognitiveComplexity: 15,
},
}
);
```
### Custom Engine Configuration
```typescript
const engine = new AnalysisEngine({
targetDir: '/path/to/project',
// File filtering
exclude: ['**/*.spec.ts', '**/node_modules/**'],
includeTests: false,
// Performance tuning
performance: {
maxConcurrency: 30,
chunkSize: 100,
enableStreaming: true,
enableMemoryOptimization: true,
memoryLimit: 250 * 1024 * 1024, // 250MB
},
// Output configuration
outputFormats: ['json', 'html', 'markdown'],
outputDir: './analysis-reports',
// Enable optimizations
useOptimizations: true,
});
const report = await engine.analyze();
```
## Analysis Engines
### 1. AST Parser Engine
Advanced TypeScript/JavaScript AST parsing with comprehensive entity extraction.
**Capabilities:**
- Classes, interfaces, types, and enums
- Functions, methods, and arrow functions
- Variables, constants, and exports
- JSDoc documentation extraction
- Dependency graph construction
- Signature and metadata analysis
**Example:**
```typescript
import { ASTParserEngine } from '@wundr.io/analysis-engine';
const parser = new ASTParserEngine();
const entities = await parser.analyze(['src/**/*.ts'], config);
console.log(`Found ${entities.length} entities`);
entities.forEach(entity => {
console.log(`${entity.type}: ${entity.name} (${entity.file}:${entity.line})`);
});
```
### 2. Duplicate Detection Engine
Intelligent duplicate code detection with semantic and structural analysis.
**Features:**
- Hash-based clustering
- Semantic similarity analysis
- Structural pattern matching
- Fuzzy matching for near-duplicates
- Consolidation recommendations
**Memory-Optimized Version:**
```typescript
import { OptimizedDuplicateDetectionEngine } from '@wundr.io/analysis-engine';
const duplicateEngine = new OptimizedDuplicateDetectionEngine({
minSimilarity: 0.8,
enableSemanticAnalysis: true,
enableStructuralAnalysis: true,
enableStreaming: true,
maxMemoryUsage: 200 * 1024 * 1024, // 200MB
clusteringAlgorithm: 'hash',
});
const clusters = await duplicateEngine.analyze(entities, config);
clusters.forEach(cluster => {
console.log(`\nDuplicate cluster (${cluster.similarity * 100}% similar):`);
cluster.entities.forEach(entity => {
console.log(` - ${entity.file}:${entity.line} (${entity.name})`);
});
if (cluster.consolidationSuggestion) {
console.log(`Suggestion: ${cluster.consolidationSuggestion.strategy}`);
console.log(`Effort: ${cluster.consolidationSuggestion.estimatedEffort}`);
}
});
```
### 3. Complexity Metrics Engine
Comprehensive complexity analysis with multiple metrics and thresholds.
**Metrics Calculated:**
- **Cyclomatic Complexity**: Control flow complexity
- **Cognitive Complexity**: Mental effort required to understand code
- **Maintainability Index**: Overall maintainability score (0-100)
- **Nesting Depth**: Maximum nesting level
- **Function Size**: Lines of code and parameter count
- **Technical Debt**: Estimated hours to address complexity issues
**Example:**
```typescript
import { ComplexityMetricsEngine } from '@wundr.io/analysis-engine';
const complexityEngine = new ComplexityMetricsEngine({
cyclomatic: { low: 5, medium: 10, high: 20, critical: 30 },
cognitive: { low: 7, medium: 15, high: 25, critical: 40 },
maintainability: { excellent: 85, good: 70, moderate: 50, poor: 25 },
nesting: { maxDepth: 4, warningDepth: 3 },
size: { maxLines: 100, maxParameters: 5 },
});
const report = await complexityEngine.analyze(entities, config);
console.log(`Average Cyclomatic: ${report.overallMetrics.averageCyclomaticComplexity}`);
console.log(`Average Cognitive: ${report.overallMetrics.averageCognitiveComplexity}`);
console.log(`Technical Debt: ${report.overallMetrics.totalTechnicalDebt} hours`);
// Complexity hotspots
report.complexityHotspots.forEach((hotspot, index) => {
console.log(`\n${index + 1}. ${hotspot.entity.name} (Score: ${hotspot.rank})`);
console.log(` Cyclomatic: ${hotspot.complexity.cyclomatic}`);
console.log(` Cognitive: ${hotspot.complexity.cognitive}`);
console.log(` Maintainability: ${hotspot.complexity.maintainability}`);
console.log(` Issues: ${hotspot.issues.join(', ')}`);
console.log(` Recommendations:`);
hotspot.recommendations.forEach(rec => console.log(` - ${rec}`));
});
```
### 4. Circular Dependency Engine
Detects and analyzes circular dependencies in your codebase.
**Features:**
- Dependency graph construction
- Cycle detection with depth analysis
- Impact assessment
- Break point suggestions
- Severity classification
**Example:**
```typescript
import { CircularDependencyEngine } from '@wundr.io/analysis-engine';
const circularEngine = new CircularDependencyEngine();
const cycles = await circularEngine.analyze(entities, config);
cycles.forEach(cycle => {
console.log(`\nCircular dependency (depth: ${cycle.depth}):`);
console.log(`Path: ${cycle.cycle.join(' -> ')}`);
console.log(`Severity: ${cycle.severity}`);
console.log(`Files involved: ${cycle.files.join(', ')}`);
console.log(`Suggestions:`);
cycle.suggestions.forEach(s => console.log(` - ${s}`));
});
```
### 5. Code Smell Engine
Identifies common code smells and anti-patterns.
**Detected Smells:**
- Long methods (>100 lines)
- Large classes (>15 methods)
- Duplicate code blocks
- Dead/unreachable code
- Complex conditionals
- Feature envy
- Inappropriate intimacy
- God objects
**Example:**
```typescript
import { CodeSmellEngine } from '@wundr.io/analysis-engine';
const smellEngine = new CodeSmellEngine();
const smells = await smellEngine.analyze(entities, config);
smells.forEach(smell => {
console.log(`\n[${smell.severity}] ${smell.type}`);
console.log(`File: ${smell.file}:${smell.line}`);
console.log(`Message: ${smell.message}`);
console.log(`Suggestion: ${smell.suggestion}`);
});
```
### 6. Unused Export Engine
Finds exported entities that are never imported elsewhere.
**Features:**
- Cross-file import tracking
- Public API detection
- Test file exclusion options
- Usage frequency analysis
**Example:**
```typescript
import { UnusedExportEngine } from '@wundr.io/analysis-engine';
const unusedEngine = new UnusedExportEngine();
const unused = await unusedEngine.analyze(entities, config);
console.log(`Found ${unused.length} unused exports`);
unused.forEach(entity => {
console.log(`${entity.name} in ${entity.file}:${entity.line}`);
});
```
## Performance Optimizations
### Worker Pool Management
Intelligent concurrent processing with auto-scaling workers.
```typescript
import { WorkerPoolManager } from '@wundr.io/analysis-engine';
const workerPool = new WorkerPoolManager({
minWorkers: 4,
maxWorkers: 30,
idleTimeout: 60000,
taskTimeout: 300000,
enableAutoScaling: true,
resourceThresholds: {
cpu: 0.85,
memory: 0.9,
},
});
// Execute tasks concurrently
const results = await Promise.all(tasks.map(task => workerPool.execute(task)));
// Monitor performance
const metrics = workerPool.getMetrics();
console.log(`Active workers: ${metrics.activeWorkers}`);
console.log(`Queue size: ${metrics.queueSize}`);
console.log(`Throughput: ${metrics.throughput} tasks/sec`);
console.log(`Error rate: ${metrics.errorRate}%`);
await workerPool.shutdown();
```
### Streaming File Processor
Process large codebases with minimal memory footprint.
```typescript
import { StreamingFileProcessor } from '@wundr.io/analysis-engine';
const processor = new StreamingFileProcessor({
batchSize: 100,
maxConcurrency: 10,
enableBackpressure: true,
highWaterMark: 1000,
lowWaterMark: 100,
});
processor.on('batch', batch => {
console.log(`Processing batch of ${batch.length} files`);
});
processor.on('progress', progress => {
console.log(`Processed ${progress.processed}/${progress.total} files`);
});
const results = await processor.processFiles(['src/**/*.ts'], async file => {
// Process each file
return analyzeFile(file);
});
console.log(`Processed ${results.length} files`);
console.log(`Peak memory: ${processor.getMemoryStats().peakUsage / 1024 / 1024} MB`);
```
### Memory Monitor
Track memory usage and prevent leaks.
```typescript
import { MemoryMonitor } from '@wundr.io/analysis-engine';
const monitor = new MemoryMonitor({
snapshotInterval: 5000, // 5 seconds
maxSnapshots: 200,
enableLeakDetection: true,
heapDumpThreshold: 0.9, // 90% of max memory
maxMemory: 500 * 1024 * 1024, // 500MB
});
monitor.on('warning', data => {
console.warn(`Memory warning: ${data.message}`);
console.warn(`Current usage: ${data.usage / 1024 / 1024} MB`);
});
monitor.on('critical', data => {
console.error(`Critical memory state: ${data.message}`);
// Trigger cleanup or halt processing
});
monitor.start();
// Your analysis code here
const stats = monitor.getStats();
console.log(`Peak memory: ${stats.peakUsage / 1024 / 1024} MB`);
console.log(`GC events: ${stats.gcEvents}`);
console.log(`Average heap: ${stats.averageHeap / 1024 / 1024} MB`);
monitor.stop();
```
## CLI Integration
The analysis engine includes a powerful command-line interface.
### Installation
```bash
npm install -g @wundr.io/analysis-engine
```
### Commands
```bash
# Analyze a codebase
wundr-analyze analyze ./src
# With options
wundr-analyze analyze ./src \
--output ./reports \
--format json,html,markdown \
--max-complexity 10 \
--min-similarity 0.8 \
--concurrency 30 \
--enable-ai \
--verbose
# Exclude patterns
wundr-analyze analyze ./src \
--exclude "**/*.spec.ts,**/*.test.ts"
# Include test files
wundr-analyze analyze ./src --include-tests
```
### CLI Options
| Option | Description | Default |
| ------------------ | --------------------------------------- | ------------------- |
| `-o, --output` | Output directory for reports | `./analysis-output` |
| `-f, --format` | Output formats (json,html,markdown,csv) | `json,html` |
| `--include-tests` | Include test files in analysis | `false` |
| `--exclude` | Additional exclude patterns | - |
| `--max-complexity` | Max cyclomatic complexity threshold | `10` |
| `--min-similarity` | Min similarity for duplicates | `0.8` |
| `--concurrency` | Max concurrent file processing | `10` |
| `--enable-ai` | Enable AI-powered analysis | `false` |
| `--verbose` | Enable verbose output | `false` |
## Benchmark Suite
Comprehensive performance benchmarking for optimizations.
```typescript
import { PerformanceBenchmarkSuite } from '@wundr.io/analysis-engine';
const benchmark = new PerformanceBenchmarkSuite({
testDataSets: [
{
name: 'Small Project',
fileCount: 100,
avgFileSize: 5000,
complexity: 'low',
duplicateRatio: 0.1,
},
{
name: 'Medium Project',
fileCount: 1000,
avgFileSize: 8000,
complexity: 'medium',
duplicateRatio: 0.2,
},
{
name: 'Large Project',
fileCount: 10000,
avgFileSize: 10000,
complexity: 'high',
duplicateRatio: 0.3,
},
],
iterations: 5,
outputDir: './benchmarks',
enableProfiling: true,
memoryLimit: 500 * 1024 * 1024,
concurrencyLevels: [1, 5, 10, 20, 30],
});
// Run benchmarks
const results = await benchmark.runFullSuite();
// Display results
console.log('\nBenchmark Results:');
console.log(`Speedup: ${results.improvement.speedup}x`);
console.log(`Memory reduction: ${results.improvement.memoryReduction}%`);
console.log(`Throughput increase: ${results.improvement.throughputIncrease}%`);
console.log(`Overall score: ${results.improvement.overallScore}`);
// Generate report
await benchmark.generateReport(results, 'html');
```
### Benchmark Metrics
- **Execution Time**: Total analysis duration
- **Throughput**: Files processed per second
- **Memory Usage**: Peak and average memory consumption
- **CPU Usage**: Average and peak CPU utilization
- **Concurrency Efficiency**: Worker pool utilization
- **Cache Performance**: Hit rates and efficiency
- **Error Rate**: Failed operations percentage
## Configuration Options
### Analysis Config
```typescript
interface AnalysisConfig {
// Target configuration
targetDir: string;
exclude: string[];
includeTests: boolean;
// Output configuration
outputFormats: ('json' | 'html' | 'markdown' | 'csv')[];
outputDir: string;
// Performance tuning
performance: {
maxConcurrency: number;
chunkSize: number;
enableStreaming: boolean;
enableMemoryOptimization: boolean;
memoryLimit: number;
};
// Duplicate detection
duplicateDetection: {
minSimilarity: number;
enableSemanticAnalysis: boolean;
enableStructuralAnalysis: boolean;
clusteringAlgorithm: 'hash' | 'hierarchical' | 'density';
};
// Complexity thresholds
complexity: {
maxCyclomaticComplexity: number;
maxCognitiveComplexity: number;
maxNestingDepth: number;
maxFunctionLength: number;
maxParameters: number;
};
// AI features
enableAIAnalysis: boolean;
aiConfig: {
model: string;
temperature: number;
maxTokens: number;
};
// Optimizations
useOptimizations: boolean;
}
```
## Related Packages
The Analysis Engine is part of the Wundr ecosystem:
- **[@wundr.io/cli](../cli)** - Command-line interface and project orchestration
- **[@wundr.io/governance](../governance)** - Governance framework and policy engine
- **[@wundr.io/drift-detection](../drift-detection)** - Code quality drift monitoring
- **[@wundr.io/pattern-standardization](../pattern-standardization)** - Pattern detection and
auto-fixing
- **[@wundr.io/dependency-analyzer](../dependency-analyzer)** - Advanced dependency analysis
- **[@wundr.io/test-management](../test-management)** - Test coverage and baseline tracking
- **[@wundr.io/monorepo-manager](../monorepo-manager)** - Monorepo management utilities
## API Reference
### Core Classes
- `AnalysisEngine` - Main orchestrator for all analysis operations
- `SimpleAnalyzer` - Simplified analysis interface
- `ASTParserEngine` - TypeScript/JavaScript AST parsing
- `DuplicateDetectionEngine` - Standard duplicate detection
- `OptimizedDuplicateDetectionEngine` - Memory-optimized duplicate detection
- `ComplexityMetricsEngine` - Complexity analysis
- `CircularDependencyEngine` - Circular dependency detection
- `CodeSmellEngine` - Code smell identification
- `UnusedExportEngine` - Unused export detection
### Performance Components
- `WorkerPoolManager` - Concurrent task execution
- `StreamingFileProcessor` - Memory-efficient file processing
- `MemoryMonitor` - Memory tracking and leak detection
- `PerformanceBenchmarkSuite` - Benchmarking utilities
### Utilities
- `generateNormalizedHash` - Create normalized code hashes
- `generateSemanticHash` - Generate semantic similarity hashes
- `createId` - Generate unique identifiers
- `processConcurrently` - Concurrent processing helper
## Examples
### Example 1: Full Codebase Analysis
```typescript
import { analyzeProject } from '@wundr.io/analysis-engine';
async function analyzeCodebase() {
const report = await analyzeProject('/path/to/project', {
outputFormats: ['json', 'html'],
performance: {
maxConcurrency: 30,
enableStreaming: true,
},
});
console.log(`\nAnalysis Summary:`);
console.log(`Total Files: ${report.summary.totalFiles}`);
console.log(`Total Entities: ${report.summary.totalEntities}`);
console.log(`Duplicate Clusters: ${report.duplicates.clusters.length}`);
console.log(`Circular Dependencies: ${report.circularDependencies.length}`);
console.log(`Code Smells: ${report.codeSmells.length}`);
console.log(`Unused Exports: ${report.unusedExports.length}`);
console.log(`Average Complexity: ${report.complexity.averageCyclomaticComplexity}`);
console.log(`Technical Debt: ${report.complexity.totalTechnicalDebt} hours`);
}
```
### Example 2: Targeted Complexity Analysis
```typescript
import { AnalysisEngine, ComplexityMetricsEngine } from '@wundr.io/analysis-engine';
async function findComplexFunctions() {
const engine = new AnalysisEngine({
targetDir: './src',
exclude: ['**/*.spec.ts'],
});
const report = await engine.analyze();
const complexFunctions = report.complexity.complexityHotspots
.filter(h => h.complexity.cyclomatic > 20)
.sort((a, b) => b.rank - a.rank);
console.log(`\nTop 10 Most Complex Functions:`);
complexFunctions.slice(0, 10).forEach((hotspot, i) => {
console.log(`\n${i + 1}. ${hotspot.entity.name}`);
console.log(` File: ${hotspot.entity.file}:${hotspot.entity.line}`);
console.log(` Cyclomatic: ${hotspot.complexity.cyclomatic}`);
console.log(` Cognitive: ${hotspot.complexity.cognitive}`);
console.log(` Maintainability: ${hotspot.complexity.maintainability}`);
});
}
```
### Example 3: Duplicate Code Cleanup
```typescript
import { OptimizedDuplicateDetectionEngine } from '@wundr.io/analysis-engine';
async function findDuplicates() {
const engine = new OptimizedDuplicateDetectionEngine({
minSimilarity: 0.85,
enableSemanticAnalysis: true
});
const entities = /* ... get entities from AST parser ... */;
const clusters = await engine.analyze(entities, config);
console.log(`\nFound ${clusters.length} duplicate clusters\n`);
clusters
.filter(c => c.severity === 'critical' || c.severity === 'high')
.forEach(cluster => {
console.log(`\nCluster: ${cluster.type} (${(cluster.similarity * 100).toFixed(1)}% similar)`);
console.log(`Severity: ${cluster.severity}`);
console.log(`Instances:`);
cluster.entities.forEach(e => {
console.log(` - ${e.file}:${e.line} (${e.name})`);
});
if (cluster.consolidationSuggestion) {
const suggestion = cluster.consolidationSuggestion;
console.log(`\nRecommendation: ${suggestion.strategy}`);
console.log(`Target: ${suggestion.targetFile}`);
console.log(`Effort: ${suggestion.estimatedEffort}`);
console.log(`Impact: ${suggestion.impact}`);
console.log(`Steps:`);
suggestion.steps.forEach(step => console.log(` ${step}`));
}
});
}
```
## Contributing
We welcome contributions! Please see our [Contributing Guide](../../CONTRIBUTING.md) for details.
## License
MIT © [Adaptic.ai](https://adaptic.ai)
## Support
- **Documentation**: [https://wundr.io/docs](https://wundr.io/docs)
- **GitHub Issues**:
[https://github.com/adapticai/wundr/issues](https://github.com/adapticai/wundr/issues)
- **Discord**: [https://discord.gg/wundr](https://discord.gg/wundr)
---
**Built with excellence by the Wundr team at Adaptic.ai**