@bcoders.gr/evm-disassembler
A comprehensive EVM bytecode disassembler and analyzer with support for multiple EVM versions
# EVM Disassembler
A comprehensive Node.js library for disassembling and analyzing Ethereum Virtual Machine (EVM) bytecode. This tool provides detailed analysis including function detection, stack analysis, security checks, and multiple output formats.
## Features
- 🔍 **Complete Bytecode Decoding** - Decode all EVM opcodes with support for multiple EVM versions
- 🎯 **Function Detection** - Automatically detect function selectors and signatures
- 📊 **Stack Analysis** - Track stack depth changes and detect potential issues
- 🔒 **Security Analysis** - Identify dangerous opcodes and potential vulnerabilities
- 📝 **Multiple Output Formats** - Text, JSON, Assembly, Markdown, and CSV
- 🏷️ **Metadata Detection** - Extract compiler information and metadata
- ⚡ **Performance Optimized** - Efficient parsing and analysis algorithms
- 🔧 **Modular Architecture** - Use individual components as needed
## Installation
```bash
npm install @bcoders.gr/evm-disassembler
```
## Quick Start
```javascript
const { EVMDisassembler } = require('@bcoders.gr/evm-disassembler');
// Create disassembler instance
const disassembler = new EVMDisassembler();
// Example bytecode (simple contract); truncated here, so substitute a complete hex string
const bytecode = '0x608060405234801561001057600080fd5b50610150806100206000396000f3fe...';
// Disassemble with full analysis
const result = disassembler.disassemble(bytecode);
// Format as text
console.log(disassembler.format(result, 'text'));
// Get JSON output
const jsonOutput = disassembler.format(result, 'json', { pretty: true });
```
## API Reference
### EVMDisassembler Class
#### Constructor
```javascript
new EVMDisassembler(options)
```
Options:
- `evmVersion` (string): EVM version to use ('homestead', 'byzantium', 'constantinople', 'istanbul', 'berlin', 'london', 'paris', 'shanghai', 'cancun', 'latest'). Default: 'latest'
- `includeMetadata` (boolean): Include metadata detection. Default: true
- `stopAtMetadata` (boolean): Stop disassembly at metadata boundary. Default: true
- `performStackAnalysis` (boolean): Perform stack depth analysis. Default: true
- `performFunctionAnalysis` (boolean): Detect functions and signatures. Default: true
- `performSecurityAnalysis` (boolean): Perform security checks. Default: true
#### Methods
##### disassemble(bytecode)
Perform complete disassembly with all analysis.
```javascript
const result = disassembler.disassemble(bytecode);
```
Returns an object containing:
- `instructions`: Array of decoded instructions
- `metadata`: Detected compiler metadata
- `functions`: Detected function signatures
- `stack`: Stack analysis results
- `security`: Security analysis findings
- `summary`: High-level summary
##### decode(bytecode)
Decode bytecode without analysis.
```javascript
const instructions = disassembler.decode(bytecode);
```
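To show what decoding involves, here is a minimal standalone sketch. It is not this package's implementation. It assumes a linear sweep over the bytes, skips the 1 to 32 immediate bytes that follow each `PUSHn`, and names only a small subset of opcodes. `decodeSketch` and `opcodeName` are hypothetical helpers, not package exports.

```javascript
// Hypothetical linear-sweep decoder; NOT the package's decode().
// Only a small subset of opcode names is mapped; the rest are reported as UNKNOWN.
const NAMES = {
  0x00: 'STOP', 0x14: 'EQ', 0x15: 'ISZERO', 0x1c: 'SHR', 0x34: 'CALLVALUE',
  0x35: 'CALLDATALOAD', 0x39: 'CODECOPY', 0x50: 'POP', 0x52: 'MSTORE',
  0x54: 'SLOAD', 0x55: 'SSTORE', 0x56: 'JUMP', 0x57: 'JUMPI', 0x5b: 'JUMPDEST',
  0x5f: 'PUSH0', 0xf3: 'RETURN', 0xfd: 'REVERT', 0xfe: 'INVALID',
};

function opcodeName(op) {
  if (op >= 0x60 && op <= 0x7f) return `PUSH${op - 0x5f}`; // PUSH1..PUSH32
  if (op >= 0x80 && op <= 0x8f) return `DUP${op - 0x7f}`;  // DUP1..DUP16
  if (op >= 0x90 && op <= 0x9f) return `SWAP${op - 0x8f}`; // SWAP1..SWAP16
  return NAMES[op] ?? `UNKNOWN_0x${op.toString(16).padStart(2, '0')}`;
}

function decodeSketch(bytecode) {
  const hex = bytecode.startsWith('0x') ? bytecode.slice(2) : bytecode;
  if (!/^(?:[0-9a-fA-F]{2})*$/.test(hex)) {
    throw new Error('bytecode must be an even-length hex string');
  }
  const bytes = Buffer.from(hex, 'hex');
  const instructions = [];
  let pc = 0;
  while (pc < bytes.length) {
    const op = bytes[pc];
    // PUSHn is followed by n immediate bytes that are data, not instructions.
    const immediateSize = op >= 0x60 && op <= 0x7f ? op - 0x5f : 0;
    const data = bytes.subarray(pc + 1, pc + 1 + immediateSize);
    instructions.push({
      pc,
      opcode: opcodeName(op),
      hex: op.toString(16).padStart(2, '0'),
      pushData: immediateSize > 0 ? '0x' + data.toString('hex') : null,
    });
    pc += 1 + immediateSize;
  }
  return instructions;
}
```

Applied to the first bytes of the Quick Start bytecode (`0x6080604052348015610010`), it yields the same PC, opcode, and data values as the Text Format example.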
##### format(results, format, options)
Format disassembly results.
Supported formats:
- `'text'`: Human-readable text format
- `'json'`: JSON format
- `'assembly'` or `'asm'`: Assembly-like syntax
- `'markdown'` or `'md'`: Markdown documentation
- `'csv'`: CSV format
```javascript
const textOutput = disassembler.format(result, 'text');
const jsonOutput = disassembler.format(result, 'json', { pretty: true });
```
##### validate(bytecode)
Quick validation without full disassembly.
```javascript
const validation = disassembler.validate(bytecode);
if (validation.valid) {
  console.log('Bytecode is valid');
}
```
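The README does not list which checks `validate()` performs. As an illustration only, the hypothetical `validateSketch` below runs some typical pre-flight checks (non-empty, hex characters, even digit count) plus one linear pass for truncated `PUSHn` immediates. The truncation case is reported as a warning, on the assumption that data appended after the code, such as compiler metadata, can look like a cut-off PUSH when decoded as instructions.

```javascript
// Hypothetical pre-flight validation; the package's validate() may apply different rules.
function validateSketch(bytecode) {
  if (typeof bytecode !== 'string' || bytecode.length === 0) {
    return { valid: false, errors: ['bytecode must be a non-empty string'], warnings: [] };
  }
  const hex = bytecode.startsWith('0x') ? bytecode.slice(2) : bytecode;
  const errors = [];
  const warnings = [];
  if (hex.length === 0) errors.push('no bytes after the 0x prefix');
  if (!/^[0-9a-fA-F]*$/.test(hex)) errors.push('contains non-hex characters');
  if (hex.length % 2 !== 0) errors.push('odd number of hex digits');
  if (errors.length === 0) {
    // One linear pass: does any PUSHn claim more immediate bytes than remain?
    const bytes = Buffer.from(hex, 'hex');
    for (let pc = 0; pc < bytes.length; ) {
      const op = bytes[pc];
      const size = op >= 0x60 && op <= 0x7f ? op - 0x5f : 0;
      if (pc + 1 + size > bytes.length) {
        warnings.push(`PUSH${size} at pc ${pc} is truncated`);
      }
      pc += 1 + size;
    }
  }
  return { valid: errors.length === 0, errors, warnings };
}
```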
##### analyzeFunctionsWithPatterns(bytecode)
Get detailed function analysis including opcode patterns for comparison.
```javascript
const functionAnalysis = disassembler.analyzeFunctionsWithPatterns(bytecode);
// Access function patterns
functionAnalysis.functionsWithPatterns.forEach(func => {
  console.log(`Function ${func.selector} (${func.signature}):`);
  console.log(` Pattern Hash: ${func.patternHash}`);
  console.log(` Instructions: ${func.instructionCount}`);
  console.log(` Opcode Pattern: ${func.opcodePattern.join(', ')}`);
  console.log(` Stack Ops: ${JSON.stringify(func.stackOperations)}`);
  console.log(` Storage Ops: ${JSON.stringify(func.storageOperations)}`);
});
// Find identical and similar functions
if (functionAnalysis.patternComparisons.exactMatches.length > 0) {
  console.log('Functions with identical patterns:');
  functionAnalysis.patternComparisons.exactMatches.forEach(match => {
    console.log(` Pattern ${match.patternHash}: ${match.count} functions`);
    match.functions.forEach(f => console.log(` - ${f.selector}: ${f.signature}`));
  });
}
if (functionAnalysis.patternComparisons.similarFunctions.length > 0) {
  console.log('Functions with similar patterns:');
  functionAnalysis.patternComparisons.similarFunctions.forEach(comp => {
    console.log(` ${comp.function1.selector} ~ ${comp.function2.selector} (${Math.round(comp.similarity * 100)}% similar)`);
  });
}
```
##### compareFunctionPatterns(bytecode)
Compare function patterns to find similarities.
```javascript
const comparison = disassembler.compareFunctionPatterns(bytecode);
console.log(`Found ${comparison.exactMatches.length} exact pattern matches`);
console.log(`Found ${comparison.similarFunctions.length} similar function pairs`);
```
### Convenience Functions
```javascript
const { disassemble, decode } = require('@bcoders.gr/evm-disassembler');
// Quick disassembly
const result = disassemble(bytecode);
// Quick decode
const instructions = decode(bytecode);
```
## Output Examples
### Text Format
```
PC | OPCODE | HEX | DATA
--------------------------------------------------
0 | PUSH1 | 60 | 0x80
2 | PUSH1 | 60 | 0x40
4 | MSTORE | 52 |
5 | CALLVALUE | 34 |
6 | DUP1 | 80 |
7 | ISZERO | 15 |
8 | PUSH2 | 61 | 0x0010
```
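A table like this can be produced from decoded instructions with a few lines of padding logic. The sketch below is hypothetical: `formatTextSketch` is not a package export, and its column widths are assumptions rather than the package's exact layout. It expects instructions shaped like `{ pc, opcode, hex, pushData }`.

```javascript
// Hypothetical text formatter; column widths are assumptions.
function formatTextSketch(instructions) {
  const row = (pc, opcode, hex, data) =>
    `${String(pc).padEnd(6)}| ${opcode.padEnd(12)}| ${hex.padEnd(5)}| ${data}`.trimEnd();
  const lines = [row('PC', 'OPCODE', 'HEX', 'DATA'), '-'.repeat(50)];
  for (const ins of instructions) {
    // Non-PUSH opcodes have no immediate data, so the DATA column stays empty.
    lines.push(row(ins.pc, ins.opcode, ins.hex, ins.pushData ?? ''));
  }
  return lines.join('\n');
}
```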
### Assembly Format
```assembly
PUSH1 0x80
PUSH1 0x40
MSTORE
CALLVALUE
DUP1
ISZERO
PUSH2 0x0010 ; 16
JUMPI
label_0:
PUSH1 0x00
DUP1
REVERT
```
### JSON Format
```json
{
  "summary": {
    "totalInstructions": 150,
    "bytecodeSize": 336,
    "functionCount": 5,
    "securityScore": 85
  },
  "functions": [
    {
      "selector": "a9059cbb",
      "signature": "transfer(address,uint256)",
      "isKnown": true,
      "opcodePattern": ["PUSH", "CALLDATALOAD", "PUSH", "SHR", "DUP1", "PUSH", "EQ", "PUSH", "JUMPI"],
      "patternHash": "a1b2c3d4e5f6789a",
      "instructionCount": 42,
      "stackOperations": {
        "pushes": 15,
        "pops": 8,
        "dups": 3,
        "swaps": 2
      },
      "storageOperations": {
        "loads": 2,
        "stores": 1
      },
      "memoryOperations": {
        "loads": 1,
        "stores": 0
      },
      "controlFlow": {
        "jumps": 3,
        "calls": 0,
        "returns": 1
      }
    }
  ],
  "patternComparisons": {
    "exactMatches": [
      {
        "patternHash": "a1b2c3d4e5f6789a",
        "functions": [
          {"selector": "a9059cbb", "signature": "transfer(address,uint256)"},
          {"selector": "23b872dd", "signature": "transferFrom(address,address,uint256)"}
        ],
        "count": 2
      }
    ],
    "similarFunctions": [
      {
        "function1": {"selector": "a9059cbb", "signature": "transfer(address,uint256)"},
        "function2": {"selector": "095ea7b3", "signature": "approve(address,uint256)"},
        "similarity": 0.85,
        "similarityType": "high"
      }
    ]
  },
  "instructions": [
    {
      "pc": 0,
      "opcode": "PUSH1",
      "pushData": "0x80"
    }
  ]
}
```
## Advanced Usage
### Custom EVM Version
```javascript
const disassembler = new EVMDisassembler({
evmVersion: 'london'
});
```
### Security-Focused Analysis
```javascript
const result = disassembler.disassemble(bytecode);
if (result.security.score < 70) {
  console.warn('Security issues detected:');
  result.security.potentialVulnerabilities.forEach(vuln => {
    console.warn(`- ${vuln.type}: ${vuln.description}`);
  });
}
```
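This README does not document the package's vulnerability rules or how `security.score` is computed. As a rough illustration of opcode-level checks, the hypothetical `flagRiskyOpcodes` below flags a few opcodes that auditors commonly review. The list and descriptions are assumptions, and a flagged opcode is a prompt for manual review, not proof of a vulnerability.

```javascript
// Hypothetical opcode-level check; this opcode list is an assumption,
// not the package's rule set.
const RISKY_OPCODES = new Map([
  ['SELFDESTRUCT', 'can remove the contract or force-send ETH (restricted by EIP-6780 since Cancun)'],
  ['DELEGATECALL', "runs external code against this contract's storage"],
  ['CALLCODE', 'deprecated predecessor of DELEGATECALL'],
  ['ORIGIN', 'tx.origin used for authorization is vulnerable to phishing'],
]);

// Expects decoded instructions shaped like { pc, opcode }.
function flagRiskyOpcodes(instructions) {
  return instructions
    .filter(ins => RISKY_OPCODES.has(ins.opcode))
    .map(ins => ({ pc: ins.pc, opcode: ins.opcode, description: RISKY_OPCODES.get(ins.opcode) }));
}
```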
### Function Detection Only
```javascript
const functions = disassembler.detectFunctions(bytecode);
console.log(`Found ${functions.totalFunctions} functions`);
functions.functions.forEach(func => {
  console.log(`- ${func.selector}: ${func.signature}`);
});
```
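In solc-compiled contracts, selectors are usually found in the dispatcher at the start of the runtime code, which compares the first four calldata bytes against each selector using `PUSH4 <selector>` followed by `EQ`. The hypothetical `extractSelectorsSketch` below shows only that one heuristic on raw bytecode. It is not the package's `detectFunctions()`; it can miss selectors in dispatchers shaped differently and can report false positives from unrelated `PUSH4`/`EQ` sequences.

```javascript
// Hypothetical dispatcher heuristic: PUSH4 (0x63) immediately followed by EQ (0x14).
function extractSelectorsSketch(bytecode) {
  const hex = bytecode.replace(/^0x/, '');
  if (!/^(?:[0-9a-fA-F]{2})*$/.test(hex)) throw new Error('invalid hex bytecode');
  const bytes = Buffer.from(hex, 'hex');
  const selectors = [];
  let pc = 0;
  while (pc < bytes.length) {
    const op = bytes[pc];
    const size = op >= 0x60 && op <= 0x7f ? op - 0x5f : 0;
    // PUSH4's 4 immediate bytes sit at pc+1..pc+4, so the next opcode is at pc+5.
    if (op === 0x63 && bytes[pc + 5] === 0x14) {
      const selector = bytes.subarray(pc + 1, pc + 5).toString('hex');
      if (!selectors.includes(selector)) selectors.push(selector);
    }
    pc += 1 + size; // skip immediates so PUSH data is never read as opcodes
  }
  return selectors;
}
```

Turning a selector such as `a9059cbb` into `transfer(address,uint256)` needs a lookup table: a selector is the first four bytes of the Keccak-256 hash of the signature and cannot be inverted.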
### Stack Analysis
```javascript
const stackAnalysis = disassembler.analyzeStack(bytecode);
console.log(`Max stack depth: ${stackAnalysis.maxDepth}`);
if (stackAnalysis.hasErrors) {
  console.error('Stack errors detected:', stackAnalysis.errors);
}
```
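Stack depth analysis comes down to applying each opcode's (pops, pushes) effect in sequence and watching for underflow or the EVM's 1024-item limit. The hypothetical `analyzeStackSketch` below does this for a straight-line opcode list. Unlike a full analysis, it ignores jumps, so it models a single basic block, and it knows the stack effect of only a few opcodes. It is not the package's `analyzeStack()`.

```javascript
// Hypothetical straight-line stack tracker; [pops, pushes] for a small subset of opcodes.
const STACK_EFFECTS = new Map([
  ['STOP', [0, 0]], ['ADD', [2, 1]], ['EQ', [2, 1]], ['ISZERO', [1, 1]],
  ['CALLVALUE', [0, 1]], ['CALLDATALOAD', [1, 1]], ['MLOAD', [1, 1]], ['MSTORE', [2, 0]],
  ['SLOAD', [1, 1]], ['SSTORE', [2, 0]], ['POP', [1, 0]], ['JUMP', [1, 0]],
  ['JUMPI', [2, 0]], ['JUMPDEST', [0, 0]], ['RETURN', [2, 0]], ['REVERT', [2, 0]],
]);

function stackEffect(opcode) {
  if (/^PUSH\d+$/.test(opcode)) return [0, 1]; // PUSH0..PUSH32
  let m = /^DUP(\d+)$/.exec(opcode);
  if (m) return [Number(m[1]), Number(m[1]) + 1]; // DUPn reads n items, leaves n + 1
  m = /^SWAP(\d+)$/.exec(opcode);
  if (m) return [Number(m[1]) + 1, Number(m[1]) + 1]; // SWAPn reorders n + 1 items
  return STACK_EFFECTS.get(opcode) ?? null;
}

function analyzeStackSketch(opcodes) {
  const errors = [];
  let depth = 0;
  let maxDepth = 0;
  opcodes.forEach((op, i) => {
    const effect = stackEffect(op);
    if (effect === null) {
      errors.push(`#${i} ${op}: no stack model`);
      return;
    }
    const [pops, pushes] = effect;
    if (depth < pops) errors.push(`#${i} ${op}: stack underflow (depth ${depth}, needs ${pops})`);
    depth = Math.max(depth - pops, 0) + pushes; // clamp at 0 so analysis continues after an underflow
    maxDepth = Math.max(maxDepth, depth);
    if (depth > 1024) errors.push(`#${i} ${op}: exceeds the 1024-item stack limit`);
  });
  return { maxDepth, finalDepth: depth, hasErrors: errors.length > 0, errors };
}
```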
## Error Handling
```javascript
const { InvalidBytecodeError } = require('@bcoders.gr/evm-disassembler');
try {
  const result = disassembler.disassemble(bytecode);
} catch (error) {
  if (error instanceof InvalidBytecodeError) {
    console.error('Invalid bytecode:', error.message);
  } else {
    console.error('Disassembly failed:', error.message);
  }
}
```
## Pattern Analysis and Function Comparison
The disassembler now includes advanced pattern analysis to help identify similar functions and code reuse:
```javascript
const { EVMDisassembler } = require('@bcoders.gr/evm-disassembler');
const disassembler = new EVMDisassembler();
const analysis = disassembler.analyzeFunctionsWithPatterns(bytecode);
// Find functions with identical implementations
console.log('Identical Functions:');
analysis.patternComparisons.exactMatches.forEach(match => {
  console.log(`Pattern ${match.patternHash}:`);
  match.functions.forEach(func => {
    console.log(` - ${func.selector}: ${func.signature}`);
  });
});
// Find functions with similar implementations
console.log('Similar Functions:');
analysis.patternComparisons.similarFunctions.forEach(pair => {
  const similarity = Math.round(pair.similarity * 100);
  console.log(`${pair.function1.signature} ≈ ${pair.function2.signature} (${similarity}% similar)`);
});
// Analyze function complexity
analysis.functionsWithPatterns.forEach(func => {
  console.log(`${func.signature}:`);
  console.log(` Instructions: ${func.instructionCount}`);
  console.log(` Stack Operations: ${func.stackOperations.pushes} pushes, ${func.stackOperations.pops} pops`);
  console.log(` Storage Access: ${func.storageOperations.loads} reads, ${func.storageOperations.stores} writes`);
  console.log(` External Calls: ${func.controlFlow.calls}`);
});
```
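Counters such as `stackOperations`, `storageOperations`, and `controlFlow` can be derived by tallying opcode categories over a function's instruction range. The hypothetical `summarizeOpcodes` below sketches that. Which opcodes the package places in each category is not documented, so the groupings here (for example, counting `CALL`, `CALLCODE`, `DELEGATECALL`, and `STATICCALL` as external calls) are assumptions.

```javascript
// Hypothetical per-function metrics; category membership is an assumption.
function summarizeOpcodes(opcodes) {
  const count = predicate => opcodes.filter(predicate).length;
  const oneOf = (...names) => op => names.includes(op);
  return {
    instructionCount: opcodes.length,
    stackOperations: {
      pushes: count(op => /^PUSH\d+$/.test(op)),
      pops: count(oneOf('POP')),
      dups: count(op => /^DUP\d+$/.test(op)),
      swaps: count(op => /^SWAP\d+$/.test(op)),
    },
    storageOperations: { loads: count(oneOf('SLOAD')), stores: count(oneOf('SSTORE')) },
    memoryOperations: { loads: count(oneOf('MLOAD')), stores: count(oneOf('MSTORE', 'MSTORE8')) },
    controlFlow: {
      jumps: count(oneOf('JUMP', 'JUMPI')),
      calls: count(oneOf('CALL', 'CALLCODE', 'DELEGATECALL', 'STATICCALL')),
      returns: count(oneOf('RETURN')),
    },
  };
}
```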
#### Pattern Hash
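Each function gets a pattern hash computed from its opcode sequence, ignoring the specific values pushed onto the stack. Functions with identical hashes therefore share the same implementation logic even if their constants differ, so the hash fingerprints a pattern rather than uniquely identifying a function.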
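The package does not document its hash function. The hypothetical `patternHash` below is a sketch under two assumptions: `PUSH0`..`PUSH32` are collapsed to `PUSH` (matching the `opcodePattern` in the JSON example) so pushed values and push widths do not matter, and SHA-256 is truncated to 16 hex characters to match the length of the example `patternHash` values.

```javascript
const { createHash } = require('crypto');

// Hypothetical normalization: PUSH0..PUSH32 -> "PUSH"; all other opcode names are kept as-is.
function normalizePattern(opcodes) {
  return opcodes.map(op => (/^PUSH\d+$/.test(op) ? 'PUSH' : op));
}

// Hypothetical hash: SHA-256 of the comma-joined pattern, truncated to 16 hex chars.
function patternHash(opcodes) {
  return createHash('sha256')
    .update(normalizePattern(opcodes).join(','))
    .digest('hex')
    .slice(0, 16);
}
```

Equal hashes mean the normalized sequences are equal, barring a hash collision. They say nothing about whether the pushed constants agree.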
#### Similarity Scoring
The similarity algorithm compares opcode patterns using Levenshtein distance, normalized by sequence length:
- **0.9-1.0**: Very high similarity (likely copy/paste with minor changes)
- **0.7-0.9**: High similarity (similar logic, different implementations)
- **< 0.7**: Low similarity (filtered out by default)
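The scoring above can be sketched with a standard Levenshtein (edit) distance over opcode arrays. The README says only that the distance is "normalized by sequence length". Dividing by the longer sequence's length, so the score falls in [0, 1], is an assumption here, as is the `'very high'` label. `levenshtein`, `similarity`, and `classifySimilarity` are hypothetical names.

```javascript
// Classic edit distance (insert, delete, substitute each cost 1), two-row dynamic programming.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: 1 - distance / length of the longer sequence.
function similarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Buckets from the list above; scores below the threshold are filtered out (null).
function classifySimilarity(score, threshold = 0.7) {
  if (score >= 0.9) return 'very high';
  if (score >= threshold) return 'high';
  return null;
}
```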
## ERC20 Source Code Analysis
The disassembler now includes advanced ERC20 contract analysis capabilities for Solidity source code:
### Basic ERC20 Analysis
```javascript
const { EVMDisassembler } = require('@bcoders.gr/evm-disassembler');
const disassembler = new EVMDisassembler();
// Analyze Solidity source code.
// State variables are private with explicit getters: a public variable and a function
// with the same name (e.g. totalSupply) would not compile.
const solidityCode = `
pragma solidity ^0.8.0;

contract MyToken {
    string public name = "MyToken";
    string public symbol = "MTK";
    uint256 private _totalSupply = 1000000;
    mapping(address => uint256) private _balances;
    mapping(address => mapping(address => uint256)) private _allowances;

    function transfer(address to, uint256 amount) public returns (bool) {
        // implementation
        return true;
    }

    function approve(address spender, uint256 amount) public returns (bool) {
        // implementation
        return true;
    }

    function transferFrom(address from, address to, uint256 amount) public returns (bool) {
        // implementation
        return true;
    }

    function balanceOf(address account) public view returns (uint256) {
        return _balances[account];
    }

    function allowance(address owner, address spender) public view returns (uint256) {
        return _allowances[owner][spender];
    }

    function totalSupply() public view returns (uint256) {
        return _totalSupply;
    }
}
`;
// Extract ERC20 data
const erc20Data = disassembler.extractERC20Data(solidityCode);
console.log('Contract Analysis:');
console.log(`Contract Name: ${erc20Data.extraction_summary.contract_name}`);
console.log(`Total Functions: ${erc20Data.extraction_summary.total_functions}`);
console.log(`Public Functions: ${erc20Data.extraction_summary.public_functions}`);
console.log(`Total Variables: ${erc20Data.extraction_summary.total_variables}`);
console.log(`Total Mappings: ${erc20Data.extraction_summary.total_mappings}`);
// List all functions
console.log('\nFunctions:');
erc20Data.functions.forEach(func => {
  console.log(` ${func.name}(${func.parameters.join(', ')}) ${func.visibility} ${func.state_mutability}`);
});
// List all variables
console.log('\nState Variables:');
erc20Data.variables.forEach(variable => {
  console.log(` ${variable.type} ${variable.visibility} ${variable.name}`);
});
// List all mappings
console.log('\nMappings:');
erc20Data.mappings.forEach(mapping => {
  console.log(` ${mapping.full_type} ${mapping.visibility} ${mapping.name}`);
});
```
### Combined Bytecode + Source Analysis
```javascript
// Analyze both bytecode and source code together
const combinedAnalysis = disassembler.analyzeWithSource(bytecode, solidityCode);
console.log('Combined Analysis Results:');
console.log(`Has Source Code: ${combinedAnalysis.combined.hasSourceCode}`);
console.log(`Is ERC20: ${combinedAnalysis.combined.isERC20}`);
console.log(`Source Matches Bytecode: ${combinedAnalysis.combined.sourceMatchesBytecode.match}`);
if (combinedAnalysis.combined.sourceMatchesBytecode.match) {
  const comparison = combinedAnalysis.combined.sourceMatchesBytecode;
  console.log(`Match Percentage: ${comparison.matchPercentage}%`);
  console.log(`Common Functions: ${comparison.commonFunctions.join(', ')}`);
  if (comparison.missingInBytecode.length > 0) {
    console.log(`Functions in source but not in bytecode: ${comparison.missingInBytecode.join(', ')}`);
  }
  if (comparison.extraInBytecode.length > 0) {
    console.log(`Functions in bytecode but not in source: ${comparison.extraInBytecode.join(', ')}`);
  }
}
// Access both analyses
const bytecodeAnalysis = {
  functions: combinedAnalysis.functions,
  security: combinedAnalysis.security,
  patterns: combinedAnalysis.patterns
};
const sourceAnalysis = combinedAnalysis.sourceAnalysis;
```
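The `sourceMatchesBytecode` fields reduce to set operations over function identifiers. The hypothetical `compareFunctionSets` below sketches them, assuming both sides are already normalized to comparable strings (for example, canonical signatures such as `transfer(address,uint256)`). The README does not define `matchPercentage`. Here it is assumed to be the number of common functions divided by the size of the union, times 100.

```javascript
// Hypothetical source-vs-bytecode comparison; matchPercentage = |common| / |union| * 100 (assumed).
function compareFunctionSets(sourceFunctions, bytecodeFunctions) {
  const source = new Set(sourceFunctions);
  const bytecode = new Set(bytecodeFunctions);
  const commonFunctions = [...source].filter(name => bytecode.has(name));
  const missingInBytecode = [...source].filter(name => !bytecode.has(name));
  const extraInBytecode = [...bytecode].filter(name => !source.has(name));
  const unionSize = new Set([...source, ...bytecode]).size;
  return {
    matchPercentage: unionSize === 0 ? 0 : Math.round((commonFunctions.length / unionSize) * 100),
    commonFunctions,
    missingInBytecode,
    extraInBytecode,
  };
}
```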
### Individual Extraction Methods
```javascript
// Check if contract is ERC20
const isERC20 = disassembler.isERC20Contract(solidityCode);
console.log(`Is ERC20: ${isERC20}`);
// Extract only functions
const functions = disassembler.extractSourceFunctions(solidityCode);
functions.forEach(func => {
  console.log(`${func.name}: ${func.full_signature}`);
});
// Extract only variables
const variables = disassembler.extractSourceVariables(solidityCode);
variables.forEach(variable => {
  console.log(`${variable.type} ${variable.name}`);
});
// Extract only mappings
const mappings = disassembler.extractSourceMappings(solidityCode);
mappings.forEach(mapping => {
  console.log(`${mapping.name}: ${mapping.full_type}`);
});
// Extract contract name
const contractName = disassembler.extractContractName(solidityCode);
console.log(`Contract: ${contractName}`);
```
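This README does not say how the source extractors work. The hypothetical sketch below shows a regex-based approach, with output fields loosely mirroring those used above (`name`, `parameters`, `visibility`, `state_mutability`). Regexes do not understand comments, string literals, or nested parentheses, so a robust tool would parse the source instead, for example through the Solidity compiler's AST output.

```javascript
// Hypothetical regex-based extraction; not the package's implementation.
function extractContractNameSketch(source) {
  const match = /\bcontract\s+([A-Za-z_$][\w$]*)/.exec(source);
  return match ? match[1] : null;
}

function extractFunctionsSketch(source) {
  // name, parameter list, then everything up to the body "{" or a ";" (declaration only)
  const pattern = /\bfunction\s+([A-Za-z_$][\w$]*)\s*\(([^)]*)\)([^{;]*)/g;
  return [...source.matchAll(pattern)].map(([, name, params, tail]) => ({
    name,
    parameters: params.split(',').map(p => p.trim()).filter(Boolean),
    visibility: (tail.match(/\b(public|external|internal|private)\b/) || [])[1] ?? null,
    // Functions without pure/view/payable are nonpayable by default in Solidity.
    state_mutability: (tail.match(/\b(pure|view|payable)\b/) || [])[1] ?? 'nonpayable',
  }));
}
```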
### Advanced Source Analysis Features
- **Function Extraction**: Captures visibility, state mutability, parameters, and return types
- **Variable Detection**: Identifies all state variables with their types and visibility
- **Mapping Analysis**: Extracts mapping structures with key-value type information
- **ERC20 Validation**: Automatically detects if source code implements ERC20 standard
- **Bytecode Comparison**: Compares source functions with detected bytecode functions
- **Comprehensive Reporting**: Provides detailed statistics and summaries
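ERC20 detection can be framed as checking that the six functions required by EIP-20 (`totalSupply`, `balanceOf`, `transfer`, `transferFrom`, `approve`, `allowance`) are all present. Public state variables count too, because Solidity generates a getter for them. The hypothetical `isERC20Sketch` below is a regex-level approximation of that check, not the package's `isERC20Contract()`, and it shares the usual limits of regex parsing.

```javascript
// Hypothetical ERC20 check: the six functions EIP-20 requires.
const ERC20_REQUIRED = ['totalSupply', 'balanceOf', 'transfer', 'transferFrom', 'approve', 'allowance'];

function isERC20Sketch(source) {
  const names = new Set();
  // Explicitly declared functions.
  for (const m of source.matchAll(/\bfunction\s+([A-Za-z_$][\w$]*)\s*\(/g)) names.add(m[1]);
  // Public state variables and mappings get auto-generated getters, e.g. `uint256 public totalSupply;`.
  const publicVar = /^\s*(?:mapping\s*\(.*\)|[\w\[\]]+)\s+public\s+(?:constant\s+|immutable\s+)?([A-Za-z_$][\w$]*)\s*[=;]/gm;
  for (const m of source.matchAll(publicVar)) names.add(m[1]);
  const missing = ERC20_REQUIRED.filter(name => !names.has(name));
  return { isERC20: missing.length === 0, missing };
}
```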
## 🚀 Latest Features & Improvements
### v2.0 - Advanced Pattern Analysis & ERC20 Support
#### 🎯 Function Pattern Analysis
- **Opcode Pattern Extraction**: Each detected function now includes its complete opcode pattern for comparison
- **Pattern Hashing**: Unique fingerprints for identical function implementations
- **Similarity Detection**: Advanced algorithm to find functions with similar logic (70%+ similarity threshold)
- **Cross-Function Comparison**: Automatic detection of code reuse and similar implementations
#### 🪙 ERC20 Source Code Analysis
- **Complete ERC20 Detection**: Automatically identifies ERC20 contracts in Solidity source code
- **Function Extraction**: Detailed analysis of all functions with parameters, visibility, and state mutability
- **Variable Analysis**: Extraction of state variables with type and visibility information
- **Mapping Detection**: Comprehensive mapping structure analysis
- **Source-Bytecode Correlation**: Compare source code functions with detected bytecode functions
#### 🔍 Enhanced Analysis Capabilities
- **Pattern Comparison Engine**: Find duplicate and similar function implementations
- **Levenshtein Distance Algorithm**: Precise similarity scoring between opcode patterns
- **Security Pattern Detection**: Identify common security patterns (Ownable, Pausable, etc.)
- **Complexity Metrics**: Advanced code complexity analysis based on patterns
#### 📊 Improved Output Formats
- **Extended JSON Output**: Includes pattern data, similarity scores, and source analysis
- **Pattern Visualization**: Clear representation of function opcode patterns
- **Comparison Reports**: Detailed similarity analysis between functions
### New API Methods
```javascript
// Pattern analysis
const patterns = disassembler.analyzeFunctionsWithPatterns(bytecode);
const comparison = disassembler.compareFunctionPatterns(bytecode);
// ERC20 source analysis
const erc20Data = disassembler.extractERC20Data(sourceCode);
const isERC20 = disassembler.isERC20Contract(sourceCode);
// Combined analysis
const combined = disassembler.analyzeWithSource(bytecode, sourceCode);
```
### Example Use Cases
1. **Smart Contract Auditing**: Identify similar functions that might share vulnerabilities
2. **Code Reuse Detection**: Find copied implementations across different contracts
3. **Pattern-Based Security Analysis**: Detect common security patterns and anti-patterns
4. **Source Code Verification**: Check that the functions declared in source code correspond to the functions detected in deployed bytecode
5. **Contract Classification**: Automatically identify token standards and contract types