# Reverse Machine
**Next-generation JavaScript deobfuscation powered by AI**
[npm package](https://www.npmjs.com/package/reverse-machine)
[License: MIT](https://opensource.org/licenses/MIT)
> Transform minified, obfuscated, and bundled JavaScript into human-readable code using Large Language Models and advanced AST transformations.
## What Makes Reverse Machine Different
Traditional deobfuscators rely on pattern matching and heuristics alone. Reverse Machine adds the contextual understanding of Large Language Models to rename variables and functions, and applies every rename through AST-level transformations so the output stays semantically equivalent to the input.
### Key Features
- **AI-Powered Renaming**: Context-aware variable and function renaming using OpenAI GPT, Google Gemini, or Anthropic Claude
- **AST-Level Transformations**: Babel-powered structural improvements while preserving code semantics
- **Bundle Unpacking**: Automatic webpack bundle extraction using WebCrack
- **Parallel Processing**: Concurrent file processing for optimal performance
- **Multi-Input Support**: Process single files, entire directories, or ZIP archives
- **Smart Formatting**: Integrated Prettier for consistent code style
## Before & After
**Input (minified):**
```javascript
function a(e,t){var n=[];var r=e.length;var i=0;for(;i<r;i+=t){if(i+t<r){n.push(e.substring(i,i+t))}else{n.push(e.substring(i,r))}}return n}
```
**Output (humanified):**
```javascript
function splitStringIntoChunks(inputString, chunkSize) {
  var chunks = [];
  var stringLength = inputString.length;
  var currentIndex = 0;
  for (; currentIndex < stringLength; currentIndex += chunkSize) {
    if (currentIndex + chunkSize < stringLength) {
      chunks.push(inputString.substring(currentIndex, currentIndex + chunkSize));
    } else {
      chunks.push(inputString.substring(currentIndex, stringLength));
    }
  }
  return chunks;
}
```
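Since renaming only changes identifiers, both versions should return the same results for the same inputs. The following is a quick sanity check, assuming the two snippets above are pasted into one file; the sample inputs are illustrative and not part of the tool.

```javascript
// Minified input, copied from the example above.
function a(e,t){var n=[];var r=e.length;var i=0;for(;i<r;i+=t){if(i+t<r){n.push(e.substring(i,i+t))}else{n.push(e.substring(i,r))}}return n}

// Humanified output, copied from the example above.
function splitStringIntoChunks(inputString, chunkSize) {
  var chunks = [];
  var stringLength = inputString.length;
  var currentIndex = 0;
  for (; currentIndex < stringLength; currentIndex += chunkSize) {
    if (currentIndex + chunkSize < stringLength) {
      chunks.push(inputString.substring(currentIndex, currentIndex + chunkSize));
    } else {
      chunks.push(inputString.substring(currentIndex, stringLength));
    }
  }
  return chunks;
}

// Illustrative inputs (chunk sizes must be positive, or the loop never advances).
const cases = [["abcdefg", 3], ["", 2], ["abcd", 2], ["xyz", 10]];

// Compare both implementations on every case.
const allEqual = cases.every(([text, size]) =>
  JSON.stringify(a(text, size)) === JSON.stringify(splitStringIntoChunks(text, size))
);

console.log(allEqual); // true
console.log(splitStringIntoChunks("abcdefg", 3)); // [ 'abc', 'def', 'g' ]
```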
## Installation
### Prerequisites
- **Node.js** ≥ 20.0.0
- **npm** or **yarn**
### Global Installation (Recommended)
```bash
npm install -g reverse-machine
```
### One-time Usage
```bash
npx reverse-machine [command] [options] <input>
```
## Usage Guide
### Command Overview
Reverse Machine offers three AI-powered processing modes. Each mode is optimized for a different use case, and all of them accept the same input types:
```bash
reverse-machine <mode> [options] <input>
# Essential: Always estimate costs first (recommended)
reverse-machine <mode> --cost <input>
```
### Input Types
Reverse Machine supports three input types with intelligent output handling:
#### Single Files
```bash
reverse-machine openai script.min.js
# Creates: script.min - Deobfuscated.js (in same directory)
```
#### Project Directories
```bash
reverse-machine openai ./my-project
# Creates: ./my-project-deobfuscated/ (copy of entire project with all JS/TS files processed)
```
#### ZIP Archives
```bash
reverse-machine openai project.zip
# Creates: ./project-deobfuscated/ (extracts and processes all files)
```
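The output naming shown in the three examples above can be sketched as a small function. This is only an illustration of the documented naming pattern: `describeOutput` is a hypothetical helper, and treating an extension-less path as a directory is a simplifying assumption (a real implementation would inspect the filesystem).

```javascript
const path = require("node:path").posix;

// Hypothetical helper: derive the output location from the input path,
// following the "Creates:" comments in the examples above.
function describeOutput(inputPath) {
  const ext = path.extname(inputPath);
  const dir = path.dirname(inputPath);
  const base = path.basename(inputPath, ext);

  if (ext.toLowerCase() === ".zip") {
    // project.zip -> project-deobfuscated/
    return { kind: "zip", output: path.join(dir, `${base}-deobfuscated`) };
  }
  if (ext === "") {
    // Assumption: no extension means a directory.
    return { kind: "directory", output: path.join(dir, `${base}-deobfuscated`) };
  }
  // script.min.js -> script.min - Deobfuscated.js (same directory)
  return { kind: "file", output: path.join(dir, `${base} - Deobfuscated${ext}`) };
}

console.log(describeOutput("script.min.js").output); // script.min - Deobfuscated.js
console.log(describeOutput("./my-project").output);  // my-project-deobfuscated
console.log(describeOutput("project.zip").output);   // project-deobfuscated
```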
**Supported file types for processing:**
- JavaScript: `.js`, `.jsx`, `.mjs`, `.cjs`
- TypeScript: `.ts`, `.tsx`
- HTML files with inline scripts: `.html`, `.htm`
- Component files: `.vue`, `.svelte`
- Configuration files: `.json` (if minified)
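As a rough sketch, the extension list above could be applied as a filter when walking a directory. `isProcessable` is a hypothetical name, and the check for whether a `.json` file is actually minified is left out here.

```javascript
const path = require("node:path");

// Extensions taken from the supported file types listed above.
const PROCESSABLE_EXTENSIONS = new Set([
  ".js", ".jsx", ".mjs", ".cjs", // JavaScript
  ".ts", ".tsx",                 // TypeScript
  ".html", ".htm",               // HTML with inline scripts
  ".vue", ".svelte",             // Component files
  ".json",                       // Configuration files (minification check omitted)
]);

// Hypothetical helper: true when the file extension is in the supported list.
function isProcessable(filePath) {
  return PROCESSABLE_EXTENSIONS.has(path.extname(filePath).toLowerCase());
}

console.log(isProcessable("src/app.min.js")); // true
console.log(isProcessable("assets/logo.png")); // false
```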
### OpenAI Mode (Most Accurate)
Leverage OpenAI's GPT models for superior renaming accuracy:
```bash
# Estimate costs before processing (recommended)
reverse-machine openai --cost script.min.js
reverse-machine openai --cost --model="gpt-4o-mini" ./my-obfuscated-project
# Process a single minified file
reverse-machine openai --apiKey="sk-your-key" script.min.js
# Process an entire project directory
export OPENAI_API_KEY="sk-your-key"
reverse-machine openai ./my-obfuscated-project
# Process a ZIP archive
reverse-machine openai obfuscated-app.zip
# Advanced options for any input type
reverse-machine openai \
--model="gpt-4o" \
--concurrency=8 \
--verbose \
input-file-or-directory
```
**Environment Variables:**
- `OPENAI_API_KEY`: Your OpenAI API key
### Gemini Mode (Fast & Efficient)
Use Google's Gemini models for cost-effective processing:
```bash
# Estimate costs before processing (recommended)
reverse-machine gemini --cost script.min.js
reverse-machine gemini --cost --model="gemini-2.5-flash" ./my-project
# Process a single file
reverse-machine gemini --apiKey="your-gemini-key" script.min.js
# Process a project directory
export GEMINI_API_KEY="your-gemini-key"
reverse-machine gemini --model="gemini-1.5-pro" ./my-project
# Process a ZIP archive
reverse-machine gemini obfuscated-bundle.zip
```
**Environment Variables:**
- `GEMINI_API_KEY`: Your Google AI Studio API key
### Anthropic Mode (Advanced Reasoning)
Use Anthropic's Claude models for superior code understanding:
```bash
# Estimate costs before processing (recommended)
reverse-machine anthropic --cost script.min.js
reverse-machine anthropic --cost --model="claude-3-5-haiku-latest" ./my-project
# Process a single file with Claude 4 reasoning
reverse-machine anthropic --apiKey="your-anthropic-key" --model="claude-4-opus-20250514-reasoning" script.min.js
# Process a project directory with environment variable
export ANTHROPIC_API_KEY="your-anthropic-key"
reverse-machine anthropic --model="claude-4-sonnet-20250514" ./my-project
# Process a ZIP archive with Claude 3.5
reverse-machine anthropic --model="claude-3-5-sonnet-latest" obfuscated-app.zip
# Advanced options with Claude 4 reasoning model
reverse-machine anthropic \
--model="claude-4-opus-20250514-reasoning" \
--verbose \
input-file-or-directory
# Fast processing with Claude 4 standard model
reverse-machine anthropic \
--model="claude-4-sonnet-20250514" \
complex-project.zip
```
**Available Anthropic Models:**
**Claude 4 Family (Latest - May 2025):**
- `claude-4-opus-20250514-reasoning` - Most powerful with extended reasoning for complex code
- `claude-4-sonnet-20250514-reasoning` - Balanced with extended reasoning capabilities
- `claude-4-opus-20250514` - Most powerful with near-instant responses
- `claude-4-sonnet-20250514` - Balanced with fast responses
**Claude 3.5 Family:**
- `claude-3-5-sonnet-latest` / `claude-3-5-sonnet-20241022` - Most capable Claude 3.5, best for complex code
- `claude-3-5-haiku-latest` / `claude-3-5-haiku-20241022` - Fast and efficient (default)
**Claude 3 Family:**
- `claude-3-opus-latest` / `claude-3-opus-20240229` - Highest accuracy for challenging code
- `claude-3-sonnet-20240229` - Balanced performance and speed
- `claude-3-haiku-20240307` - Fastest processing
**Claude 4 Reasoning vs Standard Models:**
- **Reasoning models** (`-reasoning` suffix): Use extended thinking for deeper code analysis and better variable naming
- **Standard models**: Provide near-instant responses for faster processing
**Environment Variables:**
- `ANTHROPIC_API_KEY`: Your Anthropic API key
#### **Claude 4 Reasoning Models - Deep Code Understanding**
Claude 4 adds extended reasoning, which can improve variable naming quality on complex code:
**When to Use Reasoning Models:**
- **Complex, heavily obfuscated code** where context matters
- **Large codebases** with intricate dependencies
- **Critical production systems** where naming accuracy is paramount
- **Educational purposes** where you want to understand the AI's thought process
**When to Use Standard Models:**
- **Quick prototyping** and fast iterations
- **Simple minified files** with straightforward patterns
- **Batch processing** where speed is more important than perfection
- **Cost-sensitive applications** with high volume processing
**Example: Claude 4 Reasoning in Action**
```bash
# Standard Claude 4 (fast)
reverse-machine anthropic --model="claude-4-sonnet-20250514" app.min.js
# Result: Variables renamed in ~2-3 seconds with good accuracy
# Claude 4 with Reasoning (thorough)
reverse-machine anthropic --model="claude-4-sonnet-20250514-reasoning" app.min.js
# Result: Variables renamed in ~5-8 seconds with superior accuracy and context awareness
```
The reasoning models will internally analyze:
1. **Variable usage patterns** across the entire codebase
2. **Semantic relationships** between functions and data
3. **Domain-specific naming conventions** from the code context
4. **Potential conflicts** with existing variable names
### **Complete Model Comparison**
| Model Family | Model Name | Reasoning | Speed | Accuracy | Cost | Best For |
|--------------|------------|-----------|-------|----------|------|----------|
| **Claude 4** | `claude-4-opus-20250514-reasoning` | Extended | Slow | Highest | $$$ | Complex obfuscated code, critical systems |
| **Claude 4** | `claude-4-opus-20250514` | None | Fast | High | $$$ | Production code, high accuracy needs |
| **Claude 4** | `claude-4-sonnet-20250514-reasoning` | Extended | Medium | Very High | $$ | Balanced reasoning, most use cases |
| **Claude 4** | `claude-4-sonnet-20250514` | None | Fast | High | $$ | General purpose, good balance |
| **Claude 3.5** | `claude-3-5-sonnet-latest` | None | Fast | High | $$ | Proven reliability |
| **Claude 3.5** | `claude-3-5-haiku-latest` | None | Fastest | Good | $ | Quick processing (default) |
| **Claude 3** | `claude-3-opus-latest` | None | Medium | High | $$$ | Legacy complex code |
**Recommendations:**
- **Start with**: `claude-4-sonnet-20250514-reasoning` for best balance of cost, speed, and accuracy
- **For speed**: `claude-3-5-haiku-latest` for fast batch processing
- **For accuracy**: `claude-4-opus-20250514-reasoning` for the most challenging code
- **For cost**: `claude-3-5-haiku-latest` for budget-conscious projects
### **Claude 4 Usage Examples**
```bash
# 1. Claude 4 Opus with Reasoning - Maximum accuracy for complex code
reverse-machine anthropic \
--model="claude-4-opus-20250514-reasoning" \
--verbose \
complex-obfuscated-app.min.js
# 2. Claude 4 Sonnet with Reasoning - Best balance for most projects
reverse-machine anthropic \
--model="claude-4-sonnet-20250514-reasoning" \
./production-project
# 3. Claude 4 Opus Standard - Fast processing with high accuracy
reverse-machine anthropic \
--model="claude-4-opus-20250514" \
app-bundle.zip
# 4. Claude 4 Sonnet Standard - Balanced speed and performance
reverse-machine anthropic \
--model="claude-4-sonnet-20250514" \
./entire-codebase
```
**Migration from Claude 3.5:**
```bash
# Old Claude 3.5 command
reverse-machine anthropic --model="claude-3-5-sonnet-latest" script.min.js
# New Claude 4 equivalent (better results)
reverse-machine anthropic --model="claude-4-sonnet-20250514-reasoning" script.min.js
```
## Advanced Configuration
### Processing Pipeline
Reverse Machine uses a sophisticated 4-stage pipeline:
1. **Bundle Extraction**: WebCrack unpacks webpack bundles
2. **AST Transformations**: Babel plugins normalize code structure
3. **AI Renaming**: LLMs provide context-aware variable names
4. **Formatting**: Prettier ensures consistent code style
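The ordering of the four stages can be pictured as a chain of functions, each taking source text and returning transformed text. The sketch below is illustrative only: the stage names and `runPipeline` are hypothetical, and every stage body is an identity placeholder rather than a call into WebCrack, Babel, an LLM, or Prettier.

```javascript
// Each stage maps source text to (a promise of) transformed text.
// All bodies are placeholders; the real tool's internals are not shown in this README.
const stages = [
  { name: "bundle-extraction", run: async (code) => code },  // WebCrack would unpack here
  { name: "ast-transformations", run: async (code) => code }, // Babel plugins would run here
  { name: "ai-renaming", run: async (code) => code },         // LLM renaming would run here
  { name: "formatting", run: async (code) => code },          // Prettier would format here
];

// Run the stages strictly in order, feeding each output into the next stage.
async function runPipeline(code, pipeline = stages, onStage = () => {}) {
  let current = code;
  for (const stage of pipeline) {
    onStage(stage.name);
    current = await stage.run(current);
  }
  return current;
}

// Usage: log each stage as it starts.
runPipeline("var a=void 0;", stages, (name) => console.log("stage:", name))
  .then((result) => console.log(result));
```

Keeping every stage behind the same text-in, text-out shape is one way to make stages easy to reorder or skip.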
### Performance Optimization
#### Parallel Processing
```bash
# Adjust concurrency based on your system
reverse-machine openai --concurrency=16 ./large-project # High-end systems
reverse-machine openai --concurrency=4 single-file.js # Standard systems
# For directories with many files, higher concurrency helps
reverse-machine openai --concurrency=12 project.zip # Process ZIP archives faster
```
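To illustrate what a `--concurrency` limit means in practice, here is a minimal sketch of a bounded worker pool. It is a generic pattern, not the tool's actual scheduler; `mapWithConcurrency` and the fake per-file worker are hypothetical.

```javascript
// Run `worker` over `items` with at most `limit` tasks in flight at once.
// `limit` is assumed to be a positive integer.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function runLane() {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}

// Usage: pretend each file takes 10 ms to process, four at a time.
const files = ["a.js", "b.js", "c.js", "d.js", "e.js"];
mapWithConcurrency(files, 4, async (file) => {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return `${file}: done`;
}).then((lines) => console.log(lines.join("\n")));
```

With API-backed processing, the useful ceiling is often set by the provider's rate limits rather than by local CPU cores.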
### Output Structure
The output structure depends on your input type:
#### Single File Input
```
original-directory/
├── script.min.js                   # Original file
└── script.min - Deobfuscated.js    # Processed output
```
#### Directory Input
```
parent-directory/
├── my-project/                     # Original directory
└── my-project-deobfuscated/        # Processed copy
    ├── src/
    │   ├── app.js                  # All JS/TS files processed
    │   └── utils.ts                # Maintains file structure
    ├── assets/                     # Non-JS files copied as-is
    └── package.json                # Configuration preserved
```
#### ZIP Archive Input
```
archive-directory/
├── project.zip                     # Original archive
└── project-deobfuscated/           # Extracted and processed
    └── (same structure as directory input)
```
## Cost Estimation & Planning
### Built-in Cost Calculator
Reverse Machine includes a comprehensive cost estimation feature to help you plan your deobfuscation budget **before** spending money:
```bash
# Estimate costs for any input without processing
reverse-machine openai --cost script.min.js
reverse-machine gemini --cost ./my-project
reverse-machine anthropic --cost project.zip
# Combine with model selection for accurate estimates
reverse-machine openai --cost --model="gpt-4o-mini" large-project.zip
reverse-machine anthropic --cost --model="claude-4-opus-20250514-reasoning" complex-code.js
```
### Cost Estimation Features
The `--cost` flag provides detailed cost breakdowns including:
- **File Discovery**: Automatically scans directories and ZIP archives
- **Token Estimation**: Conservative estimates based on file sizes and minification
- **Processing Mode Analysis**: Separate estimates for basic vs advanced processing
- **Multi-Model Comparison**: Compare costs across different AI providers
- **Reality Warnings**: Alerts about potential cost overruns and exponential scaling
### Example Cost Estimation Output
```bash
$ reverse-machine openai --cost ./typescript-project
Cost Estimation Report
======================
Discovery Results:
- Total files found: 42 JavaScript/TypeScript files
- Total size: 2.8 MB
- Estimated tokens: ~1,120,000 tokens
OpenAI GPT-4o Cost Estimates:
- Basic Processing: $28.00
- Advanced Processing: $156.80
Important Warnings:
- Advanced mode may cost 2-5x more than estimates
- Large files (>100KB) have exponential cost scaling
- Consider using budget models for initial testing
Recommendations:
- Try gpt-4o-mini first ($1.68 estimated)
- Use basic mode for 99% of use cases
- Test with small samples before full processing
```
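The basic-mode figures in the sample report are consistent with simple arithmetic: token count times the combined input and output price, times a factor of 2. The sketch below reproduces that arithmetic using prices from the pricing tables in this README. Whether the tool computes its estimate this way is an assumption: `estimateCost`, the equal input/output token count, and the overhead factor of 2 are all illustrative.

```javascript
// USD per 1M tokens, copied from the pricing tables in this README.
const PRICES = {
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

// Assumptions (not confirmed by the tool): output tokens roughly equal input
// tokens, and an overhead factor of 2 covers prompts and surrounding context.
function estimateCost(tokens, model, overheadFactor = 2) {
  const price = PRICES[model];
  if (!price) throw new Error(`No pricing for model: ${model}`);
  const perPass = (tokens / 1e6) * (price.input + price.output);
  return Number((perPass * overheadFactor).toFixed(2));
}

console.log(estimateCost(1_120_000, "gpt-4o"));      // 28
console.log(estimateCost(1_120_000, "gpt-4o-mini")); // 1.68
```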
### Real-World Cost Accuracy
**IMPORTANT**: Cost estimates are **rough approximations**, and real costs may be **2-5x higher** due to:
- **Multi-phase processing**: Token usage in advanced mode can grow exponentially across phases
- **Context accumulation**: Each phase builds on previous results
- **Retry mechanisms**: Failed attempts still consume tokens
- **Model-specific overhead**: Provider-specific reasoning and safety checks
**Recommendation**: Always start with the smallest possible test to validate costs.
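One way to act on the 2-5x figure is to widen any estimate into a band before comparing it with a budget. This is a sketch under that stated range only; `overrunBand` and `fitsBudget` are hypothetical helpers, not part of the CLI.

```javascript
// Widen an estimate by the 2-5x overrun range quoted above.
function overrunBand(estimateUsd, minFactor = 2, maxFactor = 5) {
  return { low: estimateUsd * minFactor, high: estimateUsd * maxFactor };
}

// Conservative check: only proceed if even the high end fits the budget.
function fitsBudget(estimateUsd, budgetUsd) {
  return overrunBand(estimateUsd).high <= budgetUsd;
}

console.log(overrunBand(28));     // { low: 56, high: 140 }
console.log(fitsBudget(28, 100)); // false
console.log(fitsBudget(28, 150)); // true
```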
### Current AI Model Pricing (2025)
#### OpenAI Models
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|-------|----------------------|------------------------|----------|
| GPT-4o | $2.50 | $10.00 | Production quality |
| GPT-4.1 | $2.00 | $8.00 | Balanced performance |
| GPT-4o-mini | $0.15 | $0.60 | **Budget-friendly** |
| o1-mini | $3.00 | $12.00 | Complex reasoning |
| o3-mini | $15.00 | $60.00 | Advanced reasoning |
#### Anthropic Claude Models
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|-------|----------------------|------------------------|----------|
| Claude 4 Opus | $15.00 | $75.00 | Maximum accuracy |
| Claude 4 Sonnet | $3.00 | $15.00 | **Balanced choice** |
| Claude 3.5 Sonnet | $3.00 | $15.00 | Proven reliability |
| Claude 3.5 Haiku | $0.80 | $4.00 | Fast processing |
#### Google Gemini Models
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|-------|----------------------|------------------------|----------|
| Gemini 2.5 Pro | $1.25 | $10.00 | High performance |
| Gemini 2.5 Flash | $0.30 | $2.50 | **Most affordable** |
| Gemini 1.5 Pro | $1.25 | $5.00 | Legacy compatibility |
| Gemini 1.5 Flash | $0.075 | $0.30 | Ultra budget |
### Cost Optimization Strategies
#### Choose the Right Model
```bash
# For budget-conscious projects
reverse-machine openai --model="gpt-4o-mini" project.zip # ~$5-15 typical
reverse-machine gemini --model="gemini-2.5-flash" project.zip # ~$3-10 typical
# For production quality
reverse-machine anthropic --model="claude-4-sonnet-20250514" project.zip # ~$20-60 typical
reverse-machine openai --model="gpt-4o" project.zip # ~$25-75 typical
```
#### Start Small, Scale Up
```bash
# 1. Test with cost estimation first
reverse-machine openai --cost ./large-project
# 2. Process a small sample
reverse-machine openai ./large-project/src/single-file.js
# 3. If satisfied, process incrementally
reverse-machine openai ./large-project/src/ # Just src folder
reverse-machine openai ./large-project # Full project
```
#### Use Basic Mode by Default
- **Basic processing**: Single-pass renaming (recommended for 99% of use cases)
- **Advanced processing**: Multi-agent analysis (only for critical production code)
```bash
# Basic mode (default) - cost-effective
reverse-machine openai script.min.js
# Advanced mode - expensive but thorough
reverse-machine openai --advanced script.min.js
```
### Scaling Cost Examples
Real-world cost examples based on project sizes:
| Project Size | Files | Estimated Cost (Budget) | Estimated Cost (Premium) | Reality Check |
|--------------|-------|------------------------|--------------------------|---------------|
| **Small** (1-5 files, <1MB) | 3 | $2-5 | $10-25 | Usually accurate |
| **Medium** (10-50 files, 1-10MB) | 25 | $15-40 | $75-200 | May be 2x higher |
| **Large** (50+ files, 10MB+) | 100 | $60-150 | $300-800 | Often 3-5x higher |
**Budget Models**: OpenAI GPT-4o-mini, Gemini 2.5 Flash
**Premium Models**: Claude 4 Opus, OpenAI o3-mini
### Cost Safety Tips
1. **Always estimate first**: Use `--cost` before processing
2. **Set spending limits**: Configure API billing limits
3. **Test incrementally**: Start with single files
4. **Monitor spending**: Check API usage dashboards regularly
5. **Use budget models**: Start with cheaper options for experimentation
## Technical Architecture
### Core Technologies
- **AST Processing**: Babel ecosystem for semantic-preserving transformations
- **Bundle Analysis**: WebCrack for webpack bundle extraction
- **AI Integration**: OpenAI API, Google Generative AI, Anthropic SDK
- **Language Models**: GPT-4o, Claude-4 (with reasoning), Claude-3.5, Gemini-1.5
- **Performance**: Parallel processing and concurrent file handling
### Babel Transformations
Reverse Machine includes custom Babel plugins for:
- Converting `void 0` → `undefined`
- Normalizing comparison operators (`5 === x` → `x === 5`)
- Expanding scientific notation (`5e3` → `5000`)
- Code beautification and structure improvement
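To show what an AST-level rewrite looks like, here is a minimal sketch of the first two transformations above, applied to hand-built Babel-style node objects. It does not use Babel itself and is not the project's plugin code; `transform` and `isLiteral` are hypothetical names, and the scientific-notation case is not covered.

```javascript
// True for the literal node types this sketch cares about.
function isLiteral(node) {
  return node.type === "NumericLiteral" || node.type === "StringLiteral";
}

// Operators whose operands can be swapped without changing the result.
const SYMMETRIC_OPERATORS = ["==", "===", "!=", "!=="];

// Recursively rewrite a Babel-style AST, returning new nodes.
function transform(node) {
  if (Array.isArray(node)) return node.map(transform);
  if (node === null || typeof node !== "object") return node;

  // Rewrite children first so nested patterns are handled bottom-up.
  const out = {};
  for (const [key, value] of Object.entries(node)) out[key] = transform(value);

  // `void 0` -> `undefined` (assumes `undefined` is not shadowed in scope).
  if (
    out.type === "UnaryExpression" &&
    out.operator === "void" &&
    out.argument &&
    out.argument.type === "NumericLiteral" &&
    out.argument.value === 0
  ) {
    return { type: "Identifier", name: "undefined" };
  }

  // `5 === x` -> `x === 5`: move a literal on the left to the right.
  if (
    out.type === "BinaryExpression" &&
    SYMMETRIC_OPERATORS.includes(out.operator) &&
    isLiteral(out.left) &&
    !isLiteral(out.right)
  ) {
    return { ...out, left: out.right, right: out.left };
  }

  return out;
}

// Usage: the node for `5 === x`.
const result = transform({
  type: "BinaryExpression",
  operator: "===",
  left: { type: "NumericLiteral", value: 5 },
  right: { type: "Identifier", name: "x" },
});
console.log(result.left.name, result.operator, result.right.value); // x === 5
```

Only symmetric operators are flipped here; swapping the operands of `<` or `>` would also require inverting the operator.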
### AI Prompt Engineering
The tool uses sophisticated prompting strategies:
- **Context-aware analysis**: Provides surrounding code context to LLMs
- **Incremental processing**: Processes variables in scope-aware batches
- **Conflict resolution**: Automatically handles naming conflicts
- **Structured output**: Uses JSON formatting for consistent AI responses
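Two of the strategies above, structured output and conflict resolution, can be sketched in a few lines. This is an illustration only: the `newName` JSON field, `parseSuggestedName`, and the numeric-suffix scheme in `resolveConflict` are assumptions, not the tool's documented behavior.

```javascript
// Parse a model reply expected to be JSON such as {"newName": "chunks"}.
// The `newName` field is an illustrative assumption.
function parseSuggestedName(reply) {
  try {
    const parsed = JSON.parse(reply);
    const name = parsed && parsed.newName;
    // Accept only plain identifiers (reserved words are not checked here).
    return typeof name === "string" && /^[A-Za-z_$][\w$]*$/.test(name) ? name : null;
  } catch {
    return null; // Malformed JSON: the caller can retry or keep the old name.
  }
}

// Pick a name that is not already taken in scope by adding a numeric suffix.
function resolveConflict(suggested, takenNames) {
  if (!takenNames.has(suggested)) return suggested;
  let suffix = 2;
  while (takenNames.has(`${suggested}${suffix}`)) suffix++;
  return `${suggested}${suffix}`;
}

// Usage: `chunks` and `chunks2` already exist in the current scope.
const taken = new Set(["chunks", "chunks2"]);
const suggestion = parseSuggestedName('{"newName": "chunks"}');
console.log(resolveConflict(suggestion, taken)); // chunks3
```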
## Testing & Quality Assurance
Reverse Machine includes comprehensive test suites:
```bash
# Run all tests
npm test
# Run specific test types
npm run test:unit # Unit tests
npm run test:e2e # End-to-end tests
npm run test:llm # LLM integration tests
npm run test:openai # OpenAI API tests
npm run test:gemini # Gemini API tests
```
## Contributing
We welcome contributions! The codebase is designed for maintainability:
### Development Setup
```bash
# Clone and setup
git clone https://github.com/mariolqn/reverse-machine.git
cd reverse-machine
npm install
# Development commands
npm run start # Run from source
npm run build # Build for distribution
npm run lint # Code quality checks
```
### Project Structure
```
src/
├── commands/              # CLI command implementations
├── plugins/               # Processing plugins (Babel, LLM, etc.)
├── security/              # Input validation and security
├── test/                  # Test suites
├── babel-utils.ts         # AST transformation utilities
├── input-handler.ts       # Multi-input type handling (files/dirs/zip)
├── unminify.ts            # Legacy processing pipeline
└── unminify-enhanced.ts   # Enhanced processing pipeline
```
## License
This project is licensed under the [MIT License](LICENSE) - see the LICENSE file for details.
## Acknowledgments
- **WebCrack** team for bundle extraction capabilities
- **Babel** ecosystem for AST transformations
- **OpenAI**, **Google**, and **Anthropic** for AI APIs
- **Open source community** for the foundational tools
## Support & Community
- **Issues**: [GitHub Issues](https://github.com/mariolqn/reverse-machine/issues)
- **Discussions**: [GitHub Discussions](https://github.com/mariolqn/reverse-machine/discussions)
- **Blog**: [Introduction Blog Post](https://thejunkland.com/blog/using-llms-to-reverse-javascript-minification)
---
<div align="center">
**Made with ❤️ for the reverse engineering community**
[Star on GitHub](https://github.com/mariolqn/reverse-machine) • [npm Package](https://www.npmjs.com/package/reverse-machine) • [Documentation](https://github.com/mariolqn/reverse-machine/wiki)
</div>