secure-scan-js
A JavaScript implementation of Yelp's detect-secrets tool - no Python required
# Enhanced Secret Detection System
This document describes the enhanced secret detection capabilities, which were added to improve accuracy and coverage when scanning code repositories for secrets.
## Overview
The enhanced detection system targets six gaps that show up in realistic code:
1. **Environment Variable Fallbacks** - Secrets in `os.getenv()` fallback values
2. **String Concatenation** - Secrets formed by combining string parts
3. **Comment Secrets** - Secrets accidentally left in code comments
4. **Base64 Encoded Secrets** - Secrets that are base64 encoded
5. **Generic Variable Names** - Secrets with non-obvious variable names
6. **Multi-Language Support** - Language-specific detection patterns
## Architecture
The enhanced detection system consists of several interconnected components:
```
CustomHeuristicDetector (Main Entry Point)
├── MultiLanguageSecretDetector (Language-aware detection)
│   └── EnhancedSecretDetector (Complex pattern detection)
│       └── SecretClassifier (Pattern classification)
├── AdvancedSecretAnalyzer (Entropy and validation analysis)
└── Original extraction methods (Fallback detection)
```
## Components
### 1. EnhancedSecretDetector
**File**: `wasm-version/src/python/enhanced_detector.py`
Handles complex secret detection scenarios:
- **Environment Fallbacks**: Detects secrets in `os.getenv("VAR", "fallback_secret")`
- **String Concatenation**: Detects secrets formed by `part1 + part2`
- **Comment Analysis**: Extracts secrets from code comments
- **Base64 Decoding**: Automatically decodes and analyzes base64 strings
- **Generic Variables**: Detects secrets in variables like `config`, `data`, etc.
- **Function Parameters**: Analyzes function calls for secret parameters
### 2. MultiLanguageSecretDetector
**File**: `wasm-version/src/python/multi_language_detector.py`
Provides language-specific detection patterns for:
- **Python** (`.py`, `.pyw`)
- **JavaScript/TypeScript** (`.js`, `.jsx`, `.ts`, `.tsx`, `.mjs`)
- **Java** (`.java`)
- **C#** (`.cs`)
- **Go** (`.go`)
- **Rust** (`.rs`)
- **PHP** (`.php`)
- **Ruby** (`.rb`)
- **YAML** (`.yml`, `.yaml`)
- **JSON** (`.json`)
Each language has specific patterns for:
- Variable assignments
- Environment variable access
- Function calls
- Comments
- String literals
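Choosing a pattern set starts with mapping a file name to a language. The following is a minimal sketch of that lookup, built only from the extension list above; `EXTENSION_TO_LANGUAGE`, `detect_language`, and the language keys are illustrative names and are not taken from `multi_language_detector.py`.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the list above; the language keys are assumptions.
EXTENSION_TO_LANGUAGE = {
    ".py": "python", ".pyw": "python",
    ".js": "javascript", ".jsx": "javascript", ".mjs": "javascript",
    ".ts": "typescript", ".tsx": "typescript",
    ".java": "java",
    ".cs": "csharp",
    ".go": "go",
    ".rs": "rust",
    ".php": "php",
    ".rb": "ruby",
    ".yml": "yaml", ".yaml": "yaml",
    ".json": "json",
}


def detect_language(filename: str) -> Optional[str]:
    """Return the language key for a file name, or None if it is unsupported."""
    return EXTENSION_TO_LANGUAGE.get(Path(filename).suffix.lower())
```

Returning `None` for unknown extensions lets a caller fall back to language-agnostic detection instead of failing.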
### 3. Enhanced SecretClassifier
**File**: `wasm-version/src/python/secret_patterns.py`
Enhanced pattern classification with:
- **Context-Aware Classification**: Uses variable names and context
- **Enhanced Patterns**: Additional patterns for edge cases
- **Fallback Classification**: Better handling of unknown patterns
- **Environment Fallback Detection**: Special handling for env fallbacks
### 4. AdvancedSecretAnalyzer
**File**: `wasm-version/src/python/advanced_analyzer.py`
Provides advanced analysis using:
- **Multiple Entropy Algorithms**: Shannon, character frequency, n-gram
- **Pattern Validation**: Format validation for known secret types
- **Context Analysis**: Surrounding code analysis
- **Bayesian Confidence**: Statistical confidence scoring
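Pattern validation can be pictured as a full-match of a candidate against the published format of a known secret type. The sketch below assumes two widely documented formats (an AWS access key ID as `AKIA` plus 16 uppercase alphanumerics, and a classic GitHub personal access token as `ghp_` plus 36 alphanumerics); `FORMAT_VALIDATORS` and `validate_format` are hypothetical names, not the analyzer's actual API.

```python
import re

# Assumed formats for illustration; the real analyzer may use different rules.
FORMAT_VALIDATORS = {
    "AWS Access Key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub Personal Access Token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
}


def validate_format(secret_type: str, value: str) -> bool:
    """Return True only if the whole value matches the format for secret_type."""
    pattern = FORMAT_VALIDATORS.get(secret_type)
    return bool(pattern and pattern.fullmatch(value))
```

Using `fullmatch` rather than `search` rejects values that merely contain a valid-looking fragment.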
## Detection Examples
### Environment Variable Fallbacks
**Before**: Not detected
```python
api_key = os.getenv("STRIPE_KEY", "sk_test_fallbackFakeKey123")
```
**After**: ✅ Detected as "Stripe Test Key (Environment Fallback)"
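One way to picture this check is a regular expression that captures the second string argument of `os.getenv` or `os.environ.get`. This is a minimal sketch under that assumption; `ENV_FALLBACK_RE` and `find_env_fallbacks` are illustrative names, and the actual logic in `enhanced_detector.py` may differ.

```python
import re

# Matches os.getenv("NAME", "fallback") and os.environ.get("NAME", "fallback").
ENV_FALLBACK_RE = re.compile(
    r"""os\.(?:getenv|environ\.get)\(\s*
        (["'])(?P<name>[^"']+)\1\s*,\s*
        (["'])(?P<fallback>[^"']+)\3\s*\)""",
    re.VERBOSE,
)


def find_env_fallbacks(source: str) -> list:
    """Return (line_number, env_var_name, fallback_value) for each literal fallback."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in ENV_FALLBACK_RE.finditer(line):
            hits.append((lineno, match.group("name"), match.group("fallback")))
    return hits
```

The captured fallback value would then be passed to the classifier like any other candidate string.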
### String Concatenation
**Before**: Not detected
```python
part1 = "pk_test_"
part2 = "abcdEfGhIjKlMnOpQrStUvWxYz"
stripe_key = part1 + part2
```
**After**: ✅ Detected as "Stripe Publishable Key (String Concatenation)"
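Concatenation can be resolved statically by tracking string assignments and folding `+` expressions. The sketch below uses Python's standard `ast` module and only handles top-level assignments of literals and names; `resolve_concatenations` is a hypothetical helper, not the detector's real method.

```python
import ast


def resolve_concatenations(source: str) -> dict:
    """Map variable names to string values, following simple `a + b` joins."""
    known = {}

    def evaluate(node):
        # A plain string literal.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            return node.value
        # A reference to a variable whose string value is already known.
        if isinstance(node, ast.Name):
            return known.get(node.id)
        # left + right, where both sides resolve to strings.
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            left, right = evaluate(node.left), evaluate(node.right)
            if left is not None and right is not None:
                return left + right
        return None

    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign) and len(stmt.targets) == 1:
            target = stmt.targets[0]
            value = evaluate(stmt.value)
            if isinstance(target, ast.Name) and value is not None:
                known[target.id] = value
    return known
```

Each resolved value can then be scanned as if it had been written as a single literal.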
### Comment Secrets
**Before**: Not detected
```python
# Debug: token = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.abc.def"
```
**After**: ✅ Detected as "JWT Token (In Comment)"
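For Python sources, comments can be isolated with the standard `tokenize` module, which avoids mistaking a `#` inside a string literal for a comment. The sketch below assumes a loose JWT-shaped pattern (three dot-separated base64url segments starting with `eyJ`); `JWT_LIKE_RE` and `find_jwt_like_in_comments` are illustrative names.

```python
import io
import re
import tokenize

# Loose, assumed shape of a JWT: header.payload.signature in base64url.
JWT_LIKE_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")


def find_jwt_like_in_comments(source: str) -> list:
    """Return (line_number, token) for JWT-shaped strings found in comments."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            for match in JWT_LIKE_RE.finditer(tok.string):
                hits.append((tok.start[0], match.group(0)))
    return hits
```

Other languages would need their own comment extraction, since `tokenize` only understands Python syntax.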
### Base64 Encoded Secrets
**Before**: Not detected
```python
secret_b64 = "QUtJQUlPU0ZPRE5ON0VYQU1QTEU=" # Decodes to AWS access key
```
**After**: ✅ Detected as "AWS Access Key ID (Base64 Encoded)"
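The decode-then-classify step can be sketched with the standard `base64` module: find base64-looking runs, decode them strictly, and test the decoded text against a known format. The candidate pattern, the minimum length of 16, and the function name below are assumptions made for illustration.

```python
import base64
import binascii
import re

# Runs of base64 alphabet characters, optionally padded (assumed minimum: 16).
B64_CANDIDATE_RE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
# Commonly cited AWS access key ID shape.
AWS_KEY_ID_RE = re.compile(r"AKIA[0-9A-Z]{16}")


def find_encoded_aws_keys(text: str) -> list:
    """Return (encoded, decoded) pairs where the decoded text is an AWS key ID."""
    hits = []
    for match in B64_CANDIDATE_RE.finditer(text):
        token = match.group(0)
        if len(token) % 4 != 0:
            continue  # not a complete base64 string
        try:
            decoded = base64.b64decode(token, validate=True).decode("ascii")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text once decoded
        if AWS_KEY_ID_RE.fullmatch(decoded):
            hits.append((token, decoded))
    return hits
```

Requiring the decoded bytes to be ASCII filters out most ordinary identifiers that happen to use only base64 characters.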
### Generic Variable Names
**Before**: Not detected
```python
config = {
    "key": "AIzaFakeEmbeddedKeyValue12345678"
}
```
**After**: ✅ Detected as "Google API Key (Generic Variable)"
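When the variable name gives no hint, detection has to rely on the value alone. A minimal sketch of that idea walks every string literal in the file and matches it against value-based patterns. The patterns here are deliberately loosened so that the placeholder above matches (real Google API keys are longer), and `VALUE_PATTERNS` and `scan_string_literals` are hypothetical names.

```python
import ast
import re

# Illustrative, loosened patterns; not the project's actual rules.
VALUE_PATTERNS = {
    "Google API Key": re.compile(r"AIza[0-9A-Za-z_-]{20,}"),
    "Stripe Test Key": re.compile(r"sk_test_[0-9A-Za-z]{10,}"),
}


def scan_string_literals(source: str) -> list:
    """Return (line_number, label, value) for literals that match a value pattern."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            for label, pattern in VALUE_PATTERNS.items():
                if pattern.fullmatch(node.value):
                    hits.append((node.lineno, label, node.value))
    return hits
```

Because the walk ignores names entirely, a secret stored under `key`, `data`, or `config` is treated the same as one stored under `api_key`.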
## Multi-Language Examples
### JavaScript Environment Fallbacks
```javascript
const apiKey = process.env.STRIPE_KEY || "sk_test_fallbackKey123";
```
### Java System Properties
```java
String dbPassword = System.getProperty("db.password", "defaultSecret123");
```
### C# Configuration
```csharp
string connectionString = Environment.GetEnvironmentVariable("DB_CONN") ?? "Server=localhost;Password=secret123";
```
### Go Environment Variables
```go
apiKey := os.Getenv("API_KEY")
if apiKey == "" {
	apiKey = "default_secret_key_123"
}
```
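Single-line fallbacks such as the JavaScript, Java, and C# examples above can be approximated with one regular expression per language. The sketch below is an assumption about how such patterns might look, not a copy of `multi_language_detector.py`; it does not cover the multi-line Go form, which needs statement-level analysis.

```python
import re
from typing import Optional

# One hypothetical pattern per language; group 1 captures the fallback literal.
FALLBACK_PATTERNS = {
    "javascript": re.compile(
        r"""process\.env\.\w+\s*(?:\|\||\?\?)\s*["']([^"']+)["']"""
    ),
    "java": re.compile(
        r'System\.getProperty\(\s*"[^"]+"\s*,\s*"([^"]+)"\s*\)'
    ),
    "csharp": re.compile(
        r'Environment\.GetEnvironmentVariable\(\s*"[^"]+"\s*\)\s*\?\?\s*"([^"]+)"'
    ),
}


def extract_fallback(language: str, line: str) -> Optional[str]:
    """Return the literal fallback value on this line, or None if there is none."""
    pattern = FALLBACK_PATTERNS.get(language)
    if pattern is None:
        return None
    match = pattern.search(line)
    return match.group(1) if match else None
```

Keeping the patterns in a dictionary keyed by language makes it straightforward to add a new language without touching the lookup function.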
## Configuration
### Detection Thresholds
- **High Confidence**: 0.8+ (Specific patterns like `ghp_`, `sk_live_`)
- **Medium Confidence**: 0.6-0.8 (Generic patterns with context)
- **Low Confidence**: 0.4-0.6 (High entropy with weak context)
- **Info**: 0.3-0.4 (Potential secrets for review)
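The bands above share their boundary values (0.4, 0.6, and 0.8 each appear in two bands), so the following sketch assumes half-open ranges with an inclusive lower bound. `confidence_level` is a hypothetical helper, and the real detector may resolve boundaries differently.

```python
from typing import Optional


def confidence_level(score: float) -> Optional[str]:
    """Map a confidence score to a band, assuming inclusive lower bounds."""
    if score >= 0.8:
        return "high"
    if score >= 0.6:
        return "medium"
    if score >= 0.4:
        return "low"
    if score >= 0.3:
        return "info"
    return None  # below the reporting floor
```

Under this assumption a score of exactly 0.8 is reported as high confidence and a score of exactly 0.6 as medium.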
### Entropy Thresholds
- **Shannon Entropy**: > 3.5 for potential secrets
- **High Entropy**: > 4.5 for strong indicators
- **Normalized Entropy**: > 0.6 for randomness detection
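For reference, Shannon entropy over characters is measured in bits per character. The sketch below computes it and applies the thresholds listed above. The normalization shown (dividing by the maximum entropy possible for a string of that length) is an assumption, since this document does not define how normalized entropy is calculated, and all function names are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def normalized_entropy(text: str) -> float:
    """Entropy divided by log2(len(text)), the maximum for that length (assumed)."""
    if len(text) < 2:
        return 0.0
    return shannon_entropy(text) / math.log2(len(text))


def entropy_flags(text: str) -> dict:
    """Apply the thresholds listed above to a candidate string."""
    entropy = shannon_entropy(text)
    return {
        "potential_secret": entropy > 3.5,
        "strong_indicator": entropy > 4.5,
        "random_looking": normalized_entropy(text) > 0.6,
    }
```

Short strings can never exceed the 3.5-bit threshold (a string needs at least 12 distinct characters to do so), which is one reason a length-normalized score is useful alongside the raw value.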
## Testing
Run the enhanced detection test:
```bash
python test_enhanced_detection.py
```
The script runs detection against `python/realistic_secrets_example.py` and reports:
- Total secrets detected
- Detection by line number
- Specific test case results
- Individual pattern testing
## Performance
The enhanced detection system is designed to be:
- **Efficient**: Parallel detection methods with early termination
- **Accurate**: Multiple validation layers reduce false positives
- **Comprehensive**: Language-aware patterns increase coverage
- **Scalable**: Modular design allows easy extension
## Future Enhancements
Potential areas for improvement:
1. **Machine Learning Integration**: Train models on secret patterns
2. **Context Window Expansion**: Analyze larger code contexts
3. **Cross-File Analysis**: Detect secrets split across files
4. **API Validation**: Real-time validation of detected secrets
5. **Custom Pattern Support**: User-defined secret patterns
## Integration
The enhanced detection is automatically integrated into the existing scanning pipeline through the `CustomHeuristicDetector` class. No changes are required to existing code that uses the scanner.
## Troubleshooting
### Common Issues
1. **Import Errors**: Ensure all files are in the correct directory structure
2. **False Positives**: Adjust confidence thresholds in detector configuration
3. **Missed Secrets**: Add new patterns to the appropriate detector class
4. **Performance**: Enable logging to monitor detection performance
### Debug Mode
Enable detailed logging by setting `log_enabled = True` in the detector classes to see:
- Detection method used
- Confidence scores
- Pattern matches
- Context analysis results