secure-scan-js

# Enhanced Secret Detection System

This document describes the enhanced secret detection capabilities that have been implemented to improve accuracy and coverage for detecting secrets in code repositories.

## Overview

The enhanced detection system addresses several key gaps found in realistic secret scenarios:

1. **Environment Variable Fallbacks** - Secrets in `os.getenv()` fallback values
2. **String Concatenation** - Secrets formed by combining string parts
3. **Comment Secrets** - Secrets accidentally left in code comments
4. **Base64 Encoded Secrets** - Secrets that are base64 encoded
5. **Generic Variable Names** - Secrets with non-obvious variable names
6. **Multi-Language Support** - Language-specific detection patterns

## Architecture

The enhanced detection system consists of several interconnected components:

```
CustomHeuristicDetector (Main Entry Point)
├── MultiLanguageSecretDetector (Language-aware detection)
│   └── EnhancedSecretDetector (Complex pattern detection)
│       └── SecretClassifier (Pattern classification)
├── AdvancedSecretAnalyzer (Entropy and validation analysis)
└── Original extraction methods (Fallback detection)
```

## Components

### 1. EnhancedSecretDetector

**File**: `wasm-version/src/python/enhanced_detector.py`

Handles complex secret detection scenarios:

- **Environment Fallbacks**: Detects secrets in `os.getenv("VAR", "fallback_secret")`
- **String Concatenation**: Detects secrets formed by `part1 + part2`
- **Comment Analysis**: Extracts secrets from code comments
- **Base64 Decoding**: Automatically decodes and analyzes base64 strings
- **Generic Variables**: Detects secrets in variables like `config`, `data`, etc.
- **Function Parameters**: Analyzes function calls for secret parameters

### 2. MultiLanguageSecretDetector

**File**: `wasm-version/src/python/multi_language_detector.py`

Provides language-specific detection patterns for:

- **Python** (`.py`, `.pyw`)
- **JavaScript/TypeScript** (`.js`, `.jsx`, `.ts`, `.tsx`, `.mjs`)
- **Java** (`.java`)
- **C#** (`.cs`)
- **Go** (`.go`)
- **Rust** (`.rs`)
- **PHP** (`.php`)
- **Ruby** (`.rb`)
- **YAML** (`.yml`, `.yaml`)
- **JSON** (`.json`)

Each language has specific patterns for:

- Variable assignments
- Environment variable access
- Function calls
- Comments
- String literals

### 3. Enhanced SecretClassifier

**File**: `wasm-version/src/python/secret_patterns.py`

Enhanced pattern classification with:

- **Context-Aware Classification**: Uses variable names and context
- **Enhanced Patterns**: Additional patterns for edge cases
- **Fallback Classification**: Better handling of unknown patterns
- **Environment Fallback Detection**: Special handling for env fallbacks

### 4. AdvancedSecretAnalyzer

**File**: `wasm-version/src/python/advanced_analyzer.py`

Provides advanced analysis using:

- **Multiple Entropy Algorithms**: Shannon, character frequency, n-gram
- **Pattern Validation**: Format validation for known secret types
- **Context Analysis**: Surrounding code analysis
- **Bayesian Confidence**: Statistical confidence scoring

## Detection Examples

### Environment Variable Fallbacks

**Before**: Not detected

```python
api_key = os.getenv("STRIPE_KEY", "sk_test_fallbackFakeKey123")
```

**After**: ✅ Detected as "Stripe Test Key (Environment Fallback)"

### String Concatenation

**Before**: Not detected

```python
part1 = "pk_test_"
part2 = "abcdEfGhIjKlMnOpQrStUvWxYz"
stripe_key = part1 + part2
```

**After**: ✅ Detected as "Stripe Publishable Key (String Concatenation)"

### Comment Secrets

**Before**: Not detected

```python
# Debug: token = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.abc.def"
```

**After**: ✅ Detected as "JWT Token (In Comment)"

### Base64 Encoded Secrets

**Before**: Not detected

```python
secret_b64 = "QUtJQUlPU0ZPRE5ON0VYQU1QTEU="  # Decodes to AWS access key
```

**After**: ✅ Detected as "AWS Access Key ID (Base64 Encoded)"

### Generic Variable Names

**Before**: Not detected

```python
config = {
    "key": "AIzaFakeEmbeddedKeyValue12345678"
}
```

**After**: ✅ Detected as "Google API Key (Generic Variable)"

## Multi-Language Examples

### JavaScript Environment Fallbacks

```javascript
const apiKey = process.env.STRIPE_KEY || "sk_test_fallbackKey123";
```

### Java System Properties

```java
String dbPassword = System.getProperty("db.password", "defaultSecret123");
```

### C# Configuration

```csharp
string connectionString = Environment.GetEnvironmentVariable("DB_CONN") ?? "Server=localhost;Password=secret123";
```

### Go Environment Variables

```go
apiKey := os.Getenv("API_KEY")
if apiKey == "" {
    apiKey = "default_secret_key_123"
}
```

## Configuration

### Detection Thresholds

- **High Confidence**: 0.8+ (Specific patterns like `ghp_`, `sk_live_`)
- **Medium Confidence**: 0.6-0.8 (Generic patterns with context)
- **Low Confidence**: 0.4-0.6 (High entropy with weak context)
- **Info**: 0.3-0.4 (Potential secrets for review)

### Entropy Thresholds

- **Shannon Entropy**: > 3.5 for potential secrets
- **High Entropy**: > 4.5 for strong indicators
- **Normalized Entropy**: > 0.6 for randomness detection

## Testing

Run the enhanced detection test:

```bash
python test_enhanced_detection.py
```

This will test the detection on `python/realistic_secrets_example.py` and show:

- Total secrets detected
- Detection by line number
- Specific test case results
- Individual pattern testing

## Performance

The enhanced detection system is designed to be:

- **Efficient**: Parallel detection methods with early termination
- **Accurate**: Multiple validation layers reduce false positives
- **Comprehensive**: Language-aware patterns increase coverage
- **Scalable**: Modular design allows easy extension

## Future Enhancements

Potential areas for improvement:

1. **Machine Learning Integration**: Train models on secret patterns
2. **Context Window Expansion**: Analyze larger code contexts
3. **Cross-File Analysis**: Detect secrets split across files
4. **API Validation**: Real-time validation of detected secrets
5. **Custom Pattern Support**: User-defined secret patterns

## Integration

The enhanced detection is automatically integrated into the existing scanning pipeline through the `CustomHeuristicDetector` class. No changes are required to existing code that uses the scanner.

## Troubleshooting

### Common Issues

1. **Import Errors**: Ensure all files are in the correct directory structure
2. **False Positives**: Adjust confidence thresholds in detector configuration
3. **Missed Secrets**: Add new patterns to the appropriate detector class
4. **Performance**: Enable logging to monitor detection performance

### Debug Mode

Enable detailed logging by setting `log_enabled = True` in the detector classes to see:

- Detection method used
- Confidence scores
- Pattern matches
- Context analysis results
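
When reading the confidence scores and pattern matches that debug logging reports, it can help to see how the documented thresholds fit together. The following is a minimal standalone sketch, not the code in the detector classes: `shannon_entropy`, `confidence_band`, `decode_base64_candidate`, and `AWS_KEY_RE` are hypothetical names introduced for illustration, the AWS key pattern is an assumed format, and treating each band's lower bound as inclusive is an assumption, since the documented ranges overlap at their edges.

```python
import base64
import binascii
import math
import re
from collections import Counter
from typing import Optional

# Assumed format for AWS access key IDs: "AKIA" plus 16 uppercase letters/digits.
AWS_KEY_RE = re.compile(r"^AKIA[0-9A-Z]{16}$")


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of a string in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def confidence_band(score: float) -> str:
    """Map a confidence score to the bands listed under Detection Thresholds.

    Assumption: lower bounds are inclusive (0.8 is "high", 0.6 is "medium").
    """
    if score >= 0.8:
        return "high"
    if score >= 0.6:
        return "medium"
    if score >= 0.4:
        return "low"
    if score >= 0.3:
        return "info"
    return "ignored"


def decode_base64_candidate(value: str) -> Optional[str]:
    """Decode a strict base64 string to ASCII text, or return None on failure."""
    try:
        return base64.b64decode(value, validate=True).decode("ascii")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None


if __name__ == "__main__":
    candidate = "QUtJQUlPU0ZPRE5ON0VYQU1QTEU="  # value from the Base64 example
    decoded = decode_base64_candidate(candidate)
    print("decoded:", decoded)
    print("looks like AWS key ID:", bool(decoded and AWS_KEY_RE.match(decoded)))
    print("entropy (bits/char):", round(shannon_entropy(candidate), 2))
    print("band for score 0.72:", confidence_band(0.72))
```

If a finding lands in an unexpected band, comparing the logged score against these cut-offs is a quick way to decide whether a threshold needs adjusting or a new pattern is needed.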