mcp-sanitizer

# Security Documentation

## Overview

MCP Sanitizer provides comprehensive, defense-in-depth protection against common web application attack vectors through multi-layered validation, advanced Unicode normalization, and context-aware sanitization.

## Security Features

### Core Protection Layers

1. **Multi-layered Validation**
   - Command injection prevention using shell-quote
   - SQL injection protection via sqlstring
   - XSS prevention through escape-html
   - Path traversal blocking with path-is-inside
   - NoSQL injection detection for MongoDB operators

2. **Advanced Unicode & Encoding Defense**
   - Homograph attack prevention (Cyrillic, Greek, mathematical symbols)
   - Multi-pass normalization (handles nested encoding)
   - Zero-width character detection and removal
   - Directional override character blocking (RLO/LRO)
   - Full-width character normalization

3. **Context-Aware Sanitization**
   - Different validation rules for file paths, URLs, commands, and SQL
   - Protocol-specific validation (HTTP/HTTPS/FTP/MCP)
   - Database-specific SQL injection detection (PostgreSQL, MySQL, MSSQL, Oracle)

## Security Enhancements

### 1. Directional Override Detection

Prevents Unicode text-direction manipulation attacks, which are commonly used to spoof file names.

```javascript
// Attack: renders as "invoiceexe.doc" but is actually "invoice<RLO>cod.exe"
const result = detectDirectionalOverrides('invoice\u202Ecod.exe');
// Detected and sanitized
```

### 2. Null Byte Protection

Prevents C-style string termination attacks that can bypass validation.

```javascript
// Attack: bypass extension checks with null bytes
const result = sanitize('malicious.php\0.txt');
// Null bytes stripped, attack prevented
```

### 3. Multi-Layer Encoding Defense

Handles double- and triple-encoding attempts.

```javascript
// Attack: double URL-encoded path traversal ('%252e' decodes to '%2e', then to '.')
const result = sanitize('%252e%252e%252f');
// Decoded through multiple passes, attack blocked
```

### 4. Database-Specific SQL Protection

Detects database-specific injection patterns:

- PostgreSQL dollar quotes: `$$; DROP TABLE users; --$$`
- MySQL backticks: `` `users`; DROP TABLE `accounts` ``
- MSSQL Unicode strings: `N'; DROP TABLE users; --'`
- Oracle alternative quoting: `q'['; DROP TABLE users; --]'`

### 5. Homograph Attack Prevention

Normalizes look-alike characters used in phishing:

- Cyrillic 'а' (U+0430) → Latin 'a' (U+0061)
- Greek 'ο' (U+03BF) → Latin 'o' (U+006F)
- Mathematical bold italic '𝒂' (U+1D482) → Latin 'a' (U+0061)

## Security Policies

| Policy | Use Case | String Limit | Protocols | Blocking Level |
|--------|----------|--------------|-----------|----------------|
| **STRICT** | Untrusted input | 1,000 chars | HTTPS only | Medium+ |
| **MODERATE** | Balanced security | 5,000 chars | HTTP/HTTPS/MCP | High+ |
| **PERMISSIVE** | Trusted environments | 50,000 chars | All | Critical only |
| **DEVELOPMENT** | Development/debugging | 20,000 chars | HTTP/HTTPS/MCP/File | High+ |
| **PRODUCTION** | Production systems | 8,000 chars | HTTPS/MCP | High+ |

String Limit is the maximum input length in characters, and Blocking Level is the lowest threat severity that gets blocked (for example, Medium+ blocks medium-, high-, and critical-severity threats).

## Security Philosophy

### What We Provide

- **Comprehensive protection** against known attack vectors
- **Defense-in-depth** with multiple validation layers
- **Regular updates** based on emerging threats
- **Extensive test coverage** with 500+ security tests
- **Performance-optimized** for production use (<10ms latency)

### What We Don't Claim

- We do **NOT** claim 100% protection against all attacks
- Zero-day vulnerabilities may exist
- New attack vectors emerge constantly
- Security is a continuous process, not a destination

### Best Practices

1. **Always use the strictest policy** appropriate for your use case
2. **Keep the library updated** to get the latest security patches
3. **Implement defense-in-depth**: don't rely on a single security layer
4. **Monitor and log** blocked attempts for security analysis
5. **Report vulnerabilities** via GitHub Security Advisories

## Threat Model

### Protected Against

- Command injection (shell commands)
- SQL injection (including PostgreSQL-, MySQL-, MSSQL-, and Oracle-specific syntax)
- NoSQL injection (MongoDB operators)
- Cross-site scripting (XSS)
- Path traversal attacks
- Prototype pollution
- Template injection
- Unicode-based bypasses
- Encoding-based evasion
- Homograph/phishing attacks

### Assumptions

- Input is untrusted by default
- Attackers may use sophisticated encoding/obfuscation
- Multiple attack vectors may be combined
- Timing attacks are not a primary concern for middleware

### Out of Scope

- DDoS protection (infrastructure concern)
- Rate limiting (application layer)
- Authentication/authorization (application logic)
- Business logic vulnerabilities
- Side-channel attacks beyond basic timing

## Testing

The library includes comprehensive test coverage:

- **517 security tests** covering all major attack vectors
- **Unicode security tests** for homograph attacks
- **Database-specific tests** for SQL injection variants
- **Performance benchmarks** ensuring <10ms latency
- **Regression tests** for all fixed vulnerabilities

## Responsible Disclosure

We take security seriously and appreciate responsible disclosure of vulnerabilities.

### Reporting Security Issues

1. **DO NOT** create public issues for security vulnerabilities
2. Use GitHub Security Advisories or email the security contact
3. Include a proof of concept if possible
4. Allow reasonable time for fixes before public disclosure

### Response Timeline

- **Initial response**: Within 48 hours
- **Confirmation**: Within 7 days
- **Fix timeline**: Based on severity (Critical: 7 days, High: 14 days, Medium: 30 days)
- **Credit**: Security researchers will be credited (unless they prefer anonymity)

## Version History

### Current Version

- Comprehensive Unicode protection
- NoSQL injection detection
- Multi-layer encoding defense
- Database-specific SQL protection

### Security Improvements

We continuously improve security based on:

- Emerging attack vectors
- Security research findings
- Community feedback
- Penetration testing results

## License

MIT License. See the LICENSE file for details.

## Disclaimer

This software is provided "as is" without warranty of any kind. While we strive to provide comprehensive security, no system is perfectly secure. Users should implement appropriate additional security measures based on their specific requirements and threat model.
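The path traversal blocking listed under Multi-layered Validation comes down to resolving a requested path against a base directory and checking that the result stays inside it. The following is a minimal, dependency-free sketch of that idea. `resolveInside` is a hypothetical helper, not part of the mcp-sanitizer API, and it is not the implementation of the path-is-inside package the library uses. POSIX path semantics are assumed so that the example behaves the same on every OS.

```javascript
// Hypothetical containment check: not the mcp-sanitizer API and not the
// path-is-inside package. POSIX path semantics are used so the example behaves
// the same on every OS; a real check would use the host's `path` module.
const path = require('path').posix;

function resolveInside(baseDir, requested) {
  const base = path.resolve(baseDir);
  // resolve() joins the segments and collapses '.' and '..'.
  const target = path.resolve(base, requested);
  const rel = path.relative(base, target);
  // The target escapes the base if the relative path has to climb out with '..'.
  const escapes = rel === '..' || rel.startsWith('../');
  return escapes ? null : target;
}

console.log(resolveInside('/srv/files', 'reports/q1.pdf'));   // /srv/files/reports/q1.pdf
console.log(resolveInside('/srv/files', '../../etc/passwd')); // null
// A naive target.startsWith('/srv/files') test would accept this sibling directory.
console.log(resolveInside('/srv/files', '../files-evil/x'));  // null
```

This check compares strings only. It does not follow symlinks, so a link inside the base that points elsewhere would still pass unless the path is first resolved with `fs.realpathSync`. It also only sees `..` after any percent-encoding has been decoded.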
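Both NoSQL injection detection for MongoDB operators and the prototype pollution entry in the threat model depend on inspecting object keys as well as string values. The following is a minimal sketch, assuming the input is already-parsed JSON and therefore contains no cycles. `findDangerousKeys` and the shape of its findings are hypothetical and are not the library's API.

```javascript
// Hypothetical key scanner (not the mcp-sanitizer API). It assumes the value came
// from JSON.parse, so it is a plain tree without cycles.
const POLLUTION_KEYS = new Set(['__proto__', 'constructor', 'prototype']);

function findDangerousKeys(value, path = 'input') {
  const findings = [];
  if (value === null || typeof value !== 'object') return findings;
  for (const key of Object.keys(value)) {
    const childPath = Array.isArray(value) ? `${path}[${key}]` : `${path}.${key}`;
    // MongoDB query operators ($ne, $gt, $where, ...) all start with '$'.
    if (key.startsWith('$')) findings.push({ path: childPath, kind: 'nosql-operator' });
    // Keys that can modify a prototype when a naive deep merge copies them.
    if (POLLUTION_KEYS.has(key)) findings.push({ path: childPath, kind: 'prototype-pollution' });
    findings.push(...findDangerousKeys(value[key], childPath));
  }
  return findings;
}

// JSON.parse stores "__proto__" as an ordinary own key, so Object.keys() sees it.
const body = JSON.parse('{"user":"admin","password":{"$ne":null},"__proto__":{"isAdmin":true}}');
console.log(findDangerousKeys(body));
```

For the sample body, the scanner reports `input.password.$ne` as a NoSQL operator and `input.__proto__` as a prototype-pollution key. Rejecting such input before it reaches a query builder or a merge function is one reasonable response.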
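Directional override detection and zero-width character removal, both described under Security Enhancements and Advanced Unicode & Encoding Defense, can be sketched with two regular expressions. The character sets here are an assumption about what counts as dangerous. They cover the bidi embedding, override, and isolate controls (U+202A to U+202E and U+2066 to U+2069) and common zero-width characters. `findBidiControls` and `stripInvisible` are hypothetical names, not the library's `detectDirectionalOverrides`.

```javascript
// Hypothetical helpers (not the mcp-sanitizer API).

// Bidi embeddings/overrides (LRE, RLE, PDF, LRO, RLO) and isolates (LRI, RLI, FSI, PDI).
const BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/g;
// Zero-width space, non-joiner, joiner, word joiner, and BOM / zero-width no-break space.
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/g;

// Reports each bidi control with its index and code point as "U+XXXX".
function findBidiControls(input) {
  const hits = [];
  for (const match of input.matchAll(BIDI_CONTROLS)) {
    const hex = match[0].codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
    hits.push({ index: match.index, codePoint: `U+${hex}` });
  }
  return hits;
}

// Removes both classes of invisible characters.
function stripInvisible(input) {
  return input.replace(BIDI_CONTROLS, '').replace(ZERO_WIDTH, '');
}

const spoofed = 'invoice\u202Ecod.exe'; // renders as "invoiceexe.doc"
console.log(findBidiControls(spoofed)); // [ { index: 7, codePoint: 'U+202E' } ]
console.log(stripInvisible(spoofed));   // invoicecod.exe
```

Stripping makes the displayed name match the actual characters. A stricter policy might reject any input that contains these characters.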
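The null byte attack under Null Byte Protection works because a JavaScript string can contain U+0000, while C-style string handling stops at the first NUL. As a result, the name that passes validation is not the name that C-level code reads. The sketch below shows that mismatch and the stripping fix. `stripNullBytes` and `cStringView` are hypothetical helpers. The sketch assumes it runs after percent-decoding, so `%00` has already become a literal NUL.

```javascript
// Hypothetical helpers (not the mcp-sanitizer API).

// Removes literal NUL characters so the string that is validated is the
// string that is later used.
function stripNullBytes(input) {
  return input.replace(/\0/g, '');
}

// Models how a C API reads the same string: everything up to the first NUL.
function cStringView(input) {
  const nul = input.indexOf('\0');
  return nul === -1 ? input : input.slice(0, nul);
}

const attack = 'malicious.php\0.txt';
console.log(attack.endsWith('.txt')); // true: a naive extension check passes
console.log(cStringView(attack));     // malicious.php: what C-level code would read
console.log(stripNullBytes(attack));  // malicious.php.txt
```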
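Multi-pass decoding, as described under Multi-Layer Encoding Defense, means decoding repeatedly until the string stops changing. That way, `%252e` (which decodes to `%2e` and then to `.`) cannot slip past a single-pass check. The following is a minimal sketch, assuming percent-encoding is the only encoding in play and using an arbitrary cap of five passes. `decodeFully` and `looksLikeTraversal` are hypothetical helpers.

```javascript
// Hypothetical helpers (not the mcp-sanitizer API).

// Percent-decodes repeatedly until the string stops changing (a fixed point),
// giving up after maxPasses, an arbitrary cap chosen for this sketch.
function decodeFully(input, maxPasses = 5) {
  let current = input;
  for (let pass = 0; pass < maxPasses; pass++) {
    let next;
    try {
      next = decodeURIComponent(current);
    } catch {
      break; // malformed escape such as a bare '%': keep what we have
    }
    if (next === current) break; // fixed point reached
    current = next;
  }
  return current;
}

// Checks the fully decoded form for a '..' path segment.
function looksLikeTraversal(input) {
  return /(^|[\\/])\.\.([\\/]|$)/.test(decodeFully(input));
}

console.log(decodeFully('%252e%252e%252f'));        // ../
console.log(looksLikeTraversal('%252e%252e%252f')); // true
```

If the loop stops because it hit the pass cap, the result may still be encoded. A cautious caller could treat that case as suspicious rather than accept it.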
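The patterns listed under Database-Specific SQL Protection can be approximated with regular expressions. This is a heuristic sketch, not the library's detector. The helper `detectDialectInjection`, its regexes, and the rule that a dialect-specific quote only counts when it appears alongside a stacked DDL statement are all assumptions made for illustration.

```javascript
// Hypothetical heuristic detector (not the mcp-sanitizer API). Each regex flags one
// dialect-specific quoting trick; none of them is a full SQL grammar, and
// parameterized queries remain the primary defense.
const DIALECT_PATTERNS = [
  { dialect: 'PostgreSQL', re: /\$(?:[A-Za-z_][A-Za-z0-9_]*)?\$/ }, // $$ or $tag$
  { dialect: 'MySQL', re: /`[^`]*`\s*;/ },                         // `ident`; ...
  { dialect: 'MSSQL', re: /\bN'/ },                                // N'...' literal
  { dialect: 'Oracle', re: /\bq'[\[{(<!]/i },                      // q'[...]' quoting
];
// A destructive or DDL statement stacked after a semicolon.
const STACKED_DDL = /;\s*(?:DROP|ALTER|TRUNCATE|CREATE)\b/i;

function detectDialectInjection(input) {
  // Require a stacked statement so ordinary text such as "Save $5" is not flagged.
  if (!STACKED_DDL.test(input)) return [];
  return DIALECT_PATTERNS.filter(({ re }) => re.test(input)).map(({ dialect }) => dialect);
}

console.log(detectDialectInjection("q'['; DROP TABLE users; --]'")); // [ 'Oracle' ]
```

Gating on a stacked statement keeps false positives low but misses dialect tricks that don't stack a second statement. A real detector would likely weigh several signals rather than use a single rule.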
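Homograph normalization, described under Homograph Attack Prevention, takes two steps. Unicode NFKC normalization folds compatibility characters such as full-width letters and mathematical alphanumerics (e.g. '𝒂', U+1D482) to their base letters. However, NFKC leaves Cyrillic and Greek look-alikes unchanged, so a confusables map is still required. The following is a minimal sketch. `normalizeHomographs` is hypothetical, and its map is a small hand-picked subset standing in for a full table such as the confusables data published with Unicode Technical Standard #39.

```javascript
// Hypothetical helper (not the mcp-sanitizer API). The map is an illustrative
// subset, not a complete confusables table.
const CONFUSABLES = new Map([
  ['\u0430', 'a'], // Cyrillic small a
  ['\u0435', 'e'], // Cyrillic small ie
  ['\u043E', 'o'], // Cyrillic small o
  ['\u0440', 'p'], // Cyrillic small er
  ['\u0441', 'c'], // Cyrillic small es
  ['\u03B1', 'a'], // Greek small alpha
  ['\u03BF', 'o'], // Greek small omicron
]);

function normalizeHomographs(input) {
  // Step 1: compatibility normalization, e.g. U+FF41 and U+1D482 both become 'a'.
  const nfkc = input.normalize('NFKC');
  // Step 2: map cross-script look-alikes, iterating by code point.
  let out = '';
  for (const ch of nfkc) out += CONFUSABLES.get(ch) ?? ch;
  return out;
}

console.log(normalizeHomographs('p\u0430ypal.com')); // paypal.com
```

This mapping also folds legitimate Cyrillic or Greek text to Latin. It is therefore probably best applied where look-alikes are suspicious, such as hostnames or identifiers, and not to free-form user text.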
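The Security Policies table maps naturally onto a configuration object. The sketch below encodes the String Limit and Protocols columns and applies them to a URL. The object shape and `checkUrl` are hypothetical and don't reflect the library's real configuration format. The Blocking Level column is not modeled.

```javascript
// Hypothetical encoding of the policy table (not the library's config format).
// protocols: null means every protocol is allowed (PERMISSIVE).
const POLICIES = {
  STRICT:      { maxStringLength: 1000,  protocols: ['https:'] },
  MODERATE:    { maxStringLength: 5000,  protocols: ['http:', 'https:', 'mcp:'] },
  PERMISSIVE:  { maxStringLength: 50000, protocols: null },
  DEVELOPMENT: { maxStringLength: 20000, protocols: ['http:', 'https:', 'mcp:', 'file:'] },
  PRODUCTION:  { maxStringLength: 8000,  protocols: ['https:', 'mcp:'] },
};

// Applies the length limit, then the protocol allow-list, to a URL string.
function checkUrl(policyName, input) {
  const policy = POLICIES[policyName];
  if (!policy) throw new Error(`Unknown policy: ${policyName}`);
  if (input.length > policy.maxStringLength) return { allowed: false, reason: 'too long' };
  let url;
  try {
    url = new URL(input); // WHATWG URL parser, global in Node.js
  } catch {
    return { allowed: false, reason: 'not a valid URL' };
  }
  if (policy.protocols && !policy.protocols.includes(url.protocol)) {
    return { allowed: false, reason: `protocol ${url.protocol} not allowed` };
  }
  return { allowed: true };
}

console.log(checkUrl('STRICT', 'http://example.com')); // { allowed: false, reason: 'protocol http: not allowed' }
console.log(checkUrl('PRODUCTION', 'mcp://server/tool')); // { allowed: true }
```

For example, STRICT rejects a plain `http://` URL because it allows HTTPS only, and PRODUCTION rejects `file://` URLs that DEVELOPMENT accepts.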