miniml

# SQL Injection Attack Prevention Plan

## Overview

This plan outlines the implementation of comprehensive SQL injection protection for MiniML's `where` and `having` clause handling, using AST-based validation with the node-sql-parser library.

## Current Security Risk Assessment

### Vulnerable Code Locations

- `lib/query.ts:69` - WHERE clause injection: ``where_clause.push(`(${expandFilterReferences(where, model.dimensions, dimensions)})`)``
- `lib/query.ts:82` - HAVING clause injection: ``query.push(`HAVING ${expandFilterReferences(having, model.measures, measures)}`)``

### Risk Analysis

- **High Risk**: Direct string interpolation allows arbitrary SQL injection
- **Attack Vectors**: Malicious WHERE/HAVING expressions can execute unauthorized SQL
- **Current Protection**: Only `expandFilterReferences()`, which is insufficient for security
- **Impact**: Potential data breach, unauthorized access, data modification

## Implementation Plan

### Phase 1: Core SQL Validation Module

#### 1.1 Create `lib/validation.ts`

**Primary Functions:**

```typescript
export interface ValidationResult {
  isValid: boolean;
  errors: string[];
  warnings: string[];
}

export function validateSqlExpression(
  expression: string,
  dialect: 'bigquery' | 'snowflake',
  model: MinimlModel
): ValidationResult;

export function validateWhereClause(
  where: string,
  model: MinimlModel
): ValidationResult;

export function validateHavingClause(
  having: string,
  model: MinimlModel
): ValidationResult;
```

**Core Validation Logic:**

- Parse SQL expressions using node-sql-parser
- Generate AST for security analysis
- Validate against allowlist of safe SQL constructs
- Check column references against model schema

#### 1.2 AST-Based Safety Checks (Phase 1 - Simple)

**Allowed SQL Constructs:**

- Comparison operators: `=`, `!=`, `<>`, `>`, `<`, `>=`, `<=`
- Logical operators: `AND`, `OR`, `NOT`
- Parentheses for grouping: `(`, `)`
- Basic string functions: `UPPER()`, `LOWER()`, `TRIM()`
- Null checks: `IS NULL`, `IS NOT NULL`
- Pattern matching: `LIKE`, `ILIKE` (with literal patterns only)
- List membership: `IN` (with literal values only)

**Explicitly Disallowed:**

- Subqueries: `SELECT`, `EXISTS`, `ANY`, `ALL`
- DDL statements: `DROP`, `ALTER`, `CREATE`, `TRUNCATE`
- DML statements: `INSERT`, `UPDATE`, `DELETE`
- Schema operations: `DESCRIBE`, `SHOW`, `EXPLAIN`
- System functions: `SYSTEM()`, `EXEC()`, etc.
- Dynamic SQL construction: `CONCAT()` in suspicious contexts
- Comments: `--`, `/* */`, `#`

#### 1.3 Integration with Query Generation

**Modify `lib/query.ts`:**

```typescript
// Line 69 - WHERE clause validation
if (where) {
  const validation = validateWhereClause(where, model);
  if (!validation.isValid) {
    // Pass the individual errors as `violations`, which the constructor requires
    throw new SqlValidationError(
      `Invalid WHERE clause: ${validation.errors.join(', ')}`,
      validation.errors
    );
  }
  where_clause.push(`(${expandFilterReferences(where, model.dimensions, dimensions)})`);
}

// Line 82 - HAVING clause validation
if (having) {
  const validation = validateHavingClause(having, model);
  if (!validation.isValid) {
    throw new SqlValidationError(
      `Invalid HAVING clause: ${validation.errors.join(', ')}`,
      validation.errors
    );
  }
  query.push(`HAVING ${expandFilterReferences(having, model.measures, measures)}`);
}
```

### Phase 2: Enhanced Security Features

#### 2.1 Advanced AST Analysis

**Complexity Limits:**

- Maximum AST depth: 10 levels
- Maximum number of nodes: 100
- Maximum expression length: 1000 characters
- Maximum number of OR conditions: 20

**Function Allowlist per Dialect:**

```typescript
const SAFE_FUNCTIONS = {
  bigquery: [
    'UPPER', 'LOWER', 'TRIM', 'LENGTH', 'SUBSTR',
    'DATE', 'TIMESTAMP', 'EXTRACT', 'DATE_TRUNC',
    'COALESCE', 'IFNULL', 'SAFE_CAST'
  ],
  snowflake: [
    'UPPER', 'LOWER', 'TRIM', 'LENGTH', 'SUBSTRING',
    'TO_DATE', 'TO_TIMESTAMP', 'EXTRACT', 'DATE_TRUNC',
    'COALESCE', 'NVL', 'TRY_CAST'
  ]
};
```

#### 2.2 Column Reference Validation

**Model-Aware Validation:**

- Verify column references exist in model dimensions/measures
- Validate data types for comparison operations
- Check join requirements for cross-table references
- Ensure aggregation context for measure references

#### 2.3 Value Sanitization

**Literal Value Checks:**

- String literal validation (escape sequences, length limits)
- Numeric value bounds checking
- Date format validation
- Pattern injection prevention in LIKE clauses

### Phase 3: Error Handling & User Experience

#### 3.1 Custom Error Types

```typescript
export class SqlValidationError extends Error {
  constructor(
    message: string,
    public violations: string[],
    public suggestions?: string[]
  ) {
    super(message);
    this.name = 'SqlValidationError';
  }
}

export class UnsafeConstructError extends SqlValidationError {}
export class UnknownColumnError extends SqlValidationError {}
export class ComplexityLimitError extends SqlValidationError {}
```

#### 3.2 Helpful Error Messages

**Examples:**

- `"Column 'user_id' not found. Available dimensions: account_name, date, category_name"`
- `"Subqueries are not allowed in WHERE clauses. Use simple comparisons instead."`
- `"Expression too complex (45 nodes). Simplify by breaking into multiple filters."`

#### 3.3 Security Logging

**Monitoring Capabilities:**

- Log blocked injection attempts
- Track validation performance
- Monitor false positive rates
- Alert on suspicious patterns

### Phase 4: Testing & Documentation

#### 4.1 Comprehensive Test Suite

**Security Tests:**

```typescript
// test/validation.security.test.ts
import { expect } from 'chai'; // assumes Chai-style assertions
import { validateWhereClause } from '../lib/validation';

// `model` is assumed to be a MinimlModel fixture provided by the test setup.
describe('SQL Injection Prevention', () => {
  it('blocks classic injection attempts', () => {
    const malicious = "1=1; DROP TABLE users; --";
    // validateWhereClause returns a ValidationResult rather than throwing
    const result = validateWhereClause(malicious, model);
    expect(result.isValid).to.equal(false);
    expect(result.errors).to.not.be.empty;
  });

  it('prevents data exfiltration attempts', () => {
    const malicious = "1=1 UNION SELECT password FROM users";
    const result = validateWhereClause(malicious, model);
    expect(result.isValid).to.equal(false);
  });
});
```

**Integration Tests:**

- Test with existing query generation
- Verify performance impact
- Test all supported SQL constructs
- Cross-dialect compatibility

#### 4.2 Documentation Updates

**README.md Security Section:**

```markdown
## Security

MiniML includes comprehensive SQL injection protection for WHERE and HAVING clauses:

- **AST-based validation**: All user-provided SQL expressions are parsed and validated
- **Allowlist approach**: Only safe SQL constructs are permitted
- **Model-aware**: Column references are validated against your model schema
- **Dialect-specific**: Validation rules adapt to BigQuery/Snowflake syntax

### Safe Expression Examples

- `account_name = 'Acme Corp'`
- `date >= '2024-01-01' AND category_name LIKE 'Electronics%'`
- `revenue > 1000 OR quantity IS NOT NULL`

### Blocked Constructs

- Subqueries, DDL/DML statements, system functions
- Comments, dynamic SQL construction
- Unauthorized column references
```

## Implementation Timeline

### Week 1: Foundation

- [ ] Create `lib/validation.ts` with basic AST validation
- [ ] Implement core safety checks
- [ ] Add error types and handling

### Week 2: Integration

- [ ] Modify `lib/query.ts` to use validation
- [ ] Add comprehensive test suite
- [ ] Performance testing and optimization

### Week 3: Enhancement

- [ ] Advanced security features
- [ ] Dialect-specific function allowlists
- [ ] Security logging and monitoring

### Week 4: Documentation & Polish

- [ ] Update README.md with security section
- [ ] Create user guide for safe expressions
- [ ] Final testing and code review

## Configuration Options

**Model-level Security Settings:**

```yaml
# model.yaml
security:
  validation_level: strict  # strict | moderate | permissive
  max_expression_complexity: 50
  allowed_functions:
    - UPPER
    - LOWER
    - DATE_TRUNC
  log_blocked_attempts: true
```

## Migration Strategy

1. **Backward Compatibility**: Existing valid expressions continue working
2. **Graceful Degradation**: Clear error messages for invalid expressions
3. **Opt-in Strictness**: Start with moderate validation, allow stricter modes
4. **Documentation**: Comprehensive examples of safe vs unsafe patterns

## Success Metrics

- **Security**: Zero successful injection attacks in testing
- **Usability**: < 5% false positive rate for legitimate expressions
- **Performance**: < 10ms validation overhead per query
- **Adoption**: Clear migration path for existing users

## Risk Mitigation

- **False Positives**: Comprehensive testing with real-world expressions
- **Performance Impact**: Efficient AST parsing with caching
- **User Confusion**: Detailed documentation and error messages
- **Maintenance Burden**: Automated security testing in CI/CD

This plan provides a robust foundation for SQL injection prevention while maintaining the flexibility and ease of use that makes MiniML valuable for data modeling.
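
## Appendix: Allowlist Walker Sketch

To make the Phase 1 allowlist checks and the Phase 2 depth and node limits more concrete, the following is a minimal sketch of how a tree walk might enforce them. It is an illustration under stated assumptions, not the planned implementation: the `Expr` type is a hypothetical, simplified tree and is not the AST shape that node-sql-parser produces, and `validateAst`, the `ALLOWED_*` sets, and the sample column names are illustrative. A real version would first map the parser's output onto whatever node shape it walks.

```typescript
// Hypothetical, simplified expression tree used only for this sketch.
// It is NOT the node shape that node-sql-parser produces.
type Expr =
  | { kind: 'column'; name: string }
  | { kind: 'literal'; value: string | number | boolean | null }
  | { kind: 'unary'; op: string; operand: Expr }
  | { kind: 'binary'; op: string; left: Expr; right: Expr }
  | { kind: 'call'; name: string; args: Expr[] }
  | { kind: 'list'; items: Expr[] }
  | { kind: 'subquery' };

interface ValidationResult {
  isValid: boolean;
  errors: string[];
  warnings: string[];
}

// Allowlists mirror the Phase 1 "Allowed SQL Constructs" list.
const ALLOWED_BINARY = new Set([
  '=', '!=', '<>', '>', '<', '>=', '<=', 'AND', 'OR', 'LIKE', 'ILIKE', 'IN',
]);
const ALLOWED_UNARY = new Set(['NOT', 'IS NULL', 'IS NOT NULL']);
const ALLOWED_FUNCTIONS = new Set(['UPPER', 'LOWER', 'TRIM']);

// Complexity limits mirror Phase 2.
const MAX_DEPTH = 10;
const MAX_NODES = 100;

function validateAst(root: Expr, knownColumns: ReadonlySet<string>): ValidationResult {
  const errors: string[] = [];
  const stats = { nodes: 0, tooDeep: false };

  const visit = (node: Expr, depth: number): void => {
    stats.nodes += 1;
    if (depth > MAX_DEPTH) {
      stats.tooDeep = true; // stop descending; reported once after the walk
      return;
    }
    switch (node.kind) {
      case 'literal':
        break;
      case 'column':
        if (!knownColumns.has(node.name)) errors.push(`Column '${node.name}' not found`);
        break;
      case 'unary':
        if (!ALLOWED_UNARY.has(node.op.toUpperCase())) {
          errors.push(`Operator '${node.op}' is not allowed`);
        }
        visit(node.operand, depth + 1);
        break;
      case 'binary': {
        const op = node.op.toUpperCase();
        const right = node.right;
        if (!ALLOWED_BINARY.has(op)) errors.push(`Operator '${node.op}' is not allowed`);
        if ((op === 'LIKE' || op === 'ILIKE') && right.kind !== 'literal') {
          errors.push(`${op} accepts a literal pattern only`);
        }
        if (op === 'IN' && !(right.kind === 'list' && right.items.every((i) => i.kind === 'literal'))) {
          errors.push('IN accepts a list of literal values only');
        }
        visit(node.left, depth + 1);
        visit(right, depth + 1);
        break;
      }
      case 'call':
        if (!ALLOWED_FUNCTIONS.has(node.name.toUpperCase())) {
          errors.push(`Function '${node.name}' is not allowed`);
        }
        node.args.forEach((arg) => visit(arg, depth + 1));
        break;
      case 'list':
        node.items.forEach((item) => visit(item, depth + 1));
        break;
      case 'subquery':
        errors.push('Subqueries are not allowed');
        break;
    }
  };

  visit(root, 1);
  if (stats.tooDeep) errors.push(`Expression nesting exceeds ${MAX_DEPTH} levels`);
  if (stats.nodes > MAX_NODES) errors.push(`Expression too complex (${stats.nodes} nodes)`);
  return { isValid: errors.length === 0, errors, warnings: [] };
}

// Illustrative column names, borrowed from the error-message examples.
const columns = new Set(['account_name', 'date', 'category_name']);

// account_name = 'Acme Corp' AND category_name LIKE 'Electronics%'
const safe: Expr = {
  kind: 'binary', op: 'AND',
  left: { kind: 'binary', op: '=', left: { kind: 'column', name: 'account_name' }, right: { kind: 'literal', value: 'Acme Corp' } },
  right: { kind: 'binary', op: 'LIKE', left: { kind: 'column', name: 'category_name' }, right: { kind: 'literal', value: 'Electronics%' } },
};

// user_id IN (SELECT ...): unknown column, non-literal IN operand, subquery
const unsafe: Expr = {
  kind: 'binary', op: 'IN',
  left: { kind: 'column', name: 'user_id' },
  right: { kind: 'subquery' },
};

const safeResult = validateAst(safe, columns);
const unsafeResult = validateAst(unsafe, columns);
console.log(safeResult.isValid, unsafeResult.errors);
```

The sketch collects every violation instead of stopping at the first one, which fits the plan's goal of returning helpful error messages in a `ValidationResult`. Anything not explicitly listed in the allowlists is rejected by default, so new constructs have to be opted in deliberately.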