miniml
A minimal, embeddable semantic data modeling language for generating SQL queries from YAML model definitions. Inspired by LookML.
# SQL Injection Attack Prevention Plan
## Overview
This plan outlines the implementation of comprehensive SQL injection protection for MiniML's `where` and `having` clause handling using AST-based validation with the node-sql-parser library.
## Current Security Risk Assessment
### Vulnerable Code Locations
- `lib/query.ts:69` - WHERE clause injection: ``where_clause.push(`(${expandFilterReferences(where, model.dimensions, dimensions)})`)``
- `lib/query.ts:82` - HAVING clause injection: ``query.push(`HAVING ${expandFilterReferences(having, model.measures, measures)}`)``
### Risk Analysis
- **High Risk**: Direct string interpolation allows arbitrary SQL injection
- **Attack Vectors**: Malicious WHERE/HAVING expressions can execute unauthorized SQL
- **Current Protection**: Only `expandFilterReferences()` - insufficient for security
- **Impact**: Potential data breach, unauthorized access, data modification
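To make the risk concrete, here is a minimal sketch of the interpolation pattern. `buildQuery` is a hypothetical stand-in written for illustration, not MiniML's actual query builder; it only mirrors the way the filter text is spliced into the SQL string.

```typescript
// Hypothetical stand-in for the interpolation in lib/query.ts; not MiniML's real code.
function buildQuery(table: string, where?: string): string {
  const parts = [`SELECT * FROM ${table}`];
  if (where) {
    // The user-supplied expression is spliced into the SQL text unchanged.
    parts.push(`WHERE (${where})`);
  }
  return parts.join(' ');
}

// A benign filter produces the intended query.
const benign = buildQuery('orders', "account_name = 'Acme Corp'");

// A crafted filter closes the parenthesis early and smuggles in a UNION.
const hostile = buildQuery(
  'orders',
  '1=1) UNION SELECT password FROM users WHERE (1=1'
);
```

The wrapping parentheses offer no protection here: the hostile input balances them itself, so the final string is syntactically well-formed SQL with a second `SELECT` attached.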
## Implementation Plan
### Phase 1: Core SQL Validation Module
#### 1.1 Create `lib/validation.ts`
**Primary Functions:**
```typescript
export interface ValidationResult {
  isValid: boolean;
  errors: string[];
  warnings: string[];
}

export function validateSqlExpression(
  expression: string,
  dialect: 'bigquery' | 'snowflake',
  model: MinimlModel
): ValidationResult;

export function validateWhereClause(
  where: string,
  model: MinimlModel
): ValidationResult;

export function validateHavingClause(
  having: string,
  model: MinimlModel
): ValidationResult;
```
**Core Validation Logic:**
- Parse SQL expressions using node-sql-parser
- Generate AST for security analysis
- Validate against allowlist of safe SQL constructs
- Check column references against model schema
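The steps above could be wired together roughly as follows. This is a sketch under stated assumptions: the parser is injected as a plain function (`ParseFn`, a hypothetical type) so the example does not depend on any particular library's API, and it assumes the parser throws on a syntax error and returns an array when the input holds more than one statement. The `validateWithParser` name and the `SELECT 1 FROM t WHERE ...` wrapper are illustrative choices, not part of the plan.

```typescript
interface ValidationResult {
  isValid: boolean;
  errors: string[];
  warnings: string[];
}

// Assumed parser contract (hypothetical): throws on syntax errors, returns an
// array when the input contains several statements.
type ParseFn = (sql: string) => unknown;
type AstCheck = (ast: unknown) => string[];

function validateWithParser(
  expression: string,
  parse: ParseFn,
  checks: AstCheck[]
): ValidationResult {
  let ast: unknown;
  try {
    // SQL parsers generally expect a full statement, so wrap the bare filter.
    // The checks must still confirm the result is a single plain SELECT.
    ast = parse(`SELECT 1 FROM t WHERE ${expression}`);
  } catch (err) {
    // Fail closed: anything the parser cannot understand is rejected.
    const reason = err instanceof Error ? err.message : String(err);
    return { isValid: false, errors: [`could not parse expression: ${reason}`], warnings: [] };
  }

  if (Array.isArray(ast)) {
    if (ast.length !== 1) {
      return { isValid: false, errors: ['expected exactly one statement'], warnings: [] };
    }
    ast = ast[0];
  }

  // Run every check and collect all messages so the user sees them at once.
  const errors: string[] = [];
  for (const check of checks) {
    errors.push(...check(ast));
  }
  return { isValid: errors.length === 0, errors, warnings: [] };
}
```

Injecting the parser also keeps the validation logic unit-testable with a stub, independent of how the real parser behaves for each dialect.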
#### 1.2 AST-Based Safety Checks (Phase 1 - Simple)
**Allowed SQL Constructs:**
- Comparison operators: `=`, `!=`, `<>`, `>`, `<`, `>=`, `<=`
- Logical operators: `AND`, `OR`, `NOT`
- Parentheses for grouping: `(`, `)`
- Basic string functions: `UPPER()`, `LOWER()`, `TRIM()`
- Null checks: `IS NULL`, `IS NOT NULL`
- Pattern matching: `LIKE`, `ILIKE` (with literal patterns only)
- List membership: `IN` (with literal values only)
**Explicitly Disallowed:**
- Subqueries: `SELECT`, `EXISTS`, `ANY`, `ALL`
- DDL statements: `DROP`, `ALTER`, `CREATE`, `TRUNCATE`
- DML statements: `INSERT`, `UPDATE`, `DELETE`
- Schema operations: `DESCRIBE`, `SHOW`, `EXPLAIN`
- System functions: `SYSTEM()`, `EXEC()`, etc.
- Dynamic SQL construction: `CONCAT()` and any other function not on the allowlist
- Comments: `--`, `/* */`, `#`
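The allowlist idea can be sketched as a recursive walk that accepts only known node kinds and rejects everything else by default. The `Expr` type below is a deliberately simplified, hypothetical AST invented for this example; the AST produced by a real parser such as node-sql-parser is richer and would need to be mapped onto a shape like this first.

```typescript
// Hypothetical, simplified expression AST (not node-sql-parser's real shape).
type Expr =
  | { kind: 'column'; name: string }
  | { kind: 'literal'; value: string | number | boolean | null }
  | { kind: 'unary'; op: string; operand: Expr }
  | { kind: 'binary'; op: string; left: Expr; right: Expr }
  | { kind: 'list'; items: Expr[] }
  | { kind: 'call'; fn: string; args: Expr[] }
  | { kind: 'subquery' };

const BINARY_OPS = new Set([
  '=', '!=', '<>', '>', '<', '>=', '<=', 'AND', 'OR', 'LIKE', 'ILIKE', 'IN',
]);
const UNARY_OPS = new Set(['NOT', 'IS NULL', 'IS NOT NULL']);
const FUNCTIONS = new Set(['UPPER', 'LOWER', 'TRIM']);

function isLiteralList(node: Expr): boolean {
  return node.kind === 'list' && node.items.every((item) => item.kind === 'literal');
}

// Returns one message per violation; an empty array means the tree is allowed.
function collectViolations(node: Expr, out: string[] = []): string[] {
  switch (node.kind) {
    case 'column':
    case 'literal':
      break;
    case 'unary':
      if (!UNARY_OPS.has(node.op.toUpperCase())) out.push(`operator ${node.op} is not allowed`);
      collectViolations(node.operand, out);
      break;
    case 'binary': {
      const op = node.op.toUpperCase();
      if (!BINARY_OPS.has(op)) out.push(`operator ${node.op} is not allowed`);
      // LIKE/ILIKE and IN accept literal right-hand sides only.
      if ((op === 'LIKE' || op === 'ILIKE') && node.right.kind !== 'literal') {
        out.push(`${op} requires a literal pattern`);
      }
      if (op === 'IN' && !isLiteralList(node.right)) {
        out.push('IN requires a list of literal values');
      }
      collectViolations(node.left, out);
      collectViolations(node.right, out);
      break;
    }
    case 'list':
      node.items.forEach((item) => collectViolations(item, out));
      break;
    case 'call':
      if (!FUNCTIONS.has(node.fn.toUpperCase())) out.push(`function ${node.fn}() is not allowed`);
      node.args.forEach((arg) => collectViolations(arg, out));
      break;
    default:
      // Allowlist, not blocklist: any node kind without an explicit case is rejected.
      out.push(`${node.kind} is not allowed`);
  }
  return out;
}
```

Because rejection is the default branch, a construct nobody thought to list under "Explicitly Disallowed" is still refused; the disallowed list then serves mainly to produce better error messages.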
#### 1.3 Integration with Query Generation
**Modify `lib/query.ts`:**
```typescript
// Line 69 - WHERE clause validation
if (where) {
  const validation = validateWhereClause(where, model);
  if (!validation.isValid) {
    // SqlValidationError requires the individual violations as its second argument.
    throw new SqlValidationError(
      `Invalid WHERE clause: ${validation.errors.join(', ')}`,
      validation.errors
    );
  }
  where_clause.push(`(${expandFilterReferences(where, model.dimensions, dimensions)})`);
}

// Line 82 - HAVING clause validation
if (having) {
  const validation = validateHavingClause(having, model);
  if (!validation.isValid) {
    throw new SqlValidationError(
      `Invalid HAVING clause: ${validation.errors.join(', ')}`,
      validation.errors
    );
  }
  query.push(`HAVING ${expandFilterReferences(having, model.measures, measures)}`);
}
```
### Phase 2: Enhanced Security Features
#### 2.1 Advanced AST Analysis
**Complexity Limits:**
- Maximum AST depth: 10 levels
- Maximum number of nodes: 100
- Maximum expression length: 1000 characters
- Maximum number of OR conditions: 20
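One way to enforce these limits is a single traversal that records depth, node count, and OR usage. The sketch below uses a hypothetical generic `AstNode` shape and assumes the OR limit counts `OR` operators rather than operands; both are assumptions made for illustration, and a real parser AST would need an adapter.

```typescript
// Hypothetical generic tree node; a real parser AST would need an adapter.
interface AstNode {
  type: string;
  children: AstNode[];
}

interface ComplexityLimits {
  maxDepth: number;
  maxNodes: number;
  maxLength: number;
  maxOrOperators: number;
}

// Values taken from the limits listed above.
const DEFAULT_LIMITS: ComplexityLimits = {
  maxDepth: 10,
  maxNodes: 100,
  maxLength: 1000,
  maxOrOperators: 20,
};

interface ComplexityStats {
  depth: number;
  nodes: number;
  orOperators: number;
}

function measureComplexity(root: AstNode): ComplexityStats {
  const stats: ComplexityStats = { depth: 0, nodes: 0, orOperators: 0 };
  // Iterative traversal, so a deeply nested hostile input cannot overflow the call stack.
  const stack: Array<{ node: AstNode; depth: number }> = [{ node: root, depth: 1 }];
  while (stack.length > 0) {
    const top = stack.pop();
    if (!top) break;
    stats.nodes += 1;
    stats.depth = Math.max(stats.depth, top.depth);
    if (top.node.type === 'OR') stats.orOperators += 1;
    for (const child of top.node.children) {
      stack.push({ node: child, depth: top.depth + 1 });
    }
  }
  return stats;
}

function checkComplexity(
  expression: string,
  root: AstNode,
  limits: ComplexityLimits = DEFAULT_LIMITS
): string[] {
  const errors: string[] = [];
  if (expression.length > limits.maxLength) {
    errors.push(`expression has ${expression.length} characters (limit ${limits.maxLength})`);
  }
  const stats = measureComplexity(root);
  if (stats.depth > limits.maxDepth) {
    errors.push(`expression is nested ${stats.depth} levels deep (limit ${limits.maxDepth})`);
  }
  if (stats.nodes > limits.maxNodes) {
    errors.push(`expression has ${stats.nodes} nodes (limit ${limits.maxNodes})`);
  }
  if (stats.orOperators > limits.maxOrOperators) {
    errors.push(`expression has ${stats.orOperators} OR operators (limit ${limits.maxOrOperators})`);
  }
  return errors;
}
```

In practice the length check is worth running before parsing at all, since it is the only limit that also bounds the parser's own work.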
**Function Allowlist per Dialect:**
```typescript
const SAFE_FUNCTIONS = {
  bigquery: [
    'UPPER', 'LOWER', 'TRIM', 'LENGTH', 'SUBSTR',
    'DATE', 'TIMESTAMP', 'EXTRACT', 'DATE_TRUNC',
    'COALESCE', 'IFNULL', 'SAFE_CAST'
  ],
  snowflake: [
    'UPPER', 'LOWER', 'TRIM', 'LENGTH', 'SUBSTRING',
    'TO_DATE', 'TO_TIMESTAMP', 'EXTRACT', 'DATE_TRUNC',
    'COALESCE', 'NVL', 'TRY_CAST'
  ]
};
```
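A lookup against this table might look like the sketch below. `isFunctionAllowed` and the `Dialect` alias are hypothetical names; the lists are repeated so the example stands alone. Names are upper-cased before comparison on the assumption that unquoted function names are case-insensitive in both dialects.

```typescript
type Dialect = 'bigquery' | 'snowflake';

// Repeated from the plan so the example is self-contained.
const SAFE_FUNCTIONS: Record<Dialect, readonly string[]> = {
  bigquery: [
    'UPPER', 'LOWER', 'TRIM', 'LENGTH', 'SUBSTR',
    'DATE', 'TIMESTAMP', 'EXTRACT', 'DATE_TRUNC',
    'COALESCE', 'IFNULL', 'SAFE_CAST',
  ],
  snowflake: [
    'UPPER', 'LOWER', 'TRIM', 'LENGTH', 'SUBSTRING',
    'TO_DATE', 'TO_TIMESTAMP', 'EXTRACT', 'DATE_TRUNC',
    'COALESCE', 'NVL', 'TRY_CAST',
  ],
};

// Build one Set per dialect up front so each lookup is constant time.
const SAFE_FUNCTION_SETS: Record<Dialect, Set<string>> = {
  bigquery: new Set(SAFE_FUNCTIONS.bigquery),
  snowflake: new Set(SAFE_FUNCTIONS.snowflake),
};

// Hypothetical helper: true only if the function is listed for that dialect.
function isFunctionAllowed(dialect: Dialect, fnName: string): boolean {
  return SAFE_FUNCTION_SETS[dialect].has(fnName.toUpperCase());
}
```

Keeping the lists per dialect means a function that is harmless in one warehouse is not silently accepted in the other, for example `SAFE_CAST` is only accepted for BigQuery and `TRY_CAST` only for Snowflake.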
#### 2.2 Column Reference Validation
**Model-Aware Validation:**
- Verify column references exist in model dimensions/measures
- Validate data types for comparison operations
- Check join requirements for cross-table references
- Ensure aggregation context for measure references
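A minimal sketch of the existence and aggregation-context checks follows. `ModelSketch` is a hypothetical, pared-down stand-in for `MinimlModel`, whose real shape may differ. The rule that WHERE may reference only dimensions and HAVING only measures is an assumption that mirrors the arguments passed to `expandFilterReferences` in `lib/query.ts`; type and join checks are not covered.

```typescript
// Hypothetical, pared-down model shape; MiniML's real MinimlModel may differ.
interface ModelSketch {
  dimensions: Record<string, unknown>;
  measures: Record<string, unknown>;
}

// Own-property check, so names like "constructor" or "__proto__" never match
// something inherited from Object.prototype.
function hasOwn(obj: Record<string, unknown>, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// `refs` is the list of column names already extracted from the parsed expression.
function checkColumnRefs(
  refs: string[],
  model: ModelSketch,
  clause: 'WHERE' | 'HAVING'
): string[] {
  const inWhere = clause === 'WHERE';
  const allowed = inWhere ? model.dimensions : model.measures;
  const other = inWhere ? model.measures : model.dimensions;
  const allowedKind = inWhere ? 'dimensions' : 'measures';
  const otherKind = inWhere ? 'measure' : 'dimension';

  const errors: string[] = [];
  for (const ref of refs) {
    if (hasOwn(allowed, ref)) continue;
    if (hasOwn(other, ref)) {
      // The field exists, but in the wrong aggregation context for this clause.
      errors.push(`Column '${ref}' is a ${otherKind} and cannot be used in a ${clause} clause.`);
    } else {
      errors.push(
        `Column '${ref}' not found. Available ${allowedKind}: ${Object.keys(allowed).join(', ')}`
      );
    }
  }
  return errors;
}
```

Distinguishing "wrong context" from "unknown column" costs one extra lookup and gives the user a far more actionable message.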
#### 2.3 Value Sanitization
**Literal Value Checks:**
- String literal validation (escape sequences, length limits)
- Numeric value bounds checking
- Date format validation
- Pattern injection prevention in LIKE clauses
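The literal checks could start from small, independent predicates like the ones below. All three helpers and their thresholds (for example the 256-character default) are hypothetical choices for illustration; the plan does not fix concrete limits, and LIKE-pattern handling is left out because its exact policy is not specified.

```typescript
// Hypothetical helper: flags string literals that are too long or contain
// characters with no legitimate use in a filter value.
function checkStringLiteral(value: string, maxLength: number = 256): string[] {
  const errors: string[] = [];
  if (value.length > maxLength) {
    errors.push(`string literal exceeds ${maxLength} characters`);
  }
  // Control characters (including NUL) are rejected outright.
  if (/[\u0000-\u001f]/.test(value)) {
    errors.push('string literal contains control characters');
  }
  // Backslash escapes are interpreted differently across SQL dialects.
  if (value.indexOf('\\') !== -1) {
    errors.push('string literal contains a backslash');
  }
  return errors;
}

// Hypothetical helper: accepts finite numbers within the exactly representable range.
function isWithinNumericBounds(value: number): boolean {
  return Number.isFinite(value) && Math.abs(value) <= Number.MAX_SAFE_INTEGER;
}

// Hypothetical helper: accepts YYYY-MM-DD strings that name a real calendar day.
function isValidIsoDate(text: string): boolean {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text);
  if (!match) return false;
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  // Round-trip through Date so impossible days such as 2024-02-30 are rejected.
  const date = new Date(Date.UTC(year, month - 1, day));
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}
```

The round-trip in `isValidIsoDate` matters because a regular expression alone would accept `2024-02-30`, which the warehouse would reject or coerce at query time.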
### Phase 3: Error Handling & User Experience
#### 3.1 Custom Error Types
```typescript
export class SqlValidationError extends Error {
  constructor(
    message: string,
    public violations: string[],
    public suggestions?: string[]
  ) {
    super(message);
    this.name = 'SqlValidationError';
  }
}

export class UnsafeConstructError extends SqlValidationError {}
export class UnknownColumnError extends SqlValidationError {}
export class ComplexityLimitError extends SqlValidationError {}
```
#### 3.2 Helpful Error Messages
**Examples:**
- `"Column 'user_id' not found. Available dimensions: account_name, date, category_name"`
- `"Subqueries are not allowed in WHERE clauses. Use simple comparisons instead."`
- `"Expression too complex (145 nodes, limit 100). Simplify by breaking into multiple filters."`
#### 3.3 Security Logging
**Monitoring Capabilities:**
- Log blocked injection attempts
- Track validation performance
- Monitor false positive rates
- Alert on suspicious patterns
### Phase 4: Testing & Documentation
#### 4.1 Comprehensive Test Suite
**Security Tests:**
```typescript
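// test/validation.security.test.ts
import { expect } from 'chai';
import { validateWhereClause } from '../lib/validation';

// `model` is assumed to be a MinimlModel fixture defined in the test setup.
describe('SQL Injection Prevention', () => {
  it('blocks classic injection attempts', () => {
    const malicious = "1=1; DROP TABLE users; --";
    // Validators report failures through ValidationResult instead of throwing.
    const result = validateWhereClause(malicious, model);
    expect(result.isValid).to.be.false;
    expect(result.errors).to.not.be.empty;
  });

  it('prevents data exfiltration attempts', () => {
    const malicious = "1=1 UNION SELECT password FROM users";
    const result = validateWhereClause(malicious, model);
    expect(result.isValid).to.be.false;
    expect(result.errors).to.not.be.empty;
  });
});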
```
**Integration Tests:**
- Test with existing query generation
- Verify performance impact
- Test all supported SQL constructs
- Cross-dialect compatibility
#### 4.2 Documentation Updates
**README.md Security Section:**
```markdown
## Security
MiniML includes comprehensive SQL injection protection for WHERE and HAVING clauses:
- **AST-based validation**: All user-provided SQL expressions are parsed and validated
- **Allowlist approach**: Only safe SQL constructs are permitted
- **Model-aware**: Column references are validated against your model schema
- **Dialect-specific**: Validation rules adapt to BigQuery/Snowflake syntax
### Safe Expression Examples
- `account_name = 'Acme Corp'`
- `date >= '2024-01-01' AND category_name LIKE 'Electronics%'`
- `revenue > 1000 OR quantity IS NOT NULL`
### Blocked Constructs
- Subqueries, DDL/DML statements, system functions
- Comments, dynamic SQL construction
- Unauthorized column references
```
## Implementation Timeline
### Week 1: Foundation
- [ ] Create `lib/validation.ts` with basic AST validation
- [ ] Implement core safety checks
- [ ] Add error types and handling
### Week 2: Integration
- [ ] Modify `lib/query.ts` to use validation
- [ ] Add comprehensive test suite
- [ ] Performance testing and optimization
### Week 3: Enhancement
- [ ] Advanced security features
- [ ] Dialect-specific function allowlists
- [ ] Security logging and monitoring
### Week 4: Documentation & Polish
- [ ] Update README.md with security section
- [ ] Create user guide for safe expressions
- [ ] Final testing and code review
## Configuration Options
**Model-level Security Settings:**
```yaml
# model.yaml
security:
  validation_level: strict # strict | moderate | permissive
  max_expression_complexity: 50
  allowed_functions:
    - UPPER
    - LOWER
    - DATE_TRUNC
  log_blocked_attempts: true
```
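Once the YAML is parsed, the `security` block still needs validating before it is trusted. The sketch below shows one possible approach; `resolveSecuritySettings` and `DEFAULT_SECURITY` are hypothetical, and the default values are assumptions (the plan does not specify them), with `moderate` chosen to match the migration strategy's suggestion to start with moderate validation.

```typescript
type ValidationLevel = 'strict' | 'moderate' | 'permissive';

interface SecuritySettings {
  validation_level: ValidationLevel;
  max_expression_complexity: number;
  allowed_functions: string[];
  log_blocked_attempts: boolean;
}

const LEVELS: ValidationLevel[] = ['strict', 'moderate', 'permissive'];

// Hypothetical defaults; the plan does not specify them.
const DEFAULT_SECURITY: SecuritySettings = {
  validation_level: 'moderate',
  max_expression_complexity: 50,
  allowed_functions: ['UPPER', 'LOWER', 'TRIM'],
  log_blocked_attempts: true,
};

// `raw` is whatever the YAML parser produced for the `security` key.
function resolveSecuritySettings(raw: unknown): SecuritySettings {
  const input: Record<string, unknown> =
    typeof raw === 'object' && raw !== null ? (raw as Record<string, unknown>) : {};
  const settings: SecuritySettings = {
    ...DEFAULT_SECURITY,
    allowed_functions: DEFAULT_SECURITY.allowed_functions.slice(),
  };

  const level = input['validation_level'];
  if (level !== undefined) {
    const known = LEVELS.find((candidate) => candidate === level);
    // Fail closed: a typo must not silently fall back to a weaker level.
    if (known === undefined) throw new Error(`Unknown validation_level: ${String(level)}`);
    settings.validation_level = known;
  }

  const complexity = input['max_expression_complexity'];
  if (complexity !== undefined) {
    if (typeof complexity !== 'number' || !Number.isInteger(complexity) || complexity < 1) {
      throw new Error('max_expression_complexity must be a positive integer');
    }
    settings.max_expression_complexity = complexity;
  }

  const fns = input['allowed_functions'];
  if (fns !== undefined) {
    if (!Array.isArray(fns) || !fns.every((fn) => typeof fn === 'string')) {
      throw new Error('allowed_functions must be a list of strings');
    }
    settings.allowed_functions = fns.map((fn: string) => fn.toUpperCase());
  }

  const log = input['log_blocked_attempts'];
  if (log !== undefined) {
    if (typeof log !== 'boolean') throw new Error('log_blocked_attempts must be a boolean');
    settings.log_blocked_attempts = log;
  }
  return settings;
}
```

Throwing on malformed settings, rather than ignoring them, keeps a misconfigured model from running with weaker protection than its author intended.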
## Migration Strategy
1. **Backward Compatibility**: Existing valid expressions continue working
2. **Graceful Degradation**: Clear error messages for invalid expressions
3. **Opt-in Strictness**: Start with moderate validation, allow stricter modes
4. **Documentation**: Comprehensive examples of safe vs unsafe patterns
## Success Metrics
- **Security**: Zero successful injection attacks in testing
- **Usability**: < 5% false positive rate for legitimate expressions
- **Performance**: < 10ms validation overhead per query
- **Adoption**: Clear migration path for existing users
## Risk Mitigation
- **False Positives**: Comprehensive testing with real-world expressions
- **Performance Impact**: Efficient AST parsing with caching
- **User Confusion**: Detailed documentation and error messages
- **Maintenance Burden**: Automated security testing in CI/CD
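The caching mentioned under Performance Impact could be as simple as memoizing results per expression string. The sketch below is one possible shape, with a small least-recently-used bound so hostile traffic cannot grow the cache without limit. `memoizeValidator` and the 500-entry default are hypothetical, and the wrapped validator is assumed to be bound to a single model and dialect, since the same expression can be valid for one model and invalid for another.

```typescript
interface ValidationResult {
  isValid: boolean;
  errors: string[];
  warnings: string[];
}

type Validator = (expression: string) => ValidationResult;

// Hypothetical helper: wraps a validator with a bounded LRU cache.
// Assumes `validate` is already bound to one model and one dialect.
function memoizeValidator(validate: Validator, maxEntries: number = 500): Validator {
  const cache = new Map<string, ValidationResult>();
  return (expression: string): ValidationResult => {
    const hit = cache.get(expression);
    if (hit !== undefined) {
      // Map keeps insertion order, so re-inserting marks the entry as most recent.
      cache.delete(expression);
      cache.set(expression, hit);
      return hit;
    }
    const result = validate(expression);
    if (cache.size >= maxEntries) {
      // Evict the least recently used entry: the first key in insertion order.
      const oldest = cache.keys().next().value;
      if (oldest !== undefined) cache.delete(oldest);
    }
    cache.set(expression, result);
    return result;
  };
}
```

Rejected expressions are cached as well, which means a repeated injection attempt costs a map lookup instead of a full parse.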
This plan provides a robust foundation for SQL injection prevention while maintaining the flexibility and ease of use that makes MiniML valuable for data modeling.