Meld: A template language for LLM prompts
# Regex Fallbacks in Meld Resolution
This document catalogs the current regex-based fallback mechanisms in the Meld codebase. These fallbacks run when AST parsing fails. They keep resolution working by accepting input more permissively than the parser does, at the cost of precision.
## Overview
Multiple handler classes in the ResolutionService include regex fallbacks that are triggered when the AST parser is unavailable or fails to parse the input. These exist as a safety mechanism, but they represent technical debt that should eventually be addressed.
## Fallback Mechanisms by Handler
### 1. StringLiteralHandler
**File:** `/services/resolution/ResolutionService/resolvers/StringLiteralHandler.ts`
**Edge Cases Handled:**
- Basic string literals with different quote types (`"`, `'`, or `` ` ``)
- Strings with escaped quotes
- Empty strings
- Strings with newlines
**Fallback Methods:**
- `isStringLiteral()`: Checks if a value is a string literal using character-by-character examination
- `validateLiteral()`: Validates string literals by checking quote types, content, and escaped characters
- `parseLiteral()`: Manually extracts content from string literals by removing quotes
**Fallback Trigger:**
```typescript
catch (error) {
  console.warn('Failed to check string literal with AST, falling back to manual check:', error);
  return this.isStringLiteral(value);
}
```
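The character-by-character check described above might look roughly like the following. This is a minimal sketch, not Meld's implementation: the function name `isStringLiteralSketch` and the exact acceptance rules (matching outer quotes, backslash escapes, no unescaped inner quote of the same kind) are assumptions made for illustration.

```typescript
// Hypothetical sketch of a manual string-literal check (not Meld's actual code).
function isStringLiteralSketch(value: string): boolean {
  if (value.length < 2) return false;

  const quote = value.charAt(0);
  if (['"', "'", "`"].indexOf(quote) === -1) return false;

  const last = value.length - 1;
  if (value.charAt(last) !== quote) return false;

  // Walk the body between the outer quotes.
  let i = 1;
  while (i < last) {
    const ch = value.charAt(i);
    if (ch === "\\") {
      i += 2; // skip the backslash and the character it escapes
      continue;
    }
    if (ch === quote) return false; // unescaped quote ends the literal early
    i += 1;
  }

  // If an escape consumed the closing quote, i overshoots and the literal is unterminated.
  return i === last;
}

console.log(isStringLiteralSketch('"hello"'));    // true
console.log(isStringLiteralSketch("''"));         // true (empty string)
console.log(isStringLiteralSketch('"it\\"s"'));   // true (escaped quote)
console.log(isStringLiteralSketch('"mismatch\'')); // false (quote types differ)
```

Even a careful manual scan like this only inspects the literal in isolation, which is one reason the fallback is more permissive than the AST.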
### 2. StringConcatenationHandler
**File:** `/services/resolution/ResolutionService/resolvers/StringConcatenationHandler.ts`
**Edge Cases Handled:**
- String concatenation with the `++` operator (`"hello" ++ "world"`)
- Whitespace variations around the operator
**Fallback Methods:**
- `splitConcatenationParts()`: Splits a string by the `++` operator using regex `/\s*\+\+\s*/`
- `hasConcatenation()`: Checks for concatenation using regex `/\s\+\+\s/`
**Fallback Trigger:**
```typescript
catch (error) {
  console.warn('Failed to parse concatenation with AST, falling back to manual parsing:', error);
}
```
**Regex Used:**
```typescript
// In splitConcatenationParts(): split on the ++ operator, discarding any whitespace around it
const parts = value.split(/\s*\+\+\s*/);
// In hasConcatenation(): look for ++ with a whitespace character on both sides
return /\s\+\+\s/.test(value);
```
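The two patterns quoted above are not symmetric, which is easy to miss. The sketch below wraps them in hypothetical helpers (`splitParts` and `hasConcat` are illustrative names, not the handler's methods) to show where they disagree and where the split is context-free.

```typescript
// Hypothetical wrappers around the two regexes quoted in this section.
const splitParts = (value: string): string[] => value.split(/\s*\+\+\s*/);
const hasConcat = (value: string): boolean => /\s\+\+\s/.test(value);

// Both agree when the operator is surrounded by whitespace.
console.log(hasConcat('"hello" ++ "world"'));  // true
console.log(splitParts('"hello" ++ "world"')); // two parts, whitespace removed

// Without whitespace, detection says "no" although the split pattern would still split.
console.log(hasConcat('"hello"++"world"'));    // false
console.log(splitParts('"hello"++"world"'));   // two parts

// The split ignores quoting, so a ++ inside a string literal is split as well.
console.log(splitParts('"a ++ b" ++ "c"'));    // three parts instead of two
```

The last case is the kind of input where an AST, which knows that the first `++` sits inside a string literal, would produce a different result.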
### 3. VariableReferenceResolver
**File:** `/services/resolution/ResolutionService/resolvers/VariableReferenceResolver.ts`
**Edge Cases Handled:**
- Variable references with `{{variable}}` syntax
- Field access notation with dots (`{{data.field.subfield}}`)
- Multiple variable references in a single string
**Fallback Methods:**
- `resolveSimpleVariables()`: Resolves variables using regex when AST parsing fails
- `extractReferencesRegex()`: Extracts variable references using regex
- `hasVariableReferences()`: Checks for variable references using a simple string `includes()` check
**Fallback Trigger:**
```typescript
// If parsing failed or returned no nodes, fall back to regex-based resolution
if (!nodes || nodes.length === 0) {
  console.log('*** No AST nodes, falling back to simple variables');
  return this.resolveSimpleVariables(text, context);
}
```
**Regex Used:**
```typescript
// Replace variable references in format {{varName}}
const variableRegex = /\{\{([^{}]+?)\}\}/g;
// Match {{varName}} pattern without using the parser
const matches = text.match(/\{\{([^{}]+)\}\}/g) || [];
```
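To make the regex path concrete, here is a minimal sketch of how a `{{variable}}` pattern like the one above could drive resolution with dotted field access. The name `resolveSimpleVariablesSketch`, the plain-object `Scope` type, and the choice to leave unresolved references untouched are all assumptions; the real resolver works against Meld's resolution context, which is not shown in this document.

```typescript
// Hypothetical scope type: a plain object standing in for the real resolution context.
type Scope = Record<string, unknown>;

function resolveSimpleVariablesSketch(text: string, scope: Scope): string {
  return text.replace(/\{\{([^{}]+?)\}\}/g, (match: string, ref: string): string => {
    // Walk dotted paths such as data.field.subfield one key at a time.
    let current: unknown = scope;
    for (const key of ref.trim().split(".")) {
      if (current === null || typeof current !== "object") return match;
      current = (current as Record<string, unknown>)[key];
    }
    // Assumption: an unresolved reference is left in place rather than raising an error.
    return current === undefined ? match : String(current);
  });
}

const scope: Scope = { greeting: "welcome", user: { name: "Ada" } };
console.log(resolveSimpleVariablesSketch("Hi {{user.name}}, {{greeting}}!", scope));
// Hi Ada, welcome!
console.log(resolveSimpleVariablesSketch("Value: {{missing.field}}", scope));
// Value: {{missing.field}}
```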
### 4. CommandResolver
**File:** `/services/resolution/ResolutionService/resolvers/CommandResolver.ts`
**Edge Cases Handled:**
- Command template parameters in `{{paramName}}` format
- Command argument parsing
- Parameter count validation
**Fallback Methods:**
- `parseCommandParameters()`: Manually parses command parameters when AST fails
- `countParameterReferences()`: Counts parameter references in a template
- `extractParameterNames()`: Extracts parameter names from a template
**Fallback Trigger:**
```typescript
catch (error) {
  // If parsing fails, fall back to manual parsing
  console.warn('Failed to parse command with AST, falling back to manual parsing:', error);
}
```
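The document does not quote the regex used by this handler's fallback methods. As a rough illustration only, the sketch below assumes the same `{{paramName}}` pattern shown for VariableReferenceResolver; the function names carry a `Sketch` suffix to mark them as hypothetical, and whether the real methods deduplicate names is not known from this document.

```typescript
// Hypothetical sketch: collect distinct parameter names in order of first appearance.
function extractParameterNamesSketch(template: string): string[] {
  const pattern = /\{\{([^{}]+)\}\}/g;
  const names: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(template)) !== null) {
    const name = (m[1] || "").trim();
    if (names.indexOf(name) === -1) names.push(name);
  }
  return names;
}

// Hypothetical sketch: count every reference, including repeats.
function countParameterReferencesSketch(template: string): number {
  return (template.match(/\{\{([^{}]+)\}\}/g) || []).length;
}

const template = "echo {{greeting}}, {{name}}! {{greeting}} again";
console.log(extractParameterNamesSketch(template));    // distinct names: greeting, name
console.log(countParameterReferencesSketch(template)); // 3
```

The difference between distinct names and total references matters for parameter count validation, so a regex fallback and an AST-based check can disagree if they count differently.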
## Common Edge Cases Across All Handlers
1. **Syntax Errors**: Syntax, or combinations of syntax elements, that the parser cannot handle
2. **Incomplete Structures**: Mismatched quotes or brackets that should be rejected, but that regex accepts through partial matching
3. **Context-Free Parsing**: The regex approach doesn't consider the broader context, making it more permissive
4. **Special Characters**: The AST parser might handle special characters differently than the regex patterns do
5. **Nested Structures**: Regex can't correctly handle complex nested structures that the AST would recognize
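
The incomplete-structure and nesting cases in the list above can be reproduced with the variable-reference pattern quoted earlier. In this sketch, `findRefs` is a hypothetical helper, not a function in the codebase.

```typescript
// Hypothetical helper around the {{...}} pattern quoted for VariableReferenceResolver.
const findRefs = (text: string): string[] => text.match(/\{\{([^{}]+)\}\}/g) || [];

// Incomplete structure: the unbalanced reference is silently skipped, not reported.
console.log(findRefs("Hello {{name}"));        // []

// Nested structure: only the innermost reference matches; the outer one is lost.
console.log(findRefs("{{outer {{inner}} }}")); // [ '{{inner}}' ]

// Partial matching: the well-formed reference resolves while the broken one is ignored.
console.log(findRefs("{{ok}} and {{broken}")); // [ '{{ok}}' ]
```

In each case the regex returns a result without signaling that part of the input was malformed, which is the permissiveness the list describes.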
## Migration Strategy
To eliminate regex fallbacks and rely solely on the AST:
1. Enhance the AST parser to handle all edge cases currently caught by regex
2. Update test cases that expect regex fallbacks to work with pure AST parsing
3. Remove fallback methods and replace them with appropriate error handling
4. Add more comprehensive error messages when AST parsing fails
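
Steps 3 and 4 could take a shape like the following: the `catch` block stops calling a regex fallback and instead rethrows an error that carries the offending input. Everything here is hypothetical, including the `AstParseError` class and the `parseStub` stand-in, which only mimics a parser that rejects unbalanced `{{...}}` references. It is a sketch of the pattern, not Meld's parser or error types.

```typescript
// Hypothetical error type carrying the input that failed to parse.
class AstParseError extends Error {
  readonly input: string;
  constructor(input: string, reason: string) {
    super(`Could not parse ${JSON.stringify(input)}: ${reason}`);
    this.name = "AstParseError";
    this.input = input;
  }
}

// Stand-in for the real AST parser: rejects unbalanced {{ }} delimiters.
function parseStub(text: string): string[] {
  const opens = text.split("{{").length - 1;
  const closes = text.split("}}").length - 1;
  if (opens !== closes) {
    throw new Error(`found ${opens} opening and ${closes} closing delimiters`);
  }
  return text.match(/\{\{([^{}]+)\}\}/g) || [];
}

function resolveStrict(text: string): string[] {
  try {
    return parseStub(text);
  } catch (error) {
    // No regex fallback here: surface the failure with context instead.
    const reason = error instanceof Error ? error.message : String(error);
    throw new AstParseError(text, reason);
  }
}

console.log(resolveStrict("Hello {{name}}")); // [ '{{name}}' ]
try {
  resolveStrict("Hello {{name}");
} catch (error) {
  console.log((error as Error).message);
}
```

Callers that previously relied on the silent fallback would then see a descriptive failure, which is the trade-off described in the next section.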
## Impact of Removing Fallbacks
Removing regex fallbacks will make the code:
- More consistent (single parsing approach)
- More precise (AST enforces syntax rules more strictly)
- Potentially less forgiving of syntax errors
- Simpler to maintain in the long run
Tests that specifically exercise the fallback behavior (with descriptions like "should fall back to regex") will need to be updated, either to expect errors when AST parsing fails or to verify that AST parsing now handles these cases correctly.