clarity-pattern-parser
Version:
Parsing Library for Typescript and Javascript.
254 lines (211 loc) • 6.07 kB
Markdown
# Clarity Pattern Parser Grammar
This document describes the grammar features supported by the Clarity Pattern Parser.
## Basic Patterns
### Literal Strings
Define literal string patterns using double quotes:
```
name = "John"
```
Escaped characters are supported in literals:
- `\n` - newline
- `\r` - carriage return
- `\t` - tab
- `\b` - backspace
- `\f` - form feed
- `\v` - vertical tab
- `\0` - null character
- `\x00` - hex character
- `\u0000` - unicode character
- `\"` - escaped quote
- `\\` - escaped backslash
### Regular Expressions
Define regex patterns using forward slashes:
```
name = /\w/
```
## Pattern Operators
### Options (|)
Match one of multiple patterns using the `|` operator. This is used for simple alternatives where order doesn't matter:
```
names = john | jane
```
### Expression (|)
Expression patterns also use the `|` operator but are used for defining operator precedence in expressions. The order of alternatives determines precedence, with earlier alternatives having higher precedence. By default, operators are left-associative.
Example of an arithmetic expression grammar:
```
prefix-operators = "+" | "-"
prefix-expression = prefix-operators + expression
postfix-operators = "++" | "--"
postfix-expression = expression + postfix-operators
add-sub-operators = "+" | "-"
add-sub-expression = expression + add-sub-operators + expression
mul-div-operators = "*" | "/"
mul-div-expression = expression + mul-div-operators + expression
expression = prefix-expression | mul-div-expression | add-sub-expression | postfix-expression
```
In this example:
- `prefix-expression` has highest precedence
- `mul-div-expression` has next highest precedence
- `add-sub-expression` has next highest precedence
- `postfix-expression` has lowest precedence
To make an operator right-associative, add the `right` keyword:
```
expression = prefix-expression | mul-div-expression | add-sub-expression right | postfix-expression
```
### Sequence (+)
Concatenate patterns in sequence using the `+` operator:
```
full-name = first-name + space + last-name
```
### Optional (?)
Make a pattern optional using the `?` operator:
```
full-name = first-name + space + middle-name? + last-name
```
### Not (!)
Negative lookahead using the `!` operator:
```
pattern = !excluded-pattern + actual-pattern
```
### Take Until (?->|)
Match all characters until a specific pattern is found:
```
script-text = ?->| "</script"
```
## Repetition
### Basic Repeat
Repeat a pattern one or more times using `+`:
```
digits = (digit)+
```
### Zero or More
Repeat a pattern zero or more times using `*`:
```
digits = (digit)*
```
### Bounded Repetition
Specify exact repetition counts using curly braces:
- `{n}` - Exactly n times: `(pattern){3}`
- `{n,}` - At least n times: `(pattern){1,}`
- `{,n}` - At most n times: `(pattern){,3}`
- `{n,m}` - Between n and m times: `(pattern){1,3}`
### Repetition with Divider
Repeat patterns with a divider between occurrences:
```
digits = (digit, comma){3}
```
Add `trim` keyword to trim the divider from the end:
```
digits = (digit, comma trim)+
```
## Imports and Parameters
### Basic Import
Import patterns from other files:
```
import { pattern-name } from "path/to/file.cpat"
```
### Import with Parameters
Import with custom parameters:
```
import { pattern } from "file.cpat" with params {
custom-param = "value"
}
```
### Parameter Declaration
Declare parameters that can be passed to the grammar:
```
use params {
param-name
}
```
### Default Parameters
Specify default values for parameters:
```
use params {
param = default-value
}
```
## Decorators
Decorators can be applied to patterns using the `@` syntax:
### Token Decorator
Specify tokens for a pattern:
```
@tokens([" "])
spaces = /\s+/
```
### Custom Decorators
Support for custom decorators with various argument types:
```
@decorator() // No arguments
@decorator(["value"]) // Array argument
@decorator({"prop": value}) // Object argument
```
## Comments
Add comments using the `#` symbol:
```
# This is a comment
pattern = "value"
```
## Expression Patterns
Support for recursive expression patterns with optional right association:
```
expression = ternary | variables
ternary = expression + " ? " + expression + " : " + expression
```
Add `right` keyword for right association:
```
expression = ternary right | variables
```
## Pattern References
Reference other patterns by name:
```
pattern1 = "value"
pattern2 = pattern1
```
## Pattern Aliasing
Import patterns with aliases:
```
import { original as alias } from "file.cpat"
```
## String Template Patterns
Patterns can be defined inline using string templates. This allows for quick pattern definition and testing without creating separate files.
### Basic Example
```typescript
const { fullName } = patterns`
first-name = "John"
last-name = "Doe"
space = /\s+/
full-name = first-name + space + last-name
`;
const result = fullName.exec("John Doe");
// result.ast.value will be "John Doe"
```
### Complex Example (HTML-like Markup)
```typescript
const { body } = patterns`
tag-name = /[a-zA-Z_-]+[a-zA-Z0-9_-]*/
space = /\s+/
opening-tag = "<" + tag-name + space? + ">"
closing-tag = "</" + tag-name + space? + ">"
child = space? + element + space?
children = (child)*
element = opening-tag + children + closing-tag
body = space? + element + space?
`;
const result = body.exec(`
<div>
<div></div>
<div></div>
</div>
`, true);
// Clean up spaces from the AST
result?.ast?.findAll(n => n.name.includes("space")).forEach(n => n.remove());
// result.ast.value will be "<div><div></div><div></div></div>"
```
### Key Features
1. Patterns are defined using backticks (`)
2. Each pattern definition is on a new line
3. The `patterns` function returns an object with all defined patterns
4. Patterns can be used immediately after definition
5. The AST can be manipulated after parsing (e.g., removing spaces)
6. The `exec` method can take an optional second parameter to enable debug mode