Parsing library for TypeScript and JavaScript.
# Clarity Pattern Parser
A pattern-matching and parsing library that provides a flexible grammar for defining complex patterns. It is suited to building parsers, validators, and text-processing tools.
> **Try it online!** 🚀 [Open in Playground](https://jaredjbarnes.github.io/cpat-editor/)
## Features
- 🎯 Flexible pattern matching with both grammar and direct API
- 🔄 Support for recursive patterns and expressions
- 🎨 Customizable pattern composition
- 🚀 High performance parsing
- 🔍 Built-in debugging support
- 📝 Rich AST manipulation capabilities
- 🔌 Extensible through custom patterns and decorators
## Installation
```bash
npm install clarity-pattern-parser
```
## Quick Start
### Using Grammar
```typescript
import { patterns } from "clarity-pattern-parser";
// Define patterns using grammar
const { fullName } = patterns`
first-name = "John"
last-name = "Doe"
space = /\s+/
full-name = first-name + space + last-name
`;
// Execute pattern
const result = fullName.exec("John Doe");
console.log(result.ast?.value); // "John Doe"
```
### Using Direct API
```typescript
import { Literal, Sequence } from "clarity-pattern-parser";
// Create patterns directly
const firstName = new Literal("first-name", "John");
const space = new Literal("space", " ");
const lastName = new Literal("last-name", "Doe");
const fullName = new Sequence("full-name", [firstName, space, lastName]);
// Execute pattern
const result = fullName.exec("John Doe");
console.log(result.ast?.value); // "John Doe"
```
## Online Playground
Try Clarity Pattern Parser in your browser with our interactive playground:
[Open in Playground](https://jaredjbarnes.github.io/cpat-editor/)
The playground allows you to:
- Write and test patterns in real-time
- See the AST visualization
- Debug pattern execution
- Share patterns with others
- Try out different examples
## Table of Contents
1. [Grammar Documentation](#grammar-documentation)
   - [Basic Patterns](#basic-patterns)
   - [Pattern Operators](#pattern-operators)
   - [Repetition](#repetition)
   - [Imports and Parameters](#imports-and-parameters)
   - [Decorators](#decorators)
   - [Comments](#comments)
   - [Pattern References](#pattern-references)
   - [Pattern Aliasing](#pattern-aliasing)
   - [String Template Patterns](#string-template-patterns)
2. [Direct Pattern Usage](#direct-pattern-usage)
   - [Basic Patterns](#basic-patterns-1)
   - [Composite Patterns](#composite-patterns)
   - [Pattern Context](#pattern-context)
   - [Pattern Reference](#pattern-reference)
   - [Pattern Execution](#pattern-execution)
   - [AST Manipulation](#ast-manipulation)
3. [Advanced Topics](#advanced-topics)
   - [Custom Patterns](#custom-patterns)
   - [Performance Tips](#performance-tips)
   - [Debugging](#debugging)
   - [Error Handling](#error-handling)
## Grammar Documentation
This section describes the grammar features supported by the Clarity Pattern Parser.
### Basic Patterns
#### Literal Strings
Define literal string patterns using double quotes:
```
name = "John"
```
Escaped characters are supported in literals:
- `\n` - newline
- `\r` - carriage return
- `\t` - tab
- `\b` - backspace
- `\f` - form feed
- `\v` - vertical tab
- `\0` - null character
- `\x00` - hex character
- `\u0000` - unicode character
- `\"` - escaped quote
- `\\` - escaped backslash
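As a rough illustration of what the escapes above stand for, the sketch below decodes them the way JavaScript string literals do. `decodeLiteralEscapes` and `SIMPLE_ESCAPES` are hypothetical names made up for this example, not library exports, and the library's own handling may differ in edge cases.

```typescript
// Illustrative only: decodes the listed escapes like JavaScript string literals do.
// `decodeLiteralEscapes` is a hypothetical helper, not part of the library.
const SIMPLE_ESCAPES: Record<string, string> = {
  n: "\n", r: "\r", t: "\t", b: "\b", f: "\f", v: "\v",
  "0": "\0", '"': '"', "\\": "\\",
};

function decodeLiteralEscapes(raw: string): string {
  return raw.replace(
    /\\(x[0-9a-fA-F]{2}|u[0-9a-fA-F]{4}|[\s\S])/g,
    (whole: string, body: string): string => {
      if (body.length > 1) {
        // \xNN and \uNNNN: the hex digits are the character code.
        return String.fromCharCode(parseInt(body.substring(1), 16));
      }
      // Escapes outside the table are left untouched in this sketch.
      const decoded = SIMPLE_ESCAPES[body];
      return decoded === undefined ? whole : decoded;
    }
  );
}

console.log(decodeLiteralEscapes("\\x41\\u0042")); // "AB"
```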
#### Regular Expressions
Define regex patterns using forward slashes:
```
name = /\w/
```
### Pattern Operators
#### Options (|)
Match one of multiple patterns using the `|` operator. This is used for simple alternatives where order doesn't matter:
```
names = john | jane
```
#### Expression (|)
Expression patterns also use the `|` operator but are used for defining operator precedence in expressions. The order of alternatives determines precedence, with earlier alternatives having higher precedence. By default, operators are left-associative.
Example of an arithmetic expression grammar:
```
prefix-operators = "+" | "-"
prefix-expression = prefix-operators + expression
postfix-operators = "++" | "--"
postfix-expression = expression + postfix-operators
add-sub-operators = "+" | "-"
add-sub-expression = expression + add-sub-operators + expression
mul-div-operators = "*" | "/"
mul-div-expression = expression + mul-div-operators + expression
expression = prefix-expression | mul-div-expression | add-sub-expression | postfix-expression
```
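To make precedence and left-associativity concrete, here is a small self-contained sketch that does not use the library. It evaluates binary arithmetic with precedence climbing. The `BINARY_PRECEDENCE` table and all function names are hypothetical and only mirror the ordering in the grammar above, where `mul-div-expression` is listed before `add-sub-expression` and therefore binds tighter. It is not how Clarity Pattern Parser is implemented, and it skips prefix and postfix operators.

```typescript
// Illustrative only: a tiny precedence-climbing evaluator, not the library's implementation.
// Higher number = binds tighter, mirroring "earlier alternative = higher precedence".
const BINARY_PRECEDENCE: Record<string, number> = { "*": 2, "/": 2, "+": 1, "-": 1 };

function tokenize(source: string): string[] {
  // Numbers and single-character operators; whitespace is skipped.
  return source.match(/\d+(?:\.\d+)?|[-+*\/]/g) || [];
}

function applyOperator(operator: string, left: number, right: number): number {
  switch (operator) {
    case "+": return left + right;
    case "-": return left - right;
    case "*": return left * right;
    case "/": return left / right;
    default: throw new Error("Unknown operator: " + operator);
  }
}

function evaluate(source: string): number {
  const tokens = tokenize(source);
  let index = 0;

  function parseExpression(minPrecedence: number): number {
    let left = Number(tokens[index++]);
    while (index < tokens.length) {
      const operator = tokens[index];
      const precedence = BINARY_PRECEDENCE[operator];
      if (precedence === undefined || precedence < minPrecedence) break;
      index++;
      // "precedence + 1" makes operators of equal precedence group to the left.
      const right = parseExpression(precedence + 1);
      left = applyOperator(operator, left, right);
    }
    return left;
  }

  return parseExpression(1);
}

console.log(evaluate("1 + 2 * 3")); // 7, because "*" binds tighter than "+"
console.log(evaluate("8 - 3 - 2")); // 3, because "-" groups as (8 - 3) - 2
```

A right-associative reading of `8 - 3 - 2` would give `8 - (3 - 2) = 7`, which is why the second line is a useful check of the default.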
### Repetition
#### Basic Repeat
Repeat a pattern one or more times using `+`:
```
digits = (digit)+
```
#### Zero or More
Repeat a pattern zero or more times using `*`:
```
digits = (digit)*
```
#### Bounded Repetition
Specify repetition bounds using curly braces:
- `{n}` - Exactly n times: `(pattern){3}`
- `{n,}` - At least n times: `(pattern){1,}`
- `{,n}` - At most n times: `(pattern){,3}`
- `{n,m}` - Between n and m times: `(pattern){1,3}`
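The four forms above can be pictured as a lower and an upper bound on the number of repetitions. The sketch below models that with a plain function over a literal unit. `matchRepeated` and `RepeatBounds` are hypothetical names for illustration only, and treating `{,n}` as a lower bound of zero is an assumption.

```typescript
// Illustrative only: models the {n,m} bounds with a plain function, not the library's repeat logic.
interface RepeatBounds {
  min: number; // lower bound, e.g. 1 for {1,3}
  max: number; // upper bound, Infinity for {n,}
}

// Returns the matched prefix, or null when fewer than `min` repetitions are found.
function matchRepeated(text: string, unit: string, bounds: RepeatBounds): string | null {
  if (unit.length === 0) throw new Error("unit must not be empty");
  let count = 0;
  let position = 0;
  while (count < bounds.max && text.substring(position, position + unit.length) === unit) {
    position += unit.length;
    count++;
  }
  return count >= bounds.min ? text.substring(0, position) : null;
}

const exactlyThree = { min: 3, max: 3 };      // (pattern){3}
const atLeastOne = { min: 1, max: Infinity }; // (pattern){1,}
const atMostThree = { min: 0, max: 3 };       // (pattern){,3}, assuming a lower bound of 0
const oneToThree = { min: 1, max: 3 };        // (pattern){1,3}

console.log(matchRepeated("abab", "ab", exactlyThree));   // null: only two repetitions
console.log(matchRepeated("abababab", "ab", oneToThree)); // "ababab": stops at the upper bound
```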
#### Repetition with Divider
Repeat patterns with a divider between occurrences:
```
digits = (digit, comma){3}
```
Add the `trim` keyword to trim a trailing divider from the end of the match:
```
digits = (digit, comma trim)+
```
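The difference `trim` makes is easiest to see on input that ends with a divider. The sketch below is a hand-rolled model of `(digit, comma)` repetition, not the library's code. It assumes that with `trim` a trailing divider is left out of the match, and that without it the trailing divider is kept. `matchDividedDigits` is a hypothetical name.

```typescript
// Illustrative only: a hand-rolled model of "(digit, comma)" repetition, not the library's code.
// Assumption: with `trim`, a trailing divider is left out of the match; without it, it is kept.
function matchDividedDigits(text: string, trimTrailingDivider: boolean): string {
  let position = 0;
  let endOfLastDigit = 0;
  while (position < text.length && /\d/.test(text.charAt(position))) {
    position++; // consume one digit
    endOfLastDigit = position;
    if (text.charAt(position) !== ",") break;
    position++; // consume the divider
  }
  return text.substring(0, trimTrailingDivider ? endOfLastDigit : position);
}

console.log(matchDividedDigits("1,2,3,", true));  // "1,2,3"
console.log(matchDividedDigits("1,2,3,", false)); // "1,2,3,"
```

When the input has no trailing divider, both calls return the same match.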
### Imports and Parameters
#### Basic Import
Import patterns from other files:
```
import { pattern-name } from "path/to/file.cpat"
```
#### Import with Parameters
Import with custom parameters:
```
import { pattern } from "file.cpat" with params {
custom-param = "value"
}
```
#### Parameter Declaration
Declare parameters that can be passed to the grammar:
```
use params {
param-name
}
```
#### Default Parameters
Specify default values for parameters:
```
use params {
param = default-value
}
```
### Decorators
#### Token Decorator
Specify tokens for a pattern:
```
@tokens([" "])
spaces = /\s+/
```
#### Custom Decorators
Support for custom decorators with various argument types:
```
@decorator() // No arguments
@decorator(["value"]) // Array argument
@decorator({"prop": value}) // Object argument
```
### Comments
Add comments using the `#` symbol:
```
# This is a comment
pattern = "value"
```
### Pattern References
Reference other patterns by name:
```
pattern1 = "value"
pattern2 = pattern1
```
### Pattern Aliasing
Import patterns with aliases:
```
import { original as alias } from "file.cpat"
```
### String Template Patterns
Patterns can be defined inline using string templates. This allows for quick pattern definition and testing without creating separate files.
#### Basic Example
```typescript
import { patterns } from "clarity-pattern-parser";
const { fullName } = patterns`
first-name = "John"
last-name = "Doe"
space = /\s+/
full-name = first-name + space + last-name
`;
const result = fullName.exec("John Doe");
// result.ast.value will be "John Doe"
```
#### Complex Example (HTML-like Markup)
```typescript
import { patterns } from "clarity-pattern-parser";
const { body } = patterns`
tag-name = /[a-zA-Z_-]+[a-zA-Z0-9_-]*/
ws = /\s+/
opening-tag = "<" + tag-name + ws? + ">"
closing-tag = "</" + tag-name + ws? + ">"
child = ws? + element + ws?
children = (child)*
element = opening-tag + children + closing-tag
body = ws? + element + ws?
`;
const result = body.exec(`
<div>
<div></div>
<div></div>
</div>
`, true);
// Clean up spaces from the AST
result?.ast?.findAll(n => n.name.includes("ws")).forEach(n => n.remove());
// result.ast.value will be "<div><div></div><div></div></div>"
```
## Direct Pattern Usage
While the grammar provides a convenient way to define patterns, you can also use the Pattern classes directly for more control and flexibility.
### Basic Patterns
#### Literal
```typescript
import { Literal } from "clarity-pattern-parser";
const firstName = new Literal("first-name", "John");
const result = firstName.exec("John");
// result.ast.value will be "John"
```
#### Regex
```typescript
import { Regex } from "clarity-pattern-parser";
const digits = new Regex("digits", "\\d+");
const result = digits.exec("123");
// result.ast.value will be "123"
```
### Composite Patterns
#### Sequence
```typescript
import { Sequence, Literal } from "clarity-pattern-parser";
const firstName = new Literal("first-name", "John");
const space = new Literal("space", " ");
const lastName = new Literal("last-name", "Doe");
const fullName = new Sequence("full-name", [firstName, space, lastName]);
const result = fullName.exec("John Doe");
// result.ast.value will be "John Doe"
```
#### Options
```typescript
import { Options, Literal } from "clarity-pattern-parser";
const john = new Literal("john", "John");
const jane = new Literal("jane", "Jane");
const names = new Options("names", [john, jane]);
const result = names.exec("Jane");
// result.ast.value will be "Jane"
```
#### Expression
```typescript
import { Expression, Literal } from "clarity-pattern-parser";
const a = new Literal("a", "a");
const b = new Literal("b", "b");
const c = new Literal("c", "c");
const expression = new Expression("expression", [a, b, c]);
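// With only operand patterns, the expression matches a single operand.
// Inputs such as "a ? b : c" also need operator patterns that reference the expression.
const result = expression.exec("a");
// result.ast.value will be "a"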
```
### Pattern Context
```typescript
import { Context, Literal } from "clarity-pattern-parser";
const name = new Literal("name", "John");
const context = new Context("name-context", name);
const result = context.exec("John");
// result.ast.value will be "John"
```
### Pattern Reference
```typescript
import { Reference, Literal, Sequence } from "clarity-pattern-parser";
const name = new Literal("name", "John");
const reference = new Reference("name-ref", name);
const pattern = new Sequence("pattern", [reference]);
const result = pattern.exec("John");
// result.ast.value will be "John"
```
### Pattern Execution
Pattern execution returns a `ParseResult` that includes the AST and any error information:
```typescript
const result = pattern.exec("some text");
if (result.error) {
  console.error(result.error.message);
  console.error(result.error.expected);
  console.error(result.error.position);
} else {
  console.log(result.ast?.value);
}
```
### AST Manipulation
The AST (Abstract Syntax Tree) returned by pattern execution can be manipulated:
```typescript
const result = pattern.exec("some text");
if (result.ast) {
  // Find all nodes with a specific name
  const nodes = result.ast.findAll(n => n.name === "space");
  // Remove nodes
  nodes.forEach(n => n.remove());
  // Get the final value
  const value = result.ast.value;
}
```
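The effect of removing nodes on `value` can be pictured with a toy tree. The `ToyNode` class below is hypothetical and far simpler than the library's node type. It only assumes, as the examples in this README suggest, that a parent's `value` is the concatenation of its remaining leaves.

```typescript
// Illustrative only: a toy tree, not the library's node class.
class ToyNode {
  parentNode: ToyNode | null = null;
  readonly children: ToyNode[] = [];

  constructor(readonly name: string, private readonly text: string = "", children: ToyNode[] = []) {
    for (const child of children) {
      child.parentNode = this;
      this.children.push(child);
    }
  }

  // Leaves return their own text; parents concatenate whatever children remain.
  get value(): string {
    return this.children.length === 0 ? this.text : this.children.map(c => c.value).join("");
  }

  // Depth-first search that includes this node.
  findAll(predicate: (node: ToyNode) => boolean): ToyNode[] {
    const matches: ToyNode[] = predicate(this) ? [this] : [];
    for (const child of this.children) {
      matches.push(...child.findAll(predicate));
    }
    return matches;
  }

  // Detach this node from its parent.
  remove(): void {
    if (this.parentNode === null) return;
    const siblings = this.parentNode.children;
    siblings.splice(siblings.indexOf(this), 1);
    this.parentNode = null;
  }
}

const toyFullName = new ToyNode("full-name", "", [
  new ToyNode("first-name", "John"),
  new ToyNode("space", " "),
  new ToyNode("last-name", "Doe"),
]);

toyFullName.findAll(n => n.name === "space").forEach(n => n.remove());
console.log(toyFullName.value); // "JohnDoe"
```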
## Advanced Topics
### Custom Patterns
You can create custom patterns by extending the base `Pattern` class:
```typescript
import { Pattern } from "clarity-pattern-parser";
class CustomPattern extends Pattern {
  constructor(name: string) {
    super(name);
  }

  exec(text: string) {
    // Custom pattern implementation
  }
}
```
### Performance Tips
1. Use `test()` instead of `exec()` when you only need to check if a pattern matches
2. Cache frequently used patterns
3. Use `Reference` for recursive patterns instead of direct recursion
4. Minimize the use of optional patterns in sequences
5. Use bounded repetition when possible
### Debugging
Enable debug mode to get detailed information about pattern execution:
```typescript
const result = pattern.exec("some text", true);
// Debug information will be available in result.debug
```
### Error Handling
Pattern execution returns a `ParseResult` that includes error information:
```typescript
const result = pattern.exec("invalid text");
if (result.error) {
  console.error(result.error.message);
  console.error(result.error.expected);
  console.error(result.error.position);
}
```
## Examples
### JSON Parser
```typescript
import { patterns } from "clarity-pattern-parser";
const { json } = patterns`
# Basic JSON grammar
ws = /\s+/
string = /"[^"]*"/
number = /-?\d+(\.\d+)?/
boolean = "true" | "false"
null = "null"
value = string | number | boolean | null | array | object
array-items = (value, /\s*,\s*/)+
array = "[" + ws? + array-items? + ws? + "]"
object-property = string + ws? + ":" + ws? + value
object-properties = (object-property, /\s*,\s*/ trim)+
object = "{" + ws? + object-properties? + ws? + "}"
json = ws? + value + ws?
`;
```
### HTML Parser
```typescript
import { patterns } from "clarity-pattern-parser";
const { html } = patterns`
# Basic HTML grammar
ws = /\s+/
tag-name = /[a-zA-Z_-]+[a-zA-Z0-9_-]*/
attribute-name = /[a-zA-Z_-]+[a-zA-Z0-9_-]*/
attribute-value = /"[^"]*"/
value-attribute = attribute-name + "=" + attribute-value
bool-attribute = attribute-name
attribute = value-attribute | bool-attribute
attributes = (attribute, ws)*
opening-tag = "<" + ws? + tag-name + ws? + attributes? + ">"
closing-tag = "</" + ws? + tag-name + ws? + ">"
text = /[^<]+/
child = text | element
children = (child, /\s*/)+
element = opening-tag + children? + closing-tag
html = ws? + element + ws?
`;
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.