# Parsil

[![Build Status](https://github.com/salty-max/parsil/workflows/CI/badge.svg)](https://github.com/salty-max/parsil/actions) [![npm Version](https://img.shields.io/npm/v/parsil.svg?style=flat-square)](https://www.npmjs.com/package/parsil) [![License](https://img.shields.io/npm/l/parsil.svg?style=flat-square)](https://github.com/salty-max/parsil/blob/main/LICENSE)

A lightweight parser‑combinator library for JavaScript/TypeScript. Compose small, pure parsers into powerful language parsers that run in **Node**, **Bun**, and modern **browsers**.

---

## Key features

- **Combinators** for building complex grammars from tiny pieces
- Great **TypeScript** inference
- **UTF‑8 aware** character parsers
- **Source positions**: capture current offset (`index`) and attach **spans** with `parser.withSpan()` / `parser.spanMap()`
- Works on **string** and **binary** inputs (`TypedArray`/`ArrayBuffer`/`DataView`)
- Helpful error messages and ergonomics (`run`, `fork`, `map`, `chain`, `errorMap`)

---

## Install

```bash
# npm
npm i parsil

# bun
bun add parsil
```

> **ESM‑only** as of v2.0.0.
> If you use CommonJS, dynamically import:
>
> ```js
> // Inside an async function: CommonJS has no top-level await
> const P = await import('parsil')
> ```

---

## Quick start

```ts
import * as P from 'parsil'
// or: import P from 'parsil'; // default namespace export

// Parse one or more letters, then digits
const wordThenNumber = P.sequenceOf([P.letters, P.digits])

const ok = wordThenNumber.run('hello123')
// { isError: false, result: ['hello', '123'], index: 8 }

const fail = wordThenNumber.run('123')
// { isError: true, error: "ParseError ...", index: 0 }
```

### Binary example: IPv4 header (excerpt)

```ts
import * as P from 'parsil'

const tag = (type: string) => (value: unknown) => ({ type, value })

const packetHeader = P.sequenceOf([
  P.uint(4).map(tag('Version')),
  P.uint(4).map(tag('IHL')),
  P.uint(6).map(tag('DSCP')),
  P.uint(2).map(tag('ECN')),
  P.uint(16).map(tag('Total Length')),
])

// run against a DataView/ArrayBuffer
```

---

## Breaking changes in **v2.0.0**

- **ESM‑only** distribution. The CommonJS entry has been removed. Use `import` (or dynamic import in CJS).
- **Engines**: Node **≥ 20** (Bun ≥ 1.1).
- **Character parsers** (`anyChar`, `anyCharExcept`, etc.) return **string** values (not code points) and have updated TS types.

---

## API (overview)

Parsil exposes a `Parser<T>` type and a set of combinators. Everything below is available as a **named export** and also through the **default namespace**.

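
To make the combinator model concrete, the sketch below builds a toy parser type with `run`, `map`, and a sequence combinator. It is an illustration only and is not Parsil's implementation: `ToyParser`, `ToyState`, `toyRegex`, and `toySequenceOf` are hypothetical names chosen for this example, and it tracks JavaScript string indices rather than byte offsets.

```typescript
// Toy model only: NOT Parsil's source. Every name here is hypothetical.
type ToyState<T> =
  | { isError: false; result: T; index: number }
  | { isError: true; error: string; index: number }

class ToyParser<T> {
  // A parser is just a function from (input, start index) to a new state.
  step: (input: string, index: number) => ToyState<T>

  constructor(step: (input: string, index: number) => ToyState<T>) {
    this.step = step
  }

  run(input: string): ToyState<T> {
    return this.step(input, 0)
  }

  // Transform a successful result; errors pass through unchanged.
  map<U>(fn: (value: T) => U): ToyParser<U> {
    return new ToyParser<U>((input, index) => {
      const s = this.step(input, index)
      if (s.isError) return s
      return { isError: false, result: fn(s.result), index: s.index }
    })
  }
}

// Match a regex at the current position; `re` must start with ^.
const toyRegex = (re: RegExp, label: string): ToyParser<string> =>
  new ToyParser<string>((input, index) => {
    const m = input.slice(index).match(re)
    if (m === null) {
      return { isError: true, error: `expected ${label} at index ${index}`, index }
    }
    return { isError: false, result: m[0], index: index + m[0].length }
  })

// Run parsers in order, threading the index; stop at the first failure.
const toySequenceOf = (parsers: ToyParser<string>[]): ToyParser<string[]> =>
  new ToyParser<string[]>((input, index) => {
    const results: string[] = []
    let at = index
    for (const p of parsers) {
      const s = p.step(input, at)
      if (s.isError) return s
      results.push(s.result)
      at = s.index
    }
    return { isError: false, result: results, index: at }
  })

const toyLetters = toyRegex(/^[A-Za-z]+/, 'letters')
const toyDigits = toyRegex(/^[0-9]+/, 'digits')

const toyWordThenNumber = toySequenceOf([toyLetters, toyDigits]).map(
  ([word, num]) => ({ word, num: Number(num) })
)

const toyOk = toyWordThenNumber.run('hello123')
// → { isError: false, result: { word: 'hello', num: 123 }, index: 8 }

const toyFailed = toyWordThenNumber.run('123')
// → { isError: true, error: 'expected letters at index 0', index: 0 }

console.log(toyOk, toyFailed)
```

Each parser receives the index where the previous one stopped, which is why a sequence collects results in order and a failure reports the position where matching stopped.
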
### Methods on `Parser<T>`

- **`.run(input)`** → `{ isError, result?, error?, index }`
- **`.fork(input, onError, onSuccess)`** → call either callback
- **`.map<U>(fn: (value: T) => U)`** → `Parser<U>`
- **`.chain<U>(fn: (value: T) => Parser<U>)`** → `Parser<U>`
- **`.errorMap(fn)`** → map error details
- **`.skip<U>(other: Parser<U>)`** → `Parser<T>`
- **`.then<U>(other: Parser<U>)`** → `Parser<U>`
- **`.between<L, R>(left: Parser<L>, right: Parser<R>)`** → `Parser<T>`
- **`.lookahead()`** → peek without consuming
- **`.withSpan()`** → `Parser<{ value: T; start: number; end: number }>` (returns value + byte offsets consumed)
- **`.spanMap(fn)`** → map `(value, { start, end })` to your own node shape

### Core primitives

- **`str(text)`** – match a string
- **`char(c)`** – match a single UTF‑8 char exactly
- **`regex(re)`** – match via JS RegExp (anchored at current position)
- **`digit`/`digits`**, **`letter`/`letters`**, **`whitespace`/`optionalWhitespace`**
- **`anyChar`**, **`anyCharExcept(p)`**
- **`index`** – current byte offset (non‑consuming)

### Combinators

- **`sequenceOf([p1, p2, ...])`** – run in order, collect results
- **`choice([p1, p2, ...])`** – try in order until one succeeds
- **`many(p)`** / **`manyOne(p)`** – zero or more / one or more
- **`exactly(n)(p)`** – repeat parser `n` times
- **`between(left, right)(value)`** – parse `value` between `left` and `right`
- **`sepBy(sep)(value)`** / **`sepByOne(sep)(value)`** – separated lists
- **`possibly(p)`** – optional (returns `null` when absent)
- **`lookAhead(p)`**, **`peek`**, **`startOfInput`**, **`endOfInput`**
- **`recursive(thunk)`** – define mutually recursive parsers
- **`succeed(x)`** / **`fail(msg)`** – constant success/failure

### Binary helpers

- **`uint(n)`** – read the next **n bits** as an unsigned integer
- **`int(n)`** – read the next **n bits** as a signed integer
- Utilities: `getString`, `getUtf8Char`, `getNextCharWidth`, `getCharacterLength`

> Full examples live in the [`examples/`](./examples)
> directory: simple expression parser, IPv4 header, etc.

---

## Error handling

Use `.fork` if you want callbacks instead of returned objects:

```ts
P.str('hello').fork(
  'hello',
  (error, state) => console.error(error, state),
  (result, state) => console.log(result, state)
)
```

---

## Source positions & spans

Parsil exposes a non‑consuming `index` parser and span helpers on every parser instance:

```ts
import * as P from 'parsil'

// Read current offset
const at = P.index.run('hello') // { result: 0, index: 0 }

// Attach start/end byte offsets to any parser
const greet = P.str('hello').withSpan()
// greet.run('hello!') → { result: { value: 'hello', start: 0, end: 5 }, index: 5 }

// Map value + span to your own node shape (e.g., for AST tooling)
const node = P.str('XY').spanMap((value, loc) => ({ kind: 'tok', value, loc }))
// node.run('XY!') → { result: { kind: 'tok', value: 'XY', loc: { start: 0, end: 2 } }, index: 2 }
```

Offsets are byte‑based; editors like VS Code/CodeMirror can convert to line/column.

---

## Contributing

- Run tests: `bun test`
- Lint: `bun run lint`
- Build: `bun run build`

PRs welcome! Please add tests for new combinators.

---

## License

MIT © [Maxime Blanc](https://github.com/salty-max)

---

## Changelog

Further changes are listed in [CHANGELOG.md](./CHANGELOG.md).

### v2.0.0 (BREAKING)

- **ESM-only** distribution. CommonJS entry removed. Use `import` (or dynamic `import()` in CJS).
- **Engines**: Node **≥ 20**, Bun **≥ 1.1**.
- **Character parsers** (`anyChar`, `anyCharExcept`, etc.) now return **string** values; types updated accordingly.
- Build & DX: moved to **Bun** for tests/build; CI updated; tests relocated out of `src/`.

### v1.6.0

- New parsers: [`everythingUntil`](#everythinguntil), [`everyCharUntil`](#everycharuntil).

### v1.5.0

- New parser: [`anyCharExcept`](#anycharexcept).

### v1.4.0

- New parsers: [`lookAhead`](#lookahead), [`startOfInput`](#startofinput), [`endOfInput`](#endofinput).

### v1.3.0

- Improved type inference in `choice`, `sequenceOf`, and `exactly` using TS variadic tuple types.

### v1.2.0

- New parsers: [`exactly`](#exactly), [`peek`](#peek).

### v1.1.0

- New parsers: [`coroutine`](#coroutine), [`digit`](#digit), [`letter`](#letter), [`possibly`](#possibly), [`optionalWhitespace`](#optionalwhitespace), [`whitespace`](#whitespace).