# APG Unicode Parser
Parsers created with [`apg-js`](https://github.com/ldthomas/apg-js) and [`apg-lite`](https://github.com/ldthomas/apg-lite) operate on arrays of non-negative integers, typically character codes. The `apg-unicode` variant extends this by also accepting **typed arrays**, enabling more memory-efficient parsing in modern JavaScript environments.
> **Note:** `apg-unicode` does not decode Unicode natively. Unicode handling must be implemented through the SABNF grammar and application logic; typed arrays and conversion utilities simplify this. See `./examples/unicode` for an illustration of parsing UTF-8 and UTF-16 input directly, without first transforming it to code points.
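
When the input is a `Uint8Array` of raw UTF-8 bytes, the grammar (or its callbacks) has to express the byte ranges that make a sequence well formed. RFC 3629, section 4, defines these ranges in ABNF. As a rough illustration only, the same rules are written out below in plain JavaScript. `decodeUtf8` is a hypothetical helper, not part of `apg-unicode` and not taken from `./examples/unicode`; in production code, `new TextDecoder('utf-8', { fatal: true })` performs the same validation.

```javascript
// Hypothetical helper (not part of apg-unicode): decode a Uint8Array of UTF-8
// bytes into code points, accepting only the well-formed byte sequences
// listed in RFC 3629, section 4. Throws a RangeError on ill-formed input.
function decodeUtf8(bytes) {
  const codePoints = [];
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i];
    if (lead <= 0x7f) {
      // UTF8-1: a single ASCII byte
      codePoints.push(lead);
      i += 1;
      continue;
    }
    let need; // number of continuation bytes after the lead byte
    let cp; // code point bits accumulated so far
    let lo = 0x80; // allowed range of the FIRST continuation byte
    let hi = 0xbf;
    if (lead >= 0xc2 && lead <= 0xdf) {
      need = 1; // UTF8-2 (lead bytes C0 and C1 would be overlong)
      cp = lead & 0x1f;
    } else if (lead >= 0xe0 && lead <= 0xef) {
      need = 2; // UTF8-3
      cp = lead & 0x0f;
      if (lead === 0xe0) lo = 0xa0; // reject overlong 3-byte forms
      if (lead === 0xed) hi = 0x9f; // reject surrogates U+D800..U+DFFF
    } else if (lead >= 0xf0 && lead <= 0xf4) {
      need = 3; // UTF8-4
      cp = lead & 0x07;
      if (lead === 0xf0) lo = 0x90; // reject overlong 4-byte forms
      if (lead === 0xf4) hi = 0x8f; // reject code points above U+10FFFF
    } else {
      throw new RangeError(`invalid lead byte at index ${i}`);
    }
    if (i + need >= bytes.length) {
      throw new RangeError(`truncated sequence at index ${i}`);
    }
    for (let k = 1; k <= need; k++) {
      const b = bytes[i + k];
      const min = k === 1 ? lo : 0x80;
      const max = k === 1 ? hi : 0xbf;
      if (b < min || b > max) {
        throw new RangeError(`invalid continuation byte at index ${i + k}`);
      }
      cp = (cp << 6) | (b & 0x3f); // append 6 payload bits
    }
    codePoints.push(cp);
    i += need + 1;
  }
  return codePoints;
}

console.log(decodeUtf8(new TextEncoder().encode('a€😀'))); // [ 97, 8364, 128512 ]
```

Each branch corresponds to one of the RFC's `UTF8-1` through `UTF8-4` alternatives; the narrowed `lo`/`hi` ranges for the first continuation byte are what exclude overlong encodings, surrogates, and values above U+10FFFF.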
## Key Features
### Typed Array Support
`apg-unicode` accepts the following input types:
- `Array`
- `Buffer`
- `Uint8Array`
- `Uint16Array`
- `Uint32Array`
- `String` (converted internally to `Uint32Array` of code points)
Using typed arrays, especially `Uint8Array`, can reduce memory usage by up to **75%** for large UTF-8 files: a `Uint8Array` stores one byte per element, whereas a `Uint32Array` of code points uses four.
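
The "up to" matters: the 75% figure is reached only for ASCII text, where each character is one UTF-8 byte. The sketch below, using only standard JavaScript (`TextEncoder`, typed arrays) rather than any `apg-unicode` function, measures the byte size of the same text stored three ways. The `sizes` helper is illustrative, not part of the library.

```javascript
// Return the byte sizes of one string stored as UTF-8 bytes, UTF-16 code units,
// and Unicode code points.
function sizes(text) {
  const utf8 = new TextEncoder().encode(text); // Uint8Array, 1 byte per element
  const utf16 = new Uint16Array(text.length); // 2 bytes per element
  for (let i = 0; i < text.length; i++) utf16[i] = text.charCodeAt(i);
  // Iterating a string yields whole code points, so this has one element per code point.
  const codePoints = Uint32Array.from(text, (ch) => ch.codePointAt(0)); // 4 bytes per element
  return { utf8: utf8.byteLength, utf16: utf16.byteLength, codePoints: codePoints.byteLength };
}

const ascii = sizes('GET /index.html HTTP/1.1\r\n'.repeat(1000));
console.log(ascii, 1 - ascii.utf8 / ascii.codePoints); // ASCII text: 0.75 saving
const euro = sizes('€'.repeat(1000));
console.log(euro, 1 - euro.utf8 / euro.codePoints); // 3-byte characters: 0.25 saving
```

For text dominated by multi-byte characters the saving shrinks (a 4-byte character saves nothing), so the benefit depends on the content being parsed.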
### Substring Parsing
Parse a substring of a large input in place, without slicing or reallocating the input array. This is useful when only part of a larger input needs to be parsed. See `./examples/substrings` for usage patterns.
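
The exact `apg-unicode` call for substring parsing is shown in `./examples/substrings` and is not reproduced here. The sketch below only illustrates the underlying idea in plain JavaScript: a matcher that receives the full array plus a start index and length reads the window in place. For comparison, `subarray()` creates a view over the same buffer (no byte copy), while `slice()` copies. `countLeadingDigits` is a hypothetical stand-in for a parser, not an `apg-unicode` function.

```javascript
// Hypothetical matcher: counts the leading ASCII digits in the window
// chars[begin .. begin + length) without copying the input.
function countLeadingDigits(chars, begin, length) {
  const end = Math.min(begin + length, chars.length);
  let i = begin;
  while (i < end && chars[i] >= 0x30 && chars[i] <= 0x39) i++;
  return i - begin;
}

const full = new TextEncoder().encode('id=12345;next');
// Option 1: pass the whole array with an offset and length (no new array).
const n1 = countLeadingDigits(full, 3, 5);
// Option 2: subarray() creates a new view object but shares the same ArrayBuffer.
const view = full.subarray(3, 8);
const n2 = countLeadingDigits(view, 0, view.length);
// By contrast, slice() copies the bytes into a new ArrayBuffer.
const copy = full.slice(3, 8);
console.log(n1, n2, view.buffer === full.buffer, copy.buffer === full.buffer); // 5 5 true false
```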
## Parser Generation
Like `apg-lite`, `apg-unicode` does **not** include a parser generator. Grammar objects are generated from SABNF grammar files, for example with the repository's `apg` script:
```bash
npm run apg -- -i ./examples/stats/sip.bnf -o ./examples/stats/sip
```
## GitHub Usage
Clone the repository, then run your application and the examples from the repository's root directory:
```bash
git clone https://github.com/ldthomas/apg-unicode.git
cd apg-unicode
```
Include the modules in an application with:
```javascript
import { Parser } from './src/parser.js';
import { Ast } from './src/ast.js';
import { Trace } from './src/tracer.js';
import { Stats } from './src/stats.js';
import { utilities } from './src/utilities.js';
import { identifiers } from './src/identifiers.js';
```
To run the examples, use:
| Command | Description |
| ---------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `node examples/ast/main` | Demonstrates Abstract Syntax Tree (AST) usage |
| `node examples/trace/main` | Traces the parser through the parse tree |
| `node examples/stats/main` | Collects and displays node hit statistics |
| `node examples/substrings/main` | Parses substrings within a full input string |
| `node examples/unicode/main` | Parses UTF-8 and UTF-16 directly without prior transformation to code points |
| Open `examples/web/web.html` in any browser | Illustrates running a parser in a web page. `web-app.js` is bundled from `app.js` with [esbuild](https://github.com/evanw/esbuild); generate it with `npm run esbuild`. |
## npm Usage
Install the package from the npm registry by running, in the application's root directory:
```bash
npm install apg-unicode
```
To access the modules in the application:
```javascript
import { Parser, Ast, Trace, Stats, utilities, identifiers } from 'apg-unicode';
```
## Documentation
The documentation is embedded in the source code in [docco](https://davidwalsh.name/javascript-documentation) format. To generate it, use:
```bash
npm run docco
```
The generated documentation will be at `./docs/index.html`. Alternatively, view it [here](https://sabnf.com/docs/apg-unicode/index.html) on the APG website.
## License
`apg-unicode` is licensed under the permissive [MIT](https://github.com/ldthomas/apg-unicode?tab=License-1-ov-file) license.