codepoints
Version:
A parser for files in the Unicode database
74 lines (57 loc) • 3.01 kB
Markdown
# codepoints
A parser for files in the Unicode database. Produces a giant array of codepoint objects for
every character represented by Unicode, with many properties derived from files in the Unicode
database.
**BUILD SCRIPTS ONLY**: Use in production is not recommended
as the parsers are not optimized for speed, the text files are huge, and the resulting array uses a
huge amount of memory. To access this data in real world applications, use modules that have
precompiled the data into a compressed form:
* [unicode-properties](https://github.com/devongovett/unicode-properties)
## Installation
Install using npm:
npm install codepoints
## Usage
Basic usage:
```js
codepoints = require('codepoints');
```
The parser generates data by reading the text files contained in the
[Unicode Character Database](http://unicode.org/ucd/). By default, it will use the database
bundled with this package. To use a custom version of UCD, use `codepoints/parser` instead,
which accepts an optional path to a directory containing the uncompressed UCD data:
```js
parser = require('codepoints/parser');
codepoints = parser('/path/to/UCD');
```
## Codepoint data
Each element in the generated array is either `undefined` (for unassigned code
points), or an object containing the following properties:
* `code` - the code point index
* `name` - character name
* `unicode1Name` - legacy name used by Unicode 1
* `category` - Unicode category
* `block` - the block name this character is a part of
* `script` - the script this character belongs to
* `eastAsianWidth` - the east asian width for this character
* `combiningClass` - numeric combining class value
* `combiningClassName` - a string name for the combining class
* `bidiClass` - class for the Unicode bidirectional algorithm
* `bidiMirrored` - whether the character is mirrored in the bidi algorithm
* `numeric` - the numeric value for this character
* `uppercase` - an array of code points mapping this character to upper case, if any
* `lowercase` - an array of code points mapping this character to lower case, if any
* `titlecase` - an array of code points mapping this character to title case, if any
* `folded` - an array of code points mapping this character to a folded equivalent, if any
* `caseConditions` - conditions used during case mapping for this character
* `decomposition` - an array of code points that this character decomposes into. Used by the Unicode normalization algorithm.
* `compositions` - a dictionary mapping of compositions for this character
* `isCompat` - whether the decomposition is a compatibility one
* `isExcluded` - whether the character is excluded from composition
* `NFC_QC` - quickcheck value for NFC (0 = YES, 1 = NO, 2 = MAYBE)
* `NFKC_QC` - quickcheck value for NFKC (0 = YES, 1 = NO, 2 = MAYBE)
* `NFD_QC` - quickcheck value for NFD (0 = YES, 1 = NO)
* `NFKD_QC` - quickcheck value for NFKD (0 = YES, 1 = NO)
* `joiningType` - arabic joining type
* `joiningGroup` - arabic joining group
## License
MIT