wcwidth-o1
Version:
Unicode text width calculation (wcwidth/wcswidth) for Node.js and TypeScript, optimized to O(1), based on Markus Kuhn's reference.
151 lines (99 loc) • 3.65 kB
Markdown
# Wcwidth-O1
[](https://www.npmjs.com/package/wcwidth-o1)
A TypeScript/JavaScript port of Markus Kuhn’s **wcwidth** and **wcswidth**
implementations, optimized to O(1).
These functions are defined in IEEE Std 1003.1 (POSIX).
### Superior Performance
- ⚡️ Instant *O*(1) lookup time
- 🌏 Full Unicode 17.0 coverage
### References:
- [OpenGroup wcwidth()](http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html)
- [OpenGroup wcswidth()](http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html)
## Getting Started
Install Wcwidth-O1 via npm:
```bash
npm i wcwidth-o1
```
## Usage
<h3>JavaScript / TypeScript:</h3>
```ts
import wcwidth from 'wcwidth-o1';
const example1 = wcwidth('a'); // 1
const example2 = wcwidth('好'); // 2
const example3 = wcwidth('😊'); // 2
```
or
```ts
import { wcwidth, wcswidth, wcswidthCjk } from 'wcwidth-o1';
const example = wcwidth('a'); // 1
const example1 = wcswidth('hi'); // 2
const example2 = wcswidth('안녕하세요'); // 10
const example3 = wcswidth('😊こんにちは'); // 12
const example4 = wcswidthCjk('°C'); // 3
```
### Function Parameters:
**wcwidth()**, **wcwidthCjk()**:
- `char`: A single-character string to measure.
**wcswidth()**, **wcswidthCjk()**:
- `str`: Input string to evaluate.
- `n`: Max characters to process (defaults to full length).
## Updating Unicode Data
When a new Unicode version is released, the `ambiguous` and `combining` lookup bitsets must be regenerated.
### 1. Generate new bitsets
```bash
npx tsx unicode/generateBitset.ts
```
This will produce updated `ambiguous.ts` and `combining.ts` files under `unicode/`.
### 2. Optimize repetitive masks
Example of unoptimized output:
```ts
export const ambiguous = (idx: number): number => ambiguousMap[idx] ?? 0;
const map: Record<number, number> = {
1: 2947937,
2: -1,
3: -1,
4: -1,
... // assume 5-99 are -1
100: -1,
101: 394231
};
```
Optimized version:
```ts
export const ambiguous = (idx: number): number =>
2 <= idx && idx <= 100 ? -1 : (ambiguousMap[idx] ?? 0);
// example
const map: Record<number, number> = {
1: 2947937,
101: 394231
};
```
### 3. Replace files
Copy the optimized `ambiguous.ts` and `combining.ts` into `src/`.
The Unicode update is then complete.
## Behind Wcwidth
In fixed-width terminals, most Latin characters take up one column, while East
Asian (CJK) ideographs usually take up two. The challenge is deciding how many
“cells” each Unicode character should occupy so that text aligns correctly.
The Unicode standard defines width classes:
- **Wide (W)** and **Fullwidth (F)** - always 2 columns
- **Halfwidth (H)** and **Narrow (Na)** - always 1 column
- **Ambiguous (A)** - 1 column normally, but 2 in CJK compatibility mode
- **Neutral (N)** - treated as 1 column here for simplicity
Other rules include:
- `U+0000` (null) - width 0
- Control characters - `-1`
- Combining marks - width 0
- Soft hyphen (`U+00AD`) - width 1
- Zero width space (`U+200B`) - width 0
This logic originates from Markus Kuhn’s reference implementation and is widely
used in terminal emulators to ensure consistent alignment.
See [Unicode TR#11](http://www.unicode.org/unicode/reports/tr11/) for more details.
## Feedback
Found something odd?
Feel free to [open an issue](https://github.com/dawsonhuang0/Wcwidth-O1/issues).
## Acknowledgments
- Original [implementation](http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c) by Markus Kuhn.
## License
Distributed under the MIT License.
See [`LICENSE`](LICENSE) for more information.