unicode-substring
Version:
Unicode-aware substring
22 lines (13 loc) • 1.21 kB
Markdown
for JavaScript. Surrogate pairs are counted as a single character.
Characters in JavaScript strings are exposed as 16-bit code points, also known as UCS-2 encoding. This usually good enough, but since there are more than 2^16 characters in Unicode, 16 bits is not enough to represent all characters. To overcome this limitation, characters with scalar value over `0x10FFFF` need to be encoded as surrogate pairs. This encoding is known as UTF-16.
The purpose of this library is to treat surrogate pairs as one character when extracting substrings from a string. This might be preferable if indices are returned from an Unicode-compatible environment.
```javascript
var unicodeSubstring = require('unicode-substring')
// unicodeSubstring(string, start, end)
unicodeSubstring("💥Emoji Rule💥", 0, 6)
// => "💥Emoji"
```
The `start` and `end` parameters behave similarly as [String.prototype.substring](https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/substring).
Unicode-aware substring