# qb-utf8-ez
[![npm][npm-image]][npm-url]
[![downloads][downloads-image]][npm-url]
[![bitHound Dependencies][proddep-image]][proddep-link]
[![dev dependencies][devdep-image]][devdep-link]
[![code analysis][code-image]][code-link]
[npm-image]: https://img.shields.io/npm/v/qb-utf8-ez.svg
[downloads-image]: https://img.shields.io/npm/dm/qb-utf8-ez.svg
[npm-url]: https://npmjs.org/package/qb-utf8-ez
[proddep-image]: https://www.bithound.io/github/quicbit-js/qb-utf8-ez/badges/dependencies.svg
[proddep-link]: https://www.bithound.io/github/quicbit-js/qb-utf8-ez/master/dependencies/npm
[devdep-image]: https://www.bithound.io/github/quicbit-js/qb-utf8-ez/badges/devDependencies.svg
[devdep-link]: https://www.bithound.io/github/quicbit-js/qb-utf8-ez/master/dependencies/npm
[code-image]: https://www.bithound.io/github/quicbit-js/qb-utf8-ez/badges/code.svg
[code-link]: https://www.bithound.io/github/quicbit-js/qb-utf8-ez
Easy-to-use UTF-8 encoding and decoding that works in all browsers (except ancient ones, before IE 5.5).
Based on the small implementations qb-utf8-to-str-tiny and qb-utf8-from-str-tiny, which are
well suited to small encoding and decoding jobs but not fast for very large files.
**Complies with the 100% test coverage and minimum dependency requirements** of
[qb-standard](http://github.com/quicbit-js/qb-standard).
# Install
    npm install qb-utf8-ez
# API Update 2.x -> 3.x
Functions that take array-like parameters and ranges have been updated to work with
terms defined in the [glossary](https://github.com/quicbit-js/qb-standard/blob/master/doc/variable-glossary.md).
Namely, functions of the form
* **function ( buf, beg, end )**
* **function ( buf, {beg:0, end:10} )**
have been updated to
* **function ( [src][src-link], [off][off-link], [lim][lim-link] )**
* **function ( [src][src-link], {[off][off-link]:0, [lim][lim-link]:10 } )**
[src-link]: https://github.com/quicbit-js/qb-standard/blob/master/doc/variable-glossary.md#src-source
[off-link]: https://github.com/quicbit-js/qb-standard/blob/master/doc/variable-glossary.md#off-offset
[lim-link]: https://github.com/quicbit-js/qb-standard/blob/master/doc/variable-glossary.md#lim-limit
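The two call styles above can be sketched as follows. This is only an illustration of the range convention, assuming `off` is the first index included and `lim` is the index to stop before; `byteRange` is a hypothetical helper, not a function of qb-utf8-ez, and the defaults it applies are assumptions.

```javascript
// Hypothetical helper (not part of qb-utf8-ez): accepts either call style,
// (src, off, lim) or (src, {off: 0, lim: 10}), and returns the selected bytes.
function byteRange (src, off, lim) {
  if (off !== null && typeof off === 'object') {
    // Options-object style: pull lim first, then overwrite off.
    lim = off.lim;
    off = off.off;
  }
  // Assumed defaults: select the whole source.
  if (off == null) { off = 0; }
  if (lim == null) { lim = src.length; }
  var ret = [];
  for (var i = off; i < lim; i++) { ret.push(src[i]); }
  return ret;
}

var rangeSrc = [104, 101, 108, 108, 111];
var rangeA = byteRange(rangeSrc, 1, 3);
var rangeB = byteRange(rangeSrc, { off: 1, lim: 3 });
console.log(rangeA, rangeB); // both select indexes 1 and 2: [ 101, 108 ]
```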
# Usage

    var utf8 = require('qb-utf8-ez');
    var buf = utf8.buffer('hello. 你好');
    console.log(buf);

Prints an array of UTF-8 encoded bytes:

> [ 104, 101, 108, 108, 111, 46, 32, 228, 189, 160, 229, 165, 189 ]

and

    var s = utf8.string(buf);
    console.log(s);

Prints:

> hello. you 你好
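
As a cross-check of the byte values shown above, the same round trip can be sketched with the standard `TextEncoder` and `TextDecoder` APIs. This does not use qb-utf8-ez and assumes a runtime that provides those globals (modern browsers and recent Node.js).

```javascript
// Cross-check with the standard encoding APIs, not with qb-utf8-ez itself.
var checkBytes = Array.from(new TextEncoder().encode('hello. 你好'));
console.log(checkBytes);
// [ 104, 101, 108, 108, 111, 46, 32, 228, 189, 160, 229, 165, 189 ]

var checkText = new TextDecoder('utf-8').decode(new Uint8Array(checkBytes));
console.log(checkText); // hello. 你好
```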
## buffer(value, options)
Same as <code>utf8()</code> (old function name).
## utf8(value, options)
Return an array or buffer of UTF-8 encoded bytes for the given <code>value</code>.

options

* **ret_type**: (string) 'array', 'buffer', or 'uint8array' - the type to create and return.
* **fill_to_length**: (integer) if set, an array of the given length is returned,
    filled with encoded values copied from <code>value</code>.
    Invalid truncated encodings are replaced with the fill_byte.
* **fill_byte**: (integer or string) ASCII code or single-character string used, if needed, to
    fill the end of the buffer to prevent truncated UTF-8.

For convenience, <code>value</code> may be:

* a Unicode code point, such as 0x10400 '𐐀'
* an array of code points
* a string

In every case, <code>buffer(value)</code> returns an array.
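
To illustrate what encoding a code point means, here is a minimal sketch of the standard UTF-8 bit layout. `encodeCodePoint` is a hypothetical helper written for this example, not the library's implementation, and it does not reject surrogates or out-of-range values.

```javascript
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// No validation is done (surrogates and values above 0x10FFFF are not rejected).
function encodeCodePoint (cp) {
  if (cp < 0x80) { return [cp]; }
  if (cp < 0x800) { return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]; }
  if (cp < 0x10000) {
    return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)];
  }
  return [
    0xF0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3F),
    0x80 | ((cp >> 6) & 0x3F),
    0x80 | (cp & 0x3F)
  ];
}

// A single code point, as in the 0x10400 example above.
var oneCodePoint = encodeCodePoint(0x10400);
console.log(oneCodePoint); // [ 240, 144, 144, 128 ]

// An array of code points: 'h' (0x68) followed by '你' (0x4F60).
var manyCodePoints = [0x68, 0x4F60].reduce(function (acc, cp) {
  return acc.concat(encodeCodePoint(cp));
}, []);
console.log(manyCodePoints); // [ 104, 228, 189, 160 ]
```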
## string([src][src-link], options)
Convert an array-like object (buffer) to a JavaScript string.
options
* **[off][off-link]**: index to start at
* **[lim][lim-link]**: index to stop before
* **escape**: string expression, single ASCII character, or integer (default is '?').
    If an ASCII integer or single-character string, illegal bytes are replaced 1-for-1 by this value.
    If an expression of the form "!{%H}", each run of illegal bytes is prefixed
    with the value before %H (here '!{'), suffixed with the value after %H (here '}'),
    and written as ASCII hex between the two.
string() makes use of [qb-utf8-illegal-bytes](https://github.com/quicbit-js/qb-utf8-illegal-bytes)
to automatically detect and escape illegal UTF-8 encodings. The default decoding behavior
is to replace illegal values with '?':

    var utf8 = require('qb-utf8-ez');
    utf8.string([... some buffer with three illegal bytes *** here *** then ok again.. ]);

> ... some buffer with illegal characters ??? then ok again..

Another option is to use an <code>escape</code> expression to substitute bad bytes in situ, keeping
all other buffer contents in place.

    utf8.string([...], { escape: '!{%H}' });

> ... some buffer with illegal characters !{F09082} then ok again..
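
The "!{%H}" form described under the escape option can be sketched as a small formatter. `escapeRun` is a hypothetical helper for illustration only; how the library itself parses the expression is not shown here, and the sketch assumes the expression contains exactly one %H.

```javascript
// Hypothetical helper: format one run of illegal bytes with an escape
// expression such as '!{%H}'. Assumes the expression contains one '%H'.
function escapeRun (bytes, expr) {
  var parts = expr.split('%H'); // [ prefix, suffix ]
  var hex = bytes.map(function (b) {
    // Two uppercase hex digits per byte.
    return (b < 16 ? '0' : '') + b.toString(16).toUpperCase();
  }).join('');
  return parts[0] + hex + parts[1];
}

// F0 90 82 is a truncated four-byte sequence, so all three bytes are illegal.
var escapedRun = escapeRun([0xF0, 0x90, 0x82], '!{%H}');
console.log(escapedRun); // !{F09082}
```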
## compare( [src1][src-link], [off1][off-link], [lim1][lim-link], [src2][src-link], [off2][off-link], [lim2][lim-link] )
Compare code points of two byte ranges holding UTF-8 encoded data. The function works similarly
to a sort comparator in JavaScript.

Returns
* **1** if src1 selection is greater
* **-1** if src2 selection is greater
* **0** if selections are equal
(compare is also available as a
[separate package with zero dependencies](https://github.com/quicbit-js/qb-utf8-compare))
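
A minimal sketch of such a comparator follows. It relies on the fact that, for valid UTF-8, byte-wise order matches code point order. `compareBytes` is a hypothetical function, not the qb-utf8-compare implementation, and its treatment of a range that is a prefix of the other is an assumption.

```javascript
// Hypothetical sketch, not the qb-utf8-compare implementation.
// For valid UTF-8, comparing bytes in order gives code point order.
function compareBytes (src1, off1, lim1, src2, off2, lim2) {
  var len1 = lim1 - off1;
  var len2 = lim2 - off2;
  var n = Math.min(len1, len2);
  for (var i = 0; i < n; i++) {
    var d = src1[off1 + i] - src2[off2 + i];
    if (d !== 0) { return d > 0 ? 1 : -1; }
  }
  // Assumption: when one range is a prefix of the other, the longer is greater.
  if (len1 === len2) { return 0; }
  return len1 > len2 ? 1 : -1;
}

var cmpEnc = new TextEncoder();
var cmpA = cmpEnc.encode('abc');
var cmpB = cmpEnc.encode('abd');
var cmpE = cmpEnc.encode('é'); // bytes C3 A9
var cmpZ = cmpEnc.encode('z'); // byte 7A

console.log(compareBytes(cmpA, 0, 3, cmpB, 0, 3)); // -1
console.log(compareBytes(cmpE, 0, 2, cmpZ, 0, 1)); // 1
console.log(compareBytes(cmpA, 0, 2, cmpB, 0, 2)); // 0 ('ab' vs 'ab')
```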
## fill(dst, sample, options)
Fill a buffer using a smaller sample, which may be a string or array-like object.

options

* **off**: index to start at
* **lim**: index to stop before ( < lim )
* **escape**: handling for illegal bytes (same as string(), above) (default is '?')
## join(buffers, joinbuf)
Like <code>Array.prototype.join()</code>, but for arrays/buffers of bytes: joins <code>buffers</code>
into one buffer with <code>joinbuf</code> as a separator between each.
<code>buffers</code> can be an array of array-like objects with byte/integer values.
<code>joinbuf</code> can be any value accepted
by the <code>buffer()</code> function, such as a string or an array of code points.
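
The joining behavior can be sketched like this. `joinBytes` is a hypothetical stand-in for illustration, not the library's code, and it assumes the separator has already been converted to bytes (the library also accepts strings and code points for <code>joinbuf</code>).

```javascript
// Hypothetical sketch: join array-like byte buffers with a byte separator.
// Assumes joinbuf is already an array-like of bytes.
function joinBytes (buffers, joinbuf) {
  var ret = [];
  for (var i = 0; i < buffers.length; i++) {
    if (i > 0) {
      // Separator goes between buffers, never before the first or after the last.
      for (var j = 0; j < joinbuf.length; j++) { ret.push(joinbuf[j]); }
    }
    for (var k = 0; k < buffers[i].length; k++) { ret.push(buffers[i][k]); }
  }
  return ret;
}

// 'hi' and 'yo' joined with ', '
var joinedBytes = joinBytes([[104, 105], [121, 111]], [44, 32]);
console.log(joinedBytes); // [ 104, 105, 44, 32, 121, 111 ]
console.log(new TextDecoder('utf-8').decode(new Uint8Array(joinedBytes))); // hi, yo
```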
## escape_illegal([src][src-link], opt)
Return a buffer with illegal characters replaced. If a single character or number escape is given,
the buffer will be changed in place and returned. If an escape expression is given, a new
(longer) buffer will be returned copied from the old with the escaped areas.
Options <code>escape</code>, <code>off</code>, and <code>lim</code> work as they
do with <code>string()</code>, above.
## illegal_bytes([src][src-link], [off][off-link], [lim][lim-link])
Return ranges of illegal UTF-8 encoding. See [qb-utf8-illegal-bytes](https://github.com/quicbit-js/qb-utf8-illegal-bytes#illegal_bytessrc-off-lim)