@jswalden/streaming-json
Streaming JSON parsing and stringification for JavaScript/TypeScript
# streaming-json
This package implements streaming versions of
[`JSON.parse`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse)
and
[`JSON.stringify`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify)
functionality. Read [the full API documentation](https://jswalden.github.io/streaming-json/)
or a high-level package overview below.
The operations in this package behave consistently with ECMAScript semantics,
but modifications to standard-library functionality can interfere with these
semantics. (And of course, user code that runs between `stringify` iteration
steps or between `add(fragment)` calls can perform actions that alter the
intermediate states dictated by ECMAScript semantics.)
## Stringification
This package implements a `stringify` function that returns an iterable iterator
over the fragments that constitute the JSON stringification of a value:
```js
import { stringify } from "@jswalden/streaming-json";

async function writeAsJSONToFileAsync(value, file) {
  for (const frag of stringify(value, null, " ")) {
    await file.write(frag);
  }
}
```
`stringify` is intended for cases where computing the entire JSON string at once
is undesirable, or impossible because the full stringification is too large to
represent as a JS string or to hold in memory. It accepts the same arguments as
`JSON.stringify` (albeit with narrower types, to make code clearer). It returns
an iterable iterator that yields successive fragments of the overall JSON
stringification.[^between-emits]
[^between-emits]: If the object graph being stringified is modified between calls to the
iterator's `next()` function, stringification behavior will change in
potentially unexpected ways. You should take care to protect your value being
stringified from modification during the stringification process to prevent
confusing behavior.
Where fragment boundaries are placed is explicitly not defined. Thus, for
example, `stringify(true, null, "")` might successively yield `"t"`, `"ru"`,
`"e"`, or it might instead simply yield `"true"`. Don't make semantically
visible distinctions based on where these boundaries occur!
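As a minimal sketch of boundary-agnostic consumption, the hypothetical helper `jsonLength` below (not part of this package) computes a property of the whole stringification that is the same however the text is split. The arrays are stand-ins for the iterator that `stringify` returns; any iterable of strings works.

```js
// Hypothetical helper, not part of this package: counts the UTF-16 code units
// of the full stringification without caring where fragment boundaries fall.
function jsonLength(fragments) {
  let length = 0;
  for (const frag of fragments) {
    length += frag.length;
  }
  return length;
}

// Stand-ins for two possible fragmentations of the same JSON text. A real
// caller would pass `stringify(true, null, "")` instead.
const split = ["t", "ru", "e"];
const whole = ["true"];

console.log(jsonLength(split) === jsonLength(whole)); // true
```

By contrast, code that compares an individual fragment against an expected string (for instance `frag === "true"`) depends on the boundaries and may break without notice.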
If any operation during iteration throws (e.g. property gets, `toJSON`
invocations, stray `bigint` values in the graph), the `next()` call that
triggers that operation will throw that value.
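One consequence is that fragments yielded before the throwing `next()` call have already reached the consumer. The sketch below illustrates this with a hypothetical `drain` helper and a stand-in generator that fails partway through; neither is part of this package, and the generator only imitates an iterator returned by `stringify`.

```js
// Hypothetical helper, not part of this package: passes each fragment to
// `sink` and reports how many fragments were delivered before any throw.
// Note that an exception thrown by `sink` itself is reported the same way.
function drain(fragments, sink) {
  let written = 0;
  try {
    for (const frag of fragments) {
      sink(frag);
      written++;
    }
    return { ok: true, written };
  } catch (error) {
    return { ok: false, written, error };
  }
}

// Stand-in for a `stringify` iterator whose third `next()` call throws,
// as might happen on reaching a stray bigint in the object graph.
function* failingFragments() {
  yield "[";
  yield "1";
  throw new TypeError("stand-in failure");
}

const output = [];
const result = drain(failingFragments(), (frag) => output.push(frag));
console.log(result.ok, result.written, output.join("")); // false 2 [1
```

The text already written (`[1` here) is not valid JSON on its own, so a caller that streams to a file or socket will usually want to discard or mark the partial output when iteration throws.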
As long as type signatures are respected, the stringification performed by
`stringify` is the same as that performed by
`JSON.stringify(value, replacer, space)`. However, one special case must be
noted: if `JSON.stringify` would return the literal value `undefined` and not a
string value[^stringify-not-string], the iterator returned by `stringify` will
produce no fragments:
```js
import assert from "node:assert";
import { stringify } from "@jswalden/streaming-json";

const value = () => 42;

let res = JSON.stringify(value, null, 2);
assert(res === undefined); // not a string value!

let frags = [...stringify(value, null, 2)];
assert(frags.length === 0);
```
[^stringify-not-string]: `JSON.stringify` returns `undefined` if the `value`
passed to it is `undefined`, a
[symbol](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol),
a callable object (i.e. `typeof value === "function"`), or an object whose
`toJSON` property is a function that returns one of these values. It also
returns `undefined` if a `replacer` function is supplied and if `replacer`, when
invoked for `value`, returns `undefined`, a symbol, or a callable object.
Users who stringify sufficiently broad values, or who use insufficiently
cautious `replacer` functions, are responsible for handling the case where no
fragments are iterated.
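One way to handle the empty case, sketched under the assumption that the result is small enough to hold in memory, is to mirror the return contract of `JSON.stringify`. The helper `joinOrUndefined` is hypothetical and not part of this package; it accepts any iterable of strings, such as the iterator returned by `stringify`.

```js
// Hypothetical helper, not part of this package: returns the concatenated
// fragments, or `undefined` if the iterable produced no fragments at all
// (the case where `JSON.stringify` would have returned `undefined`).
function joinOrUndefined(fragments) {
  let text;
  for (const frag of fragments) {
    text = text === undefined ? frag : text + frag;
  }
  return text;
}

// Stand-ins for `stringify` output: no fragments, then a fragmented array.
console.log(joinOrUndefined([])); // undefined
console.log(joinOrUndefined(["[", "1", "]"])); // [1]
```

A streaming consumer can apply the same idea without buffering by recording whether the loop body ever ran, then deciding what to do with an empty output (skip the write, report an error, and so on).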
## Parsing
This package exports a `StreamingJSONParser` class that can be used to
incrementally parse fragments of a full JSON text. Create a
`StreamingJSONParser`, feed it JSON fragments using `add(fragment)`, and then
call `finish()` to complete parsing and retrieve the result. If desired, pass
`finish()` a `reviver` that behaves as the optional `reviver` argument to
`JSON.parse` would:
```js
import assert from "node:assert";
import { StreamingJSONParser } from "@jswalden/streaming-json";

const parser = new StreamingJSONParser();
parser.add("{");
parser.add('"property');
parser.add('Name": 1');
parser.add('7, "complex": {');
parser.add("}}");

const result = parser.finish();
assert(typeof result === "object" && result !== null);
assert(result.propertyName === 17);
assert(typeof result.complex === "object" && result.complex !== null);
assert(Object.keys(result.complex).length === 0);

const withReviver = new StreamingJSONParser();
withReviver.add("true");

const resultWithReviver = withReviver.finish(function(_name, _value) {
  // throws away `this[_name] === _value` where `_value === true`
  return 42;
});
assert(resultWithReviver === 42);
```
If the fragments added so far can't be a prefix of valid JSON, the
`add(fragment)` call that creates this condition will throw a `SyntaxError`. If
the fragments don't form valid JSON at the time `finish()` is called, `finish()`
will throw a `SyntaxError`.
`add(fragment)` and `finish()` may only be called while parsing is incomplete
and has not fallen into error: once `finish()` has been called or an error has
been thrown, the parser is no longer usable.
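The error behavior above suggests using one parser per JSON text and treating a `SyntaxError` from either method as final. A minimal sketch, assuming a hypothetical helper `parseFragments` that is not part of this package; `StandInParser` is also hypothetical and exists only so the sketch runs without the package installed.

```js
// Hypothetical helper, not part of this package: feeds every fragment to a
// parser exposing `add(fragment)` and `finish(reviver?)`, turning a
// SyntaxError into a result object. The parser must be discarded afterwards.
function parseFragments(parser, fragments, reviver) {
  try {
    for (const frag of fragments) {
      parser.add(frag);
    }
    const value = reviver === undefined ? parser.finish() : parser.finish(reviver);
    return { ok: true, value };
  } catch (error) {
    if (error instanceof SyntaxError) {
      return { ok: false, error };
    }
    throw error; // e.g. an exception thrown by the reviver itself
  }
}

// Stand-in with the same method names as `StreamingJSONParser`. Unlike the
// real parser described above, it reports syntax errors only in `finish()`.
class StandInParser {
  constructor() {
    this.text = "";
  }
  add(fragment) {
    this.text += fragment;
  }
  finish(reviver) {
    return JSON.parse(this.text, reviver);
  }
}

const good = parseFragments(new StandInParser(), ['{"a":', " 1", "7}"]);
console.log(good.ok, good.value.a); // true 17

const bad = parseFragments(new StandInParser(), ['{"a":']);
console.log(bad.ok, bad.error instanceof SyntaxError); // false true
```

With the real class, a caller would pass `new StreamingJSONParser()` as the first argument and create a fresh instance for each JSON text.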
## Known issues
### `stringify` misinterprets boxed primitives from other globals as records
`JSON.stringify` treats boxed primitives, e.g. `new Boolean(false)`, as if they
were the primitive value. This happens even for boxed primitives from other
global objects/realms, e.g. `new (window.open("about:blank").Boolean)(false)`.
It's not possible to detect cross-global boxed primitives without substantially
slowing down stringifying objects that aren't boxed primitives.[^inefficient]
Therefore this package's `stringify` function
[doesn't recognize cross-global boxed primitives as such](https://github.com/jswalden/streaming-json/issues/1)
and instead interprets them as records of property/value pairs.
[^inefficient]: Boxed primitives can be detected and unboxed, regardless of
global/realm, using [`Number.prototype.valueOf`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/valueOf)
and similar: if the object is that kind of boxed primitive, the function call
returns the primitive value, and if not it throws a `TypeError`. But an object
that isn't a boxed primitive would incur exception creation/throwing/catching
overhead *four times* for `Number`, `String`, `Boolean`, and `BigInt`:
unacceptable overhead when cross-global objects are likely never used.
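
The detection technique described in this footnote can be sketched as follows. `unboxPrimitive` is a hypothetical name used for illustration; it is not this package's API, and the package deliberately avoids this approach because of the cost noted above.

```js
// The four `valueOf` functions mentioned above. Each returns the primitive
// when called on its own kind of boxed primitive and throws a TypeError
// otherwise.
const unboxers = [
  Number.prototype.valueOf,
  String.prototype.valueOf,
  Boolean.prototype.valueOf,
  BigInt.prototype.valueOf,
];

// Hypothetical helper, not part of this package: tries each unboxer in turn.
function unboxPrimitive(obj) {
  for (const valueOf of unboxers) {
    try {
      return { boxed: true, value: valueOf.call(obj) };
    } catch {
      // Not this kind of boxed primitive; try the next kind.
    }
  }
  // Reached only after all four attempts threw: the costly path for every
  // ordinary object.
  return { boxed: false, value: obj };
}

console.log(unboxPrimitive(new Boolean(false))); // { boxed: true, value: false }
console.log(unboxPrimitive({}).boxed); // false
```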