# json-ext
[npm](https://www.npmjs.com/package/@discoveryjs/json-ext)
[CI](https://github.com/discoveryjs/json-ext/actions/workflows/ci.yml)
[Coverage](https://coveralls.io/github/discoveryjs/json-ext)
A set of utilities designed to extend JSON's capabilities, especially for handling large JSON datasets (over 100MB) efficiently and streaming JSONL/NDJSON processing:
- [parseChunked()](#parsechunked) – Parses JSON and JSONL/NDJSON incrementally; similar to [`JSON.parse()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse), but processing data in chunks.
- [stringifyChunked()](#stringifychunked) – Converts JavaScript objects to JSON or JSONL incrementally; similar to [`JSON.stringify()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify), but returns a generator that yields strings in parts.
- [stringifyInfo()](#stringifyinfo) – Estimates the size of the JSON or JSONL stringify result and identifies circular references without generating the output.
- [parseFromWebStream()](#parsefromwebstream) – A helper function to parse JSON chunks directly from a Web Stream.
- [createStringifyWebStream()](#createstringifywebstream) – A helper function to generate JSON data as a Web Stream.
### Key Features
- Optimized to handle large JSON data with minimal resource usage (see [benchmarks](./benchmarks/README.md))
- Built-in JSONL/NDJSON support for parsing and serializing newline-delimited JSON
- Works seamlessly with browsers, Node.js, Deno, and Bun
- Supports both Node.js and Web streams
- Available in both ESM and CommonJS
- TypeScript typings included
- No external dependencies
- Compact size: 9.0 kB (minified), 4.0 kB (min+gzip)
### Why json-ext?
- **Handles large JSON files**: Overcomes the limitations of V8 for strings larger than ~500MB, enabling the processing of huge JSON data.
- **Prevents main thread blocking**: Distributes parsing and stringifying over time, ensuring the main thread remains responsive during heavy JSON operations.
- **Reduces memory usage**: Traditional `JSON.parse()` and `JSON.stringify()` require loading entire data into memory, leading to high memory consumption and increased garbage collection pressure. `parseChunked()` and `stringifyChunked()` process data incrementally, optimizing memory usage.
- **Size estimation**: `stringifyInfo()` allows estimating the size of resulting JSON before generating it, enabling better decision-making for JSON generation strategies.
- **JSONL/NDJSON streaming**: Native support for parsing and serializing newline-delimited JSON, enabling efficient processing of log streams, data pipelines, and large datasets without loading everything into memory.
## Install
```bash
npm install @discoveryjs/json-ext
```
## API
### parseChunked()
Functions like [`JSON.parse()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse), iterating over chunks to reconstruct the result object, and returns a [Promise](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise).
```ts
function parseChunked(input: Iterable<Chunk> | AsyncIterable<Chunk>, reviver?: Reviver): Promise<any>;
function parseChunked(input: Iterable<Chunk> | AsyncIterable<Chunk>, options?: ParseChunkedOptions): Promise<any>;
function parseChunked(input: () => (Iterable<Chunk> | AsyncIterable<Chunk>), reviver?: Reviver): Promise<any>;
function parseChunked(input: () => (Iterable<Chunk> | AsyncIterable<Chunk>), options?: ParseChunkedOptions): Promise<any>;
type Chunk = string | Buffer | Uint8Array;
type Reviver = (this: any, key: string, value: any) => any;
type ParseChunkedOptions = {
    reviver?: Reviver;
    mode?: 'json' | 'jsonl' | 'auto';
    onRootValue?: (value: any, state: ParseChunkState) => void;
    onChunk?: (chunkParsed: number, chunk: string | null, pending: string | null, state: ParseChunkState) => void;
};
type ParseChunkState = {
    mode: 'json' | 'jsonl';
    rootValuesCount: number;
    consumed: number;
    parsed: number;
};
```
[Benchmark](https://github.com/discoveryjs/json-ext/tree/master/benchmarks#parse-chunked)
Usage:
```js
import { parseChunked } from '@discoveryjs/json-ext';
const data = await parseChunked(chunkEmitter);
```
Parameter `chunkEmitter` can be an iterable or async iterable that iterates over chunks, or a function returning such a value. A chunk can be a `string`, `Uint8Array`, or Node.js `Buffer`.
You can pass `reviver` either as the second argument (`parseChunked(input, reviver)`) or inside options (`parseChunked(input, { mode, reviver })`). `reviver` works the same way as in `JSON.parse()`.
`options.mode` controls JSON Lines support:
- `'json'` (default): parse as regular JSON;
- `'jsonl'`: parse as JSONL (Newline Delimited JSON) and always return an array of parsed lines;
- `'auto'`: parse as regular JSON, but switch to JSONL mode when an additional value appears after a newline.
`options.onRootValue` is called when a root value is parsed and finalized. When `onRootValue` is specified, `parseChunked()` resolves to the number of processed root values (instead of returning parsed value(s)), which allows processing huge or infinite streams without accumulating all values in memory.
`options.onChunk` is called after each input chunk is processed and once at the end with `chunk = null`. It provides parsing progress and parser state (`consumed`, `parsed`, current mode and root values count).
Examples:
- Generator:
```js
parseChunked(function*() {
    yield '{ "hello":';
    yield Buffer.from(' "wor'); // Node.js only
    yield new TextEncoder().encode('ld" }'); // returns Uint8Array
});
```
- Async generator:
```js
parseChunked(async function*() {
    for await (const chunk of someAsyncSource) {
        yield chunk;
    }
});
```
- Array:
```js
parseChunked(['{ "hello":', ' "world"}'])
```
- Function returning iterable:
```js
parseChunked(() => ['{ "hello":', ' "world"}'])
```
- Node.js [`Readable`](https://nodejs.org/api/stream.html#readable-streams) stream:
```js
import fs from 'node:fs';
parseChunked(fs.createReadStream('path/to/file.json'))
```
- Web stream (e.g., using [fetch()](https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API)):
> Note: Iterability for Web streams was added later to the Web platform, so not all environments support it. Consider using `parseFromWebStream()` for broader compatibility.
```js
const response = await fetch('https://example.com/data.json');
const data = await parseChunked(response.body); // body is ReadableStream
```
### stringifyChunked()
Functions like [`JSON.stringify()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify), but returns a generator yielding strings instead of a single string.
> Note: Returns `"null"` when `JSON.stringify()` returns `undefined` (since a chunk cannot be `undefined`).
```ts
function stringifyChunked(value: any, replacer?: Replacer, space?: Space): Generator<string, void, unknown>;
function stringifyChunked(value: any, options: StringifyOptions): Generator<string, void, unknown>;
type Replacer =
    | ((this: any, key: string, value: any) => any)
    | (string | number)[]
    | null;
type Space = string | number | null;
type StringifyOptions = {
    replacer?: Replacer;
    space?: Space;
    mode?: 'json' | 'jsonl';
    highWaterMark?: number;
};
```
[Benchmark](https://github.com/discoveryjs/json-ext/tree/master/benchmarks#stream-stringifying)
Usage:
- Getting an array of chunks:
```js
const chunks = [...stringifyChunked(data)];
```
- Iterating over chunks:
```js
for (const chunk of stringifyChunked(data)) {
    console.log(chunk);
}
```
- Specifying the minimum size of a chunk with `highWaterMark` option:
```js
const data = [1, "hello world", 42];
console.log([...stringifyChunked(data)]); // default 16kB
// ['[1,"hello world",42]']
console.log([...stringifyChunked(data, { highWaterMark: 16 })]);
// ['[1,"hello world"', ',42]']
console.log([...stringifyChunked(data, { highWaterMark: 1 })]);
// ['[1', ',"hello world"', ',42', ']']
```
- JSONL output mode:
```js
const rows = [{ id: 1 }, { id: 2 }, { id: 3 }];
const jsonl = [...stringifyChunked(rows, { mode: 'jsonl' })].join('');
// {"id":1}\n{"id":2}\n{"id":3}
```
- Streaming into a Node.js stream using `pipeline()` (modern Node.js):
```js
import { pipeline } from 'node:stream/promises';
import fs from 'node:fs';
await pipeline(
    stringifyChunked(data),
    fs.createWriteStream('path/to/file.json')
);
```
- Streaming into a Node.js stream, wrapped in a `Promise` (legacy Node.js):
```js
import { Readable } from 'node:stream';
new Promise((resolve, reject) => {
    Readable.from(stringifyChunked(data))
        .on('error', reject)
        .pipe(stream)
        .on('error', reject)
        .on('finish', resolve);
});
```
- Writing into a file synchronously:
> Note: Slower than `JSON.stringify()` but uses much less heap space and has no limitation on string length
```js
import fs from 'node:fs';
const fd = fs.openSync('output.json', 'w');
for (const chunk of stringifyChunked(data)) {
    fs.writeFileSync(fd, chunk);
}
fs.closeSync(fd);
```
- Using with fetch (JSON streaming):
> Note: This feature has limited support in browsers, see [Streaming requests with the fetch API](https://developer.chrome.com/docs/capabilities/web-apis/fetch-streaming-requests)
> Note: `ReadableStream.from()` has limited [support in browsers](https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/from_static), use [`createStringifyWebStream()`](#createstringifywebstream) instead.
```js
fetch('http://example.com', {
    method: 'POST',
    duplex: 'half',
    body: ReadableStream.from(stringifyChunked(data))
});
```
- Wrapping into `ReadableStream`:
> Note: Use `ReadableStream.from()` or [`createStringifyWebStream()`](#createstringifywebstream) when no extra logic is needed
```js
new ReadableStream({
    start() {
        this.generator = stringifyChunked(data);
    },
    pull(controller) {
        const { value, done } = this.generator.next();

        if (done) {
            controller.close();
        } else {
            controller.enqueue(value);
        }
    },
    cancel() {
        this.generator = null;
    }
});
```
### stringifyInfo()
```ts
export function stringifyInfo(value: any, replacer?: Replacer, space?: Space): StringifyInfoResult;
export function stringifyInfo(value: any, options?: StringifyInfoOptions): StringifyInfoResult;
type StringifyInfoOptions = {
    replacer?: Replacer;
    space?: Space;
    mode?: 'json' | 'jsonl';
    continueOnCircular?: boolean;
};
type StringifyInfoResult = {
    bytes: number;      // size of JSON in bytes
    spaceBytes: number; // size of white spaces in bytes (when the space option is used)
    circular: object[]; // list of circular references
};
```
Functions like [`JSON.stringify()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify), but returns an object with the expected overall size of the stringify operation and a list of circular references.
Example:
```js
import { stringifyInfo } from '@discoveryjs/json-ext';
console.log(stringifyInfo({ test: true }, null, 4));
// {
//     bytes: 20,      // Buffer.byteLength('{\n    "test": true\n}')
//     spaceBytes: 7,
//     circular: []
// }
```
#### Options
##### continueOnCircular
Type: `Boolean`
Default: `false`
Determines whether to continue collecting info for a value when a circular reference is found. Setting this option to `true` allows finding all circular references.
### parseFromWebStream()
A helper function to consume JSON from a Web Stream. You can use `parseChunked(stream)` instead, but `@@asyncIterator` on `ReadableStream` has limited support in browsers (see [ReadableStream](https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream) compatibility table).
```js
import { parseFromWebStream } from '@discoveryjs/json-ext';
const data = await parseFromWebStream(readableStream);
// equivalent to (when ReadableStream[@@asyncIterator] is supported):
// await parseChunked(readableStream);
```
### createStringifyWebStream()
A helper function to convert `stringifyChunked()` into a `ReadableStream` (Web Stream). You can use `ReadableStream.from()` instead, but this method has limited support in browsers (see [ReadableStream.from()](https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/from_static) compatibility table).
```js
import { createStringifyWebStream } from '@discoveryjs/json-ext';
createStringifyWebStream({ test: true });
// equivalent to (when ReadableStream.from() is supported):
// ReadableStream.from(stringifyChunked({ test: true }))
```
## License
MIT