UNPKG

stream-chain

Version:

Chain functions, generators, Node streams, and Web streams into a pipeline with backpressure support.

317 lines (252 loc) 26 kB
# Architecture `stream-chain` is a library for building stream-processing pipelines from functions, generators, and existing streams. It has **zero runtime dependencies** — only dev dependencies for testing, benchmarking, and type-checking. 4.x ships three substrate variants: - **`stream-chain` / `stream-chain/node`** — canonical Node Streams chain (`Duplex`). Default. - **`stream-chain/web`** — native Web Streams chain (`{readable, writable}`). Browser-safe. - **`stream-chain/core`** — substrate-free async-iterable chain. No `node:stream`, no Web Streams. ## Project layout ``` package.json # Package config; "tape6" section configures test discovery; "exports" map drives subpaths src/ # Source code ├── index.js # /node entry: chain() factory + asStream + asWebStream + gen + dataSource + re-exports ├── index.d.ts # TypeScript declarations for /node ├── defs.js # Special values (none, stop, many, finalValue, flushable, fList) + Web Streams type guards ├── defs.d.ts ├── exec.js # Shared sync-when-possible value-or-promise executor — the engine behind gen/fun/asStream/asWebStream ├── exec.d.ts ├── gen.js # Push→pull async-generator bridge over exec ├── gen.d.ts ├── fun.js # Function pipeline from a list of functions (sync-first; collects via exec.next); exported via /web, /core ├── fun.d.ts ├── asStream.js # Wraps a function as a Node Duplex with per-item backpressure ├── asStream.d.ts ├── asWebStream.js # Wraps a function as a Web Streams {readable, writable} pair with per-item backpressure ├── asWebStream.d.ts ├── dataSource.js # dataSource(fn|iterable) — coerces to iterator-producing function (substrate-agnostic; on /node, /web, /core) ├── dataSource.d.ts ├── typed-streams.js # TypedReadable, TypedWritable, TypedDuplex, TypedTransform ├── typed-streams.d.ts ├── node/ # Subpath: stream-chain/node — thin re-export of root index │ ├── index.js │ ├── index.d.ts │ └── jsonl/ # Node-flavored bundled JSONL entries (+ .d.ts) │ ├── index.js # barrel: stream-chain/node/jsonl → {jsonlParser, jsonlStringer} │ ├── parser.js # jsonlParser() chain + .asStream (Duplex) + .asWebStream (Web pair) │ └── stringer.js # jsonlStringer() Transform + .asStream (self) + .asWebStream (Web) ├── web/ # Subpath: stream-chain/web — native Web Streams chain │ ├── index.js # chain() over duplex pairs; pipeTo wires stages together │ ├── index.d.ts │ └── jsonl/ # Web-flavored bundled JSONL entries — browser-safe, .asWebStream only (+ .d.ts) │ ├── index.js # barrel: stream-chain/web/jsonl → {jsonlParser, jsonlStringer} │ ├── parser.js # jsonlParser() chain + .asWebStream (Web pair) │ └── stringer.js # jsonlStringer() Web TransformStream + .asWebStream (self) ├── core/ # Subpath: stream-chain/core — async-iterable chain │ ├── index.js # chain() returns a callable async-generator factory │ └── index.d.ts ├── jsonl/ # JSONL (line-separated JSON) support │ ├── parser.js # JSONL parser: returns gen() pipeline (fixUtf8 → lines → JSON.parse) │ ├── parser.d.ts │ ├── parserStream.js # JSONL parser (Node Duplex): parser() wrapped with asStream │ ├── parserStream.d.ts │ ├── parserWebStream.js # JSONL parser (Web duplex pair): parser() wrapped with asWebStream │ ├── parserWebStream.d.ts │ ├── stringer.js # JSONL stringer (function-pipeline flushable): values → newline-separated JSON │ ├── stringer.d.ts │ ├── stringerStream.js # JSONL stringer (Node Transform): objects → newline-separated JSON strings │ ├── stringerStream.d.ts │ ├── stringerWebStream.js # JSONL stringer (Web TransformStream): same contract, Web Streams substrate │ ├── stringerWebStream.d.ts │ └── file/ # File-edge composites for JSONL (Node-only) │ ├── parser.js # parseFile(options) → gen(asyncBlockReader, parser) │ ├── parser.d.ts │ ├── stringer.js # stringerToFile(path, options) → gen(stringer, asyncBlockWriter) │ └── stringer.d.ts └── utils/ # Utility functions (most return values for use in chain()) ├── take.js # take(n, finalValue) — take N items, then stop ├── takeWhile.js # takeWhile(fn, finalValue) — take while predicate is true ├── takeWithSkip.js # takeWithSkip(n, skip, finalValue) — skip then take ├── skip.js # skip(n) — skip N items ├── skipWhile.js # skipWhile(fn) — skip while predicate is true ├── fold.js # fold(fn, initial) — reduce stream to single value at end ├── reduce.js # Alias for fold ├── scan.js # scan(fn, initial) — running accumulator, emits each step ├── batch.js # batch(size) — group items into fixed-size arrays ├── readableFrom.js # readableFrom({iterable}) — iterable/iterator to Node Readable ├── readableWebStreamFrom.js # readableWebStreamFrom({iterable}) — iterable/iterator to Web ReadableStream ├── reduceStream.js # reduceStream(fn, initial) — reduce as Node Writable (.accumulator) ├── reduceWebStream.js # reduceWebStream(fn, initial) — reduce as Web WritableStream ({writable, result, accumulator}) ├── fixUtf8Stream.js # fixUtf8Stream() — repartition chunks for valid UTF-8 ├── lines.js # lines() — split byte stream into lines ├── streamPuller.js # makeStreamPuller(readable) — wrap Node Readable as non-destructive async iterator ├── webStreamPuller.js # makeWebStreamPuller(readable) — wrap Web ReadableStream as non-destructive async iterator ├── drain.js # drain(asyncIter) — await an async iterable, return its last value ├── pipe.js # pipe(...stages) — one-shot single-value gen driver with end-of-input flush ├── asyncBlockReader.js # asyncBlockReader({readBlockSize?}) → (path) async-yields UTF-8 blocks (Node-only) ├── asyncBlockWriter.js # asyncBlockWriter(path, {writeBlockSize?}) — flushable file-block sink (Node-only) └── *.d.ts # TypeScript declarations for each utility tests/ # Test files organized by environment (tape-six) ├── core/ # Substrate-agnostic — runs in browser AND CLI (uses /web chain internally via runChain helper) ├── web/ # Web Streams substrate (asWebStream, /web chain, webStreamPuller) — runs in browser AND CLI ├── node/ # Node Streams substrate (asStream, JSONL via fs+zlib, streamPuller, etc.) — runs only in CLI ├── helpers.js # Node-stream test helpers (re-exports web-helpers) ├── web-helpers.js # Pure + Web Streams helpers (delay, webStreamToArray, writeAndCollect, runChain) ├── data/ # Test fixtures (used by tests/node/test-jsonl-*.js) └── manual/ # Manual test scripts (not part of the automated suite) bench/ # Benchmarks (chain-1-stage, chain-2-stage, raw-streams, gen-opt, fun-opt, …) wiki/ # GitHub wiki documentation (git submodule) .github/ # CI workflows, Dependabot config ``` ## Core concepts ### How chain() works (`/node` and `/web`) 1. User calls `chain(fns, options)` with an array of functions, streams, and/or arrays. 2. The array is flattened (nested arrays are inlined, falsy values removed). 3. Unless `noGrouping: true` (`/node` only), consecutive functions are grouped together using `gen()` for efficiency and wrapped into a single stream stage via `asStream()` (`/node`) or `asWebStream()` (`/web`). 4. All resulting stages are piped together sequentially — `Duplex.pipe()` in `/node`, `ReadableStream.pipeTo()` in `/web`. 5. A wrapper is created (Node `Duplex` for `/node`, plain `{readable, writable}` object for `/web`) that delegates writes to the first stage and reads from the last stage. 6. (`/node` only) Error events from all internal stages are forwarded to the wrapper unless `skipEvents: true`. (`/web` propagates errors via `pipeTo`'s default abort-on-error semantics.) 7. The wrapper exposes `.streams` (all internal stages), `.input` (first), and `.output` (last). ### How chain() works (`/core`) 1. Same flattening + function-list inlining (via `gen()`) as `/node`. 2. Returns a callable: `(input?) => AsyncGenerator<R>`. Input handling: - `null` / `undefined` → empty output. - String → passed through as a single value (strings are technically iterable, but treating them as a stream-of-characters is almost always the wrong intent). - Anything without `Symbol.iterator` / `Symbol.asyncIterator` (numbers, booleans, plain objects, …) → passed through as a single value. - Otherwise → iterated, with each yielded value driven through the composed pipeline. 3. No streams; `.streams` / `.input` / `.output` are `null` for parity with the substrate variants. ### Special return values (defs.js) Functions in a chain can return special values to control flow: | Value | Symbol | Effect | | ------------------- | --------------------- | -------------------------------------------------------------- | | `none` | `object-stream.none` | Skip — no value passed downstream | | `null`/`undefined` | — | Same as `none` in `asStream()`/`asWebStream()`/`chain()` | | `stop` | `object-stream.stop` | Skip and terminate the generator (gen/fun and stream wrappers) | | `many(values)` | `object-stream.many` | Emit multiple values from a single input | | `finalValue(value)` | `object-stream.final` | Skip remaining chain steps, emit value directly | | `flushable(fn)` | `object-stream.flush` | Mark function to be called at stream end with `none` | **Note on `null`/`undefined`:** `gen()` and `fun()` are general-purpose compositors that pass any value through the pipeline, including `null` and `undefined`. `asStream()`, `asWebStream()`, and `chain()` treat `null` and `undefined` as `none` (skip) because streams reserve these values for end-of-stream signaling. **Convention: generators yield plain values.** Generator functions (sync `function*` and async `async function*`) must NOT yield `none`, `stop`, `many(...)`, or `finalValue(...)`. Express those semantics with the language: skip with `continue`, terminate with `return`, emit multiple via separate `yield`s. The special markers are for regular-function returns only. See [wiki/defs.md § Convention: generators yield plain values](https://github.com/uhop/stream-chain/wiki/defs#convention-generators-yield-plain-values). ### exec() — the shared executor `exec.js` is the single engine that threads a value through a function-list and emits terminal values through a `push` callback. It is **not** an `async function` — it returns `undefined` when a value traversed the whole list synchronously, or a `Promise` when it had to suspend. It stays fully synchronous until the first real promise appears (an async stage, a thenable value, or a backpressuring push), then chains the remainder. This "sync-when-possible, value-or-promise" discipline is what lets purely synchronous pipelines avoid a per-item microtask. It duck-types each returned value the way 1.x `fun()` did: thenable → chain and resume; `many()` → expand; an object with `.next` → iterate as a generator (an async generator is just one whose `next()` returns a promise — no special case); `none`/`null` → drop; `stop` → throw `Stop`; `finalValue` → emit and short-circuit. Crucially, the **`push` return value is honored**: when `push` returns a Promise (a downstream backpressure signal), the executor suspends _at that push_ and chains the rest, keeping the queue bounded even when one input expands to a chunk-sized `many()`. Both the `many()` and generator paths resume via a `step` closure allocated **per actual suspension**, not per element — so live allocation stays O(1) in the array/generator length under backpressure. `exec.js` is internal (no public export); the four public compositors are thin adapters over `exec.next` / `exec.flush`: - **`gen()`** — a push→pull bridge: `exec` drives a producer whose pushes park on a promise the consumer resolves as it pulls, so production stays one item ahead. - **`fun()`** — collects every `push` into a `Many`. - **`asStream()`** / **`asWebStream()`** — drive `exec.next` on write and `exec.flush` on end, with `push` = the stream's backpressure-aware enqueue. ### gen() — async generator pipeline `gen(...fns)` takes multiple functions and returns a single async generator function. It is a push→pull bridge over the shared executor (`exec.next`, or `exec.flush` on `none`). The returned generator: 1. Processes each input value through the function pipeline sequentially. 2. Handles all special return values (`none`, `stop`, `many`, `finalValue`). 3. Supports regular, async, generator, and async generator functions. 4. Calls flushable functions with `none` when the input is exhausted. 5. Tags the result with a function list (`fListSymbol`) so `chain()` can inline it. ### fun() — function pipeline (sync-first) `fun(...fns)` is like `gen()` but returns a function instead of a generator. Generator results are collected into `many()` arrays. For purely synchronous pipelines it returns a synchronous result; for asynchronous pipelines it returns a `Promise`. **Memory caveat:** `fun()` collects the entire output of a single input into one `Many` before returning. Its memory footprint scales with output-per-input — unsafe for pipelines that produce unbounded values from a single input. `gen()` is the safe default; reach for `fun()` only when output-per-input is bounded and small. For this reason `fun()` is intentionally NOT on the default `stream-chain` / `/node` export — it requires an explicit import from `stream-chain/fun.js`. It is re-exported (and attached to `chain`) from the `/web` and `/core` subpaths where the output-size discipline is closer to the user's mental model. ### asStream() — function to Node Duplex `asStream(fn[, options])` wraps any function (regular, async, generator, async generator) as a `Duplex` stream. Per-item backpressure: every `stream.push()` is awaited if it returned `false`, keeping the readable queue at `hwm + 1` regardless of how many output values one input chunk produces. ### asWebStream() — function to Web Streams duplex pair `asWebStream(fn[, options])` wraps any function as a `{readable, writable}` Web Streams duplex pair. NOT a `TransformStream``transform()` can't suspend mid-call for per-item backpressure. Per-item backpressure: when `controller.desiredSize <= 0` after an `enqueue`, the next push returns a Promise that resolves when `pull()` fires. ### Stream-type detection (`/node` chain) `chain()` in `/node` detects stream types to decide how to integrate them: - **Node streams**: `isReadableNodeStream`, `isWritableNodeStream`, `isDuplexNodeStream` (local to `src/index.js`). - **Web streams**: `isReadableWebStream`, `isWritableWebStream`, `isDuplexWebStream` (canonical in `src/defs.js`, re-exported from `src/index.js`, `src/asWebStream.js`, `src/web/index.js`). - Web streams passed to the `/node` chain are adapted via `Readable.fromWeb()`, `Writable.fromWeb()`, `Duplex.fromWeb()` with `{objectMode: true}`. ### Async-iterator wrappers (`makeStreamPuller` / `makeWebStreamPuller`) `makeStreamPuller(readable)` wraps a Node `Readable` as a non-destructive async iterator — `stream.iterator({destroyOnReturn: false})` under the hood. Preserves the original `'error'` value, synthesizes `Error('Premature close')` on destroy-without-end, leaves the source alive when iteration ends early. `makeWebStreamPuller(readable)` wraps a Web `ReadableStream` similarly — `stream[Symbol.asyncIterator]({preventCancel: true})` plus a `cancel(reason)` extension method (the iterator-protocol `return()` can't carry a cancel reason cleanly). Both intended for downstream consumers (stream-join, stream-sorting) that need original-error preservation and non-destructive break. ### JSONL support - `parser(reviver?)` — returns a `gen()` pipeline: `fixUtf8Stream → lines → JSON.parse`. Each emitted record is `{key, value}` where `key` is the zero-based line index. Empty lines are dropped. Error handling: `ignoreErrors: true` drops failed lines but the counter still bumps (gappy keys; back-compat); `errorIndicator` (presence-checked) replaces failed lines with the value or with a function-form `(error, input, reviver) => unknown` result — `undefined` return drops without bumping the counter, so keys stay sequential. `errorIndicator` wins when both are set. - `parserStream(options?)` — wraps `parser()` with `asStream()`; threads `errorIndicator` through. - `parserWebStream(options?)` — wraps `parser()` with `asWebStream()`; threads `errorIndicator` through. - Raw export: `jsonlParser(options?)` (the per-line factory without the `fixUtf8Stream → lines` front, for callers whose chunks already arrive line-aligned). - `stringer(options?)` (`src/jsonl/stringer.js`) — function-pipeline flushable that serializes values to JSONL fragments. Used as the canonical building block; `stringerStream` (Node Transform) and `stringerWebStream` (Web TransformStream) keep their existing Transform-shape contracts for stream consumers. - `stringerStream(options?)` — Duplex stream that serializes objects to JSONL format. - Factory-bundled entries (in `src/node/jsonl/` and `src/web/jsonl/`): one factory carrying `.asStream` (Node `Duplex`, node entry only) and `.asWebStream` (Web pair) as methods — `jsonlParser.asStream()`, `jsonlStringer.asWebStream()`, etc. The node parser/stringer attach both adapters; the web entries omit `.asStream` and never import `node:stream` (browser-safe). The two `index.js` barrels export `{jsonlParser, jsonlStringer}` and are exposed as the `stream-chain/node/jsonl` and `stream-chain/web/jsonl` subpaths. These delegate to the suffixed adapters above; they exist so stream-json's deprecated JSONL can be migrated to stream-chain by changing only the import specifier. - File-edge composites (Node-only, in `src/jsonl/file/`): `parseFile(options)` returns `gen(asyncBlockReader, parser)` — drive with `pipe(...)` and a path; `stringerToFile(path, options)` returns `gen(stringer, asyncBlockWriter)` — drive with `pipe(...)` so the writer's flushable closes the file. Round-trip via `pipe(parseFile(), r => r.value, stringerToFile(out))` is ~40% faster than the equivalent `fs streams + parserStream + stringerStream` pipeline on 50k-row fixtures (see `bench/jsonl-file.js`). The gain comes from collapsing the per-chunk Transform/Writable boundaries into one fused executor. ### Utility functions All utilities return functions or constructors suitable for use in `chain()`: - **Slicing**: `take`, `takeWhile`, `takeWithSkip`, `skip`, `skipWhile` - **Folding**: `fold` (reduce to single value at end), `scan` (emit running accumulator), `reduce` (alias for fold), `reduceStream` (Writable stream with `.accumulator`) - **Batching**: `batch(size)` — group items into arrays - **Stream helpers**: `readableFrom` (iterable → Readable), `fixUtf8Stream` (UTF-8 repartitioning), `lines` (byte stream → line stream) - **Async-iterator wrappers**: `makeStreamPuller` (Node Readable), `makeWebStreamPuller` (Web ReadableStream) ## Module dependency graph ``` src/index.js (= /node) ── src/defs.js, src/gen.js, src/asStream.js, src/asWebStream.js src/node/index.js ── src/index.js (thin re-export) src/web/index.js ── src/defs.js, src/gen.js, src/fun.js, src/asWebStream.js src/core/index.js ── src/defs.js, src/gen.js, src/fun.js src/exec.js ── src/defs.js # shared sync-when-possible executor src/asStream.js ── src/defs.js, src/exec.js src/asWebStream.js ── src/defs.js, src/exec.js src/gen.js ── src/defs.js, src/exec.js src/fun.js ── src/defs.js, src/exec.js src/jsonl/parser.js ── src/gen.js, src/utils/fixUtf8Stream.js, src/utils/lines.js src/jsonl/parserStream.js ── src/jsonl/parser.js, src/asStream.js src/jsonl/stringerStream.js (standalone Duplex) src/utils/* ── src/defs.js (most utilities use none, stop, many, flushable) src/utils/streamPuller.js ── (just wraps stream.iterator()) src/utils/webStreamPuller.js ── (just wraps stream[Symbol.asyncIterator]()) ``` ## Testing - **Framework**: tape-six (`tape6`) - **Run all**: `npm test` (parallel workers via `tape6 --flags FO`) - **Run single file**: `node tests/test-<name>.js` - **Run with Bun**: `npm run test:bun` - **Run with Deno**: `npm run test:deno` - **Run sequential**: `npm run test:seq` (also `test:seq:bun`, `test:seq:deno`) - **TypeScript check**: `npm run ts-check` - **JavaScript type check (dual tsconfig)**: `npm run js-check` - **TypeScript tests**: `npm run ts-test` (also `ts-test:bun`, `ts-test:deno`) - **Lint**: `npm run lint` (Prettier check) - **Lint fix**: `npm run lint:fix` (Prettier write) ## Benchmarks Benchmarks use [nano-benchmark](https://www.npmjs.com/package/nano-benchmark). Run a benchmark by specifying its file: ```bash npm run bench -- bench/<name>.js ``` ### Key benchmark files | File | What it measures | | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | | `bench/chain-1-stage.js` | 1-stage chain: `/node` (`asStream(gen(...))`) vs `/web` (`asWebStream(gen(...))`) on the same function pipeline. Shows substrate overhead. | | `bench/chain-2-stage.js` | Same as above but 2 stages. Shows per-composition overhead. | | `bench/raw-streams.js` | Raw Node Duplex vs raw Web Streams duplex (`{readable, writable}`). Substrate baseline without `stream-chain` involvement. | | `bench/raw-streams-burst.js` | Web-streams burst-enqueue behavior: synchronous enqueue-then-drain vs per-item drain. | | `bench/core-chain.js` | `/core` chain throughput — no substrate cost; isolates `gen()`'s overhead. | | `bench/gen-fun-stream.js` | Compares `gen()`, `fun()`, and `chain(asStream(...))` on the same pipeline of sync functions. | | `bench/gen-fun.js` | Head-to-head `gen()` vs `fun()` without stream overhead. | | `bench/gen-opt.js` | `gen()` function-list inlining optimization: flat vs nested-with-inlining vs nested-with-`clearFunctionList()`. | | `bench/fun-opt.js` | Same as `gen-opt.js` but for `fun()`. | All benchmarks use a pipeline of simple sync arithmetic functions (`x => x - 2`, `x => x + 1`, etc.) to isolate framework overhead from application logic. ## Import paths ```js // Main API (default = /node; ESM) import chain from 'stream-chain'; import {chain, none, stop, many, gen, asStream, asWebStream, dataSource} from 'stream-chain'; // CJS — destructure required (no bare-callable fallback in 4.x) const {chain} = require('stream-chain'); // Substrate variants import chain from 'stream-chain/node'; // same as the default import chain from 'stream-chain/web'; // native Web Streams chain import chain from 'stream-chain/core'; // substrate-free async-iterable chain // Individual modules import gen from 'stream-chain/gen.js'; import fun from 'stream-chain/fun.js'; import asStream from 'stream-chain/asStream.js'; import asWebStream from 'stream-chain/asWebStream.js'; import {none, stop, many, finalValue, flushable, isReadableWebStream} from 'stream-chain/defs.js'; // Utilities import take from 'stream-chain/utils/take.js'; import fold from 'stream-chain/utils/fold.js'; import batch from 'stream-chain/utils/batch.js'; import makeStreamPuller from 'stream-chain/utils/streamPuller.js'; import makeWebStreamPuller from 'stream-chain/utils/webStreamPuller.js'; // JSONL import parser from 'stream-chain/jsonl/parser.js'; import parserStream from 'stream-chain/jsonl/parserStream.js'; import stringerStream from 'stream-chain/jsonl/stringerStream.js'; // TypeScript helpers import {TypedTransform} from 'stream-chain/typed-streams.js'; ```