stream-chain
Chain functions, generators, Node streams, and Web streams into a pipeline with backpressure support.
# Architecture
`stream-chain` is a library for building stream-processing pipelines from functions, generators, and existing streams. It has **zero runtime dependencies** — only dev dependencies for testing, benchmarking, and type-checking. 4.x ships three substrate variants:
- **`stream-chain` / `stream-chain/node`** — canonical Node Streams chain (`Duplex`). Default.
- **`stream-chain/web`** — native Web Streams chain (`{readable, writable}`). Browser-safe.
- **`stream-chain/core`** — substrate-free async-iterable chain. No `node:stream`, no Web Streams.
## Project layout
```
package.json # Package config; "tape6" section configures test discovery; "exports" map drives subpaths
src/ # Source code
├── index.js # /node entry: chain() factory + asStream + asWebStream + gen + dataSource + re-exports
├── index.d.ts # TypeScript declarations for /node
├── defs.js # Special values (none, stop, many, finalValue, flushable, fList) + Web Streams type guards
├── defs.d.ts
├── exec.js # Shared sync-when-possible value-or-promise executor — the engine behind gen/fun/asStream/asWebStream
├── exec.d.ts
├── gen.js # Push→pull async-generator bridge over exec
├── gen.d.ts
├── fun.js # Function pipeline from a list of functions (sync-first; collects via exec.next); exported via /web, /core
├── fun.d.ts
├── asStream.js # Wraps a function as a Node Duplex with per-item backpressure
├── asStream.d.ts
├── asWebStream.js # Wraps a function as a Web Streams {readable, writable} pair with per-item backpressure
├── asWebStream.d.ts
├── dataSource.js # dataSource(fn|iterable) — coerces to iterator-producing function (substrate-agnostic; on /node, /web, /core)
├── dataSource.d.ts
├── typed-streams.js # TypedReadable, TypedWritable, TypedDuplex, TypedTransform
├── typed-streams.d.ts
├── node/ # Subpath: stream-chain/node — thin re-export of root index
│ ├── index.js
│ ├── index.d.ts
│ └── jsonl/ # Node-flavored bundled JSONL entries (+ .d.ts)
│ ├── index.js # barrel: stream-chain/node/jsonl → {jsonlParser, jsonlStringer}
│ ├── parser.js # jsonlParser() chain + .asStream (Duplex) + .asWebStream (Web pair)
│ └── stringer.js # jsonlStringer() Transform + .asStream (self) + .asWebStream (Web)
├── web/ # Subpath: stream-chain/web — native Web Streams chain
│ ├── index.js # chain() over duplex pairs; pipeTo wires stages together
│ ├── index.d.ts
│ └── jsonl/ # Web-flavored bundled JSONL entries — browser-safe, .asWebStream only (+ .d.ts)
│ ├── index.js # barrel: stream-chain/web/jsonl → {jsonlParser, jsonlStringer}
│ ├── parser.js # jsonlParser() chain + .asWebStream (Web pair)
│ └── stringer.js # jsonlStringer() Web TransformStream + .asWebStream (self)
├── core/ # Subpath: stream-chain/core — async-iterable chain
│ ├── index.js # chain() returns a callable async-generator factory
│ └── index.d.ts
├── jsonl/ # JSONL (line-separated JSON) support
│ ├── parser.js # JSONL parser: returns gen() pipeline (fixUtf8 → lines → JSON.parse)
│ ├── parser.d.ts
│ ├── parserStream.js # JSONL parser (Node Duplex): parser() wrapped with asStream
│ ├── parserStream.d.ts
│ ├── parserWebStream.js # JSONL parser (Web duplex pair): parser() wrapped with asWebStream
│ ├── parserWebStream.d.ts
│ ├── stringer.js # JSONL stringer (function-pipeline flushable): values → newline-separated JSON
│ ├── stringer.d.ts
│ ├── stringerStream.js # JSONL stringer (Node Transform): objects → newline-separated JSON strings
│ ├── stringerStream.d.ts
│ ├── stringerWebStream.js # JSONL stringer (Web TransformStream): same contract, Web Streams substrate
│ ├── stringerWebStream.d.ts
│ └── file/ # File-edge composites for JSONL (Node-only)
│ ├── parser.js # parseFile(options) → gen(asyncBlockReader, parser)
│ ├── parser.d.ts
│ ├── stringer.js # stringerToFile(path, options) → gen(stringer, asyncBlockWriter)
│ └── stringer.d.ts
└── utils/ # Utility functions (most return values for use in chain())
├── take.js # take(n, finalValue) — take N items, then stop
├── takeWhile.js # takeWhile(fn, finalValue) — take while predicate is true
├── takeWithSkip.js # takeWithSkip(n, skip, finalValue) — skip then take
├── skip.js # skip(n) — skip N items
├── skipWhile.js # skipWhile(fn) — skip while predicate is true
├── fold.js # fold(fn, initial) — reduce stream to single value at end
├── reduce.js # Alias for fold
├── scan.js # scan(fn, initial) — running accumulator, emits each step
├── batch.js # batch(size) — group items into fixed-size arrays
├── readableFrom.js # readableFrom({iterable}) — iterable/iterator to Node Readable
├── readableWebStreamFrom.js # readableWebStreamFrom({iterable}) — iterable/iterator to Web ReadableStream
├── reduceStream.js # reduceStream(fn, initial) — reduce as Node Writable (.accumulator)
├── reduceWebStream.js # reduceWebStream(fn, initial) — reduce as Web WritableStream ({writable, result, accumulator})
├── fixUtf8Stream.js # fixUtf8Stream() — repartition chunks for valid UTF-8
├── lines.js # lines() — split byte stream into lines
├── streamPuller.js # makeStreamPuller(readable) — wrap Node Readable as non-destructive async iterator
├── webStreamPuller.js # makeWebStreamPuller(readable) — wrap Web ReadableStream as non-destructive async iterator
├── drain.js # drain(asyncIter) — await an async iterable, return its last value
├── pipe.js # pipe(...stages) — one-shot single-value gen driver with end-of-input flush
├── asyncBlockReader.js # asyncBlockReader({readBlockSize?}) → (path) async-yields UTF-8 blocks (Node-only)
├── asyncBlockWriter.js # asyncBlockWriter(path, {writeBlockSize?}) — flushable file-block sink (Node-only)
└── *.d.ts # TypeScript declarations for each utility
tests/ # Test files organized by environment (tape-six)
├── core/ # Substrate-agnostic — runs in browser AND CLI (uses /web chain internally via runChain helper)
├── web/ # Web Streams substrate (asWebStream, /web chain, webStreamPuller) — runs in browser AND CLI
├── node/ # Node Streams substrate (asStream, JSONL via fs+zlib, streamPuller, etc.) — runs only in CLI
├── helpers.js # Node-stream test helpers (re-exports web-helpers)
├── web-helpers.js # Pure + Web Streams helpers (delay, webStreamToArray, writeAndCollect, runChain)
├── data/ # Test fixtures (used by tests/node/test-jsonl-*.js)
└── manual/ # Manual test scripts (not part of the automated suite)
bench/ # Benchmarks (chain-1-stage, chain-2-stage, raw-streams, gen-opt, fun-opt, …)
wiki/ # GitHub wiki documentation (git submodule)
.github/ # CI workflows, Dependabot config
```
## Core concepts
### How chain() works (`/node` and `/web`)
1. User calls `chain(fns, options)` with an array of functions, streams, and/or arrays.
2. The array is flattened (nested arrays are inlined, falsy values removed).
3. Unless `noGrouping: true` (`/node` only), consecutive functions are grouped together using `gen()` for efficiency and wrapped into a single stream stage via `asStream()` (`/node`) or `asWebStream()` (`/web`).
4. All resulting stages are piped together sequentially — `Duplex.pipe()` in `/node`, `ReadableStream.pipeTo()` in `/web`.
5. A wrapper is created (Node `Duplex` for `/node`, plain `{readable, writable}` object for `/web`) that delegates writes to the first stage and reads from the last stage.
6. (`/node` only) Error events from all internal stages are forwarded to the wrapper unless `skipEvents: true`. (`/web` propagates errors via `pipeTo`'s default abort-on-error semantics.)
7. The wrapper exposes `.streams` (all internal stages), `.input` (first), and `.output` (last).
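
Steps 2 and 3 above can be pictured with a small, self-contained sketch. It is not the library's code: `flatten`, `groupFunctions`, the `{kind, ...}` stage records, and `fakeStream` are hypothetical names used only for illustration, and the real chain wraps each function group with `gen()` and a stream adapter rather than returning plain records.

```js
// Hypothetical sketch of steps 2-3: flatten the input list, then group runs of functions.
const isFunction = x => typeof x === 'function';

// Step 2: inline nested arrays and drop falsy entries.
const flatten = items => items.flat(Infinity).filter(Boolean);

// Step 3: collect consecutive functions into one group; any other item stands alone.
const groupFunctions = items => {
  const stages = [];
  let run = [];
  for (const item of flatten(items)) {
    if (isFunction(item)) {
      run.push(item);
      continue;
    }
    if (run.length) stages.push({kind: 'functions', fns: run});
    run = [];
    stages.push({kind: 'stream', stream: item});
  }
  if (run.length) stages.push({kind: 'functions', fns: run});
  return stages;
};

const fakeStream = {pipe() {}}; // placeholder standing in for a real stream object
const stages = groupFunctions([x => x + 1, [x => x * 2, null], fakeStream, x => x - 1]);
console.log(stages.map(s => (s.kind === 'functions' ? `functions(${s.fns.length})` : 'stream')));
// prints [ 'functions(2)', 'stream', 'functions(1)' ]
```

Grouping matters because every stage boundary is a stream hop; fewer stages means fewer hops per item.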
### How chain() works (`/core`)
1. Same flattening + function-list inlining (via `gen()`) as `/node`.
2. Returns a callable: `(input?) => AsyncGenerator<R>`. Input handling:
- `null` / `undefined` → empty output.
- String → passed through as a single value (strings are technically iterable, but treating them as a stream-of-characters is almost always the wrong intent).
- Anything without `Symbol.iterator` / `Symbol.asyncIterator` (numbers, booleans, plain objects, …) → passed through as a single value.
- Otherwise → iterated, with each yielded value driven through the composed pipeline.
3. No streams; `.streams` / `.input` / `.output` are `null` for parity with the substrate variants.
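
The input rules in step 2 can be sketched as a tiny normalizer. This is an illustration under stated assumptions, not the `/core` implementation: `toInput` and `collect` are hypothetical helpers, and the composed pipeline itself is left out.

```js
// Hypothetical helper mirroring the input rules above.
async function* toInput(input) {
  if (input === null || input === undefined) return; // empty output
  const iterable =
    typeof input !== 'string' &&
    (typeof input[Symbol.asyncIterator] === 'function' ||
      typeof input[Symbol.iterator] === 'function');
  if (!iterable) {
    yield input; // strings, numbers, booleans, plain objects: one single value
    return;
  }
  yield* input; // sync or async iterable: one value per item
}

// Drain the normalized input into an array for inspection.
const collect = async input => {
  const out = [];
  for await (const value of toInput(input)) out.push(value);
  return out;
};

// collect('abc') resolves to ['abc']; collect([1, 2]) resolves to [1, 2];
// collect(null) resolves to [].
```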
### Special return values (defs.js)
Functions in a chain can return special values to control flow:
| Value | Symbol | Effect |
| ------------------- | --------------------- | -------------------------------------------------------------- |
| `none` | `object-stream.none` | Skip — no value passed downstream |
| `null`/`undefined` | — | Same as `none` in `asStream()`/`asWebStream()`/`chain()` |
| `stop` | `object-stream.stop` | Skip and terminate the generator (gen/fun and stream wrappers) |
| `many(values)` | `object-stream.many` | Emit multiple values from a single input |
| `finalValue(value)` | `object-stream.final` | Skip remaining chain steps, emit value directly |
| `flushable(fn)` | `object-stream.flush` | Mark function to be called at stream end with `none` |
**Note on `null`/`undefined`:** `gen()` and `fun()` are general-purpose compositors that pass any value through the pipeline, including `null` and `undefined`. `asStream()`, `asWebStream()`, and `chain()` treat `null` and `undefined` as `none` (skip) because streams reserve these values for end-of-stream signaling.
**Convention: generators yield plain values.** Generator functions (sync `function*` and async `async function*`) must NOT yield `none`, `stop`, `many(...)`, or `finalValue(...)`. Express those semantics with the language: skip with `continue`, terminate with `return`, emit multiple via separate `yield`s. The special markers are for regular-function returns only. See [wiki/defs.md § Convention: generators yield plain values](https://github.com/uhop/stream-chain/wiki/defs#convention-generators-yield-plain-values).
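
As a minimal illustration of that convention, the generator stage below uses only language constructs. `wordsOf` and `runStage` are hypothetical names; `runStage` is a stand-in driver that calls the stage once per input, which is an assumption about how a chain would feed it.

```js
// A generator stage that needs no special markers.
function* wordsOf(line) {
  for (const word of line.split(' ')) {
    if (word === '') continue; // skip: nothing is yielded for empty pieces
    if (word === '#') return; // terminate: ignore the rest of this line
    yield word; // emit: one plain value per yield
  }
}

// Hypothetical driver: call the stage once per input and collect what it yields.
const runStage = (stage, inputs) => inputs.flatMap(input => [...stage(input)]);

console.log(runStage(wordsOf, ['a  b', 'c # d']));
// prints [ 'a', 'b', 'c' ]
```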
### exec() — the shared executor
`exec.js` is the single engine that threads a value through a function-list and emits terminal values through a `push` callback. It is **not** an `async function` — it returns `undefined` when a value traversed the whole list synchronously, or a `Promise` when it had to suspend. It stays fully synchronous until the first real promise appears (an async stage, a thenable value, or a backpressuring push), then chains the remainder. This "sync-when-possible, value-or-promise" discipline is what lets purely synchronous pipelines avoid a per-item microtask.
It duck-types each returned value the way 1.x `fun()` did: thenable → chain and resume; `many()` → expand; an object with `.next` → iterate as a generator (an async generator is just one whose `next()` returns a promise — no special case); `none`/`null` → drop; `stop` → throw `Stop`; `finalValue` → emit and short-circuit.
Crucially, the **`push` return value is honored**: when `push` returns a Promise (a downstream backpressure signal), the executor suspends _at that push_ and chains the rest, keeping the queue bounded even when one input expands to a chunk-sized `many()`. Both the `many()` and generator paths resume via a `step` closure allocated **per actual suspension**, not per element — so live allocation stays O(1) in the array/generator length under backpressure.
`exec.js` is internal (no public export); the four public compositors are thin adapters over `exec.next` / `exec.flush`:
- **`gen()`** — a push→pull bridge: `exec` drives a producer whose pushes park on a promise the consumer resolves as it pulls, so production stays one item ahead.
- **`fun()`** — collects every `push` into a `Many`.
- **`asStream()`** / **`asWebStream()`** — drive `exec.next` on write and `exec.flush` on end, with `push` = the stream's backpressure-aware enqueue.
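
The "sync-when-possible, value-or-promise" discipline can be sketched in a few lines. This is a deliberately reduced illustration, not `exec.js`: `run` and `isThenable` are hypothetical names, and the special markers (`none`, `stop`, `many`, `finalValue`) and generator results are left out.

```js
const isThenable = x => x !== null && typeof x === 'object' && typeof x.then === 'function';

// Thread `value` through fns[index..]. Returns undefined when everything ran
// synchronously, or a Promise as soon as something had to suspend.
const run = (fns, value, push, index = 0) => {
  for (let i = index; i < fns.length; ++i) {
    if (isThenable(value)) {
      // First real promise: suspend here and chain the remainder.
      return value.then(resolved => run(fns, resolved, push, i));
    }
    value = fns[i](value);
  }
  if (isThenable(value)) return value.then(resolved => run(fns, resolved, push, fns.length));
  return push(value); // honor push's return value: a Promise signals backpressure
};

const out = [];
const collectValue = v => {
  out.push(v);
};

const syncResult = run([x => x + 1, x => x * 2], 1, collectValue);
// syncResult is undefined and out already holds 4: no microtask was needed

const asyncResult = run([async x => x + 1, x => x * 2], 1, collectValue);
// asyncResult is a Promise; the second 4 is pushed once it settles
```

Returning `push(value)` unchanged is the design point: a synchronous sink keeps the whole call synchronous, while a backpressuring sink makes the caller wait.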
### gen() — async generator pipeline
`gen(...fns)` takes multiple functions and returns a single async generator function. It is a push→pull bridge over the shared executor (`exec.next`, or `exec.flush` on `none`). The returned generator:
1. Processes each input value through the function pipeline sequentially.
2. Handles all special return values (`none`, `stop`, `many`, `finalValue`).
3. Supports regular, async, generator, and async generator functions.
4. Calls flushable functions with `none` when the input is exhausted.
5. Tags the result with a function list (`fListSymbol`) so `chain()` can inline it.
### fun() — function pipeline (sync-first)
`fun(...fns)` is like `gen()` but returns a function instead of a generator. Generator results are collected into `many()` arrays. For purely synchronous pipelines it returns a synchronous result; for asynchronous pipelines it returns a `Promise`.
**Memory caveat:** `fun()` collects the entire output of a single input into one `Many` before returning. Its memory footprint scales with output-per-input — unsafe for pipelines that produce unbounded values from a single input. `gen()` is the safe default; reach for `fun()` only when output-per-input is bounded and small.
For this reason `fun()` is intentionally NOT on the default `stream-chain` / `/node` export — it requires an explicit import from `stream-chain/fun.js`. It is re-exported (and attached to `chain`) from the `/web` and `/core` subpaths where the output-size discipline is closer to the user's mental model.
### asStream() — function to Node Duplex
`asStream(fn[, options])` wraps any function (regular, async, generator, async generator) as a `Duplex` stream. Per-item backpressure: every `stream.push()` is awaited if it returned `false`, keeping the readable queue at `hwm + 1` regardless of how many output values one input chunk produces.
### asWebStream() — function to Web Streams duplex pair
`asWebStream(fn[, options])` wraps any function as a `{readable, writable}` Web Streams duplex pair. NOT a `TransformStream` — `transform()` can't suspend mid-call for per-item backpressure. Per-item backpressure: when `controller.desiredSize <= 0` after an `enqueue`, the next push returns a Promise that resolves when `pull()` fires.
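
A rough sketch of such a pair, built only on the standard Web Streams globals, might look as follows. It is a simplification under stated assumptions, not `asWebStream()` itself: `makePair` is a hypothetical name, `fn` is assumed to return an array of outputs, and the push that fills the queue is the one that waits.

```js
// Hypothetical sketch of a {readable, writable} pair with per-item backpressure.
const makePair = fn => {
  let controller;
  let resume = null; // resolver of the promise a blocked producer is waiting on

  const readable = new ReadableStream(
    {
      start(c) {
        controller = c;
      },
      pull() {
        // The consumer made room: release the blocked producer, if any.
        if (resume) {
          const release = resume;
          resume = null;
          release();
        }
      }
    },
    {highWaterMark: 1}
  );

  // Enqueue one value; return a Promise when the queue is full.
  const push = value => {
    controller.enqueue(value);
    if (controller.desiredSize <= 0) return new Promise(resolve => (resume = resolve));
  };

  const writable = new WritableStream({
    async write(chunk) {
      for (const value of fn(chunk)) {
        const wait = push(value);
        if (wait) await wait; // suspend mid-chunk until pull() fires
      }
    },
    close() {
      controller.close();
    },
    abort(reason) {
      controller.error(reason);
    }
  });

  return {readable, writable};
};
```

Because `write()` does not resolve until every output of the chunk has been accepted, the writable side slows down together with the readable side even when one input expands into many outputs.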
### Stream-type detection (`/node` chain)
`chain()` in `/node` detects stream types to decide how to integrate them:
- **Node streams**: `isReadableNodeStream`, `isWritableNodeStream`, `isDuplexNodeStream` (local to `src/index.js`).
- **Web streams**: `isReadableWebStream`, `isWritableWebStream`, `isDuplexWebStream` (canonical in `src/defs.js`, re-exported from `src/index.js`, `src/asWebStream.js`, `src/web/index.js`).
- Web streams passed to the `/node` chain are adapted via `Readable.fromWeb()`, `Writable.fromWeb()`, `Duplex.fromWeb()` with `{objectMode: true}`.
### Async-iterator wrappers (`makeStreamPuller` / `makeWebStreamPuller`)
`makeStreamPuller(readable)` wraps a Node `Readable` as a non-destructive async iterator, using `stream.iterator({destroyOnReturn: false})` under the hood. It preserves the original `'error'` value, synthesizes `Error('Premature close')` on destroy-without-end, and leaves the source alive when iteration ends early.
`makeWebStreamPuller(readable)` wraps a Web `ReadableStream` similarly — `stream[Symbol.asyncIterator]({preventCancel: true})` plus a `cancel(reason)` extension method (the iterator-protocol `return()` can't carry a cancel reason cleanly).
Both are intended for downstream consumers (stream-join, stream-sorting) that need original-error preservation and non-destructive break.
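
The "non-destructive break" behavior can be seen with the standard Web Streams API alone. The sketch below does not use `makeWebStreamPuller`; it only shows the underlying `preventCancel` option, and `takeTwo` and `readRest` are hypothetical helper names.

```js
// A small source with four queued values.
const source = new ReadableStream({
  start(controller) {
    for (const value of [1, 2, 3, 4]) controller.enqueue(value);
    controller.close();
  }
});

// Read two values, then leave the loop early without cancelling the stream.
const takeTwo = async stream => {
  const taken = [];
  for await (const value of stream.values({preventCancel: true})) {
    taken.push(value);
    if (taken.length === 2) break; // releases the lock, keeps the stream alive
  }
  return taken;
};

// Read whatever is left with a fresh iterator.
const readRest = async stream => {
  const rest = [];
  for await (const value of stream) rest.push(value);
  return rest;
};

// takeTwo(source) resolves to [1, 2]; a later readRest(source) resolves to [3, 4].
```

Without `preventCancel: true`, leaving the loop early would cancel the stream and the remaining values would be lost.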
### JSONL support
- `parser(reviver?)` — returns a `gen()` pipeline: `fixUtf8Stream → lines → JSON.parse`. Each emitted record is `{key, value}` where `key` is the zero-based line index. Empty lines are dropped. Error handling: `ignoreErrors: true` drops failed lines but the counter still bumps (gappy keys; back-compat); `errorIndicator` (presence-checked) replaces failed lines with the value or with a function-form `(error, input, reviver) => unknown` result — `undefined` return drops without bumping the counter, so keys stay sequential. `errorIndicator` wins when both are set.
- `parserStream(options?)` — wraps `parser()` with `asStream()`; threads `errorIndicator` through.
- `parserWebStream(options?)` — wraps `parser()` with `asWebStream()`; threads `errorIndicator` through.
- Raw export: `jsonlParser(options?)` (the per-line factory without the `fixUtf8Stream → lines` front, for callers whose chunks already arrive line-aligned).
- `stringer(options?)` (`src/jsonl/stringer.js`) — function-pipeline flushable that serializes values to JSONL fragments. Used as the canonical building block; `stringerStream` (Node Transform) and `stringerWebStream` (Web TransformStream) keep their existing Transform-shape contracts for stream consumers.
- `stringerStream(options?)` — Node `Transform` stream that serializes objects to JSONL format.
- Factory-bundled entries (in `src/node/jsonl/` and `src/web/jsonl/`): one factory carrying `.asStream` (Node `Duplex`, node entry only) and `.asWebStream` (Web pair) as methods — `jsonlParser.asStream()`, `jsonlStringer.asWebStream()`, etc. The node parser/stringer attach both adapters; the web entries omit `.asStream` and never import `node:stream` (browser-safe). The two `index.js` barrels export `{jsonlParser, jsonlStringer}` and are exposed as the `stream-chain/node/jsonl` and `stream-chain/web/jsonl` subpaths. These delegate to the suffixed adapters above; they exist so stream-json's deprecated JSONL can be migrated to stream-chain by changing only the import specifier.
- File-edge composites (Node-only, in `src/jsonl/file/`): `parseFile(options)` returns `gen(asyncBlockReader, parser)` — drive with `pipe(...)` and a path; `stringerToFile(path, options)` returns `gen(stringer, asyncBlockWriter)` — drive with `pipe(...)` so the writer's flushable closes the file. Round-trip via `pipe(parseFile(), r => r.value, stringerToFile(out))` is ~40% faster than the equivalent `fs streams + parserStream + stringerStream` pipeline on 50k-row fixtures (see `bench/jsonl-file.js`). The gain comes from collapsing the per-chunk Transform/Writable boundaries into one fused executor.
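
The `lines → JSON.parse` part of the parser pipeline can be pictured with a small stand-in. It is a sketch under stated assumptions, not the library's parser: `makeJsonlParser` is a hypothetical name, chunks are assumed to be already-decoded strings, and empty-line handling, `reviver`, `ignoreErrors`, and `errorIndicator` are all left out.

```js
// Hypothetical stand-in: split string chunks into lines, parse each, number the records.
const makeJsonlParser = () => {
  let rest = ''; // trailing partial line carried over to the next chunk
  let key = 0; // zero-based record counter

  const parseLines = lines => lines.map(line => ({key: key++, value: JSON.parse(line)}));

  return {
    // Called per chunk: emits every complete line seen so far.
    write(chunk) {
      const parts = (rest + chunk).split('\n');
      rest = parts.pop(); // the last piece may be incomplete
      return parseLines(parts);
    },
    // Called at end of input: emits the final unterminated line, if any.
    flush() {
      const tail = rest;
      rest = '';
      return tail ? parseLines([tail]) : [];
    }
  };
};

const jsonl = makeJsonlParser();
const records = [...jsonl.write('{"a":1}\n{"b"'), ...jsonl.write(':2}\n{"c":3}'), ...jsonl.flush()];
console.log(records.map(r => r.key)); // prints [ 0, 1, 2 ]
```

The `flush()` step plays the role a flushable function has in a chain: it runs once at end of input to release buffered state.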
### Utility functions
All utilities return functions or constructors suitable for use in `chain()`:
- **Slicing**: `take`, `takeWhile`, `takeWithSkip`, `skip`, `skipWhile`
- **Folding**: `fold` (reduce to single value at end), `scan` (emit running accumulator), `reduce` (alias for fold), `reduceStream` (Node Writable stream with `.accumulator`), `reduceWebStream` (Web WritableStream, returned as `{writable, result, accumulator}`)
- **Batching**: `batch(size)` — group items into arrays
- **Stream helpers**: `readableFrom` (iterable → Readable), `readableWebStreamFrom` (iterable → Web ReadableStream), `fixUtf8Stream` (UTF-8 repartitioning), `lines` (byte stream → line stream)
- **Async-iterator wrappers**: `makeStreamPuller` (Node Readable), `makeWebStreamPuller` (Web ReadableStream)
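
The difference between `scan` and `fold` is easiest to see side by side. The stand-ins below are hypothetical (`scanSketch`, `foldSketch`), the `(accumulator, value)` argument order is an assumption, and the `write`/`flush` object only imitates how a flushable stage would release its result at end of input.

```js
// Running accumulator: one output per input.
const scanSketch = (fn, acc) => value => (acc = fn(acc, value));

// Final accumulator only: nothing per input, one value at the end.
const foldSketch = (fn, acc) => ({
  write(value) {
    acc = fn(acc, value);
  },
  flush() {
    return acc;
  }
});

const add = (a, b) => a + b;
const running = [1, 2, 3, 4].map(scanSketch(add, 0));
const folder = foldSketch(add, 0);
for (const value of [1, 2, 3, 4]) folder.write(value);
const total = folder.flush();
console.log(running, total); // prints [ 1, 3, 6, 10 ] 10
```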
## Module dependency graph
```
src/index.js (= /node) ── src/defs.js, src/gen.js, src/asStream.js, src/asWebStream.js
src/node/index.js ── src/index.js (thin re-export)
src/web/index.js ── src/defs.js, src/gen.js, src/fun.js, src/asWebStream.js
src/core/index.js ── src/defs.js, src/gen.js, src/fun.js
src/exec.js ── src/defs.js # shared sync-when-possible executor
src/asStream.js ── src/defs.js, src/exec.js
src/asWebStream.js ── src/defs.js, src/exec.js
src/gen.js ── src/defs.js, src/exec.js
src/fun.js ── src/defs.js, src/exec.js
src/jsonl/parser.js ── src/gen.js, src/utils/fixUtf8Stream.js, src/utils/lines.js
src/jsonl/parserStream.js ── src/jsonl/parser.js, src/asStream.js
src/jsonl/stringerStream.js (standalone Transform)
src/utils/* ── src/defs.js (most utilities use none, stop, many, flushable)
src/utils/streamPuller.js ── (just wraps stream.iterator())
src/utils/webStreamPuller.js ── (just wraps stream[Symbol.asyncIterator]())
```
## Testing
- **Framework**: tape-six (`tape6`)
- **Run all**: `npm test` (parallel workers via `tape6 --flags FO`)
- **Run single file**: `node tests/<dir>/test-<name>.js` (`<dir>` is `core`, `web`, or `node`)
- **Run with Bun**: `npm run test:bun`
- **Run with Deno**: `npm run test:deno`
- **Run sequential**: `npm run test:seq` (also `test:seq:bun`, `test:seq:deno`)
- **TypeScript check**: `npm run ts-check`
- **JavaScript type check (dual tsconfig)**: `npm run js-check`
- **TypeScript tests**: `npm run ts-test` (also `ts-test:bun`, `ts-test:deno`)
- **Lint**: `npm run lint` (Prettier check)
- **Lint fix**: `npm run lint:fix` (Prettier write)
## Benchmarks
Benchmarks use [nano-benchmark](https://www.npmjs.com/package/nano-benchmark). Run a benchmark by specifying its file:
```bash
npm run bench -- bench/<name>.js
```
### Key benchmark files
| File | What it measures |
| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `bench/chain-1-stage.js` | 1-stage chain: `/node` (`asStream(gen(...))`) vs `/web` (`asWebStream(gen(...))`) on the same function pipeline. Shows substrate overhead. |
| `bench/chain-2-stage.js` | Same as above but 2 stages. Shows per-composition overhead. |
| `bench/raw-streams.js` | Raw Node Duplex vs raw Web Streams duplex (`{readable, writable}`). Substrate baseline without `stream-chain` involvement. |
| `bench/raw-streams-burst.js` | Web-streams burst-enqueue behavior: synchronous enqueue-then-drain vs per-item drain. |
| `bench/core-chain.js` | `/core` chain throughput — no substrate cost; isolates `gen()`'s overhead. |
| `bench/gen-fun-stream.js` | Compares `gen()`, `fun()`, and `chain(asStream(...))` on the same pipeline of sync functions. |
| `bench/gen-fun.js` | Head-to-head `gen()` vs `fun()` without stream overhead. |
| `bench/gen-opt.js` | `gen()` function-list inlining optimization: flat vs nested-with-inlining vs nested-with-`clearFunctionList()`. |
| `bench/fun-opt.js` | Same as `gen-opt.js` but for `fun()`. |
All benchmarks use a pipeline of simple sync arithmetic functions (`x => x - 2`, `x => x + 1`, etc.) to isolate framework overhead from application logic.
## Import paths
```js
// Main API (default = /node; ESM)
import chain from 'stream-chain';
import {chain, none, stop, many, gen, asStream, asWebStream, dataSource} from 'stream-chain';
// CJS — destructure required (no bare-callable fallback in 4.x)
const {chain} = require('stream-chain');
// Substrate variants
import chain from 'stream-chain/node'; // same as the default
import chain from 'stream-chain/web'; // native Web Streams chain
import chain from 'stream-chain/core'; // substrate-free async-iterable chain
// Individual modules
import gen from 'stream-chain/gen.js';
import fun from 'stream-chain/fun.js';
import asStream from 'stream-chain/asStream.js';
import asWebStream from 'stream-chain/asWebStream.js';
import {none, stop, many, finalValue, flushable, isReadableWebStream} from 'stream-chain/defs.js';
// Utilities
import take from 'stream-chain/utils/take.js';
import fold from 'stream-chain/utils/fold.js';
import batch from 'stream-chain/utils/batch.js';
import makeStreamPuller from 'stream-chain/utils/streamPuller.js';
import makeWebStreamPuller from 'stream-chain/utils/webStreamPuller.js';
// JSONL
import parser from 'stream-chain/jsonl/parser.js';
import parserStream from 'stream-chain/jsonl/parserStream.js';
import stringerStream from 'stream-chain/jsonl/stringerStream.js';
// TypeScript helpers
import {TypedTransform} from 'stream-chain/typed-streams.js';
```