// @shutterstock/p-map-iterable
// Set of classes used for async prefetching with backpressure (IterableMapper) and async flushing with backpressure (IterableQueueMapper, IterableQueueMapperSimple)
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.IterableMapper = void 0;
//
// 2021-08-25 - Initially based on: https://raw.githubusercontent.com/sindresorhus/p-map/main/index.js
//
// eslint-disable-next-line @typescript-eslint/no-var-requires
const AggregateError = require('aggregate-error');
const iterable_queue_1 = require("./iterable-queue");
/**
* Iterates over a source iterable / generator with specified `concurrency`,
* calling the `mapper` on each iterated item, and storing the
* `mapper` result in a queue of `maxUnread` size, before
* being iterated / read by the caller.
*
* @remarks
*
* ### Typical Use Case
* - Prefetching items from an async I/O source
* - In the simple sequential (`concurrency: 1`) case, allows items to be prefetched async, preserving order, while caller processes an item
* - Can allow parallel prefetches for sources that allow for out of order reads (`concurrency: 2+`)
* - Prevents the producer from racing ahead of the consumer if `maxUnread` is reached
*
* ### Error Handling
* The mapper should ideally handle all errors internally to enable error handling
* closest to where they occur. However, if errors do escape the mapper:
*
* When `stopOnMapperError` is true (default):
* - First error immediately stops processing
* - Error is thrown from the `AsyncIterator`'s next() call
*
* When `stopOnMapperError` is false:
* - Processing continues despite errors
* - All errors are collected and thrown together
* - Errors are thrown as `AggregateError` after all items complete
*
* ### Usage
* - Items are exposed to the `mapper` via an iterator or async iterator (this includes generator and async generator functions)
 * - IMPORTANT: the `mapper` will not be invoked when `maxUnread` is reached, until items are consumed
* - The iterable will set `done` when the `input` has indicated `done` and all `mapper` promises have resolved
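 *
 * @example
 *
 * ### Accepting Generator Input
 *
 * `input` may be a sync iterable, an async iterable, or a generator / async generator.
 * A minimal sketch (the `source` object is illustrative) using an async generator as the input:
 *
 * ```typescript
 * async function* sourceIdGenerator() {
 *   for (let id = 1; id <= 1000; id++) yield id;
 * }
 * const prefetcher = new IterableMapper(sourceIdGenerator(),
 *   async (sourceId) => source.read(sourceId),
 *   { concurrency: 1, maxUnread: 10 }
 * );
 * ```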
*
* @example
*
* ### Typical Processing Loop without `IterableMapper`
*
* ```typescript
* const source = new SomeSource();
* const sourceIds = [1, 2,... 1000];
* const sink = new SomeSink();
* for (const sourceId of sourceIds) {
* const item = await source.read(sourceId); // takes 300 ms of I/O wait, no CPU
* const outputItem = doSomeOperation(item); // takes 20 ms of CPU
* await sink.write(outputItem); // takes 500 ms of I/O wait, no CPU
* }
* ```
*
* Each iteration takes 820ms total, but we waste time waiting for I/O.
* We could prefetch the next read (300ms) while processing (20ms) and writing (500ms),
* without changing the order of reads or writes.
*
* @example
*
* ### Using `IterableMapper` as Prefetcher with Blocking Sequential Writes
*
 * `concurrency: 1` on the prefetcher preserves the order of the reads, while writes remain sequential and blocking (unchanged).
*
* ```typescript
* const source = new SomeSource();
* const sourceIds = [1, 2,... 1000];
 * // Pre-reads up to 10 items serially and releases in sequential order
* const sourcePrefetcher = new IterableMapper(sourceIds,
* async (sourceId) => source.read(sourceId),
* { concurrency: 1, maxUnread: 10 }
* );
* const sink = new SomeSink();
* for await (const item of sourcePrefetcher) { // may not block for fast sources
* const outputItem = doSomeOperation(item); // takes 20 ms of CPU
* await sink.write(outputItem); // takes 500 ms of I/O wait, no CPU
* }
* ```
*
* This reduces iteration time to 520ms by overlapping reads with processing/writing.
*
* @example
*
* ### Using `IterableMapper` as Prefetcher with Background Sequential Writes with `IterableQueueMapperSimple`
*
* `concurrency: 1` on the prefetcher preserves the order of the reads.
* `concurrency: 1` on the flusher preserves the order of the writes, but allows the loop to iterate while last write is completing.
*
* ```typescript
* const source = new SomeSource();
* const sourceIds = [1, 2,... 1000];
* const sourcePrefetcher = new IterableMapper(sourceIds,
* async (sourceId) => source.read(sourceId),
* { concurrency: 1, maxUnread: 10 }
* );
* const sink = new SomeSink();
* const flusher = new IterableQueueMapperSimple(
* async (outputItem) => sink.write(outputItem),
* { concurrency: 1 }
* );
* for await (const item of sourcePrefetcher) { // may not block for fast sources
* const outputItem = doSomeOperation(item); // takes 20 ms of CPU
* await flusher.enqueue(outputItem); // will periodically block for portion of write time
* }
* // Wait for all writes to complete
* await flusher.onIdle();
* // Check for errors
* if (flusher.errors.length > 0) {
* // ...
* }
* ```
*
 * This reduces iteration time to about `max(max(readTime, writeTime) - cpuOpTime, cpuOpTime)`
* by overlapping reads and writes with the CPU processing step.
* In this contrived example, the loop time is reduced to 500ms - 20ms = 480ms.
* In cases where the CPU usage time is higher, the impact can be greater.
*
* @example
*
* ### Using `IterableMapper` as Prefetcher with Out of Order Reads and Background Out of Order Writes with `IterableQueueMapperSimple`
*
* For maximum throughput, allow out of order reads and writes with
* `IterableQueueMapper` (to iterate results with backpressure when too many unread items) or
 * `IterableQueueMapperSimple` (to collect errors for inspection at the end, without custom iteration, applying backpressure that blocks further enqueues while `concurrency` items are in process):
*
* ```typescript
* const source = new SomeSource();
* const sourceIds = [1, 2,... 1000];
* const sourcePrefetcher = new IterableMapper(sourceIds,
* async (sourceId) => source.read(sourceId),
* { concurrency: 10, maxUnread: 20 }
* );
* const sink = new SomeSink();
* const flusher = new IterableQueueMapperSimple(
* async (outputItem) => sink.write(outputItem),
* { concurrency: 10 }
* );
* for await (const item of sourcePrefetcher) { // typically will not block
* const outputItem = doSomeOperation(item); // takes 20 ms of CPU
* await flusher.enqueue(outputItem); // typically will not block
* }
* // Wait for all writes to complete
* await flusher.onIdle();
* // Check for errors
* if (flusher.errors.length > 0) {
* // ...
* }
* ```
*
* This reduces iteration time to about 20ms by overlapping reads and writes with the CPU processing step.
* In this contrived (but common) example we would get a 41x improvement in throughput, removing 97.5% of
* the time to process each item and fully utilizing the CPU time available in the JS event loop.
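 *
 * @example
 *
 * ### Collecting Mapper Errors with `stopOnMapperError: false`
 *
 * A minimal sketch (the `fetchItem` function is illustrative and may reject)
 * showing how, with `stopOnMapperError: false`, successful items are still
 * delivered while all mapper errors are collected and thrown together as an
 * `AggregateError` once the input is exhausted:
 *
 * ```typescript
 * const mapper = new IterableMapper(sourceIds,
 *   async (sourceId) => fetchItem(sourceId), // may reject for some ids
 *   { concurrency: 4, maxUnread: 8, stopOnMapperError: false }
 * );
 * try {
 *   for await (const item of mapper) {
 *     // successful results arrive here
 *   }
 * } catch (err) {
 *   // err is an AggregateError containing every mapper error
 * }
 * ```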
*
* @category Iterable Input
*/
class IterableMapper {
/**
* Create a new `IterableMapper`
*
* @param input Iterated over concurrently, or serially, in the `mapper` function.
* @param mapper Function called for every item in `input`. Returns a `Promise` or value.
* @param options IterableMapper options
*
* @see {@link IterableQueueMapper} for full class documentation
*/
constructor(input, mapper, options = {}) {
this._errors = [];
this._asyncIterator = false;
this._isRejected = false;
this._isIterableDone = false;
this._activeRunners = 0;
this._resolvingCount = 0;
this._currentIndex = 0;
this._initialRunnersCreated = false;
const { concurrency = 4, stopOnMapperError = true, maxUnread = 8 } = options;
this._mapper = mapper;
this._options = { concurrency, stopOnMapperError, maxUnread };
if (typeof mapper !== 'function') {
throw new TypeError('Mapper function is required');
}
// Avoid undefined errors on options
if (this._options.concurrency === undefined ||
this._options.stopOnMapperError === undefined ||
this._options.maxUnread === undefined) {
throw new TypeError('Options are malformed after init');
}
// Validate concurrency option
if (!((Number.isSafeInteger(this._options.concurrency) ||
this._options.concurrency === Number.POSITIVE_INFINITY) &&
this._options.concurrency >= 1)) {
throw new TypeError(`Expected \`concurrency\` to be an integer from 1 and up or \`Infinity\`, got \`${concurrency}\` (${typeof concurrency})`);
}
// Validate maxUnread option
if (!((Number.isSafeInteger(this._options.maxUnread) ||
this._options.maxUnread === Number.POSITIVE_INFINITY) &&
this._options.maxUnread >= 1)) {
throw new TypeError(`Expected \`maxUnread\` to be an integer from 1 and up or \`Infinity\`, got \`${maxUnread}\` (${typeof maxUnread})`);
}
// Validate relationship between maxUnread and concurrency
if (this._options.maxUnread < this._options.concurrency) {
throw new TypeError(`Expected \`maxUnread\` to be greater than or equal to \`concurrency\`, got \`${maxUnread}\` < \`${concurrency}\``);
}
this._unreadQueue = new iterable_queue_1.IterableQueue({ maxUnread });
// Setup the source iterator
if (input[Symbol.asyncIterator] !== undefined) {
// We've got an async iterable
this._iterator = input[Symbol.asyncIterator]();
this._asyncIterator = true;
}
else {
this._iterator = input[Symbol.iterator]();
}
// Create the initial concurrent runners in a detached (non-awaited)
// promise. We need this so we can await the next() calls
// to stop creating runners before hitting the concurrency limit
// if the iterable has already been marked as done.
void (async () => {
for (let index = 0; index < concurrency; index++) {
// Setup the detached runner
this._activeRunners++;
// This only waits for the next source item to be iterated
            // It does NOT wait for the mapper to be called, or for a consumer to pick up
// the result out of the unread queue.
await this.sourceNext();
if (this._isIterableDone || this._isRejected) {
break;
}
}
// Signal that the next() function should now create runners if it sees too few of them
this._initialRunnersCreated = true;
})();
}
[Symbol.asyncIterator]() {
return this;
}
/**
* Used by the iterator returned from [Symbol.asyncIterator]
* Called every time an item is needed
*
* @returns Iterator result
*/
async next() {
// Bail out and release all waiters if there are no more items coming
const done = this.areWeDone();
if (done) {
if (!this._options.stopOnMapperError && this._errors.length > 0) {
// throw the errors as an aggregate exception
this._isRejected = true;
throw new AggregateError(this._errors);
}
return { value: undefined, done };
}
// Check if queue has an item
const item = await this._unreadQueue.dequeue();
if (item === undefined) {
// We finished - There were no more items
this.bubbleUpErrors();
return { value: undefined, done: true };
}
this.startARunnerIfNeeded();
return { value: this.throwIfError(item), done: false };
}
bubbleUpErrors() {
if (!this._options.stopOnMapperError && this._errors.length > 0) {
// throw the errors as an aggregate exception
throw new AggregateError(this._errors);
}
}
startARunnerIfNeeded() {
// If there are items left AND there are not enough runners running,
// start one more runner - each subsequent read will check this and start more runners
// as items are pulled from the queue
if (this._initialRunnersCreated) {
// The init loop has finished - we don't create runners until that loop
// has finished else we'll end up with too many runners
if (!this._isIterableDone && !this._isRejected) {
// We only create more runners if the source iterable is not already done
if (this._activeRunners < this._options.concurrency) {
// We only create runners if we're under the concurrency limit
if (this._unreadQueue.length + this._activeRunners <= this._options.maxUnread) {
// We only create runners if the number of runners + unread items will not
// exceed the unread queue length
// Start another source runner, but do not await it
this.startAnotherRunner();
}
}
}
}
}
startAnotherRunner() {
if (this._activeRunners === this._options.concurrency) {
throw new TypeError('active runners would be greater than concurrency limit');
}
if (this._activeRunners + this._unreadQueue.length > this._options.maxUnread) {
throw new TypeError('active runners would overflow the read queue limit');
}
if (this._isIterableDone) {
throw new TypeError('runner should not be started when iterable is already done');
}
if (this._activeRunners < 0) {
throw new TypeError('active runners is less than 0');
}
// We only create runners if the number of runners + unread items will not
// exceed the unread queue length
this._activeRunners++;
// Start another source runner, but do not await it
void this.sourceNext();
}
areWeDone() {
if (this._isIterableDone) {
// The source iterable has no more items
if (this._resolvingCount === 0) {
// There are no more resolvers running
if (this._unreadQueue.length === 0) {
// There are no unread items left
this._unreadQueue.done();
return true;
}
}
}
return false;
}
/**
* Throw an exception if the wrapped NewElement is an Error
*
* @returns Element if no error
*/
throwIfError(item) {
if (item.error !== undefined) {
throw item.error;
}
else if (item.element === undefined) {
throw new TypeError('no element was returned');
}
return item.element;
}
/**
* Get the next item from the `input` iterable.
*
* @remarks
*
* This is called up to `concurrency` times in parallel.
*
* If the read queue is not full, and there are source items to read,
* each instance of this will keep calling a new instance of itself
* that detaches and runs asynchronously (keeping the same number
* of instances running).
*
* If the read queue + runners = max read queue length then the runner
* will exit and will be restarted when an item is read from the queue.
*/
async sourceNext() {
if (this._isRejected) {
this._activeRunners--;
return;
}
// Note: do NOT await a non-async iterable as it will cause next() to be
// pushed into the event loop, slowing down iteration of non-async iterables.
let nextItem;
try {
if (this._asyncIterator) {
nextItem = await this._iterator.next();
}
else {
nextItem = this._iterator.next();
}
}
catch (error) {
// Iterator protocol / Iterables can throw exceptions - If this happens we have to just stop
// regardless of stopOnMapperError since we can't iterate any additional items
this._isRejected = true;
this._activeRunners--;
// Push the error onto the unread queue, to be rethrown by next()
await this._unreadQueue.enqueue({ error });
return;
}
const index = this._currentIndex;
this._currentIndex++;
if (nextItem.done) {
this._isIterableDone = true;
// If there are no active resolvers, then release all the waiters
if (this._resolvingCount === 0) {
// At this point the only waiters in the queue are not going to get an item
// as there are no source items left
this._unreadQueue.done();
}
this._activeRunners--;
return;
}
this._resolvingCount++;
// This is created as a detached, non-awaited async
// to allow next() to return while the async mapper is awaited.
// More next() calls will be made up to the concurrency limit.
void (async () => {
//
// Push an item or error into the read queue
// Note: once we push an item we end this try/catch as any subsequent errors
// are errors in this class and not errors thrown by the mapper function.
// Once we've pushed an item we can't also push an error...
//
try {
const element = nextItem.value;
if (this._isRejected) {
this._activeRunners--;
return;
}
const value = await this._mapper(element, index);
this._resolvingCount--;
// Push item onto the ready queue
await this._unreadQueue.enqueue({ element: value });
// eslint-disable-next-line @typescript-eslint/no-explicit-any
}
catch (error) {
if (this._options.stopOnMapperError) {
this._isRejected = true;
this._activeRunners--;
// Push the error onto the unread queue, to be rethrown by next()
await this._unreadQueue.enqueue({ error });
// Fall through to release a reader
}
else {
// Collect the error but do not stop iterating
// These will be thrown in an AggregateError at the end
this._errors.push(error);
this._resolvingCount--;
await this.sourceNext();
// Return so we don't release a reader since we didn't push an item
return;
}
}
//
// Tasks below are not related to the mapper
//
// Bail if read queue length + active runners will hit max unread
if (this._unreadQueue.length + this._activeRunners > this._options.maxUnread) {
this._activeRunners--;
return;
}
// Start myself again
// Note: this will bail out if it reaches the end of the source iterable
await this.sourceNext();
})();
}
}
exports.IterableMapper = IterableMapper;
//# sourceMappingURL=iterable-mapper.js.map