@convex-dev/workpool

A Convex component for managing async work.

# Workpool: implementation notes and high-level architecture

Concepts:

- `segment`: A slice of time to process work. All work is bucketed into one. This enables us to batch work and avoid database conflicts.
- `generation`: A monotonically increasing counter to ensure the loop is only running one instance. If two loops start with the same generation, one will successfully increase it, the other will retry and find that the generation has changed and fail out.
- "Retention" is used to refer to situations where a query might have to read over a lot of "tombstones" - deleted data that hasn't been vacuumed from the underlying database yet. If there are frequent deletions, scanning across them can delay a query. Because of our delete-heavy queuing strategy, we have to be careful. Strategies are below.
- Cursors: A pointer to the last processed place in a table. In our case, they might allow data to be written before them if out-of-order writes happen, so we need to account for finding those "missed" writes on some granularity. We choose to wait until there isn't any immediate work to do before those scans. They help avoid retention issues.

## Data state machine

```mermaid
flowchart LR
    Client -->|enqueue| pendingStart
    Client -->|cancel| pendingCancelation
    complete --> |success or failure| pendingCompletion
    pendingCompletion -->|retry| pendingStart
    pendingStart --> workerRunning["worker running"]
    workerRunning -->|worker finished| complete
    workerRunning --> |recovery| complete
    successfulCancel["AND"]@{shape: delay} --> |canceled| complete
    pendingStart --> successfulCancel
    pendingCancelation --> successfulCancel
```

Notably:

- The pending\* states are written by outside sources.
- The main loop federates changes to/from "running".
- Canceling only impacts pending and retrying jobs.

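The `generation` check described under Concepts can be illustrated with a small sketch. This is not the component's code: `LoopState`, `AdvanceResult`, and `advanceGeneration` are hypothetical names, and a plain in-memory object stands in for the database document, so the transactional retry that makes the second loop re-read the counter is only simulated by calling the function twice.

```typescript
// Hypothetical in-memory stand-in for the singleton document holding the generation.
type LoopState = { generation: number };

type AdvanceResult =
  | { ok: true; generation: number }
  | { ok: false; found: number };

// Bump the generation only if it still equals the value this loop instance
// was started with; otherwise report the value found so the caller can exit.
function advanceGeneration(state: LoopState, expected: number): AdvanceResult {
  if (state.generation !== expected) {
    return { ok: false, found: state.generation };
  }
  state.generation = expected + 1;
  return { ok: true, generation: state.generation };
}

// Two loop instances were both started with generation 7.
const state: LoopState = { generation: 7 };
const first = advanceGeneration(state, 7); // wins and moves the counter to 8
const second = advanceGeneration(state, 7); // sees 8, so it must fail out
```

Returning a result object rather than throwing keeps the "fail out" path explicit for the caller; either style would express the same guard.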
## Loop state machine

```mermaid
flowchart TD
    idle -->|enqueue| running
    running-->|"all started, leftover capacity"| scheduled
    scheduled -->|"enqueue, cancel, saveResult, recovery"| running
    running -->|"maxed out"| saturated
    saturated -->|"cancel, saveResult, recovery"| running
    running-->|"all done"| idle
```

- While the loop is running, the `runStatus` doesn't change, making it safer to read from clients without database conflicts.
- The "saturated" state is concretely "running" or "scheduled" at max parallelism. A boolean set on "scheduled" prevents clients from kicking the main loop on enqueue, which is unlikely to be productive at max parallelism, since the next useful action has to be one that terminates work (cancel, saveResult, or recovery).

## Retention optimization strategy

- Producers (Client, Worker, Recovery) write to a future "segment".
- Consumers (main) read the current segment.
- On conflicts, producers will write to progressively higher segments, while the main loop will continue to read the segment it was originally called with. This means conflicts are less likely on each retry.
- Patch singletons to avoid tombstones.
- Use segments & cursors to bound reads to latest data.
- Do scans outside of the critical path (during load).
- Do point reads otherwise.
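
As a rough illustration of how producers and the main loop end up on different segments, here is a minimal sketch. The 100 ms segment length, the one-segment-per-retry step, and the names `SEGMENT_MS`, `currentSegment`, and `producerSegment` are all assumptions made for the example; the notes above do not state the actual values or helper names.

```typescript
// Assumed segment length in milliseconds; the notes do not state the real value.
const SEGMENT_MS = 100;

// Map a timestamp to the segment (time bucket) that contains it.
function currentSegment(nowMs: number): number {
  return Math.floor(nowMs / SEGMENT_MS);
}

// Producers target a segment after the current one. Each retry after a
// conflict moves one segment further out, away from the segment being read.
function producerSegment(nowMs: number, attempt: number): number {
  return currentSegment(nowMs) + 1 + attempt;
}

// The main loop keeps reading the segment it was originally called with.
const loopSegment = currentSegment(250);
const firstTry = producerSegment(250, 0);
const thirdTry = producerSegment(250, 2);
```

With these assumed numbers, the loop reads segment 2 while the producer's first write lands in segment 3 and its third attempt in segment 5, so each retry is further from the range the loop is touching.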