@convex-dev/workpool
Version:
A Convex component for managing async work.
74 lines (62 loc) • 3.16 kB
Markdown
# Workpool: implementation notes and high-level architecture
Concepts:
- `segment`: A slice of time to process work. All work is bucketed into one.
This enables us to batch work and avoid database conflicts.
- `generation`: A monotonically increasing counter to ensure the loop is
only running one instance. If two loops start with the same generation,
one will successfully increase it, the other will retry and find that the
generation has changed and fail out.
- "Retention" is used to refer to situations where a query might have to read
over a lot of "tombstones" - deleted data that hasn't been vacuumed from the
underlying database yet. If there are frequent deletions, scanning across them
can delay a query. Because of our delete-heavy queuing strategy, we have to be
careful. Strategies are below.
- Cursors: A pointer to the last processed place in a table. In our case, they
might allow data to be written before them if out-of-order writes happen, so
we need to account for finding those "missed" writes on some granularity. We
choose to wait until there isn't any immediate work to do before those scans.
They help avoid retention issues.
## Data state machine
```mermaid
flowchart LR
Client -->|enqueue| pendingStart
Client -->|cancel| pendingCancelation
complete --> |success or failure| pendingCompletion
pendingCompletion -->|retry| pendingStart
pendingStart --> workerRunning["worker running"]
workerRunning -->|worker finished| complete
workerRunning --> |recovery| complete
successfulCancel["AND"]@{shape: delay} --> |canceled| complete
pendingStart --> successfulCancel
pendingCancelation --> successfulCancel
```
Notably:
- The pending\* states are written by outside sources.
- The main loop federates changes to/from "running"
- Canceling only impacts pending and retrying jobs.
## Loop state machine
```mermaid
flowchart TD
idle -->|enqueue| running
running-->|"all started, leftover capacity"| scheduled
scheduled -->|"enqueue, cancel, saveResult, recovery"| running
running -->|"maxed out"| saturated
saturated -->|"cancel, saveResult, recovery"| running
running-->|"all done"| idle
```
- While the loop is running, the runStatus doesn't change, making it safer to
read from clients without database conflicts.
- The "saturated" state is concretely "running" or "scheduled" at max
parallelism. There is a boolean set on "scheduled" to avoid clients from
kicking the main loop on enqueueing, which is unlikely to be productive, since
the next action needs to be something terminating.
## Retention optimization strategy
- Producers (Client, Worker, Recovery) write to a future "segment".
- Consumers (main) read the current segment.
- On conflicts, producers will write to progressively higher segments, while
the main loop will continue to read the segment originally called with.
This means conflicts are less likely on each retry.
- Patch singletons to avoid tombstones.
- Use segements & cursors to bound reads to latest data.
- Do scans outside of the critical path (during load).
- Do point reads otherwise.