fib-flow
A robust workflow management system for fibjs with task orchestration, state management, and distributed execution capabilities
# API Reference
The API covers handler registration, one-time and cron task creation, task control, and task and audit queries, giving you control over task lifecycles and execution parameters.
## Table of Contents
- [TaskManager](#taskmanager)
  - [Constructor](#constructor)
  - [Task Registration](#task-registration)
  - [Task Options](#task-options)
  - [Task Creation](#task-creation)
  - [Task Control](#task-control)
  - [Task Query](#task-query)
  - [Audit Query](#audit-query)
  - [Handler Audit](#handler-audit)
  - [Task Lifecycle](#task-lifecycle)
## TaskManager
The TaskManager is the core component responsible for managing task lifecycles, scheduling, and execution.
### Constructor
```javascript
/**
* Create a task manager instance
* @param {Object} options Configuration options
* @param {string|object} options.dbConnection Required database connection string or object
* @param {string} [options.dbType] Database type ('sqlite', 'mysql', or 'postgres')
* @param {number} [options.poll_interval=1000] Poll interval in milliseconds
* @param {number} [options.max_retries=3] Default maximum total attempts (including the initial attempt)
* @param {number} [options.retry_interval=0] Default retry interval in seconds
* @param {number} [options.timeout=60] Default task timeout in seconds
* @param {number} [options.max_concurrent_tasks=10] Maximum concurrent tasks
* @param {number} [options.task_heartbeat_interval=5000] Running task heartbeat interval in milliseconds
* @param {number} [options.task_heartbeat_timeout=30000] Timeout window in milliseconds before a running task is treated as stalled
* @param {string} [options.worker_id] Unique identifier for this worker instance (auto-generated if not provided)
* @param {string} [options.pod_id] Stable logical node identifier used for worker recovery and peer fencing
* @param {number} [options.worker_heartbeat_interval=5000] Worker registry heartbeat interval in milliseconds
* @param {number} [options.worker_heartbeat_timeout=30000] Worker liveness timeout window in milliseconds
* @param {boolean} [options.recover_running_jobs=true] Whether startup and peer scans reclaim running jobs owned by dead or superseded workers
* @param {number} [options.expire_time=86400] Time in seconds after which completed/failed tasks are deleted (1 day)
* @param {Object} [options.retention] Explicit retention policy for expired terminal tasks
* @param {number} [options.retention.expire_time] Expiration window in seconds
* @param {Array<string>} [options.retention.statuses] Terminal statuses eligible for cleanup; defaults to ['completed', 'permanently_failed']
*/
new TaskManager(options)
```
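The heartbeat options can be read as a simple liveness rule. The block below is a minimal sketch of that rule, not fib-flow's internal code: `isStalled` is a hypothetical helper, and treating a task as stalled only when the gap is strictly greater than the timeout is an assumption.

```javascript
// Hypothetical helper: decide whether a running task looks stalled.
// All values are in milliseconds, matching task_heartbeat_timeout.
function isStalled(lastHeartbeatAt, now, heartbeatTimeout = 30000) {
    return now - lastHeartbeatAt > heartbeatTimeout;
}

const now = 100000;
console.log(isStalled(now - 5000, now));  // false
console.log(isStalled(now - 45000, now)); // true
```

With the default 5000 ms interval and 30000 ms timeout, a task would have to miss several consecutive heartbeats before it is treated as stalled.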
Worker recovery notes:
- `worker_id` represents a single process instance, not a stable node identity.
- `pod_id` groups multiple worker instances that belong to the same logical node across restarts.
- When `pod_id` is configured, fib-flow maintains a `fib_flow_workers` registry and can reclaim `running` tasks owned by dead or superseded workers without waiting for task timeout.
- Running-task writes are ownership-fenced by `worker_id`, so stale workers cannot safely write back after a task has been recovered.
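To illustrate the ownership fencing described in the last note, here is a minimal in-memory sketch. It is not fib-flow's implementation: `fencedUpdate` and the plain task object are hypothetical, and the real check presumably happens in the database write.

```javascript
// Hypothetical stand-in for a fenced write to a task row.
function fencedUpdate(task, workerId, patch) {
    // Reject the write unless the caller still owns the running task.
    if (task.status !== 'running' || task.worker_id !== workerId) {
        return false;
    }
    Object.assign(task, patch);
    return true;
}

const task = { id: 1, status: 'running', worker_id: 'worker-a' };

// Simulate recovery: ownership moves to another worker (assumed behavior).
task.worker_id = 'worker-b';

console.log(fencedUpdate(task, 'worker-a', { status: 'completed' })); // false
console.log(fencedUpdate(task, 'worker-b', { status: 'completed' })); // true
console.log(task.status); // completed
```

The stale worker's write is dropped rather than applied, so a recovered task cannot be overwritten by its previous owner.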
### Task Registration
Tasks must be registered with handlers before they can be executed. The TaskManager provides flexible handler registration through the `use()` method, and handlers can be updated or removed at runtime.
Runtime semantics:
- A task that is already executing keeps the handler version captured when that execution attempt started.
- A paused or suspended task that resumes later is claimed again and uses the latest registered handler.
- Child tasks created by a running parent are resolved against the live handler registry at creation time.
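The first two rules can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the library's code: `registry` and `startAttempt` are hypothetical names used only for illustration.

```javascript
// Hypothetical in-memory handler registry.
const registry = new Map();
registry.set('greet', () => 'v1');

// An attempt captures whichever handler is registered when it starts.
function startAttempt(taskName) {
    const handler = registry.get(taskName);
    if (!handler) throw new Error(`No handler registered for ${taskName}`);
    return { run: () => handler() };
}

const inFlight = startAttempt('greet');

// Re-registering affects attempts that start later, not the one in flight.
registry.set('greet', () => 'v2');
const later = startAttempt('greet');

console.log(inFlight.run()); // v1
console.log(later.run());    // v2
```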
#### Function Form Registration
```javascript
/**
* Register a task handler using function form
* @param {string} taskName Task type identifier
* @param {Function} handler Async function(task, next) to handle task execution
*/
use(taskName, handler)
```
Example:
```javascript
taskManager.use('processImage', async (task) => {
    const { path } = task.payload;
    // Process single image
    return { processed: true };
});
```
#### Object Form Registration
```javascript
/**
 * Register a task handler using object form with options
 * @param {string} taskName Task type identifier
 * @param {Object} config Handler configuration object
 * @param {Function} config.handler Async function(task, next) to handle task execution
 * @param {number} [config.timeout] Default timeout in seconds for this task type
 * @param {number} [config.max_retries] Default maximum total attempts for this task type
 * @param {number} [config.retry_interval] Default retry interval in seconds for this task type
 * @param {number} [config.priority] Default priority level for this task type
 * @param {number} [config.max_concurrent_tasks] Maximum number of concurrent tasks of this type
 */
use(taskName, config)
```
Example:
```javascript
taskManager.use('processImage', {
    // Handler function implementation
    handler: async (task) => {
        const { path } = task.payload;
        // Process single image
        return { processed: true };
    },
    // Task type specific defaults
    timeout: 120,       // 2 minutes timeout
    max_retries: 2,     // Maximum 2 total attempts
    retry_interval: 30, // Retry every 30 seconds
    priority: 5         // Higher priority tasks
});
```
#### Bulk Task Registration
```javascript
/**
 * Register multiple task handlers at once
 * @param {Object} handlersMap Object mapping task types to handlers/configs
 */
use(handlersMap)
```
Example:
```javascript
taskManager.use({
    // Function form handlers
    processText: async (task) => {
        return { processed: true };
    },
    // Object form handlers with options
    processImage: {
        handler: async (task) => {
            return { processed: true };
        },
        timeout: 120,
        max_retries: 2
    },
    processVideo: {
        handler: async (task) => {
            return { processed: true };
        },
        timeout: 300,
        priority: 3
    }
});
```
#### Handler Removal
```javascript
/**
* Unregister one or more task handlers
* @param {string|string[]} taskName Task type identifier or identifiers
* @returns {number} Number of handlers removed
*/
unuse(taskName)
```
Example:
```javascript
taskManager.unuse('processImage');
taskManager.unuse([
    'processVideo',
    'processAudio'
]);
```
`unuse()` only affects future work selection. It does not interrupt a task attempt that is already executing.
#### Handler Options
When registering a task handler using the object form, you can specify the following options:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| handler | Function | Required | The async function that processes the task |
| timeout | Number | 60 | Task execution timeout in seconds |
| max_retries | Number | 3 | Maximum total attempts for tasks (including initial attempt) |
| retry_interval | Number | 0 | Delay between retries in seconds |
| priority | Number | - | Default priority for all tasks of this type |
| max_concurrent_tasks | Number | - | Maximum number of concurrent tasks of this type |
Notes:
- Options specified during handler registration become the defaults for that task type
- These defaults can be overridden when creating individual tasks
- Handler options take precedence over global TaskManager options
- If a handler is registered as a function, it will use the global TaskManager options
- When max_concurrent_tasks is set, the system will ensure no more than that many tasks of this type run simultaneously
- Handler schema metadata is not supported; validate payloads inside the handler when needed
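The per-type concurrency note can be sketched as an admission check. This is only an illustration: `canStart` and the `running` array are hypothetical, and the assumption that the per-type limit applies in addition to the global `max_concurrent_tasks` is not confirmed by this reference.

```javascript
// running: names of the tasks currently executing (hypothetical shape).
function canStart(name, running, globalLimit, typeLimits = {}) {
    // The global limit is checked first (assumed ordering).
    if (running.length >= globalLimit) return false;
    const typeLimit = typeLimits[name];
    if (typeLimit === undefined) return true;
    const sameType = running.filter((n) => n === name).length;
    return sameType < typeLimit;
}

const running = ['processImage', 'processImage', 'processText'];
console.log(canStart('processImage', running, 10, { processImage: 2 })); // false
console.log(canStart('processText', running, 10, { processImage: 2 }));  // true
console.log(canStart('processText', running, 3));                        // false
```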
### Task Options
Task execution can be configured through three levels:
1. **Global Configuration** (TaskManager level)
```javascript
const taskManager = new TaskManager({
    poll_interval: 1000, // Poll interval in milliseconds
    max_retries: 3, // Maximum total attempts (including initial attempt)
    retry_interval: 0, // No delay between retries
    timeout: 60, // Default task timeout in seconds
    max_concurrent_tasks: 10, // Maximum concurrent tasks
    task_heartbeat_interval: 5000, // Running task heartbeat interval
    task_heartbeat_timeout: 30000, // Running task heartbeat timeout window
    pod_id: 'scheduler-a', // Stable logical node identity for worker recovery
    worker_heartbeat_interval: 5000, // Worker registry heartbeat interval
    worker_heartbeat_timeout: 30000, // Worker liveness timeout window in milliseconds
    recover_running_jobs: true, // Reclaim running jobs from dead or superseded workers
    expire_time: 86400, // Backward-compatible retention shortcut
    retention: {
        expire_time: 86400,
        statuses: ['completed', 'permanently_failed']
    }
});
```
2. **Task Type Configuration** (Handler registration level)
```javascript
taskManager.use('processImage', {
    handler: async (task) => { /* ... */ },
    timeout: 120, // 2 minutes timeout
    max_retries: 2, // Maximum 2 total attempts
    retry_interval: 30, // 30 seconds retry interval
    priority: 5, // Higher priority tasks
    max_concurrent_tasks: 5 // Max 5 concurrent tasks of this type
});
```
3. **Task Instance Configuration** (Task creation level)
```javascript
taskManager.async('processImage', payload, {
    timeout: 180, // Override timeout for this task
    max_retries: 5, // Override retry attempts
    retry_interval: 60 // Override retry interval
});
```
Configuration Priority (highest to lowest):
1. Task Instance Options
2. Task Type (Handler) Options
3. Global TaskManager Options
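The priority order above amounts to a layered merge. The following is a minimal sketch of that idea, assuming unset (`undefined`) values fall through to the next layer; `resolveOptions` is a hypothetical helper, not part of the fib-flow API.

```javascript
// Merge option layers; later layers win. Undefined values are skipped so an
// unset instance option does not mask a handler or global default.
function resolveOptions(globalOpts, handlerOpts = {}, taskOpts = {}) {
    const resolved = {};
    for (const layer of [globalOpts, handlerOpts, taskOpts]) {
        for (const [key, value] of Object.entries(layer)) {
            if (value !== undefined) resolved[key] = value;
        }
    }
    return resolved;
}

const resolved = resolveOptions(
    { timeout: 60, max_retries: 3, retry_interval: 0 }, // global
    { timeout: 120, max_retries: 2 },                   // handler
    { timeout: 180 }                                    // task instance
);
console.log(resolved); // { timeout: 180, max_retries: 2, retry_interval: 0 }
```

Here `timeout` comes from the task instance, `max_retries` from the handler registration, and `retry_interval` from the global configuration.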
### Task Creation
Tasks can be created in two modes: async (one-time) tasks and cron (scheduled) tasks. Each task can be configured with specific execution parameters.
```javascript
/**
 * Create an async task
 * @param {string} taskName Task type
 * @param {Object} payload Task data
 * @param {Object} [options] Task options
 * @param {number} [options.delay] Delay in seconds
 * @param {number} [options.priority] Priority level
 * @param {number} [options.timeout] Timeout in seconds
 * @param {number} [options.max_retries] Max retry attempts
 * @param {number} [options.retry_interval] Retry interval in seconds
 * @param {string} [options.tag] Task tag for categorization
 */
async(taskName, payload, options)

/**
 * Create a cron task
 * @param {string} taskName Task type
 * @param {string} cronExpr Cron expression
 * @param {Object} payload Task data
 * @param {Object} [options] Same as async task options
 */
cron(taskName, cronExpr, payload, options)
```
### Task Control
Task control methods provide ways to manage the TaskManager instance and individual task execution.
```javascript
/**
* Start the TaskManager and begin processing tasks
* Initializes task polling and monitoring
* @throws {Error} If TaskManager is already stopped
*/
start()
/**
* Stop the TaskManager and cleanup resources
* Waits for running tasks to complete and closes database connections
*/
stop()
/**
* Pause task processing without stopping the TaskManager
* Tasks in progress will complete, but new tasks won't be started
*/
pause()
/**
* Resume task processing after a pause
*/
resume()
/**
* Resume a specific paused task by ID
* @param {string} taskId Task ID
*/
resumeTask(taskId)
/**
* Pause a specific running task by ID
* @param {string} taskId Task ID
*/
pauseTask(taskId)
/**
* Run retention cleanup for expired terminal tasks and their audit records
* @param {Object} [policy] Optional retention policy override
*/
runRetention(policy)
```
`expire_time` remains supported as a backward-compatible shortcut. Prefer `retention` when you need explicit control over retention statuses or want to make the cleanup policy obvious in configuration.
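As a rough illustration of how a retention policy selects tasks, consider the sketch below. It is not the library's cleanup logic: `isExpired` is hypothetical, `finished_at` is an assumed field name for the time (in seconds) a task reached its terminal status, and the `>=` comparison is an assumption.

```javascript
// Hypothetical eligibility check for retention cleanup.
function isExpired(task, retention, nowSeconds) {
    // Fall back to the documented default statuses.
    const statuses = retention.statuses || ['completed', 'permanently_failed'];
    if (!statuses.includes(task.status)) return false;
    return nowSeconds - task.finished_at >= retention.expire_time;
}

const policy = { expire_time: 86400 };
const now = 200000;
console.log(isExpired({ status: 'completed', finished_at: 100000 }, policy, now)); // true
console.log(isExpired({ status: 'completed', finished_at: 150000 }, policy, now)); // false
console.log(isExpired({ status: 'running', finished_at: 100000 }, policy, now));   // false
```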
### Task Query
Query methods allow you to retrieve task information and monitor task status across the system.
```javascript
// Get tasks with multiple filter conditions
getTasks(filters)
// Get a specific task
getTask(taskId)
// Get tasks by name
getTasksByName(name)
// Get tasks by status
getTasksByStatus(status)
// Get child tasks
getChildTasks(parentId)
// Get tasks by tag
getTasksByTag(tag)
// Get task statistics by tag
getTaskStatsByTag(tag, status)
// Delete tasks with multiple filter conditions
deleteTasks(filters)
```
`getTasks()` remains the lightweight snapshot query API. When you need pagination metadata or workflow-scoped task views, use `queryTasks()` from the audit query section.
#### getTasks
The `getTasks` method provides flexible task querying with multiple filter conditions:
```javascript
/**
* Get tasks with multiple filter conditions
* @param {Object} filters Filter conditions
* @param {string} [filters.tag] Filter by tag
* @param {string} [filters.status] Filter by status ("pending", "running", "completed", etc)
* @param {string} [filters.name] Filter by task name
* @returns {Array<Object>} Array of matching tasks
*/
getTasks(filters)
```
Examples:
```javascript
// Get tasks with a specific tag
const taggedTasks = taskManager.getTasks({ tag: "image-processing" });

// Get pending tasks for a specific task type
const pendingImageTasks = taskManager.getTasks({
    name: "processImage",
    status: "pending"
});

// Complex filtering with multiple conditions
const tasks = taskManager.getTasks({
    tag: "batch-1",
    status: "running",
    name: "videoProcess"
});

// Get all tasks (empty filter)
const allTasks = taskManager.getTasks({});
```
Filter Behavior:
- Multiple filters are combined with AND logic
- If a filter is not provided, that condition is not applied
- An empty filters object returns all tasks
- An invalid `status` value throws an error, while invalid `tag` or `name` values are ignored
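The AND semantics and the status validation can be modeled in memory as follows. This is a sketch for illustration only: `filterTasks` is a hypothetical function operating on a plain array, whereas `getTasks()` queries the database. The handling of invalid `tag` or `name` values is not modeled.

```javascript
const VALID_STATUSES = ['pending', 'running', 'completed', 'failed',
    'timeout', 'permanently_failed', 'paused', 'suspended'];

// Hypothetical in-memory equivalent of the filter rules listed above.
function filterTasks(tasks, filters = {}) {
    if (filters.status !== undefined && !VALID_STATUSES.includes(filters.status)) {
        throw new Error(`Invalid status: ${filters.status}`);
    }
    // A task matches only if every provided filter matches (AND logic).
    return tasks.filter((task) =>
        ['tag', 'status', 'name'].every((key) =>
            filters[key] === undefined || task[key] === filters[key]));
}

const tasks = [
    { id: 1, name: 'processImage', status: 'pending', tag: 'batch-1' },
    { id: 2, name: 'processImage', status: 'running', tag: 'batch-1' },
    { id: 3, name: 'videoProcess', status: 'running', tag: 'batch-2' }
];

console.log(filterTasks(tasks, { name: 'processImage', status: 'running' }).map((t) => t.id)); // [ 2 ]
console.log(filterTasks(tasks, {}).length); // 3
```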
Status Values:
- pending: Task waiting to be executed
- running: Task currently being executed
- completed: Task finished successfully
- failed: Task execution failed
- timeout: Task exceeded timeout duration
- permanently_failed: Failed task that exceeded retry attempts
- paused: Task manually paused
- suspended: Parent task waiting for children
### Audit Query
Execution audit APIs expose persisted events, attempts, and structured task/workflow audit views.
For a detailed event catalog and semantics matrix, see [Execution Audit Events](execution-audit-events.md).
For retention scope and current cleanup semantics, see [Audit Retention Policy](audit-retention-policy.md).
```javascript
// Query task events with pagination metadata
queryTaskEvents(taskId, {
    event_type,
    event_types,
    worker_id,
    attempt,
    stage,
    started_after,
    started_before,
    limit,
    offset,
    order
})

// Query workflow events with the same filters
queryWorkflowEvents(rootId, filters)

// Query attempts for a task
queryTaskAttempts(taskId, {
    worker_id,
    outcome, // e.g. completed, failed, timeout, suspended, interrupted
    started_after,
    started_before,
    ended_after,
    ended_before,
    open_only,
    limit,
    offset,
    order
})

// Query attempts for all tasks in a workflow
queryWorkflowAttempts(rootId, {
    worker_id,
    outcome, // e.g. completed, failed, timeout, suspended, interrupted
    started_after,
    started_before,
    ended_after,
    ended_before,
    open_only,
    limit,
    offset,
    order
})

// Query tasks with pagination metadata
queryTasks({
    name,
    status,
    type,
    tag,
    worker_id,
    parent_id,
    root_id,
    workflow_root_id,
    limit,
    offset,
    order
})

// Structured audit views
getTaskAudit(taskId, {
    events: { limit: 50, order: 'asc' },
    attempts: { limit: 20, order: 'asc' }
})
getWorkflowAudit(rootId, {
    tasks: { limit: 100, order: 'asc' },
    events: { limit: 200, order: 'asc' }
})

// Aggregate workflow-level diagnosis
getWorkflowAuditSummary(rootId)
```
Paged audit APIs return this shape:
```javascript
{
    items: [...],
    total: 42,
    limit: 10,
    offset: 0,
    has_more: true
}
```
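A caller can walk all pages by advancing `offset` until `has_more` is false. The sketch below uses a stand-in query function so it runs without a database; `fakeQuery` and `collectAll` are hypothetical names, and computing `has_more` as `offset + items.length < total` is an assumption about how the flag is derived.

```javascript
// Stand-in for a paged query such as queryTasks(); returns the documented shape.
function fakeQuery(rows, { limit = 10, offset = 0 } = {}) {
    const items = rows.slice(offset, offset + limit);
    return {
        items,
        total: rows.length,
        limit,
        offset,
        has_more: offset + items.length < rows.length // assumed derivation
    };
}

// Walk every page by advancing offset until has_more is false.
function collectAll(query, limit) {
    const all = [];
    let offset = 0;
    let page;
    do {
        page = query({ limit, offset });
        all.push(...page.items);
        offset += page.items.length;
    } while (page.has_more && page.items.length > 0);
    return all;
}

const rows = Array.from({ length: 25 }, (_, i) => ({ id: i + 1 }));
const all = collectAll((opts) => fakeQuery(rows, opts), 10);
console.log(all.length); // 25
```

The extra `items.length > 0` guard keeps the loop from spinning if a page ever comes back empty while still reporting `has_more`.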
`getTaskAudit()` returns the current task snapshot together with paged `events` and `attempts`.
`getWorkflowAudit()` returns the root task snapshot together with paged `tasks` and `events` for the workflow.
`getWorkflowAuditSummary()` returns a platform-oriented aggregate view including status counts, attempt outcome counts, workers, timing boundaries, root workflow stage timings, failed tasks, a best-effort critical path estimate, and the slowest workflow attempts.
Recommended query patterns:
- Use `queryTaskEvents()` when diagnosing one task execution round, especially with `attempt`, `event_type`, and `order: 'asc'`.
- Use `queryWorkflowEvents()` when reconstructing workflow timelines across parent and child tasks. Prefer `event_type` or `event_types` plus pagination rather than loading every event into an operator-facing page.
- Use `queryTaskAttempts()` or `queryWorkflowAttempts()` when the question is about worker rounds, retry cadence, open executions, or duration analysis. Prefer attempt queries over deriving rounds from event sequences.
- Use `getTaskAudit()` and `getWorkflowAudit()` for operator drill-down pages. Use `getWorkflowAuditSummary()` for aggregate diagnosis, not for exact replay.
Summary field semantics:
- `timing.first_started_at`: the earliest `started_at` among workflow attempts.
- `timing.last_ended_at`: the latest non-null `ended_at` among workflow attempts.
- `timing.last_event_time`: the latest workflow event time currently visible in the event table.
- `timing.workflow_duration_seconds`: `last_ended_at - root_task.created_at` when both values exist; otherwise `null`.
- `stage_timings`: derived from root task attempts plus root task `task_started` / `task_retry_started` events. Pending workflows or workflows without attempts return an empty array.
- `failed_tasks`: terminal workflow tasks currently in `failed`, `timeout`, `permanently_failed`, or `paused` status. This is a latest-snapshot view, not a historical list of every failed round.
- `critical_path`: a best-effort path built from the longest persisted representative attempt per task plus the longest child branch. When sibling branches tie, the implementation falls back to deterministic task id ordering.
- `slowest_attempts`: the top 5 attempts sorted by persisted duration, then start time.
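Two of these fields are simple enough to sketch directly. The code below illustrates the documented definitions on plain objects and is not the library's aggregation code: the function names are hypothetical, and both the exclusion of open attempts and the earlier-start-first tie-break are assumptions.

```javascript
// Attempts use whole-second timestamps, as in the persisted records.
function workflowDurationSeconds(rootCreatedAt, attempts) {
    const ended = attempts.map((a) => a.ended_at).filter((t) => t != null);
    if (rootCreatedAt == null || ended.length === 0) return null;
    return Math.max(...ended) - rootCreatedAt;
}

// Longest first; ties broken by earlier start time (tie direction is assumed).
function slowestAttempts(attempts, top = 5) {
    return attempts
        .filter((a) => a.ended_at != null)
        .map((a) => ({ ...a, duration: a.ended_at - a.started_at }))
        .sort((a, b) => b.duration - a.duration || a.started_at - b.started_at)
        .slice(0, top);
}

const attempts = [
    { task_id: 1, started_at: 100, ended_at: 130 },
    { task_id: 2, started_at: 105, ended_at: 110 },
    { task_id: 3, started_at: 102, ended_at: 132 },
    { task_id: 4, started_at: 140, ended_at: null } // still open
];

console.log(workflowDurationSeconds(95, attempts));           // 37
console.log(slowestAttempts(attempts).map((a) => a.task_id)); // [ 1, 3, 2 ]
```

Attempts 1 and 3 both last 30 seconds, which shows how second-granularity timing can produce ties that only the tie-break rule resolves.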
Operational boundaries:
- `getWorkflowAuditSummary()` currently reads the full task, event, and attempt set for the workflow before aggregating in memory. It is intended for platform diagnosis, not for arbitrarily large workflow scans in hot paths.
- Because persisted timing is second-granularity, very short or same-second attempts may collapse to equal durations. In those cases `critical_path`, `stage_timings`, and `slowest_attempts` remain deterministic but should be read as approximations.
- After retention deletes historical rows, audit and summary APIs describe only the remaining retained data. The platform does not promise long-term completeness after deletion-based retention has run.
The current `critical_path` is an estimate derived from persisted attempt durations along the workflow tree. It is useful for platform diagnosis, but it should not be treated as a perfect replacement for a distributed trace.
Because persisted task timing is currently stored in whole seconds, very short stages or sibling tasks may collapse to the same duration and rely on deterministic tie-breaking.
### Handler Audit
Handlers can emit structured checkpoint events during execution through `task.audit()`.
```javascript
taskManager.use('import_user', async (task) => {
    task.audit('payload_validated', {
        message: 'Payload validated',
        metadata: { source: task.payload.source }
    });

    task.audit({
        code: 'remote_call_started',
        message: 'Remote call started',
        metadata: { provider: 'crm' }
    });

    return { imported: true };
});
```
Checkpoint events are written as `task_checkpoint` audit events and automatically include the current task, workflow, worker, and open attempt context.
Naming conventions:
- `checkpoint.code` should use lowercase `snake_case`, for example `payload_validated` or `remote_call_started`.
- `message` is optional, but when provided it should be display text rather than another identifier.
- `metadata` should remain structured and machine-readable.
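If you want to enforce the `snake_case` convention in your own handlers, a small check like the one below may help. It is a hypothetical helper, not something fib-flow provides, and the exact pattern (a leading letter, then digits allowed) is an assumption about what counts as valid.

```javascript
// Lowercase snake_case: alphanumeric segments separated by single underscores.
const SNAKE_CASE = /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/;

function isValidCheckpointCode(code) {
    return typeof code === 'string' && SNAKE_CASE.test(code);
}

console.log(isValidCheckpointCode('payload_validated'));   // true
console.log(isValidCheckpointCode('RemoteCallStarted'));   // false
console.log(isValidCheckpointCode('remote-call-started')); // false
```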
Handlers can also update the task snapshot with lightweight progress state through `task.progress()`.
```javascript
taskManager.use('import_user', async (task) => {
    task.progress('Downloading source data', {
        stage_name: 'download',
        progress_percent: 20,
        metadata: { chunk: 1 }
    });

    task.progress({
        stage_name: 'transform',
        progress_text: 'Transforming records',
        progress_percent: 75,
        message: 'Transform stage running',
        metadata: { transformed: 15 }
    });

    return { imported: true };
});
```
`task.progress()` writes a `task_progress` event and updates these task snapshot fields when provided: `current_stage_name`, `progress_text`, `progress_percent`. All audit events also keep `last_event_time` and `last_event_type` in the main task snapshot for lightweight platform queries.
These snapshot fields are only a convenience cache. Platform replay, audit diagnosis, and historical reconstruction should use the persisted event and attempt records as the source of truth.
Progress conventions:
- `stage_name` should use lowercase `snake_case`, for example `download_phase` or `waiting_children`.
- `progress_text` should be short user-facing text.
- `progress_percent` should describe coarse operator-visible progress, not sub-second execution precision.
#### deleteTasks
The `deleteTasks` method provides flexible task deletion with multiple filter conditions:
```javascript
/**
* Delete tasks with multiple filter conditions
* @param {Object} filters Filter conditions
* @param {string} [filters.tag] Filter by tag
* @param {string} [filters.status] Filter by status ("pending", "running", "completed", etc)
* @param {string} [filters.name] Filter by task name
* @returns {number} Number of tasks deleted
* @throws {Error} If status is invalid
*/
deleteTasks(filters)
```
Examples:
```javascript
// Delete tasks with a specific tag
const deletedCount = taskManager.deleteTasks({ tag: "cleanup" });

// Delete completed tasks
const deletedCompleted = taskManager.deleteTasks({ status: "completed" });

// Delete tasks of a specific type
const deletedByName = taskManager.deleteTasks({ name: "processImage" });

// Delete tasks matching multiple conditions
const deletedMulti = taskManager.deleteTasks({
    tag: "batch-1",
    status: "failed",
    name: "videoProcess"
});

// Delete all tasks (empty filter)
const deletedAll = taskManager.deleteTasks({});
```
Filter Behavior:
- Multiple filters are combined with AND logic
- If a filter is not provided, that condition is not applied
- Empty filters object deletes all tasks
- An invalid `status` value throws an error, while invalid `tag` or `name` values are ignored
Status Values:
- pending: Task waiting to be executed
- running: Task currently being executed
- completed: Task finished successfully
- failed: Task execution failed
- timeout: Task exceeded timeout duration
- permanently_failed: Failed task that exceeded retry attempts
- paused: Task manually paused
- suspended: Parent task waiting for children
### Task Lifecycle
Task handlers receive task objects that contain comprehensive information about the task and provide methods for controlling task execution.
Task Status Values:
- `pending`: Task is waiting to be executed
- `running`: Task is currently being executed
- `completed`: Task has finished successfully
- `failed`: Task execution has failed
- `timeout`: Task exceeded its configured timeout duration
- `permanently_failed`: Async task that has failed and exceeded retry attempts
- `paused`: Task that was paused manually, or a cron task that has failed and exceeded retry attempts
- `suspended`: Parent task waiting for child tasks to complete
Task Stage:
- Stage is a numeric value starting from 0
- Stage automatically increments during task execution
- Used for controlling multi-phase task processing
- Enables conditional task creation and execution based on current stage
```javascript
// Task handler receives a task object
taskManager.use('myTask', async (task) => {
    // Access task information
    console.log(task.id); // Unique task ID
    console.log(task.name); // Task type name
    console.log(task.payload); // Task data
    console.log(task.status); // Current status
    console.log(task.parent_id); // Parent task ID (if any)
    console.log(task.stage); // Current execution stage

    // Task control methods
    task.checkTimeout(); // Check if task has timed out
    task.setProgress(50); // Update progress percentage

    // Return value becomes task result
    return { success: true };
});
```
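The stage counter described above lends itself to handlers that branch on `task.stage`. The sketch below only illustrates that pattern: `describeStage` and the driver loop are hypothetical, the loop stands in for the framework advancing the stage, and the meaning given to each stage number is an assumption.

```javascript
// Hypothetical multi-phase handler body: branch on the numeric stage.
function describeStage(task) {
    switch (task.stage) {
        case 0: return 'split work into child tasks';
        case 1: return 'merge child results';
        default: return 'done';
    }
}

// Simulated driver; in fib-flow the stage is advanced by the framework.
const task = { id: 1, stage: 0 };
const log = [];
while (task.stage <= 2) {
    log.push(describeStage(task));
    task.stage += 1;
}
console.log(log);
```

Keeping each phase behind its own stage value means the handler can be entered again later and continue from the phase it had reached.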