# FluxForge
[npm](https://www.npmjs.com/package/fluxforge) · [TypeScript](https://www.typescriptlang.org/) · [MIT License](https://opensource.org/licenses/MIT)
Enterprise-grade file chunking & concurrent processing library with Web Workers, automatic retry, real-time progress tracking, and MD5 integrity validation for modern browsers. Perfect for large file uploads, streaming, and data processing pipelines.
English | [简体中文](README-zh-CN.md)
## Live Demo
[Live Demo](https://joygqz.github.io/fluxforge/)
## Key Features
### 🚀 High-Performance Architecture
- **Multi-threaded Processing**: Leverages Web Workers for true parallel chunking, maximizing CPU utilization
- **Zero-Copy Streaming**: Memory-efficient chunk processing without loading entire files into memory
- **Intelligent Resource Management**: Auto-detects hardware capabilities and optimizes thread allocation
### 🛡️ Enterprise-Grade Reliability
- **Automatic Retry Logic**: Built-in exponential backoff strategy for transient failures
- **Task Lifecycle Management**: Comprehensive pause/resume/cancel controls with graceful cleanup
- **Signal-Based Cancellation**: AbortSignal integration for immediate task termination
- **Type-Safe API**: 100% TypeScript with strict type checking and comprehensive interfaces
### 🔧 Advanced Task Scheduling
- **Configurable Concurrency**: Fine-tune parallel execution limits based on system constraints
- **Real-time Progress Tracking**: Granular progress callbacks for UI responsiveness
- **Backpressure Handling**: Intelligent queuing prevents memory overflow in high-throughput scenarios
### 🔐 Data Integrity
- **Chunk-level MD5 Hashing**: Per-chunk integrity validation using SparkMD5
- **File-level Hash Computation**: Aggregate hash calculation for complete file verification
- **Deterministic Processing**: Guaranteed chunk order preservation across parallel operations
## Installation
```bash
npm install fluxforge
```
## Quick Start
```typescript
import { calculateFileHash, chunkFile, processChunks } from 'fluxforge'

// Create chunk promises
const chunkPromises = chunkFile(file, {
  chunkSize: 4 * 1024 * 1024 // 4MB chunks (see Performance Considerations below)
})

// Process chunks with advanced configuration
const controller = processChunks(
  chunkPromises,
  async (chunk, signal) => {
    // Handle cancellation gracefully
    if (signal.aborted)
      throw new Error('Operation cancelled')

    // Optional: listen for cancellation during long operations
    signal.addEventListener('abort', () => {
      // Clean up resources, cancel network requests, etc.
    })

    // Your business logic here (upload, transform, etc.)
    await uploadChunk(chunk.blob, chunk.index)
  },
  {
    concurrency: 6, // A sensible default for most scenarios
    onProgress: (completed, total) => {
      const percentage = Math.round((completed / total) * 100)
      updateProgressBar(percentage)
    }
  }
)

// Advanced task control
controller.pause() // Gracefully pause all processing
controller.resume() // Resume from where it left off
controller.cancel() // Immediately abort all operations

// Wait for completion
try {
  await controller.promise
  console.log('All chunks processed successfully')
}
catch (error) {
  if (error instanceof Error && error.message === 'Task cancelled') {
    console.log('Processing was cancelled by user')
  }
  else {
    console.error('Processing failed:', error)
  }
}

// Verify file integrity
const fileHash = await calculateFileHash(chunkPromises)
console.log('File MD5:', fileHash)
```
## Advanced Usage Patterns
### Large File Upload with Progress Tracking
```typescript
import { chunkFile, processChunks } from 'fluxforge'

async function uploadLargeFile(file: File, uploadUrl: string) {
  const chunkPromises = chunkFile(file, { chunkSize: 8 * 1024 * 1024 })

  const controller = processChunks(
    chunkPromises,
    async (chunk, signal) => {
      const formData = new FormData()
      formData.append('chunk', chunk.blob)
      formData.append('index', chunk.index.toString())
      formData.append('hash', chunk.hash)

      const response = await fetch(`${uploadUrl}/chunk`, {
        method: 'POST',
        body: formData,
        signal // Cancels the request automatically when the task is aborted
      })

      if (!response.ok) {
        throw new Error(`Upload failed: ${response.statusText}`)
      }
    },
    {
      concurrency: 4, // Conservative for network operations
      onProgress: (completed, total) => {
        const progress = (completed / total) * 100
        console.log(`Upload progress: ${progress.toFixed(1)}%`)
      }
    }
  )

  return controller
}
```
### Data Processing Pipeline
```typescript
import { chunkFile, processChunks } from 'fluxforge'

// Define your custom processed chunk type
interface ProcessedChunk {
  data: any
  originalHash: string
  index: number
}

async function processFileData(file: File) {
  const chunkPromises = chunkFile(file)
  const processedChunks: ProcessedChunk[] = []

  const controller = processChunks(
    chunkPromises,
    async (chunk, signal) => {
      // Transform chunk data (transformChunkData is your own function)
      const processedData = await transformChunkData(chunk.blob, signal)

      // Store the result with preserved order
      processedChunks[chunk.index] = {
        data: processedData,
        originalHash: chunk.hash,
        index: chunk.index
      }
    },
    {
      concurrency: navigator.hardwareConcurrency || 4,
      onProgress: (completed, total) => {
        console.log(`Processing: ${completed}/${total} chunks`)
      }
    }
  )

  // processChunks returns a controller, not a promise; await its promise
  await controller.promise

  return processedChunks
}
```
### Error Handling and Retry Strategies
```typescript
const controller = processChunks(
  chunkPromises,
  async (chunk, signal) => {
    // The library retries transient failures with exponential backoff;
    // your processor just needs to throw on failure
    const result = await riskyOperation(chunk.blob)

    if (!result.success) {
      throw new Error(`Processing failed for chunk ${chunk.index}`)
    }
  },
  {
    concurrency: 8,
    onProgress: (completed, total) => {
      // This callback fires only after a chunk is processed successfully
      console.log(`Successfully processed: ${completed}/${total}`)
    }
  }
)

// Automatic retry with exponential backoff:
// Retry 1: immediate (no delay)
// Retry 2: 1 second delay
// Retry 3: 2 second delay
// Retry 4: 3 second delay
// ...
// Max delay: 5 seconds
```
## API Reference
### Core Functions
#### `chunkFile(file: File, options?: Options): Promise<Chunk>[]`
Splits a file into an array of chunk promises, each processed in parallel by Web Workers.
**Parameters:**
- `file`: The File object to be chunked
- `options.chunkSize`: Chunk size in bytes (default: `Math.min(1024 * 1024, file.size)`)
**Returns:** Array of promises that resolve to `Chunk` objects
**Performance Notes:**
- Automatically determines optimal worker count based on `navigator.hardwareConcurrency`
- Workers are automatically terminated when all chunks are processed
- Chunk order is guaranteed despite parallel processing
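A minimal sketch of the call, assuming `file` is a `File` obtained from an `<input type="file">` element:
```typescript
import { chunkFile } from 'fluxforge'

// Use the default chunk size (1MB, or the file size if smaller)
const chunkPromises = chunkFile(file)

// Each promise resolves independently as its worker finishes
const firstChunk = await chunkPromises[0]
console.log(firstChunk.index, firstChunk.start, firstChunk.end, firstChunk.hash)
```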
#### `processChunks(chunkPromises, processor, options?): ProcessController`
Processes chunk promises with configurable concurrency and automatic retry logic.
**Parameters:**
- `chunkPromises`: Array of chunk promises from `chunkFile()`
- `processor`: Function that processes each chunk
- `options.concurrency`: Max concurrent processors (default: 6)
- `options.onProgress`: Progress callback function
**Returns:** `ProcessController` with pause/resume/cancel capabilities
**Processor Function:**
```typescript
type ChunkProcessor = (chunk: Chunk, signal: AbortSignal) => void | Promise<void>
```
The processor receives:
- `chunk`: The resolved chunk with blob data and metadata
- `signal`: AbortSignal for cancellation handling
#### `collectChunks(chunkPromises: Promise<Chunk>[]): Promise<Chunk[]>`
Waits for all chunk promises to resolve and returns them in original order.
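For example, to materialize every chunk before doing order-dependent work (a short sketch, assuming `file` is already in scope):
```typescript
import { chunkFile, collectChunks } from 'fluxforge'

// Resolve all chunk promises; the result preserves the original order
const chunks = await collectChunks(chunkFile(file))
console.log(`Split into ${chunks.length} chunks, first chunk hash: ${chunks[0].hash}`)
```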
#### `calculateFileHash(chunkPromises: Promise<Chunk>[]): Promise<string>`
Computes the MD5 hash of the entire file by aggregating individual chunk hashes.
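A common companion pattern is asking the server whether it already has the file before uploading; the `/exists` endpoint below is hypothetical:
```typescript
import { calculateFileHash, chunkFile } from 'fluxforge'

const chunkPromises = chunkFile(file)
const fileHash = await calculateFileHash(chunkPromises)

// Hypothetical deduplication check: skip the upload if the server has the file
const response = await fetch(`/exists?hash=${fileHash}`)
const { exists } = await response.json()
if (!exists) {
  // proceed with processChunks(...) as shown in Quick Start
}
```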
### Types
```typescript
interface Chunk {
  blob: Blob // The chunk data
  hash: string // MD5 hash of this chunk
  index: number // Zero-based chunk index
  start: number // Start byte position in file
  end: number // End byte position in file
}

interface Options {
  chunkSize?: number // Chunk size in bytes
}

interface ProcessOptions {
  concurrency?: number // Max concurrent processors
  onProgress?: (completed: number, total: number) => void
}

interface ProcessController {
  pause: () => void // Pause processing
  resume: () => void // Resume processing
  cancel: () => void // Cancel all processing
  promise: Promise<void> // Completion promise
}
```
## Performance Considerations
### Optimal Chunk Sizes
- **Small files (<10MB)**: Use default chunk size for simplicity
- **Medium files (10MB-1GB)**: 2-8MB chunks for balanced memory/performance
- **Large files (>1GB)**: 8-16MB chunks to minimize overhead
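A small helper (illustrative, not part of the library) that encodes these guidelines:
```typescript
import { chunkFile } from 'fluxforge'

// Illustrative: pick a chunk size from the file size
function pickChunkSize(fileSize: number): number {
  const MB = 1024 * 1024
  if (fileSize < 10 * MB) return Math.min(MB, fileSize) // library default
  if (fileSize < 1024 * MB) return 4 * MB // medium files: 2-8MB range
  return 8 * MB // large files: 8-16MB range
}

const chunkPromises = chunkFile(file, { chunkSize: pickChunkSize(file.size) })
```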
### Concurrency Guidelines
- **CPU-intensive processing**: Use `navigator.hardwareConcurrency`
- **Network operations**: 3-6 concurrent requests to avoid overwhelming servers
- **Memory-constrained environments**: Reduce concurrency to prevent OOM
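A matching sketch for choosing the concurrency limit (the `'cpu'`/`'network'` split is illustrative):
```typescript
// Illustrative: choose a concurrency limit per workload type
function pickConcurrency(workload: 'cpu' | 'network'): number {
  return workload === 'cpu'
    ? navigator.hardwareConcurrency || 4 // saturate available cores
    : 4 // stay within 3-6 to avoid overwhelming the server
}
```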
### Memory Management
- The library uses streaming processing to minimize memory footprint
- Only active chunks are kept in memory
- Processed chunks are released promptly so they can be garbage-collected
## Error Handling
The library provides robust error handling with automatic retry mechanisms:
1. **Transient Failures**: Automatically retried with exponential backoff
2. **Cancellation**: Clean termination via AbortSignal
3. **Fatal Errors**: Immediate failure propagation
```typescript
try {
  await controller.promise
}
catch (error) {
  if (error instanceof Error && error.message === 'Task cancelled') {
    // User-initiated cancellation
  }
  else {
    // Actual processing error after all retries were exhausted
  }
}
```
## Browser Compatibility
- **Chrome 51+** (Web Workers, AbortSignal)
- **Firefox 54+** (Web Workers, AbortSignal)
- **Safari 10+** (Web Workers support)
- **Edge 79+** (Chromium-based)
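A minimal runtime guard (illustrative) for the APIs the library relies on:
```typescript
// Illustrative feature check before using fluxforge
const supported = typeof Worker !== 'undefined'
  && typeof AbortController !== 'undefined'
  && typeof Blob !== 'undefined'

if (!supported) {
  console.warn('This browser lacks Web Worker, AbortSignal, or Blob support')
}
```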
## License
MIT License - see [LICENSE](LICENSE) for details.