# Orlando: High-Performance Transducers for JavaScript

> Transform transformations, not data. Compositional data processing via WebAssembly.

Orlando brings the power of **transducers** to JavaScript and TypeScript through a blazing-fast Rust/WebAssembly implementation. Named after the bridger characters in Greg Egan's *Diaspora*, who embodied transformation at fundamental levels.

[![npm version](https://img.shields.io/npm/v/orlando-transducers.svg)](https://www.npmjs.com/package/orlando-transducers)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## What Are Transducers?

**Transducers compose transformations, not data.**

Traditional JavaScript array methods create intermediate arrays at each step:

```javascript
// ❌ Traditional approach - creates 2 intermediate arrays
const result = data
  .map(x => x * 2)      // intermediate array 1
  .filter(x => x > 10)  // intermediate array 2
  .slice(0, 5);         // final result

// For 1M items, this allocates ~24MB of intermediate memory
```

Orlando transducers execute transformations in a **single pass** with **zero intermediate allocations**:

```javascript
// ✅ Orlando - single pass, no intermediates
import init, { Pipeline } from 'orlando-transducers';

await init();

const pipeline = new Pipeline()
  .map(x => x * 2)
  .filter(x => x > 10)
  .take(5);

const result = pipeline.toArray(data);

// For 1M items, stops after finding 5 matches!
// Memory: ~40 bytes (just the 5-element result)
```

### Performance Benefits

- **🚀 No intermediate allocations** - Single pass over data
- **⚑ Early termination** - Stops reading input as soon as the result is complete (e.g. once `take(n)` has n items)
- **🔧 Composable** - Build complex pipelines from simple operations
- **💪 WASM-powered** - Native performance via WebAssembly
- **⚑ Automatic fusion** - Map→Filter chains automatically optimized
- **📦 Tiny** - <50KB compressed WASM bundle

## Performance

Real-world benchmarks show a **3-19x speedup** over native JavaScript array methods:

| Scenario | JavaScript Arrays | Orlando Transducers | Speedup |
|----------|------------------|---------------------|---------|
| Map → Filter → Take 10 (100K items) | 2.3ms | 0.6ms | **3.8x faster** |
| Complex pipeline (10 operations, 50K items) | 8.7ms | 2.1ms | **4.1x faster** |
| Early termination (find first 5 in 1M items) | 15.2ms | 0.8ms | **19x faster** 🔥 |

**Why is Orlando faster?**

1. **Zero intermediate arrays** - Array methods create a new array at each step
2. **Early termination** - Orlando stops processing immediately when conditions are met
3. **WASM execution** - Native performance via WebAssembly
4. **SIMD optimizations** - Vectorized operations for numeric data (when available)

[Run benchmarks in your browser →](examples/performance.html)

## Installation

```bash
npm install orlando-transducers
# or
yarn add orlando-transducers
# or
pnpm add orlando-transducers
```

**Using from CDN:**

```html
<script type="module">
  import init, { Pipeline } from 'https://unpkg.com/orlando-transducers';

  await init();
  // Use Pipeline...
</script>
```

## Quick Start

```javascript
import init, { Pipeline } from 'orlando-transducers';

// Initialize WASM (once per application)
await init();

// Create a reusable pipeline
const pipeline = new Pipeline()
  .map(x => x * 2)
  .filter(x => x % 3 === 0)
  .take(5);

// Execute on data
const data = Array.from({ length: 100 }, (_, i) => i + 1);
const result = pipeline.toArray(data);

console.log(result); // [6, 12, 18, 24, 30]
```

**TypeScript with full type safety:**

```typescript
import init, { Pipeline } from 'orlando-transducers';

await init();

interface User {
  id: number;
  name: string;
  email: string;
  active: boolean;
}

declare const users: User[]; // your user data

const activeUserEmails = new Pipeline()
  .filter((user: User) => user.active)
  .map((user: User) => user.email)
  .take(100);

const emails = activeUserEmails.toArray(users);
```

## API Reference

All transformation methods return a new `Pipeline` instance, allowing fluent method chaining.

### Transformations

| Method | Description | Example |
|--------|-------------|---------|
| `map(fn)` | Transform each element | `.map(x => x * 2)` |
| `filter(predicate)` | Keep only matching elements | `.filter(x => x > 5)` |
| `take(n)` | Take first n elements (early termination!) | `.take(10)` |
| `takeWhile(predicate)` | Take while predicate is true | `.takeWhile(x => x < 100)` |
| `drop(n)` | Skip first n elements | `.drop(5)` |
| `dropWhile(predicate)` | Skip while predicate is true | `.dropWhile(x => x < 10)` |
| `tap(fn)` | Execute side effects without modifying values | `.tap(x => console.log(x))` |
| `flatMap(fn)` | Transform and flatten nested arrays | `.flatMap(x => [x, x * 2])` |
| `reject(predicate)` | Remove matching elements (inverse of filter) | `.reject(x => x < 0)` |
| `chunk(n)` | Group elements into chunks of size n | `.chunk(3)` |
| `unique()` | Remove consecutive duplicates | `.unique()` |
| `scan(fn, initial)` | Accumulate with intermediate results | `.scan((a, b) => a + b, 0)` |

### Terminal Operations (Collectors)

These execute the pipeline and return a result:

| Method / Function | Description | Example |
|--------|-------------|---------|
| `toArray(source)` | Collect results into an array | `pipeline.toArray(data)` |
| `reduce(source, reducer, initial)` | Custom reduction | `pipeline.reduce(data, (a, b) => a + b, 0)` |
| `find(pipeline, source, predicate)` | Find first matching element | `find(pipeline, data, x => x > 10)` |
| `partition(pipeline, source, predicate)` | Split into [matching, non-matching] | `partition(pipeline, data, isValid)` |
| `groupBy(pipeline, source, keyFn)` | Group elements by key function | `groupBy(pipeline, data, x => x.type)` |
| `frequencies(source)` | Count occurrences of each element | `frequencies(data)` |
| `topK(source, k)` | Get k largest elements | `topK(scores, 10)` |

### Statistical Operations

| Function | Description | Example |
|----------|-------------|---------|
| `product(array)` | Multiply all numbers | `product([2, 3, 4])` → 24 |
| `mean(array)` | Arithmetic mean (average) | `mean([1, 2, 3, 4, 5])` → 3 |
| `median(array)` | Middle value | `median([1, 2, 3, 4, 5])` → 3 |
| `min(array)` / `max(array)` | Minimum/maximum value | `max([1, 5, 3])` → 5 |
| `minBy(array, keyFn)` / `maxBy(array, keyFn)` | Min/max by key function | `maxBy(users, u => u.score)` |
| `variance(array)` | Sample variance | `variance([2, 4, 6, 8])` |
| `stdDev(array)` | Standard deviation | `stdDev([2, 4, 6, 8])` |
| `quantile(array, p)` | P-th quantile (0-1) | `quantile(data, 0.95)` |
| `mode(array)` | Most frequent value | `mode([1, 2, 2, 3])` → 2 |

### Collection Utilities

| Function | Description | Example |
|----------|-------------|---------|
| `sortBy(array, keyFn)` | Sort by key function | `sortBy(users, u => u.age)` |
| `sortWith(array, cmpFn)` | Sort with comparator | `sortWith(nums, (a, b) => a - b)` |
| `reverse(array)` | Reverse order | `reverse([1, 2, 3])` → [3, 2, 1] |
| `range(start, end, step)` | Generate numeric sequence | `range(0, 10, 2)` → [0, 2, 4, 6, 8] |
| `repeat(value, n)` | Repeat value N times | `repeat('x', 3)` → ['x', 'x', 'x'] |
| `cycle(array, n)` | Repeat array N times | `cycle([1, 2], 3)` → [1, 2, 1, 2, 1, 2] |
| `unfold(seed, fn, limit)` | Generate from seed | `unfold(1, x => x * 2, 5)` → [2, 4, 8, 16, 32] |
| `path(obj, pathArray)` | Safe deep property access | `path(user, ['profile', 'email'])` |
| `pathOr(obj, path, default)` | Path with default value | `pathOr(config, ['port'], 8080)` |
| `evolve(obj, transforms)` | Nested transformations | `evolve(user, { age: n => n + 1 })` |

### Logic Functions

Predicate combinators and conditional transformations for cleaner conditional logic:

| Function | Description | Example |
|----------|-------------|---------|
| `both(p1, p2)` | Combine predicates with AND | `both(isPositive, isEven)` |
| `either(p1, p2)` | Combine predicates with OR | `either(isSmall, isLarge)` |
| `complement(pred)` | Negate a predicate | `complement(isEven)` |
| `allPass(predicates)` | All predicates must pass | `allPass([isValid, isActive])` |
| `anyPass(predicates)` | Any predicate must pass | `anyPass([isZero, isDivisibleBy10])` |
| `When(pred, fn)` | Transform only when predicate is true | `new When(x => x > 0, x => x * 2)` |
| `Unless(pred, fn)` | Transform only when predicate is false | `new Unless(x => x > 0, _ => 0)` |
| `IfElse(pred, onTrue, onFalse)` | Branch on condition | `new IfElse(x => x >= 0, double, halve)` |

### Multi-Input Operations

Operations for combining and comparing multiple arrays:

| Function | Description | Example |
|----------|-------------|---------|
| `merge(arrays)` | Interleave multiple arrays | `merge([a, b, c])` |
| `zip(a, b)` | Combine into pairs | `zip([1, 2], ['a', 'b'])` |
| `zipLongest(a, b, fill)` | Zip with fill for different lengths | `zipLongest(a, b, null)` |
| `intersection(a, b)` | Elements in both arrays | `intersection(a, b)` |
| `union(a, b)` | Unique elements from both | `union(a, b)` |
| `difference(a, b)` | Elements in a but not b | `difference(a, b)` |
| `cartesianProduct(a, b)` | All possible pairs | `cartesianProduct(colors, sizes)` |

**Full API documentation:** [docs/api/JAVASCRIPT.md](docs/api/JAVASCRIPT.md)

## Real-World Examples

### Pagination

```javascript
function paginate(data, page, pageSize) {
  return new Pipeline()
    .drop((page - 1) * pageSize)
    .take(pageSize)
    .toArray(data);
}

const page2 = paginate(users, 2, 20); // Get page 2 (items 21-40)
```

[Try the interactive pagination demo →](examples/pagination.html)

### Data Processing Pipeline

```javascript
// Filter active users, normalize emails, find company addresses
const companyEmails = new Pipeline()
  .filter(user => user.active)
  .map(user => ({ id: user.id, email: user.email.toLowerCase() }))
  .filter(user => user.email.endsWith('@company.com'))
  .take(100);

const result = companyEmails.toArray(users);
```

### Product Search with Multiple Filters

```javascript
const searchProducts = (products, { category, minPrice, maxPrice, minRating }) => {
  return new Pipeline()
    .filter(p => p.category === category)
    .filter(p => p.price >= minPrice && p.price <= maxPrice)
    .filter(p => p.rating >= minRating)
    .filter(p => p.inStock)
    .take(20)
    .toArray(products);
};

const results = searchProducts(catalog, {
  category: 'electronics',
  minPrice: 50,
  maxPrice: 500,
  minRating: 4.0
});
```

### Early Termination for Performance

```javascript
// Find first 10 prime numbers in a large dataset
const isPrime = n => {
  if (n < 2) return false;
  for (let i = 2; i <= Math.sqrt(n); i++) {
    if (n % i === 0) return false;
  }
  return true;
};

const pipeline = new Pipeline()
  .filter(isPrime)
  .take(10);

// Stops immediately after finding 10 primes!
// Traditional .filter().slice(0, 10) would check ALL numbers
const firstTenPrimes = pipeline.toArray(hugeRange);
```

### Debugging with Tap

```javascript
const pipeline = new Pipeline()
  .tap(x => console.log('Input:', x))
  .map(x => x * 2)
  .tap(x => console.log('After doubling:', x))
  .filter(x => x > 10)
  .tap(x => console.log('After filter:', x));

const result = pipeline.toArray(data);
```

**More examples:**

- [Interactive Demo](examples/index.html) - Build and test pipelines in your browser
- [Real-World Data Processing](examples/data-processing.html) - ETL, log analysis, analytics
- [Performance Benchmarks](examples/performance.html) - Compare against native arrays
- [Library Comparison](examples/benchmark-comparison.html) - vs Underscore, Ramda, Lodash, Lazy.js
- [Migration Guide](docs/api/MIGRATION.md) - Convert from array methods to Orlando

## Benchmarks

Orlando has been benchmarked against popular JavaScript libraries to demonstrate real-world performance advantages.
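Timings vary by machine, but the source of the early-termination win can be shown by counting work instead of milliseconds. The sketch below is plain JavaScript and deliberately does not use Orlando: `map`, `filter`, `take`, `compose`, `transduce`, and the `reduced` marker are hypothetical, hand-rolled helpers in the classic Clojure-style transducer shape. It counts how often the `filter` predicate runs in the "first 5 matches in 1M items" case, once with array methods and once with a single-pass transducer chain.

```javascript
// Hand-rolled transducers for illustration only; these helpers are NOT Orlando's API.
// A reducer has the shape (acc, x) => acc; a transducer wraps one reducer in another.
const REDUCED = Symbol('reduced');
const reduced = (value) => ({ [REDUCED]: true, value });
const isReduced = (r) => r !== null && typeof r === 'object' && r[REDUCED] === true;

const map = (f) => (next) => (acc, x) => next(acc, f(x));
const filter = (pred) => (next) => (acc, x) => (pred(x) ? next(acc, x) : acc);
const take = (n) => (next) => {
  let remaining = n; // state lives in the reducer built for a single run
  return (acc, x) => {
    if (remaining <= 0) return reduced(acc);
    remaining -= 1;
    const result = next(acc, x);
    // Signal "done" right after the n-th item so the loop never reads item n + 1.
    return remaining === 0 && !isReduced(result) ? reduced(result) : result;
  };
};

// compose(a, b, c)(reducer) === a(b(c(reducer))): items flow through a, then b, then c.
const compose = (...xfs) => (reducer) => xfs.reduceRight((r, xf) => xf(r), reducer);

function transduce(xf, reducer, init, source) {
  const step = xf(reducer);
  let acc = init;
  for (const x of source) {
    const r = step(acc, x);
    if (isReduced(r)) return r.value; // early termination: stop reading the source
    acc = r;
  }
  return acc;
}

const data = Array.from({ length: 1_000_000 }, (_, i) => i);
let arrayCalls = 0;
let transducerCalls = 0;

// Array methods: every stage runs over the whole input before .slice() sees it.
const viaArrays = data
  .map((x) => x * 2)
  .filter((x) => { arrayCalls += 1; return x > 10; })
  .slice(0, 5);

// Single pass: each item goes through map -> filter -> take before the next is read.
const push = (acc, x) => { acc.push(x); return acc; };
const viaTransducer = transduce(
  compose(
    map((x) => x * 2),
    filter((x) => { transducerCalls += 1; return x > 10; }),
    take(5),
  ),
  push,
  [],
  data,
);

console.log(JSON.stringify(viaArrays), arrayCalls);          // [12,14,16,18,20] 1000000
console.log(JSON.stringify(viaTransducer), transducerCalls); // [12,14,16,18,20] 11
```

Both approaches return the same five values, but the array chain runs the predicate once per element (1,000,000 times) while the single-pass chain stops after 11 calls, as soon as `take(5)` is satisfied. That gap in work, rather than raw per-call speed, is what the early-termination rows below measure.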
### Libraries Compared

- **Native Array methods** - Built-in JavaScript
- **Underscore.js** - Classic utility library
- **Ramda** - Functional programming library
- **Lodash** - Modern utility library
- **Lazy.js** - Lazy evaluation library

### Key Results

Based on benchmarks across multiple scenarios:

| Scenario | Orlando vs Native | Winner |
|----------|------------------|--------|
| Map → Filter → Take (100K) | **4.8x faster** | Orlando 🏆 |
| Complex Pipeline (10 ops) | **3.2x faster** | Orlando 🏆 |
| Early Termination (1M) | **18.7x faster** 🔥 | Orlando 🏆 |
| Object Processing (500K) | **2.8x faster** | Orlando 🏆 |
| Simple Map (1M) | 1.3x slower | Native Array |

**Early termination provides the biggest wins** - Orlando stops processing as soon as conditions are met, while native array methods run each step over the entire array before the next step starts.

### Running Benchmarks

**Node.js:**

```bash
npm install
npm run build:nodejs
npm run bench:all    # Full benchmark suite
npm run bench:quick  # Quick benchmarks
```

**Browser:**

- Open [examples/benchmark-comparison.html](examples/benchmark-comparison.html)
- Click "Run Benchmark" to compare against all libraries
- See visual comparison with speedup calculations

**Detailed results:** See [benchmarks/BENCHMARK_RESULTS.md](benchmarks/BENCHMARK_RESULTS.md) for complete data.

## When Should You Use Orlando?
### ✅ Great for:

- **Large datasets** (>1000 elements) - More data = bigger performance wins
- **Complex pipelines** (3+ operations) - Single-pass execution shines
- **Early termination** scenarios - `take`, `takeWhile`, find first N
- **Memory-constrained environments** - No intermediate allocations
- **Performance-critical code** - WASM-powered native speed
- **Reusable transformation logic** - Define pipelines once, use many times

### ⚠️ Consider array methods for:

- **Small datasets** (<100 elements) - Overhead may not be worth it
- **Single operations** - `array.map(fn)` is simpler than a pipeline
- **Prototyping** - Array methods are more familiar during development
- **Operations requiring all data** - e.g., `sort`, `reverse` (Orlando doesn't optimize these)

## Documentation

- **[JavaScript/TypeScript API](docs/api/JAVASCRIPT.md)** - Complete API reference
- **[Hybrid Composition Guide](docs/HYBRID_COMPOSITION.md)** - Combining transducers with multi-input operations
- **[Migration Guide](docs/api/MIGRATION.md)** - Convert from array methods to Orlando
- **[WASM Boundary Performance](docs/WASM_BOUNDARY_PERFORMANCE.md)** - Deep dive: Why every instruction counts
- **[Optimization Guide](docs/OPTIMIZATIONS.md)** - Performance optimizations and best practices
- **[Fusion Optimization](docs/FUSION_OPTIMIZATION.md)** - How Map→Filter chains are automatically optimized
- **[Examples](examples/)** - Interactive demos and real-world use cases

## Category Theory Foundation

For those interested in the mathematical underpinnings: transducers are **natural transformations between fold functors**. A transducer transforms a reducing function:

```
∀Acc. ((Acc, Out) -> Acc) -> ((Acc, In) -> Acc)
```

This foundation guarantees:

- **Identity law**: `id ∘ f = f ∘ id = f`
- **Associativity**: `(f ∘ g) ∘ h = f ∘ (g ∘ h)`

The library includes comprehensive property-based tests verifying these laws.
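The reducer-to-reducer signature above can be made concrete in a few lines of plain JavaScript. This is an illustrative sketch independent of Orlando: `mapT`, `filterT`, `compose2`, `append`, and `run` are hypothetical helpers. Here `∘` is ordinary function composition of reducer transformers and `id` is the identity function on reducers; both laws are only spot-checked on one sample input (in the spirit of a property-based test, not a proof).

```javascript
// Illustrative sketch, independent of Orlando: mapT, filterT, compose2, append and run
// are hypothetical helpers. A transducer takes a reducer (acc, x) => acc and returns one.
const mapT = (fn) => (next) => (acc, x) => next(acc, fn(x));
const filterT = (pred) => (next) => (acc, x) => (pred(x) ? next(acc, x) : acc);

// Identity and composition of transducers are plain function identity and composition.
const identity = (reducer) => reducer;
const compose2 = (f, g) => (reducer) => f(g(reducer));

// Run a transducer over an array by folding with an "append" reducer.
const append = (acc, x) => { acc.push(x); return acc; };
const run = (xf, source) => source.reduce(xf(append), []);
const same = (a, b) => JSON.stringify(a) === JSON.stringify(b);

const f = mapT((x) => x + 1);
const g = filterT((x) => x % 2 === 0);
const h = mapT((x) => x * 10);
const sample = [1, 2, 3, 4, 5, 6];

// Identity law: id ∘ f = f ∘ id = f (checked on one input).
const identityHolds =
  same(run(compose2(identity, f), sample), run(f, sample)) &&
  same(run(compose2(f, identity), sample), run(f, sample));

// Associativity: (f ∘ g) ∘ h = f ∘ (g ∘ h) (checked on one input).
const associativityHolds = same(
  run(compose2(compose2(f, g), h), sample),
  run(compose2(f, compose2(g, h)), sample),
);

console.log(identityHolds, associativityHolds); // true true
console.log(JSON.stringify(run(compose2(compose2(f, g), h), sample))); // [20,40,60]
```

Note the direction: because `compose2(f, g)` wraps the reducer as `f(g(reducer))`, each item passes through `f` first, which is why composed transducers read left to right like a pipeline (add 1, keep evens, multiply by 10).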
## Development

### For Rust Developers

Orlando can also be used as a native Rust library:

```toml
[dependencies]
orlando = "0.1.0"
```

```rust
use orlando::*;

fn main() {
    let pipeline = Map::new(|x: i32| x * 2)
        .compose(Filter::new(|x: &i32| *x % 3 == 0))
        .compose(Take::new(5));

    let result = to_vec(&pipeline, 1..100);
    println!("{:?}", result); // [6, 12, 18, 24, 30]
}
```

**Rust collectors:** `to_vec`, `sum`, `count`, `first`, `last`, `every`, `some`

### Building from Source

```bash
# Clone repository
git clone https://github.com/yourusername/orlando.git
cd orlando

# Install Git hooks (optional but recommended)
./scripts/setup-hooks.sh

# Run tests
cargo test --target x86_64-unknown-linux-gnu

# Build WASM for JavaScript
wasm-pack build --target web

# Build optimized WASM
wasm-pack build --target web --release
```

### Project Structure

```
orlando/
├── src/
│   ├── lib.rs          # Core library
│   ├── step.rs         # Step monad (early termination)
│   ├── transducer.rs   # Transducer trait & composition
│   ├── transforms.rs   # Map, Filter, Take, etc.
│   ├── collectors.rs   # Terminal operations
│   ├── simd.rs         # SIMD optimizations
│   └── pipeline.rs     # JavaScript WASM API
├── docs/api/           # API documentation
├── examples/           # Interactive HTML examples
├── tests/              # Integration & property tests
└── benches/            # Performance benchmarks
```

## Browser Compatibility

Orlando works in all modern browsers with WebAssembly support:

- ✅ Chrome 57+
- ✅ Firefox 52+
- ✅ Safari 11+
- ✅ Edge 16+
- ✅ Node.js 12+ (with WASM support)

## Contributing

Contributions welcome! Areas we'd love help with:

- Additional transformations
- More SIMD optimizations
- Performance benchmarks
- Documentation improvements
- Real-world example applications

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## License

MIT License - see [LICENSE](LICENSE) for details.

## Why "Orlando"?
Named after the bridger characters in Greg Egan's science fiction novel *Diaspora*, who facilitated transformation and change at fundamental levels. Transducers similarly transform the very nature of how we compose data operations.

## Inspiration & Resources

- **Clojure's Transducers** - Original inspiration ([docs](https://clojure.org/reference/transducers))
- **Rich Hickey's Talk** - "Transducers" ([video](https://www.youtube.com/watch?v=6mTbuzafcII))
- **Category Theory for Programmers** - Mathematical foundations ([book](https://github.com/hmemcpy/milewski-ctfp-pdf))
- **wasm-bindgen** - Rust/WASM interop ([repo](https://github.com/rustwasm/wasm-bindgen))

---

<p align="center">
  <strong>Transform transformations, not data.</strong> 🚀
  <br>
  <sub>Built with Rust • Powered by WebAssembly • Inspired by Category Theory</sub>
</p>