Functional TypeScript library for distributed locking across microservices. Prevents race conditions with Redis, PostgreSQL, Firestore, and custom backends. Features automatic lock management, timeout handling, and extensible architecture.

# SyncGuard

[![npm version](https://badge.fury.io/js/syncguard.svg)](https://badge.fury.io/js/syncguard)
[![npm downloads](https://img.shields.io/npm/dm/syncguard.svg)](https://npmjs.com/package/syncguard)
[![Coverage](https://codecov.io/gh/kriasoft/syncguard/branch/main/graph/badge.svg)](https://app.codecov.io/gh/kriasoft/syncguard)
[![Discord](https://img.shields.io/discord/643523529131950086?label=Discord&logo=discord&logoColor=white)](https://discord.gg/EnbEa7Gsxg)

TypeScript distributed lock library that prevents race conditions across services. Supports Redis, PostgreSQL, and Firestore backends with automatic cleanup, fencing tokens, and bulletproof concurrency control.

## Documentation

- **Docs site:** https://kriasoft.com/syncguard/
- **Backend guides:** [Redis](./redis/README.md) · [PostgreSQL](./postgres/README.md) · [Firestore](./firestore/README.md)

## Requirements

- **Node.js** ≥20.0.0 (targets AsyncDisposable/`await using`; older runtimes require try/finally plus a polyfill, but official support is 20+)

## Compatibility

| Runtime / Backend  | Support                                                                     |
| ------------------ | --------------------------------------------------------------------------- |
| Node.js            | 20+ (native AsyncDisposable/`await using`)                                  |
| Bun                | 1.0+ (used for `bun test`)                                                  |
| Redis backend      | Redis 6+ with `ioredis` ^5 peer dependency                                  |
| PostgreSQL backend | PostgreSQL 12+ with `postgres` ^3 peer dependency                           |
| Firestore backend  | `@google-cloud/firestore` ^8 peer dependency (emulator supported for tests) |

## Installation

SyncGuard is backend-agnostic. Install the base package plus any backends you need:

```bash
# Base package (always required)
npm install syncguard

# Choose one or more backends (optional peer dependencies):
npm install syncguard ioredis                  # Redis backend
npm install syncguard postgres                 # PostgreSQL backend
npm install syncguard @google-cloud/firestore  # Firestore backend
```

Only install the backend packages you actually use.
If you attempt to use a backend without its package installed, you'll get a clear error message.

## Usage

### Quick Start (Redis)

```typescript
import { createLock } from "syncguard/redis";
import Redis from "ioredis";

const redis = new Redis();
const lock = createLock(redis);

// Prevent duplicate payment processing
await lock(
  async () => {
    const payment = await getPayment(paymentId);
    if (payment.status === "pending") {
      await processPayment(payment);
      await updatePaymentStatus(paymentId, "completed");
    }
  },
  { key: `payment:${paymentId}`, ttlMs: 60000 },
);
```

### Using PostgreSQL

```typescript
import { createLock, setupSchema } from "syncguard/postgres";
import postgres from "postgres";

const sql = postgres("postgresql://localhost:5432/myapp");

// Setup schema (once, during initialization)
await setupSchema(sql);

// Create lock function (synchronous)
const lock = createLock(sql);

await lock(
  async () => {
    // Your critical section
  },
  { key: "resource:123" },
);
```

### Using Firestore

```typescript
import { createLock } from "syncguard/firestore";
import { Firestore } from "@google-cloud/firestore";

const db = new Firestore();
const lock = createLock(db);

await lock(
  async () => {
    // Your critical section
  },
  { key: "resource:123" },
);
```

### Manual Lock Control with Automatic Cleanup

Node.js 20+ supports `await using` natively; for older runtimes, drop to try/finally (see below).
Use `await using` for automatic cleanup on all code paths (Node.js ≥20):

```typescript
import { createRedisBackend } from "syncguard/redis";
import Redis from "ioredis";

const redis = new Redis();
const backend = createRedisBackend(redis);

// Lock automatically released on scope exit
{
  await using lock = await backend.acquire({
    key: "batch:daily-report",
    ttlMs: 300000, // 5 minutes
  });

  if (lock.ok) {
    // TypeScript narrows lock to include handle methods after ok check
    const { lockId, fence } = lock; // lockId for ownership checks, fence for stale lock protection

    await generateDailyReport(fence);

    // Extend lock for long-running tasks
    await lock.extend(300000);
    await sendReportEmail();

    // Lock released automatically here
  } else {
    console.log("Resource is locked by another process");
  }
}
```

**For older runtimes (Node.js <20)**, use try/finally:

```typescript
const result = await backend.acquire({
  key: "batch:daily-report",
  ttlMs: 300000,
});

if (result.ok) {
  try {
    const { lockId, fence } = result;
    await generateDailyReport(fence);

    const extended = await backend.extend({ lockId, ttlMs: 300000 });
    if (!extended.ok) {
      throw new Error("Failed to extend lock");
    }

    await sendReportEmail();
  } finally {
    await backend.release({ lockId: result.lockId });
  }
} else {
  console.log("Resource is locked by another process");
}
```

**Error callbacks** for disposal failures:

```typescript
const backend = createRedisBackend(redis, {
  onReleaseError: (error, context) => {
    logger.error("Failed to release lock", {
      error,
      lockId: context.lockId,
      key: context.key,
    });
  },
});

// All acquisitions automatically use the error callback
await using lock = await backend.acquire({ key: "resource", ttlMs: 30000 });
```

**Note:** SyncGuard provides a safe-by-default error handler that automatically logs disposal failures in development mode (`NODE_ENV !== 'production'`).
In production, enable logging with `SYNCGUARD_DEBUG=true` or provide a custom `onReleaseError` callback integrated with your observability stack.

## Configuration

### Lock Options

```typescript
import { createLock } from "syncguard/redis";
import Redis from "ioredis";

const redis = new Redis();
const lock = createLock(redis);

// All lock options shown with their defaults
await lock(
  async () => {
    // Your critical section
  },
  {
    key: "resource:123", // Required: unique identifier
    ttlMs: 30000, // Lock duration in milliseconds (default: 30s)
    acquisition: {
      timeoutMs: 5000, // Max acquisition wait (default: 5s)
      maxRetries: 10, // Retry attempts (default: 10)
      retryDelayMs: 100, // Initial retry delay (default: 100ms)
      backoff: "exponential", // Strategy: "exponential" | "fixed" (default: "exponential")
      jitter: "equal", // Strategy: "equal" | "full" | "none" (default: "equal")
    },
  },
);
```

**Backoff & Jitter Strategy:**

- `backoff: "exponential"` - Double the delay each retry (100ms → 200ms → 400ms...). Recommended for handling contention gracefully.
- `backoff: "fixed"` - Keep delay constant (100ms → 100ms → 100ms).
- `jitter: "equal"` - Add ±50% random variance. Prevents thundering herd in high-contention scenarios.
- `jitter: "full"` - Add 0-100% random variance. Maximum randomization.
- `jitter: "none"` - No randomization.

**Timeout Behavior:** `timeoutMs` is a hard limit for the entire acquisition loop, retries included. If the lock hasn't been acquired within `timeoutMs` milliseconds, a `LockError` with code `AcquisitionTimeout` is thrown.

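
To make the retry timing concrete, here is a minimal sketch of how a per-attempt delay could be derived from `retryDelayMs`, `backoff`, and `jitter`. It is an illustration only, not SyncGuard's actual implementation: `computeRetryDelay` and its injectable `random` parameter are hypothetical, the `"equal"` branch simply follows the "±50%" description above, and `"full"` is left out because its exact range is not specified here.

```typescript
type Backoff = "exponential" | "fixed";
type Jitter = "equal" | "none"; // "full" omitted: its exact range is not specified above

// Hypothetical helper for illustration; not part of the SyncGuard API.
function computeRetryDelay(
  attempt: number, // 0-based retry index
  retryDelayMs: number, // initial delay, e.g. 100
  backoff: Backoff,
  jitter: Jitter,
  random: () => number = Math.random, // injectable so the result can be tested
): number {
  // Exponential backoff doubles the delay on each retry; fixed keeps it constant.
  const base =
    backoff === "exponential" ? retryDelayMs * 2 ** attempt : retryDelayMs;

  switch (jitter) {
    case "equal":
      // Assumption: "±50%" means a value in [0.5 * base, 1.5 * base).
      return base * (0.5 + random());
    case "none":
      return base;
  }
}

// Without jitter, the exponential schedule is deterministic.
const delays = [0, 1, 2, 3].map((attempt) =>
  computeRetryDelay(attempt, 100, "exponential", "none"),
);
console.log(delays); // [ 100, 200, 400, 800 ]
```

In a real acquisition loop these delays would also be cut short by `timeoutMs` and `maxRetries`, whichever limit is reached first.
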
### Backend Configuration

```typescript
// Redis
const lock = createLock(redis, {
  keyPrefix: "myapp", // Default: "syncguard"
});

// PostgreSQL
await setupSchema(sql, {
  tableName: "app_locks",
  fenceTableName: "app_fences",
});
const lock = createLock(sql, {
  tableName: "app_locks", // Default: "syncguard_locks"
  fenceTableName: "app_fences", // Default: "syncguard_fence_counters"
  // ⚠️ Use the same table names in both setupSchema and createLock
});

// Firestore
const lock = createLock(db, {
  collection: "app_locks", // Default: "locks"
  fenceCollection: "app_fences", // Default: "fence_counters"
});
```

### Backend-Specific Setup

**PostgreSQL:** Call `setupSchema(sql)` once during initialization to create required tables and indexes.

**Firestore:** Ensure the single-field index on `lockId` remains enabled (Firestore creates these by default). If you have disabled single-field indexes, add one:

```bash
gcloud firestore indexes create --collection-group=locks --field-config=field-path=lockId,order=ASCENDING
```

See [Firestore setup guide](./firestore/README.md) for details.

## Error Handling

```typescript
import { LockError } from "syncguard";

try {
  await lock(
    async () => {
      // Critical section
    },
    { key: "resource:123" },
  );
} catch (error) {
  if (error instanceof LockError) {
    console.error(`Lock error [${error.code}]:`, error.message);

    // Handle specific error codes
    switch (error.code) {
      case "AcquisitionTimeout":
        // Retry timeout exceeded
        break;
      case "ServiceUnavailable":
        // Backend temporarily unavailable
        break;
      // ... other cases
    }
  }
}
```

### Error Codes

SyncGuard throws `LockError` with one of these error codes:

| Code                 | Meaning                                | When Thrown                                               |
| -------------------- | -------------------------------------- | --------------------------------------------------------- |
| `AcquisitionTimeout` | Retry loop exceeded `timeoutMs`        | Lock acquisition didn't succeed within the timeout window |
| `ServiceUnavailable` | Backend service unavailable            | Network failures, service unreachable, 5xx responses      |
| `AuthFailed`         | Authentication or authorization failed | Invalid credentials, insufficient permissions for backend |
| `InvalidArgument`    | Invalid argument or malformed request  | Invalid key/lockId format, bad configuration              |
| `RateLimited`        | Rate limit exceeded on backend         | Quota exceeded, throttling applied by backend             |
| `NetworkTimeout`     | Network operation timed out            | Client-side or network timeouts to backend                |
| `Aborted`            | Operation cancelled via AbortSignal    | User-initiated cancellation during acquisition            |
| `Internal`           | Unexpected backend error               | Unclassified failures, server-side errors                 |

## Common Patterns

### Preventing Duplicate Job Processing

```typescript
const processJob = async (jobId: string) => {
  await lock(
    async () => {
      const job = await getJob(jobId);
      if (job.status === "pending") {
        await executeJob(job);
        await markJobComplete(jobId);
      }
    },
    { key: `job:${jobId}`, ttlMs: 300000 },
  );
};
```

### Rate Limiting

```typescript
const backend = createRedisBackend(redis);

const checkRateLimit = async (userId: string) => {
  const result = await backend.acquire({
    key: `rate:${userId}`,
    ttlMs: 60000, // 1 minute window
  });

  if (!result.ok) {
    throw new Error("Rate limit exceeded");
  }

  // Note: Intentionally NOT releasing the lock here!
  // The lock auto-expires after ttlMs, preventing the same user from
  // acquiring another lock within that window. This implements basic
  // rate limiting without manual release overhead.
  return performOperation(userId);
};
```

**Important:** This pattern intentionally doesn't release the lock. It's appropriate for rate limiting because:

- The lock auto-expires after `ttlMs`, naturally cleaning up without needing explicit release
- Further attempts to acquire the same key before expiration (that is, repeat requests from the same user) will fail, enforcing the rate limit
- This is different from critical section protection, where you always release the lock after the operation completes

## Key Concepts

### Fencing Tokens

Fencing tokens are monotonic counters that prevent stale writes in distributed systems. Each successful lock acquisition returns a `fence` token: a 15-digit zero-padded decimal string that increases with each new acquisition of the same resource.

**Why use them?** In distributed systems, a slow operation might complete after its lock has expired and been acquired by another process. Without fencing tokens, a stale write from the old operation could corrupt data. With fencing tokens, the backend/application can reject stale writes.

**Usage pattern:**

```typescript
const result = await backend.acquire({ key: "resource:123", ttlMs: 30000 });

if (result.ok) {
  // Pass the fence along with every write to the protected resource
  await updateResource(resourceId, newValue, result.fence); // Resource side verifies fence before accepting
}
```

**Ownership checking (diagnostic use only):**

```typescript
import { owns, getByKey } from "syncguard";

// Check if you still own the lock (for monitoring/diagnostics)
const stillOwned = await owns(backend, lockId);

// Get lock info by resource key
// Returns { lockId, fence, expiresAtMs } or undefined if not locked
const info = await getByKey(backend, "resource:123");
if (info) {
  console.log(`Lock expires in ${info.expiresAtMs - Date.now()}ms`);
}
```

**⚠️ Important:** Don't check ownership before calling `release()` or `extend()`. These operations are safe to call without pre-checking; they return `{ ok: false }` if the lock was already released or expired.

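
The "verifies fence before accepting" step in the Fencing Tokens usage pattern can be sketched as follows. Because fence tokens are fixed-width, zero-padded decimal strings, plain string comparison orders them the same way numeric comparison would. Everything named here (`FencedStore`, `write`, `read`, `toFence`) is hypothetical and only for illustration; in a real system the check would normally be a conditional update in your own datastore, and the tokens would come from `backend.acquire()`.

```typescript
// Hypothetical in-memory resource store that rejects writes carrying a stale fence.
class FencedStore<T> {
  private records = new Map<string, { value: T; fence: string }>();

  // Returns true if the write was accepted, false if its fence is stale.
  write(id: string, value: T, fence: string): boolean {
    const current = this.records.get(id);
    // Fixed-width zero-padded strings compare lexicographically in numeric order.
    if (current && fence < current.fence) {
      return false;
    }
    this.records.set(id, { value, fence });
    return true;
  }

  read(id: string): T | undefined {
    return this.records.get(id)?.value;
  }
}

// Illustration only: builds a 15-digit token; real tokens come from acquire().
const toFence = (n: number): string => String(n).padStart(15, "0");

const store = new FencedStore<string>();
store.write("resource:123", "from holder A", toFence(1)); // accepted
store.write("resource:123", "from holder B", toFence(2)); // accepted, newer fence
store.write("resource:123", "late write from A", toFence(1)); // rejected as stale
```

A write with the same fence as the stored one is accepted, so the current lock holder can write more than once during its tenure.
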
### Lock TTL (Time-to-Live)

The `ttlMs` parameter controls automatic lock expiration. Locks automatically expire if not released before the TTL elapses, providing critical protection against process crashes.

**Recommended TTL values:**

- **Short operations** (< 10 seconds): `ttlMs: 30000` (30 seconds)
- **Medium operations** (10-60 seconds): `ttlMs: 120000` (2 minutes)
- **Long-running tasks** (> 1 minute): Use `extend()` to periodically renew instead of setting a very long TTL

**For long-running operations, use the heartbeat pattern:**

```typescript
const result = await backend.acquire({
  key: "batch:long-report",
  ttlMs: 30000, // Short TTL
});

if (result.ok) {
  const { lockId } = result;

  // Extend the lock every 20 seconds, well before the 30-second TTL elapses
  const heartbeat = setInterval(async () => {
    const extended = await backend.extend({ lockId, ttlMs: 30000 });
    if (!extended.ok) {
      console.warn("Failed to extend lock, stopping heartbeat");
      clearInterval(heartbeat);
    }
  }, 20000);

  try {
    await performLongRunningOperation();
  } finally {
    clearInterval(heartbeat); // Stop the heartbeat even if the operation throws
    await backend.release({ lockId });
  }
}
```

### Time Authority and Clock Synchronization

Different backends use different time sources, which affects consistency guarantees:

**Redis (Server Time Authority):**

- Uses Redis server time for all lock expirations
- Highest consistency guarantee: no client-side clock synchronization needed
- Ideal for high-consistency use cases and multi-region deployments

**PostgreSQL (Server Time Authority):**

- Uses PostgreSQL server time for lock expirations
- Similar consistency to Redis
- No client-side clock sync required

**Firestore (Client Time Authority):**

- Uses client-side `Date.now()` for lock expirations
- ⚠️ Requires NTP synchronization on all clients (critical!)
- If client clocks drift >1000ms, locks may behave unexpectedly
- Operational monitoring of client clock health is essential for production deployments

## Features

- 🔒 **Bulletproof concurrency** - Atomic operations prevent race conditions
- 🛡️ **Fencing tokens** - Monotonic counters protect against stale writes
- 🧹 **Automatic cleanup** - TTL-based expiration + `await using` (AsyncDisposable) support
- 🔄 **Backend flexibility** - Redis (performance), PostgreSQL (zero overhead), or Firestore (serverless)
- 🔁 **Smart retries** - Exponential backoff with jitter handles contention
- 💙 **TypeScript-first** - Full type safety with compile-time guarantees
- 📊 **Optional telemetry** - Opt-in observability via decorator pattern

## Troubleshooting

### Locks Not Being Released

**Symptoms:** Locks remain held longer than expected, or `onReleaseError` callbacks show disposal errors.

**Diagnosis:**

1. Check that `await using` blocks complete or try/finally blocks execute `release()`
2. Look for infinite loops or unhandled promise rejections
3. Review `onReleaseError` callback logs for specific errors

**Solutions:**

```typescript
// ✓ Correct: Lock always released
{
  await using lock = await backend.acquire({ key: "resource", ttlMs: 30000 });
  // Lock released when this block exits, even if an error occurs
}

// ✗ Wrong: Infinite loop prevents release
{
  await using lock = await backend.acquire({ key: "resource", ttlMs: 30000 });
  while (true) {
    // Never exits!
    await someOperation();
  }
}
```

### Lock Acquisition Times Out

**Symptoms:** `AcquisitionTimeout` errors when trying to acquire locks.

**Diagnosis:**

1. Is the `timeoutMs` value too short for your contention level?
2. Is the resource legitimately locked by another process?
3. Are there network issues to the backend?

**Solutions:**

```typescript
// Increase timeout for high-contention resources
await lock(fn, {
  key: "hot-resource",
  ttlMs: 30000,
  acquisition: {
    timeoutMs: 30000, // Was 5000, now 30 seconds
    maxRetries: 20,
  },
});

// Or use exponential backoff with jitter (default)
// This handles contention more gracefully
```

### Backend Connection Failures

**Symptoms:** `ServiceUnavailable` or `NetworkTimeout` errors.

**Diagnosis:**

1. Verify backend is running and accessible
2. Check network connectivity from your application
3. Review backend logs for errors

**For Redis:** Verify `redis-cli ping` returns PONG

**For PostgreSQL:** Verify `psql -U user -h localhost` connects successfully

**For Firestore:** Verify credentials and emulator status if using emulator

### Multiple Processes Acquiring the Same Lock

**Symptoms:** Two processes seem to hold the same lock simultaneously.

**Diagnosis:**

1. Are all processes using the same lock key?
2. Did the first process release the lock?
3. Did the TTL expire before release?

**Verify lock ownership:**

```typescript
import { owns, getByKey } from "syncguard";

// Check whether this lockId still holds the lock
const stillOwned = await owns(backend, lockId);
if (!stillOwned) {
  console.warn("Lock was released or expired");
}

// Get lock info by key
const info = await getByKey(backend, "payment:123");
if (info && info.expiresAtMs > Date.now()) {
  console.log("Lock is currently held");
}
```

### Performance Issues

**Symptoms:** Lock operations are slow or cause high backend load.

**Considerations:**

- **Contention:** High contention naturally causes retries. Consider sharding keys.
- **TTL:** Very long TTLs increase backend memory usage
- **Polling:** Frequent `isLocked()` or `getByKey()` calls can overload the backend

**Solutions:**

```typescript
// Shard hot keys across multiple lock instances
const shardIndex = hashUserId(userId) % 10;
const result = await backend.acquire({
  key: `payment:${shardIndex}:${userId}`,
  ttlMs: 30000,
});

// Use a smart retry strategy
await lock(fn, {
  key: "hot-resource",
  acquisition: {
    backoff: "exponential", // Reduce retry frequency over time
    jitter: "equal", // Prevent thundering herd
    maxRetries: 20,
    retryDelayMs: 100,
  },
});
```

## Development

- `bun test test/unit` - fast unit tests
- `bun test test/contracts test/e2e` - contracts + e2e suite
- `npm run build` - type-check and emit `dist/`
- `npm run redis` / `npm run firestore` - spin up local Redis or Firestore emulator for tests

## Contributing

We welcome contributions! Here's how you can help:

- 🐛 **Bug fixes** - Include test cases
- 🚀 **New backends** - Follow [docs/specs/interface.md](./docs/specs/interface.md)
- 📖 **Documentation** - Examples, guides, troubleshooting
- 📋 **Spec reviews** - Validate specs match implementation, propose improvements
- ✅ **Tests** - Improve coverage

See [CONTRIBUTING.md](.github/CONTRIBUTING.md) for detailed guidelines.

## Support & Documentation

- **Docs**: [Full documentation](https://kriasoft.com/syncguard/)
- **Specs**: [Technical specifications](./docs/specs/) - Architecture decisions and backend requirements
- **Discord**: [Join our community](https://discord.gg/EnbEa7Gsxg)
- **Issues**: [GitHub Issues](https://github.com/kriasoft/syncguard/issues)

## Backers

<a href="https://reactstarter.com/b/1"><img src="https://reactstarter.com/b/1.png" height="60" /></a>
<a href="https://reactstarter.com/b/2"><img src="https://reactstarter.com/b/2.png" height="60" /></a>
<a href="https://reactstarter.com/b/3"><img src="https://reactstarter.com/b/3.png" height="60" /></a>
<a href="https://reactstarter.com/b/4"><img src="https://reactstarter.com/b/4.png" height="60" /></a>
<a href="https://reactstarter.com/b/5"><img src="https://reactstarter.com/b/5.png" height="60" /></a>
<a href="https://reactstarter.com/b/6"><img src="https://reactstarter.com/b/6.png" height="60" /></a>
<a href="https://reactstarter.com/b/7"><img src="https://reactstarter.com/b/7.png" height="60" /></a>
<a href="https://reactstarter.com/b/8"><img src="https://reactstarter.com/b/8.png" height="60" /></a>

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.