# L0 - Deterministic Streaming Execution Substrate (DSES) for AI

### The missing reliability and observability layer for all AI streams.

![L0: The Missing AI Reliability Substrate](img/l0-banner.jpg)

<p align="center">
  <a href="https://www.npmjs.com/package/@ai2070/l0">
    <img src="https://img.shields.io/npm/v/@ai2070/l0?color=brightgreen&label=npm" alt="npm version">
  </a>
  <a href="https://bundlephobia.com/package/@ai2070/l0">
    <img src="https://img.shields.io/bundlephobia/minzip/@ai2070/l0?label=minzipped" alt="minzipped size">
  </a>
  <a href="https://packagephobia.com/result?p=@ai2070/l0">
    <img src="https://packagephobia.com/badge?p=@ai2070/l0" alt="install size">
  </a>
  <img src="https://img.shields.io/badge/types-included-blue?logo=typescript&logoColor=white" alt="Types Included">
  <a href="https://github.com/ai-2070/l0/actions">
    <img src="https://img.shields.io/github/actions/workflow/status/ai-2070/l0/ci.yml?label=tests" alt="CI status">
  </a>
  <img src="https://img.shields.io/badge/license-Apache_2.0-green" alt="Apache 2.0 License">
</p>

> LLMs produce high-value reasoning over a low-integrity transport layer.
> Streams stall, drop tokens, reorder events, violate timing guarantees, and expose no deterministic contract.
>
> This breaks retries. It breaks supervision. It breaks reproducibility.
> It makes reliable AI systems impossible to build on top of raw provider streams.
>
> **L0 is the deterministic execution substrate that fixes the transport - with guardrails designed for the streaming layer itself: stream-neutral, pattern-based, loop-safe, and timing-aware.**

L0 adds deterministic execution, fallbacks, retries, network protection, guardrails, drift detection, and tool tracking to any LLM stream - turning raw model output into production-grade behavior.

It works with **OpenAI**, **Vercel AI SDK**, **Mastra AI**, and **custom adapters**. Supports **multimodal streams** and tool calls, and provides full deterministic replay.

```bash
npm install @ai2070/l0
```

_Production-grade reliability. Just pass your stream. L0'll take it from here._

L0 includes 2,600+ tests covering all major reliability features.

```
  Any AI Stream          ┌──────────────────────────────────────┐      Your App
 ───────────────         │                                      │     ─────────
  Vercel AI SDK          │  Retry · Fallback · Resume           │
  OpenAI / Mastra  ──▶   │  Guardrails · Timeouts · Consensus   │ ──▶  Reliable
  Custom Streams         │  Full Observability                  │      Output
                         │                                      │
  text / image /         └──────────────────────────────────────┘
  video / audio                L0 = Token-Level Reliability
```

**Upcoming versions:**

- **1.0.0** - API freeze + Python version

**Bundle sizes (minified):**

| Import                  | Size  | Gzipped | Description              |
| ----------------------- | ----- | ------- | ------------------------ |
| `@ai2070/l0` (full)     | 191KB | 56KB    | Everything               |
| `@ai2070/l0/core`       | 71KB  | 21KB    | Runtime + retry + errors |
| `@ai2070/l0/structured` | 61KB  | 18KB    | Structured output        |
| `@ai2070/l0/consensus`  | 72KB  | 21KB    | Multi-model consensus    |
| `@ai2070/l0/parallel`   | 58KB  | 17KB    | Parallel/race operations |
| `@ai2070/l0/window`     | 62KB  | 18KB    | Document chunking        |
| `@ai2070/l0/guardrails` | 18KB  | 6KB     | Validation rules         |
| `@ai2070/l0/monitoring` | 27KB  | 7KB     | OTel/Sentry              |
| `@ai2070/l0/drift`      | 4KB   | 2KB     | Drift detection          |

Dependency-free. Tree-shakeable subpath exports for minimal bundles.

> Most applications should simply use `import { l0 } from "@ai2070/l0"`.
> Only optimize imports if you're targeting edge runtimes or strict bundle constraints.
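If you do target an edge runtime or a tight bundle budget, the subpaths in the table above can be imported directly. A minimal sketch - assuming the named exports mirror their root-package counterparts (e.g. `l0` from `/core`, `structured` from `/structured`):

```typescript
// Tree-shaken imports via subpath exports (export names assumed to match the root package)
import { l0 } from "@ai2070/l0/core";
import { structured } from "@ai2070/l0/structured";
```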
## Features

| Feature | Description |
| ------- | ----------- |
| **🔁 Smart Retries** | Model-aware retries with fixed-jitter backoff. Automatic retries for zero-token output, network stalls, SSE disconnects, and provider overloads. |
| **🌐 Network Protection** | Automatic recovery from dropped streams, slow responses, backgrounding, 429/503 load shedding, DNS errors, and partial chunks. |
| **🔀 Model Fallbacks** | Automatically fall back to secondary models (e.g., 4o → 4o-mini → Claude/Gemini) with full retry logic. |
| **💥 Zero-Token/Stall Protection** | Detects when a model produces nothing or stalls mid-stream. Automatically retries or switches to fallbacks. |
| **📍 Last-Known-Good Token Resumption** | When a stream is interrupted, L0 resumes generation from the last structurally valid token (opt-in). |
| **🧠 Drift Detection** | Detects tone shifts, duplicated sentences, entropy spikes, markdown collapse, and meta-AI patterns before corruption. |
| **🧱 Structured Output** | Guaranteed-valid JSON with Zod (v3/v4), Effect Schema, or JSON Schema. Auto-corrects missing braces, commas, and markdown fences. |
| **🩹 JSON Auto-Healing + Markdown Fence Repair** | Automatic correction of truncated or malformed JSON (missing braces, brackets, quotes), and repair of broken Markdown code fences. Ensures clean extraction of structured data from noisy LLM output. |
| **🛡️ Guardrails** | JSON, Markdown, LaTeX, and pattern validation with fast/slow path execution. Delta-only checks run sync; full-content scans defer to async to never block streaming. |
| **⚡ Race: Fastest-Model Wins** | Run multiple models or providers in parallel and return the fastest valid stream. Ideal for ultra-low-latency chat and high-availability systems. |
| **🌿 Parallel: Fan-Out / Fan-In** | Start multiple streams simultaneously and collect structured or summarized results. Perfect for agent-style multi-model workflows. |
| **🔗 Pipe: Streaming Pipelines** | Compose multiple streaming steps (e.g., summarize → refine → translate) with safe state passing and guardrails between each stage. |
| **🧩 Consensus: Agreement Across Models** | Combine multiple model outputs using unanimous, weighted, or best-match consensus. Guarantees high-confidence generation for safety-critical tasks. |
| **📄 Document Windows** | Built-in chunking (token, paragraph, sentence, character). Ideal for long documents, transcripts, or multi-page processing. |
| **🎨 Formatting Helpers** | Extract JSON/code from markdown fences, strip thinking tags, normalize whitespace, and clean LLM output for downstream processing. |
| **📊 Monitoring** | Built-in integrations with OpenTelemetry and Sentry for metrics, tracing, and error tracking. |
| **🔔 Lifecycle Callbacks** | `onStart`, `onComplete`, `onError`, `onEvent`, `onViolation`, `onRetry`, `onFallback`, `onToolCall` - full observability into every stream phase. |
| **📡 Streaming-First Runtime** | Thin, deterministic wrapper over `streamText()` with unified event types (`token`, `error`, `complete`) for easy UIs. |
| **📼 Atomic Event Logs** | Record every token, retry, fallback, and guardrail check as immutable events. Full audit trail for debugging and compliance. |
| **🔄 Byte-for-Byte Replays** | Deterministically replay any recorded stream to reproduce exact output. Perfect for testing and time-travel debugging. |
| **⛔ Safety-First Defaults** | Continuation off by default. Structured objects never resumed. No silent corruption. Integrity always preserved. |
| **⚡ Tiny & Explicit** | 21KB gzipped core. Tree-shakeable with subpath exports (`/core`, `/structured`, `/consensus`, `/parallel`, `/window`). No frameworks, no heavy abstractions. |
| **🔌 Custom Adapters (BYOA)** | Bring your own adapter for any LLM provider. Built-in adapters for Vercel AI SDK, OpenAI, and Mastra. |
| **🖼️ Multimodal Support** | Build adapters for image/audio/video generation (FLUX.2, Stable Diffusion, Veo 3, CSM). Progress tracking, data events, and state management for non-text outputs. |
| **🧪 Battle-Tested** | 2,600+ unit tests and 250+ integration tests validating real streaming, retries, and advanced behavior. |

## Quick Start

### With Vercel AI SDK: Minimal Usage

```typescript
import { l0 } from "@ai2070/l0";
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

const result = await l0({
  // Primary model stream
  stream: () =>
    streamText({
      model: openai("gpt-5-mini"),
      prompt,
    }),
});

// Read the stream
for await (const event of result.stream) {
  if (event.type === "token") process.stdout.write(event.value);
}
```

### With Vercel AI SDK: Expanded

```typescript
import { l0, recommendedGuardrails, recommendedRetry } from "@ai2070/l0";
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

const result = await l0({
  // Primary model stream
  stream: () =>
    streamText({
      model: openai("gpt-5-mini"),
      prompt,
    }),

  // Optional: Fallback models
  fallbackStreams: [() => streamText({ model: openai("gpt-5-nano"), prompt })],

  // Optional: Guardrails, default: none
  guardrails: recommendedGuardrails,
  // Other presets:
  // minimalGuardrails      // jsonRule, zeroOutputRule
  // recommendedGuardrails  // jsonRule, markdownRule, zeroOutputRule, patternRule
  // strictGuardrails       // jsonRule, markdownRule, latexRule, patternRule, zeroOutputRule
  // jsonOnlyGuardrails     // jsonRule, zeroOutputRule
  // markdownOnlyGuardrails // markdownRule, zeroOutputRule
  // latexOnlyGuardrails    // latexRule, zeroOutputRule

  // Optional: Retry configuration, default as follows
  retry: {
    attempts: 3, // LLM errors only
    maxRetries: 6, // Total (LLM + network)
    baseDelay: 1000,
    maxDelay: 10000,
    backoff: "fixed-jitter", // "exponential" | "linear" | "fixed" | "full-jitter"
  },
  // Or use presets:
  // minimalRetry     // { attempts: 2, maxRetries: 4, backoff: "linear" }
  // recommendedRetry // { attempts: 3, maxRetries: 6, backoff: "fixed-jitter" }
  // strictRetry      // { attempts: 3, maxRetries: 6, backoff: "full-jitter" }
  // exponentialRetry // { attempts: 4, maxRetries: 8, backoff: "exponential" }

  // Optional: Timeout configuration, default as follows
  timeout: {
    initialToken: 5000, // 5s to first token
    interToken: 10000, // 10s between tokens
  },

  // Optional: Guardrail check intervals, default as follows
  checkIntervals: {
    guardrails: 5, // Check every N tokens
    drift: 10,
    checkpoint: 10,
  },

  // Optional: User context (attached to all observability events)
  context: { requestId: "req_123", userId: "user_456" },

  // Optional: Abort signal
  signal: abortController.signal,

  // Optional: Enable telemetry
  monitoring: { enabled: true },

  // Optional: Lifecycle callbacks (all are optional)
  onStart: (attempt, isRetry, isFallback) => {},
  onComplete: (state) => {},
  onError: (error, willRetry, willFallback) => {},
  onViolation: (violation) => {},
  onRetry: (attempt, reason) => {},
  onFallback: (index, reason) => {},
  onToolCall: (toolName, toolCallId, args) => {},
});

// Read the stream
for await (const event of result.stream) {
  if (event.type === "token") {
    process.stdout.write(event.value);
  }
}
```

**See Also: [API.md](./API.md) - Complete API reference**

### With OpenAI SDK

```typescript
import OpenAI from "openai";
import { l0, openaiStream, recommendedGuardrails } from "@ai2070/l0";

const openai = new OpenAI();

const result = await l0({
  stream: openaiStream(openai, {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Generate a haiku about coding" }],
  }),
  guardrails: recommendedGuardrails,
});

for await (const event of result.stream) {
  if (event.type === "token") process.stdout.write(event.value);
}
```

### With Mastra AI

```typescript
import { Agent } from "@mastra/core/agent";
import { l0, mastraStream, recommendedGuardrails } from "@ai2070/l0";

const agent = new Agent({
  name: "haiku-writer",
  instructions: "You are a poet who writes haikus",
  model: "openai/gpt-4o",
});

const result = await l0({
  stream: mastraStream(agent, "Generate a haiku about coding"),
  guardrails: recommendedGuardrails,
});

for await (const event of result.stream) {
  if (event.type === "token") process.stdout.write(event.value);
}
```

## Core Features

| Feature | Description |
| ------- | ----------- |
| [Streaming Runtime](#streaming-runtime) | Token-by-token normalization, checkpoints, resumable generation |
| [Retry Logic](#retry-logic) | Smart retries with backoff, network vs model error distinction |
| [Network Protection](#network-protection) | Auto-recovery from 12+ network failure types |
| [Structured Output](#structured-output) | Guaranteed valid JSON with Zod, Effect Schema, or JSON Schema |
| [Fallback Models](#fallback-models) | Sequential fallback when primary model fails |
| [Document Windows](#document-windows) | Automatic chunking for long documents |
| [Formatting Helpers](#formatting-helpers) | Context, memory, tools, and output formatting utilities |
| [Last-Known-Good Token Resumption](#last-known-good-token-resumption) | Resume from last checkpoint on retry/fallback (opt-in) |
| [Guardrails](#guardrails) | JSON, Markdown, LaTeX validation, pattern detection |
| [Consensus](#consensus) | Multi-model agreement with voting strategies |
| [Parallel Operations](#parallel-operations) | Race, batch, pool patterns for concurrent LLM calls |
| [Type-Safe Generics](#type-safe-generics) | Forward output types through all L0 functions |
| [Custom Adapters (BYOA)](#custom-adapters-byoa) | Bring your own adapter for any LLM provider |
| [Multimodal Support](#multimodal-support) | Image, audio, video generation with progress tracking |
| [Lifecycle Callbacks](#lifecycle-callbacks) | Full observability into every stream phase |
| [Event Sourcing](#event-sourcing) | Record/replay streams for testing and audit trails |
| [Error Handling](#error-handling) | Typed errors with categorization and recovery hints |
| [Monitoring](#monitoring) | Built-in OTel and Sentry integrations |
| [Testing](#testing) | 2,600+ tests covering all features and SDK adapters |

---

## Streaming Runtime

L0 wraps `streamText()` with deterministic behavior:

```typescript
const result = await l0({
  stream: () => streamText({ model, prompt }),

  // Optional: Timeouts (ms)
  timeout: {
    initialToken: 5000, // 5s to first token
    interToken: 10000, // 10s between tokens
  },

  signal: abortController.signal,
});

// Unified event format
for await (const event of result.stream) {
  switch (event.type) {
    case "token":
      console.log(event.value);
      break;
    case "complete":
      console.log("Complete");
      break;
    case "error":
      console.error(event.error, event.reason); // reason: ErrorCategory
      break;
  }
}

// Access final state
console.log(result.state.content); // Full accumulated content
console.log(result.state.tokenCount); // Total tokens received
console.log(result.state.checkpoint); // Last stable checkpoint
```

⚠️ Free and low-priority models may take **3–7 seconds** before emitting the first token and **10 seconds** between tokens.
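Because the runtime accepts a standard `AbortSignal`, cancellation works the same way as with `fetch`. A minimal sketch using the documented `signal` option and `onAbort` callback - the exact cleanup semantics after abort are an assumption here:

```typescript
const abortController = new AbortController();

// Cancel the run after 2 seconds (illustrative timing)
setTimeout(() => abortController.abort(), 2000);

const result = await l0({
  stream: () => streamText({ model, prompt }),
  signal: abortController.signal,
  // Fires when the stream is aborted (see Lifecycle Callbacks below)
  onAbort: (tokenCount, contentLength) => {
    console.log(`Aborted after ${tokenCount} tokens (${contentLength} chars)`);
  },
});

for await (const event of result.stream) {
  if (event.type === "token") process.stdout.write(event.value);
}
```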
---

## Retry Logic

Smart retry system that distinguishes network errors from model errors:

```typescript
const result = await l0({
  stream: () => streamText({ model, prompt }),

  retry: {
    attempts: 3, // Model errors only (default: 3)
    maxRetries: 6, // Absolute cap across all error types (default: 6)
    baseDelay: 1000,
    maxDelay: 10000,
    backoff: "fixed-jitter", // or "exponential", "linear", "fixed", "full-jitter"

    // Optional: specify which error types to retry on, defaults to all recoverable errors
    retryOn: [
      "zero_output",
      "guardrail_violation",
      "drift",
      "incomplete",
      "network_error",
      "timeout",
      "rate_limit",
      "server_error",
    ],

    // Custom delays per error type (overrides baseDelay)
    errorTypeDelays: {
      connectionDropped: 2000,
      timeout: 1500,
      dnsError: 5000,
    },
  },
});
```

### Retry Behavior

| Error Type | Category | Retries | Counts Toward `attempts` | Counts Toward `maxRetries` |
| ---------- | -------- | ------- | ------------------------ | -------------------------- |
| Network disconnect | `NETWORK` | Yes | No | Yes |
| Zero output | `CONTENT` | Yes | **Yes** | Yes |
| Timeout | `TRANSIENT` | Yes | No | Yes |
| 429 rate limit | `TRANSIENT` | Yes | No | Yes |
| 503 server error | `TRANSIENT` | Yes | No | Yes |
| Guardrail violation | `CONTENT` | Yes | **Yes** | Yes |
| Drift detected | `CONTENT` | Yes | **Yes** | Yes |
| Model error | `MODEL` | Yes | **Yes** | Yes |
| Auth error (401/403) | `FATAL` | No | - | - |
| Invalid config | `INTERNAL` | No | - | - |
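To see how these categories play out at runtime, the documented `onRetry` and `onError` callbacks can log every recovery decision. A minimal sketch, using the callback signatures shown in the Lifecycle Callbacks section:

```typescript
const result = await l0({
  stream: () => streamText({ model, prompt }),
  retry: { attempts: 3, maxRetries: 6, backoff: "fixed-jitter" },

  // Log each retry with the reason L0 reports (network vs content vs model errors)
  onRetry: (attempt, reason) => {
    console.log(`Retry #${attempt}: ${reason}`);
  },

  // Log the error and whether L0 will retry or fall back next
  onError: (error, willRetry, willFallback) => {
    console.warn(`Error: ${error.message} (retry=${willRetry}, fallback=${willFallback})`);
  },
});
```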
---

## Network Protection

Automatic detection and recovery from network failures:

```typescript
import { isNetworkError, analyzeNetworkError } from "@ai2070/l0";

try {
  await l0({ stream, retry: recommendedRetry });
} catch (error) {
  if (isNetworkError(error)) {
    const analysis = analyzeNetworkError(error);
    console.log(analysis.type); // "connection_dropped", "timeout", etc.
    console.log(analysis.retryable); // true/false
    console.log(analysis.suggestion); // Recovery suggestion
  }
}
```

Detected error types: connection dropped, fetch errors, ECONNRESET, ECONNREFUSED, SSE aborted, DNS errors, timeouts, mobile background throttle, and more.

---

## Structured Output

Guaranteed valid JSON matching your schema. Supports **Zod** (v3/v4), **Effect Schema**, and **JSON Schema**:

### With Zod

```typescript
import { structured } from "@ai2070/l0";
import { z } from "zod";

const schema = z.object({
  name: z.string(),
  age: z.number(),
  email: z.string().email(),
});

const result = await structured({
  schema,
  stream: () => streamText({ model, prompt: "Generate user data as JSON" }),
  autoCorrect: true, // Fix trailing commas, missing braces, etc.
});

// Type-safe access
console.log(result.data.name); // string
console.log(result.data.age); // number
console.log(result.corrected); // true if auto-corrected
```

### With Effect Schema

```typescript
import {
  structured,
  registerEffectSchemaAdapter,
  wrapEffectSchema,
} from "@ai2070/l0";
import { Schema } from "effect";

// Register the adapter once at app startup
registerEffectSchemaAdapter({
  decodeUnknownSync: (schema, data) => Schema.decodeUnknownSync(schema)(data),
  decodeUnknownEither: (schema, data) => {
    try {
      return { _tag: "Right", right: Schema.decodeUnknownSync(schema)(data) };
    } catch (error) {
      return {
        _tag: "Left",
        left: { _tag: "ParseError", issue: error, message: error.message },
      };
    }
  },
  formatError: (error) => error.message,
});

// Define schema with Effect
const schema = Schema.Struct({
  name: Schema.String,
  age: Schema.Number,
  email: Schema.String,
});

// Use with structured()
const result = await structured({
  schema: wrapEffectSchema(schema),
  stream: () => streamText({ model, prompt: "Generate user data as JSON" }),
  autoCorrect: true,
});

console.log(result.data.name); // string - fully typed
```

### With JSON Schema

```typescript
import {
  structured,
  registerJSONSchemaAdapter,
  wrapJSONSchema,
} from "@ai2070/l0";
import Ajv from "ajv"; // Or any JSON Schema validator

// Register adapter once at app startup (example with Ajv)
const ajv = new Ajv({ allErrors: true });

registerJSONSchemaAdapter({
  validate: (schema, data) => {
    const validate = ajv.compile(schema);
    const valid = validate(data);
    if (valid) return { valid: true, data };
    return {
      valid: false,
      errors: (validate.errors || []).map((e) => ({
        path: e.instancePath || "/",
        message: e.message || "Validation failed",
        keyword: e.keyword,
        params: e.params,
      })),
    };
  },
  formatErrors: (errors) =>
    errors.map((e) => `${e.path}: ${e.message}`).join(", "),
});

// Define schema with JSON Schema
const schema = {
  type: "object",
  properties: {
    name: { type: "string" },
    age: { type: "number" },
    email: { type: "string", format: "email" },
  },
  required: ["name", "age", "email"],
};

// Use with structured()
const result = await structured({
  schema: wrapJSONSchema<{ name: string; age: number; email: string }>(schema),
  stream: () => streamText({ model, prompt: "Generate user data as JSON" }),
  autoCorrect: true,
});

console.log(result.data.name); // string - typed via generic
```

### Helper Functions

```typescript
import { structuredObject, structuredArray, structuredStream } from "@ai2070/l0";

// Quick object schema
const objectResult = await structuredObject(
  { name: z.string(), age: z.number() },
  { stream },
);

// Quick array schema
const arrayResult = await structuredArray(
  z.object({ name: z.string() }),
  { stream },
);

// Streaming with end validation
const { stream, result, abort } = await structuredStream({
  schema,
  stream: () => streamText({ model, prompt }),
});

for await (const event of stream) {
  if (event.type === "token") console.log(event.value);
}

const validated = await result;
```

### Structured Output Presets

```typescript
import { minimalStructured, recommendedStructured, strictStructured } from "@ai2070/l0";

// minimalStructured:     { autoCorrect: false, retry: { attempts: 1 } }
// recommendedStructured: { autoCorrect: true, retry: { attempts: 2 } }
// strictStructured:      { autoCorrect: true, strictMode: true, retry: { attempts: 3 } }

const result = await structured({ schema, stream, ...recommendedStructured });
```

---

## Fallback Models

Sequential fallback when primary model fails:

```typescript
const result = await l0({
  stream: () => streamText({ model: openai("gpt-4o"), prompt }),
  fallbackStreams: [
    () => streamText({ model: openai("gpt-5-nano"), prompt }),
    () => streamText({ model: anthropic("claude-3-haiku"), prompt }),
  ],
});

// Check which model succeeded
console.log(result.state.fallbackIndex); // 0 = primary, 1+ = fallback
```

---

## Document Windows

Process documents that exceed context limits:

```typescript
import { createWindow } from "@ai2070/l0";

const window = createWindow(longDocument, {
  size: 2000, // Tokens per chunk
  overlap: 200, // Overlap between chunks
  strategy: "paragraph", // or "token", "sentence", "char"
});

// Process all chunks
const results = await window.processAll((chunk) => ({
  stream: () =>
    streamText({
      model,
      prompt: `Summarize: ${chunk.content}`,
    }),
}));

// Or navigate manually
const first = window.current();
const next = window.next();
```

---

## Formatting Helpers

Utilities for context, memory, output instructions, and tool definitions:

```typescript
import { formatContext, formatMemory, formatTool, formatJsonOutput } from "@ai2070/l0";

// Wrap documents with XML/Markdown/bracket delimiters
const context = formatContext(document, { label: "Documentation", delimiter: "xml" });

// Format conversation history (conversational, structured, or compact)
const memory = formatMemory(messages, { style: "conversational", maxEntries: 10 });

// Define tools with JSON schema, TypeScript, or natural language
const tool = formatTool({ name: "search", description: "Search", parameters: [...] });

// Request strict JSON output
const instruction = formatJsonOutput({ strict: true, schema: "..." });
```

See [FORMATTING.md](./FORMATTING.md) for complete API reference.

---

## Last-Known-Good Token Resumption

When a stream fails mid-generation, L0 can resume from the last known good checkpoint instead of starting over. This preserves already-generated content and reduces latency on retries.

```typescript
const result = await l0({
  stream: () => streamText({ model, prompt }),
  retry: { attempts: 3 },

  // Enable continuation from last checkpoint (opt-in)
  continueFromLastKnownGoodToken: true,
});

// Check if continuation was used
console.log(result.state.resumed); // true if resumed from checkpoint
console.log(result.state.resumePoint); // The checkpoint content
console.log(result.state.resumeFrom); // Character offset where resume occurred
```

### How It Works

1. L0 maintains a checkpoint of successfully received tokens (every N tokens, configurable via `checkIntervals.checkpoint`)
2. When a retry or fallback is triggered, the checkpoint is validated against guardrails and drift detection
3. If validation passes, the checkpoint content is emitted first to the consumer
4. The `buildContinuationPrompt` callback (if provided) is called to allow updating the prompt for continuation
5. Telemetry tracks whether continuation was enabled, used, and the checkpoint details

### Using buildContinuationPrompt

To have the LLM actually continue from where it left off (rather than just replaying tokens locally), use `buildContinuationPrompt` to modify the prompt:

```typescript
let continuationPrompt = "";
const originalPrompt = "Write a detailed analysis of...";

const result = await l0({
  stream: () =>
    streamText({
      model: openai("gpt-4o"),
      prompt: continuationPrompt || originalPrompt,
    }),
  continueFromLastKnownGoodToken: true,
  buildContinuationPrompt: (checkpoint) => {
    // Update the prompt to tell the LLM to continue from checkpoint
    continuationPrompt = `${originalPrompt}\n\nContinue from where you left off:\n${checkpoint}`;
    return continuationPrompt;
  },
  retry: { attempts: 3 },
});
```

When LLMs continue from a checkpoint, they often repeat words from the end. L0 automatically detects and removes this overlap (enabled by default). See [API Reference](./API.md#smart-continuation-deduplication) for configuration options.

### Example: Resuming After Network Error

```typescript
const result = await l0({
  stream: () =>
    streamText({
      model: openai("gpt-4o"),
      prompt: "Write a detailed analysis of...",
    }),
  fallbackStreams: [() => streamText({ model: openai("gpt-5-nano"), prompt })],
  retry: { attempts: 3 },
  continueFromLastKnownGoodToken: true,
  checkIntervals: { checkpoint: 10 }, // Save checkpoint every 10 tokens
  monitoring: { enabled: true },
});

for await (const event of result.stream) {
  if (event.type === "token") {
    process.stdout.write(event.value);
  }
}

// Check telemetry for continuation usage
if (result.telemetry?.continuation?.used) {
  console.log(
    "\nResumed from checkpoint of length:",
    result.telemetry.continuation.checkpointLength,
  );
}
```

### Checkpoint Validation

Before using a checkpoint for continuation, L0 validates it:

- **Guardrails**: All configured guardrails are run against the checkpoint content
- **Drift Detection**: If enabled, checks for format drift in the checkpoint
- **Fatal Violations**: If any guardrail returns a fatal violation, the checkpoint is discarded and retry starts fresh

### Important Limitations

> ⚠️ **Do NOT use `continueFromLastKnownGoodToken` with structured output or `streamObject()`.**
>
> Continuation works by prepending checkpoint content to the next generation. For JSON/structured output, this can corrupt the data structure because:
>
> - The model may not properly continue the JSON syntax
> - Partial objects could result in invalid JSON
> - Schema validation may fail on malformed output
>
> For structured output, let L0 retry from scratch to ensure valid JSON.

```typescript
// ✅ GOOD - Text generation with continuation
const result = await l0({
  stream: () => streamText({ model, prompt: "Write an essay..." }),
  continueFromLastKnownGoodToken: true,
});

// ❌ BAD - Do NOT use with structured output
const result = await structured({
  schema: mySchema,
  stream: () => streamText({ model, prompt }),
  continueFromLastKnownGoodToken: true, // DON'T DO THIS
});
```

---

## Guardrails

Pure functions that validate streaming output without rewriting it:

```typescript
import {
  jsonRule,
  markdownRule,
  zeroOutputRule,
  patternRule,
  customPatternRule,
} from "@ai2070/l0";

const result = await l0({
  stream: () => streamText({ model, prompt }),
  guardrails: [
    jsonRule(), // Validates JSON structure
    markdownRule(), // Validates Markdown fences/tables
    zeroOutputRule(), // Detects empty output
    patternRule(), // Detects "As an AI..." patterns
    customPatternRule([/forbidden/i], "Custom violation"),
  ],
});
```

### Presets

```typescript
import {
  minimalGuardrails, // jsonRule, zeroOutputRule
  recommendedGuardrails, // jsonRule, markdownRule, zeroOutputRule, patternRule
  strictGuardrails, // jsonRule, markdownRule, latexRule, patternRule, zeroOutputRule
  jsonOnlyGuardrails, // jsonRule, zeroOutputRule
  markdownOnlyGuardrails, // markdownRule, zeroOutputRule
  latexOnlyGuardrails, // latexRule, zeroOutputRule
} from "@ai2070/l0";
```

| Preset | Rules Included |
| ------ | -------------- |
| `minimalGuardrails` | `jsonRule`, `zeroOutputRule` |
| `recommendedGuardrails` | `jsonRule`, `markdownRule`, `zeroOutputRule`, `patternRule` |
| `strictGuardrails` | `jsonRule`, `markdownRule`, `latexRule`, `patternRule`, `zeroOutputRule` |

### Fast/Slow Path Execution

L0 uses a two-path strategy to avoid blocking the streaming loop:

| Path | When | Behavior |
| ---- | ---- | -------- |
| **Fast** | Delta < 1KB, total < 5KB | Synchronous check, immediate result |
| **Slow** | Large content | Deferred via `setImmediate()`, non-blocking |

For long outputs, tune the check frequency:

```typescript
await l0({
  stream,
  guardrails: recommendedGuardrails,
  checkIntervals: {
    guardrails: 50, // Check every 50 tokens (default: 5)
  },
});
```

See [GUARDRAILS.md](./GUARDRAILS.md) for full documentation.

---

## Consensus

Multi-generation consensus for high-confidence results:

```typescript
import { consensus } from "@ai2070/l0";

const result = await consensus({
  streams: [
    () => streamText({ model, prompt }),
    () => streamText({ model, prompt }),
    () => streamText({ model, prompt }),
  ],
  strategy: "majority", // or "unanimous", "weighted", "best"
  threshold: 0.8,
});

console.log(result.consensus); // Agreed output
console.log(result.confidence); // 0-1 confidence score
console.log(result.agreements); // What they agreed on
console.log(result.disagreements); // Where they differed
```

---

## Parallel Operations

Run multiple LLM calls concurrently with different patterns:

### Race - First Response Wins

```typescript
import { race } from "@ai2070/l0";

const result = await race([
  { stream: () => streamText({ model: openai("gpt-4o"), prompt }) },
  { stream: () => streamText({ model: anthropic("claude-3-opus"), prompt }) },
  { stream: () => streamText({ model: google("gemini-pro"), prompt }) },
]);

// Returns first successful response, cancels others
console.log(result.winnerIndex); // 0-based index of winning stream
console.log(result.state.content); // Content from winning stream
```

### Parallel with Concurrency Control

```typescript
import { parallel } from "@ai2070/l0";

const results = await parallel(
  [
    { stream: () => streamText({ model, prompt: "Task 1" }) },
    { stream: () => streamText({ model, prompt: "Task 2" }) },
    { stream: () => streamText({ model, prompt: "Task 3" }) },
  ],
  {
    concurrency: 2, // Max 2 concurrent
    failFast: false, // Continue on errors
  },
);

console.log(results.successCount);
console.log(results.results[0]?.state.content);
```

### Fall-Through vs Race

| Pattern | Execution | Cost | Best For |
| ------- | --------- | ---- | -------- |
| Fall-through | Sequential, next on failure | Low (pay for 1) | High availability, cost-sensitive |
| Race | Parallel, first wins | High (pay for all) | Low latency, speed-critical |

```typescript
// Fall-through: Try models sequentially
const result = await l0({
  stream: () => streamText({ model: openai("gpt-4o"), prompt }),
  fallbackStreams: [
    () => streamText({ model: openai("gpt-5-nano"), prompt }),
    () => streamText({ model: anthropic("claude-3-haiku"), prompt }),
  ],
});

// Race: All models simultaneously, first wins
const result = await race([
  { stream: () => streamText({ model: openai("gpt-4o"), prompt }) },
  { stream: () => streamText({ model: anthropic("claude-3-opus"), prompt }) },
]);
```

### Operation Pool

For dynamic workloads, use `OperationPool` to process operations with a shared concurrency limit:

```typescript
import { createPool } from "@ai2070/l0";

const pool = createPool(3); // Max 3 concurrent operations

// Add operations dynamically
const result1 = pool.execute({ stream: () => streamText({ model, prompt: "Task 1" }) });
const result2 = pool.execute({ stream: () => streamText({ model, prompt: "Task 2" }) });

// Wait for all operations to complete
await pool.drain();

// Pool methods
pool.getQueueLength(); // Pending operations
pool.getActiveWorkers(); // Currently executing
```

---

## Type-Safe Generics

All L0 functions support generic type parameters to forward your output types:

```typescript
import { l0, parallel, race, consensus } from "@ai2070/l0";

// Typed output (compile-time type annotation)
interface UserProfile {
  name: string;
  age: number;
  email: string;
}

const result = await l0<UserProfile>({
  stream: () => streamText({ model, prompt }),
});
// result is L0Result<UserProfile> - generic enables type inference in callbacks

// Works with all parallel operations
const raceResult = await race<UserProfile>([
  { stream: () => streamText({ model: openai("gpt-4o"), prompt }) },
  { stream: () => streamText({ model: anthropic("claude-3-opus"), prompt }) },
]);

const parallelResults = await parallel<UserProfile>(operations);
// parallelResults.results[0]?.state is typed

// Consensus with type inference
const consensusResult = await consensus<typeof schema>({
  streams: [stream1, stream2, stream3],
  schema,
});
```

---

## Custom Adapters (BYOA)

L0 supports custom adapters for integrating any LLM provider. Built-in adapters include `openaiAdapter`, `mastraAdapter`, and `anthropicAdapter` (reference implementation).

### Explicit Adapter Usage

```typescript
import { l0, openaiAdapter } from "@ai2070/l0";
import OpenAI from "openai";

const openai = new OpenAI();

const result = await l0({
  stream: () =>
    openai.chat.completions.create({
      model: "gpt-4",
      messages: [{ role: "user", content: "Hello!" }],
      stream: true,
    }),
  adapter: openaiAdapter,
});
```

### Building Custom Adapters

```typescript
import { toL0Events, type L0Adapter } from "@ai2070/l0";

interface MyChunk {
  text?: string;
}

const myAdapter: L0Adapter<AsyncIterable<MyChunk>> = {
  name: "myai",

  // Optional: Enable auto-detection
  detect(input): input is AsyncIterable<MyChunk> {
    return !!input && typeof input === "object" && "__myMarker" in input;
  },

  // Convert provider stream to L0 events
  wrap(stream) {
    return toL0Events(stream, (chunk) => chunk.text ?? null);
  },
};
```

### Adapter Invariants

Adapters MUST:

- Preserve text exactly (no trimming, no modification)
- Include timestamps on every event
- Convert errors to error events (never throw)
- Emit complete event exactly once at end

A hand-rolled sketch of these invariants is shown below.

See [CUSTOM_ADAPTERS.md](./CUSTOM_ADAPTERS.md) for complete guide including helper functions, registry API, and testing patterns.
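The sketch writes `wrap()` as an async generator instead of using the `toL0Events` helper, so each invariant is visible. The event field names (`value`, `error`, `timestamp`) follow the shapes shown elsewhere in this README but are illustrative assumptions, not the exact internal event types:

```typescript
import type { L0Adapter } from "@ai2070/l0";

interface MyChunk {
  text?: string;
}

// Illustrative only: a manual adapter honoring the four invariants above
const manualAdapter: L0Adapter<AsyncIterable<MyChunk>> = {
  name: "myai-manual",
  wrap(stream) {
    return (async function* () {
      try {
        for await (const chunk of stream) {
          if (chunk.text != null) {
            // 1. Preserve text exactly - no trimming or rewriting
            // 2. Timestamp every event
            yield { type: "token", value: chunk.text, timestamp: Date.now() };
          }
        }
      } catch (error) {
        // 3. Convert errors to error events - never throw out of the adapter
        yield { type: "error", error, timestamp: Date.now() };
      }
      // 4. Emit the complete event exactly once at the end
      yield { type: "complete", timestamp: Date.now() };
    })();
  },
};
```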
---

## Multimodal Support

L0 supports image, audio, and video generation with progress tracking and data events:

```typescript
import { l0, toMultimodalL0Events, type L0Adapter } from "@ai2070/l0";

const fluxAdapter: L0Adapter<FluxStream> = {
  name: "flux",
  wrap: (stream) =>
    toMultimodalL0Events(stream, {
      extractProgress: (chunk) =>
        chunk.type === "progress" ? { percent: chunk.percent } : null,
      extractData: (chunk) =>
        chunk.type === "image"
          ? {
              contentType: "image",
              mimeType: "image/png",
              base64: chunk.image,
              metadata: {
                width: chunk.width,
                height: chunk.height,
                seed: chunk.seed,
              },
            }
          : null,
    }),
};

const result = await l0({
  stream: () => fluxGenerate({ prompt: "A cat in space" }),
  adapter: fluxAdapter,
});

for await (const event of result.stream) {
  if (event.type === "progress") console.log(`${event.progress?.percent}%`);
  if (event.type === "data") saveImage(event.data?.base64);
}

// All generated images available in state
console.log(result.state.dataOutputs);
```

See [MULTIMODAL.md](./MULTIMODAL.md) for complete guide.

---

## Lifecycle Callbacks

L0 provides callbacks for every phase of stream execution, giving you full observability into the streaming lifecycle:

```typescript
const result = await l0({
  stream: () => streamText({ model, prompt }),
  fallbackStreams: [() => streamText({ model: fallbackModel, prompt })],
  guardrails: recommendedGuardrails,
  continueFromLastKnownGoodToken: true,
  retry: { attempts: 3 },

  // Called when a new execution attempt begins
  onStart: (attempt, isRetry, isFallback) => {
    console.log(`Starting attempt ${attempt}`);
    if (isRetry) console.log("  (retry)");
    if (isFallback) console.log("  (fallback model)");
  },

  // Called when stream completes successfully
  onComplete: (state) => {
    console.log(`Completed with ${state.tokenCount} tokens`);
    console.log(`Duration: ${state.duration}ms`);
  },

  // Called when an error occurs (before retry/fallback decision)
  onError: (error, willRetry, willFallback) => {
    console.error(`Error: ${error.message}`);
    if (willRetry) console.log("  Will retry...");
    if (willFallback) console.log("  Will try fallback...");
  },

  // Called for every L0 event
  onEvent: (event) => {
    if (event.type === "token") {
      process.stdout.write(event.value || "");
    }
  },

  // Called when a guardrail violation is detected
  onViolation: (violation) => {
    console.warn(`Violation: ${violation.rule}`);
    console.warn(`  ${violation.message}`);
  },

  // Called when a retry is triggered
  onRetry: (attempt, reason) => {
    console.log(`Retrying (attempt ${attempt}): ${reason}`);
  },

  // Called when switching to a fallback model
  onFallback: (index, reason) => {
    console.log(`Switching to fallback ${index}: ${reason}`);
  },

  // Called when resuming from checkpoint
  onResume: (checkpoint, tokenCount) => {
    console.log(`Resuming from checkpoint (${tokenCount} tokens)`);
  },

  // Called when a checkpoint is saved
  onCheckpoint: (checkpoint, tokenCount) => {
    console.log(`Checkpoint saved (${tokenCount} tokens)`);
  },

  // Called when a timeout occurs
  onTimeout: (type, elapsedMs) => {
    console.log(`Timeout: ${type} after ${elapsedMs}ms`);
  },

  // Called when the stream is aborted
  onAbort: (tokenCount, contentLength) => {
    console.log(`Aborted after ${tokenCount} tokens (${contentLength} chars)`);
  },

  // Called when drift is detected
  onDrift: (types, confidence) => {
    console.log(
      `Drift detected: ${types.join(", ")} (confidence: ${confidence})`,
    );
  },

  // Called when a tool call is detected
  onToolCall: (toolName, toolCallId, args) => {
    console.log(`Tool call: ${toolName} (${toolCallId})`);
    console.log(`  Args: ${JSON.stringify(args)}`);
  },
});
```

## Deterministic Lifecycle Flow

1. **Start** - `onStart(attempt, false, false)` fires for the initial attempt.
2. **Streaming phase** - `onEvent(event)` fires for every event. As conditions occur during streaming, `onCheckpoint(checkpoint, tokenCount)`, `onToolCall(toolName, id, args)`, `onDrift(types, confidence)`, and `onTimeout(type, elapsedMs)` fire; drift and timeouts trigger a retry.
3. **Outcome** - the attempt ends in one of four states:
   - **Success** - `onComplete(state)` fires.
   - **Violation** - `onViolation(violation)` fires, then `onError(...)`.
   - **Error** - `onError(error, willRetry, willFallback)` fires.
   - **Abort** - `onAbort(tokenCount, contentLength)` fires.
4. **Recovery** - after `onError`, L0 either retries (`onRetry()`), switches to a fallback (`onFallback()`), or treats the error as fatal and throws. On retry or fallback, `onResume` fires if a checkpoint exists, then `onStart(attempt, isRetry, isFallback)` fires and execution returns to the streaming phase.

### Callback Reference

| Callback | When Called | Signature |
| -------- | ----------- | --------- |
| `onStart