# mixpart

High-performance streaming multipart/mixed parser for Node.js applications.

## Features

- **True Streaming Processing**: Parses multipart messages as they arrive and yields them immediately
- **Zero Memory Buffering**: Each message payload is a `ReadableStream` - no buffering of entire responses
- **Memory Safety**: Built-in protection against unbounded memory growth from malformed data
- **Memory Efficient**: Uses async generators and streams for optimal memory usage, even with large payloads
- **High Performance**: Optimized search using Node.js `Buffer.indexOf` for fast boundary detection
- **Robust Error Handling**: Graceful handling of malformed parts and network errors
- **HTTP-Compliant Headers**: Proper ISO-8859-1 header decoding per HTTP standards
- **Generic API**: Returns all headers and a streaming payload without assumptions about content
- **TypeScript Support**: Full TypeScript support with comprehensive type definitions

## Installation

This is a private workspace package used internally by other packages in this monorepo.
## Usage

### Basic Usage

```typescript
import { parseMultipartStream, MultipartMessage } from "mixpart";

// Parse a multipart/mixed response
const response = await fetch("https://api.example.com/multipart-endpoint");

for await (const message of parseMultipartStream(response)) {
  console.log("Headers:", Array.from(message.headers.entries()));

  // Access specific headers
  const contentType = message.headers.get("content-type");
  const customHeader = message.headers.get("x-custom-header");
  console.log("Content-Type:", contentType);

  // Stream the payload - no buffering in memory
  const reader = message.payload.getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    console.log("Received chunk:", value.length, "bytes");
    // Process each chunk immediately without buffering
  }
}
```

### Memory Safety Configuration

The parser includes configurable safety limits to prevent unbounded memory growth:

```typescript
import { parseMultipartStream, ParserOptions } from "mixpart";

const options: ParserOptions = {
  maxHeaderSize: 131072, // 128 KB header limit (default: 64 KB)
  maxBoundaryBuffer: 16384, // 16 KB boundary buffer (default: 8 KB)
};

for await (const message of parseMultipartStream(response, options)) {
  // Process messages safely
}
```

**Safety Limits:**

- **Header Limit**: Prevents malformed headers from consuming unlimited memory
- **Boundary Buffer Limit**: Prevents fake partial boundaries from accumulating
- **Automatic Detection**: Throws descriptive errors when limits are exceeded

### Processing Text/JSON Payloads

For convenience when you need the complete payload:

```typescript
import { parseMultipartStream } from "mixpart";

async function readStreamToText(
  stream: ReadableStream<Uint8Array>,
): Promise<string> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  const totalLength = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const result = new Uint8Array(totalLength);
  let offset = 0;
  for (const chunk of chunks) {
    result.set(chunk, offset);
    offset += chunk.length;
  }
  return new TextDecoder().decode(result);
}

for await (const message of parseMultipartStream(response)) {
  if (message.headers.get("content-type") === "application/json") {
    const jsonText = await readStreamToText(message.payload);
    const data = JSON.parse(jsonText);
    console.log("JSON data:", data);
  }
}
```

### Header Encoding Support

The parser decodes headers using **ISO-8859-1**, the traditional encoding for HTTP headers.

**Features:**

- **ISO-8859-1 Decoding**: Headers are decoded using ISO-8859-1 for HTTP compliance
- **Full Byte Range Support**: Can handle any byte sequence (0-255) in header values
- **Standards Compliant**: Follows the HTTP/1.1 specification for header encoding
- **Backwards Compatibility**: ASCII headers continue to work exactly as before

**Example:**

```typescript
import { parseMultipartStream } from "mixpart";

// ISO-8859-1 headers are automatically decoded
for await (const message of parseMultipartStream(response)) {
  // Headers with accented characters work properly
  const subject = message.headers.get("subject"); // "Hello World"
  const name = message.headers.get("x-name"); // "José" or "Café"
  console.log("Subject:", subject);
  console.log("Name:", name);
}
```

**Standards Compliance:**

Per RFC 7230, HTTP header field values have historically been encoded as ISO-8859-1:

1. Headers are decoded using ISO-8859-1 for proper byte-to-character mapping
2. Any byte sequence (0-255) is valid and will decode properly
3. Maintains compatibility with the `Headers` Web API requirements
4. Never fails due to encoding issues

This ensures robust parsing of all HTTP-compliant multipart content.
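The byte-to-character mapping described above can be sketched with Node's built-in `latin1` encoding, which is ISO-8859-1 (the header bytes below are illustrative, not produced by mixpart):

```typescript
import { Buffer } from "node:buffer";

// Raw header-line bytes containing a non-ASCII byte:
// "X-Name: Jos" followed by 0xE9, which is "é" in ISO-8859-1.
const raw = Buffer.from([
  0x58, 0x2d, 0x4e, 0x61, 0x6d, 0x65, 0x3a, 0x20, 0x4a, 0x6f, 0x73, 0xe9,
]);

// In "latin1" (ISO-8859-1), every byte 0-255 maps to exactly one
// character, so decoding a header line can never fail.
const headerLine = raw.toString("latin1");
console.log(headerLine); // "X-Name: José"

// Compare with UTF-8, where a lone 0xE9 is an invalid sequence and is
// replaced with U+FFFD instead of decoding to "é".
console.log(raw.toString("utf8").endsWith("\uFFFD")); // true
```

This one-byte-per-character property is why ISO-8859-1 decoding never throws, whereas a UTF-8 decode of the same bytes can silently corrupt header values.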
### Application-Specific Usage

Since this is a generic parser, applications can extract their own specific headers:

```typescript
import { parseMultipartStream } from "mixpart";

const response = await fetch("https://api.example.com/messages", {
  headers: {
    Accept: "multipart/mixed",
    Authorization: "Bearer TOKEN",
  },
});

for await (const part of parseMultipartStream(response)) {
  // Extract application-specific headers using the Headers API
  const messageId = part.headers.get("x-message-id");
  const timestamp = part.headers.get("x-timestamp");
  const contentType = part.headers.get("content-type");

  if (messageId && timestamp) {
    console.log("Message:", {
      id: messageId,
      timestamp: timestamp,
      contentType: contentType,
    });

    // Read the payload chunk by chunk as it arrives
    const reader = part.payload.getReader();
    const chunks: Uint8Array[] = [];
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      chunks.push(value);
    }

    // Only combine chunks when ready to process
    const totalLength = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
    const fullPayload = new Uint8Array(totalLength);
    let offset = 0;
    for (const chunk of chunks) {
      fullPayload.set(chunk, offset);
      offset += chunk.length;
    }

    if (contentType === "application/json") {
      const messageData = JSON.parse(new TextDecoder().decode(fullPayload));
      console.log("JSON data:", messageData);
    }
  }
}
```

## API Reference

### `parseMultipartStream(response: Response, options?: ParserOptions)`

Returns an async generator that yields `MultipartMessage` objects as they are parsed from the stream.
**Parameters:**

- `response`: A `Response` object with a multipart/mixed body
- `options`: Optional configuration for memory safety limits

**Returns:**

- `AsyncGenerator<MultipartMessage, void, unknown>`

### `ParserOptions`

Configuration interface for memory safety limits:

```typescript
interface ParserOptions {
  maxHeaderSize?: number; // Maximum header buffer size (default: 64 KB)
  maxBoundaryBuffer?: number; // Maximum boundary buffer size (default: 8 KB)
}
```

### `MultipartMessage`

Interface representing a parsed multipart message:

```typescript
interface MultipartMessage {
  headers: Headers; // Standard Headers object with get(), has(), etc.
  payload: ReadableStream<Uint8Array>; // Streaming payload - no buffering
}
```

**Key Benefits:**

- **No Memory Buffering**: The payload is streamed, not loaded into memory
- **Immediate Processing**: Messages are yielded as soon as headers are parsed
- **Scalable**: Can handle arbitrarily large payloads without memory issues
- **Memory Safe**: Protected against malformed data attacks

### `extractBoundary(contentType: string)`

Utility function to extract the boundary parameter from a Content-Type header.
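For illustration, here is a minimal sketch of what such an extractor does. This is not mixpart's actual implementation; `sketchExtractBoundary` is a hypothetical stand-in showing the quoted and unquoted boundary forms:

```typescript
// Hypothetical sketch of boundary extraction - not mixpart's actual code.
// Pulls the boundary parameter out of a Content-Type value, with or
// without surrounding quotes.
function sketchExtractBoundary(contentType: string): string {
  const match = /boundary="?([^";]+)"?/i.exec(contentType);
  if (!match) {
    throw new Error(`No boundary parameter in: ${contentType}`);
  }
  return match[1];
}

console.log(sketchExtractBoundary('multipart/mixed; boundary="frame-42"')); // "frame-42"
console.log(sketchExtractBoundary("multipart/mixed; boundary=simple")); // "simple"
```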
**Parameters:**

- `contentType`: Content-Type header value

**Returns:**

- `string`: The boundary string

### `MultipartParseError`

Error class thrown when multipart parsing fails:

```typescript
class MultipartParseError extends Error {
  name: "MultipartParseError";
}
```

## Performance Characteristics

- **Memory Usage**: O(boundary_length + chunk_size) - only incomplete boundaries and headers are buffered
- **Processing Speed**: ~50% faster than pure JavaScript implementations via native Buffer operations
- **Throughput**: Handles high-volume message streams with minimal latency
- **Streaming**: True streaming - messages are yielded immediately and payloads are never buffered
- **Memory Safety**: Protected against unbounded growth with configurable limits
- **Error Recovery**: Continues processing after individual part failures

## Streaming Architecture

The parser uses a streaming architecture with built-in memory safety:

1. **Immediate Yielding**: Messages are yielded as soon as headers are parsed
2. **Streaming Payloads**: Each message payload is a `ReadableStream<Uint8Array>`
3. **Zero Buffering**: Complete payloads are never buffered in memory
4. **Chunk Processing**: Data is processed in chunks as it arrives from the network
5. **Memory Limits**: Only incomplete boundaries and headers are buffered, with strict size limits
6. **Safety Guards**: Automatic detection and prevention of memory exhaustion attacks

**Memory Safety Features:**

- Headers limited to 64 KB by default (configurable)
- Boundary buffers limited to 8 KB by default (configurable)
- Descriptive error messages for debugging malformed data
- Protection against malicious multipart streams

This design allows processing of arbitrarily large multipart responses without memory constraints while protecting against malformed data.

## Requirements

- Node.js >= 22.0.0
- TypeScript >= 5.0.0 (for development)

## License

MIT