# mixpart
High-performance streaming multipart/mixed parser for Node.js applications.
## Features
- **True Streaming Processing**: Parses multipart messages as they arrive and yields them immediately
- **Zero Memory Buffering**: Each message payload is a ReadableStream - no buffering of entire responses
- **Memory Safety**: Built-in protection against unbounded memory growth from malformed data
- **Memory Efficient**: Uses async generators and streams for optimal memory usage even with large payloads
- **High Performance**: Optimized search algorithms using Node.js Buffer.indexOf for fast boundary detection
- **Robust Error Handling**: Graceful handling of malformed parts and network errors
- **HTTP-Compliant Headers**: Proper ISO-8859-1 header decoding per HTTP standards
- **Generic API**: Returns all headers and streaming payload without assumptions about content
- **TypeScript Support**: Full TypeScript support with comprehensive type definitions
## Installation
This is a private workspace package used internally by other packages in this monorepo.
## Usage
### Basic Usage
```typescript
import { parseMultipartStream, MultipartMessage } from "mixpart";

// Parse a multipart/mixed response
const response = await fetch("https://api.example.com/multipart-endpoint");

for await (const message of parseMultipartStream(response)) {
  console.log("Headers:", Array.from(message.headers.entries()));

  // Access specific headers
  const contentType = message.headers.get("content-type");
  const customHeader = message.headers.get("x-custom-header");
  console.log("Content-Type:", contentType);

  // Stream the payload - no buffering in memory
  const reader = message.payload.getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    console.log("Received chunk:", value.length, "bytes");
    // Process each chunk immediately without buffering
  }
}
```
### Memory Safety Configuration
The parser includes configurable safety limits to prevent unbounded memory growth:
```typescript
import { parseMultipartStream, ParserOptions } from "mixpart";

const options: ParserOptions = {
  maxHeaderSize: 128 * 1024, // 128KB header limit (default: 64KB)
  maxBoundaryBuffer: 16 * 1024, // 16KB boundary buffer (default: 8KB)
};

for await (const message of parseMultipartStream(response, options)) {
  // Process messages safely
}
```
**Safety Limits:**
- **Header Limit**: Prevents malformed headers from consuming unlimited memory
- **Boundary Buffer Limit**: Prevents fake partial boundaries from accumulating
- **Automatic Detection**: Throws descriptive errors when limits are exceeded
### Processing Text/JSON Payloads
For convenience when you need the complete payload:
```typescript
import { parseMultipartStream } from "mixpart";

async function readStreamToText(
  stream: ReadableStream<Uint8Array>,
): Promise<string> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  const totalLength = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const result = new Uint8Array(totalLength);
  let offset = 0;
  for (const chunk of chunks) {
    result.set(chunk, offset);
    offset += chunk.length;
  }
  return new TextDecoder().decode(result);
}

for await (const message of parseMultipartStream(response)) {
  // Match on the media type, ignoring parameters such as charset
  if (message.headers.get("content-type")?.startsWith("application/json")) {
    const jsonText = await readStreamToText(message.payload);
    const data = JSON.parse(jsonText);
    console.log("JSON data:", data);
  }
}
```
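On runtimes that expose the Fetch API (Node.js 18+), the helper above can be collapsed by wrapping the stream in a standard `Response`, which already knows how to drain a body to text:

```typescript
// Wrap any ReadableStream<Uint8Array> in a Response to reuse its .text()/.json()
async function streamToText(stream: ReadableStream<Uint8Array>): Promise<string> {
  return new Response(stream).text();
}

// Demo with an in-memory stream standing in for message.payload
const demo = new ReadableStream<Uint8Array>({
  start(controller) {
    controller.enqueue(new TextEncoder().encode('{"ok":'));
    controller.enqueue(new TextEncoder().encode("true}"));
    controller.close();
  },
});
console.log(await streamToText(demo)); // {"ok":true}
```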
### Header Encoding Support
The parser properly handles **HTTP-compliant** header encoding using ISO-8859-1, the historical default encoding for HTTP header fields.
**Features:**
- **ISO-8859-1 Decoding**: Headers are decoded using ISO-8859-1 for proper HTTP compliance
- **Full Byte Range Support**: Can handle any byte sequence (0-255) in header values
- **Standards Compliant**: Follows HTTP/1.1 specification for header encoding
- **Backwards Compatibility**: ASCII headers continue to work exactly as before
**Example:**
```typescript
import { parseMultipartStream } from "mixpart";

// ISO-8859-1 headers are automatically decoded
for await (const message of parseMultipartStream(response)) {
  // Headers with accented characters work properly
  const subject = message.headers.get("subject"); // e.g. "Café menu"
  const name = message.headers.get("x-name"); // e.g. "José"
  console.log("Subject:", subject);
  console.log("Name:", name);
}
```
**Standards Compliance:**
Per RFC 7230, HTTP header field values have historically been interpreted as ISO-8859-1, so the parser decodes them accordingly:
1. ISO-8859-1 is a one-to-one byte-to-character mapping, so every byte value (0-255) decodes to a defined character and decoding can never fail
2. Decoded values remain compatible with the Headers Web API requirements
This ensures robust parsing of all HTTP-compliant multipart content.
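The byte-to-code-point identity is easy to see without mixpart: in ISO-8859-1, byte `N` is exactly code point `U+00NN`. The sketch below is a minimal stand-alone decoder illustrating that property, not the library's own implementation:

```typescript
// True ISO-8859-1 decoding: each byte N becomes code point U+00NN.
// (Note: TextDecoder's "latin1" label actually selects windows-1252, which
// differs for bytes 0x80-0x9F, so the identity mapping is spelled out here.)
function decodeLatin1(bytes: Uint8Array): string {
  let out = "";
  for (const b of bytes) out += String.fromCharCode(b);
  return out;
}

const header = new Uint8Array([0x4a, 0x6f, 0x73, 0xe9]); // "José" as ISO-8859-1 bytes
console.log(decodeLatin1(header)); // José
```

Since every byte has a defined mapping, there is no input that can make this decoding step fail.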
### Application-Specific Usage
Since this is a generic parser, applications can extract their own specific headers:
```typescript
import { parseMultipartStream } from "mixpart";

const response = await fetch("https://api.example.com/messages", {
  headers: {
    Accept: "multipart/mixed",
    Authorization: "Bearer TOKEN",
  },
});

for await (const part of parseMultipartStream(response)) {
  // Extract application-specific headers using the Headers API
  const messageId = part.headers.get("x-message-id");
  const timestamp = part.headers.get("x-timestamp");
  const contentType = part.headers.get("content-type");

  if (messageId && timestamp) {
    console.log("Message:", {
      id: messageId,
      timestamp: timestamp,
      contentType: contentType,
    });

    // Collect the payload chunks (buffered here because the complete body
    // is needed to parse it as JSON below)
    const reader = part.payload.getReader();
    const chunks: Uint8Array[] = [];
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      chunks.push(value);
    }

    // Combine chunks only once the payload is complete
    const totalLength = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
    const fullPayload = new Uint8Array(totalLength);
    let offset = 0;
    for (const chunk of chunks) {
      fullPayload.set(chunk, offset);
      offset += chunk.length;
    }

    // Match on the media type, ignoring parameters such as charset
    if (contentType?.startsWith("application/json")) {
      const messageData = JSON.parse(new TextDecoder().decode(fullPayload));
      console.log("JSON data:", messageData);
    }
  }
}
```
## API Reference
### `parseMultipartStream(response: Response, options?: ParserOptions)`
Returns an async generator that yields `MultipartMessage` objects as they are parsed from the stream.
**Parameters:**
- `response`: A `Response` object with a multipart/mixed body
- `options`: Optional configuration for memory safety limits
**Returns:**
- `AsyncGenerator<MultipartMessage, void, unknown>`
### `ParserOptions`
Configuration interface for memory safety:
```typescript
interface ParserOptions {
  maxHeaderSize?: number; // Maximum header buffer size in bytes (default: 64KB)
  maxBoundaryBuffer?: number; // Maximum boundary buffer size in bytes (default: 8KB)
}
```
### `MultipartMessage`
Interface representing a parsed multipart message:
```typescript
interface MultipartMessage {
  headers: Headers; // Proper Headers object with get(), has(), etc.
  payload: ReadableStream<Uint8Array>; // Streaming payload - no buffering
}
```
**Key Benefits:**
- **No Memory Buffering**: The payload is streamed, not loaded into memory
- **Immediate Processing**: Messages are yielded as soon as headers are parsed
- **Scalable**: Can handle arbitrarily large payloads without memory issues
- **Memory Safe**: Protected against malformed data attacks
### `extractBoundary(contentType: string)`
Utility function to extract the boundary parameter from a Content-Type header.
**Parameters:**
- `contentType`: Content-Type header value
**Returns:**
- `string`: The boundary string
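The extraction logic amounts to matching the `boundary` parameter, quoted or unquoted. The sketch below (`extractBoundarySketch` is an illustrative stand-in, not mixpart's implementation) shows the shape of that match:

```typescript
// Illustrative sketch of boundary extraction (not mixpart's actual code):
// capture the boundary parameter value, with or without surrounding quotes.
function extractBoundarySketch(contentType: string): string {
  const match = /boundary="?([^";]+)"?/i.exec(contentType);
  if (!match) throw new Error("No boundary parameter in Content-Type");
  return match[1];
}

const ct = 'multipart/mixed; boundary="frontier"';
console.log(extractBoundarySketch(ct)); // frontier
```

In practice you would call the exported `extractBoundary` on the response's own `Content-Type` header, e.g. `extractBoundary(response.headers.get("content-type") ?? "")`.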
### `MultipartParseError`
Error class thrown when multipart parsing fails:
```typescript
class MultipartParseError extends Error {
  name: "MultipartParseError";
}
```
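Because the class is a distinct `Error` subclass, callers can separate parse failures from network or runtime errors with `instanceof`. The stand-in class below mirrors the declared shape so the pattern is runnable without the package; in real code you would import `MultipartParseError` from `"mixpart"`:

```typescript
// Stand-in mirroring the declared shape (in real code, import from "mixpart")
class MultipartParseError extends Error {
  name = "MultipartParseError";
}

function handle(err: unknown): string {
  if (err instanceof MultipartParseError) {
    return `multipart parse failure: ${err.message}`;
  }
  throw err; // not a parse error; let network/runtime errors propagate
}

try {
  throw new MultipartParseError("header limit exceeded"); // illustrative message
} catch (err) {
  console.log(handle(err)); // multipart parse failure: header limit exceeded
}
```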
## Performance Characteristics
- **Memory Usage**: O(boundary_length + chunk_size) - only buffers incomplete boundaries/headers
- **Processing Speed**: Boundary detection uses `Buffer.indexOf`, which runs as native code and is substantially faster than a byte-by-byte scan in pure JavaScript
- **Throughput**: Can handle high-volume message streams with minimal latency
- **Streaming**: True streaming - messages yielded immediately, payloads never buffered
- **Memory Safety**: Protected against unbounded growth with configurable limits
- **Error Recovery**: Continues processing after individual part failures
## Streaming Architecture
The parser uses a sophisticated streaming architecture with memory safety:
1. **Immediate Yielding**: Messages are yielded as soon as headers are parsed
2. **Streaming Payloads**: Each message payload is a `ReadableStream<Uint8Array>`
3. **Zero Buffering**: Never buffers complete payloads in memory
4. **Chunk Processing**: Processes data in chunks as they arrive from the network
5. **Memory Limits**: Only buffers incomplete boundaries/headers with strict size limits
6. **Safety Guards**: Automatic detection and prevention of memory exhaustion attacks
**Memory Safety Features:**
- Headers limited to 64KB by default (configurable)
- Boundary buffers limited to 8KB by default (configurable)
- Descriptive error messages for debugging malformed data
- Protection against malicious multipart streams
This design allows processing of arbitrarily large multipart responses without memory constraints while protecting against malformed data.
## Requirements
- Node.js >= 22.0.0
- TypeScript >= 5.0.0 (for development)
## License
MIT