UNPKG

pompelmi

Version:

RFI-safe file uploads for Node.js — Express/Koa/Next.js middleware with deep ZIP inspection, MIME/size checks, and optional YARA scanning.

611 lines (485 loc) 20.7 kB
<p align="center"> <a href="https://www.producthunt.com/products/pompelmi?embed=true&utm_source=badge-featured&utm_medium=badge&utm_source=badge-pompelmi" target="_blank"><img src="https://api.producthunt.com/widgets/embed-image/v1/featured.svg?post_id=1010722&theme=light&t=1756653468504" alt="pompelmi - free&#0044;&#0032;open&#0045;source&#0032;file&#0032;scanner | Product Hunt" style="width: 250px; height: 54px;" width="250" height="54" /></a> <br/> <a href="https://github.com/pompelmi/pompelmi" target="_blank" rel="noopener noreferrer"> <img src="https://raw.githubusercontent.com/pompelmi/pompelmi/refs/heads/main/assets/logo.svg" alt="pompelmi logo" width="360" height="280" /> </a> <br/> <a href="https://www.detectionengineering.net/p/det-eng-weekly-issue-124-the-defcon"><img alt="Featured in Detection Engineering Weekly #124" src="https://img.shields.io/badge/featured-Detection%20Engineering%20Weekly-0A84FF?logo=substack"></a> <br/> </p> <h1 align="center">pompelmi</h1> <p align="center"><strong>Fast file‑upload malware scanning for Node.js</strong> — optional <strong>YARA</strong> integration, ZIP deep‑inspection, and drop‑in adapters for <em>Express</em>, <em>Koa</em>, and <em>Next.js</em>. Private by design. Typed. Tiny.</p> <p align="center"> <a href="https://www.npmjs.com/package/pompelmi"><img alt="npm version" src="https://img.shields.io/npm/v/pompelmi?label=pompelmi&color=0a7ea4"></a> <a href="https://www.npmjs.com/package/pompelmi"><img alt="npm downloads" src="https://img.shields.io/npm/dm/pompelmi?label=downloads&color=6E9F18"></a> <img alt="node" src="https://img.shields.io/badge/node-%3E%3D18-339933?logo=node.js&logoColor=white"> <img alt="types" src="https://img.shields.io/badge/types-TypeScript-3178C6?logo=typescript&logoColor=white"> <a href="https://github.com/pompelmi/pompelmi/blob/main/LICENSE"><img alt="license" src="https://img.shields.io/npm/l/pompelmi"></a> <a href="https://codecov.io/gh/pompelmi/pompelmi"><img alt="codecov" src="https://codecov.io/gh/pompelmi/pompelmi/branch/main/graph/badge.svg?flag=core"/></a> <a href="https://github.com/pompelmi/pompelmi/stargazers"><img alt="GitHub stars" src="https://img.shields.io/github/stars/pompelmi/pompelmi?style=social"></a> <a href="https://github.com/pompelmi/pompelmi/actions/workflows/ci-release-publish.yml"><img alt="CI / Release / Publish" src="https://img.shields.io/github/actions/workflow/status/pompelmi/pompelmi/ci-release-publish.yml?branch=main&label=CI%20%2F%20Release%20%2F%20Publish"></a> <a href="https://github.com/pompelmi/pompelmi/issues"><img alt="open issues" src="https://img.shields.io/github/issues/pompelmi/pompelmi"></a> <img alt="PRs welcome" src="https://img.shields.io/badge/PRs-welcome-brightgreen.svg"> </p> <p align="center"><em>Coverage badge reflects core library (<code>src/**</code>); adapters are measured separately.</em></p> <p align="center"><a href="https://pompelmi.github.io/pompelmi/">Documentation</a> · <a href="#installation">Install</a> · <a href="#quick-start">Quick‑start</a> · <a href="#github-action">GitHub Action</a> · <a href="#adapters">Adapters</a> · <a href="#diagrams">Diagrams</a> · <a href="#configuration">Config</a> · <a href="#production-checklist">Production checklist</a> · <a href="#yara-getting-started">YARA</a> · <a href="#quick-test-eicar">Quick test</a> · <a href="#security-notes">Security</a> · <a href="#faq">FAQ</a> </p> --- ## Overview **pompelmi** scans untrusted file uploads **before** they hit disk. A tiny, TypeScript-first toolkit for Node.js with composable scanners, deep ZIP inspection, and optional signature engines. - **Private by design** — no outbound calls; bytes never leave your process - **Composable scanners** — mix heuristics + signatures; set `stopOn` and timeouts - **ZIP hardening** — traversal/bomb guards, polyglot & macro hints - **Drop-in adapters** — Express, Koa, Fastify, Next.js - **Typed & tiny** — modern TS, minimal surface ## Highlights - **Block risky uploads early** — classify uploads as _clean_, _suspicious_, or _malicious_ and stop them at the edge. - **Real guards** — extension allow‑list, server‑side MIME sniff (magic bytes), per‑file size caps, and **deep ZIP** traversal with anti‑bomb limits. - **Built‑in scanners** — drop‑in **CommonHeuristicsScanner** (PDF risky actions, Office macros, PE header) and **Zip‑bomb Guard**; add your own or YARA via a tiny `{ scan(bytes) }` contract. - **Compose scanning** — run multiple scanners in parallel or sequentially with timeouts and short‑circuiting via `composeScanners()`. - **Zero cloud** — scans run in‑process. Keep bytes private. - **DX first** — TypeScript types, ESM/CJS builds, tiny API, adapters for popular web frameworks. > Keywords: file upload security, malware scanning, YARA, Node.js, Express, Koa, Next.js, ZIP scanning, ZIP bomb, PDF JavaScript, Office macros --- ## Installation ```bash # core library npm i pompelmi # or pnpm add pompelmi # or yarn add pompelmi ``` > Optional dev deps used in the examples: > > ```bash > npm i -D tsx express multer @koa/router @koa/multer koa next > ``` --- ## Quick‑start **At a glance (policy + scanners)** ```ts // Compose built‑in scanners (no EICAR). Optionally add your own/YARA. import { CommonHeuristicsScanner, createZipBombGuard, composeScanners } from 'pompelmi'; export const policy = { includeExtensions: ['zip','png','jpg','jpeg','pdf'], allowedMimeTypes: ['application/zip','image/png','image/jpeg','application/pdf','text/plain'], maxFileSizeBytes: 20 * 1024 * 1024, timeoutMs: 5000, concurrency: 4, failClosed: true, onScanEvent: (ev: unknown) => console.log('[scan]', ev) }; export const scanner = composeScanners( [ ['zipGuard', createZipBombGuard({ maxEntries: 512, maxTotalUncompressedBytes: 100 * 1024 * 1024, maxCompressionRatio: 12 })], ['heuristics', CommonHeuristicsScanner], // ['yara', YourYaraScanner], ], { parallel: false, stopOn: 'suspicious', timeoutMsPerScanner: 1500, tagSourceName: true } ); ``` ### Express ```ts import express from 'express'; import multer from 'multer'; import { createUploadGuard } from '@pompelmi/express-middleware'; import { policy, scanner } from './security'; // the snippet above const app = express(); const upload = multer({ storage: multer.memoryStorage(), limits: { fileSize: policy.maxFileSizeBytes } }); app.post('/upload', upload.any(), createUploadGuard({ ...policy, scanner }), (req, res) => { res.json({ ok: true, scan: (req as any).pompelmi ?? null }); }); app.listen(3000, () => console.log('http://localhost:3000')); ``` ### Koa ```ts import Koa from 'koa'; import Router from '@koa/router'; import multer from '@koa/multer'; import { createKoaUploadGuard } from '@pompelmi/koa-middleware'; import { policy, scanner } from './security'; const app = new Koa(); const router = new Router(); const upload = multer({ storage: multer.memoryStorage(), limits: { fileSize: policy.maxFileSizeBytes } }); router.post('/upload', upload.any(), createKoaUploadGuard({ ...policy, scanner }), (ctx) => { ctx.body = { ok: true, scan: (ctx as any).pompelmi ?? null }; }); app.use(router.routes()).use(router.allowedMethods()); app.listen(3003, () => console.log('http://localhost:3003')); ``` ### Next.js (App Router) ```ts // app/api/upload/route.ts import { createNextUploadHandler } from '@pompelmi/next-upload'; import { policy, scanner } from '@/lib/security'; export const runtime = 'nodejs'; export const dynamic = 'force-dynamic'; export const POST = createNextUploadHandler({ ...policy, scanner }); ``` --- ## GitHub Action Run **pompelmi** in CI to scan repository files or built artifacts. **Minimal usage** ```yaml name: Security scan (pompelmi) on: [push, pull_request] jobs: scan: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Scan repository with pompelmi uses: pompelmi/pompelmi/.github/actions/pompelmi-scan@v1 with: path: . deep_zip: true fail_on_detect: true ``` **Scan a single artifact** ```yaml - uses: pompelmi/pompelmi/.github/actions/pompelmi-scan@v1 with: artifact: build.zip deep_zip: true fail_on_detect: true ``` **Inputs** | Input | Default | Description | | --- | --- | --- | | `path` | `.` | Directory to scan. | | `artifact` | `""` | Single file/archive to scan. | | `yara_rules` | `""` | Glob path to YARA rules (e.g. `rules/*.yar`). | | `deep_zip` | `true` | Enable deep nested-archive inspection. | | `max_depth` | `3` | Max nested-archive depth. | | `fail_on_detect` | `true` | Fail the job if detections occur. | > The Action lives in this repo at `.github/actions/pompelmi-scan`. When published to the Marketplace, consumers can copy the snippets above as-is. --- ## Adapters Use the adapter that matches your web framework. All adapters share the same policy options and scanning contract. | Framework | Package | Status | | --- | --- | --- | | Express | `@pompelmi/express-middleware` | alpha | | Koa | `@pompelmi/koa-middleware` | alpha | | Next.js (App Router) | `@pompelmi/next-upload` | alpha | | Fastify | `@pompelmi/fastify-plugin` | alpha | | NestJS | nestjs — planned | | Remix | remix — planned | | hapi | hapi plugin — planned | | SvelteKit | sveltekit — planned | --- ## Diagrams ### Upload scanning flow ```mermaid flowchart TD A["Client uploads file(s)"] --> B["Web App Route"] B --> C{"Pre-filters<br/>(ext, size, MIME)"} C -- fail --> X["HTTP 4xx"] C -- pass --> D{"Is ZIP?"} D -- yes --> E["Iterate entries<br/>(limits & scan)"] E --> F{"Verdict?"} D -- no --> F{"Scan bytes"} F -- malicious/suspicious --> Y["HTTP 422 blocked"] F -- clean --> Z["HTTP 200 ok + results"] ``` <details> <summary>Mermaid source</summary> ```mermaid flowchart TD A["Client uploads file(s)"] --> B["Web App Route"] B --> C{"Pre-filters<br/>(ext, size, MIME)"} C -- fail --> X["HTTP 4xx"] C -- pass --> D{"Is ZIP?"} D -- yes --> E["Iterate entries<br/>(limits & scan)"] E --> F{"Verdict?"} D -- no --> F{"Scan bytes"} F -- malicious/suspicious --> Y["HTTP 422 blocked"] F -- clean --> Z["HTTP 200 ok + results"] ``` </details> ### Sequence (App ↔ pompelmi ↔ YARA) ```mermaid sequenceDiagram participant U as User participant A as App Route (/upload) participant P as pompelmi (adapter) participant Y as YARA engine U->>A: POST multipart/form-data A->>P: guard(files, policies) P->>P: MIME sniff + size + ext checks alt ZIP archive P->>P: unpack entries with limits end P->>Y: scan(bytes) Y-->>P: matches[] P-->>A: verdict (clean/suspicious/malicious) A-->>U: 200 or 4xx/422 with reason ``` <details> <summary>Mermaid source</summary> ```mermaid sequenceDiagram participant U as User participant A as App Route (/upload) participant P as pompelmi (adapter) participant Y as YARA engine U->>A: POST multipart/form-data A->>P: guard(files, policies) P->>P: MIME sniff + size + ext checks alt ZIP archive P->>P: unpack entries with limits end P->>Y: scan(bytes) Y-->>P: matches[] P-->>A: verdict (clean/suspicious/malicious) A-->>U: 200 or 4xx/422 with reason ``` </details> ### Components (monorepo) ```mermaid flowchart LR subgraph Repo core["pompelmi (core)"] express["@pompelmi/express-middleware"] koa["@pompelmi/koa-middleware"] next["@pompelmi/next-upload"] fastify(("fastify-plugin · planned")) nest(("nestjs · planned")) remix(("remix · planned")) hapi(("hapi-plugin · planned")) svelte(("sveltekit · planned")) end core --> express core --> koa core --> next core -.-> fastify core -.-> nest core -.-> remix core -.-> hapi core -.-> svelte ``` <details> <summary>Mermaid source</summary> ```mermaid flowchart LR subgraph Repo core["pompelmi (core)"] express["@pompelmi/express-middleware"] koa["@pompelmi/koa-middleware"] next["@pompelmi/next-upload"] fastify(("fastify-plugin · planned")) nest(("nestjs · planned")) remix(("remix · planned")) hapi(("hapi-plugin · planned")) svelte(("sveltekit · planned")) end core --> express core --> koa core --> next core -.-> fastify core -.-> nest core -.-> remix core -.-> hapi core -.-> svelte ``` </details> --- ## Configuration All adapters accept a common set of options: | Option | Type (TS) | Purpose | | --- | --- | --- | | `scanner` | `{ scan(bytes: Uint8Array): Promise<Match[]> }` | Your scanning engine. Return `[]` when clean; non‑empty to flag. | | `includeExtensions` | `string[]` | Allow‑list of file extensions. Evaluated case‑insensitively. | | `allowedMimeTypes` | `string[]` | Allow‑list of MIME types after magic‑byte sniffing. | | `maxFileSizeBytes` | `number` | Per‑file size cap. Oversize files are rejected early. | | `timeoutMs` | `number` | Per‑file scan timeout; guards against stuck scanners. | | `concurrency` | `number` | How many files to scan in parallel. | | `failClosed` | `boolean` | If `true`, errors/timeouts block the upload. | | `onScanEvent` | `(event: unknown) => void` | Optional telemetry hook for logging/metrics. | **Common recipes** Allow only images up to 5 MB: ```ts includeExtensions: ['png','jpg','jpeg','webp'], allowedMimeTypes: ['image/png','image/jpeg','image/webp'], maxFileSizeBytes: 5 * 1024 * 1024, failClosed: true, ``` --- ## Production checklist - [ ] **Limit file size** aggressively (`maxFileSizeBytes`). - [ ] **Restrict extensions & MIME** to what your app truly needs. - [ ] **Set `failClosed: true` in production** to block on timeouts/errors. - [ ] **Handle ZIPs carefully** (enable deep ZIP, keep nesting low, cap entry sizes). - [ ] **Compose scanners** with `composeScanners()` and enable `stopOn` to fail fast on early detections. - [ ] **Log scan events** (`onScanEvent`) and monitor for spikes. - [ ] **Run scans in a separate process/container** for defense‑in‑depth when possible. - [ ] **Sanitize file names and paths** if you persist uploads. - [ ] **Prefer memory storage + post‑processing**; avoid writing untrusted bytes before policy passes. - [ ] **Add CI scanning** with the GitHub Action to catch bad files in repos/artifacts. --- ## YARA Getting Started YARA lets you detect suspicious or malicious content using pattern‑matching rules. **pompelmi** treats YARA matches as signals that you can map to your own verdicts (e.g., mark high‑confidence rules as `malicious`, heuristics as `suspicious`). > **Status:** Optional. You can run without YARA. If you adopt it, keep your rules small, time‑bound, and tuned to your threat model. ### Starter rules Below are three example rules you can adapt: `rules/starter/eicar.yar` ```yar rule EICAR_Test_File { meta: description = "EICAR antivirus test string (safe)" reference = "https://www.eicar.org" confidence = "high" verdict = "malicious" strings: $eicar = "X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*" condition: $eicar } ``` `rules/starter/pdf_js.yar` ```yar rule PDF_JavaScript_Embedded { meta: description = "PDF contains embedded JavaScript (heuristic)" confidence = "medium" verdict = "suspicious" strings: $magic = { 25 50 44 46 } // "%PDF" $js1 = "/JavaScript" ascii $js2 = "/JS" ascii $open = "/OpenAction" ascii $aa = "/AA" ascii condition: uint32(0) == 0x25504446 and ( $js1 or $js2 ) and ( $open or $aa ) } ``` `rules/starter/office_macros.yar` ```yar rule Office_Macro_Suspicious_Words { meta: description = "Heuristic: suspicious VBA macro keywords" confidence = "medium" verdict = "suspicious" strings: $s1 = /Auto(Open|Close)/ nocase $s2 = "Document_Open" nocase ascii $s3 = "CreateObject(" nocase ascii $s4 = "WScript.Shell" nocase ascii $s5 = "Shell(" nocase ascii $s6 = "Sub Workbook_Open()" nocase ascii condition: 2 of ($s*) } ``` > These are **examples**. Expect some false positives; tune to your app. ### Minimal integration (adapter contract) If you use a YARA binding (e.g., `@automattic/yara`), wrap it behind the `scanner` contract: ```ts // Example YARA scanner adapter (pseudo‑code) import * as Y from '@automattic/yara'; // Compile your rules from disk at boot (recommended) // const sources = await fs.readFile('rules/starter/*.yar', 'utf8'); // const compiled = await Y.compile(sources); export const YourYaraScanner = { async scan(bytes: Uint8Array) { // const matches = await compiled.scan(bytes, { timeout: 1500 }); const matches = []; // plug your engine here // Map to the structure your app expects; return [] when clean. return matches.map((m: any) => ({ rule: m.rule, meta: m.meta ?? {}, tags: m.tags ?? [], })); } }; ``` Then include it in your composed scanner: ```ts import { composeScanners, CommonHeuristicsScanner } from 'pompelmi'; // import { YourYaraScanner } from './yara-scanner'; export const scanner = composeScanners( [ ['heuristics', CommonHeuristicsScanner], // ['yara', YourYaraScanner], ], { parallel: false, stopOn: 'suspicious', timeoutMsPerScanner: 1500, tagSourceName: true } ); ``` ### Policy suggestion (mapping matches → verdict) - **malicious**: high‑confidence rules (e.g., `EICAR_Test_File`) - **suspicious**: heuristic rules (e.g., PDF JavaScript, macro keywords) - **clean**: no matches Combine YARA with MIME sniffing, ZIP safety limits, and strict size/time caps. ## Quick test (no EICAR) Use the examples above, then send a **minimal PDF** that contains risky tokens (this triggers the built‑in heuristics). **1) Create a tiny PDF with risky actions** Linux: ```bash printf '%%PDF-1.7\n1 0 obj\n<< /OpenAction 1 0 R /AA << /JavaScript (alert(1)) >> >>\nendobj\n%%EOF\n' > risky.pdf ``` macOS: ```bash printf '%%PDF-1.7\n1 0 obj\n<< /OpenAction 1 0 R /AA << /JavaScript (alert(1)) >> >>\nendobj\n%%EOF\n' > risky.pdf ``` **2) Send it to your endpoint** Express (default from the Quick‑start): ```bash curl -F "file=@risky.pdf;type=application/pdf" http://localhost:3000/upload -i ``` You should see an HTTP **422 Unprocessable Entity** (blocked by policy). Clean files return **200 OK**. Pre‑filter failures (size/ext/MIME) should return a **4xx**. Adapt these conventions to your app as needed. --- ## Security notes - The library **reads** bytes; it never executes files. - YARA detections depend on the **rules you provide**; expect some false positives/negatives. - ZIP scanning applies limits (entries, per‑entry size, total uncompressed, nesting) to reduce archive‑bomb risk. - Prefer running scans in a **dedicated process/container** for defense‑in‑depth. --- ## Star history [![Star History Chart](https://api.star-history.com/svg?repos=pompelmi/pompelmi&type=Date)](https://star-history.com/#pompelmi/pompelmi&Date) --- ## FAQ **Do I need YARA?** No. `scanner` is pluggable. The examples use a minimal scanner for clarity; you can call out to a YARA engine or any other detector you prefer. **Where do the results live?** In the examples, the guard attaches scan data to the request context (e.g. `req.pompelmi` in Express, `ctx.pompelmi` in Koa). In Next.js, include the results in your JSON response as you see fit. **Why 422 for blocked files?** Using **422** to signal a policy violation keeps it distinct from transport errors; it’s a common pattern. Use the codes that best match your API guidelines. **Are ZIP bombs handled?** Archives are traversed with limits to reduce archive‑bomb risk. Keep your size limits conservative and prefer `failClosed: true` in production. --- ## Tests & Coverage Run tests locally with coverage: ```bash pnpm vitest run --coverage --passWithNoTests ``` The badge tracks the **core library** (`src/**`). Adapters and engines are reported separately for now and will be folded into global coverage as their suites grow. If you integrate Codecov in CI, upload `coverage/lcov.info` and you can use this Codecov badge: ```md [![codecov](https://codecov.io/gh/pompelmi/pompelmi/branch/main/graph/badge.svg?flag=core)](https://codecov.io/gh/pompelmi/pompelmi) ``` ## Contributing PRs and issues welcome! Start with: ```bash pnpm -r build pnpm -r lint ``` --- ## License [MIT](./LICENSE) © 2025‑present pompelmi contributors