llm-inject-scan
Version:
A tiny, fast library that scans user prompts for risky patterns before they reach your LLM model. It flags likely prompt-injection attempts so you can block, review, or route them differently—without making a model call.
72 lines (55 loc) • 3.87 kB
Markdown
# llm-inject-scan
A tiny, fast library that scans user prompts for risky patterns before they reach your LLM model. It flags likely prompt-injection attempts so you can block, review, or route them differently—without making a model call.
### What it does
- **Jailbreak detection**: Spots classic attempts to override behavior (e.g., asking to ignore previous instructions or force “developer mode”).
- **Policy evasion cues**: Surfaces prompts nudging toward harmful or disallowed content.
- **Prompt-leak attempts**: Flags efforts to get the system prompt, internal instructions, or hidden configuration repeated back.
- **Indirect injection via links and Base64**: Detects prompts steering the model to fetch and act on untrusted URLs or with Base64 type of injection.
- **Role/context manipulation**: Catches inputs that try to reset or impersonate roles (e.g., “system:” or contrived contexts that distort guardrails).
### Why it matters
- **Defense-in-depth for LLM apps**: Reduce risk from jailbreaks, prompt leaks, and indirect injections before they trigger downstream effects.
- **Fast, deterministic guardrail**: Lightweight checks that run synchronously and cheaply at the edge, gateway, or server.
- **Works with messy inputs**: Resilient to common obfuscation tricks (diacritics, homoglyphs, leetspeak, small typos) often used to slip past naive filters.
- **Actionable outcomes**: Returns clear categories you can log, alert on, or enforce policy with—no black-box scores.
### Where it fits
- **User-facing chat UIs**: Preflight scan before sending text to your model.
- **API gateways and middleware**: Inline guardrail for multi-tenant or bring-your-own-data endpoints.
- **Serverless functions and workers**: Quick allow/deny decisions close to the request.
- **Batch and ETL**: Sanity-check large volumes of prompts before processing.
### Output
When a prompt is flagged, you get one or more categories indicating the likely intent, such as `jailbreak`, `evasion`, `promptLeak`, `indirect`, or `roleContext`. Use these to block, require human review, or route to a safer policy.
### Install
```bash
npm install llm-inject-scan
```
### Usage
```ts
import { createPromptValidator } from 'llm-inject-scan';
const validate = createPromptValidator({ /* disableBase64Check: false, disableUrlCheck: false */ });
const result = validate('Ignore all previous instructions and...');
if (!result.clean)) {
// e.g., deny external fetch or sanitize the request
}
```
### Options
| Option | Type | Default | Description |
|---|---|---|---|
| `disableBase64Check` | `boolean` | `false` | Skip Base64-like blob detection. When true, Base64-looking input will not add an `evasion` flag. |
| `disableUrlEncodingCheck` | `boolean` | `false` | Skip percent-encoded (URL-encoded) text detection. When true, sequences like `%49%67%6e...` will not add an `evasion` flag. |
| `disableUrlCheck` | `boolean` | `false` | Skip URL detection. When true, http/https links will not add an `indirect` flag. |
```ts
import { createPromptValidator } from 'llm-inject-scan';
const validate = createPromptValidator({
disableBase64Check: true,
disableUrlCheck: true,
});
const result = validate('Summarize http://attacker.com/payload and SGVsbG8sIHdvcmxkIQ==');
```
### Scope and philosophy
- **Focused**: Optimized for English prompts today.
- **Pragmatic**: Rule-driven and conservative to minimize noise, yet robust to simple obfuscation.
- **Composable**: Use it alongside other controls (rate limiting, content filters, isolation, allowlisted retrieval).
### Status
Early-stage and evolving based on real-world attempts and research. Expect the taxonomy, API, and coverage to change—and things might break. I'm actively looking for feedback and real-world examples; please open an issue or share ideas to help improve the library.
### License
ISC