utf8-sanitize
Version:
A performant zero-dependency utility to clean UTF-8 text, fix mojibake from latin1, verify string length, and sanitize input
96 lines (80 loc) • 2.88 kB
Markdown
# `utf8-sanitize` Usage
# Functions
```js
FullSanitize(input, options) // => string
//Provides a full pipeline to validate byte length, repair encoding, and sanitize a string, options passed to SanitizeInput
FixLatin1Corrupt(input) // => string
//Repairs mojibake corruption in latin1 single-byte to multi-byte UTF-8 character conversion with no dependencies
VerifyByteLength(input) // => boolean
// Check if a string size matches its expected or safe 32-bit size
SanitizeInput(input, options) // => string
// Cleans string by removing or escaping characters based on a sanitization mode specified in options (alphanumeric, html, filename)
MAX_SAFE_CHAR_LIMIT // => number
// Used by VerifyByteLength, max safe limit is 2^28 - 16 on V8 32-bit, rounded down to nearest hundred
```
## Options
### *Options are passed from* `FullSanitize` *for use in* `SanitizeInput`
* All modes remove common C0/C1 control and zero-width/invisible space characters
* Mode `alphanumeric` removes non-alphanumeric characters
* Mode `html` escapes tags such as `<script>` to prevent XSS
* Mode `filename` cleans characters disallowed in filenames on Win/OSX
* `keepSpaces` parameter is only used in `alphanumeric` mode
* Decides whether to clean spaces from input or not
* Defaults to true (keep) if unspecified
## `FullSanitize` Usage Examples
#### `alphanumeric`
```js
const options = {
mode: 'alphanumeric',
keepSpaces: false
};
const input = 'User: él (123)';
const result = FullSanitize(input, options);
// Expected output: "Userél123"
```
#### `html`
```js
const options = {
mode: 'html'
};
const input = 'é input <script>alert("XSS")</script>';
const result = FullSanitize(input, options);
// Expected output: "él input <script>alert("XSS")</script>"
```
#### `filename`
```js
const options = {
mode: 'filename'
};
const input = 'Report <Q1/2025> | Final é?.txt';
const result = FullSanitize(input, options);
// Expected output: "Report Q12025 Final él.txt"
```
## `FixLatin1Corrupt` Usage Example
```js
const input = 'El menú del dÃa.';
const result = FixLatin1Corrupt(input);
console.log(result);
// Expected output: "El menú del día."
```
## `VerifyByteLength` Usage Example
```js
const validInput = 'String 1';
console.log(VerifyByteLength(validInput));
// Expected output: true
const invalidInput = null;
console.log(VerifyByteLength(invalidInput));
// Expected output: false
```
## `SanitizeInput` Usage Example
```js
const input = '<p>A "test"!</p>';
const options = { mode: 'html' };
console.log(SanitizeInput(input, options));
// Expected output: "<p>A "test"!</p>"
```
*HTML example, see* `FullSanitize` *for* `alphanumeric` *and* `filename`
## `MAX_SAFE_CHAR_LIMIT` Usage Example
```js
if (input <= MAX_SAFE_CHAR_LIMIT)
```