# unicode-to-plain-text
Convert fancy Unicode text to plain ASCII with smart language preservation
## Install
```sh
npm i unicode-to-plain-text
```
## Usage
Basic usage:
```js
import { toPlainText } from 'unicode-to-plain-text'
// Mathematical styles
toPlainText('𝐇𝐞𝐥𝐥𝐨 𝐖𝐨𝐫𝐥𝐝') // => 'Hello World'
// Enclosed characters
toPlainText('🅣🅔🅢🅣') // => 'TEST'
// Fullwidth forms
toPlainText('ＨＥＬＬＯ') // => 'HELLO'
```
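As a rough illustration of the kind of mapping involved (not the library's implementation), the built-in `String.prototype.normalize('NFKC')` already folds mathematical alphanumerics and fullwidth forms to ASCII, while negative circled letters such as 🅣 have no compatibility decomposition and need an explicit mapping. The helper names `mapNegativeCircled` and `sketchToPlain` are hypothetical.

```javascript
// Hypothetical helpers for illustration; not part of unicode-to-plain-text.

// NEGATIVE CIRCLED LATIN CAPITAL LETTER A..Z occupy U+1F150..U+1F169 and have
// no compatibility decomposition, so NFKC leaves them untouched.
function mapNegativeCircled(text) {
  // Array.from iterates by code point, so astral characters stay intact
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0)
    return cp >= 0x1f150 && cp <= 0x1f169
      ? String.fromCharCode(0x41 + (cp - 0x1f150))
      : ch
  }).join('')
}

function sketchToPlain(text) {
  // NFKC folds the Mathematical Alphanumeric Symbols block and fullwidth forms
  return mapNegativeCircled(text).normalize('NFKC')
}

console.log(sketchToPlain('𝐇𝐞𝐥𝐥𝐨 𝐖𝐨𝐫𝐥𝐝')) // Hello World
console.log(sketchToPlain('🅣🅔🅢🅣')) // TEST
console.log(sketchToPlain('ＨＥＬＬＯ')) // HELLO
```

Other enclosed or styled ranges would each need their own table entry or offset rule in a sketch like this.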
Language preservation:
```js
// Real languages are automatically preserved
toPlainText('Hello Γεια σας') // => 'Hello Γεια σας' (Greek preserved)
toPlainText('Test Привет') // => 'Test Привет' (Cyrillic preserved)
// But lookalike characters are converted
toPlainText('Α test') // => 'A test' (Greek Alpha → Latin A)
```
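How the library decides what counts as a real language is not spelled out here. One plausible heuristic, sketched below as an assumption rather than the library's actual rule: treat a word as disguised Latin when it mixes Latin with another script, or when every letter is a Greek or Cyrillic capital that looks like a Latin one, and leave everything else alone. The lookalike table is deliberately tiny, and `preserveRealLanguages` is a hypothetical name.

```javascript
// Hypothetical heuristic; the real library may use different rules and tables.
// Greek and Cyrillic capitals that render like Latin letters (escaped for clarity).
const GREEK = '\u0391\u0392\u0395\u0397\u0399\u039A\u039C\u039D\u039F\u03A1\u03A4\u03A7'
const GREEK_AS_LATIN = 'ABEHIKMNOPTX'
const CYRILLIC = '\u0410\u0412\u0415\u041A\u041C\u041D\u041E\u0420\u0421\u0422\u0425'
const CYRILLIC_AS_LATIN = 'ABEKMHOPCTX'

const LOOKALIKES = new Map([
  ...Array.from(GREEK, (ch, i) => [ch, GREEK_AS_LATIN[i]]),
  ...Array.from(CYRILLIC, (ch, i) => [ch, CYRILLIC_AS_LATIN[i]]),
])

const LATIN = /\p{Script=Latin}/u

function mapLookalikesInWord(word) {
  const chars = Array.from(word)
  const hasLatin = chars.some((ch) => LATIN.test(ch))
  const allLookalikes = chars.every((ch) => LOOKALIKES.has(ch))
  // Mixed-script words, or words built only from lookalikes, are treated as disguised Latin
  if (hasLatin || allLookalikes) {
    return chars.map((ch) => LOOKALIKES.get(ch) ?? ch).join('')
  }
  // Anything else is assumed to be genuine Greek or Cyrillic and kept as-is
  return word
}

function preserveRealLanguages(text) {
  // Splitting on a capturing group keeps the whitespace runs in the result
  return text
    .split(/(\s+)/)
    .map((part) => (/^\s+$/.test(part) ? part : mapLookalikesInWord(part)))
    .join('')
}

console.log(preserveRealLanguages('Hello Γεια σας')) // Hello Γεια σας
console.log(preserveRealLanguages('\u0391 test')) // A test
console.log(preserveRealLanguages('\u0391pple')) // Apple
```

A word-level rule like this keeps "Γεια" intact because Γ has no Latin twin, while a lone Alpha or an Alpha inside a Latin word is converted.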
Custom pipelines:
```js
import {
  pipe,
  handleUpsideDown,
  mapCharacters,
  normalizeUnicode,
  removeDecorations,
  normalizeWhitespace
} from 'unicode-to-plain-text'
// Create a custom pipeline from a subset of the steps (normalizeCasing is omitted here)
const customTransform = pipe(
  handleUpsideDown,
  mapCharacters,
  normalizeUnicode,
  removeDecorations,
  normalizeWhitespace
)
const result = customTransform('𝐓𝐞𝐬𝐭')
```
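`pipe` is described only as composing functions into a pipeline. The order of the custom pipeline (handleUpsideDown first, normalizeWhitespace last) suggests left-to-right application; under that assumption, a minimal equivalent is a `reduce` over the functions. `pipeSketch` and the toy steps below are hypothetical and not exported by the library.

```javascript
// Hypothetical stand-in for pipe(), assuming it applies functions left to right.
const pipeSketch = (...fns) => (input) => fns.reduce((value, fn) => fn(value), input)

// Toy steps (not library functions) chosen so the order is visible in the output
const trim = (s) => s.trim()
const exclaim = (s) => s + '!'
const upper = (s) => s.toUpperCase()

const transform = pipeSketch(trim, exclaim, upper)
console.log(transform('  hello  ')) // HELLO!
```

Applied right to left instead, the same steps would produce `'HELLO  !'`, which is why the order of steps in a custom pipeline matters.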
## API
### toPlainText(text, options?)
Converts fancy Unicode text to plain ASCII.

| Parameter | Type   | Description                        |
| --------- | ------ | ---------------------------------- |
| `text`    | string | Input text with Unicode characters |
| `options` | object | Optional configuration object      |
#### Options
| Option            | Type    | Default | Description                                                                                              |
| ----------------- | ------- | ------- | -------------------------------------------------------------------------------------------------------- |
| `normalizeSpaces` | boolean | `true`  | Collapse multiple spaces into one and trim leading and trailing whitespace                               |
| `skipEmoji`       | boolean | `false` | Preserve emoji characters; other decorations such as box-drawing characters and arrows are still removed |
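
To make the two flags concrete, here is a rough sketch of what each might do in isolation, assuming emoji are detected with the `Extended_Pictographic` Unicode property and that "spaces" covers any whitespace. Neither assumption comes from the library, and `applyOptionsSketch` is a hypothetical name; it does not handle the other decorations the real option still removes.

```javascript
// Hypothetical sketch of the two option flags; not the library's implementation.
function applyOptionsSketch(text, { normalizeSpaces = true, skipEmoji = false } = {}) {
  let out = text
  if (!skipEmoji) {
    // Drop pictographic emoji plus leftover variation selectors and zero-width joiners
    out = out.replace(/[\p{Extended_Pictographic}\uFE0F\u200D]/gu, '')
  }
  if (normalizeSpaces) {
    // Collapse runs of whitespace to one space and trim both ends
    out = out.replace(/\s+/g, ' ').trim()
  }
  return out
}

console.log(applyOptionsSketch('Hello 😀 World')) // Hello World
console.log(applyOptionsSketch('Hello 😀 World', { skipEmoji: true })) // Hello 😀 World
```

Removing an emoji leaves two adjacent spaces behind, which is why the default output also relies on whitespace normalization.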
#### Examples
```js
// Default behavior: emoji removed
toPlainText('Hello 😀 World') // => 'Hello World'
// Preserve emoji
toPlainText('Hello 😀 World', { skipEmoji: true }) // => 'Hello 😀 World'
// Preserve spacing
toPlainText('Hello   World', { normalizeSpaces: false }) // => 'Hello   World'
// Combined options
toPlainText('𝐇𝐞𝐥𝐥𝐨 😀 𝐖𝐨𝐫𝐥𝐝', { skipEmoji: true, normalizeSpaces: false })
// => 'Hello 😀 World'
```
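Returns a plain-text string. Fancy and lookalike characters are mapped to ASCII, genuine Greek or Cyrillic text is preserved, and casing is normalized; whitespace is also normalized unless `normalizeSpaces` is `false`.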
### Individual Functions
- `handleUpsideDown(text)` - Converts upside-down text back to normal orientation
- `mapCharacters(text)` - Maps Unicode to ASCII equivalents
- `normalizeUnicode(text)` - Removes diacritics from Latin text
- `removeDecorations(text)` - Removes emojis and decorations
- `normalizeWhitespace(text)` - Normalizes and trims whitespace
- `normalizeCasing(text)` - Normalizes inconsistent casing
- `pipe(...fns)` - Composes functions into a pipeline
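
Two of these steps correspond to well-known techniques. The sketches below are assumptions about the general approach, not the library's code: `unflipSketch` reverses the string and maps turned letters back (only when the input contains at least one turned-only letter), and `stripLatinDiacriticsSketch` decomposes with NFD, drops combining marks that follow a Latin letter, and recomposes so accents on other scripts survive. Both names and the small letter table are hypothetical.

```javascript
// Hypothetical sketches; not the library's implementations.

// Turned letters mapped back to their upright form. Plain ASCII pairs such as
// p/d, b/q and u/n swap places when text is flipped, so they are included too.
const UNFLIP = {
  'ɐ': 'a', 'q': 'b', 'ɔ': 'c', 'p': 'd', 'ǝ': 'e', 'ɟ': 'f', 'ƃ': 'g',
  'ɥ': 'h', 'ᴉ': 'i', 'ɾ': 'j', 'ʞ': 'k', 'ɯ': 'm', 'u': 'n', 'd': 'p',
  'b': 'q', 'ɹ': 'r', 'ʇ': 't', 'n': 'u', 'ʌ': 'v', 'ʍ': 'w', 'ʎ': 'y',
}

// Only treat text as upside-down if it contains a letter that rarely appears otherwise
const TURNED_ONLY = /[ɐɔǝɟƃɥᴉɾʞɯɹʇʌʍʎ]/u

function unflipSketch(text) {
  if (!TURNED_ONLY.test(text)) return text
  // Upside-down text reads back to front, so reverse by code point before mapping
  return Array.from(text).reverse().map((ch) => UNFLIP[ch] ?? ch).join('')
}

function stripLatinDiacriticsSketch(text) {
  // NFD splits é into e + U+0301; drop marks only when they follow a Latin base letter
  return text
    .normalize('NFD')
    .replace(/(\p{Script=Latin})\p{Mn}+/gu, '$1')
    .normalize('NFC')
}

console.log(unflipSketch('pǝɹ')) // red
console.log(unflipSketch('plain text')) // plain text
console.log(stripLatinDiacriticsSketch('café naïve')) // cafe naive
```

Restricting mark removal to Latin bases keeps it consistent with the language-preservation behavior described above, since a Greek letter with tonos is recomposed unchanged.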
## License
Apache-2.0