# Obscenity
> Robust, extensible profanity filter for Node.js.
<a href="https://github.com/jo3-l/obscenity/actions"><img src="https://img.shields.io/github/actions/workflow/status/jo3-l/obscenity/.github/workflows/continuous-integration.yml?branch=main&style=for-the-badge" alt="Build status"></a>
<a href="https://app.codecov.io/gh/jo3-l/obscenity/"><img src="https://img.shields.io/codecov/c/github/jo3-l/obscenity?style=for-the-badge" alt="Codecov status"></a>
<a href="https://npmjs.com/package/obscenity"><img src="https://img.shields.io/npm/v/obscenity?style=for-the-badge" alt="npm version"></a>
<img src='https://img.shields.io/github/languages/top/jo3-l/serenity.svg?style=for-the-badge' alt='Language'/>
<a href="https://github.com/jo3-l/obscenity/blob/main/LICENSE.md"><img src="https://img.shields.io/github/license/jo3-l/obscenity?style=for-the-badge" alt="License"></a>
## Why Obscenity?
- **Accurate:** Though Obscenity is far from perfect (as with all profanity filters), it makes reducing false positives as simple as possible: adding whitelisted phrases is as easy as adding a new string to an array, and using word boundaries is equally simple.
- **Robust:** Obscenity's transformer-based design lets it match variants of phrases that other libraries typically miss, e.g. `fuuuuuuuckkk`, `ʃṳ𝒸𝗄`, `wordsbeforefuckandafter` and so on. There's no need to write out all the variants by hand either: adding the single pattern `fuck` matches all of the cases above by default.
- **Extensible:** With Obscenity, you aren't locked into anything: removing phrases you disagree with from the default set of words is trivial, as is disabling any transformations you don't like (perhaps you find leet-speak decoding too error-prone).
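To make the transformer idea concrete, here is a minimal sketch of a normalization pipeline in plain JavaScript. It illustrates the general technique only and is not Obscenity's implementation: the names `toLowerCase`, `resolveLeetSpeak`, `collapseDuplicates`, `makeNormalizer` and the tiny `leetMap` are all hypothetical.

```javascript
// Simplified illustration; none of these helpers are part of Obscenity's API.
const leetMap = new Map([
	['0', 'o'],
	['1', 'i'],
	['3', 'e'],
	['4', 'a'],
	['5', 's'],
]);

// Each transformer takes a string and returns a normalized string.
const toLowerCase = (text) => text.toLowerCase();
const resolveLeetSpeak = (text) => [...text].map((ch) => leetMap.get(ch) ?? ch).join('');
// Collapse every run of a repeated character down to a single occurrence.
const collapseDuplicates = (text) => text.replace(/(.)\1+/gu, '$1');

// Build a normalizer from an ordered list of transformers.
const makeNormalizer = (transformers) => (text) =>
	transformers.reduce((acc, transform) => transform(acc), text);

const normalize = makeNormalizer([toLowerCase, resolveLeetSpeak, collapseDuplicates]);
console.log(normalize('fuuuuuuuckkk')); // fuck
console.log(normalize('D4AAARN')); // darn

// Disabling a transformation is just a matter of leaving it out of the list.
const withoutLeetSpeak = makeNormalizer([toLowerCase, collapseDuplicates]);
console.log(withoutLeetSpeak('D4AAARN')); // d4arn
```

Collapsing every run to one character is too aggressive for real text (it would turn `hello` into `helo`), so a production filter needs finer-grained rules than this sketch.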
## Installation
```shell
# Use whichever package manager your project uses:
$ npm install obscenity
$ yarn add obscenity
$ pnpm add obscenity
```
## Example usage
First, import Obscenity:
```javascript
const { RegExpMatcher, TextCensor, englishDataset, englishRecommendedTransformers } = require('obscenity');
```
Or, in TypeScript/ESM:
```typescript
import { RegExpMatcher, TextCensor, englishDataset, englishRecommendedTransformers } from 'obscenity';
```
Now, we can create a new matcher using the English preset.
```javascript
const matcher = new RegExpMatcher({
	...englishDataset.build(),
	...englishRecommendedTransformers,
});
```
Now, we can use our matcher to search for profanities in text. Here are two examples of what you can do:
**Check if there are any matches in some text:**
```javascript
if (matcher.hasMatch('fuck you')) {
	console.log('The input text contains profanities.');
}
// The input text contains profanities.
```
**Output the positions of all matches along with the original word used:**
```javascript
// Pass "true" as the "sorted" parameter so the matches are sorted by their position.
const matches = matcher.getAllMatches('ʃ𝐟ʃὗƈk ỹоứ 𝔟ⁱẗ𝙘ɦ', true);
for (const match of matches) {
	const { phraseMetadata, startIndex, endIndex } = englishDataset.getPayloadWithPhraseMetadata(match);
	console.log(`Match for word ${phraseMetadata.originalWord} found between ${startIndex} and ${endIndex}.`);
}
// Match for word fuck found between 0 and 6.
// Match for word bitch found between 12 and 18.
```
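The positions can be used to pull the matched text back out of the input. The sketch below works on hand-written match objects rather than real matcher output, and it assumes that `endIndex` is inclusive; that assumption is inferred from the sample output above and should be checked against the API documentation. `extractMatchedText` is a hypothetical helper.

```javascript
// Hypothetical helper; assumes endIndex is inclusive.
function extractMatchedText(input, matches) {
	return matches.map(({ startIndex, endIndex }) => input.slice(startIndex, endIndex + 1));
}

const sampleInput = 'fuck you little bitch';
// Hand-written positions for illustration, not real matcher output.
const sampleMatches = [
	{ startIndex: 0, endIndex: 3 },
	{ startIndex: 16, endIndex: 20 },
];
console.log(extractMatchedText(sampleInput, sampleMatches)); // [ 'fuck', 'bitch' ]
```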
**Censoring matched text:**
To censor text, we'll need to import another class: the `TextCensor`.
Some other imports and creation of the matcher have been elided for simplicity.
```javascript
const { TextCensor /* plus the imports shown earlier */ } = require('obscenity');
// ...
const censor = new TextCensor();
const input = 'fuck you little bitch';
const matches = matcher.getAllMatches(input);
console.log(censor.applyTo(input, matches));
// %@$% you little **%@%
```
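Conceptually, censoring means copying the input while replacing each matched span. The following is a minimal plain-JavaScript sketch of that idea, not `TextCensor` itself: `censorWithAsterisks` is a hypothetical helper, the match objects are hand-written, and `endIndex` is assumed to be inclusive. It replaces matches with asterisks, which differs from the replacement style shown in the output above.

```javascript
// Hypothetical helper, not Obscenity's TextCensor. Assumes inclusive endIndex.
function censorWithAsterisks(input, matches) {
	const sorted = [...matches].sort((a, b) => a.startIndex - b.startIndex);
	let result = '';
	let cursor = 0; // index of the first character not yet copied to the output
	for (const { startIndex, endIndex } of sorted) {
		if (endIndex < cursor) continue; // already covered by an earlier match
		const from = Math.max(startIndex, cursor);
		result += input.slice(cursor, from); // untouched text before the match
		result += '*'.repeat(endIndex + 1 - from);
		cursor = endIndex + 1;
	}
	return result + input.slice(cursor);
}

const text = 'fuck you little bitch';
// Hand-written positions for illustration, deliberately out of order.
const spans = [
	{ startIndex: 16, endIndex: 20 },
	{ startIndex: 0, endIndex: 3 },
];
console.log(censorWithAsterisks(text, spans)); // **** you little *****
```

Sorting a copy of the matches keeps the helper independent of the order in which they were produced, and the `cursor` check keeps overlapping matches from being censored twice.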
This is just a small slice of what Obscenity can do: for more, check out the [documentation](#documentation).
## Accuracy
> **Note:** As with all swear filters, Obscenity is not perfect (nor will it ever be). Use its output as a heuristic, and not as the sole judge of whether some content is appropriate or not.
With the English preset, Obscenity (correctly) finds matches in all of the following texts:
- you are a little **fuck**er
- **fk** you
- **ffuk** you
- i like **a$$es**
- ʃ𝐟ʃὗƈk ỹоứ
...and it **does not match** on the following:
- the **pen is** mightier than the sword
- i love banan**as s**o yeah
- this song seems really b**anal**
- g**rape**s are really yummy
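One common way to avoid false positives like these is to discard any match that falls entirely inside a whitelisted term. The following is a simplified sketch of that idea using plain substring search. It is not Obscenity's matching algorithm, and `findSpans` and `findUnwhitelisted` are hypothetical helpers.

```javascript
// Hypothetical helper: every span (inclusive indices) where `term` occurs in `text`.
function findSpans(text, term) {
	const spans = [];
	if (term.length === 0) return spans; // an empty term would match everywhere
	let from = text.indexOf(term);
	while (from !== -1) {
		spans.push({ startIndex: from, endIndex: from + term.length - 1 });
		from = text.indexOf(term, from + 1);
	}
	return spans;
}

// Keep only blacklisted matches that are not fully inside a whitelisted span.
function findUnwhitelisted(text, blacklisted, whitelisted) {
	const allowed = whitelisted.flatMap((term) => findSpans(text, term));
	return blacklisted
		.flatMap((term) => findSpans(text, term))
		.filter((m) => !allowed.some((w) => w.startIndex <= m.startIndex && m.endIndex <= w.endIndex));
}

const whitelist = ['classic', 'bass'];
console.log(findUnwhitelisted('a classic bass line', ['ass'], whitelist).length); // 0
console.log(findUnwhitelisted('what an ass', ['ass'], whitelist).length); // 1
```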
## Documentation
For a step-by-step guide on how to use Obscenity, check out the [guide](./docs/guide).
Otherwise, refer to the [auto-generated API documentation](./docs/reference).
## Contributing
Issues can be reported using the [issue tracker](https://github.com/jo3-l/obscenity/issues).
If you'd like to submit a pull request, please read the [contribution guide](./CONTRIBUTING.md) first.
## License
MIT