deep-profanity-filter
A thorough profanity filter that considers most common circumventions. Works with your custom list of blocked and whitelisted words and phrases. Identifies and/or replaces bad words. Works with *wildcards* at *start and/or end* of words.
export type WordRegexComponents = {
    start: string;
    word: string;
    end: string;
};
/**
* Escape a string so that it can be passed to `new RegExp(...)` with its special
* characters, such as `. * + ? ^ $ { } ( ) [ ] \\ /`, matched literally rather than
* interpreted as regex syntax. If you wish to match a single backslash `\`
* literally, make sure that it is escaped as `\\` in your badword list or whitelist,
* as well as in the string that you are testing against.
* @param inputString - The string you wish to escape for creating a regular expression.
* @returns The escaped string that can be used in `new RegExp(...)`
*/
export declare const escapeStringForRegex: (inputString: string) => string;
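To illustrate what escaping for `new RegExp(...)` means, here is a minimal sketch. The helper name `escapeForRegexSketch` is hypothetical and the character class is an assumption (it also covers `|`); it is not the library's actual implementation, and it does not model the backslash convention described in the comment above.

```typescript
// Hypothetical helper for illustration only, not the library's code.
// Prefixes each regex metacharacter (assumed set, including "|" and "/")
// with a backslash so that it is matched literally.
const escapeForRegexSketch = (input: string): string =>
    input.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");

// The escaped text can be embedded in a pattern and then matches only itself.
const literalDot = new RegExp(escapeForRegexSketch("a.b"));
console.log(literalDot.test("a.b")); // true
console.log(literalDot.test("axb")); // false: "." no longer means "any character"
```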
/**
* Splits up a word that has optional wildcards '*' at its start or end.
* Removes the wildcards and, for each side, returns an empty string if there
* was a wildcard there, or a word-boundary string if there was none.
* These components are then used to build regular expressions.
*
* @param {string} badword - The bad word to split into its components.
* @returns An object with the components accessible as obj.start, obj.word and obj.end
*/
export declare const getRegExpComponents: (badword: string) => WordRegexComponents;
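The wildcard splitting described above could look roughly like the following sketch. `ComponentsSketch`, `WORD_BOUNDARY` and `splitWildcardsSketch` are hypothetical names, and using a plain `\b` as the "word-boundary string" is an assumption; the library may use a different boundary pattern.

```typescript
// Mirrors the shape of WordRegexComponents; redeclared so the sketch is self-contained.
type ComponentsSketch = { start: string; word: string; end: string };

// Assumption: the word-boundary string is a plain "\b".
const WORD_BOUNDARY = "\\b";

// Hypothetical stand-in for getRegExpComponents(...), for illustration only.
const splitWildcardsSketch = (badword: string): ComponentsSketch => {
    const wildcardAtStart = badword.charAt(0) === "*";
    const wildcardAtEnd =
        badword.length > 1 && badword.charAt(badword.length - 1) === "*";
    // Strip the wildcard characters, keeping only the word itself.
    const word = badword.slice(
        wildcardAtStart ? 1 : 0,
        wildcardAtEnd ? badword.length - 1 : badword.length,
    );
    return {
        start: wildcardAtStart ? "" : WORD_BOUNDARY,
        word,
        end: wildcardAtEnd ? "" : WORD_BOUNDARY,
    };
};

// "hell*": boundary at the start, no boundary at the end.
const hellComponents = splitWildcardsSketch("hell*");
console.log(hellComponents.word, hellComponents.end === "");
```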
/**
* Turn a bad word into a regular expression that checks whether it is present
* in the string, with a word boundary \b on each side that does not have a wildcard.
*
* The word "kitty" would result in the regular expression:
* /\bkitty\b/g
* The word "hell*" would result in the regular expression:
* /\bhell/g
* If the word is a phrase with whitespace, that whitespace is replaced with a
* pattern that matches one or more non-word characters (or underscores).
* The phrase "ban ananas" turns into:
* /\bban[\W_]+ananas\b/g
*
* @param {WordRegexComponents} badwordComponents - The bad word, split into components by getRegExpComponents(...)
* @returns The regular expression that can be used to find this word in a string.
*/
export declare const getNormalRegExp: (badwordComponents: WordRegexComponents) => RegExp;
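As a rough sketch of how such a pattern could be assembled from the components, assuming a plain `\b` boundary string and a hypothetical escape helper (`normalRegExpSketch`, `escapeSketch` and `SEPARATOR` are illustrative names, not part of the library):

```typescript
// Mirrors the shape of WordRegexComponents; redeclared so the sketch is self-contained.
type ComponentsSketch = { start: string; word: string; end: string };

// One or more non-word characters or underscores, as in the documented examples.
const SEPARATOR = "[\\W_]+";

// Hypothetical escape helper (assumed character set).
const escapeSketch = (input: string): string =>
    input.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");

// Hypothetical stand-in for getNormalRegExp(...), for illustration only.
const normalRegExpSketch = (c: ComponentsSketch): RegExp => {
    // Escape each part of the phrase, then join the parts with the separator
    // so that every run of whitespace becomes exactly one [\W_]+.
    const body = c.word.trim().split(/\s+/).map(escapeSketch).join(SEPARATOR);
    return new RegExp(c.start + body + c.end, "g");
};

const phrase = normalRegExpSketch({ start: "\\b", word: "ban ananas", end: "\\b" });
console.log(phrase.source); // \bban[\W_]+ananas\b
console.log("ban - ananas".search(phrase) !== -1); // true
```

Because the expression carries the `g` flag, repeated calls to `test()` on the same instance advance `lastIndex`; the sketch uses `String.prototype.search`, which ignores it.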
/**
* Turn a bad word into a regular expression that checks for non-word characters
* interjected between all of the characters, but containing a word boundary
* on each side that does not have a wildcard.
*
* The word "kitty" would result in: /\bk[\W_]+i[\W_]+t[\W_]+t[\W_]+y\b/g
*
* It checks for variations such as:
* "k i t t y", "k-i-t-t-y", "k.i,t;t~y" (with a word boundary at each side)
*
* The word "hell*" would result in: /\bh[\W_]+e[\W_]+l[\W_]+l/g
*
* Phrases with whitespace:
* If the word is a phrase with whitespace, the whitespace is turned into the same
* pattern that allows any non-word character. Only one such pattern is placed at
* each space, as it already allows one or more characters (specified with the +
* at the end).
* So, "ban ananas" turns into:
* /\bb[\W_]+a[\W_]+n[\W_]+a[\W_]+n[\W_]+a[\W_]+n[\W_]+a[\W_]+s\b/g
*
* @param {WordRegexComponents} badwordComponents - The bad word, split into components by getRegExpComponents(...)
* @returns The regular expression that can be used to find this word in a string.
*/
export declare const getCircumventionRegExp: (badwordComponents: WordRegexComponents) => RegExp;
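The construction described above can be sketched as follows. This is an illustration under assumptions (plain `\b` boundaries, a hypothetical escape helper, splitting by UTF-16 code units), and `circumventionRegExpSketch` is a made-up name rather than the library's implementation.

```typescript
// Mirrors the shape of WordRegexComponents; redeclared so the sketch is self-contained.
type ComponentsSketch = { start: string; word: string; end: string };

// One or more non-word characters or underscores between letters.
const SEPARATOR = "[\\W_]+";

// Hypothetical escape helper (assumed character set).
const escapeSketch = (input: string): string =>
    input.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");

// Hypothetical stand-in for getCircumventionRegExp(...), for illustration only.
const circumventionRegExpSketch = (c: ComponentsSketch): RegExp => {
    // Dropping whitespace first ensures a space in a phrase yields a single
    // separator, the same one that sits between any two letters.
    const letters = c.word.replace(/\s+/g, "").split("");
    const body = letters.map(escapeSketch).join(SEPARATOR);
    return new RegExp(c.start + body + c.end, "g");
};

const kitty = circumventionRegExpSketch({ start: "\\b", word: "kitty", end: "\\b" });
console.log(kitty.source); // \bk[\W_]+i[\W_]+t[\W_]+t[\W_]+y\b
console.log("k.i,t;t~y".search(kitty) !== -1); // true
console.log("kitty".search(kitty) !== -1); // false: every separator needs at least one character
```

Since each separator requires at least one character, this pattern does not match the unobfuscated word; that case is left to the normal regular expression.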
/**
* Create a regular expression used for whitelisting that treats a single spaced-out character
* directly before or after a bad word as "breaking the pattern" of the circumvention regular
* expression, so that input such as:
* "h e l l"
* still gets blocked, but input such as
* "s h e l l"
* does not trigger on the word "hell".
*
* For an explanation on matchApostrophes, check the description of `preprocessWordLists(...)`,
* which covers the case of apostrophes matched vs. not matched at both the start and end of the word.
*
* @param {WordRegexComponents} badwordComponents - The bad word, split into components by getRegExpComponents(...)
* @param {boolean} atWordStart - Whether this regex whitelists the word with an additional letter at the start
* or whether it covers the case of an additional letter at the end.
* @param {boolean} matchApostrophes - Whether the regular expression treats apostrophes before and after the word differently.
* @returns The regular expression that can be used to find this word in a string, or undefined if the regular expression
* is irrelevant and should not be used.
*/
export declare const getCircumventionWhitelistRegExp: (badwordComponents: WordRegexComponents, atWordStart: boolean, matchApostrophes: boolean) => RegExp;
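To make the "s h e l l" case concrete, the sketch below shows one way such a whitelist pattern could be built. It is a simplified guess, not the library's pattern: `whitelistSketch` and its helpers are hypothetical, apostrophe handling (`matchApostrophes`) and the `undefined` return case are left out, and a plain `\b` boundary is assumed.

```typescript
// One or more non-word characters or underscores between letters.
const SEPARATOR = "[\\W_]+";
// A single letter or digit that stands on its own next to the spaced-out word.
const SINGLE_CHAR = "[^\\W_]";

// Hypothetical escape helper (assumed character set).
const escapeSketch = (input: string): string =>
    input.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");

// Letters of the word joined by separators, e.g. h[\W_]+e[\W_]+l[\W_]+l
const spacedOut = (word: string): string =>
    word.replace(/\s+/g, "").split("").map(escapeSketch).join(SEPARATOR);

// Hypothetical, simplified stand-in for getCircumventionWhitelistRegExp(...).
const whitelistSketch = (word: string, atWordStart: boolean): RegExp =>
    atWordStart
        ? new RegExp("\\b" + SINGLE_CHAR + SEPARATOR + spacedOut(word), "g")
        : new RegExp(spacedOut(word) + SEPARATOR + SINGLE_CHAR + "\\b", "g");

// The circumvention-style pattern for "hell" fires inside "s h e l l" ...
const circumvention = new RegExp("\\b" + spacedOut("hell") + "\\b", "g");
console.log("s h e l l".search(circumvention) !== -1); // true

// ... while the whitelist pattern recognises the extra leading letter,
// which a caller could use to discard that hit.
const atStart = whitelistSketch("hell", true);
console.log("s h e l l".search(atStart) !== -1); // true
console.log("h e l l".search(atStart) !== -1); // false
```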