deep-profanity-filter

A thorough profanity filter that considers the most common circumventions. Works with your custom list of blocked and whitelisted words and phrases. Identifies and/or replaces bad words. Works with *wildcards* at the *start and/or end* of words.
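To make the wildcard behaviour concrete, here is a minimal sketch of how a blocked word with an optional `*` at its start or end could be turned into a regular expression and then used to mask matches. It follows the behaviour described in the doc comments for `getRegExpComponents` and `getNormalRegExp`, but it is not the package's implementation. `escapeForRegex`, `splitWildcards`, `toNormalRegExp` and `censor` are hypothetical names, and masking with asterisks is an assumed replacement strategy.

```typescript
// Minimal sketch (NOT the package's source) of the wildcard handling described
// in the getRegExpComponents / getNormalRegExp doc comments.
type WordRegexComponents = { start: string; word: string; end: string };

// Escape RegExp metacharacters so they match literally.
function escapeForRegex(input: string): string {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Strip a leading/trailing '*'; a side without a wildcard gets a "\b" boundary.
function splitWildcards(badword: string): WordRegexComponents {
  const wildStart = badword.startsWith("*");
  const wildEnd = badword.endsWith("*");
  const word = badword.slice(wildStart ? 1 : 0, wildEnd ? -1 : undefined);
  return { start: wildStart ? "" : "\\b", word, end: wildEnd ? "" : "\\b" };
}

// Whitespace inside a phrase becomes "one or more non-word characters".
function toNormalRegExp(c: WordRegexComponents): RegExp {
  const body = c.word.trim().split(/\s+/).map(escapeForRegex).join("[\\W_]+");
  return new RegExp(c.start + body + c.end, "g");
}

// Hypothetical replacement step: mask every match with asterisks of equal length.
function censor(text: string, badwords: string[]): string {
  return badwords.reduce(
    (acc, bw) => acc.replace(toNormalRegExp(splitWildcards(bw)), (m) => "*".repeat(m.length)),
    text,
  );
}
```

For example, `censor("what the hell, hello kitty", ["hell*", "kitty"])` returns `"what the ****, ****o *****"`. The trailing wildcard in `hell*` drops the closing word boundary, so "hello" is partially masked as well. Like the documented expressions, the sketch uses only the `g` flag, so matching is case-sensitive.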

github.com/Zariem/deep-profanity-filter

```typescript
export type WordRegexComponents = {
    start: string;
    word: string;
    end: string;
};
/**
 * Escape a string so that we can build a `new RegExp(...)` with it and preserve
 * any special characters within the string, such as `. * + ? ^ $ { } ( ) [ ] \\ /`
 * and still match them properly. If you wish to match a singular backslash `\`
 * literally, make sure that in your badwordlist or whitelist, as well as in your
 * string that you are testing against, the backslash is escaped by replacing it
 * with `\\`.
 * @param inputString - The string you wish to escape for creating a regular expression.
 * @returns The escaped string that can be used in `new RegExp(...)`
 */
export declare const escapeStringForRegex: (inputString: string) => string;
/**
 * Splits up a word that has optional wildcards '*' at its start or end.
 * Removes the wildcards, and returns an empty string for start and end
 * if there was a wildcard, or a word-boundary string if there was none.
 * These components are then used to build regular expressions.
 *
 * @param {string} badword - The bad word to split into its components.
 * @returns An object with the components accessible as obj.start, obj.word and obj.end
 */
export declare const getRegExpComponents: (badword: string) => WordRegexComponents;
/**
 * Turn a bad word into a regular expression that checks if it is present
 * in the string, with a word boundary \b on each side that does not have a wildcard.
 *
 * The word "kitty" would result in the regular expression:
 * /\bkitty\b/g
 * The word "hell*" would result in the regular expression:
 * /\bhell/g
 * If the word is a phrase with whitespace, that whitespace is replaced with a regular
 * expression that represents one or more non-word characters.
 * The phrase "ban ananas" turns into:
 * /\bban[\W_]+ananas\b/g
 *
 * @param {WordRegexComponents} badwordComponents - The bad word, split into components by getRegExpComponents(...)
 * @returns The regular expression that can be used to find this word in a string.
 */
export declare const getNormalRegExp: (badwordComponents: WordRegexComponents) => RegExp;
/**
 * Turn a bad word into a regular expression that checks for non-word characters
 * interjected between all of the characters, but containing a word boundary
 * on each side that does not have a wildcard.
 *
 * The word "kitty" would result in: /\bk[\W_]+i[\W_]+t[\W_]+t[\W_]+y\b/g
 *
 * It checks for variations such as:
 * "k i t t y", "k-i-t-t-y", "k.i,t;t~y" (with a word boundary at each side)
 *
 * The word "hell*" would result in: /\bh[\W_]+e[\W_]+l[\W_]+l/g
 *
 * Phrases with whitespace
 * If the word is a phrase with whitespace, turn the whitespace into the same
 * regular expression that allows any non-word character, but make sure there is only
 * one of these non-word-character expressions at a space, as each one already allows
 * one or more characters (specified with the + at the end).
 * So, "ban ananas" turns into:
 * /\bb[\W_]+a[\W_]+n[\W_]+a[\W_]+n[\W_]+a[\W_]+n[\W_]+a[\W_]+s\b/g
 *
 * @param {WordRegexComponents} badwordComponents - The bad word, split into components by getRegExpComponents(...)
 * @returns The regular expression that can be used to find this word in a string.
 */
export declare const getCircumventionRegExp: (badwordComponents: WordRegexComponents) => RegExp;
/**
 * Create a regular expression used for whitelisting, that treats single spaced-out characters
 * in front of or after a bad word as "breaking the pattern" of the circumvention regular expression,
 * so that words such as:
 * "h e l l"
 * can still get blocked, but words such as
 * "s h e l l"
 * will make sure that the input doesn't trigger on the phrase "hell".
 *
 * For an explanation of matchApostrophes, check the description of `preprocessWordLists(...)`,
 * which covers the case of apostrophes matched vs. not matched at both the start and end of the word.
 *
 * @param {WordRegexComponents} badwordComponents - The bad word, split into components by getRegExpComponents(...)
 * @param {boolean} atWordStart - Whether this regex whitelists the word with an additional letter at the start
 * or whether it covers the case of an additional letter at the end.
 * @param {boolean} matchApostrophes - Whether the regular expression treats apostrophes before and after the word differently.
 * @returns The regular expression that can be used to find this word in a string, or undefined if the regular expression
 * is irrelevant and should not be used.
 */
export declare const getCircumventionWhitelistRegExp: (badwordComponents: WordRegexComponents, atWordStart: boolean, matchApostrophes: boolean) => RegExp;
```
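The spaced-out matching and the whitelist idea described in the declarations above can be sketched as follows. This is a minimal, hypothetical approximation rather than the package's code. The circumvention pattern reproduces the regular expressions shown in the `getCircumventionRegExp` comment. The whitelist pattern, however, is an assumption: it treats a single letter spaced out directly before or after the word as breaking the pattern. `matchApostrophes` is not modelled, and `toCircumventionRegExp`, `toWhitelistRegExp`, `matchRanges` and `isBlocked` are illustrative names.

```typescript
// Minimal sketch (NOT the package's source). Assumes the component shape
// documented for getRegExpComponents: start/end are "\\b" or "".
type WordRegexComponents = { start: string; word: string; end: string };

// One or more non-word characters; underscore counts as a separator too.
const SEP = "[\\W_]+";

// Escape a single character if it is a RegExp metacharacter.
function escapeChar(ch: string): string {
  return /[.*+?^${}()|[\]\\]/.test(ch) ? "\\" + ch : ch;
}

// Characters of the word with phrase whitespace removed (split per UTF-16 unit).
function letters(word: string): string[] {
  return word.replace(/\s+/g, "").split("").map(escapeChar);
}

// SEP between every pair of characters, so a phrase's space needs no extra SEP.
function toCircumventionRegExp(c: WordRegexComponents): RegExp {
  return new RegExp(c.start + letters(c.word).join(SEP) + c.end, "g");
}

// HYPOTHETICAL whitelist pattern: one extra single letter, spaced out directly
// before (atWordStart) or after the word. Apostrophe handling is not modelled.
function toWhitelistRegExp(c: WordRegexComponents, atWordStart: boolean): RegExp {
  const core = letters(c.word).join(SEP);
  const source = atWordStart
    ? "\\b[a-zA-Z0-9]" + SEP + core + c.end
    : c.start + core + SEP + "[a-zA-Z0-9]\\b";
  return new RegExp(source, "g");
}

// [from, to) ranges of all matches; `re` must carry the "g" flag.
function matchRanges(text: string, re: RegExp): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    ranges.push([m.index, m.index + m[0].length]);
    if (m[0].length === 0) re.lastIndex++; // avoid looping on empty matches
  }
  return ranges;
}

// A spaced-out hit counts unless some whitelist match fully contains it.
function isBlocked(text: string, c: WordRegexComponents): boolean {
  const allowed = matchRanges(text, toWhitelistRegExp(c, true)).concat(
    matchRanges(text, toWhitelistRegExp(c, false)),
  );
  return matchRanges(text, toCircumventionRegExp(c)).some(
    ([from, to]) => !allowed.some(([a, b]) => a <= from && to <= b),
  );
}
```

With `hell = { start: "\\b", word: "hell", end: "\\b" }`, `isBlocked("h e l l", hell)` is `true`. Neither `"s h e l l"` nor `"h e l l o"` is flagged, because a whitelist match fully covers the spaced-out hit. Range containment is only one possible way to combine the two patterns, and the package may resolve overlaps differently.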