# WordSensor v2.0.0 🚀

[![npm version](https://badge.fury.io/js/word-sensor.svg)](https://badge.fury.io/js/word-sensor)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![TypeScript](https://img.shields.io/badge/TypeScript-007ACC?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
[![Tests](https://img.shields.io/badge/Tests-36%20passed-brightgreen)](https://github.com/asruldev/word-sensor)

**WordSensor** is a powerful and flexible word filtering library for JavaScript/TypeScript. It helps you detect, replace, or remove forbidden words from text with advanced features like regex patterns, statistics, batch processing, and more.

## ✨ Features

- 🔍 **Advanced Detection**: Detect prohibited words with precise positioning
- 🚫 **Multiple Filtering Modes**: Replace, remove, or highlight forbidden words
- 🎭 **Smart Masking**: Full, partial, or smart masking options
- 📊 **Statistics & Analytics**: Track detections and get detailed insights
- 🔧 **Regex Support**: Use custom regex patterns for complex filtering
- 📦 **Batch Processing**: Process multiple texts efficiently
- 🎯 **Preset Filters**: Ready-to-use profanity, spam, and phishing filters
- 🔄 **Custom Replacers**: Create custom replacement functions
- 📈 **Real-time Monitoring**: Log and track all detections
- 🌐 **API Integration**: Load forbidden words from external APIs
- 📁 **File Support**: Import word lists from files
- ⚡ **High Performance**: Optimized for speed and memory efficiency
- 🎨 **Emoji Replacers**: Replace words with emojis
- 🔒 **Word Boundaries**: Configurable word boundary detection
- 📝 **TypeScript Support**: Full TypeScript definitions included

## 📦 Installation

```bash
npm install word-sensor
```

or

```bash
yarn add word-sensor
```

## 🚀 Quick Start

### Basic Usage

```typescript
import { WordSensor } from 'word-sensor';

// Create a sensor with forbidden words
const sensor = new WordSensor({
  words: ['badword', 'offensive', 'rude'],
  maskChar: '*',
  caseInsensitive: true,
  logDetections: true
});

// Filter text
const result = sensor.filter('This is a badword test.');
console.log(result); // "This is a ******* test."
```

### Using Preset Filters

```typescript
import { createProfanityFilter, createSpamFilter, createPhishingFilter } from 'word-sensor';

// Create specialized filters
const profanityFilter = createProfanityFilter();
const spamFilter = createSpamFilter();
const phishingFilter = createPhishingFilter();

// Use them
console.log(profanityFilter.filter('This is badword content.')); // "This is ******* content."
console.log(spamFilter.filter('Buy now! Free money!')); // "#### now! #### money!"
```

## 📚 API Reference

### WordSensor Class

#### Constructor

```typescript
new WordSensor(config?: WordSensorConfig)
```

**Configuration Options:**

- `words?: string[]` - Initial list of forbidden words
- `maskChar?: string` - Character used for masking (default: "*")
- `caseInsensitive?: boolean` - Case-insensitive matching (default: true)
- `logDetections?: boolean` - Enable detection logging (default: false)
- `enableRegex?: boolean` - Enable regex pattern support (default: false)
- `wordBoundary?: boolean` - Use word boundaries (default: true)
- `customReplacer?: (word: string, context: string) => string` - Custom replacement function

#### Core Methods

##### `filter(text: string, mode?: "replace" | "remove" | "highlight", maskType?: "full" | "partial" | "smart"): string`

Filter text with the specified mode and masking type.

```typescript
// Replace with full masking
sensor.filter('This is badword.'); // "This is *******."

// Remove forbidden words
sensor.filter('This is badword.', 'remove'); // "This is ."

// Highlight forbidden words
sensor.filter('This is badword.', 'highlight'); // "This is [FILTERED: badword]."

// Smart masking
sensor.filter('This is badword.', 'replace', 'smart'); // "This is b****d."
```

##### `detect(text: string): string[]`

Detect all forbidden words in the text.

```typescript
const detected = sensor.detect('This contains badword and offensive content.');
console.log(detected); // ["badword", "offensive"]
```

##### `detectWithPositions(text: string): Array<{word: string, start: number, end: number}>`

Detect forbidden words together with their positions.

```typescript
const positions = sensor.detectWithPositions('This badword is offensive.');
console.log(positions);
// [
//   { word: "badword", start: 5, end: 12 },
//   { word: "offensive", start: 16, end: 25 }
// ]
```

#### Word Management

```typescript
// Add words
sensor.addWord('newbadword', '###'); // With custom mask
sensor.addWords(['word1', 'word2']);

// Remove words
sensor.removeWord('badword');
sensor.removeWords(['word1', 'word2']);

// Check words
sensor.hasWord('badword'); // true/false
sensor.getWords(); // Get all forbidden words
sensor.clearWords(); // Clear all words
```

#### Regex Patterns

```typescript
// Enable regex support
const regexSensor = new WordSensor({ enableRegex: true });

// Add regex patterns
regexSensor.addRegexPattern('\\b\\w+@\\w+\\.\\w+\\b', '[EMAIL]');
regexSensor.addRegexPattern('\\b\\d{4}-\\d{4}-\\d{4}-\\d{4}\\b', '[CARD]');

// Filter with regex
const result = regexSensor.filter('Contact me at test@example.com');
console.log(result); // "Contact me at [EMAIL]"
```

#### Statistics & Monitoring

```typescript
// Get detection statistics
const stats = sensor.getStats();
console.log(stats);
// {
//   totalDetections: 5,
//   uniqueWords: ["badword", "offensive"],
//   detectionCounts: { "badword": 3, "offensive": 2 },
//   lastDetectionTime: Date
// }

// Get detection logs
const logs = sensor.getDetectionLogs();
console.log(logs); // ["badword", "offensive", "badword", ...]

// Reset statistics
sensor.resetStats();
```

#### Configuration Methods

```typescript
// Update configuration
sensor.setMaskChar('#');
sensor.setCaseInsensitive(false);
sensor.setLogDetections(true);
sensor.setCustomReplacer((word) => `[${word.toUpperCase()}]`);
```

#### Utility Methods

```typescript
// Check if text is clean
sensor.isClean('This is clean text.'); // true
sensor.isClean('This has badword.'); // false

// Get clean percentage
sensor.getCleanPercentage('This badword is offensive.'); // 50

// Sanitize text (quick filter)
sensor.sanitizeText('This is badword.'); // "This is *******."
```

### Utility Functions

#### Preset Filters

```typescript
import {
  createProfanityFilter,
  createSpamFilter,
  createPhishingFilter,
  PRESET_WORDS
} from 'word-sensor';

// Create specialized filters
const profanityFilter = createProfanityFilter('*');
const spamFilter = createSpamFilter('#');
const phishingFilter = createPhishingFilter('!');

// Access preset word lists
console.log(PRESET_WORDS.profanity);
console.log(PRESET_WORDS.spam);
console.log(PRESET_WORDS.phishing);
```

#### Batch Processing

```typescript
import { batchFilter, batchDetect, getBatchStats } from 'word-sensor';

const texts = [
  'This is bad.',
  'This is offensive.',
  'This is clean.'
];

// Batch filter
const filtered = batchFilter(texts, sensor);
console.log(filtered);
// ["This is ***.", "This is *********.", "This is clean."]

// Batch detect
const detected = batchDetect(texts, sensor);
console.log(detected);
// [
//   { text: "This is bad.", detected: ["bad"] },
//   { text: "This is offensive.", detected: ["offensive"] },
//   { text: "This is clean.", detected: [] }
// ]

// Get batch statistics
const stats = getBatchStats(texts, sensor);
console.log(stats);
// {
//   totalTexts: 3,
//   cleanTexts: 1,
//   dirtyTexts: 2,
//   totalDetections: 2,
//   averageCleanPercentage: 66.67
// }
```

#### Custom Replacers

```typescript
import { createCustomReplacer, createEmojiReplacer } from 'word-sensor';

// Create custom replacer
const customReplacer = createCustomReplacer({
  'bad': 'good',
  'offensive': 'appropriate',
  'rude': 'polite'
});

// Create emoji replacer
const emojiReplacer = createEmojiReplacer();

// Use with sensor
sensor.setCustomReplacer(customReplacer);
sensor.setCustomReplacer(emojiReplacer);
```

#### Regex Utilities

```typescript
import { validateRegexPattern, escapeRegexSpecialChars } from 'word-sensor';

// Validate regex pattern
validateRegexPattern('\\b\\w+\\b'); // true
validateRegexPattern('invalid['); // false

// Escape special characters
escapeRegexSpecialChars('test.com'); // "test\\.com"
escapeRegexSpecialChars('test*test'); // "test\\*test"
```

#### API Integration

```typescript
import { loadForbiddenWordsFromAPI, loadWordsFromFile } from 'word-sensor';

// Load from API
await loadForbiddenWordsFromAPI(
  'https://api.example.com/forbidden-words',
  'data.words',
  sensor
);

// Load from file (browser)
const fileInput = document.getElementById('file') as HTMLInputElement;
// `files` can be null, so use optional chaining before indexing
const file = fileInput.files?.[0];
if (file) {
  const words = await loadWordsFromFile(file);
  sensor.addWords(words);
}
```

## 🎯 Advanced Examples

### Content Moderation System

```typescript
import { WordSensor, createProfanityFilter, createSpamFilter } from 'word-sensor';

class ContentModerator {
  private profanityFilter: WordSensor;
  private spamFilter: WordSensor;
  private customFilter: WordSensor;

  constructor() {
    this.profanityFilter = createProfanityFilter();
    this.spamFilter = createSpamFilter();
    this.customFilter = new WordSensor({
      enableRegex: true,
      wordBoundary: false
    });

    // Add custom patterns
    this.customFilter.addRegexPattern('\\b\\w+@\\w+\\.\\w+\\b', '[EMAIL]');
    this.customFilter.addRegexPattern('\\b\\d{10,}\\b', '[PHONE]');
  }

  moderateContent(content: string): {
    isClean: boolean;
    filteredContent: string;
    violations: string[];
    stats: any;
  } {
    // Apply all filters
    let filteredContent = content;
    const violations: string[] = [];

    // Check profanity
    const profanityDetected = this.profanityFilter.detect(content);
    if (profanityDetected.length > 0) {
      violations.push('profanity');
      filteredContent = this.profanityFilter.filter(filteredContent);
    }

    // Check spam
    const spamDetected = this.spamFilter.detect(content);
    if (spamDetected.length > 0) {
      violations.push('spam');
      filteredContent = this.spamFilter.filter(filteredContent);
    }

    // Apply custom filters
    filteredContent = this.customFilter.filter(filteredContent);

    return {
      isClean: violations.length === 0,
      filteredContent,
      violations,
      stats: {
        profanity: this.profanityFilter.getStats(),
        spam: this.spamFilter.getStats(),
        custom: this.customFilter.getStats()
      }
    };
  }
}

// Usage
const moderator = new ContentModerator();
const result = moderator.moderateContent('This is badword spam content with test@example.com');
console.log(result);
```

### Real-time Chat Filter

```typescript
import { WordSensor, createEmojiReplacer } from 'word-sensor';

class ChatFilter {
  private sensor: WordSensor;
  private messageHistory: string[] = [];

  constructor() {
    this.sensor = new WordSensor({
      words: ['badword', 'offensive'],
      logDetections: true,
      customReplacer: createEmojiReplacer()
    });
  }

  processMessage(message: string, userId: string): {
    filteredMessage: string;
    isClean: boolean;
    warning: string | null;
  } {
    const filteredMessage = this.sensor.filter(message);
    const isClean = this.sensor.isClean(message);

    // Check user history
    const userViolations = this.messageHistory.filter(msg =>
      msg.includes(userId) && !this.sensor.isClean(msg)
    ).length;

    let warning: string | null = null;
    if (!isClean) {
      if (userViolations >= 3) {
        warning = 'You have been warned multiple times. Further violations may result in a ban.';
      } else {
        warning = 'Please keep the chat appropriate.';
      }
    }

    // Log message
    this.messageHistory.push(`${userId}: ${message}`);

    return { filteredMessage, isClean, warning };
  }

  getModerationStats() {
    return this.sensor.getStats();
  }
}
```

### Batch Content Analysis

```typescript
import { WordSensor, batchDetect, getBatchStats } from 'word-sensor';

class ContentAnalyzer {
  private sensor: WordSensor;

  constructor() {
    this.sensor = new WordSensor({
      words: ['inappropriate', 'spam', 'offensive'],
      logDetections: true
    });
  }

  analyzeBatch(contentList: string[]): {
    summary: any;
    details: Array<{
      content: string;
      isClean: boolean;
      detectedWords: string[];
      cleanPercentage: number;
    }>;
  } {
    const batchResults = batchDetect(contentList, this.sensor);
    const batchStats = getBatchStats(contentList, this.sensor);

    const details = contentList.map((content, index) => ({
      content,
      isClean: batchResults[index].detected.length === 0,
      detectedWords: batchResults[index].detected,
      cleanPercentage: this.sensor.getCleanPercentage(content)
    }));

    return {
      summary: {
        ...batchStats,
        sensorStats: this.sensor.getStats()
      },
      details
    };
  }
}
```

## 🧪 Testing

```bash
# Run tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage
```

## 📦 Build

```bash
# Build for production
npm run build

# Build in watch mode
npm run dev

# Clean build artifacts
npm run clean
```

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📄 License

This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.

## 👨‍💻 Author

Developed by [Asrul Harahap](https://github.com/asruldev).

- GitHub: [@asruldev](https://github.com/asruldev)
- Twitter: [@asruldev](https://twitter.com/asruldev)

## 🙏 Acknowledgments

- Thanks to all contributors who helped improve this library
- Inspired by the need for better content moderation tools
- Built with TypeScript for better developer experience

## 📈 Changelog

### v2.0.0

- ✨ **Major Release**: Complete rewrite with advanced features
- 🔧 **New Constructor**: Config-based initialization
- 📊 **Statistics**: Comprehensive detection tracking
- 🔍 **Regex Support**: Custom regex pattern filtering
- 📦 **Batch Processing**: Efficient multi-text processing
- 🎯 **Preset Filters**: Ready-to-use specialized filters
- 🎨 **Custom Replacers**: Flexible replacement functions
- 📈 **Position Detection**: Get exact word positions
- 🔄 **Smart Masking**: Intelligent masking algorithms
- 🌐 **API Integration**: External word list loading
- 📁 **File Support**: Import word lists from files
- 🎨 **Emoji Replacers**: Fun emoji-based replacements
- 📝 **Enhanced Types**: Better TypeScript support
- 🧪 **Comprehensive Tests**: 36 test cases covering all features

### v1.0.5

- 🐛 Bug fixes and improvements
- 📝 Better documentation

---

⭐ **Star this repository if you find it useful!**
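
For readers who want to see how the offsets from `detectWithPositions` and the figure from `getCleanPercentage` in the API reference could arise, here is a minimal stand-alone sketch that reproduces the two documented outputs without importing the library. The names `findForbidden`, `cleanPercentage`, and `escapeRegex` are hypothetical. Defining the percentage over whitespace-separated tokens is an assumption chosen to match the documented example, not WordSensor's confirmed implementation.

```typescript
// Hypothetical stand-alone sketch; this is NOT the word-sensor source code.
interface Match {
  word: string;
  start: number; // index of the first matched character
  end: number;   // index one past the last matched character
}

// Escape characters that have a special meaning inside a regular expression.
function escapeRegex(source: string): string {
  return source.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Find every forbidden word, bounded by \b word boundaries, with its offsets.
function findForbidden(text: string, words: string[], caseInsensitive = true): Match[] {
  const usable = words.filter((w) => w.length > 0); // empty words would match everywhere
  if (usable.length === 0) return [];
  const pattern = usable.map(escapeRegex).join("|");
  const regex = new RegExp(`\\b(?:${pattern})\\b`, caseInsensitive ? "gi" : "g");
  const matches: Match[] = [];
  let m: RegExpExecArray | null;
  while ((m = regex.exec(text)) !== null) {
    matches.push({ word: m[0], start: m.index, end: m.index + m[0].length });
  }
  return matches;
}

// Assumed definition: share of whitespace-separated tokens that are not forbidden.
function cleanPercentage(text: string, words: string[]): number {
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  if (tokens.length === 0) return 100;
  const hits = findForbidden(text, words).length;
  return ((tokens.length - hits) / tokens.length) * 100;
}

const sample = "This badword is offensive.";
console.log(findForbidden(sample, ["badword", "offensive"]));
// [ { word: "badword", start: 5, end: 12 }, { word: "offensive", start: 16, end: 25 } ]
console.log(cleanPercentage(sample, ["badword", "offensive"])); // 50
```

In this sketch `end` is exclusive, which is the convention the documented offsets suggest (`"badword"` is seven characters long and spans 5 to 12), so `text.slice(start, end)` returns the matched word.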