# WordSensor v2.0.0
**WordSensor** is a powerful and flexible word filtering library for JavaScript/TypeScript. It helps you detect, replace, or remove forbidden words from text with advanced features like regex patterns, statistics, batch processing, and more.
## Features
- **Advanced Detection**: Detect prohibited words with precise positioning
- **Multiple Filtering Modes**: Replace, remove, or highlight forbidden words
- **Smart Masking**: Full, partial, or smart masking options
- **Statistics & Analytics**: Track detections and get detailed insights
- **Regex Support**: Use custom regex patterns for complex filtering
- **Batch Processing**: Process multiple texts efficiently
- **Preset Filters**: Ready-to-use profanity, spam, and phishing filters
- **Custom Replacers**: Create custom replacement functions
- **Real-time Monitoring**: Log and track all detections
- **API Integration**: Load forbidden words from external APIs
- **File Support**: Import word lists from files
- **High Performance**: Optimized for speed and memory efficiency
- **Emoji Replacers**: Replace words with emojis
- **Word Boundaries**: Configurable word boundary detection
- **TypeScript Support**: Full TypeScript definitions included
## Installation
```bash
npm install word-sensor
```
or
```bash
yarn add word-sensor
```
## Quick Start
### Basic Usage
```typescript
import { WordSensor } from 'word-sensor';
// Create a sensor with forbidden words
const sensor = new WordSensor({
  words: ['badword', 'offensive', 'rude'],
  maskChar: '*',
  caseInsensitive: true,
  logDetections: true
});
// Filter text
const result = sensor.filter('This is a badword test.');
console.log(result); // "This is a ******* test."
```
### Using Preset Filters
```typescript
import { createProfanityFilter, createSpamFilter, createPhishingFilter } from 'word-sensor';
// Create specialized filters
const profanityFilter = createProfanityFilter();
const spamFilter = createSpamFilter();
const phishingFilter = createPhishingFilter();
// Use them (the exact output depends on each preset's word list)
console.log(profanityFilter.filter('This is badword content.')); // e.g. "This is ******* content." if "badword" is listed
console.log(spamFilter.filter('Buy now! Free money!')); // masks whichever spam terms are listed
```
## API Reference
### WordSensor Class
#### Constructor
```typescript
new WordSensor(config?: WordSensorConfig)
```
**Configuration Options:**
- `words?: string[]` - Initial list of forbidden words
- `maskChar?: string` - Character used for masking (default: "*")
- `caseInsensitive?: boolean` - Case-insensitive matching (default: true)
- `logDetections?: boolean` - Enable detection logging (default: false)
- `enableRegex?: boolean` - Enable regex pattern support (default: false)
- `wordBoundary?: boolean` - Use word boundaries (default: true)
- `customReplacer?: (word: string, context: string) => string` - Custom replacement function
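To make the interaction of `caseInsensitive` and `wordBoundary` concrete, here is a minimal sketch of how such options could be turned into a matcher. It is not WordSensor's actual implementation; `buildMatcher` and `MatcherOptions` are hypothetical names used only for illustration.

```typescript
// Hypothetical helper: NOT part of word-sensor. It only illustrates what
// the `caseInsensitive` and `wordBoundary` options conceptually control.
interface MatcherOptions {
  caseInsensitive: boolean;
  wordBoundary: boolean;
}

function buildMatcher(words: string[], options: MatcherOptions): RegExp {
  // Escape regex metacharacters so each word is matched literally.
  const escaped = words.map((w) => w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  const body = `(?:${escaped.join("|")})`;
  // With word boundaries, "bad" no longer matches inside "badge".
  const source = options.wordBoundary ? `\\b${body}\\b` : body;
  return new RegExp(source, options.caseInsensitive ? "gi" : "g");
}

const strict = buildMatcher(["bad"], { caseInsensitive: true, wordBoundary: true });
const loose = buildMatcher(["bad"], { caseInsensitive: true, wordBoundary: false });

const strictHits = "A BAD badge".match(strict) ?? [];
const looseHits = "A BAD badge".match(loose) ?? [];
console.log(strictHits); // ["BAD"]
console.log(looseHits); // ["BAD", "bad"]
```

Turning `wordBoundary` off is useful when forbidden terms may be embedded in longer strings, at the cost of more false positives.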
#### Core Methods
##### `filter(text: string, mode?: "replace" | "remove" | "highlight", maskType?: "full" | "partial" | "smart"): string`
Filter text with specified mode and masking type.
```typescript
// Replace with full masking
sensor.filter('This is badword.'); // "This is *******."
// Remove forbidden words
sensor.filter('This is badword.', 'remove'); // "This is ."
// Highlight forbidden words
sensor.filter('This is badword.', 'highlight'); // "This is [FILTERED: badword]."
// Smart masking (keeps the first and last character)
sensor.filter('This is badword.', 'replace', 'smart'); // "This is b*****d."
```
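The difference between masking types can be sketched with two standalone functions. This is an illustration under assumptions, not the library's code: `maskFull` and `maskSmart` are hypothetical names, and "smart" is assumed to mean "keep the first and last character". Partial masking is not described in this README, so it is left out.

```typescript
// Hypothetical helpers: a sketch of two masking strategies, not the
// library's own code. The library's exact output may differ.
function maskFull(word: string, maskChar = "*"): string {
  // Replace every character with the mask character.
  return maskChar.repeat(word.length);
}

function maskSmart(word: string, maskChar = "*"): string {
  // Very short words have no "middle", so mask them entirely.
  if (word.length <= 2) return maskFull(word, maskChar);
  return word[0] + maskChar.repeat(word.length - 2) + word[word.length - 1];
}

console.log(maskFull("badword")); // "*******"
console.log(maskSmart("badword")); // "b*****d"
console.log(maskSmart("ok", "#")); // "##"
```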
##### `detect(text: string): string[]`
Detect all forbidden words in text.
```typescript
const detected = sensor.detect('This contains badword and offensive content.');
console.log(detected); // ["badword", "offensive"]
```
##### `detectWithPositions(text: string): Array<{word: string, start: number, end: number}>`
Detect forbidden words with their positions.
```typescript
const positions = sensor.detectWithPositions('This badword is offensive.');
console.log(positions);
// [
//   { word: "badword", start: 5, end: 12 },
//   { word: "offensive", start: 16, end: 25 }
// ]
```
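Position data is handy when you want to annotate the original text instead of masking it. The sketch below consumes the shape shown above; `markRanges` is a hypothetical helper, and `end` is assumed to be exclusive, which matches the example offsets.

```typescript
// Shape taken from the `detectWithPositions` return type documented above.
interface DetectedPosition {
  word: string;
  start: number;
  end: number; // assumed exclusive, as in the example output
}

// Hypothetical helper: wraps each detected range in square brackets.
function markRanges(text: string, positions: DetectedPosition[]): string {
  // Work from the end of the string so earlier offsets stay valid.
  const sorted = [...positions].sort((a, b) => b.start - a.start);
  let out = text;
  for (const { start, end } of sorted) {
    out = out.slice(0, start) + "[" + out.slice(start, end) + "]" + out.slice(end);
  }
  return out;
}

const marked = markRanges("This badword is offensive.", [
  { word: "badword", start: 5, end: 12 },
  { word: "offensive", start: 16, end: 25 },
]);
console.log(marked); // "This [badword] is [offensive]."
```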
#### Word Management
```typescript
// Add words
sensor.addWord('newbadword', '###'); // With custom mask
sensor.addWords(['word1', 'word2']);
// Remove words
sensor.removeWord('badword');
sensor.removeWords(['word1', 'word2']);
// Check words
sensor.hasWord('badword'); // true/false
sensor.getWords(); // Get all forbidden words
sensor.clearWords(); // Clear all words
```
#### Regex Patterns
```typescript
// Enable regex support
const regexSensor = new WordSensor({ enableRegex: true });
// Add regex patterns
regexSensor.addRegexPattern('\\b\\w+@\\w+\\.\\w+\\b', '[EMAIL]');
regexSensor.addRegexPattern('\\b\\d{4}-\\d{4}-\\d{4}-\\d{4}\\b', '[CARD]');
// Filter with regex
const result = regexSensor.filter('Contact me at test@example.com');
console.log(result); // "Contact me at [EMAIL]"
```
#### Statistics & Monitoring
```typescript
// Get detection statistics
const stats = sensor.getStats();
console.log(stats);
// {
//   totalDetections: 5,
//   uniqueWords: ["badword", "offensive"],
//   detectionCounts: { "badword": 3, "offensive": 2 },
//   lastDetectionTime: Date
// }
// Get detection logs
const logs = sensor.getDetectionLogs();
console.log(logs); // ["badword", "offensive", "badword", ...]
// Reset statistics
sensor.resetStats();
```
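Once you have a `detectionCounts` object, ranking the most frequent terms takes a few lines. This is a sketch that only assumes the object shape shown above; `topOffenders` is a hypothetical helper, not a library function.

```typescript
// Hypothetical helper operating on the `detectionCounts` shape shown above.
function topOffenders(
  counts: Record<string, number>,
  limit: number
): Array<[string, number]> {
  return Object.entries(counts)
    .sort(([, a], [, b]) => b - a) // highest count first
    .slice(0, limit);
}

const ranked = topOffenders({ badword: 3, offensive: 2, rude: 5 }, 2);
console.log(ranked); // [["rude", 5], ["badword", 3]]
```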
#### Configuration Methods
```typescript
// Update configuration
sensor.setMaskChar('#');
sensor.setCaseInsensitive(false);
sensor.setLogDetections(true);
sensor.setCustomReplacer((word) => `[${word.toUpperCase()}]`);
```
#### Utility Methods
```typescript
// Check if text is clean
sensor.isClean('This is clean text.'); // true
sensor.isClean('This has badword.'); // false
// Get clean percentage
sensor.getCleanPercentage('This badword is offensive.'); // 50
// Sanitize text (quick filter)
sensor.sanitizeText('This is badword.'); // "This is *******."
```
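The `50` in the example above is consistent with a simple "clean words divided by total words" ratio. The sketch below reproduces that arithmetic; it is an assumption about how the value could be computed, `cleanPercentage` is a hypothetical function, and the tokenizer is simplified to ASCII letters, digits, and apostrophes.

```typescript
// Hypothetical re-implementation for illustration only.
function cleanPercentage(text: string, forbidden: string[]): number {
  // Simplified tokenizer: lowercase ASCII words only.
  const tokens = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  if (tokens.length === 0) return 100; // nothing to flag
  const blocked = new Set(forbidden.map((w) => w.toLowerCase()));
  const clean = tokens.filter((t) => !blocked.has(t)).length;
  return (clean / tokens.length) * 100;
}

const pct = cleanPercentage("This badword is offensive.", ["badword", "offensive"]);
console.log(pct); // 50 (2 clean words out of 4)
```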
### Utility Functions
#### Preset Filters
```typescript
import {
  createProfanityFilter,
  createSpamFilter,
  createPhishingFilter,
  PRESET_WORDS
} from 'word-sensor';
// Create specialized filters
const profanityFilter = createProfanityFilter('*');
const spamFilter = createSpamFilter('#');
const phishingFilter = createPhishingFilter('!');
// Access preset word lists
console.log(PRESET_WORDS.profanity);
console.log(PRESET_WORDS.spam);
console.log(PRESET_WORDS.phishing);
```
#### Batch Processing
```typescript
import { batchFilter, batchDetect, getBatchStats } from 'word-sensor';
// `sensor` is assumed to have 'bad' and 'offensive' in its word list
const texts = [
  'This is bad.',
  'This is offensive.',
  'This is clean.'
];
// Batch filter
const filtered = batchFilter(texts, sensor);
console.log(filtered);
// ["This is ***.", "This is *********.", "This is clean."]
// Batch detect
const detected = batchDetect(texts, sensor);
console.log(detected);
// [
//   { text: "This is bad.", detected: ["bad"] },
//   { text: "This is offensive.", detected: ["offensive"] },
//   { text: "This is clean.", detected: [] }
// ]
// Get batch statistics
const stats = getBatchStats(texts, sensor);
console.log(stats);
// {
//   totalTexts: 3,
//   cleanTexts: 1,
//   dirtyTexts: 2,
//   totalDetections: 2,
//   averageCleanPercentage: 66.67
// }
```
#### Custom Replacers
```typescript
import { createCustomReplacer, createEmojiReplacer } from 'word-sensor';
// Create custom replacer
const customReplacer = createCustomReplacer({
  'bad': 'good',
  'offensive': 'appropriate',
  'rude': 'polite'
});
// Create emoji replacer
const emojiReplacer = createEmojiReplacer();
// Use with sensor (one replacer at a time; setting a new one is expected to replace the previous one)
sensor.setCustomReplacer(customReplacer);
sensor.setCustomReplacer(emojiReplacer);
```
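If the built-in factories do not fit, you can write a replacer by hand against the `(word, context) => string` signature from the configuration options. The sketch below is one possible shape, not the library's implementation; `mappingReplacer` and its `fallback` parameter are hypothetical.

```typescript
// Signature taken from the `customReplacer` config option above.
type Replacer = (word: string, context: string) => string;

// Hypothetical factory: maps known words to substitutes and uses a
// fallback string for everything else. Lookup is case-insensitive.
function mappingReplacer(map: Record<string, string>, fallback: string): Replacer {
  const lookup = new Map(
    Object.entries(map).map(([key, value]) => [key.toLowerCase(), value] as const)
  );
  return (word) => lookup.get(word.toLowerCase()) ?? fallback;
}

const replacer = mappingReplacer({ bad: "good", rude: "polite" }, "[removed]");
console.log(replacer("Bad", "That was Bad.")); // "good"
console.log(replacer("offensive", "How offensive.")); // "[removed]"
```

A function like this could then be passed to `sensor.setCustomReplacer(...)`.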
#### Regex Utilities
```typescript
import { validateRegexPattern, escapeRegexSpecialChars } from 'word-sensor';
// Validate regex pattern
validateRegexPattern('\\b\\w+\\b'); // true
validateRegexPattern('invalid['); // false
// Escape special characters
escapeRegexSpecialChars('test.com'); // "test\\.com"
escapeRegexSpecialChars('test*test'); // "test\\*test"
```
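When patterns come from user input or a config file, it helps to check all of them before registering any. The sketch below does this with the built-in `RegExp` constructor rather than the library's `validateRegexPattern`; `partitionPatterns` is a hypothetical helper.

```typescript
// Hypothetical helper, not part of word-sensor: splits candidate patterns
// into those the RegExp constructor accepts and those it rejects.
function partitionPatterns(patterns: string[]): { valid: string[]; invalid: string[] } {
  const valid: string[] = [];
  const invalid: string[] = [];
  for (const pattern of patterns) {
    try {
      new RegExp(pattern); // throws SyntaxError on malformed input
      valid.push(pattern);
    } catch {
      invalid.push(pattern);
    }
  }
  return { valid, invalid };
}

const checked = partitionPatterns(["\\b\\w+\\b", "invalid[", "\\d{4}"]);
console.log(checked.valid.length); // 2
console.log(checked.invalid); // ["invalid["]
```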
#### API Integration
```typescript
import { loadForbiddenWordsFromAPI, loadWordsFromFile } from 'word-sensor';
// Load from API
await loadForbiddenWordsFromAPI(
  'https://api.example.com/forbidden-words',
  'data.words',
  sensor
);
// Load from file (browser)
const fileInput = document.getElementById('file') as HTMLInputElement;
const file = fileInput.files?.[0]; // `files` can be null, so use optional chaining
if (file) {
  const words = await loadWordsFromFile(file);
  sensor.addWords(words);
}
```
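The second argument in the API example (`'data.words'`) looks like a dot-separated path into the JSON response. Assuming that reading is right, the lookup could work roughly as follows; `getByPath` and `extractWords` are hypothetical helpers and not the library's code.

```typescript
// Hypothetical helper: resolves a dot-separated path such as "data.words"
// against a parsed JSON value, returning undefined if any step is missing.
function getByPath(source: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>((node, key) => {
    if (node !== null && typeof node === "object" && key in node) {
      return (node as Record<string, unknown>)[key];
    }
    return undefined;
  }, source);
}

// Keeps only string entries so malformed payloads cannot pollute the list.
function extractWords(payload: unknown, path: string): string[] {
  const value = getByPath(payload, path);
  if (!Array.isArray(value)) return [];
  return value.filter((item): item is string => typeof item === "string");
}

const payload = { data: { words: ["badword", 42, "offensive"] } };
console.log(extractWords(payload, "data.words")); // ["badword", "offensive"]
console.log(extractWords(payload, "data.missing")); // []
```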
## Advanced Examples
### Content Moderation System
```typescript
import { WordSensor, createProfanityFilter, createSpamFilter } from 'word-sensor';
class ContentModerator {
  private profanityFilter: WordSensor;
  private spamFilter: WordSensor;
  private customFilter: WordSensor;
  constructor() {
    this.profanityFilter = createProfanityFilter();
    this.spamFilter = createSpamFilter();
    this.customFilter = new WordSensor({
      enableRegex: true,
      wordBoundary: false
    });
    // Add custom patterns
    this.customFilter.addRegexPattern('\\b\\w+@\\w+\\.\\w+\\b', '[EMAIL]');
    this.customFilter.addRegexPattern('\\b\\d{10,}\\b', '[PHONE]');
  }
  moderateContent(content: string): {
    isClean: boolean;
    filteredContent: string;
    violations: string[];
    stats: any;
  } {
    // Apply all filters
    let filteredContent = content;
    const violations: string[] = [];
    // Check profanity
    const profanityDetected = this.profanityFilter.detect(content);
    if (profanityDetected.length > 0) {
      violations.push('profanity');
      filteredContent = this.profanityFilter.filter(filteredContent);
    }
    // Check spam
    const spamDetected = this.spamFilter.detect(content);
    if (spamDetected.length > 0) {
      violations.push('spam');
      filteredContent = this.spamFilter.filter(filteredContent);
    }
    // Apply custom filters
    filteredContent = this.customFilter.filter(filteredContent);
    return {
      isClean: violations.length === 0,
      filteredContent,
      violations,
      stats: {
        profanity: this.profanityFilter.getStats(),
        spam: this.spamFilter.getStats(),
        custom: this.customFilter.getStats()
      }
    };
  }
}
// Usage
const moderator = new ContentModerator();
const result = moderator.moderateContent('This is badword spam content with test@example.com');
console.log(result);
```
### Real-time Chat Filter
```typescript
import { WordSensor, createEmojiReplacer } from 'word-sensor';
class ChatFilter {
  private sensor: WordSensor;
  private messageHistory: string[] = [];
  constructor() {
    this.sensor = new WordSensor({
      words: ['badword', 'offensive'],
      logDetections: true,
      customReplacer: createEmojiReplacer()
    });
  }
  processMessage(message: string, userId: string): {
    filteredMessage: string;
    isClean: boolean;
    warning: string | null;
  } {
    const filteredMessage = this.sensor.filter(message);
    const isClean = this.sensor.isClean(message);
    // Count this user's earlier violations (history entries are "userId: message")
    const userViolations = this.messageHistory.filter(msg =>
      msg.startsWith(`${userId}: `) && !this.sensor.isClean(msg)
    ).length;
    let warning: string | null = null;
    if (!isClean) {
      if (userViolations >= 3) {
        warning = 'You have been warned multiple times. Further violations may result in a ban.';
      } else {
        warning = 'Please keep the chat appropriate.';
      }
    }
    // Log message
    this.messageHistory.push(`${userId}: ${message}`);
    return { filteredMessage, isClean, warning };
  }
  getModerationStats() {
    return this.sensor.getStats();
  }
}
```
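Re-scanning the whole message history on every message grows more expensive as the chat gets longer. One alternative, sketched here as a design suggestion and not as part of the library, is to keep a per-user counter in a `Map`; `ViolationTracker` is a hypothetical class.

```typescript
// Hypothetical helper: tracks violation counts per user in constant time,
// instead of re-filtering the full message history on each message.
class ViolationTracker {
  private counts = new Map<string, number>();

  // Records one violation and returns the user's updated total.
  record(userId: string): number {
    const next = (this.counts.get(userId) ?? 0) + 1;
    this.counts.set(userId, next);
    return next;
  }

  // Returns the number of violations recorded so far (0 if none).
  get(userId: string): number {
    return this.counts.get(userId) ?? 0;
  }
}

const tracker = new ViolationTracker();
tracker.record("alice");
tracker.record("alice");
console.log(tracker.get("alice")); // 2
console.log(tracker.get("bob")); // 0
```

In the chat filter above, `record(userId)` would be called whenever `isClean` is false, and the returned total compared against the warning threshold.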
### Batch Content Analysis
```typescript
import { WordSensor, batchDetect, getBatchStats } from 'word-sensor';
class ContentAnalyzer {
  private sensor: WordSensor;
  constructor() {
    this.sensor = new WordSensor({
      words: ['inappropriate', 'spam', 'offensive'],
      logDetections: true
    });
  }
  analyzeBatch(contentList: string[]): {
    summary: any;
    details: Array<{
      content: string;
      isClean: boolean;
      detectedWords: string[];
      cleanPercentage: number;
    }>;
  } {
    const batchResults = batchDetect(contentList, this.sensor);
    const batchStats = getBatchStats(contentList, this.sensor);
    const details = contentList.map((content, index) => ({
      content,
      isClean: batchResults[index].detected.length === 0,
      detectedWords: batchResults[index].detected,
      cleanPercentage: this.sensor.getCleanPercentage(content)
    }));
    return {
      summary: {
        ...batchStats,
        sensorStats: this.sensor.getStats()
      },
      details
    };
  }
}
```
## Testing
```bash
# Run tests
npm test
# Run tests in watch mode
npm run test:watch
# Run tests with coverage
npm run test:coverage
```
## Build
```bash
# Build for production
npm run build
# Build in watch mode
npm run dev
# Clean build artifacts
npm run clean
```
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.
## Author
Developed by [Asrul Harahap](https://github.com/asruldev).
- GitHub: [@asruldev](https://github.com/asruldev)
- Twitter: [@asruldev](https://twitter.com/asruldev)
## Acknowledgments
- Thanks to all contributors who helped improve this library
- Inspired by the need for better content moderation tools
- Built with TypeScript for better developer experience
## Changelog
### v2.0.0
- **Major Release**: Complete rewrite with advanced features
- **New Constructor**: Config-based initialization
- **Statistics**: Comprehensive detection tracking
- **Regex Support**: Custom regex pattern filtering
- **Batch Processing**: Efficient multi-text processing
- **Preset Filters**: Ready-to-use specialized filters
- **Custom Replacers**: Flexible replacement functions
- **Position Detection**: Get exact word positions
- **Smart Masking**: Intelligent masking algorithms
- **API Integration**: External word list loading
- **File Support**: Import word lists from files
- **Emoji Replacers**: Fun emoji-based replacements
- **Enhanced Types**: Better TypeScript support
- **Comprehensive Tests**: 36 test cases covering all features
### v1.0.5
- Bug fixes and improvements
- Better documentation
---
**Star this repository if you find it useful!**