# Turkish Profanity Filter

[![npm version](https://img.shields.io/npm/v/turkish-profanity-filter.svg)](https://www.npmjs.com/package/turkish-profanity-filter)
[![npm downloads](https://img.shields.io/npm/dm/turkish-profanity-filter.svg)](https://www.npmjs.com/package/turkish-profanity-filter)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Test Status](https://github.com/derdogant/turkish-profanity-filter/workflows/Tests/badge.svg)](https://github.com/derdogant/turkish-profanity-filter/actions)

A configurable library for detecting and censoring Turkish profanity in text content. This package properly handles Turkish special characters and provides flexible options for content moderation.

## Features

- ✅ Built-in Turkish profanity word list
- ✅ Proper handling of Turkish special characters (ğ, ü, ş, ı, ö, ç)
- ✅ Configurable word matching (whole words or partial)
- ✅ Case-sensitive or case-insensitive filtering
- ✅ Customizable replacement text
- ✅ Methods for detection, extraction, and censoring
- ✅ Dynamic word list management (add/remove words)

## Installation

```bash
# Using npm
npm install turkish-profanity-filter

# Using yarn
yarn add turkish-profanity-filter

# Using pnpm
pnpm add turkish-profanity-filter
```

## Quick Start

### CommonJS (Traditional Node.js)

```javascript
const TurkishProfanityFilter = require('turkish-profanity-filter');

// Create a new filter instance with default settings
const filter = new TurkishProfanityFilter();

// Check if text contains profanity
const hasProfanity = filter.check('Bu cümlede kötü bir kelime var mı?');
console.log('Contains profanity:', hasProfanity);

// Censor profanity in text
const censored = filter.censor('Bu cümlede kötü kelime sansürlenecek.');
console.log('Censored text:', censored);

// Get all profanity words found in text
const badWords = filter.getWords('Burada birkaç kötü kelime olabilir.');
console.log('Found profanity words:', badWords);
```

### ES6 Modules

```javascript
import TurkishProfanityFilter from 'turkish-profanity-filter';

// Create a new filter instance
const filter = new TurkishProfanityFilter();

// An async wrapper, e.g. for an async chat pipeline
// (the filter calls themselves are synchronous)
const processChatMessage = async (message) => {
  // Check if message contains inappropriate content
  if (filter.check(message)) {
    // Censor the message
    const cleanMessage = filter.censor(message);
    return {
      original: message,
      censored: cleanMessage,
      containsProfanity: true
    };
  }

  return {
    original: message,
    censored: message,
    containsProfanity: false
  };
};

// Sample messages
const messages = [
  'Merhaba nasılsın?',
  'Bu kötü bir mesaj.',
  'Güzel bir gün!'
];

// Using array methods with the filter
const processedMessages = messages.map(msg => ({
  text: msg,
  isProfane: filter.check(msg),
  censored: filter.check(msg) ? filter.censor(msg) : msg
}));

console.log(processedMessages);
```

## Advanced Usage

### Custom Configuration

```javascript
// CommonJS
const TurkishProfanityFilter = require('turkish-profanity-filter');
// ES6
// import TurkishProfanityFilter from 'turkish-profanity-filter';

// Initialize with custom options
const filter = new TurkishProfanityFilter({
  // Use your own word list instead of the default
  wordList: ['kötü', 'çirkin', 'küfür'],

  // Match only whole words (default: true)
  wholeWords: true,

  // Make matching case-sensitive (default: false)
  caseSensitive: false,

  // Custom replacement string (default: '***')
  replacement: '[sansürlendi]'
});
```

### Modifying Word List Dynamically

```javascript
// ES6 import
import TurkishProfanityFilter from 'turkish-profanity-filter';

const filter = new TurkishProfanityFilter();

// Add single word to the filter
filter.addWords('yeni-kötü-kelime');

// Add multiple words at once by passing an array
const newBadWords = ['kelime1', 'kelime2', 'kelime3'];
filter.addWords(newBadWords);

// Remove a word from the filter
filter.removeWords('artık-kötü-değil');

// Remove multiple words at once
const wordsToRemove = ['temiz1', 'temiz2'];
filter.removeWords(wordsToRemove);

// Using the filter with ES6 template literals
const userName = 'Ahmet';
const userMessage = 'Bu bir kötü mesajdır';
const processedMessage = filter.check(userMessage)
  ? `${userName}: ${filter.censor(userMessage)}`
  : `${userName}: ${userMessage}`;

console.log(processedMessage); // "Ahmet: Bu bir *** mesajdır"
```

### Integration with Express.js (ES6)

```javascript
import express from 'express';
import TurkishProfanityFilter from 'turkish-profanity-filter';

const app = express();
const filter = new TurkishProfanityFilter();

app.use(express.json());

// Middleware to filter profanity in request body using arrow functions
app.use((req, res, next) => {
  if (req.body?.content) { // Optional chaining operator (?.) - ES2020
    // Check if content contains profanity
    if (filter.check(req.body.content)) {
      // Either reject the request
      // return res.status(400).json({ error: 'Content contains inappropriate language' });

      // Or censor the content
      req.body.content = filter.censor(req.body.content);
    }
  }
  next();
});

// Using arrow functions
app.post('/comments', (req, res) => {
  // req.body.content is now free of profanity
  // Save to database, etc.
  res.json({ success: true });
});

// Async route handler that reports the filter configuration
app.get('/filter-stats', async (req, res) => {
  try {
    const stats = {
      wordListSize: filter.options.wordList.length,
      configuration: {
        caseSensitive: filter.options.caseSensitive,
        wholeWords: filter.options.wholeWords
      }
    };
    res.json(stats);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});
```

## Special Handling for Turkish Characters

This library is specifically designed to work with Turkish text and properly handles Turkish special characters (ğ, ü, ş, ı, ö, ç).
The implementation uses a custom approach for word boundary detection that works better with non-ASCII characters than JavaScript's standard `\b` word boundary, which treats only ASCII letters, digits, and underscores as word characters.

```javascript
// ES6 example with Turkish characters
import TurkishProfanityFilter from 'turkish-profanity-filter';

const filter = new TurkishProfanityFilter();

// These will all be properly detected (with default case-insensitive setting)
console.log(filter.check('kötü'));    // true
console.log(filter.check('KÖTÜ'));    // true
console.log(filter.check('Çirkin'));  // true
console.log(filter.check('KÜFÜR'));   // true

// With whole word matching (default)
console.log(filter.check('kötülük')); // false - only matches whole words

// Sample texts mixing clean and profane entries
const textSamples = [
  'Güzel bir gün',
  'kötü bir söz',
  'ÇIRKIN davranış',
  'normal yazı'
];

// Using filter with array methods
const results = textSamples
  .filter(text => filter.check(text))
  .map(text => ({
    original: text,
    censored: filter.censor(text)
  }));

console.log(results);
// [
//   { original: 'kötü bir söz', censored: '*** bir söz' },
//   { original: 'ÇIRKIN davranış', censored: '*** davranış' }
// ]
```

## API Reference

### Constructor

`new TurkishProfanityFilter(options)`

Creates a new filter instance with optional configuration.
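The `wholeWords` and `caseSensitive` options are easier to reason about with a concrete matcher in hand. The sketch below is one possible way to build such a matcher with Unicode property escapes and lookarounds. It is an illustration under stated assumptions, not the library's actual code: `escapeRegExp` and `buildMatcher` are hypothetical helper names, and only the two option names come from this README.

```javascript
// Hypothetical sketch: NOT the library's actual implementation.

// Escape regex metacharacters so list entries are matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Build a matcher for a single word. The option names mirror the documented
// `wholeWords` and `caseSensitive`; everything else is an assumption.
function buildMatcher(word, { wholeWords = true, caseSensitive = false } = {}) {
  const core = escapeRegExp(word);
  // \p{L} and \p{N} match letters and digits in any script, so ö, ü, ç, ş, ğ, ı
  // count as part of a word (ASCII-only \b does not treat them that way).
  const source = wholeWords
    ? `(?<![\\p{L}\\p{N}])${core}(?![\\p{L}\\p{N}])`
    : core;
  return new RegExp(source, caseSensitive ? 'u' : 'iu');
}

// ASCII-based \b misfires on Turkish text:
console.log(/\bkötü\b/.test('kötülük'));  // true  (false positive: 'ü' ends the "word")
console.log(/\bçirkin\b/.test('çirkin')); // false (false negative: no boundary before 'ç')

// The Unicode-aware matcher follows the option semantics:
console.log(buildMatcher('kötü').test('kötülük'));                        // false
console.log(buildMatcher('kötü', { wholeWords: false }).test('kötülük')); // true
console.log(buildMatcher('kötü').test('Bu KÖTÜ bir söz'));                // true
console.log(buildMatcher('kötü', { caseSensitive: true }).test('KÖTÜ'));  // false
```

The lookbehind and lookahead replace `\b` on both sides of the word, so a match is rejected whenever a letter or digit from any script touches it. Turkish-specific casing of dotted İ and dotless ı is deliberately left out of this sketch.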
**Options:**

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `wordList` | Array | Built-in list | Array of profanity words to detect |
| `wholeWords` | Boolean | `true` | Whether to match only whole words |
| `caseSensitive` | Boolean | `false` | Whether to match case-sensitively |
| `replacement` | String | `'***'` | String to replace profanity with |

### Methods

| Method | Parameters | Return Type | Description |
|--------|------------|-------------|-------------|
| `check(text)` | String | Boolean | Returns `true` if text contains profanity |
| `censor(text)` | String | String | Returns text with profanity replaced by the replacement string |
| `getWords(text)` | String | Array | Returns an array of all profanity words found in text |
| `addWords(words)` | String or Array | void | Add one word (string) or multiple words (array) to the filter |
| `removeWords(words)` | String or Array | void | Remove one word (string) or multiple words (array) from the filter |

## Performance Considerations

For optimal performance, especially with large texts or high-traffic applications:

1. **Cache Results**: If you check the same text repeatedly, cache the results
2. **Batch Processing**: When processing large volumes of text, consider batching
3. **Word List Size**: Larger word lists slow down matching; keep the list as small as your use case allows

## ES6 Performance Example

```javascript
import TurkishProfanityFilter from 'turkish-profanity-filter';

// Create an in-memory cache using Map
// (unbounded here; add eviction for long-running processes)
const cache = new Map();

const checkWithCache = (filter, text) => {
  // Return cached result if available
  if (cache.has(text)) {
    return cache.get(text);
  }

  // Calculate result and store in cache
  const result = filter.check(text);
  cache.set(text, result);
  return result;
};

// Batch processing example
const batchProcess = (filter, textArray) => {
  // Promise.all gathers the results into one promise; the filter calls are
  // synchronous, so this does not run them in parallel
  return Promise.all(
    textArray.map(async (text) => {
      // Process each text item
      return {
        original: text,
        containsProfanity: filter.check(text),
        censored: filter.check(text) ? filter.censor(text) : text
      };
    })
  );
};

// Usage
const filter = new TurkishProfanityFilter();
const messages = ['Merhaba', 'kötü kelime', 'Nasılsın?'];

// Process messages in batch
batchProcess(filter, messages)
  .then(results => console.log(results))
  .catch(error => console.error(error));
```

## Contributing

Contributions are welcome! Here's how you can help:

1. **Fork the repository**
2. **Create a feature branch**: `git checkout -b feature/amazing-feature`
3. **Make your changes**
4. **Run tests**: `npm test`
5. **Commit your changes**: `git commit -m 'Add amazing feature'`
6. **Push to your branch**: `git push origin feature/amazing-feature`
7. **Open a Pull Request**

### Contribution Guidelines

- Ensure all tests pass before submitting a PR
- Add tests for new features
- Follow the existing code style
- Update documentation for any changes
- Keep pull requests focused on a single feature/fix

## Word List Contributions

When contributing to the word list:

- Submit additions/removals as separate PRs
- Include reasoning for additions/removals
- Be mindful of cultural context and sensitivity

## Development

```bash
# Clone the repository
git clone https://github.com/derdogant/turkish-profanity-filter.git
cd turkish-profanity-filter

# Install dependencies
npm install

# Run tests
npm test

# Run the debug script
npm run debug
```

## License

MIT