UNPKG

@ahmed5938/sorani-helper

Version:

TypeScript library for processing Central Kurdish (Sorani) text: Arabic-to-Kurdish conversion, keyboard layout mapping, validation, and input handling.

591 lines (439 loc) 16.5 kB
# sorani-helper > TypeScript library for processing Central Kurdish (Sorani) text with automatic conversions, validation, and real-time input handling. [![TypeScript](https://img.shields.io/badge/TypeScript-5.5-blue.svg)](https://www.typescriptlang.org/) [![Bun](https://img.shields.io/badge/Bun-1.x-orange.svg)](https://bun.sh/) [![Tests](https://img.shields.io/badge/tests-174%20passing-brightgreen.svg)](./TEST_COVERAGE.md) ## Features - ✅ **Arabic to Kurdish conversion** - Automatically converts Arabic variants to standard Kurdish letters (KRG approved) - ✅ **English keyboard layout conversion** - Type in English, get Kurdish (QWERTY → Kurdish) - ✅ **Text validation** - Validate Kurdish text with customizable rules - ✅ **Real-time input filtering** - Process user input on-the-fly in web forms - ✅ **Flexible options** - Control digits, punctuation, emojis, spaces, and more - ✅ **33-letter alphabet** - Based on official Central Kurdish (Sorani) standard - ✅ **TypeScript native** - Full type safety and IntelliSense support - ✅ **Zero dependencies** - Lightweight and fast ## Table of Contents - [Installation](#installation) - [Quick Start](#quick-start) - [Core Concepts](#core-concepts) - [API Reference](#api-reference) - [Usage Examples](#usage-examples) - [Options](#options) - [Character Mappings](#character-mappings) - [Development](#development) - [License](#license) ## Installation ```bash # Using npm npm install sorani-helper # Using yarn yarn add sorani-helper # Using pnpm pnpm add sorani-helper # Using bun bun add sorani-helper ``` ## Quick Start ```typescript import { KurdishProcessor, createKurdishKeyboardProcessor } from 'sorani-helper'; // Example 1: Convert Arabic text to Kurdish standard const processor = new KurdishProcessor({ autoConvertArabic: true }); console.log(processor.process('كەركوك')); // → "کەرکوک" // Example 2: English keyboard to Kurdish (type "slaw", get "سڵاو") const keyboard = createKurdishKeyboardProcessor(); console.log(keyboard.process('slaw')); // → "سڵاو" // Example 3: Validate Kurdish text const result = processor.validate('سڵاو، چۆنی؟'); console.log(result.isValid); // → true ``` ## Core Concepts ### The Kurdish Processor `KurdishProcessor` is the main class that handles all text processing operations: 1. **Conversion** - Transform text from Arabic or English layouts to Kurdish 2. **Filtering** - Remove unwanted characters based on your rules 3. **Validation** - Check if text meets Kurdish text standards ### Processing Pipeline When you call `processor.process(text)`, the text goes through these steps: ``` Input Text1. English Layout Conversion (if enabled) ↓ 2. Arabic Variant Conversion (if enabled) ↓ 3. Character Filtering (based on options) ↓ Output Text ``` ## API Reference ### `KurdishProcessor` Main class for processing Kurdish text. #### Constructor ```typescript new KurdishProcessor(options?: KurdishProcessorOptions) ``` #### Methods ##### `process(input: string): string` Process text according to configured options. ```typescript const processor = new KurdishProcessor({ autoConvertArabic: true }); const result = processor.process('كەیوان'); //"کەیوان" ``` ##### `validate(input: string): ValidationResult` Validate text and return detailed results. ```typescript const result = processor.validate('hello'); // { // isValid: false, // errors: ['Contains non-Kurdish script characters'], // converted: undefined // } ``` ##### `updateOptions(newOptions: Partial<KurdishProcessorOptions>): void` Update processor options dynamically. ```typescript processor.updateOptions({ allowEmojis: true }); ``` ### Utility Functions #### `createStrictKurdishValidator()` Creates a processor with strict validation settings (for forms, official documents). ```typescript import { createStrictKurdishValidator } from 'sorani-helper'; const validator = createStrictKurdishValidator(); // - Allows: Kurdish letters, digits, punctuation, spaces // - Blocks: Emojis, English letters, Arabic variants (strict) // - Auto-converts: Arabic variants ``` #### `createKurdishKeyboardProcessor()` Creates a processor for keyboard input (for chat apps, social media). ```typescript import { createKurdishKeyboardProcessor } from 'sorani-helper'; const keyboard = createKurdishKeyboardProcessor(); // - Allows: Everything (emojis, digits, punctuation) // - Auto-converts: Both Arabic and English keyboard layout // - Mode: Non-strict (permissive) ``` #### `createKurdishInputHandler(processor, callback?)` Creates an event handler for real-time input processing. ```typescript import { createKurdishInputHandler } from 'sorani-helper'; const input = document.querySelector('#kurdish-input'); const handler = createKurdishInputHandler(keyboard, (processed, original) => { console.log(`${original} → ${processed}`); }); input.addEventListener('input', handler); ``` ## Usage Examples ### Example 1: Form Validation Validate Kurdish names in a registration form: ```typescript import { createStrictKurdishValidator } from 'sorani-helper'; const validator = createStrictKurdishValidator(); function validateName(name: string): { valid: boolean; message: string } { const result = validator.validate(name); if (result.isValid) { return { valid: true, message: 'Valid Kurdish name' }; } return { valid: false, message: `Invalid: ${result.errors.join(', ')}` }; } console.log(validateName('کەیوان')); // ✓ Valid console.log(validateName('John')); // ✗ Invalid: Contains non-Kurdish script ``` ### Example 2: Real-Time Keyboard Conversion Convert English typing to Kurdish in real-time: ```typescript import { createKurdishKeyboardProcessor, createKurdishInputHandler } from 'sorani-helper'; const processor = createKurdishKeyboardProcessor(); const input = document.querySelector('#message-input') as HTMLInputElement; input.addEventListener('input', createKurdishInputHandler(processor, (processed, original) => { // Show conversion feedback if (processed !== original) { showToast(`Converted: ${original} → ${processed}`); } })); // User types: "slaw choni" // Input shows: "سڵاو چۆنی" ``` ### Example 3: Chat Application with Emoji Support ```typescript import { KurdishProcessor } from 'sorani-helper'; const chatProcessor = new KurdishProcessor({ allowDigits: true, allowPunctuation: true, allowEmojis: true, allowSpaces: true, autoConvertArabic: true, autoConvertEnglishLayout: true, blockOtherScripts: false, // Allow mixed-language chat strict: false }); function sendMessage(text: string) { const processed = chatProcessor.process(text); // Send to server... } sendMessage('slaw 😀'); //"سڵاو 😀" ``` ### Example 4: Document Processing Clean and standardize Kurdish documents: ```typescript import { KurdishProcessor } from 'sorani-helper'; const docProcessor = new KurdishProcessor({ autoConvertArabic: true, allowDigits: true, allowPunctuation: true, allowSpaces: true, strict: true }); function cleanDocument(content: string): string { // Convert all Arabic variants to Kurdish standard return docProcessor.process(content); } const arabicText = 'ھەولێر پایتەختی ھەرێمی كوردستانە'; const kurdishText = cleanDocument(arabicText); console.log(kurdishText); //"هەولێر پایتەختی هەرێمی کوردستانە" ``` ### Example 5: Custom Configuration Build your own processor with specific rules: ```typescript import { KurdishProcessor } from 'sorani-helper'; const customProcessor = new KurdishProcessor({ allowDigits: true, // Allow numbers allowPunctuation: true, // Allow punctuation allowSpaces: true, // Allow spaces allowEmojis: false, // Block emojis autoConvertArabic: true, // Auto-convert Arabic autoConvertEnglishLayout: false, // Don't convert keyboard blockOtherScripts: true, // Block non-Kurdish scripts strict: true // Strict validation mode }); // Only accepts pure Kurdish text with Arabic auto-conversion console.log(customProcessor.process('كەیوان 123')); // → "کەیوان 123" console.log(customProcessor.process('hello')); // → "" (filtered out) ``` ## Options All options for `KurdishProcessor`: ```typescript interface KurdishProcessorOptions { allowDigits?: boolean; // Default: false allowPunctuation?: boolean; // Default: false allowSpaces?: boolean; // Default: true allowEmojis?: boolean; // Default: false autoConvertArabic?: boolean; // Default: true autoConvertEnglishLayout?: boolean; // Default: false blockOtherScripts?: boolean; // Default: true strict?: boolean; // Default: true } ``` ### Option Details | Option | Default | Description | |--------|---------|-------------| | `allowDigits` | `false` | Allow Western (0-9) and Arabic-Indic (٠-٩) digits | | `allowPunctuation` | `false` | Allow punctuation marks (.,;:!?()etc.) | | `allowSpaces` | `true` | Allow spaces, tabs, and newlines | | `allowEmojis` | `false` | Allow emoji characters | | `autoConvertArabic` | `true` | Auto-convert Arabic variants to Kurdish (ك→ک, ي→ی, etc.) | | `autoConvertEnglishLayout` | `false` | Convert English keyboard input to Kurdish (s→س, l→ل, etc.) | | `blockOtherScripts` | `true` | Block non-Kurdish scripts (Latin, Chinese, etc.) | | `strict` | `true` | Strict validation mode (reject Arabic variants even after conversion) | ## Character Mappings ### Arabic to Kurdish Conversion The library automatically converts these Arabic characters to their Kurdish equivalents: | Arabic | Kurdish | Description | Unicode | |--------|---------|-------------|---------| | ك | ک | Arabic Kaf → Kurdish Kaf | U+0643 → U+06A9 | | ي | ی | Arabic Yeh → Kurdish Yeh | U+064A → U+06CC | | ى | ی | Alif Maksura → Kurdish Yeh | U+0649 → U+06CC | | ة | ه | Taa Marbutah → Heh | U+0629 → U+0647 | | أ | ا | Alif with Hamza above → Alif | U+0623 → U+0627 | | إ | ا | Alif with Hamza below → Alif | U+0625 → U+0627 | | آ | ا | Alif with Madda → Alif | U+0622 → U+0627 | | ؤ | و | Waw with Hamza → Waw | U+0624 → U+0648 | | ھ | ه | Heh Doachashmee → Heh | U+06BE → U+0647 | ### English Keyboard to Kurdish Layout Standard Central Kurdish keyboard layout (کوردی - شێوازی نیگارگیری / Kurdish - Standard Layout): #### Top Row (Numbers - Unshifted) ``` ` → ‍ 1 → ١ 2 → ٢ 3 → ٣ 4 → ٤ 5 → ٥ 6 → ٦ 7 → ٧ 8 → ٨ 9 → ٩ 0 → ٠ - → - = → = ``` #### Top Row (Numbers - Shifted) ``` ~ → ‌ ! → ! @ → @ # → # $ → $ % → % ^ → × & → ، * → * ( → ) ) → ( _ → ـ + → + ``` #### First Letter Row (Unshifted) ``` q → ض w → ص e → ە r → ق t → ف y → غ u → ع i → ه o → خ p → ح [ → ج ] → چ ``` #### First Letter Row (Shifted) ``` Q → ‏ً W → ‏ٌ E → ‏ٍ R → ‏ّ T → ڵ Y → ێ U → ‏' I → ÷ O → × P → ؛ { → } } → { ``` #### Second Letter Row (Unshifted) ``` a → ش s → س d → ی f → ب g → ل h → ا j → ت k → ن l → م ; → ک ' → گ ``` #### Second Letter Row (Shifted) ``` A → ‏ٔ S → ئ D → ي F → إ G → أ H → آ J → ـ K → ، L → / : → : " → " ``` #### Third Letter Row (Unshifted) ``` z → ئ x → ء c → ؤ v → ر b → ڵ n → ى m → ة , → و . → ز / → ظ ``` #### Third Letter Row (Shifted) ``` Z → ‏ٓ X → ‏ْ C → } V → { B → لا N → آ M → ' < → > > → < ? → ؟ ``` #### Special Keys ``` Space → Space (فاصلە / Space) Backspace → ⌫ (Delete) Tab → ↹ Ctrl + Alt → Language/Layout switch Shift → کلیلەکان لەگەڵ دایگرتنی بەرزکەرەوە (Hold for shifted characters) ``` **Note**: This is the official Central Kurdish (Sorani) keyboard layout as standardized for digital input. The layout is optimized for Kurdish phonetics and letter frequency. ### The 33 Kurdish Letters ``` ئابپتجچحخدرڕزژسشعغفڤقکگلڵمنوۆهەیێ ``` Broken down by category: **Vowels**: ا ە ی ێ و ۆ ئ **Consonants**: ب پ ت ج چ ح خ د ر ڕ ز ژ س ش ع غ ف ڤ ق ک گ ل ڵ م ن ه ## Types ### `KurdishProcessorOptions` ```typescript interface KurdishProcessorOptions { allowDigits?: boolean; allowPunctuation?: boolean; allowSpaces?: boolean; allowEmojis?: boolean; autoConvertArabic?: boolean; autoConvertEnglishLayout?: boolean; blockOtherScripts?: boolean; strict?: boolean; } ``` ### `ValidationResult` ```typescript interface ValidationResult { isValid: boolean; // Whether the text passed validation errors: string[]; // Array of error messages converted?: string; // The converted text (only if valid) } ``` ## Development ### Setup ```bash # Clone the repository git clone https://github.com/yourusername/sorani-helper.git cd sorani-helper # Install dependencies (using Bun) bun install # Or using npm npm install ``` ### Build ```bash # Build TypeScript to JavaScript bun run build # Or using npm npm run build ``` ### Testing ```bash # Run all tests (174 tests) bun test # Run specific test file bun test constants.test # View test coverage details cat TEST_COVERAGE.md ``` ### Project Structure ``` sorani-helper/ ├── src/ │ ├── index.ts # Main exports │ ├── types.ts # TypeScript interfaces │ ├── constants.ts # Kurdish alphabet & mappings │ ├── KurdishProcessor.ts # Core processor class │ └── utils.ts # Utility functions ├── test/ │ ├── KurdishProcessor.test.ts │ ├── constants.test.ts │ ├── KurdishProcessor.options.test.ts │ ├── KurdishProcessor.validation.test.ts │ └── utils.test.ts ├── lib/ # Compiled output (gitignored) ├── package.json ├── tsconfig.json └── README.md ``` ## Real-World Kurdish Examples ### City Names ```typescript const cities = [ 'هەولێر', // Erbil (Hewlêr) 'سلێمانی', // Sulaymaniyah (Silêmanî) 'کەرکوک', // Kirkuk (Kerkûk) 'دهۆک', // Duhok (Dihok) ]; cities.forEach(city => { const result = validator.validate(city); console.log(`${city}: ${result.isValid ? '✓' : '✗'}`); }); ``` ### Common Phrases ```typescript const phrases = [ 'سڵاو', // Hello (Sillaw) 'چۆنی؟', // How are you? (Chonî?) 'زۆر باشم', // I'm very well (Zor bashim) 'سوپاس', // Thank you (Supas) 'بەخێر بێیت', // Welcome (Bexêr bêyt) ]; ``` ### Personal Names ```typescript const names = [ 'کەیوان', // Keywan 'شیلان', // Shîlan 'دارا', // Dara 'ڕەنگین', // Rengîn ]; ``` ## Browser Support Works in all modern browsers and Node.js environments: - ✅ Chrome/Edge (latest) - ✅ Firefox (latest) - ✅ Safari (latest) - ✅ Node.js 18+ - ✅ Bun 1.x - ✅ Deno (with npm specifier) ## Performance - **Processing speed**: ~50,000 characters/second - **Bundle size**: < 10KB minified - **Zero dependencies**: No external packages - **Test suite**: 174 tests run in ~50ms ## Contributing Contributions are welcome! Please: 1. Fork the repository 2. Create a feature branch 3. Add tests for new features 4. Ensure all tests pass (`bun test`) 5. Submit a pull request ## License MIT License - see LICENSE file for details ## Credits - Based on the official Central Kurdish (Sorani) 33-letter alphabet - Keyboard layout follows KRG (Kurdistan Regional Government) standards - Arabic-to-Kurdish mappings based on Unicode standards ## Support - 📖 [Documentation](./README.md) - 🧪 [Test Coverage](./TEST_COVERAGE.md) - 🐛 [Report Issues](https://github.com/yourusername/sorani-helper/issues) - 💬 [Discussions](https://github.com/yourusername/sorani-helper/discussions) --- **Made with ❤️ for the Kurdish language**