# string-similarity-plus

A robust string similarity calculator that handles various special characters and Unicode variations.

## Features

- Calculate the similarity percentage between two strings
- Normalize special characters (quotes, dashes, spaces, etc.)
- Find similar strings in an array based on a threshold
- Works with multilingual text, including CJK characters

## Installation

```bash
npm install string-similarity-plus
```

## Usage

```javascript
const { calculateStringSimilarity, findSimilarStrings } = require('string-similarity-plus');

// Calculate similarity between two strings.
// The two strings differ only in the dash: "-" (hyphen-minus) vs "–" (en dash).
const str1 = "<h2>2. 無限供應肉類火鍋放題 - 牛摩</h2>";
const str2 = "<h2>2. 無限供應肉類火鍋放題 – 牛摩</h2>";
const similarity = calculateStringSimilarity(str1, str2);
console.log(`Similarity: ${similarity.toFixed(2)}%`); // Should show very high similarity

// Find similar strings in an array
const content = [
  "<h2>2. 無限供應肉類火鍋放題 – 牛摩</h2>",
  "<h2>Some other content</h2>",
  "<h2>無限供應火鍋放題牛摩</h2>",
];
const searchString = "<h2>2. 無限供應肉類火鍋放題 - 牛摩</h2>";
const SIMILARITY_THRESHOLD = 80; // Set your desired similarity threshold
const matches = findSimilarStrings(searchString, content, SIMILARITY_THRESHOLD);
console.log(matches); // Will show matching items
```

## API

### calculateStringSimilarity(str1, str2)

Calculates the similarity percentage between two strings.

- **Parameters**:
  - `str1` (string): First string to compare
  - `str2` (string): Second string to compare
- **Returns**: Number between 0 and 100 representing the similarity percentage

### findSimilarStrings(searchString, contentArray, threshold)

Finds the strings in an array whose similarity to the search string meets the threshold.

- **Parameters**:
  - `searchString` (string): String to search for
  - `contentArray` (array): Array of strings to search in
  - `threshold` (number, optional): Similarity threshold percentage (default: 80)
- **Returns**: Array of matching strings

## Special Character Handling

This library normalizes various special characters, including:

- Different types of quotes and apostrophes
- Various dashes and hyphens
- Different space characters
- Various brackets and parentheses
- Different types of dots, ellipses, and slashes
- And more...

## License

MIT
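
## Illustrative Sketch

This README does not describe how the library computes its score, so the following is only a minimal sketch of one way normalization-based matching could work, not the package's actual implementation. It assumes a Levenshtein edit distance scaled to a 0-100 percentage. The names `normalizeSpecialChars`, `levenshtein`, `similarityPercent`, and `filterSimilar` are hypothetical and are not part of the `string-similarity-plus` API.

```javascript
// Hypothetical helpers for illustration; not the internals of string-similarity-plus.

// Map a few typographic variants to one canonical ASCII form.
function normalizeSpecialChars(input) {
  return input
    .replace(/[\u2018\u2019\u201A\u201B]/g, "'") // curly single quotes -> '
    .replace(/[\u201C\u201D\u201E\u201F]/g, '"') // curly double quotes -> "
    .replace(/[\u2010-\u2015\u2212]/g, '-')      // hyphens, en/em dashes, minus -> -
    .replace(/\u2026/g, '...')                   // ellipsis character -> three dots
    .replace(/\s+/g, ' ')                        // any whitespace run -> one space
    .trim();
}

// Classic two-row Levenshtein distance, computed over code points
// so that characters outside the BMP count as one unit.
function levenshtein(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[t.length];
}

// Similarity as a percentage of the longer normalized string.
function similarityPercent(str1, str2) {
  const a = normalizeSpecialChars(str1);
  const b = normalizeSpecialChars(str2);
  const longest = Math.max(Array.from(a).length, Array.from(b).length);
  if (longest === 0) return 100; // two empty strings are treated as identical
  return (1 - levenshtein(a, b) / longest) * 100;
}

// Keep the candidates whose similarity meets the threshold (assumed default: 80).
function filterSimilar(searchString, candidates, threshold = 80) {
  return candidates.filter((c) => similarityPercent(searchString, c) >= threshold);
}

// The hyphen and the en dash normalize to the same character.
console.log(similarityPercent('牛摩 - 火鍋', '牛摩 – 火鍋')); // 100
```

Normalizing before comparing means that purely typographic differences cost nothing, so the threshold only has to account for real differences in content.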