A TypeScript library to calculate Jaro-Winkler distance and string similarity probabilities between two strings.

# string-probability

**string-probability** is a TypeScript library for calculating the similarity between strings using probabilistic models based on edit distance. Unlike traditional string comparison methods, this library emphasizes probability-based similarity, providing a more nuanced measure of how closely two strings match.

---

## Features

* Calculate **string similarity probability** between 0 (completely different) and 1 (identical).
* Support for multiple probability models:
  * **Standard**: normalized inverse distance
  * **Alpha**: exponential decay for strict sensitivity
  * **Beta**: power/exponent-based sensitivity curve
* Uses **Jaro-Winkler distance** internally for robust handling of transpositions and common prefixes.
* Flexible configuration to tune sensitivity for your specific use case.

---

## Installation

```bash
npm install string-probability
# or
yarn add string-probability
# or
bun add string-probability
```

---

## Usage

```typescript
import { probability } from "string-probability";

// Standard probability (default)
const prob1 = probability("hello", "hello"); // ~1.0
const prob2 = probability("hello", "world"); // lower probability

// Alpha mode: exponential decay sensitivity
const prob3 = probability("test", "best", { mode: "alpha", value: 1.5 });

// Beta mode: power/exponent sensitivity
const prob4 = probability("cat", "bat", { mode: "beta", value: 2.0 });
```

---

## API

### `probability(str1: string, str2: string, options?)`

Calculates the similarity probability between two strings.

**Parameters**:

| Parameter | Type                                                         | Description                                                 |
| --------- | ------------------------------------------------------------ | ----------------------------------------------------------- |
| `str1`    | `string`                                                     | First string                                                |
| `str2`    | `string`                                                     | Second string                                               |
| `options` | `{ value?: number; mode?: "standard" \| "alpha" \| "beta" }` | Optional configuration for calculation mode and sensitivity |

**Returns**: `number`, a probability between 0 and 1.
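
To make the `options` parameter more concrete, here is a minimal sketch of how a `mode` and `value` could turn a distance into a score, using the formulas listed under Probability Modes. It is not the library's source code: `scoreFromDistance`, `SketchOptions`, the default `value` of 1, the empty-string guard, and the final clamp to [0, 1] are all assumptions made for illustration. The distance `d` is passed in directly, so the sketch does not compute Jaro-Winkler itself.

```typescript
type Mode = "standard" | "alpha" | "beta";

// Hypothetical options shape mirroring the documented `options` parameter.
interface SketchOptions {
  mode?: Mode;
  value?: number;
}

// Hypothetical helper, NOT part of the string-probability API.
// d: Jaro-Winkler distance, assumed to lie in [0, 1].
// maxLen: length of the longer input string (the README's `L`).
function scoreFromDistance(
  d: number,
  maxLen: number,
  options: SketchOptions = {}
): number {
  // Assumption: `value` defaults to 1 when omitted.
  const { mode = "standard", value = 1 } = options;

  let p: number;
  switch (mode) {
    case "alpha":
      // Exponential decay: p = e^(-alpha * d)
      p = Math.exp(-value * d);
      break;
    case "beta":
      // Power curve: p = 1 - d^beta
      p = 1 - Math.pow(d, value);
      break;
    default:
      // Standard: p = 1 / (1 + d / L); two empty strings are treated as identical.
      p = maxLen === 0 ? 1 : 1 / (1 + d / maxLen);
  }

  // Assumption: clamp so the result is always a valid probability.
  return Math.min(1, Math.max(0, p));
}
```

For a distance of `0.5`, this sketch gives `0.75` in beta mode with `value: 2`, and a distance of `0` gives `1` in every mode.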

---

### Probability Modes

1. **Standard (default)**

   `p = 1 / (1 + d / L)`

   * Balanced, normalized approach.
   * Intuitive probability values.

2. **Alpha (exponential decay)**

   `p = e^(-α * d)`

   * High α: stricter matching
   * Low α: more forgiving
   * Smooth probability degradation

3. **Beta (power/exponent)**

   `p = 1 - d^β`

   * β > 1: more forgiving
   * β < 1: stricter
   * β = 1: linear relationship

> `d` is the Jaro-Winkler distance between the two strings, and `L` is the length of the longer string.

---

## Why Probability Over Direct Matching?

Traditional string matching methods (e.g., exact equality or simple thresholds) are binary: they only indicate whether strings are identical or “close enough.” Probabilistic approaches provide several advantages:

1. **Graded Similarity**: Probability values express the degree of similarity rather than a yes/no result.
2. **Robustness to Minor Differences**: Small typos, transpositions, or variations reduce the probability smoothly instead of failing outright.
3. **Custom Sensitivity**: Exponential and power models allow fine-tuning for strict or forgiving matching.
4. **Better Integration with Machine Learning**: Probability scores can be used directly in algorithms that require continuous similarity metrics.

Using probability enables smarter decisions in search, matching, deduplication, and natural language applications.

---

## License

MIT © 2025 Mohtasim Alam Sohom