# string-probability
**string-probability** is a TypeScript library that scores the similarity of two strings as a probability derived from their Jaro-Winkler distance. Instead of a yes/no comparison, it returns a graded value that reflects how closely the two strings match.
## Features
* Calculate **string similarity probability** between 0 (completely different) and 1 (identical).
* Support for multiple probability models:
  * **Standard**: normalized inverse distance
  * **Alpha**: exponential decay for strict sensitivity
  * **Beta**: power/exponent-based sensitivity curve
* Uses **Jaro-Winkler distance** internally for robust handling of transpositions and common prefixes.
* Flexible configuration to tune sensitivity for your specific use case.
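
For readers unfamiliar with the Jaro-Winkler distance mentioned above, the following is a minimal sketch of the textbook algorithm. It is not taken from this library's source: the function names `jaro` and `jaroWinklerDistance`, the prefix scaling factor of 0.1, and the four-character prefix cap are the conventional choices and are assumptions here.

```typescript
// Illustrative sketch of textbook Jaro-Winkler; not the library's own code.
function jaro(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;

  // Two equal characters count as a match only if they are at most `reach` positions apart.
  const reach = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched: boolean[] = [];
  const bMatched: boolean[] = [];

  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - reach);
    const hi = Math.min(b.length - 1, i + reach);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Walk the matched characters of both strings in order and count disagreements.
  let outOfOrder = 0;
  let k = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;

  return (matches / a.length + matches / b.length + (matches - transpositions) / matches) / 3;
}

// Distance = 1 - similarity, with a bonus for a shared prefix of up to four characters.
function jaroWinklerDistance(a: string, b: string, scaling = 0.1): number {
  const similarity = jaro(a, b);
  const limit = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < limit && a[prefix] === b[prefix]) prefix++;
  return 1 - (similarity + prefix * scaling * (1 - similarity));
}
```

With this sketch, the classic pair `"MARTHA"` and `"MARHTA"` has one transposition and a three-character shared prefix, giving a distance of roughly 0.039, while identical strings give a distance of 0.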
## Installation
```bash
npm install string-probability
# or
yarn add string-probability
# or
bun add string-probability
```
## Usage
```typescript
import { probability } from "string-probability";
// Standard probability (default)
const prob1 = probability("hello", "hello"); // ~1.0
const prob2 = probability("hello", "world"); // lower probability
// Alpha mode: exponential decay sensitivity
const prob3 = probability("test", "best", { mode: "alpha", value: 1.5 });
// Beta mode: power/exponent sensitivity
const prob4 = probability("cat", "bat", { mode: "beta", value: 2.0 });
```
## API
### `probability(str1: string, str2: string, options?)`
Calculates the similarity probability between two strings.
**Parameters**:
| Parameter | Type | Description |
| --------- | ------------------------------------------------------------ | ----------------------------------------------------------- |
| `str1` | `string` | First string |
| `str2` | `string` | Second string |
| `options` | `{ value?: number; mode?: "standard" \| "alpha" \| "beta" }` | Optional configuration for calculation mode and sensitivity |
**Returns**: `number` — a probability between 0 and 1.
### Probability Modes
1. **Standard (default)**
   `p = 1 / (1 + d / L)`
   * Balanced, normalized approach.
   * Intuitive probability values.
2. **Alpha (exponential decay)**
   `p = e^(-α * d)`
   * High α → stricter matching
   * Low α → more forgiving
   * Smooth probability degradation
3. **Beta (power/exponent)**
   `p = 1 - d^β`
   * β > 1 → more forgiving
   * β < 1 → stricter
   * β = 1 → linear relationship
> `d` is the Jaro-Winkler distance between the two strings, and `L` is the maximum string length.
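
The three formulas can be written out as one small function. This is a sketch of the formulas exactly as documented above, not the library's implementation: the name `toProbability`, the default `value` of 1, the guard for empty strings, and the reading of `L` as the length of the longer string are all assumptions.

```typescript
type Mode = "standard" | "alpha" | "beta";

// d: Jaro-Winkler distance in [0, 1].
// maxLen: assumed here to be the length of the longer string (the "L" above).
// value: plays the role of α or β; the default of 1 is an assumption.
function toProbability(d: number, maxLen: number, mode: Mode = "standard", value = 1): number {
  switch (mode) {
    case "alpha":
      // p = e^(-α * d)
      return Math.exp(-value * d);
    case "beta":
      // p = 1 - d^β
      return 1 - Math.pow(d, value);
    default:
      // p = 1 / (1 + d / L); guard against division by zero for two empty strings.
      return maxLen === 0 ? 1 : 1 / (1 + d / maxLen);
  }
}

// Worked values for d = 0.2 and L = 5:
const standard = toProbability(0.2, 5);            // 1 / 1.04  ≈ 0.9615
const alpha = toProbability(0.2, 5, "alpha", 1.5); // e^(-0.3)  ≈ 0.7408
const beta = toProbability(0.2, 5, "beta", 2);     // 1 - 0.04  = 0.96
```

Taken literally, only the beta formula reaches 0 at `d = 1`; the standard and alpha formulas stay above 0 there, so it is worth checking actual return values before choosing a threshold.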
## Why Probability Over Direct Matching?
Traditional string matching methods (e.g., exact equality or simple thresholds) are binary: they only indicate whether strings are identical or “close enough.” Probabilistic approaches provide several advantages:
1. **Graded Similarity**: Probability values express the degree of similarity rather than a yes/no result.
2. **Robustness to Minor Differences**: Small typos, transpositions, or variations reduce the probability smoothly instead of failing outright.
3. **Custom Sensitivity**: Exponential and power models allow fine-tuning for strict or forgiving matching.
4. **Better Integration with Machine Learning**: Probability scores can be used directly in algorithms that require continuous similarity metrics.
Using probability enables smarter decisions in search, matching, deduplication, and natural language applications.
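
As an illustration of the search and matching use case, the sketch below ranks candidates by score and keeps those above a cutoff. The names `rankMatches`, `Scorer`, and `positionalOverlap` are hypothetical, and the default threshold of 0.8 is arbitrary. The scorer is passed in as a parameter so the example runs on its own; `positionalOverlap` is a crude stand-in used only for demonstration.

```typescript
type Scorer = (a: string, b: string) => number;

interface Match {
  candidate: string;
  score: number;
}

// Score every candidate against the query, drop weak matches, and sort best first.
function rankMatches(query: string, candidates: string[], score: Scorer, threshold = 0.8): Match[] {
  return candidates
    .map((candidate) => ({ candidate, score: score(query, candidate) }))
    .filter((m) => m.score >= threshold)
    .sort((x, y) => y.score - x.score);
}

// Stand-in scorer for demonstration only: fraction of positions holding the same character.
const positionalOverlap: Scorer = (a, b) => {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1;
  let same = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] === b[i]) same++;
  }
  return same / longest;
};

const ranked = rankMatches("hello", ["world", "hallo", "hello"], positionalOverlap, 0.7);
console.log(ranked.map((m) => m.candidate)); // [ 'hello', 'hallo' ]
```

In a real project, the `probability` function from this library should fit the `Scorer` shape given its documented signature, so `rankMatches(query, candidates, probability)` would be the natural call.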
## License
MIT © 2025 Mohtasim Alam Sohom