fuzzy-neon
Version:
Collection of fuzzy string matching algorithms written in rust
102 lines (96 loc) • 3.9 kB
Markdown
**fuzzy-neon:** Collection of fuzzy string matching algorithms written in rust
Created using [neon](https://neon-bindings.com/).
Requires `Node` and `Cargo` to install.
**Node**
In order to install fuzzy-neon you may need to first install `cargo-cp-artifact`:
```sh
$ npm i cargo-cp-artifcat
```
Installing the Module
```sh
$ npm i fuzzy-neon
```
```js
const fuzzy = require('fuzzy-neon');
let score = fuzzy.hamming('nick', 'nice');
console.log(score);
// 1
```
* Hamming distance `hamming(str1, str2)`
* Levenshtein (recursive impl) `levenshtein(str1, str2)`
* Wagner-Fischer (dynamic programming impl of levenshtein) `wagnerFischer(str1, str2)`
* Damerau-Levenshtein `damerauLevenshtein(str1, str2)`
* Jaro-Winkler Distance `jaroWinkler(str1, str2)`
* Longest Common Subsequence `lcs(str1, str2)`
* n-gram based distance metric `ngram(str1, str2, ngramSize)`
* Jensen-Shannon Vector Distance `jensonshannonVector(str1, str2)`
Computes number of positions between two strings where characters differ.
Expanded to allow strings of different lengths
**Example**
```js
const fuzzy = require('fuzzy-neon');
let score = fuzzy.hamming('nick', 'nice');
console.log(score);
// 1
```
The Levenshtein distance between two strings is the minimum number of single-character edits (insertions, deletions or substitutions) to convert one word to the other. For efficient implementation **use Wagner-Fischer**
**Example**
```js
const fuzzy = require('fuzzy-neon');
let score = fuzzy.levenshtein('nick', 'nit');
console.log(score);
// 2
```
Implementation of Levenshtein using a faster, dynamic programming implementation - interesting to compare performance between the two.
**Example**
```js
const fuzzy = require('fuzzy-neon');
let score = fuzzy.wagnerFischer('nick', 'nit');
console.log(score);
// 2
```
Extension of the Levenshtein distance metric which allows for adjacent character transpositions
**Example**
```js
const fuzzy = require('fuzzy-neon');
let score = fuzzy.damerauLevenshtein('taco', 'atco');
console.log(score);
// 1
```
Extension of the Jaro distance between two strings; the Jaro distance uses the relative probability of each character in a string to calculate an edit score between two strings (see [here](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) for formula details). Winkler extended this to boost the scores of words which shared prefixes of some length. Returns a normalised score between 0 and 1.
**Example**
```js
const fuzzy = require('fuzzy-neon');
let score = fuzzy.jaroWinkler('nice', 'nick');
console.log(score);
// 0.11549999999999994
```
n-gram based string distance metric implemented based on the work from [this paper](http://webdocs.cs.ualberta.ca/~kondrak/papers/spire05.pdf).
Extremely good at integrating context into producing the metric score, O(n^2) complexity.
**Example**
```js
const fuzzy = require('fuzzy-neon');
let score = fuzzy.ngram('rupert', 'robert', 2); // last arg is size of ngram
console.log(score);
// 0.16666666666666666
```
Computes the relatively probability of events in the string (events being either single characters or bi-grams) porducing a probabilty vector. Then computes the Jensen Shannon distance metric over the two probability vectors. Produced from the work by Richard Connor et al in [this paper](https://scholar.google.co.uk/citations?view_op=view_citation&hl=en&user=wtJy4BEAAAAJ&sortby=pubdate&citation_for_view=wtJy4BEAAAAJ:EYYDruWGBe4C). O(n) complexity.
**Example**
```js
const fuzzy = require('fuzzy-neon');
let score = fuzzy.jensonshannonVector('hi their', 'hi there');
console.log(score);
// 0.36638698518165513
```