UNPKG

fuzzy-neon

Version:

Collection of fuzzy string matching algorithms written in rust

102 lines (96 loc) 3.9 kB
# fuzzy-neon **fuzzy-neon:** Collection of fuzzy string matching algorithms written in rust Created using [neon](https://neon-bindings.com/). ## Installation Requires `Node` and `Cargo` to install. **Node** In order to install fuzzy-neon you may need to first install `cargo-cp-artifact`: ```sh $ npm i cargo-cp-artifcat ``` Installing the Module ```sh $ npm i fuzzy-neon ``` ## Usage ```js const fuzzy = require('fuzzy-neon'); let score = fuzzy.hamming('nick', 'nice'); console.log(score); // 1 ``` ## Available String Matching Metrics * Hamming distance `hamming(str1, str2)` * Levenshtein (recursive impl) `levenshtein(str1, str2)` * Wagner-Fischer (dynamic programming impl of levenshtein) `wagnerFischer(str1, str2)` * Damerau-Levenshtein `damerauLevenshtein(str1, str2)` * Jaro-Winkler Distance `jaroWinkler(str1, str2)` * Longest Common Subsequence `lcs(str1, str2)` * n-gram based distance metric `ngram(str1, str2, ngramSize)` * Jensen-Shannon Vector Distance `jensonshannonVector(str1, str2)` ## Algorithm Explanation/Useful Links #### Hamming Computes number of positions between two strings where characters differ. Expanded to allow strings of different lengths **Example** ```js const fuzzy = require('fuzzy-neon'); let score = fuzzy.hamming('nick', 'nice'); console.log(score); // 1 ``` #### Levenshtein The Levenshtein distance between two strings is the minimum number of single-character edits (insertions, deletions or substitutions) to convert one word to the other. For efficient implementation **use Wagner-Fischer** **Example** ```js const fuzzy = require('fuzzy-neon'); let score = fuzzy.levenshtein('nick', 'nit'); console.log(score); // 2 ``` #### Wagner-Fischer Implementation of Levenshtein using a faster, dynamic programming implementation - interesting to compare performance between the two. **Example** ```js const fuzzy = require('fuzzy-neon'); let score = fuzzy.wagnerFischer('nick', 'nit'); console.log(score); // 2 ``` #### Damerau-Levenshtein Extension of the Levenshtein distance metric which allows for adjacent character transpositions **Example** ```js const fuzzy = require('fuzzy-neon'); let score = fuzzy.damerauLevenshtein('taco', 'atco'); console.log(score); // 1 ``` #### Jaro Winkler Distance Extension of the Jaro distance between two strings; the Jaro distance uses the relative probability of each character in a string to calculate an edit score between two strings (see [here](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) for formula details). Winkler extended this to boost the scores of words which shared prefixes of some length. Returns a normalised score between 0 and 1. **Example** ```js const fuzzy = require('fuzzy-neon'); let score = fuzzy.jaroWinkler('nice', 'nick'); console.log(score); // 0.11549999999999994 ``` #### N-gram Based distance metric n-gram based string distance metric implemented based on the work from [this paper](http://webdocs.cs.ualberta.ca/~kondrak/papers/spire05.pdf). Extremely good at integrating context into producing the metric score, O(n^2) complexity. **Example** ```js const fuzzy = require('fuzzy-neon'); let score = fuzzy.ngram('rupert', 'robert', 2); // last arg is size of ngram console.log(score); // 0.16666666666666666 ``` ### Jensen Shannon Distance Computes the relatively probability of events in the string (events being either single characters or bi-grams) porducing a probabilty vector. Then computes the Jensen Shannon distance metric over the two probability vectors. Produced from the work by Richard Connor et al in [this paper](https://scholar.google.co.uk/citations?view_op=view_citation&hl=en&user=wtJy4BEAAAAJ&sortby=pubdate&citation_for_view=wtJy4BEAAAAJ:EYYDruWGBe4C). O(n) complexity. **Example** ```js const fuzzy = require('fuzzy-neon'); let score = fuzzy.jensonshannonVector('hi their', 'hi there'); console.log(score); // 0.36638698518165513 ```