document-tfidf

A TFIDF analysis package that allows for tokens of any word length

#### Getting Started

Install package with:

```
npm install document-tfidf
```

#### Features

* countTermFrequencies
* storeTermFrequencies
* normalizeTermFrequencies
* identifyUniqueTerms
* fullTFIDFAnalysis

#### Documentation

* Term Frequency - Inverse Document Frequency (TFIDF) Module:
  * countTermFrequencies: function(text [, options])
    * Counts the number of times each token appears in the input text.
    * Current options include tokenLength, which sets the number of words in each token. tokenLength defaults to 1.
    * Depends on the nGrams module, which can extract tokens of arbitrary length.
  * storeTermFrequencies: function(tokenSet, TFStorage)
    * Adds the counts in tokenSet to TFStorage, so that the analysis improves as more documents are stored.
    * It's recommended, but not required, to save TFStorage in a persistent data store.
    * If TFStorage is not provided, it is created as an object and returned.
  * normalizeTermFrequencies: function(tokenSet, TFStorage)
    * For each token in tokenSet, divides its count by the total count for that token in TFStorage, and returns the token set with normalized counts.
  * identifyUniqueTerms: function(normalizedTokenSet [, options])
    * Returns the most distinctive tokens in normalizedTokenSet, defined as those with the highest TFIDF.
    * Current options include uniqueThreshold. If specified, identifyUniqueTerms returns all terms with a TFIDF equal to or greater than uniqueThreshold.
  * fullTFIDFAnalysis: function(text [, options])
    * Completes all of the above TFIDF calculations.
    * options correspond to the options for each piece of the analysis.

View the full specs and check out more text analysis in my [Text Analysis Suite](https://github.com/Syeoryn/textAnalysisSuite).
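
The pipeline described in the documentation can be sketched in plain JavaScript. The block below is a minimal, standalone illustration and not the package's source: the function names mirror the documented ones, but the bodies, the word-splitting regular expression, the return shapes, and the behaviour when `uniqueThreshold` is omitted (return only the top-scoring tokens) are all assumptions made for the example. It also does not reproduce the nGrams module; tokens are built with a simple sliding window.

```javascript
// Illustrative sketch only: these are hypothetical re-implementations,
// not the code shipped in document-tfidf.

// Count how often each token (a run of `tokenLength` words) appears.
function countTermFrequencies(text, options = {}) {
  const tokenLength = options.tokenLength || 1;
  // Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
  const words = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  // Null-prototype object so tokens like "constructor" are counted safely.
  const counts = Object.create(null);
  for (let i = 0; i + tokenLength <= words.length; i++) {
    const token = words.slice(i, i + tokenLength).join(' ');
    counts[token] = (counts[token] || 0) + 1;
  }
  return counts;
}

// Add a token set's counts to the running totals; create the store if absent.
function storeTermFrequencies(tokenSet, TFStorage = Object.create(null)) {
  for (const [token, count] of Object.entries(tokenSet)) {
    TFStorage[token] = (TFStorage[token] || 0) + count;
  }
  return TFStorage;
}

// Divide each token's count by its total count across everything stored.
function normalizeTermFrequencies(tokenSet, TFStorage) {
  const normalized = Object.create(null);
  for (const [token, count] of Object.entries(tokenSet)) {
    normalized[token] = count / TFStorage[token];
  }
  return normalized;
}

// Return tokens scoring at or above the threshold. With no threshold,
// this sketch assumes "most unique" means the top-scoring tokens.
function identifyUniqueTerms(normalizedTokenSet, options = {}) {
  const scores = Object.values(normalizedTokenSet);
  const threshold = options.uniqueThreshold !== undefined
    ? options.uniqueThreshold
    : Math.max(...scores);
  return Object.keys(normalizedTokenSet)
    .filter((token) => normalizedTokenSet[token] >= threshold);
}

// Usage: build up storage from two earlier documents, then score a third.
const storage = storeTermFrequencies(countTermFrequencies('the cat sat on the mat'));
storeTermFrequencies(countTermFrequencies('the dog sat'), storage);

const doc = countTermFrequencies('the dog barked');
storeTermFrequencies(doc, storage);
const normalized = normalizeTermFrequencies(doc, storage);

console.log(normalized.the);    // 0.25 ("the" appears 4 times in storage)
console.log(normalized.barked); // 1    ("barked" appears only in this document)
console.log(identifyUniqueTerms(normalized, { uniqueThreshold: 0.5 }));
// [ 'dog', 'barked' ]
```

In this sketch a score near 1 means the token occurs almost exclusively in the current document, which is why common words such as "the" fall below the threshold as the store grows.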