document-tfidf
A TFIDF analysis package that allows for tokens of any word length
#### Getting Started
Install package with:
```
npm install document-tfidf
```
#### Features:
* countTermFrequencies
* storeTermFrequencies
* normalizeTermFrequencies
* identifyUniqueTerms
* fullTFIDFAnalysis
#### Documentation
* Term Frequency - Inverse Document Frequency (TFIDF) Module:
  * countTermFrequencies: function(text [, options])
    * Counts the number of times each token appears in the input text.
    * Current options include tokenLength, which sets the number of words that make up each token. tokenLength defaults to 1.
    * Depends on the nGrams module, which extracts tokens of arbitrary word length.
  * storeTermFrequencies: function(tokenSet, TFStorage)
    * Adds the tokenSet to TFStorage, the collection storage, so that analysis improves as more documents are added.
    * Saving this collection in a persistent data store is recommended but not required.
    * If TFStorage is not provided, storeTermFrequencies creates it as an object and returns that object.
  * normalizeTermFrequencies: function(tokenSet, TFStorage)
    * For each token in tokenSet, normalizeTermFrequencies divides its count by the total number found in TFStorage and returns the token set with normalized counts.
  * identifyUniqueTerms: function(normalizedTokenSet [, options])
    * From the input normalizedTokenSet, identifyUniqueTerms returns the most unique tokens, defined as those with the highest TFIDF.
    * Current options include uniqueThreshold. If specified, identifyUniqueTerms returns all terms with a TFIDF equal to or greater than uniqueThreshold.
  * fullTFIDFAnalysis: function(text [, options])
    * Completes all of the above TFIDF calculations.
    * options correspond to the options for each piece of the analysis.
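
The steps described above can be illustrated with a small self-contained sketch. This is not the package's source code and does not call its API: the helper names `countTokens`, `addToStorage`, `normalize`, and `filterByThreshold` are hypothetical, and the sketch assumes that "the total number found in TFStorage" means the running total count for that same token across all stored documents. It also uses a simple count ratio rather than a logarithmic inverse document frequency.

```javascript
// Hypothetical helpers for illustration only; not the package's functions.

// Count tokens made of `tokenLength` consecutive words.
function countTokens(text, tokenLength = 1) {
  const words = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  const counts = Object.create(null); // no prototype, so any word is a safe key
  for (let i = 0; i + tokenLength <= words.length; i++) {
    const token = words.slice(i, i + tokenLength).join(' ');
    counts[token] = (counts[token] || 0) + 1;
  }
  return counts;
}

// Add one document's counts to a running collection total.
// Creates the storage object when none is passed in.
function addToStorage(counts, storage = Object.create(null)) {
  for (const token of Object.keys(counts)) {
    storage[token] = (storage[token] || 0) + counts[token];
  }
  return storage;
}

// Divide each token's count by its total in the collection (assumed meaning).
// Assumes `counts` has already been added to `storage`.
function normalize(counts, storage) {
  const normalized = Object.create(null);
  for (const token of Object.keys(counts)) {
    normalized[token] = counts[token] / storage[token];
  }
  return normalized;
}

// Return tokens whose normalized score is at or above the threshold.
function filterByThreshold(normalized, threshold) {
  return Object.keys(normalized).filter((t) => normalized[t] >= threshold);
}

// Two tiny documents sharing one collection.
const storage = Object.create(null);
const docA = countTokens('the cat sat on the mat');
const docB = countTokens('the dog sat');
addToStorage(docA, storage);
addToStorage(docB, storage);

const scores = normalize(docB, storage);
console.log(filterByThreshold(scores, 1)); // → [ 'dog' ]
```

In this sketch, "dog" appears only in the second document, so its ratio is 1, while "the" and "sat" are diluted by their counts in the first document. Passing `2` as the second argument to `countTokens` would count two-word tokens such as "the dog" instead.
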
View the full specs and check out more text analysis in my [Text Analysis Suite](https://github.com/Syeoryn/textAnalysisSuite).