<div align="center">
<img height="15px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<div><b>compromise</b></div>
<img src="https://user-images.githubusercontent.com/399657/68222691-6597f180-ffb9-11e9-8a32-a7f38aa8bded.png"/>
<div>modest natural language processing</div>
<div><code>npm install compromise</code></div>
<div align="center">
<sub>
by
<a href="https://spencermounta.in/">Spencer Kelly</a> and
<a href="https://github.com/spencermountain/compromise/graphs/contributors">
many contributors
</a>
</sub>
</div>
<img height="22px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
</div>
<div align="center">
<div>
<a href="https://npmjs.org/package/compromise">
<img src="https://img.shields.io/npm/v/compromise.svg?style=flat-square" />
</a>
<a href="https://codecov.io/gh/spencermountain/compromise">
<img src="https://codecov.io/gh/spencermountain/compromise/branch/master/graph/badge.svg" />
</a>
<a href="https://bundlephobia.com/result?p=compromise">
<img src="https://img.shields.io/bundlephobia/min/compromise"/>
<!-- <img src="https://badge-size.herokuapp.com/spencermountain/compromise/master/builds/compromise.min.js" /> -->
</a>
</div>
<div align="center">
<sub>
<a href="https://github.com/nlp-compromise/fr-compromise">french</a> • <a href="https://github.com/nlp-compromise/de-compromise">german</a> • <a href="https://github.com/nlp-compromise/it-compromise">italian</a> • <a href="https://github.com/nlp-compromise/es-compromise">spanish</a>
</sub>
</div>
</div>
<!-- spacer -->
<img height="25px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<div align="left">
don't you find it strange,
<br/>
<ul>
<img height="2px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<sub>how easy <b>text</b> is to <b>make</b>,</sub>
<br/>
<img height="2px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<i>↬<sub>ᔐᖜ</sub><b>↬</b></i> <sub></sub>
and how hard it is to actually <b>parse</b> and <i>use</i>?
</ul>
</div>
<!-- spacer -->
<img height="45px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<div align="left">
<img height="10px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>compromise <i><a href="https://observablehq.com/@spencermountain/compromise-justification">tries its best</a></i> to turn text into data.
<br/>
<img height="30px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>it makes limited and sensible decisions.
<br/>
<sub >
<img height="15px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/> it's not as smart as you'd think.
</sub>
<img height="45px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<!--
it is
<a href="https://docs.compromise.cool/compromise-filesize">small</a>,
<a href="https://docs.compromise.cool/compromise-performance">quick</a>,
and often <i><a href="https://docs.compromise.cool/compromise-accuracy">good-enough</a></i>.
<br/> -->
</div>
<img height="30px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
```js
import nlp from 'compromise'
let doc = nlp('she sells seashells by the seashore.')
doc.verbs().toPastTense()
doc.text()
// 'she sold seashells by the seashore.'
```
<!-- spacer -->
<img height="50px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<div align="left">
<i>don't be fancy, at all:</i>
</div>
```js
if (doc.has('simon says #Verb')) {
  return true
}
```
<!-- spacer -->
<img height="30px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<div align="center">
<img height="50px" src="https://user-images.githubusercontent.com/399657/68221814-05ed1680-ffb8-11e9-8b6b-c7528d163871.png"/>
</div>
<div align="left">
<i>grab parts of the text:</i>
</div>
```js
let doc = nlp(entireNovel)
doc.match('the #Adjective of times').text()
// "the blurst of times?"
```
<div align="right">
<a href="https://docs.compromise.cool/compromise-match">match docs</a>
</div>
<div align="center">
<img height="50px" src="https://user-images.githubusercontent.com/399657/68221837-0d142480-ffb8-11e9-9d30-90669f1b897c.png"/>
</div>
<!-- spacer -->
<img height="30px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<i>and get data:</i>
```js
import plg from 'compromise-speech'
nlp.extend(plg)
let doc = nlp('Milwaukee has certainly had its share of visitors..')
doc.compute('syllables')
doc.places().json()
/*
[{
  "text": "Milwaukee",
  "terms": [{
    "normal": "milwaukee",
    "syllables": ["mil", "wau", "kee"]
  }]
}]
*/
```
<div align="right">
<a href="https://docs.compromise.cool/compromise-json">json docs</a>
</div>
<div align="center">
<img height="50px" src="https://user-images.githubusercontent.com/399657/68221814-05ed1680-ffb8-11e9-8b6b-c7528d163871.png"/>
</div>
<!-- spacer -->
<img height="30px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
avoid the problems of brittle parsers:
```js
let doc = nlp("we're not gonna take it..")
doc.has('gonna') // true
doc.has('going to') // true (implicit)
// transform
doc.contractions().expand()
doc.text()
// 'we are not going to take it..'
```
<div align="right">
<a href="https://docs.compromise.cool/compromise-contractions">contraction docs</a>
</div>
<div align="center">
<img height="50px" src="https://user-images.githubusercontent.com/399657/68221814-05ed1680-ffb8-11e9-8b6b-c7528d163871.png"/>
</div>
<!-- spacer -->
<img height="30" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
and whip stuff around like it's data:
```js
let doc = nlp('ninety five thousand and fifty two')
doc.numbers().add(20)
doc.text()
// 'ninety five thousand and seventy two'
```
<div align="right">
<a href="https://docs.compromise.cool/compromise-values">number docs</a>
</div>
<div align="center">
<img height="50px" src="https://user-images.githubusercontent.com/399657/68221837-0d142480-ffb8-11e9-9d30-90669f1b897c.png"/>
</div>
<!-- spacer -->
<img height="30" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<sub>-because it actually is-</sub>
```js
let doc = nlp('the purple dinosaur')
doc.nouns().toPlural()
doc.text()
// 'the purple dinosaurs'
```
<div align="right">
<a href="https://docs.compromise.cool/nouns">noun docs</a>
</div>
<div align="center">
<img height="50px" src="https://user-images.githubusercontent.com/399657/68221731-e8b84800-ffb7-11e9-8453-6395e0e903fa.png"/>
</div>
<!-- spacer -->
<img height="50px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
Use it on the client-side:
```html
<script src="https://unpkg.com/compromise"></script>
<script>
  var doc = nlp('two bottles of beer')
  doc.numbers().minus(1)
  document.body.innerHTML = doc.text()
  // 'one bottle of beer'
</script>
```
or likewise:
```typescript
import nlp from 'compromise'
var doc = nlp('London is calling')
doc.verbs().toNegative()
// 'London is not calling'
```
<img height="75px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<!--
bragging graphs
-->
<!-- spacer -->
<img height="30" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
compromise is **~250kb** (minified):
<div align="center">
<!-- filesize -->
<a href="https://bundlephobia.com/result?p=compromise">
<img width="600" src="https://user-images.githubusercontent.com/399657/68234819-14dfc300-ffd0-11e9-8b30-cb8545707b29.png"/>
</a>
</div>
it's pretty fast. It can run on keypress:
<div align="center">
<a href="https://observablehq.com/@spencermountain/compromise-performance">
<img width="600" src="https://user-images.githubusercontent.com/399657/159795115-ed62440a-be41-424c-baa4-8dd15c48377d.png"/>
</a>
</div>
it works mainly by <a href="https://observablehq.com/@spencermountain/verbs">conjugating all forms</a> of a basic word list.
The final lexicon is <a href="https://observablehq.com/@spencermountain/compromise-lexicon">~14,000 words</a>:
<div align="center">
<img width="600" src="https://user-images.githubusercontent.com/399657/68234805-0d201e80-ffd0-11e9-8dc6-f7a600352555.png"/>
</div>
you can read more about how it works, [here](https://observablehq.com/@spencermountain/compromise-internals). it's weird.
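
As a rough illustration of the "conjugate a basic word list" idea, here is a toy sketch that expands a few infinitive verbs into a word-to-tag lookup. The `conjugate` and `buildLexicon` helpers and their suffix rules are hypothetical and deliberately naive; they are not compromise's own rules or data.

```js
// Toy sketch: expand a short list of infinitive verbs into a word → tag lexicon.
// These suffix rules are an assumption for illustration, not compromise's conjugator.
const baseVerbs = ['walk', 'jump', 'talk']

// Produce one form per tag for a regular verb
function conjugate(verb) {
  return {
    Infinitive: verb,
    PresentTense: verb + 's',
    PastTense: verb + 'ed',
    Gerund: verb + 'ing',
  }
}

// Flatten every conjugated form into a single lookup object
function buildLexicon(verbs) {
  const lexicon = {}
  for (const verb of verbs) {
    for (const [tag, form] of Object.entries(conjugate(verb))) {
      lexicon[form] = tag
    }
  }
  return lexicon
}

const lexicon = buildLexicon(baseVerbs)
console.log(Object.keys(lexicon).length) // 12 entries from 3 base words
console.log(lexicon['walked']) // 'PastTense'
```

A small base list fans out into many lookup entries, which is the general reason a compact word list can cover a much larger vocabulary.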
<!-- spacer -->
<img height="75px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<!--
one/two/three parts
-->
<p align="left">
<sub>okay -</sub>
<h1>
<code>compromise/one</code>
</h1>
<p align="center">A <code>tokenizer</code> of words, sentences, and punctuation.</p>
<img height="15px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<p>
```js
import nlp from 'compromise/one'
let doc = nlp("Wayne's World, party time")
let data = doc.json()
/* [{
  normal: "wayne's world party time",
  terms: [{ text: "Wayne's", normal: "wayne" },
    ...
  ]
}]
*/
```
<div align="right">
<a href="https://docs.compromise.cool/compromise-tokenization">tokenizer docs</a>
</div>
<b>compromise/one</b> splits your text up, wraps it in a handy API,
<ul>
<sub>and does nothing else -</sub>
</ul>
<img height="25px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<b>/one</b> is quick - most sentences take a 10th of a millisecond.
It can do <b>~1mb</b> of text a second - or 10 wikipedia pages.
<i>Infinite Jest</i> takes 3s.
<div align="right">
You can also parallelize, or stream text to it with <a href="https://github.com/spencermountain/compromise/tree/master/plugins/speed">compromise-speed</a>.
</div>
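
For back-of-envelope planning, the "~1mb a second" figure can be turned into a quick estimate. This is only a sketch: `estimateParseSeconds` is a hypothetical helper, and the rate is the rough number quoted above, not a measured guarantee.

```js
// Rough throughput quoted in this README: about 1mb of text per second
const BYTES_PER_SECOND = 1000000

// Estimate parse time from the UTF-8 byte size of the text (hypothetical helper)
function estimateParseSeconds(text) {
  const bytes = new TextEncoder().encode(text).length
  return bytes / BYTES_PER_SECOND
}

// A stand-in for a roughly 3mb novel
const novel = 'a'.repeat(3000000)
console.log(estimateParseSeconds(novel)) // 3
```

Real timings will vary with hardware and with how much analysis is run on the text.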
<!-- spacer -->
<img height="60px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<!-- two -->
<p align="center">
<h1 align="left">
<code>compromise/two</code>
</h1>
<p align="center">A <code>part-of-speech</code> tagger, and grammar-interpreter.</p>
<img height="15px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<p>
```js
import nlp from 'compromise/two'
let doc = nlp("Wayne's World, party time")
let str = doc.match('#Possessive #Noun').text()
// "Wayne's World"
```
<div align="right">
<a href="https://docs.compromise.cool/compromise-tagger">tagger docs</a>
</div>
<p>
<img height="25px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
</p>
<b>compromise/two</b> automatically calculates the very basic grammar of each word.
<sub>this is more useful than people sometimes realize.</sub>
Light grammar helps you write cleaner templates, and get closer to the information.
<!-- Part-of-speech tagging is profoundly-difficult task to get 100% on. It is also a profoundly easy task to get 85% on. -->
<img height="50px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
compromise has <b>83 tags</b>, arranged in <a href="https://observablehq.com/@spencermountain/compromise-tags">a handsome graph</a>.
<b>#FirstName</b> → <b>#Person</b> → <b>#ProperNoun</b> → <b>#Noun</b>
you can see the grammar of each word by running `doc.debug()`
you can see the reasoning for each tag with `nlp.verbose('tagger')`.
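
One way to picture the tag graph: each tag points at a parent, so a specific tag implies every tag above it. The sketch below hard-codes only the chain shown above; the `parents` object and `impliedTags` function are hypothetical and are not how the library stores its tags.

```js
// Only the chain from this README: FirstName → Person → ProperNoun → Noun
const parents = {
  FirstName: 'Person',
  Person: 'ProperNoun',
  ProperNoun: 'Noun',
}

// Walk up the parent links, collecting every tag that is implied along the way
function impliedTags(tag) {
  const found = [tag]
  while (parents[tag]) {
    tag = parents[tag]
    found.push(tag)
  }
  return found
}

console.log(impliedTags('FirstName'))
// [ 'FirstName', 'Person', 'ProperNoun', 'Noun' ]
```

This is why a template can match on `#Noun` and still catch a word that was tagged as something more specific.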
if you prefer <a href="https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html"><i>Penn tags</i></a>, you can derive them with:
```js
let doc = nlp('welcome thrillho')
doc.compute('penn')
doc.json()
```
<img height="60px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<!-- three -->
<p align="center">
<h1 align="left">
<code>compromise/three</code>
</h1>
<p align="center"><code>Phrase</code> and sentence tooling.</p>
<img height="15px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
<p>
```js
import nlp from 'compromise/three'
let doc = nlp("Wayne's World, party time")
let str = doc.people().normalize().text()
// "wayne"
```
<div align="right">
<a href="https://docs.compromise.cool/compromise-selections">selection docs</a>
</div>
<b>compromise/three</b> is a set of tooling to <i>zoom into</i> and operate on parts of a text.
`.numbers()` grabs all the numbers in a document, for example - and extends it with new methods, like `.subtract()`.
When you have a phrase, or group of words, you can see additional metadata about it with `.json()`
```js
let doc = nlp('four out of five dentists')
console.log(doc.fractions().json())
/*[{
  text: 'four out of five',
  terms: [ [Object], [Object], [Object], [Object] ],
  fraction: { numerator: 4, denominator: 5, decimal: 0.8 }
}
]*/
```
```js
let doc = nlp('$4.09CAD')
doc.money().json()
/*[{
  text: '$4.09CAD',
  terms: [ [Object] ],
  number: { prefix: '$', num: 4.09, suffix: 'cad'}
}
]*/
```
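
Because `.json()` returns plain objects, the metadata can be used like any other data. The sketch below works on records copied by hand from the two outputs printed above (with `terms` left out); it does not call the library, and the formatting choices are just an example.

```js
// Records shaped like the .json() results shown above, copied by hand
const fractions = [
  { text: 'four out of five', fraction: { numerator: 4, denominator: 5, decimal: 0.8 } },
]
const money = [
  { text: '$4.09CAD', number: { prefix: '$', num: 4.09, suffix: 'cad' } },
]

// Turn each fraction into a percentage label
const percents = fractions.map((f) => `${f.text} = ${Math.round(f.fraction.decimal * 100)}%`)

// Re-format each money value as "<amount> <CURRENCY>"
const formatted = money.map((m) => `${m.number.num.toFixed(2)} ${m.number.suffix.toUpperCase()}`)

console.log(percents) // [ 'four out of five = 80%' ]
console.log(formatted) // [ '4.09 CAD' ]
```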
<img height="80px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
## API
### compromise/one:
##### Output
- **[.text()](https://observablehq.com/@spencermountain/compromise-text)** - return the document as text
- **[.json()](https://observablehq.com/@spencermountain/compromise-json)** - return the document as data
- **[.debug()](https://observablehq.com/@spencermountain/compromise-output)** - pretty-print the interpreted document
- **[.out()](https://observablehq.com/@spencermountain/compromise-output)** - a named or custom output
- **[.html({})](https://observablehq.com/@spencermountain/compromise-html)** - output custom html tags for matches
- **[.wrap({})](https://observablehq.com/@spencermountain/compromise-output)** - produce custom output for document matches
##### Utils
- **[.found](https://observablehq.com/@spencermountain/compromise-utils)** _[getter]_ - is this document empty?
- **[.docs](https://observablehq.com/@spencermountain/compromise-utils)** _[getter]_ - get term objects as json
- **[.length](https://observablehq.com/@spencermountain/compromise-utils)** _[getter]_ - count the # of matches in the document
- **[.isView](https://observablehq.com/@spencermountain/compromise-utils)** _[getter]_ - identify a compromise object
- **[.compute()](https://observablehq.com/@spencermountain/compromise-compute)** - run a named analysis on the document
- **[.clone()](https://observablehq.com/@spencermountain/compromise-utils)** - deep-copy the document, so that no references remain
- **[.termList()](https://observablehq.com/@spencermountain/compromise-accessors)** - return a flat list of all Term objects in match
- **[.cache({})](https://observablehq.com/@spencermountain/compromise-cache)** - freeze the current state of the document, for speed-purposes
- **[.uncache()](https://observablehq.com/@spencermountain/compromise-cache)** - un-freezes the current state of the document, so it may be transformed
- **[.freeze({})](https://observablehq.com/@spencermountain/compromise-freeze)** - prevent any tags from being removed, in these terms
- **[.unfreeze({})](https://observablehq.com/@spencermountain/compromise-freeze)** - allow tags to change again, as default
##### Accessors
- **[.all()](https://observablehq.com/@spencermountain/compromise-utils)** - return the whole original document ('zoom out')
- **[.terms()](https://observablehq.com/@spencermountain/compromise-selections)** - split-up results by each individual term
- **[.first(n)](https://observablehq.com/@spencermountain/compromise-accessors)** - use only the first result(s)
- **[.last(n)](https://observablehq.com/@spencermountain/compromise-accessors)** - use only the last result(s)
- **[.slice(n,n)](https://observablehq.com/@spencermountain/compromise-accessors)** - grab a subset of the results
- **[.eq(n)](https://observablehq.com/@spencermountain/compromise-accessors)** - use only the nth result
- **[.firstTerms()](https://observablehq.com/@spencermountain/compromise-accessors)** - get the first word in each match
- **[.lastTerms()](https://observablehq.com/@spencermountain/compromise-accessors)** - get the end word in each match
- **[.fullSentences()](https://observablehq.com/@spencermountain/compromise-accessors)** - get the whole sentence for each match
- **[.groups()](https://observablehq.com/@spencermountain/compromise-accessors)** - grab any named capture-groups from a match
- **[.wordCount()](https://observablehq.com/@spencermountain/compromise-utils)** - count the # of terms in the document
- **[.confidence()](https://observablehq.com/@spencermountain/compromise-utils)** - an average score for pos tag interpretations
##### Match
_(match methods use the [match-syntax](https://docs.compromise.cool/compromise-match-syntax).)_
- **[.match('')](https://observablehq.com/@spencermountain/compromise-match)** - return a new Doc, with this one as a parent
- **[.not('')](https://observablehq.com/@spencermountain/compromise-match)** - return all results except for this
- **[.matchOne('')](https://observablehq.com/@spencermountain/compromise-match)** - return only the first match
- **[.if('')](https://observablehq.com/@spencermountain/compromise-match)** - return each current phrase, only if it contains this match ('only')
- **[.ifNo('')](https://observablehq.com/@spencermountain/compromise-match)** - filter-out any current phrases that have this match ('notIf')
- **[.has('')](https://observablehq.com/@spencermountain/compromise-match)** - return a boolean for whether this match exists
- **[.before('')](https://observablehq.com/@spencermountain/compromise-match)** - return all terms before a match, in each phrase
- **[.after('')](https://observablehq.com/@spencermountain/compromise-match)** - return all terms after a match, in each phrase
- **[.union()](https://observablehq.com/@spencermountain/compromise-set)** - return combined matches without duplicates
- **[.intersection()](https://observablehq.com/@spencermountain/compromise-set)** - return only duplicate matches
- **[.complement()](https://observablehq.com/@spencermountain/compromise-set)** - get everything not in another match
- **[.settle()](https://observablehq.com/@spencermountain/compromise-set)** - remove overlaps from matches
- **[.growRight('')](https://observablehq.com/@spencermountain/compromise-match)** - add any matching terms immediately after each match
- **[.growLeft('')](https://observablehq.com/@spencermountain/compromise-match)** - add any matching terms immediately before each match
- **[.grow('')](https://observablehq.com/@spencermountain/compromise-match)** - add any matching terms before or after each match
- **[.sweep(net)](https://observablehq.com/@spencermountain/compromise-sweep)** - apply a series of match objects to the document
- **[.splitOn('')](https://observablehq.com/@spencermountain/compromise-split)** - return a Document with three parts for every match ('splitOn')
- **[.splitBefore('')](https://observablehq.com/@spencermountain/compromise-split)** - partition a phrase before each matching segment
- **[.splitAfter('')](https://observablehq.com/@spencermountain/compromise-split)** - partition a phrase after each matching segment
- **[.join()](https://observablehq.com/@spencermountain/compromise-split)** - merge any neighbouring terms in each match
- **[.joinIf(leftMatch, rightMatch)](https://observablehq.com/@spencermountain/compromise-split)** - merge any neighbouring terms under given conditions
- **[.lookup([])](https://observablehq.com/@spencermountain/compromise-match)** - quick find for an array of string matches
- **[.autoFill()](https://observablehq.com/@spencermountain/compromise-typeahead)** - create type-ahead assumptions on the document
##### Tag
- **[.tag('')](https://observablehq.com/@spencermountain/compromise-tagger)** - give all terms the given tag
- **[.tagSafe('')](https://observablehq.com/@spencermountain/compromise-tagger)** - only apply tag to terms if it is consistent with current tags
- **[.unTag('')](https://observablehq.com/@spencermountain/compromise-tagger)** - remove this tag from the given terms
- **[.canBe('')](https://observablehq.com/@spencermountain/compromise-tagger)** - return only the terms that can be this tag
##### Case
- **[.toLowerCase()](https://observablehq.com/@spencermountain/compromise-case)** - turn every letter of every term to lower-case
- **[.toUpperCase()](https://observablehq.com/@spencermountain/compromise-case)** - turn every letter of every term to upper-case
- **[.toTitleCase()](https://observablehq.com/@spencermountain/compromise-case)** - upper-case the first letter of each term
- **[.toCamelCase()](https://observablehq.com/@spencermountain/compromise-case)** - remove whitespace and title-case each term
##### Whitespace
- **[.pre('')](https://observablehq.com/@spencermountain/compromise-whitespace)** - add this punctuation or whitespace before each match
- **[.post('')](https://observablehq.com/@spencermountain/compromise-whitespace)** - add this punctuation or whitespace after each match
- **[.trim()](https://observablehq.com/@spencermountain/compromise-whitespace)** - remove start and end whitespace
- **[.hyphenate()](https://observablehq.com/@spencermountain/compromise-whitespace)** - connect words with hyphen, and remove whitespace
- **[.dehyphenate()](https://observablehq.com/@spencermountain/compromise-whitespace)** - remove hyphens between words, and set whitespace
- **[.toQuotations()](https://observablehq.com/@spencermountain/compromise-whitespace)** - add quotation marks around these matches
- **[.toParentheses()](https://observablehq.com/@spencermountain/compromise-whitespace)** - add brackets around these matches
##### Loops
- **[.map(fn)](https://observablehq.com/@spencermountain/compromise-loops)** - run each phrase through a function, and create a new document
- **[.forEach(fn)](https://observablehq.com/@spencermountain/compromise-loops)** - run a function on each phrase, as an individual document
- **[.filter(fn)](https://observablehq.com/@spencermountain/compromise-loops)** - return only the phrases that return true
- **[.find(fn)](https://observablehq.com/@spencermountain/compromise-loops)** - return a document with only the first phrase that matches
- **[.some(fn)](https://observablehq.com/@spencermountain/compromise-loops)** - return true or false if there is one matching phrase
- **[.random(fn)](https://observablehq.com/@spencermountain/compromise-loops)** - sample a subset of the results
##### Insert
- **[.replace(match, replace)](https://observablehq.com/@spencermountain/compromise-insert)** - search and replace match with new content
- **[.replaceWith(replace)](https://observablehq.com/@spencermountain/compromise-insert)** - substitute-in new text
- **[.remove()](https://observablehq.com/@spencermountain/compromise-insert)** - fully remove these terms from the document
- **[.insertBefore(str)](https://observablehq.com/@spencermountain/compromise-insert)** - add these new terms to the front of each match (prepend)
- **[.insertAfter(str)](https://observablehq.com/@spencermountain/compromise-insert)** - add these new terms to the end of each match (append)
- **[.concat()](https://observablehq.com/@spencermountain/compromise-insert)** - add these new things to the end
- **[.swap(fromLemma, toLemma)](https://observablehq.com/@spencermountain/compromise-root)** - smart replace of root-words, using proper conjugation
##### Transform
- **[.sort('method')](https://observablehq.com/@spencermountain/compromise-sorting)** - re-arrange the order of the matches (in place)
- **[.reverse()](https://observablehq.com/@spencermountain/compromise-sorting)** - reverse the order of the matches, but not the words
- **[.normalize({})](https://observablehq.com/@spencermountain/compromise-normalization)** - clean-up the text in various ways
- **[.unique()](https://observablehq.com/@spencermountain/compromise-sorting)** - remove any duplicate matches
##### Lib
_(these methods are on the main `nlp` object)_
- **[nlp.tokenize(str)](https://observablehq.com/@spencermountain/compromise-tokenization)** - parse text without running POS-tagging
- **[nlp.lazy(str, match)](https://observablehq.com/@spencermountain/compromise-performance)** - scan through a text with minimal analysis
- **[nlp.plugin({})](https://observablehq.com/@spencermountain/compromise-constructor-methods)** - mix in a compromise-plugin
- **[nlp.parseMatch(str)](https://observablehq.com/@spencermountain/compromise-constructor-methods)** - pre-parse any match statements into json
- **[nlp.world()](https://observablehq.com/@spencermountain/compromise-constructor-methods)** - grab or change library internals
- **[nlp.model()](https://observablehq.com/@spencermountain/compromise-constructor-methods)** - grab all current linguistic data
- **[nlp.methods()](https://observablehq.com/@spencermountain/compromise-constructor-methods)** - grab or change internal methods
- **[nlp.hooks()](https://observablehq.com/@spencermountain/compromise-constructor-methods)** - see which compute methods run automatically
- **[nlp.verbose(mode)](https://observablehq.com/@spencermountain/compromise-constructor-methods)** - log our decision-making for debugging
- **[nlp.version](https://observablehq.com/@spencermountain/compromise-constructor-methods)** - current semver version of the library
- **[nlp.addWords(obj, isFrozen?)](https://observablehq.com/@spencermountain/compromise-plugin)** - add new words to the lexicon
- **[nlp.addTags(obj)](https://observablehq.com/@spencermountain/compromise-plugin)** - add new tags to the tagSet
- **[nlp.typeahead(arr)](https://observablehq.com/@spencermountain/compromise-typeahead)** - add words to the auto-fill dictionary
- **[nlp.buildTrie(arr)](https://observablehq.com/@spencermountain/compromise-lookup)** - compile a list of words into a fast lookup form
- **[nlp.buildNet(arr)](https://observablehq.com/@spencermountain/compromise-sweep)** - compile a list of matches into a fast match form
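
To give a feel for what "a fast lookup form" can mean for a word list, here is a generic trie sketch. It is a textbook structure written for illustration, with hypothetical names (`buildToyTrie`, `hasWord`); it is not the library's internal format and does not call `nlp.buildTrie`.

```js
// Generic trie sketch: each node holds a Map of character → child node
function buildToyTrie(words) {
  const root = { children: new Map(), isWord: false }
  for (const word of words) {
    let node = root
    for (const char of word.toLowerCase()) {
      if (!node.children.has(char)) {
        node.children.set(char, { children: new Map(), isWord: false })
      }
      node = node.children.get(char)
    }
    node.isWord = true // mark the end of a stored word
  }
  return root
}

// Follow one character at a time; fail as soon as a branch is missing
function hasWord(trie, word) {
  let node = trie
  for (const char of word.toLowerCase()) {
    node = node.children.get(char)
    if (!node) return false
  }
  return node.isWord
}

const trie = buildToyTrie(['milwaukee', 'london', 'longueuil'])
console.log(hasWord(trie, 'London')) // true
console.log(hasWord(trie, 'lon')) // false: a shared prefix, not a stored word
```

The cost of a lookup depends on the length of the word being checked rather than on how many words were compiled, which is the usual motivation for compiling a list once and reusing it.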
<!-- spacer -->
<img height="30px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
### compromise/two:
##### Contractions
- **[.contractions()](https://observablehq.com/@spencermountain/compromise-contractions)** - find contractions, like "didn't"
- **[.contractions().expand()](https://observablehq.com/@spencermountain/compromise-contractions)** - expand contractions, like "didn't" → "did not"
- **[.contract()](https://observablehq.com/@spencermountain/compromise-contractions)** - contract phrases, like "did not" → "didn't"
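
The simplest way to think about expansion is a lookup from a contracted word to its longer form. The sketch below is a toy version under that assumption: the `expansions` table and `expandContractions` helper are hypothetical, cover only three words, and ignore the grammar-aware cases the library has to handle.

```js
// Tiny hand-written table: contracted form → expanded form
const expansions = {
  "we're": 'we are',
  "didn't": 'did not',
  gonna: 'going to',
}

// Swap each known word for its expansion; unknown words pass through unchanged.
// Words with punctuation attached (like "it..") will not match the table.
function expandContractions(text) {
  return text
    .split(' ')
    .map((word) => expansions[word.toLowerCase()] || word)
    .join(' ')
}

console.log(expandContractions("we're not gonna take it"))
// 'we are not going to take it'
```

A plain table like this breaks down on ambiguous cases such as "he's" ("he is" or "he has"), which is where part-of-speech information becomes useful.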
<!-- spacer -->
<img height="30px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
### compromise/three:
##### Nouns
- **[.nouns()](https://observablehq.com/@spencermountain/nouns)** - return any subsequent terms tagged as a Noun
- **[.nouns().json()](https://observablehq.com/@spencermountain/nouns)** - overloaded output with noun metadata
- **[.nouns().parse()](https://observablehq.com/@spencermountain/nouns)** - get tokenized noun-phrase
- **[.nouns().isPlural()](https://observablehq.com/@spencermountain/nouns)** - return only plural nouns
- **[.nouns().isSingular()](https://observablehq.com/@spencermountain/nouns)** - return only singular nouns
- **[.nouns().toPlural()](https://observablehq.com/@spencermountain/nouns)** - `'football captain' → 'football captains'`
- **[.nouns().toSingular()](https://observablehq.com/@spencermountain/nouns)** - `'turnovers' → 'turnover'`
- **[.nouns().adjectives()](https://observablehq.com/@spencermountain/nouns)** - get any adjectives describing this noun
##### Verbs
- **[.verbs()](https://observablehq.com/@spencermountain/verbs)** - return any subsequent terms tagged as a Verb
- **[.verbs().json()](https://observablehq.com/@spencermountain/verbs)** - overloaded output with verb metadata
- **[.verbs().parse()](https://observablehq.com/@spencermountain/verbs)** - get tokenized verb-phrase
- **[.verbs().subjects()](https://observablehq.com/@spencermountain/verbs)** - what is doing the verb action
- **[.verbs().adverbs()](https://observablehq.com/@spencermountain/verbs)** - return the adverbs describing this verb.
- **[.verbs().isSingular()](https://observablehq.com/@spencermountain/verbs)** - return singular verbs like 'spencer walks'
- **[.verbs().isPlural()](https://observablehq.com/@spencermountain/verbs)** - return plural verbs like 'we walk'
- **[.verbs().isImperative()](https://observablehq.com/@spencermountain/verbs)** - only instruction verbs like 'eat it!'
- **[.verbs().toPastTense()](https://observablehq.com/@spencermountain/verbs)** - `'will go' → 'went'`
- **[.verbs().toPresentTense()](https://observablehq.com/@spencermountain/verbs)** - `'walked' → 'walks'`
- **[.verbs().toFutureTense()](https://observablehq.com/@spencermountain/verbs)** - `'walked' → 'will walk'`
- **[.verbs().toInfinitive()](https://observablehq.com/@spencermountain/verbs)** - `'walks' → 'walk'`
- **[.verbs().toGerund()](https://observablehq.com/@spencermountain/verbs)** - `'walks' → 'walking'`
- **[.verbs().toPastParticiple()](https://observablehq.com/@spencermountain/verbs)** - `'drive' → 'had driven'`
- **[.verbs().conjugate()](https://observablehq.com/@spencermountain/verbs)** - return all conjugations of these verbs
- **[.verbs().isNegative()](https://observablehq.com/@spencermountain/verbs)** - return verbs with 'not', 'never' or 'no'
- **[.verbs().isPositive()](https://observablehq.com/@spencermountain/verbs)** - only verbs without 'not', 'never' or 'no'
- **[.verbs().toNegative()](https://observablehq.com/@spencermountain/verbs)** - `'went' → 'did not go'`
- **[.verbs().toPositive()](https://observablehq.com/@spencermountain/verbs)** - `"didn't study" → 'studied'`
##### Numbers
- **[.numbers()](https://observablehq.com/@spencermountain/compromise-values)** - grab all written and numeric values
- **[.numbers().parse()](https://observablehq.com/@spencermountain/compromise-values)** - get tokenized number phrase
- **[.numbers().get()](https://observablehq.com/@spencermountain/compromise-values)** - get a simple javascript number
- **[.numbers().json()](https://observablehq.com/@spencermountain/compromise-values)** - overloaded output with number metadata
- **[.numbers().toNumber()](https://observablehq.com/@spencermountain/compromise-values)** - convert 'five' to `5`
- **[.numbers().toLocaleString()](https://observablehq.com/@spencermountain/compromise-values)** - add commas, or nicer formatting for numbers
- **[.numbers().toText()](https://observablehq.com/@spencermountain/compromise-values)** - convert '5' to `five`
- **[.numbers().toOrdinal()](https://observablehq.com/@spencermountain/compromise-values)** - convert 'five' to `fifth` or `5th`
- **[.numbers().toCardinal()](https://observablehq.com/@spencermountain/compromise-values)** - convert 'fifth' to `five` or `5`
- **[.numbers().isOrdinal()](https://observablehq.com/@spencermountain/compromise-values)** - return only ordinal numbers
- **[.numbers().isCardinal()](https://observablehq.com/@spencermountain/compromise-values)** - return only cardinal numbers
- **[.numbers().isEqual(n)](https://observablehq.com/@spencermountain/compromise-values)** - return numbers with this value
- **[.numbers().greaterThan(min)](https://observablehq.com/@spencermountain/compromise-values)** - return numbers bigger than min
- **[.numbers().lessThan(max)](https://observablehq.com/@spencermountain/compromise-values)** - return numbers smaller than max
- **[.numbers().between(min, max)](https://observablehq.com/@spencermountain/compromise-values)** - return numbers between min and max
- **[.numbers().isUnit(unit)](https://observablehq.com/@spencermountain/compromise-values)** - return only numbers in the given unit, like 'km'
- **[.numbers().set(n)](https://observablehq.com/@spencermountain/compromise-values)** - set number to n
- **[.numbers().add(n)](https://observablehq.com/@spencermountain/compromise-values)** - increase number by n
- **[.numbers().subtract(n)](https://observablehq.com/@spencermountain/compromise-values)** - decrease number by n
- **[.numbers().increment()](https://observablehq.com/@spencermountain/compromise-values)** - increase number by 1
- **[.numbers().decrement()](https://observablehq.com/@spencermountain/compromise-values)** - decrease number by 1
- **[.money()](https://observablehq.com/@spencermountain/compromise-values)** - things like `'$2.50'`
- **[.money().get()](https://observablehq.com/@spencermountain/compromise-values)** - retrieve the parsed amount(s) of money
- **[.money().json()](https://observablehq.com/@spencermountain/compromise-values)** - currency + number info
- **[.money().currency()](https://observablehq.com/@spencermountain/compromise-values)** - which currency the money is in
- **[.fractions()](https://observablehq.com/@spencermountain/compromise-values)** - like '2/3rds' or 'one out of five'
- **[.fractions().parse()](https://observablehq.com/@spencermountain/compromise-values)** - get tokenized fraction
- **[.fractions().get()](https://observablehq.com/@spencermountain/compromise-values)** - simple numerator, denominator data
- **[.fractions().json()](https://observablehq.com/@spencermountain/compromise-values)** - json method overloaded with fractions data
- **[.fractions().toDecimal()](https://observablehq.com/@spencermountain/compromise-values)** - '2/3' -> '0.66'
- **[.fractions().normalize()](https://observablehq.com/@spencermountain/compromise-values)** - 'four out of 10' -> '4/10'
- **[.fractions().toText()](https://observablehq.com/@spencermountain/compromise-values)** - '4/10' -> 'four tenths'
- **[.fractions().toPercentage()](https://observablehq.com/@spencermountain/compromise-values)** - '4/10' -> '40%'
- **[.percentages()](https://observablehq.com/@spencermountain/compromise-values)** - like '2.5%'
- **[.percentages().get()](https://observablehq.com/@spencermountain/compromise-values)** - return the percentage number / 100
- **[.percentages().json()](https://observablehq.com/@spencermountain/compromise-values)** - json overloaded with percentage information
- **[.percentages().toFraction()](https://observablehq.com/@spencermountain/compromise-values)** - '80%' -> '8/10'
##### Sentences
- **[.sentences()](https://observablehq.com/@spencermountain/compromise-sentences)** - return a sentence class with additional methods
- **[.sentences().json()](https://observablehq.com/@spencermountain/compromise-sentences)** - overloaded output with sentence metadata
<!-- - **[.sentences().subjects()](https://observablehq.com/@spencermountain/compromise-sentences)** - return the main noun of each sentence -->
- **[.sentences().toPastTense()](https://observablehq.com/@spencermountain/compromise-sentences)** - `he walks` -> `he walked`
- **[.sentences().toPresentTense()](https://observablehq.com/@spencermountain/compromise-sentences)** - `he walked` -> `he walks`
- **[.sentences().toFutureTense()](https://observablehq.com/@spencermountain/compromise-sentences)** - `he walks` -> `he will walk`
- **[.sentences().toInfinitive()](https://observablehq.com/@spencermountain/compromise-sentences)** - verb root-form `he walks` -> `he walk`
- **[.sentences().toNegative()](https://observablehq.com/@spencermountain/compromise-sentences)** - `he walks` -> `he didn't walk`
- **[.sentences().isQuestion()](https://observablehq.com/@spencermountain/compromise-sentences)** - return questions with a `?`
- **[.sentences().isExclamation()](https://observablehq.com/@spencermountain/compromise-sentences)** - return sentences with a `!`
- **[.sentences().isStatement()](https://observablehq.com/@spencermountain/compromise-sentences)** - return sentences without `?` or `!`
##### Adjectives
- **[.adjectives()](https://observablehq.com/@spencermountain/compromise-selections)** - things like `'quick'`
- **[.adjectives().json()](https://observablehq.com/@spencermountain/compromise-selections)** - get adjective metadata
- **[.adjectives().conjugate()](https://observablehq.com/@spencermountain/compromise-selections)** - return all inflections of these adjectives
- **[.adjectives().adverbs()](https://observablehq.com/@spencermountain/compromise-selections)** - get adverbs describing this adjective
- **[.adjectives().toComparative()](https://observablehq.com/@spencermountain/compromise-selections)** - 'quick' -> 'quicker'
- **[.adjectives().toSuperlative()](https://observablehq.com/@spencermountain/compromise-selections)** - 'quick' -> 'quickest'
- **[.adjectives().toAdverb()](https://observablehq.com/@spencermountain/compromise-selections)** - 'quick' -> 'quickly'
- **[.adjectives().toNoun()](https://observablehq.com/@spencermountain/compromise-selections)** - 'quick' -> 'quickness'
##### Misc selections
- **[.clauses()](https://observablehq.com/@spencermountain/compromise-selections)** - split up sentences into multi-term phrases
- **[.chunks()](https://observablehq.com/@spencermountain/compromise-selections)** - split up sentences into noun-phrases and verb-phrases
- **[.hyphenated()](https://observablehq.com/@spencermountain/compromise-selections)** - all terms connected with a hyphen or dash like `'wash-out'`
- **[.phoneNumbers()](https://observablehq.com/@spencermountain/compromise-selections)** - things like `'(939) 555-0113'`
- **[.hashTags()](https://observablehq.com/@spencermountain/compromise-selections)** - things like `'#nlp'`
- **[.emails()](https://observablehq.com/@spencermountain/compromise-selections)** - things like `'hi@compromise.cool'`
- **[.emoticons()](https://observablehq.com/@spencermountain/compromise-selections)** - things like `:)`
- **[.emojis()](https://observablehq.com/@spencermountain/compromise-selections)** - things like `💋`
- **[.atMentions()](https://observablehq.com/@spencermountain/compromise-selections)** - things like `'@nlp_compromise'`
- **[.urls()](https://observablehq.com/@spencermountain/compromise-selections)** - things like `'compromise.cool'`
- **[.pronouns()](https://observablehq.com/@spencermountain/compromise-selections)** - things like `'he'`
- **[.conjunctions()](https://observablehq.com/@spencermountain/compromise-selections)** - things like `'but'`
- **[.prepositions()](https://observablehq.com/@spencermountain/compromise-selections)** - things like `'of'`
- **[.abbreviations()](https://observablehq.com/@spencermountain/compromise-selections)** - things like `'Mrs.'`
- **[.people()](https://observablehq.com/@spencermountain/topics-named-entity-recognition)** - names like 'John F. Kennedy'
- **[.people().json()](https://observablehq.com/@spencermountain/topics-named-entity-recognition)** - get person-name metadata
- **[.people().parse()](https://observablehq.com/@spencermountain/topics-named-entity-recognition)** - get person-name interpretation
- **[.places()](https://observablehq.com/@spencermountain/topics-named-entity-recognition)** - like 'Paris, France'
- **[.organizations()](https://observablehq.com/@spencermountain/topics-named-entity-recognition)** - like 'Google, Inc'
- **[.topics()](https://observablehq.com/@spencermountain/topics-named-entity-recognition)** - `people()` + `places()` + `organizations()`
- **[.adverbs()](https://observablehq.com/@spencermountain/compromise-selections)** - things like `'quickly'`
- **[.adverbs().json()](https://observablehq.com/@spencermountain/compromise-selections)** - get adverb metadata
- **[.acronyms()](https://observablehq.com/@spencermountain/compromise-selections)** - things like `'FBI'`
- **[.acronyms().strip()](https://observablehq.com/@spencermountain/compromise-selections)** - remove periods from acronyms
- **[.acronyms().addPeriods()](https://observablehq.com/@spencermountain/compromise-selections)** - add periods to acronyms
- **[.parentheses()](https://observablehq.com/@spencermountain/compromise-selections)** - return anything inside (parentheses)
- **[.parentheses().strip()](https://observablehq.com/@spencermountain/compromise-selections)** - remove brackets
- **[.possessives()](https://observablehq.com/@spencermountain/compromise-selections)** - things like `"Spencer's"`
- **[.possessives().strip()](https://observablehq.com/@spencermountain/compromise-selections)** - "Spencer's" -> "Spencer"
- **[.quotations()](https://observablehq.com/@spencermountain/compromise-selections)** - return any terms inside paired quotation marks
- **[.quotations().strip()](https://observablehq.com/@spencermountain/compromise-selections)** - remove quotation marks
- **[.slashes()](https://observablehq.com/@spencermountain/compromise-selections)** - return any terms grouped by slashes
- **[.slashes().split()](https://observablehq.com/@spencermountain/compromise-selections)** - turn 'love/hate' into 'love hate'
<p>
<img height="85px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
</p>
<div align="center">
<img src="https://user-images.githubusercontent.com/399657/68221814-05ed1680-ffb8-11e9-8b6b-c7528d163871.png"/>
</div>
### .extend():
This library comes with a considerate, common-sense baseline for English grammar.
You're free to change, or lay waste to, any settings - which is the fun part, actually.
The easiest way is just to suggest tags for any given words:
```js
let myWords = {
  kermit: 'FirstName',
  fozzie: 'FirstName',
}
let doc = nlp(muppetText, myWords)
```
or make heavier changes with a [compromise-plugin](https://observablehq.com/@spencermountain/compromise-plugins).
```js
import nlp from 'compromise'
nlp.extend({
  // add new tags
  tags: {
    Character: {
      isA: 'Person',
      notA: 'Adjective',
    },
  },
  // add or change words in the lexicon
  words: {
    kermit: 'Character',
    gonzo: 'Character',
  },
  // change inflections
  irregulars: {
    get: {
      pastTense: 'gotten',
      gerund: 'gettin',
    },
  },
  // add new methods to compromise
  api: View => {
    View.prototype.kermitVoice = function () {
      this.sentences().prepend('well,')
      this.match('i [(am|was)]').prepend('um,')
      return this
    }
  },
})
```
<div align="right">
<a href="https://docs.compromise.cool/compromise-plugins">.plugin() docs</a>
</div>
<div align="center">
<img height="50px" src="https://user-images.githubusercontent.com/399657/68221848-11404200-ffb8-11e9-90cd-3adee8d8564f.png"/>
</div>
<!-- spacer -->
<div >
<img height="50px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
</div>
### Docs:
##### gentle introduction:
- **[#1) Input → output](https://docs.compromise.cool/tutorial-1)**
- **[#2) Match & transform](https://docs.compromise.cool/compromise-tutorial-2)**
- **[#3) Making a chat-bot](https://docs.compromise.cool/compromise-making-a-bot)**
<!-- * **[Tutorial #4]()** - Making a plugin -->
<div >
<img height="25px" src="https://user-images.githubusercontent.com/399657/68221862-17ceb980-ffb8-11e9-87d4-7b30b6488f16.png"/>
</div>
##### Documentation:
| Concepts | API | Plugins |
| ------------------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------: | -------------------------------------------------------------------------------------: |
| [Accuracy](https://observablehq.com/@spencermountain/compromise-accuracy) | [Accessors](https://observablehq.com/@spencermountain/compromise-accessors) | [Adjectives](https://observablehq.com/@spencermountain/compromise-adjectives) |
| [Caching](https://observablehq.com/@spencermountain/compromise-cache) | [Constructor-methods](https://observablehq.com/@spencermountain/compromise-constructor-methods) | [Dates](https://observablehq.com/@spencermountain/compromise-dates) |
| [Case](https://observablehq.com/@spencermountain/compromise-case) | [Contractions](https://observablehq.com/@spencermountain/compromise-contractions) | [Export](https://observablehq.com/@spencermountain/compromise-export) |
| [Filesize](https://observablehq.com/@spencermountain/compromise-filesize) | [Insert](https://observablehq.com/@spencermountain/compromise-insert) | [Hash](https://observablehq.com/@spencermountain/compromise-hash) |
| [Internals](https://observablehq.com/@spencermountain/compromise-internals) | [Json](https://observablehq.com/@spencermountain/compromise-json) | [Html](https://observablehq.com/@spencermountain/compromise-html) |
| [Justification](https://observablehq.com/@spencermountain/compromise-justification) | [Character Offsets](https://observablehq.com/@spencermountain/compromise-offsets) | [Keypress](https://observablehq.com/@spencermountain/compromise-keypress) |
| [Lexicon](https://observablehq.com/@spencermountain/compromise-lexicon) | [Loops](https://observablehq.com/@spencermountain/compromise-loops) | [Ngrams](https://observablehq.com/@spencermountain/compromise-ngram) |
| [Match-syntax](https://observablehq.com/@spencermountain/compromise-match-syntax) | [Match](https://observablehq.com/@spencermountain/compromise-match) | [Numbers](https://observablehq.com/@spencermountain/compromise-values) |
| [Performance](https://observablehq.com/@spencermountain/compromise-performance) | [Nouns](https://observablehq.com/@spencermountain/nouns) | [Paragraphs](https://observablehq.com/@spencermountain/compromise-paragraphs) |
| [Plugins](https://observablehq.com/@spencermountain/compromise-plugins) | [Output](https://observablehq.com/@spencermountain/compromise-output) | [Scan](https://observablehq.com/@spencermountain/compromise-scan) |
| [Projects](https://observablehq.com/@spencermountain/compromise-projects) | [Selections](https://observablehq.com/@spencermountain/compromise-selections) | [Sentences](https://observablehq.com/@spencermountain/compromise-sentences) |
| [Tagger](https://observablehq.com/@spencermountain/compromise-tagger) | [Sorting](https://observablehq.com/@spencermountain/compromise-sorting) | [Syllables](https://observablehq.com/@spencermountain/compromise-syllables) |
| [Tags](https://observablehq.com/@spencermountain/compromise-tags) | [Split](https://observablehq.com/@spencermountain/compromise-split) | [Pronounce](https://observablehq.com/@spencermountain/compromise-pronounce) |
| [Tokenization](https://observablehq.com/@spencermountain/compromise-tokenization) | [Text](https://observablehq.com/@spencermountain/compromise-text) | [Strict](https://observablehq.com/@spencermountain/compromise-strict) |
| [Named-Entities](https://observablehq.com/@spencermountain/topics-named-entity-recognition) | [Utils](https://observablehq.com/@spencermountain/compromise-utils) | [Penn-tags](https://observablehq.com/@spencermountain/compromise-penn-tags) |
| [Whitespace](https://observablehq.com/@spencermountain/compromise-whitespace) | [Verbs](https://observablehq.com/@spencermountain/verbs) | [Typeahead](https://observablehq.com/@spencermountain/compromise/compromise-typeahead) |
| [World data](https://observablehq.com/@spencermountain/compromise-world) | [Normalization](https://observablehq.com/@spencermountain/compromise-normalization) | [Sweep](https://observablehq.com/@spencermountain/compromise-sweep) |
| [Fuzzy-matching](https://observablehq.com/@spencermountain/compromise-fuzzy-matching) | [Typescript](https://observablehq.com/@spencermountain/compromise-typescript) | [Mutation](https://observablehq.com/@spencermountain/compromise-mutation) |
| [Root-forms](https://observablehq.com/@spencermountain/compromise-root) | | |
<di