compromise
Version:
modest natural language processing
376 lines (250 loc) • 10.6 kB
Markdown
compromise uses semver, and pushes to npm frequently
(github-releases occasionally)
- **Major** is a breaking api change - method or response changes that can cause runtime errors.
- **Minor** is a behaviour change - Tagging or grammar changes.
- **Patch** is an obvious, non-controversial bugfix.
While all _Major_ releases should be reviewed, our only two _large_ releases are **v6** in 2016 and and **v12** in 2019. Others have been mostly incremental, or niche.
##### 12.3.0
- prefer `@titleCase` instead of `#TitleCase` tag
- update dependencies
- fix case-sensitive paths
- fix greedy-start match condition regression #651
- fix single period sentence runtime error
- fix potentially-unsafe regexes
- improved tagging for '-ed' verbs (#616)
- improve support for auxilary-pastTense ('was lifted') verb-phrases
- more robust number-tagging regexes
- setup typescript types for plugins #661 (thanks @Drache93!)
- verb conjugation and tagger bugfixes
- disambiguate acryonyms & yelling
##### 12.2.1
- fix 'aint' contraction
- make Doc.world writable
- update deps
- more tests
- fix shared period with acronym at end of sentence
- fix some mis-classification of contraction
- fix over-active emoji regex
- tag 'cookin', 'hootin' as `Gerund`
- support unicode single-quote symbols in contractions
##### 12.2.0
- improved splitting in .nouns()
- add `.nouns().adjectives()` method
- add `concat` param to `.pre()` and `.post()`
- allow ellipses at start of term _"....so"_ in `@hasEllipses`
- fix matches with optional-end `foo?$` match syntax
- add typescript types for subsets
##### 12.1.0
- add 'sideEffect:false' flag to build
- considerable speedup (20%) in tagger
- ensure trimming of whitespace for root/clean/reduced text formats
- fix client-side logging
- more flexible params to `replace()` and `replaceWith()`
### 12.0.0 :postal_horn:
- see **[Release Notes](https://github.com/spencermountain/compromise/wiki/v12-Release-Notes)**
- support singular units in `.value()`
- `.quotations()` no-longer return repeated results for nested quotes
- simplify quotation tagset
- `.out('normal')` no longer includes quotes or trailing-possessives
- improve `.debug()` on client-side
- better honorific support, add `honorifics` feature to .normalize()
- elipses bugfixes
- replace unicode chars in `.normalize()` now by default
- `acronyms().stripPeriods()` and `acronyms().addPeriods()`
- tag professions as `
- add more behaviours to `.normalize()`
- support match-results as inputs to .match() and .not()
- support some us-state abbreviations like 'Phoeniz AZ'
- add `nouns().toPossessive()`
- ngrams now remove empty-terms in contractions - fixes counting issue [
- expose internal `sentences().isQuestion()` method
- `.join()` as an alias for `.flatten()`
- slightly different behavior for wildcards in capture-groups [pull/472](https://github.com/spencermountain/compromise/pull/472)
- `.possessives()` subset + `
- hide massive `world` output for console.log of a term
- improve quotations() method
- add .parentheses() method
- add 'nickname' support to .people()
- 'will be #Adjective' now tagged as Copula
- include adverbs in verb conjugation (more) consistently
- `sentences().toContinuous()` and `verbs().toGerund()`
- some more aliases for jquery-like methods api
- move `getPunctuation`, `setPunctuation` from .sentence to main Text method
- rename internal `endPunctuation` to `getPunctuation`
- more consistent `cardinal/ordinal` tagging for values
##### 11.5.0
- add #Abbreviation tag
- add #ProperNoun tag
- fixes for noun inflection
##### 11.4.1
- include old ending punctuation in a `.replace()` cmd
##### 11.3.1
- almost-double the support for first-names
- changes to bestTag method
##### 11.2.1
- rolls-back some aggressive JustesonKatz stuff
- better support for emdash numberRange
- 'can\'t' contraction bugfix
- fix for dates().toShortForm()
##### 11.1.0
- add `#Multiple` Values tag, and changes to how invalid numbers like 'sixty fifteen hundred' are understood
- better em-dash/en-dash support
- better conjugate implicit verbs inside contractions - "i'm", "we've"
- nouns().articles() method
- neighborhoods as #Place
- support more complex noun-phrases with JustesonKatz in `.nouns()`
### v11
- support for persistent lexicon/tagset changes
- `addTags, addWords, addRegs, addPlurals, addConjugations` methods to extend native data
- - `.plugin()` method to wrap all of these into one
- - (removal of `.packWords()` method)
- more `.organizations()` matches
- regex-support in .match() - `nlp('it is waaaay cool').match('/aaa/').out()//'waaaay'`
- improved apostrophe-s disambiguation
- support whitespace before sentence boundary
- improved QuestionWord tagging, some `.questions()` without a question-mark
- phrasalVerb conjugation
- new #Activity tag for Gerunds as nouns 'walking is fun'
- change ngram params to an object `{size:int, max:int}`
- implement '[]' capture-group syntax in .match()
- bring-back `map, filter, foreach and reduce` methods
- set `.words()` as alias for .terms()
- `people().firstNames()`, `people().lastNames()`
- split-out comma-separated adverbs
- fix for '.watch' reserved word in efrt
- improved `places()` parsing
- improved `{min,max}` match syntax
- new `.out('match')` method
- quiet addition of .pack() and .unpack() for owen
- move internal lexicon around, to support new format in v11
- added states & provinces as
- added
- add increment/decrement/add/subtract methods to .values()
- add units(), noUnits() methods to .values()
- 'uncountable' nouns are no longer assumed to be singular
- money tag is no longer always a value
##### 10.4.0
- improved tagging of `VerbPhrase` and `Condition`
- fixes to contractions in sentence-changes - "i'm going -> i went"
- several verb conjugation fixes
- accept Terms & Result objects in .match() and .replace()
##### 10.3.0
- new `Percent` tag
- lump more units in with `.values()`
##### 10.2.0
- .trim() method,
- adjective tagging fixes
- some new .out() methods
##### 10.1.0
- fix return format of .isPlural(), so it acts like a match filter
- less-greedy date tagging & ambiguous month fixes
### v10
- cleanup & rename some `.value()` methods
- change lumping behaviour of lexicon terms with multiple words
- keep more former tags after a term replace method
- new `.random()` method
- new `.lessThan()`, `.greaterThan()`, `.equalTo()` methods
- new prefix/suffix/infix matches with `_ffix` syntax
- `tag()` supports a sequence of tags for a sequence of terms
- .match 'range' queries now use a real match - `#Adverb{2,4}`
- new `.before()` and `.after()` match methods
- removes `.lexicon()` method for many-lexicons concept
- changes params of `.replaceWith()` method to a 'keyTags' boolean
- improved .debug() and logging on client-side
- pretty-real filesize reduction by swapping es6 classes for es5 inheritance
- rename `Term.tag` object to `Term.tags` so the `.tag()` method can work throughout more-consistently
- fix 'Auxillary' tag typo to 'Auxiliary'
- optimisation of .match(), and tagset - significant speedup!
- adds `.tagger()` method and cleanup extra params
- adds `wordStart` and `wordEnd` offsets to `.out('offset')` for whitespace+punctuation
- new `.has()` method for faster lookups
- add `nlp.out('index')` method, 12 bugs
- add `nlp.tokenize()` method for disabling pos-tagging of input
- less-ambitious date-parsing of nl-date forms
- filesize reduction using [efrt](https://github.com/nlp-compromise/efrt) data structure (254k -> 214k)
- fix for IE9
- weee! [big change!](https://github.com/nlp-compromise/compromise/wiki/v7-Upgrade,-welcome) _npm package rename_
- builds now using browserify + derequire()
- re-written term-lumper logic
- new nlp.lexicon({word:'POS'}) flow
- be consistent with `text.normal()`, `term.all_forms()`, `text.word_count()`. `text.normal()` includes sentence-terminators, like periods etc.
- airport codes support, helper methods for specific POS
- newlines split sentences
- Text methods now return this, instead of array of sentences
- more-sensible responses for invalid, non-string inputs
- 14 PRs, with fixes for currencies, pluralization, conjugation
- Value.to_text() new method, fix "Posessive" POS typo
- return of the text.spot() method (Re:
- more aggressive lumping of dates, like 'last week of february'
- whitespace reproduction in .text() methods
- move negate from sentence to verb & statement
- rename 'implicit' to 'expansion' for smarter contractions
- added readable-compression to adj, verbs (121kb -> 117kb)
- hyphenated words are normalized into spaces
- grammar-aware match & replace functions
- Statement & Question classes
- split ngram, locale, and syllables into plugins in seperate repo
- es6 classes, babel building
- better test coverage
- ngram uses term tokenization, so that 'Tony Hawk' us one term, and not two
- more organized pos rules
- Pos tagging is done implicitly now once nlp.Text is run
- Entity spotting is split into .people(), .place(), .organisations()
- unicode normalisation is killed
- opaque two-letter tags are gone
- plugin support
- passive tense detection
- lexicon can be augmented third-party
- date parsing results are different
- smarter handling of ambiguous contractions ("he's" -> ["he is", "he has"])
### v1.0.0 - May 2015
- added name genders and beginning of co-reference resolution ('Tony' -> 'he') API.
- small breaking change on `Noun.is_plural` and `Noun.is_entity`, affording significant pos() speedup. Bumped Major version for these changes.
##### 0.5.2 - May 2015
- Phrasal verbs ('step up'), firstnames and .people()
##### 0.4.0 - May 2015
- Major file-size reduction through refactoring
##### 0.3.0 - Jan 2015
- New NER choosing algorithm, better capitalisation logic, consolidated tests
##### 0.2.0 - Nov 2014
- Sentence class methods, client-side demos