cjk-tokenizer
=============

Extract terms from CJK text. The original idea is borrowed from [timdream/wordfreq](https://github.com/timdream/wordfreq).

## Why?

The JavaScript world is missing a CJK text tokenizer that works as expected, so I decided to build one with these features:

* Chinese, Japanese and Korean support
* Extracted terms include a score, their position in the original text, etc.
* A collection of more common stop words

## Install

Use in a project:

```shell
npm i cjk-tokenizer --save
```

CLI:

```shell
npm i cjk-tokenizer -g
```

## Demo

## Contribute