cjk-tokenizer
=============
Extract terms from CJK text. The original idea is borrowed from [timdream/wordfreq](https://github.com/timdream/wordfreq).
## Why?
The JavaScript world is missing a CJK text tokenizer that works as expected, so I decided to build one with these features:
* Chinese, Japanese and Korean support
* Extracted terms carry a score, their position in the original text, etc.
* A broader collection of common stop words
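
To make "terms with a score and a position" concrete, here is a minimal sketch of one way such output could be produced. It is not this package's API: `extractTerms`, `CJK_RUN`, and `STOP_WORDS` are hypothetical names, the stop-word list is a two-entry placeholder, and the technique (overlapping bigrams over runs of CJK characters, scored by frequency) is only a common baseline, assumed here for illustration.

```javascript
// Hypothetical sketch, NOT the cjk-tokenizer API.
// Matches runs of Hiragana, Katakana, common Han ideographs and Hangul syllables.
const CJK_RUN = /[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]+/g;

// Placeholder stop-word list, for illustration only.
const STOP_WORDS = new Set(['我们', '这个']);

// Split each CJK run into overlapping n-grams (bigrams by default) and
// record how often each one occurs and where it starts in the input.
function extractTerms(text, n = 2) {
  const terms = new Map();
  for (const match of text.matchAll(CJK_RUN)) {
    const run = match[0];
    for (let i = 0; i + n <= run.length; i++) {
      const term = run.slice(i, i + n);
      if (STOP_WORDS.has(term)) continue;
      const entry = terms.get(term) || { term, score: 0, positions: [] };
      entry.score += 1;                      // score = number of occurrences
      entry.positions.push(match.index + i); // offset in the original text
      terms.set(term, entry);
    }
  }
  // Highest-scoring terms first.
  return [...terms.values()].sort((a, b) => b.score - a.score);
}
```

Calling `extractTerms('今天天气好, hello 天气')` with this sketch skips the Latin text and ranks `天气` first, since it occurs twice (at offsets 2 and 13).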
## Install
To use it in a project:
```shell
npm i cjk-tokenizer --save
```
To install the CLI globally:
```shell
npm i cjk-tokenizer -g
```
## Demo
## Contribute