A simple algorithm for tokenizing Chinese text into words using the CC-CEDICT dictionary.
github.com/yishn/chinese-tokenizer
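The description doesn't say how dictionary lookups become word boundaries. One common dictionary-based technique is greedy forward maximum matching: at each position, take the longest string that appears in the dictionary, and fall back to a single character when nothing matches. The sketch below assumes that strategy; it is not necessarily what yishn/chinese-tokenizer does internally. The function names `parse_cedict_line`, `build_dictionary` and `tokenize` are hypothetical. The sample entries are illustrative lines written in CC-CEDICT's `Traditional Simplified [pinyin] /gloss/gloss/` format, not an excerpt of the real file.

```python
from typing import Dict, List, Optional, Tuple


def parse_cedict_line(line: str) -> Optional[Tuple[str, str, str, List[str]]]:
    """Parse one entry of the form 'Traditional Simplified [pinyin] /gloss1/gloss2/'."""
    line = line.strip()
    if not line or line.startswith("#"):  # skip blank lines and header comments
        return None
    head, _, rest = line.partition(" /")
    traditional, simplified, pinyin = head.split(" ", 2)
    glosses = [g for g in rest.strip("/").split("/") if g]
    return traditional, simplified, pinyin.strip("[]"), glosses


def build_dictionary(lines: List[str]) -> Dict[str, List[str]]:
    """Map both traditional and simplified headwords to their English glosses."""
    entries: Dict[str, List[str]] = {}
    for line in lines:
        parsed = parse_cedict_line(line)
        if parsed is None:
            continue
        traditional, simplified, _pinyin, glosses = parsed
        # A set avoids adding glosses twice when both forms are identical (e.g. 人).
        for headword in {traditional, simplified}:
            entries.setdefault(headword, []).extend(glosses)
    return entries


def tokenize(text: str, entries: Dict[str, List[str]]) -> List[str]:
    """Greedy forward maximum matching (assumed strategy, not confirmed for the repo)."""
    max_len = max((len(word) for word in entries), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        longest = max(1, min(max_len, len(text) - i))
        # Try the longest candidate first, shrinking until a dictionary word matches.
        for length in range(longest, 0, -1):
            candidate = text[i:i + length]
            # Fall back to a single character when nothing in the dictionary matches.
            if length == 1 or candidate in entries:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Illustrative entries in CC-CEDICT line format (glosses abbreviated).
SAMPLE = [
    "# sample header comment",
    "中國 中国 [Zhong1 guo2] /China/",
    "中國人 中国人 [Zhong1 guo2 ren2] /Chinese person/",
    "人 人 [ren2] /person/",
    "我 我 [wo3] /I/me/",
    "是 是 [shi4] /is/are/",
]

entries = build_dictionary(SAMPLE)
print(tokenize("我是中国人", entries))  # ['我', '是', '中国人']
```

Because both traditional and simplified headwords are indexed, `tokenize("我是中國人", entries)` segments the traditional spelling the same way. Greedy longest-match is fast and simple but can mis-segment ambiguous strings; a real tokenizer may add tie-breaking rules on top of it.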