# Fast-Text Language Detection

In a search for the _best_ option for predicting a language from text without requiring a large machine learning model, fastText, created by Facebook, appeared to be the best option (https://towardsdatascience.com/benchmarking-language-detection-for-nlp-8250ea8b67c).

## Installation

```
npm install which-lang
```

**Note: This will install the fastText model by Facebook, which is about 150MB. You also need Python installed. If you're running an Alpine Docker image, see how to easily do this [here](https://stackoverflow.com/questions/54428608/docker-node-alpine-image-build-fails-on-node-gyp).**

## Usage

### Prediction

_Testing_

```js
import LanguageDetection from 'which-lang';

async function run() {
  // Create the detector once and reuse it for every prediction
  const lid = new LanguageDetection()

  console.log(await lid.predict('FastText-LID provides a great language identification'))
  console.log(await lid.predict('FastText-LID bietet eine hervorragende Sprachidentifikation'))
  console.log(await lid.predict('FastText-LID fornisce un ottimo linguaggio di identificazione'))
  console.log(await lid.predict('FastText-LID fournit une excellente identification de la langue'))
  console.log(await lid.predict('FastText-LID proporciona una gran identificación de idioma'))
  console.log(await lid.predict('FastText-LID обеспечивает отличную идентификацию языка'))
  console.log(await lid.predict('这个case我想close.'))
  console.log(await lid.predict('FastText-LID提供了很好的語言識別'))
}

run()
```

> The second argument is the number of returned responses, e.g. `lid.predict(text, 10)` will return an array of 10 results

_Output_

```
[ { lang: 'en', prob: 0.6313226222991943, isReliableLanguage: true } ]
[ { lang: 'de', prob: 0.9137917160987854, isReliableLanguage: true } ]
[ { lang: 'it', prob: 0.974501371383667, isReliableLanguage: true } ]
[ { lang: 'fr', prob: 0.7358829379081726, isReliableLanguage: true } ]
[ { lang: 'es', prob: 0.9211937189102173, isReliableLanguage: true } ]
[ { lang: 'ru', prob: 0.9899846911430359, isReliableLanguage: true } ]
[ { lang: 'zh', prob: 0.9437162280082703, isReliableLanguage: true } ]
[ { lang: 'zh', prob: 0.8515647649765015, isReliableLanguage: true } ]
```

> `isReliableLanguage` is true if there were 10 or more test results and accuracy was 95% or more
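
_Filtering results_

When more than one result is requested, it can help to reduce the array to a single language code. The following is a minimal sketch, assuming each result has exactly the fields shown in the output above (`lang`, `prob`, `isReliableLanguage`). The `pickLanguage` helper and its `minProb` threshold are hypothetical and not part of which-lang.

```js
// Hypothetical helper (not part of which-lang): choose one language code
// from an array shaped like the result of `lid.predict(text, k)`.
function pickLanguage(predictions, minProb = 0.5) {
  // Keep only entries flagged as reliable and at or above the threshold.
  const candidates = predictions.filter(
    (p) => p.isReliableLanguage && p.prob >= minProb
  );
  if (candidates.length === 0) return null;

  // Take the highest probability without assuming the input is sorted.
  return candidates.reduce((best, p) => (p.prob > best.prob ? p : best)).lang;
}

// Sample data: the first entry is copied from the output above,
// the second entry is made up for illustration.
const sample = [
  { lang: 'en', prob: 0.6313226222991943, isReliableLanguage: true },
  { lang: 'nl', prob: 0.05, isReliableLanguage: true },
];

console.log(pickLanguage(sample)); // prints: en
```

Returning `null` when nothing passes the filter lets the caller decide on a fallback instead of silently accepting a low-confidence guess.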