# Fast-Text Language Detection
In a search for the _best_ way to predict the language of a text without a large machine learning model, fastText, created by Facebook, appeared to be the best option (https://towardsdatascience.com/benchmarking-language-detection-for-nlp-8250ea8b67c).
## Installation
```
npm install which-lang
```
**Note: This will install the fastText model by Facebook, which is about 150MB. You also need Python installed; if you're running an Alpine Docker image, see how to set this up easily [here](https://stackoverflow.com/questions/54428608/docker-node-alpine-image-build-fails-on-node-gyp).**
## Usage
### Prediction
_Testing_
```js
import LanguageDetection from 'which-lang';

async function run() {
  // Create one detector and reuse it for every prediction.
  const lid = new LanguageDetection();

  // Each call resolves to an array of { lang, prob, isReliableLanguage } objects.
  console.log(await lid.predict('FastText-LID provides a great language identification'));
  console.log(await lid.predict('FastText-LID bietet eine hervorragende Sprachidentifikation'));
  console.log(await lid.predict('FastText-LID fornisce un ottimo linguaggio di identificazione'));
  console.log(await lid.predict('FastText-LID fournit une excellente identification de la langue'));
  console.log(await lid.predict('FastText-LID proporciona una gran identificación de idioma'));
  console.log(await lid.predict('FastText-LID обеспечивает отличную идентификацию языка'));
  console.log(await lid.predict('这个case我想close.'));
  console.log(await lid.predict('FastText-LID提供了很好的語言識別'));
}

// Report failures instead of leaving an unhandled promise rejection.
run().catch(console.error);
```
> The second argument is the number of results to return, e.g. `lid.predict(text, 10)` returns an array of 10 results.
_Output_
```
[ { lang: 'en', prob: 0.6313226222991943, isReliableLanguage: true } ]
[ { lang: 'de', prob: 0.9137917160987854, isReliableLanguage: true } ]
[ { lang: 'it', prob: 0.974501371383667, isReliableLanguage: true } ]
[ { lang: 'fr', prob: 0.7358829379081726, isReliableLanguage: true } ]
[ { lang: 'es', prob: 0.9211937189102173, isReliableLanguage: true } ]
[ { lang: 'ru', prob: 0.9899846911430359, isReliableLanguage: true } ]
[ { lang: 'zh', prob: 0.9437162280082703, isReliableLanguage: true } ]
[ { lang: 'zh', prob: 0.8515647649765015, isReliableLanguage: true } ]
```
> `isReliableLanguage` is true if there were 10 or more test results and accuracy was 95% or more.
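
Since `predict` can return several candidates, each with a `prob` and an `isReliableLanguage` flag, a caller may want to reduce them to a single answer. The following is a minimal sketch of one way to do that, not part of `which-lang` itself: `pickLanguage`, `k`, and `minProb` are hypothetical names, and the 0.5 threshold is an arbitrary assumption. It relies only on the `predict(text, k)` call and the result shape shown above.

```js
// Hypothetical helper: pickLanguage, k and minProb are illustrative names,
// not part of the which-lang API.
// `detector` is any object exposing predict(text, k) that resolves to an
// array of { lang, prob, isReliableLanguage } objects, as in the output above.
async function pickLanguage(detector, text, { k = 3, minProb = 0.5 } = {}) {
  const results = await detector.predict(text, k);

  // Keep only candidates flagged reliable and at or above the probability floor.
  const candidates = results.filter(
    (r) => r.isReliableLanguage && r.prob >= minProb
  );
  if (candidates.length === 0) return null;

  // Do not assume the results arrive sorted; pick the highest prob explicitly.
  return candidates.reduce((best, r) => (r.prob > best.prob ? r : best)).lang;
}
```

With a detector created as in the usage example, `await pickLanguage(lid, text)` would resolve to a language code such as `'en'`, or to `null` when no candidate passes both checks. Returning `null` rather than a low-confidence guess leaves the fallback decision to the caller.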