UNPKG

node-lefff

Version:

Pure Javascript implementation of LEFFF lemmatizer

95 lines (71 loc) 3.12 kB
# node-lefff Pure JavaScript implementation of **LEFFF** lemmatizer. Part of this readme come from the Python implementation https://github.com/ClaudeCoulombe/FrenchLefffLemmatizer ### Introduction A French Lemmatizer in JavaScript based on the LEFFF (Lexique des Formes Fléchies du Français / Lexicon of French inflected forms) is a large-scale morphological and syntactic lexicon for French. A lemmatizer returns the lemma or more simply the dictionary entry of a word, In French, the lemmatization of a verb returns this verb to the infinitive and for the other words, the lemmatization returns this word to the masculine singular. ### Main reference Sagot (2010). The Lefff, a freely available and large-coverage morphological and syntactic lexicon for French. In Proceedings of the 7th international conference on Language Resources and Evaluation (LREC 2010), Istanbul, Turkey. Retrieved from [Benoît Sagot Webpage about LEFFF](http://alpage.inria.fr/~sagot/lefff-en.html) In this project, we use the morphological lexicon only: .mlex file which has a simple format in CSV (4 fields separated by `\t`) [LEFFF download hyperlink](https://gforge.inria.fr/frs/download.php/file/34601/lefff-3.4.mlex.tgz) [Tagset](http://alpage.inria.fr/frmgwiki/content/tagset-frmg) format FRMG - from the ALPAGE project since 2004 ## Installation npm i node-lefff ## How to use const nodeLefff = require('node-lefff'); const nl = await nodeLefff.load(); nl.lem('action') // action nl.lem('acteur') // acteur nl.lem('actrices') // acteur nl.lem('Dleyton') // Dleyton ## How to use with [natural](https://www.npmjs.com/package/natural) const nodeLefff = require('node-lefff'); const nl = await nodeLefff.load(); const Stemmer = require('natural/lib/natural/stemmers/stemmer_fr'); const LefffLemmer = new Stemmer(); LefffLemmer.stem = nl.lem; LefffLemmer.tokenizeAndStem('Mes mémés m\'aimaient mais pas papa'); // ['mémé', 'aimer', 'papa'] ## API Reference ### nl.lem(word) - word `<String>`: Word - @return `<String>`: Lemmatized word, or word itself, if no lemma is known ### nl.infos(word) - word `<String>`: Word - @return `Array<Object>`: ``` [{ type: String lemma: String // Lemmatized word mode: String }, ...] ``` > Learn more about mode and type here: > - mode: https://github.com/Poyoman39/node-lefff/blob/main/src/lefff-3.4.mlex/lefff-tagset-0.1.2.pdf > - type: https://github.com/Poyoman39/node-lefff/blob/main/src/lefff-3.4.mlex/lefff-3.4.mlex ### nl.expandMode(mode) - mode `String`: mode string returned by `nl.infos` - @return `Object`: ``` { indicatif: Bool, conditionnel: Bool, impératif: Bool, subjonctif: Bool, participe: Bool, infinitif: Bool, présent: Bool, futur: Bool, imparfait: Bool, passéSimple: Bool, passé: Bool, premièrePersonne: Bool, deuxièmePersonne: Bool, troisièmePersonne: Bool, masculin: Bool, féminin: Bool, singulier: Bool, pluriel: Bool, } ```