github.com/thegreatbey/wink-embeddings-small-en-50d

# wink-embeddings-small-en-50d

[![npm version](https://img.shields.io/npm/v/wink-embeddings-small-en-50d.svg)](https://www.npmjs.com/package/wink-embeddings-small-en-50d)

Small English 50-dimensional word-embedding dataset compatible with [wink-nlp](https://www.npmjs.com/package/wink-nlp).

> **Package size:** ≤ 10 MB
>
> **Vocabulary:** ≈ 5 k–10 k most-common English words (you can regenerate the dataset with any vocabulary size you like).

---

## Installation

```bash
npm install wink-embeddings-small-en-50d
```

## Usage

```ts
import winkNLP from 'wink-nlp';
import model from 'wink-eng-lite-web-model';
import embeddings from 'wink-embeddings-small-en-50d';

const nlp = winkNLP(model);

// Look up the vector for each token in the document.
nlp.readDoc('hello world').tokens().each((t) => {
  const word = t.out();
  // `vector` is undefined when the word is not in the vocabulary.
  const vector = embeddings[word];
  console.log(word, vector);
});
```

Each vector is an array of **50 floats** and can be compared with other vectors using cosine similarity or a similar measure.

## API

### `import embeddings from 'wink-embeddings-small-en-50d'`

The default export is a plain object mapping each word (a string) to a `number[]` of length 50.

```ts
interface Vector extends ReadonlyArray<number> { length: 50; }
interface Embeddings { [word: string]: Vector }
```

## Regenerating / Updating the Dataset

A conversion script is provided to build your own subset from any GloVe 50-dimensional file.

```bash
# Example: download the GloVe 6B 50d file
curl -L https://nlp.stanford.edu/data/glove.6B.zip -o glove.zip
unzip glove.zip glove.6B.50d.txt

# Convert the first 10 000 lines → src/embeddings.json
npm run convert:glove -- ./glove.6B.50d.txt src/embeddings.json 10000
```

Commit the new `embeddings.json`, rebuild, and publish.

## Development

```bash
npm install
npm test
npm run build
```

## Testing

The test suite validates that:

1. All keys are strings.
2. Every vector has length 50 and all elements are numbers.

```bash
npm test
```

## Publishing

```bash
npm version patch   # or minor/major
npm publish --access public
```

## 🔗 Related

1. 👉 Need to clean and normalize text before embedding it?
   Check out [`text-prep-lite`](https://www.npmjs.com/package/text-prep-lite)
2. 👉 Need a simple and robust PDF text extraction utility with a quality interface?
   Check out [`pdf-worker-package`](https://www.npmjs.com/package/pdf-worker-package)

---

© 2025 Cavani21/TheGreatBey – MIT License
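
## Example: cosine similarity

The Usage section notes that vectors can be used with cosine similarity. The following is a minimal sketch of one way to do that. The helper names `cosineSimilarity` and `similarity` are hypothetical and not part of this package, and the tiny `toy` table is made-up data standing in for the real `embeddings` object so the snippet runs without the package installed.

```typescript
// Hypothetical helpers, not exported by wink-embeddings-small-en-50d.
type Vector = ReadonlyArray<number>;

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: Vector, b: Vector): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; this sketch treats that case as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Looks up both words and returns undefined if either is out of vocabulary.
function similarity(
  table: Record<string, Vector | undefined>,
  w1: string,
  w2: string,
): number | undefined {
  const v1 = table[w1];
  const v2 = table[w2];
  if (!v1 || !v2) return undefined;
  return cosineSimilarity(v1, v2);
}

// Made-up 3-dimensional vectors; the real dataset uses 50 dimensions.
const toy: Record<string, Vector> = {
  cat: [1, 2, 0],
  dog: [2, 4, 0],
  car: [0, 0, 3],
};

console.log(similarity(toy, 'cat', 'dog'));
console.log(similarity(toy, 'cat', 'car'));
console.log(similarity(toy, 'cat', 'unknown-word'));
```

With the real dataset, the imported `embeddings` object would be passed in place of `toy`. Returning `undefined` for missing words keeps the out-of-vocabulary case explicit instead of silently producing `NaN`.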