# wink-embeddings-small-en-50d
Small English 50-dimension word-embedding dataset compatible with [wink-nlp](https://www.npmjs.com/package/wink-nlp).
> **Package size:** ≤ 10 MB
> **Vocabulary:** ≈ 5 k–10 k most-common English words (you can regenerate with any size you like).
---
## Installation
```bash
npm install wink-embeddings-small-en-50d
```
## Usage
```ts
import winkNLP from 'wink-nlp';
import model from 'wink-eng-lite-web-model';
import embeddings from 'wink-embeddings-small-en-50d';
const nlp = winkNLP(model);
nlp.readDoc('hello world').tokens().each((t) => {
  const word = t.out();
  // `vector` is undefined for words outside the vocabulary.
  const vector = embeddings[word];
  console.log(word, vector);
});
```
Each vector is an array of **50 floats** and can be used with cosine similarity, etc.
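As an illustration of the cosine-similarity use mentioned above, here is a minimal sketch. The `cosineSimilarity` helper is not part of this package; it is a hypothetical function written for this example, and it works on any two numeric arrays of equal length.

```ts
// Hypothetical helper (not exported by this package):
// cosine similarity between two equal-length vectors.
function cosineSimilarity(a: ReadonlyArray<number>, b: ReadonlyArray<number>): number {
  if (a.length !== b.length) {
    throw new Error(`Vector lengths differ: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero-magnitude vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

With the dataset loaded, a call would look like `cosineSimilarity(embeddings[wordA], embeddings[wordB])`, after checking that both words are actually present in the vocabulary.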
## API
### `import embeddings from 'wink-embeddings-small-en-50d'`
The default export is a plain object that maps each word (a string) to an array of 50 numbers.
```ts
interface Vector extends ReadonlyArray<number> { length: 50; }
interface Embeddings { [word: string]: Vector }
```
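Because the export is a plain object, indexing it with an arbitrary string can hit inherited properties such as `constructor`, and a capitalised token may miss an entry stored in lower case. The sketch below shows one way to guard against both. The `lookup` helper and the `EmbeddingTable` type are hypothetical names introduced here, and the lowercase fallback assumes the keys are lowercased (as in the GloVe 6B files); check your own build of the dataset before relying on that.

```ts
// Hypothetical type standing in for the package's default export.
type EmbeddingTable = { [word: string]: ReadonlyArray<number> };

// Hypothetical helper: returns the vector for `word`, or undefined if absent.
// Only own properties count, so inherited names like "constructor" are ignored.
function lookup(table: EmbeddingTable, word: string): ReadonlyArray<number> | undefined {
  const has = (key: string) => Object.prototype.hasOwnProperty.call(table, key);
  if (has(word)) return table[word];
  // Assumption: keys are stored in lower case, so retry with the lowercased form.
  const lower = word.toLowerCase();
  return has(lower) ? table[lower] : undefined;
}
```

In the usage example, `lookup(embeddings, t.out())` could replace the direct `embeddings[word]` access.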
## Regenerating / Updating the Dataset
A conversion script is provided to build your own subset from any GloVe 50-dimension file.
```bash
# Example: download the GloVe 6B 50d file
curl -L https://nlp.stanford.edu/data/glove.6B.zip -o glove.zip
unzip glove.zip glove.6B.50d.txt
# Convert the first 10 000 lines → src/embeddings.json
npm run convert:glove -- ./glove.6B.50d.txt src/embeddings.json 10000
```
Commit the new `embeddings.json`, rebuild, and publish.
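For orientation, the core of such a conversion might look like the sketch below. This is not the package's actual `convert:glove` script. `parseGlove` is a hypothetical function, and it assumes the standard GloVe text format: one entry per line, the word first, followed by space-separated floats. It keeps the first `limit` well-formed entries and skips anything else.

```ts
// Hypothetical sketch, not the package's convert script.
// Parses GloVe-style text into a word -> vector object.
function parseGlove(text: string, limit: number, dim = 50): Record<string, number[]> {
  const out: Record<string, number[]> = {};
  let count = 0;
  for (const line of text.split("\n")) {
    if (count >= limit) break;
    const parts = line.trim().split(" ");
    // Expect the word plus exactly `dim` values; skip blank or malformed lines.
    if (parts.length !== dim + 1) continue;
    const [word, ...rest] = parts;
    const vector = rest.map(Number);
    // Skip lines whose values do not parse as finite numbers.
    if (vector.some((x) => !Number.isFinite(x))) continue;
    out[word] = vector;
    count++;
  }
  return out;
}
```

The result can then be written out with `JSON.stringify` to produce a file shaped like `embeddings.json`.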
## Development
```bash
npm install
npm test
npm run build
```
## Testing
The test-suite validates that:
1. All keys are strings.
2. Every vector has length 50 and all elements are numbers.
```bash
npm test
```
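The two checks above could be expressed roughly as follows. This is a sketch, not the package's actual test suite, and `validateEmbeddings` is a hypothetical name. It returns a list of problems rather than throwing, so it can be wrapped in whichever test runner the project uses.

```ts
// Hypothetical validator mirroring the two checks listed above.
function validateEmbeddings(data: unknown, dim = 50): string[] {
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return ["dataset is not a plain object"];
  }
  const problems: string[] = [];
  // Object.entries only yields string keys, which covers check 1.
  for (const [word, vector] of Object.entries(data as Record<string, unknown>)) {
    if (!Array.isArray(vector) || vector.length !== dim) {
      problems.push(`${word}: expected an array of length ${dim}`);
    } else if (!vector.every((x) => typeof x === "number")) {
      problems.push(`${word}: contains a non-numeric element`);
    }
  }
  return problems;
}
```

An empty result means the dataset passed both checks.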
## Publishing
```bash
npm version patch # or minor/major
npm publish --access public
```
## 🔗 Related
1. 👉 Need to clean and normalize text before embedding it?
Check out [`text-prep-lite`](https://www.npmjs.com/package/text-prep-lite)
2. 👉 Need a simple and robust PDF text extraction utility with a quality interface?
Check out [`pdf-worker-package`](https://www.npmjs.com/package/pdf-worker-package)
---
© 2025 Cavani21/TheGreatBey – MIT License