# BPE Merge Visualizer
A simple CLI tool that visualizes Byte Pair Encoding (BPE) merge operations step by step, using a mock rule set. Useful for learning how BPE tokenization works, as used in GPT models.
## 🧪 Usage
After running `npm link`, you can run the CLI tool with:
```bash
bpeviz
```
You'll be prompted to enter a word:
```
Enter a word to visualize BPE merging: indivisibility
Step 0: [i] [n] [d] [i] [v] [i] [s] [i] [b] [i] [l] [i] [t] [y]
Step 1: [i] [n] [d] [i] [v] [is] [i] [b] [i] [l] [i] [t] [y]
Step 2: ...
```
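The output above can be reproduced with a few lines of JavaScript. The following is a minimal sketch, not the tool's actual `src/visualizer.js`: the rule list `MOCK_RULES` and the helper names `applyMerge`, `mergeSteps`, and `formatStep` are assumptions made for illustration. Each rule is applied once, in list order, and a step is recorded only when a rule changes the tokens.

```javascript
// Hypothetical mock merge rules, applied in list order.
// These are illustrative only and are not the tool's real rule set.
const MOCK_RULES = [
  ['i', 's'],
  ['i', 't'],
  ['i', 'n'],
];

// Merge every adjacent occurrence of [left, right] in one left-to-right pass.
function applyMerge(tokens, [left, right]) {
  const merged = [];
  let i = 0;
  while (i < tokens.length) {
    if (i + 1 < tokens.length && tokens[i] === left && tokens[i + 1] === right) {
      merged.push(left + right);
      i += 2; // skip both halves of the merged pair
    } else {
      merged.push(tokens[i]);
      i += 1;
    }
  }
  return merged;
}

// Return the token list after each rule that actually changed something.
// Index 0 is always the word split into single characters.
function mergeSteps(word, rules = MOCK_RULES) {
  let tokens = Array.from(word);
  const steps = [tokens];
  for (const rule of rules) {
    const next = applyMerge(tokens, rule);
    if (next.length !== tokens.length) {
      steps.push(next);
      tokens = next;
    }
  }
  return steps;
}

// Render one step in the "Step N: [a] [b] ..." format shown above.
function formatStep(tokens, index) {
  return `Step ${index}: ${tokens.map((t) => `[${t}]`).join(' ')}`;
}

mergeSteps('indivisibility').forEach((tokens, i) => {
  console.log(formatStep(tokens, i));
});
```

With these hypothetical rules, the first two printed lines match Step 0 and Step 1 in the sample session above; later steps depend entirely on the rules you choose.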
## Folder Structure
```
bpe-merge-visualizer/
├── bin/
│   └── cli.js          # CLI entry file
├── src/
│   └── visualizer.js   # Merge logic
├── package.json
└── README.md
```
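To show how the two files could fit together, here is a rough sketch of an entry point along the lines of `bin/cli.js`, assuming Node's built-in `readline` module. The `runCli` function, its options, and the placeholder `visualize` renderer are hypothetical; the real `src/visualizer.js` may export something different.

```javascript
const readline = require('readline');

// Hypothetical stand-in for whatever src/visualizer.js exports.
// It only renders Step 0 (the word split into characters).
function visualize(word) {
  const tokens = Array.from(word).map((c) => `[${c}]`).join(' ');
  return [`Step 0: ${tokens}`];
}

// Prompt for a word, render it, and resolve with the rendered lines.
// Streams are injectable so the prompt can be exercised without a terminal.
function runCli({ input = process.stdin, output = process.stdout, render = visualize } = {}) {
  const rl = readline.createInterface({ input, output });
  return new Promise((resolve) => {
    rl.question('Enter a word to visualize BPE merging: ', (answer) => {
      rl.close();
      const lines = render(answer.trim());
      lines.forEach((line) => output.write(`${line}\n`));
      resolve(lines);
    });
  });
}
```

Calling `runCli()` with no arguments reads from `process.stdin` and writes to `process.stdout`; passing streams explicitly makes the prompt easy to test.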
## Features
- Step-by-step visualization of BPE merge rules
- Designed to mimic GPT-style BPE merge logic (using mock rules, not a real vocabulary)
- Structured so it can be extended to support a real `cl100k_base.json` or other vocab file
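
GPT-style tokenizers such as `tiktoken` typically choose merges by rank rather than from a fixed rule list: at each step, the adjacent pair whose merged token has the lowest rank is merged first. The sketch below illustrates that idea with a made-up rank table; `MOCK_RANKS` and `mergeByRank` are hypothetical names, and the ranks are not taken from `cl100k_base` or any real vocabulary.

```javascript
// Hypothetical rank table: a lower rank means the merge happens earlier.
// These entries are invented for illustration only.
const MOCK_RANKS = new Map([
  ['is', 0],
  ['it', 1],
  ['in', 2],
  ['ity', 3],
]);

// Repeatedly merge the adjacent pair whose concatenation has the lowest rank,
// recording the token list after every merge. Ties go to the leftmost pair.
function mergeByRank(word, ranks = MOCK_RANKS) {
  let tokens = Array.from(word);
  const steps = [tokens];
  while (true) {
    let best = -1;
    let bestRank = Infinity;
    for (let i = 0; i < tokens.length - 1; i += 1) {
      const rank = ranks.get(tokens[i] + tokens[i + 1]);
      if (rank !== undefined && rank < bestRank) {
        best = i;
        bestRank = rank;
      }
    }
    if (best === -1) break; // no mergeable pair left
    tokens = [
      ...tokens.slice(0, best),
      tokens[best] + tokens[best + 1],
      ...tokens.slice(best + 2),
    ];
    steps.push(tokens);
  }
  return steps;
}

const rankSteps = mergeByRank('indivisibility');
console.log(rankSteps[rankSteps.length - 1].join(' '));
```

Under these mock ranks, `indivisibility` goes through four merges and ends as `in d i v is i b i l ity`. Swapping in ranks parsed from a real vocabulary file would change only the table, not the loop.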
## Future Improvements
- Support real BPE vocab files
- Add Web/GUI version
- Export to HTML/SVG
## License
MIT
## Inspired by
- OpenAI's [`tiktoken`](https://github.com/openai/tiktoken)
- Hugging Face's `tokenizers`
## Author
Made with ❤️ by [MOHD RAZA](https://github.com/raza001)