# 🔡 BPE Merge Visualizer

A simple CLI tool that visualizes Byte Pair Encoding (BPE) merge operations step by step, using a mock rule set. Great for learning how BPE tokenization works (as in GPT models).

---

## 🧪 Usage

After running `npm link` in the project root, you can run the CLI tool with:

```bash
bpeviz
```

You'll be prompted to enter a word:

```
Enter a word to visualize BPE merging: indivisibility

Step 0: [i] [n] [d] [i] [v] [i] [s] [i] [b] [i] [l] [i] [t] [y]
Step 1: [i] [n] [d] [i] [v] [is] [i] [b] [i] [l] [i] [t] [y]
Step 2: ...
```

---

## 📦 Folder Structure

```
bpe-merge-visualizer/
├── bin/
│   └── cli.js          # CLI entry file
├── src/
│   └── visualizer.js   # Merge logic
├── package.json
└── README.md
```

---

## 🛠 Features

- Step-by-step visualization of BPE merge rules
- Simulates GPT-style tokenization logic using a mock rule set
- Designed to be extended to a real vocabulary such as `cl100k_base.json`

---

## 🔮 Future Improvements

- Support real BPE vocab files
- Add a web/GUI version
- Export to HTML/SVG

---

## 📜 License

MIT

---

## 🤖 Inspired by

- OpenAI's [`tiktoken`](https://github.com/openai/tiktoken)
- Hugging Face's `tokenizers`

---

## 🙌 Author

Made with ❤️ by [MOHD RAZA](https://github.com/raza001)
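---

## 🧠 How the merge loop works (sketch)

This README does not document what `src/visualizer.js` contains. The block below is therefore only a minimal sketch of one common way to produce the `Step N:` lines shown in the usage example. It assumes a greedy, rank-based loop in the spirit of tiktoken: at each step, the adjacent pair matching the highest-priority rule (lowest list index) is merged into a single token. `MOCK_RULES` and `bpeSteps` are hypothetical names, and the rule list is invented for illustration. It does happen to reproduce Step 0 and Step 1 from the usage example. Real GPT tokenizers work on bytes and pre-split text first; this sketch works on characters.

```javascript
// Minimal sketch of step-by-step BPE merging with a ranked rule list.
// ASSUMPTION: MOCK_RULES and bpeSteps are hypothetical; they are not taken
// from src/visualizer.js. Rules earlier in the list have higher priority.
const MOCK_RULES = [
  ["i", "s"],
  ["i", "t"],
  ["i", "l"],
  ["b", "il"],
  ["bil", "it"],
  ["i", "v"],
];

// Returns every intermediate token list, starting with one token per character.
function bpeSteps(word, rules) {
  // Map "left\0right" -> rank (lower rank = merged first).
  const ranks = new Map(rules.map(([a, b], i) => [`${a}\u0000${b}`, i]));
  let tokens = Array.from(word);
  const steps = [tokens];

  for (;;) {
    // Find the adjacent pair with the lowest rank; ties keep the leftmost pair.
    let bestIdx = -1;
    let bestRank = Infinity;
    for (let i = 0; i < tokens.length - 1; i++) {
      const rank = ranks.get(`${tokens[i]}\u0000${tokens[i + 1]}`);
      if (rank !== undefined && rank < bestRank) {
        bestRank = rank;
        bestIdx = i;
      }
    }
    if (bestIdx === -1) break; // no rule applies any more

    // Replace the chosen pair with a single merged token (one merge per step).
    tokens = [
      ...tokens.slice(0, bestIdx),
      tokens[bestIdx] + tokens[bestIdx + 1],
      ...tokens.slice(bestIdx + 2),
    ];
    steps.push(tokens);
  }
  return steps;
}

// Print each step as "Step N: [tok] [tok] ...", mirroring the CLI example.
bpeSteps("indivisibility", MOCK_RULES).forEach((tokens, step) => {
  console.log(`Step ${step}: ${tokens.map((t) => `[${t}]`).join(" ")}`);
});
```

With these invented rules, the loop stops after six merges, and the last line printed is `Step 6: [i] [n] [d] [iv] [is] [i] [bilit] [y]`. Merging one pair per step, rather than every occurrence at once, keeps each printed line to a single visible change, which suits a step-by-step visualizer.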