# UltraToken v1.0.3
UltraToken is a CLI utility that replicates tiktoken's BPE tokenizer and uses OpenAI's o200k Harmony encoding for fast, precise token cost estimation.
## Features
- **Zero External Dependencies** - Entirely self-contained
- **Complete BPE Implementation** - Full byte-pair encoding algorithm
- **Embedded o200k Vocabulary** - 200k token vocabulary included
- **Accurate Regex Splitting** - Matches OpenAI's text segmentation
- **Optimized Token Mapping** - Fast lookups using Map structures
- **Special Token Support** - Handles control tokens correctly
- **Batch Processing** - Appends token counts to every line of a word list file
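To make the byte-pair encoding and Map-based lookup features concrete, here is a minimal toy sketch of the merge loop. It is not UltraToken's source code: the `ranks` table and the `bpe` function are hypothetical, the vocabulary holds four entries rather than the o200k set, and it works on characters where a real tokenizer works on bytes after regex splitting.

```javascript
// Toy BPE sketch. NOT UltraToken's implementation and NOT the o200k vocabulary.
// `ranks` maps a merged token to its priority; lower ranks merge first.
const ranks = new Map([
  ["ll", 0],
  ["he", 1],
  ["llo", 2],
  ["hello", 3],
]);

function bpe(word, ranks) {
  let parts = Array.from(word); // start from single characters
  while (parts.length > 1) {
    // Find the adjacent pair whose merged form has the lowest rank.
    let best = -1;
    let bestRank = Infinity;
    for (let i = 0; i < parts.length - 1; i++) {
      const rank = ranks.get(parts[i] + parts[i + 1]);
      if (rank !== undefined && rank < bestRank) {
        bestRank = rank;
        best = i;
      }
    }
    if (best === -1) break; // no known pair left to merge
    parts.splice(best, 2, parts[best] + parts[best + 1]);
  }
  return parts;
}

console.log(bpe("hello", ranks)); // [ 'hello' ]  (1 token)
console.log(bpe("help", ranks));  // [ 'he', 'l', 'p' ]  (3 tokens)
```

The token count for a piece of text is the length of the returned array. A `Map` keeps each rank lookup at constant time on average, which matters when the table holds roughly 200k entries.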
## Installation
Install UltraToken with npm:
```bash
npm i ultratoken
```
Or clone and run locally:
```bash
git clone https://github.com/TrintechResearch/UltraToken.git
cd UltraToken
npm install -g .
```
## Usage
### Command Overview
| Command | Description |
|---------|-------------|
| `ultratoken` | Start interactive mode |
| `ultratoken <text>` | Get token count for text |
| `ultratoken economy <file.md>` | Append token counts to each line of a word list file |
| `ultratoken jump` | Exit the program |
| `ultratoken --help` | Show help information |
| `ultratoken --version` | Show version information |
### Interactive Mode
Start the interactive session:
```bash
ultratoken
```
```
UltraToken TikToken Utility
Interactive Mode - Type words to get token counts
Commands: "jump" to exit, "help" for help
ultratoken hello world
"hello world" = 2 tokens
ultratoken programming
"programming" = 2 tokens
ultratoken The quick brown fox
"The quick brown fox" = 4 tokens
ultratoken jump
UltraToken terminated. Goodbye!
```
### Single Text Analysis
Get token count for any text:
```bash
ultratoken "machine learning"
# Output:
# Word: machine learning
# Tokens: 2
ultratoken "The quick brown fox jumps over the lazy dog"
# Output:
# Word: The quick brown fox jumps over the lazy dog
# Tokens: 9
```
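If you want to turn the reported count into a cost figure from a script, a small helper along these lines may be enough. This is a sketch under assumptions: it assumes the CLI prints a `Tokens: N` line as in the examples above, and `parseTokenCount`, `estimateCost`, and the price used here are hypothetical and not part of UltraToken.

```javascript
// Hypothetical helpers; not part of UltraToken.
// Assumes the CLI output contains a line of the form "Tokens: <number>".
function parseTokenCount(output) {
  const match = /^Tokens:\s*(\d+)\s*$/m.exec(output);
  return match ? Number(match[1]) : null; // null when no count is found
}

// Cost = tokens / 1,000,000 * price per million tokens.
// The price is an assumption; look up your provider's current rate.
function estimateCost(tokens, usdPerMillionTokens) {
  return (tokens / 1_000_000) * usdPerMillionTokens;
}

const sample = "Word: machine learning\nTokens: 2\n";
console.log(parseTokenCount(sample));      // 2
console.log(estimateCost(2_000_000, 2.5)); // 5
```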
### Economy Mode - Batch Processing
Process a file containing a list of words. UltraToken will append the token count to each line:
**Input file (words.txt):**
```
hello world
artificial intelligence
machine learning
programming
natural language processing
```
**Command:**
```bash
ultratoken economy words.txt
```
**Output file (words.txt) after processing:**
```
hello world 2
artificial intelligence 3
machine learning 2
programming 2
natural language processing 4
```
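The transformation shown above can be sketched as a pure function over the file's text. This is an illustration only, not UltraToken's code: `appendCounts` is a hypothetical name, and `wordCount` is a stand-in that counts whitespace-separated words, so its numbers differ from real BPE token counts (for example, it reports 1 for `programming`, where UltraToken reports 2).

```javascript
// Hypothetical sketch of economy-mode behavior; not UltraToken's implementation.
// `countTokens` is any function that maps a line of text to a number.
function appendCounts(text, countTokens) {
  return text
    .split("\n")
    // Leave blank lines untouched; append the count to every other line.
    .map((line) => (line.trim() === "" ? line : `${line} ${countTokens(line)}`))
    .join("\n");
}

// Stand-in counter: counts words, NOT BPE tokens.
const wordCount = (line) => line.trim().split(/\s+/).length;

const input = "hello world\nmachine learning\nprogramming";
console.log(appendCounts(input, wordCount));
// hello world 2
// machine learning 2
// programming 1
```

Because the documented command rewrites the input file in place, working on a copy of the word list is a reasonable precaution.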
## Documentation
For detailed documentation and advanced features, see `Documentation.md`.
## License
[MIT](https://choosealicense.com/licenses/mit/)
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgements
- [**Civil McKnight**](https://x.com/LILCUJOCOLLINS) - Project Ultra
- [TrinityAI](https://github.com/matiassingers/awesome-readme)
- [Project Ultra](https://bulldogjob.com/news/449-how-to-write-a-good-readme-for-your-github-project)
## About
UltraToken is developed by [TrinityAI Research](https://research.trinityai.us), specializing in advanced LLM development.
## Links