UltraToken Utility - CLI tool for token cost analysis

# UltraToken v1.0.3

UltraToken is a CLI utility that reimplements tiktoken's BPE tokenizer and uses OpenAI's o200k Harmony encoding for fast, precise token cost estimation.

## Features

- **Zero external dependencies** - entirely self-contained
- **Complete BPE Implementation** - Full byte-pair encoding algorithm
- **Embedded o200k Vocabulary** - 200k token vocabulary included
- **Accurate Regex Splitting** - Matches OpenAI's text segmentation
- **Optimized Token Mapping** - Fast lookups using Map structures
- **Special Token Support** - Handles control tokens correctly
- **Batch Processing**

## Installation

Install UltraToken with npm:

```bash
npm i ultratoken
```

Or clone and run locally:

```bash
git clone https://github.com/TrintechResearch/UltraToken.git
cd UltraToken
npm install -g .
```

## Usage

### Command Overview

| Command | Description |
|---------|-------------|
| `ultratoken` | Start interactive mode |
| `ultratoken <text>` | Get token count for text |
| `ultratoken economy <file.md>` | Process word list file |
| `ultratoken jump` | Exit the program |
| `ultratoken --help` | Show help information |
| `ultratoken --version` | Show version information |

### Interactive Mode

Start the interactive session:

```bash
ultratoken
```

```
🚀 UltraToken TikToken Utility
Interactive Mode - Type words to get token counts
Commands: "jump" to exit, "help" for help

ultratoken hello world
"hello world" = 2 tokens
ultratoken programming
"programming" = 2 tokens
ultratoken The quick brown fox
"The quick brown fox" = 4 tokens
ultratoken jump
UltraToken terminated. Goodbye!
```

### Single Text Analysis

Get the token count for any text:

```bash
ultratoken "machine learning"
# Output:
# Word: machine learning
# Tokens: 2

ultratoken "The quick brown fox jumps over the lazy dog"
# Output:
# Word: The quick brown fox jumps over the lazy dog
# Tokens: 9
```

### Economy Mode - Batch Processing

Process a file containing a list of words.
UltraToken will append the token count to each line:

**Input file (words.txt):**

```
hello world
artificial intelligence
machine learning
programming
natural language processing
```

**Command:**

```bash
ultratoken economy words.txt
```

**Output file (words.txt) after processing:**

```
hello world 2
artificial intelligence 3
machine learning 2
programming 2
natural language processing 4
```

## Documentation

For detailed documentation and advanced features, see Documentation.md.

## License

[MIT](https://choosealicense.com/licenses/mit/)

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgements

- [**Civil McKnight**](https://x.com/LILCUJOCOLLINS) - Project Ultra
- [TrinityAI](https://github.com/matiassingers/awesome-readme)
- [Project Ultra](https://bulldogjob.com/news/449-how-to-write-a-good-readme-for-your-github-project)

## About

UltraToken is developed by [TrinityAI Research](https://research.trinityai.us), specializing in advanced LLM development.

## 🔗 Links