# Deep Research Request: TOON (Token-Oriented Object Notation) for Arela's Context Compression

## Context

Arela is building an AI Technical Co-Founder with a critical challenge: **token efficiency**. From our strategic research:

> "Arela is a context router that turns 200k tokens of random junk into 20k tokens of highly compressed, semantically dense input per call."

**Current approach:** JSON with compression techniques (IDs, delta updates, hierarchical context)

**New discovery:** TOON (Token-Oriented Object Notation), a format specifically designed to minimize LLM token consumption

## The Problem We're Solving

### Token Waste in Current Systems

**Example: Sending a file to an LLM**

**JSON (current):**

```json
{
  "file": "src/auth/login.ts",
  "type": "typescript",
  "functions": [
    {
      "name": "handleLogin",
      "parameters": ["email", "password"],
      "returnType": "Promise<User>",
      "lineStart": 10,
      "lineEnd": 25
    }
  ],
  "imports": [
    {
      "source": "./database",
      "items": ["getUserByEmail"]
    }
  ]
}
```

**Token count:** ~150 tokens

**What if TOON can do this in 50 tokens?** That's 3x compression!

### Our Use Cases

1. **Slice Detection Context**
   - Send a graph of 500 files to an LLM
   - Current: ~50k tokens
   - With TOON: ~15k tokens? (3x savings)

2. **Contract Generation**
   - Send API endpoint metadata
   - Current: ~10k tokens per slice
   - With TOON: ~3k tokens?

3. **Multi-Agent Communication**
   - Codex → Claude handoff
   - Current: full JSON context
   - With TOON: compressed context

4. **MCP Tool Responses**
   - arela_search results
   - Current: JSON with full chunks
   - With TOON: token-optimized format

## Research Questions

### 1. What is TOON?

- **Official definition:** What is Token-Oriented Object Notation?
- **Creator/Origin:** Who built it? Academic paper or industry project?
- **Specification:** Is there a formal spec? A GitHub repo?
- **Maturity:** Production-ready or experimental?
- **Adoption:** Who's using it? Any case studies?

### 2. How Does It Work?

- **Compression techniques:** What makes it more token-efficient than JSON?
- **Syntax:** What does TOON actually look like?
- **Parsing:** How do you encode/decode TOON?
- **Compatibility:** Can LLMs understand TOON natively, or do they need prompting?
- **Trade-offs:** What do you lose vs JSON (readability, structure, types)?

### 3. Token Savings

- **Benchmarks:** Real-world compression ratios (TOON vs JSON)
- **Use cases:** Where does TOON excel? Where does it fail?
- **Diminishing returns:** At what point is JSON already efficient enough?
- **Cost analysis:** Token savings → dollar savings at scale

### 4. LLM Compatibility

- **GPT-4/Claude:** Do they understand TOON out of the box?
- **Local models:** Does Ollama/Llama work with TOON?
- **Prompting:** Do you need special system prompts?
- **Accuracy:** Does compression hurt LLM understanding?

### 5. Implementation

- **Libraries:** Are there TypeScript/Python libraries for TOON?
- **Encoding:** How do we convert JSON → TOON?
- **Decoding:** How do we convert TOON → JSON?
- **Validation:** How do we ensure TOON is valid?
- **Debugging:** How do we debug TOON (is it human-readable)?

(A sketch of what such a round trip could look like, under stated assumptions, follows.)
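To make the encoding/decoding questions concrete, here is a minimal TypeScript sketch of a JSON → TOON round trip. It assumes the pipe/comma-delimited layout used in the hypothetical examples later in this document (`path|functions|imports`); the real TOON syntax, escaping rules, and library API are exactly what this research needs to confirm.

```typescript
// Hypothetical TOON round trip for flat file metadata.
// Assumes the pipe/comma-delimited layout from the examples below;
// the real TOON spec may differ in every detail.

interface FileMeta {
  path: string;
  functions: string[];
  imports: string[];
}

// JSON → "TOON": one record per line, fields joined by "|",
// list items joined by ",". No escaping: this breaks if values
// contain "|" or "," (one of the edge cases to research).
function encodeFileMeta(files: FileMeta[]): string {
  return files
    .map((f) => [f.path, f.functions.join(","), f.imports.join(",")].join("|"))
    .join("\n");
}

// "TOON" → JSON: the inverse, relying on a fixed field order
// instead of repeated keys (this is where the token savings come from).
function decodeFileMeta(toon: string): FileMeta[] {
  return toon.split("\n").map((line) => {
    const [path = "", functions = "", imports = ""] = line.split("|");
    return {
      path,
      functions: functions ? functions.split(",") : [],
      imports: imports ? imports.split(",") : [],
    };
  });
}

const sample: FileMeta[] = [
  {
    path: "src/auth/login.ts",
    functions: ["handleLogin", "validateCredentials"],
    imports: ["bcrypt", "jsonwebtoken"],
  },
];

const encoded = encodeFileMeta(sample);
console.log(encoded);
// → src/auth/login.ts|handleLogin,validateCredentials|bcrypt,jsonwebtoken
console.log(JSON.stringify(decodeFileMeta(encoded)) === JSON.stringify(sample)); // true
```

Note the design trade-off this sketch exposes: the token savings come from dropping keys in favor of positional fields, which means both sides must agree on a schema out of band.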
### 6. Integration with Arela

**Where to use TOON:**

- ✅ MCP tool responses (arela_search, graph queries)
- ✅ Agent communication (Codex → Claude handoffs)
- ✅ LLM prompts (slice detection, contract generation)
- ✅ RAG context (compressed chunks)
- ❌ Config files (keep JSON for human readability)
- ❌ API responses (keep JSON for compatibility)

**Migration strategy:**

- Phase 1: Internal LLM communication only
- Phase 2: MCP tool responses
- Phase 3: Agent handoffs
- Phase 4: RAG context compression

### 7. Comparison with Alternatives

**TOON vs JSON:**

- Token efficiency
- Readability
- Tooling support
- LLM compatibility

**TOON vs YAML:**

- Token efficiency
- Parsing speed
- Complexity

**TOON vs Protobuf/MessagePack:**

- Token efficiency (binary vs text)
- LLM compatibility (can LLMs read binary?)

**TOON vs custom compression:**

- Should we build our own token-optimized format?
- Or use TOON as-is?

### 8. Real-World Examples

**Example 1: File Metadata**

**JSON:**

```json
{
  "path": "src/auth/login.ts",
  "functions": ["handleLogin", "validateCredentials"],
  "imports": ["bcrypt", "jsonwebtoken"]
}
```

**TOON (hypothetical):**

```
src/auth/login.ts|handleLogin,validateCredentials|bcrypt,jsonwebtoken
```

**Token savings:** 50 tokens → 15 tokens (3.3x)

**Example 2: Graph Node**

**JSON:**

```json
{
  "id": "file_123",
  "type": "typescript",
  "dependencies": ["file_456", "file_789"],
  "exports": ["User", "AuthService"]
}
```

**TOON (hypothetical):**

```
123:ts:456,789:User,AuthService
```

**Token savings:** 80 tokens → 20 tokens (4x)

### 9. Performance Considerations

- **Encoding speed:** How fast is JSON → TOON conversion?
- **Decoding speed:** How fast is TOON → JSON conversion?
- **Memory usage:** Does TOON reduce memory footprint?
- **Latency:** Does compression add latency to LLM calls?

### 10. Edge Cases & Limitations

- **Nested structures:** How does TOON handle deep nesting?
- **Large arrays:** Does TOON compress arrays efficiently?
- **Special characters:** How to escape/encode special chars?
- **Unicode:** Does TOON support Unicode?
- **Null/undefined:** How are these represented?
- **Type safety:** Does TOON preserve types (string vs number)?

## What We Need

### 1. Technical Specification

- Official TOON spec or documentation
- Syntax guide with examples
- Encoding/decoding algorithms
- Validation rules

### 2. Benchmarks

- Token count comparisons (TOON vs JSON)
- Real-world use cases with measurements
- Cost analysis (token savings → dollar savings)
- Performance benchmarks (encoding/decoding speed)

### 3. Implementation Guide

- TypeScript library for TOON (if one exists)
- Code examples for encoding/decoding
- Integration patterns with LLMs
- Best practices and gotchas

### 4. Compatibility Matrix

- Which LLMs support TOON?
- Does it require special prompting?
- Accuracy impact (does compression hurt understanding?)
- Fallback strategies if TOON fails

### 5. Migration Strategy

- How to adopt TOON incrementally?
- Where to use TOON vs keep JSON?
- Backward compatibility considerations
- Testing and validation approach

## Success Criteria

TOON is worth adopting if:

1. **Token savings ≥ 2x** (vs optimized JSON)
2. **LLM accuracy ≥ 95%** (vs JSON baseline)
3. **Encoding/decoding < 10ms** (per operation)
4. **Works with Ollama** (local model support)
5. **TypeScript library exists** (or is easy to build)
6. **Production-ready** (not experimental)

Criteria 1 and 3 are directly measurable; a sketch of how we could check them follows this list.
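As a rough harness for criteria 1 and 3, the sketch below compares approximate token counts and times an encoder. The chars/4 heuristic is a crude stand-in for a real tokenizer (e.g., tiktoken for OpenAI models) and is only good for order-of-magnitude comparisons; real benchmarks must use the actual TOON library and the target model's tokenizer.

```typescript
// Rough benchmark harness for success criteria 1 (token savings ≥ 2x)
// and 3 (encoding < 10ms per operation). Everything here is an
// assumption-laden sketch: swap in a real tokenizer and the real
// TOON encoder once the research identifies them.

// Crude token approximation: ~4 characters per token for English/code.
const approxTokens = (s: string): number => Math.ceil(s.length / 4);

// Criterion 1: compare a JSON payload against its TOON rendering.
function compareTokens(label: string, json: unknown, toon: string): void {
  const jsonTokens = approxTokens(JSON.stringify(json));
  const toonTokens = approxTokens(toon);
  console.log(`${label}: JSON ~${jsonTokens} tokens, TOON ~${toonTokens} tokens`);
  console.log(`  savings: ${(jsonTokens / toonTokens).toFixed(1)}x`);
}

// Criterion 3: average encode time over many runs.
function timeEncode<T>(encode: (input: T) => string, input: T, runs = 1000): number {
  const start = performance.now();
  for (let i = 0; i < runs; i++) encode(input);
  return (performance.now() - start) / runs; // avg ms per operation
}

compareTokens(
  "File metadata",
  {
    path: "src/auth/login.ts",
    functions: ["handleLogin", "validateCredentials"],
    imports: ["bcrypt", "jsonwebtoken"],
  },
  "src/auth/login.ts|handleLogin,validateCredentials|bcrypt,jsonwebtoken"
);
```

The same harness, pointed at real production payloads rather than toy examples, is how we would validate the 2x threshold before committing to a migration.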
## Specific Questions for Validation

1. **Is TOON real and production-ready?**
   - Or is it a concept/proposal?
   - Who's using it at scale?

2. **What are the actual token savings?**
   - Benchmarks on real data (not toy examples)
   - Does it work for our use cases (graph data, file metadata)?

3. **Can we integrate it in 1 week?**
   - Is there a library, or do we build from scratch?
   - How much refactoring is needed?

4. **What's the ROI?**
   - Token savings → cost savings
   - Is it worth the complexity?
   - When does it pay for itself?

5. **What are the risks?**
   - LLM compatibility issues?
   - Debugging difficulty?
   - Maintenance burden?

## Our Specific Use Case: Slice Detection

**Current approach (JSON):**

```json
{
  "files": [
    {
      "id": 1,
      "path": "src/auth/login.ts",
      "imports": [2, 3],
      "functions": ["handleLogin"]
    }
    // ... 500 more files
  ]
}
```

**Estimated tokens:** 50,000

**With TOON (hypothetical):**

```
1:src/auth/login.ts:2,3:handleLogin
2:src/auth/user.ts:4,5:getUser
...
```

**Estimated tokens:** 15,000 (3x savings)

(A sketch of this graph encoding appears in the appendix at the end of this document.)

**Impact:**

- Faster LLM calls (less data to process)
- Cheaper (3x fewer tokens)
- Better context (can fit more files in the context window)

## Expected Output

Please provide:

1. **Executive Summary** (1 page)
   - What is TOON?
   - Is it production-ready?
   - Should Arela adopt it?
   - Recommended approach

2. **Technical Deep Dive** (3-5 pages)
   - TOON specification and syntax
   - Encoding/decoding algorithms
   - Token savings benchmarks
   - LLM compatibility analysis

3. **Implementation Plan** (2-3 pages)
   - Phase 1: Proof of concept
   - Phase 2: MCP integration
   - Phase 3: Agent communication
   - Phase 4: Full adoption
   - Timeline and effort estimates

4. **Comparative Analysis** (2 pages)
   - TOON vs JSON
   - TOON vs YAML
   - TOON vs custom compression
   - When to use each

5. **Code Examples** (if available)
   - TypeScript encoding/decoding
   - Integration with LLM calls
   - Before/after comparisons

6. **Risk Assessment** (1 page)
   - What could go wrong?
   - Mitigation strategies
   - Fallback plans

7. **References**
   - Official TOON documentation
   - GitHub repos or libraries
   - Blog posts and case studies
   - Benchmarks and evaluations

## Context from Previous Research

We've already identified token efficiency as critical:

- **3-layer architecture:** Minimize tokens sent to big models
- **Symbol tables:** Use IDs instead of raw data
- **Delta updates:** Send diffs, not full files
- **Hierarchical context:** Summary → drill down

**TOON could be the missing piece**: a standardized format for token-optimized data serialization.

## Integration with Meta-RAG

If we adopt both TOON and Meta-RAG:

```
Query → Meta-RAG (classify + route) → Retrieve data → Encode as TOON → Send to LLM
```

**Combined impact:**

- Meta-RAG: right context (quality)
- TOON: compressed context (quantity)
- Result: 10x better context efficiency

## Timeline

**Urgency:** HIGH

- Could impact v4.0.1 (slice extraction) if fast to implement
- Definitely want it for v4.2.0 (Meta-RAG)
- Research should be completed in 1 day (faster than Meta-RAG)

## Audience

- **Primary:** Arela development team (immediate implementation)
- **Secondary:** AI community (potential contribution back)

---

**Please research this IMMEDIATELY. If TOON delivers 2-3x token savings, it's a no-brainer for Arela.**

**This could be the difference between fitting 500 files vs 1,500 files in context. That's a game-changer for slice detection and architecture analysis.**

**Focus on: Is it real? Does it work? Can we ship it in 1 week?**
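---

**Appendix: Hypothetical Slice-Graph Encoding**

For reference, here is a minimal TypeScript sketch of the `id:path:imports:functions` layout from the slice-detection section above. Every detail (field order, delimiters, lack of escaping) is an assumption for illustration, not real TOON; it exists only to show what an encoder of that shape would look like.

```typescript
// Hypothetical slice-detection graph encoding: one file per line,
// "id:path:imports:functions" with comma-separated lists, as in the
// example above. A colon inside a path would break this naive scheme:
// exactly the kind of escaping question the research must answer.

interface FileNode {
  id: number;
  path: string;
  imports: number[];
  functions: string[];
}

function encodeGraph(files: FileNode[]): string {
  return files
    .map((f) => `${f.id}:${f.path}:${f.imports.join(",")}:${f.functions.join(",")}`)
    .join("\n");
}

const graph: FileNode[] = [
  { id: 1, path: "src/auth/login.ts", imports: [2, 3], functions: ["handleLogin"] },
  { id: 2, path: "src/auth/user.ts", imports: [4, 5], functions: ["getUser"] },
];

console.log(encodeGraph(graph));
// 1:src/auth/login.ts:2,3:handleLogin
// 2:src/auth/user.ts:4,5:getUser
```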