# Deep Research Request: TOON (Token-Oriented Object Notation) for Arela's Context Compression
## Context
Arela is building an AI Technical Co-Founder with a critical challenge: **token efficiency**.
From our strategic research:
> "Arela is a context router that turns 200k tokens of random junk into 20k tokens of highly compressed, semantically dense input per call."
**Current approach:** JSON plus ad-hoc compression techniques (symbol-table IDs, delta updates, hierarchical context)
**New discovery:** TOON (Token-Oriented Object Notation) - a format specifically designed to minimize LLM token consumption
## The Problem We're Solving
### Token Waste in Current Systems
**Example: Sending a file to an LLM**
**JSON (current):**
```json
{
  "file": "src/auth/login.ts",
  "type": "typescript",
  "functions": [
    {
      "name": "handleLogin",
      "parameters": ["email", "password"],
      "returnType": "Promise<User>",
      "lineStart": 10,
      "lineEnd": 25
    }
  ],
  "imports": [
    {
      "source": "./database",
      "items": ["getUserByEmail"]
    }
  ]
}
```
**Token count:** ~150 tokens
**What if TOON can do this in 50 tokens?** That's 3x compression!
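Before trusting any ratio, we should measure it on our own payloads. A minimal sketch of such a harness; `encodeToon` is hypothetical until the research confirms a real library, and the tokenizer should be whatever matches the target model:

```typescript
// Minimal harness for validating compression claims on real payloads.
// `countTokens` is a stand-in for the target model's tokenizer (e.g. a
// tiktoken-style BPE); `encodeToon` is hypothetical pending research.

type Tokenizer = (text: string) => number;

function compareFormats(
  payload: unknown,
  encodeToon: (value: unknown) => string,
  countTokens: Tokenizer,
): { jsonTokens: number; toonTokens: number; ratio: number } {
  const json = JSON.stringify(payload);
  const toon = encodeToon(payload);
  const jsonTokens = countTokens(json);
  const toonTokens = countTokens(toon);
  return { jsonTokens, toonTokens, ratio: jsonTokens / toonTokens };
}

// Crude fallback: ~4 characters per token is a common rule of thumb.
// Replace with the real model tokenizer before trusting any numbers.
const roughCount: Tokenizer = (text) => Math.ceil(text.length / 4);
```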
### Our Use Cases
1. **Slice Detection Context**
- Send graph of 500 files to LLM
- Current: ~50k tokens
- With TOON: ~15k tokens? (3x savings)
2. **Contract Generation**
- Send API endpoint metadata
- Current: ~10k tokens per slice
- With TOON: ~3k tokens?
3. **Multi-Agent Communication**
- Codex → Claude handoff
- Current: Full JSON context
- With TOON: Compressed context
4. **MCP Tool Responses**
- arela_search results
- Current: JSON with full chunks
- With TOON: Token-optimized format
## Research Questions
### 1. What is TOON?
- **Official definition:** What is Token-Oriented Object Notation?
- **Creator/Origin:** Who built it? Academic paper or industry project?
- **Specification:** Is there a formal spec? GitHub repo?
- **Maturity:** Production-ready or experimental?
- **Adoption:** Who's using it? Any case studies?
### 2. How Does It Work?
- **Compression techniques:** What makes it more token-efficient than JSON?
- **Syntax:** What does TOON actually look like?
- **Parsing:** How do you encode/decode TOON?
- **Compatibility:** Can LLMs understand TOON natively, or does it need prompting?
- **Trade-offs:** What do you lose vs JSON (readability, structure, types)?
### 3. Token Savings
- **Benchmarks:** Real-world compression ratios (TOON vs JSON)
- **Use cases:** Where does TOON excel? Where does it fail?
- **Diminishing returns:** At what point is JSON already efficient enough?
- **Cost analysis:** Token savings → dollar savings at scale
### 4. LLM Compatibility
- **GPT-4/Claude:** Do they understand TOON out of the box?
- **Local models:** Does Ollama/Llama work with TOON?
- **Prompting:** Do you need special system prompts?
- **Accuracy:** Does compression hurt LLM understanding? (a paired-evaluation sketch follows this list)
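One way to answer the accuracy question: run the same Q&A set over JSON-encoded and TOON-encoded context and compare scores. A sketch, where `askLLM` stands in for Arela's actual model client and `encodeToon` remains hypothetical:

```typescript
// Paired evaluation: ask identical questions over JSON-encoded and
// TOON-encoded context, then compare accuracy. `askLLM` and `encodeToon`
// are placeholders for components still to be chosen.

interface QACase {
  question: string;
  expected: string; // normalized gold answer
}

async function accuracyDelta(
  payload: unknown,
  cases: QACase[],
  encodeToon: (v: unknown) => string,
  askLLM: (context: string, question: string) => Promise<string>,
): Promise<{ json: number; toon: number }> {
  const score = async (context: string) => {
    let correct = 0;
    for (const c of cases) {
      const answer = await askLLM(context, c.question);
      if (answer.trim().toLowerCase() === c.expected.toLowerCase()) correct++;
    }
    return correct / cases.length;
  };
  return {
    json: await score(JSON.stringify(payload)),
    toon: await score(encodeToon(payload)),
  };
}
```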
### 5. Implementation
- **Libraries:** Are there TypeScript/Python libraries for TOON?
- **Encoding:** How to convert JSON → TOON?
- **Decoding:** How to convert TOON → JSON? (a round-trip sketch follows this list)
- **Validation:** How to ensure TOON is valid?
- **Debugging:** How to debug TOON (is it human-readable)?
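Whatever library turns up, the first gate is lossless round-tripping. A sketch, assuming hypothetical `encodeToon`/`decodeToon` signatures:

```typescript
// Round-trip check: JSON → TOON → JSON must be lossless before TOON can
// carry graph data or file metadata. The codec signatures are assumptions;
// swap in the real library once identified.

declare function encodeToon(value: unknown): string;
declare function decodeToon(text: string): unknown;

function roundTripsLosslessly(value: unknown): boolean {
  const restored = decodeToon(encodeToon(value));
  // Canonical-JSON equality is good enough for plain data (no functions,
  // dates, or undefined), which is all we would send to an LLM. It assumes
  // the decoder preserves key order; use a deep-equal helper otherwise.
  return JSON.stringify(restored) === JSON.stringify(value);
}
```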
### 6. Integration with Arela
**Where to use TOON:**
- ✅ MCP tool responses (arela_search, graph queries)
- ✅ Agent communication (Codex → Claude handoffs)
- ✅ LLM prompts (slice detection, contract generation)
- ✅ RAG context (compressed chunks)
- ❌ Config files (keep JSON for human readability)
- ❌ API responses (keep JSON for compatibility)
**Migration strategy** (a pluggable-serializer sketch follows these phases):
- Phase 1: Internal LLM communication only
- Phase 2: MCP tool responses
- Phase 3: Agent handoffs
- Phase 4: RAG context compression
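The phases map naturally onto a per-channel codec switch, so each surface can flip to TOON (and back) independently. A sketch; the channel names are ours, and the TOON codec slot is a placeholder:

```typescript
// Pluggable serializer so each channel (MCP responses, agent handoffs,
// RAG context) can migrate to TOON independently and revert instantly.

interface Codec {
  name: "json" | "toon";
  encode(value: unknown): string;
  decode(text: string): unknown;
}

const jsonCodec: Codec = {
  name: "json",
  encode: (v) => JSON.stringify(v),
  decode: (t) => JSON.parse(t),
};

// Per-channel flags let us roll out phase by phase.
const channelCodecs: Record<string, Codec> = {
  "llm-internal": jsonCodec, // Phase 1: flip to TOON first
  "mcp-responses": jsonCodec, // Phase 2
  "agent-handoff": jsonCodec, // Phase 3
  "rag-context": jsonCodec, // Phase 4
};

function serializeFor(channel: string, value: unknown): string {
  return (channelCodecs[channel] ?? jsonCodec).encode(value);
}
```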
### 7. Comparison with Alternatives
**TOON vs JSON:**
- Token efficiency
- Readability
- Tooling support
- LLM compatibility
**TOON vs YAML:**
- Token efficiency
- Parsing speed
- Complexity
**TOON vs Protobuf/MessagePack:**
- Token efficiency (binary vs text)
- LLM compatibility (can LLMs read binary?)
**TOON vs Custom Compression:**
- Should we build our own token-optimized format?
- Or use TOON as-is?
### 8. Real-World Examples
**Example 1: File Metadata**
**JSON:**
```json
{
  "path": "src/auth/login.ts",
  "functions": ["handleLogin", "validateCredentials"],
  "imports": ["bcrypt", "jsonwebtoken"]
}
```
**TOON (hypothetical):**
```
src/auth/login.ts|handleLogin,validateCredentials|bcrypt,jsonwebtoken
```
**Token savings:** 50 tokens → 15 tokens (3.3x)
**Example 2: Graph Node**
**JSON:**
```json
{
  "id": "file_123",
  "type": "typescript",
  "dependencies": ["file_456", "file_789"],
  "exports": ["User", "AuthService"]
}
```
**TOON (hypothetical):**
```
123:ts:456,789:User,AuthService
```
**Token savings:** 80 tokens → 20 tokens (4x)
### 9. Performance Considerations
- **Encoding speed:** How fast is JSON → TOON conversion? (micro-benchmark sketch after this list)
- **Decoding speed:** How fast is TOON → JSON conversion?
- **Memory usage:** Does TOON reduce memory footprint?
- **Latency:** Does compression add latency to LLM calls?
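Encoding/decoding speed is easy to measure once a codec exists. A micro-benchmark sketch against the <10ms-per-operation budget in the success criteria below; `performance.now()` is standard in Node 16+ and browsers, and the codec calls are placeholders:

```typescript
// Micro-benchmark for the <10ms encode/decode budget.

function benchmark(label: string, fn: () => void, iterations = 1_000): number {
  fn(); // warm up the JIT before timing
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  const perOpMs = (performance.now() - start) / iterations;
  console.log(`${label}: ${perOpMs.toFixed(3)} ms/op`);
  return perOpMs;
}

// Usage sketch (encodeToon/decodeToon assumed):
// const payload = loadRepresentativeGraphPayload();
// benchmark("encode", () => encodeToon(payload));
// benchmark("decode", () => decodeToon(encodeToon(payload)));
```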
### 10. Edge Cases & Limitations
- **Nested structures:** How does TOON handle deep nesting?
- **Large arrays:** Does TOON compress arrays efficiently?
- **Special characters:** How to escape/encode special chars?
- **Unicode:** Does TOON support Unicode?
- **Null/undefined:** How are these represented?
- **Type safety:** Does TOON preserve types (string vs number)? (round-trip fixtures sketched below)
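Most of these edge cases reduce to one test: does the value survive a round trip with types intact? A fixture sketch; the codec itself is still an assumption:

```typescript
// Edge-case fixtures for round-trip fidelity. Each fixture should pass:
//   deepEqual(decodeToon(encodeToon(v)), v)

const edgeCases: unknown[] = [
  { deeply: { nested: { structure: [1, [2, [3]]] } } },
  { big: Array.from({ length: 10_000 }, (_, i) => i) },
  { escaped: 'pipes | colons : commas , quotes " newline \n' },
  { unicode: "naïve café 日本語 🚀" },
  { nullable: null, list: [null, "x"] },
  { typed: { n: 42, s: "42", b: true } }, // number vs string must not blur
];

function deepEqual(a: unknown, b: unknown): boolean {
  return JSON.stringify(a) === JSON.stringify(b); // fine for plain JSON data
}
```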
## What We Need
### 1. Technical Specification
- Official TOON spec or documentation
- Syntax guide with examples
- Encoding/decoding algorithms
- Validation rules
### 2. Benchmarks
- Token count comparisons (TOON vs JSON)
- Real-world use cases with measurements
- Cost analysis (token savings → dollar savings)
- Performance benchmarks (encoding/decoding speed)
### 3. Implementation Guide
- TypeScript library for TOON (if exists)
- Code examples for encoding/decoding
- Integration patterns with LLMs
- Best practices and gotchas
### 4. Compatibility Matrix
- Which LLMs support TOON?
- Does it require special prompting?
- Accuracy impact (does compression hurt understanding?)
- Fallback strategies if TOON fails (a TOON-then-JSON fallback is sketched after this list)
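A plausible fallback shape: send TOON first, detect confusion, and retry with JSON. Everything here is an assumption to validate during the research, including the confusion heuristic:

```typescript
// Fallback sketch: prefer TOON, re-send as JSON if the model's reply
// suggests it failed to parse the context.

async function askWithFallback(
  payload: unknown,
  question: string,
  encodeToon: (v: unknown) => string,
  askLLM: (context: string, q: string) => Promise<string>,
  looksConfused: (answer: string) => boolean,
): Promise<string> {
  const first = await askLLM(encodeToon(payload), question);
  if (!looksConfused(first)) return first;
  // Fall back to plain JSON, which every instruction-tuned model handles.
  return askLLM(JSON.stringify(payload), question);
}
```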
### 5. Migration Strategy
- How to adopt TOON incrementally?
- Where to use TOON vs keep JSON?
- Backward compatibility considerations
- Testing and validation approach
## Success Criteria
TOON is worth adopting if all of the following hold (an executable gate is sketched after this list):
1. **Token savings ≥ 2x** (vs optimized JSON)
2. **LLM accuracy ≥ 95%** (vs JSON baseline)
3. **Encoding/decoding < 10ms** (per operation)
4. **Works with Ollama** (local model support)
5. **TypeScript library exists** (or easy to build)
6. **Production-ready** (not experimental)
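For the record, the same gate as code; the field names are ours, and the thresholds are copied from the list above:

```typescript
// The success criteria above, as an executable gate over measured results.

interface EvalResults {
  tokenSavingsRatio: number; // vs optimized JSON
  llmAccuracy: number; // 0..1, vs JSON baseline
  codecLatencyMs: number; // per encode/decode operation
  worksWithOllama: boolean;
  typescriptLibraryViable: boolean; // exists, or easy to build
  productionReady: boolean;
}

function shouldAdoptToon(r: EvalResults): boolean {
  return (
    r.tokenSavingsRatio >= 2 &&
    r.llmAccuracy >= 0.95 &&
    r.codecLatencyMs < 10 &&
    r.worksWithOllama &&
    r.typescriptLibraryViable &&
    r.productionReady
  );
}
```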
## Specific Questions for Validation
1. **Is TOON real and production-ready?**
- Or is it a concept/proposal?
- Who's using it at scale?
2. **What are the actual token savings?**
- Benchmarks on real data (not toy examples)
- Does it work for our use cases (graph data, file metadata)?
3. **Can we integrate it in 1 week?**
- Is there a library or do we build from scratch?
- How much refactoring is needed?
4. **What's the ROI?**
- Token savings → cost savings
- Is it worth the complexity?
- When does it pay for itself?
5. **What are the risks?**
- LLM compatibility issues?
- Debugging difficulty?
- Maintenance burden?
## Our Specific Use Case: Slice Detection
**Current approach (JSON):**
```json
{
  "files": [
    {
      "id": 1,
      "path": "src/auth/login.ts",
      "imports": [2, 3],
      "functions": ["handleLogin"]
    },
    // ... 500 more files
  ]
}
```
**Estimated tokens:** 50,000
**With TOON (hypothetical):**
```
1:src/auth/login.ts:2,3:handleLogin
2:src/auth/user.ts:4,5:getUser
...
```
**Estimated tokens:** 15,000 (3x savings)
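To make the 50k → 15k estimate testable, here is an encoder for the hypothetical one-line-per-file format above. This is our own sketch, not TOON's actual syntax, and it assumes paths contain no `:`:

```typescript
// Encoder for the hypothetical line format (id:path:imports:functions),
// so the estimate can be checked against a real tokenizer.

interface FileNode {
  id: number;
  path: string;
  imports: number[];
  functions: string[];
}

function encodeCompact(files: FileNode[]): string {
  return files
    .map((f) =>
      [f.id, f.path, f.imports.join(","), f.functions.join(",")].join(":"),
    )
    .join("\n");
}

// encodeCompact([{ id: 1, path: "src/auth/login.ts", imports: [2, 3],
//   functions: ["handleLogin"] }])
// → "1:src/auth/login.ts:2,3:handleLogin"
```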
**Impact:**
- Faster LLM calls (less data to process)
- Cheaper (3x fewer tokens)
- Better context (can fit more files in context window)
## Expected Output
Please provide:
1. **Executive Summary** (1 page)
- What is TOON?
- Is it production-ready?
- Should Arela adopt it?
- Recommended approach
2. **Technical Deep Dive** (3-5 pages)
- TOON specification and syntax
- Encoding/decoding algorithms
- Token savings benchmarks
- LLM compatibility analysis
3. **Implementation Plan** (2-3 pages)
- Phase 1: Proof of concept
- Phase 2: MCP integration
- Phase 3: Agent communication
- Phase 4: Full adoption
- Timeline and effort estimates
4. **Comparative Analysis** (2 pages)
- TOON vs JSON
- TOON vs YAML
- TOON vs custom compression
- When to use each
5. **Code Examples** (if available)
- TypeScript encoding/decoding
- Integration with LLM calls
- Before/after comparisons
6. **Risk Assessment** (1 page)
- What could go wrong?
- Mitigation strategies
- Fallback plans
7. **References**
- Official TOON documentation
- GitHub repos or libraries
- Blog posts and case studies
- Benchmarks and evaluations
## Context from Previous Research
We've already identified token efficiency as critical:
- **3-layer architecture:** Minimize tokens sent to big models
- **Symbol tables:** Use IDs instead of raw data
- **Delta updates:** Send diffs, not full files
- **Hierarchical context:** Summary → drill down
**TOON could be the missing piece** - a standardized format for token-optimized data serialization.
## Integration with Meta-RAG
If we adopt both TOON and Meta-RAG:
```
Query → Meta-RAG (classify + route) → Retrieve data → Encode as TOON → Send to LLM
```
**Combined impact:**
- Meta-RAG: Right context (quality)
- TOON: Compressed context (quantity)
- Result: 10x better context efficiency
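As a composition sketch of that pipeline, with every stage a placeholder (`classifyAndRoute`, `retrieve`, `encodeToon`, and `callLLM` are all still to be designed or researched):

```typescript
// End-to-end sketch: Meta-RAG picks the right context (quality),
// TOON compresses it (quantity), then the LLM answers.

async function answerWithCompressedContext(
  query: string,
  deps: {
    classifyAndRoute: (q: string) => Promise<string>; // Meta-RAG routing
    retrieve: (route: string, q: string) => Promise<unknown>;
    encodeToon: (v: unknown) => string;
    callLLM: (context: string, q: string) => Promise<string>;
  },
): Promise<string> {
  const route = await deps.classifyAndRoute(query);
  const data = await deps.retrieve(route, query);
  const context = deps.encodeToon(data); // compressed context
  return deps.callLLM(context, query);
}
```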
## Timeline
**Urgency:** HIGH
- Could impact v4.0.1 (slice extraction) if fast to implement
- Definitely want for v4.2.0 (Meta-RAG)
- Research should complete in 1 day (faster than Meta-RAG)
## Audience
- **Primary:** Arela development team (immediate implementation)
- **Secondary:** AI community (potential contribution back)
---
**Please research this IMMEDIATELY. If TOON delivers 2-3x token savings, it's a no-brainer for Arela.**
**This could be the difference between fitting 500 files vs 1,500 files in context. That's a game-changer for slice detection and architecture analysis.**
**Focus on: Is it real? Does it work? Can we ship it in 1 week?**