# llm-md

Convert JSON to Markdown optimized for LLM consumption. Automatically detects JSON structure and converts to the most LLM-friendly Markdown format based on research-backed strategies.

[![NPM Version](https://img.shields.io/npm/v/llm-md.svg)](https://www.npmjs.com/package/llm-md) [![Test Coverage](https://img.shields.io/badge/coverage-89%25-brightgreen.svg)](https://github.com/mslavov/llm-md) [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

## Why llm-md?

Format choice significantly impacts LLM performance - up to **48% accuracy variation** and **16-60% token efficiency differences** based on [empirical research](tmp/research.md). This library implements research-backed conversion strategies that optimize for:

- **Accuracy**: Markdown achieves 16% better comprehension than JSON
- **Efficiency**: Automatic format selection saves 16% tokens on average
- **Intelligence**: YAML for deep nesting (62% accuracy), tables for uniform arrays (52% accuracy), key-value for lookups (61% accuracy)

## Installation

```bash
npm install llm-md
```

## Quick Start

```typescript
import llmd from 'llm-md';

// Simple conversion - automatic strategy detection
const markdown = llmd({ name: 'Alice', age: 30, active: true });
console.log(markdown);
// - **name**: Alice
// - **age**: 30
// - **active**: true
```

## CLI Usage

llm-md can be used from the command line to convert JSON files or stdin input to Markdown.
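For scripts that prefer to stay in Node.js, the same "JSON text in, Markdown out" flow can be approximated in code. The sketch below is illustrative only: `jsonTextToMarkdown`, `Converter`, and `demoConvert` are hypothetical names that are not part of llm-md, and `demoConvert` is a stand-in for the library's converter so the snippet runs without the package installed.

```typescript
// Hypothetical type: stands in for llm-md's default export (an assumption).
type Converter = (data: unknown) => string;

// Parse raw JSON text (e.g. read from a file or stdin) and pass it to a converter.
function jsonTextToMarkdown(raw: string, convert: Converter): string {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    // Surface a readable error instead of a bare SyntaxError.
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`Invalid JSON input: ${reason}`);
  }
  return convert(data);
}

// Stand-in converter for demonstration only; it is NOT llm-md's conversion logic.
const demoConvert: Converter = (data) =>
  Object.entries(data as Record<string, unknown>)
    .map(([key, value]) => `- **${key}**: ${String(value)}`)
    .join("\n");

console.log(jsonTextToMarkdown('{"name":"Alice","age":30}', demoConvert));
```

In a real script you would pass llm-md's default export (`llmd` in the Quick Start) in place of `demoConvert`.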
### Installation

```bash
# Install globally for CLI usage
npm install -g llm-md
```

### Basic Usage

```bash
# Convert from file to stdout
llm-md input.json

# Pipe JSON through llm-md
cat data.json | llm-md

# Save output to file
llm-md input.json -o output.md
llm-md input.json > output.md

# Pipe with redirection
cat api-response.json | llm-md > formatted.md
```

### CLI Options

```bash
Options:
  -o, --output <file>   Write output to file (default: stdout)
  --verbose             Include metadata about conversion strategy
  --strategy <name>     Force specific strategy (table, yaml-block, key-value, hybrid, numbered-list)
  --max-depth <n>       Set maximum depth for analysis (default: 10)
  -h, --help            Show help message
  -v, --version         Show version number
```

### CLI Examples

```bash
# Show conversion metadata
llm-md data.json --verbose

# Force table format
llm-md users.json --strategy table

# Raise the analysis depth limit for deep structures (default: 10)
llm-md config.json --max-depth 15

# Combine options
llm-md api-data.json --strategy yaml-block --verbose -o output.md
```

## Features

- 🎯 **Automatic Detection**: Analyzes JSON structure and picks optimal format
- 📊 **Smart Tables**: Converts uniform arrays to clean Markdown tables (>80% key similarity)
- 🌳 **Deep Nesting**: Uses YAML blocks for deeply nested objects (depth > 3)
- 🔑 **Key-Value Lists**: Formats simple objects with bold keys for clarity
- 📝 **Mixed Arrays**: Handles heterogeneous data with numbered lists
- 🚀 **Zero Config**: Works out of the box with sensible, research-backed defaults
- 🔒 **Type Safe**: Full TypeScript support with comprehensive type definitions
- **Fast**: <100ms for 1MB JSON files

## Conversion Strategies

llm-md automatically selects from 5 conversion strategies based on your data structure:

### 1. Markdown Table (Uniform Arrays)

**When:** Array of objects with >80% key similarity

```typescript
const users = [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob', role: 'user' }
];

console.log(llmd(users));
```

Output:

```markdown
| id | name  | role  |
|----|-------|-------|
| 1  | Alice | admin |
| 2  | Bob   | user  |
```

### 2. Key-Value List (Simple Objects)

**When:** Shallow objects with depth 1

```typescript
const user = {
  name: 'John Doe',
  email: 'john@example.com',
  active: true
};

console.log(llmd(user));
```

Output:

```markdown
- **name**: John Doe
- **email**: john@example.com
- **active**: true
```

### 3. YAML Block (Very Deep Nesting)

**When:** Objects with depth > 15

```typescript
const config = {
  app: {
    settings: {
      theme: 'dark',
      features: { chat: true, video: false }
    }
  }
};

console.log(llmd(config));
```

Output:

````markdown
```yaml
app:
  settings:
    theme: dark
    features:
      chat: true
      video: false
```
````

### 4. Hybrid Strategy (Medium Depth)

**When:** Objects with depth 2-15 and multiple sections, or >10 keys

```typescript
const project = {
  info: { name: 'llm-md', version: '1.0.0' },
  author: { name: 'Dev Team', email: 'dev@example.com' }
};

console.log(llmd(project));
```

Output:

```markdown
## info
- **name**: llm-md
- **version**: 1.0.0

## author
- **name**: Dev Team
- **email**: dev@example.com
```

### 5. Numbered List (Mixed Arrays)

**When:** Arrays with different types or non-uniform objects

```typescript
const mixed = [1, 'text', { key: 'value' }, [1, 2, 3], null];

console.log(llmd(mixed));
```

Output:

```markdown
1. `1`
2. `"text"`
3. `{"key":"value"}`
4. `[1,2,3]`
5. `null`
```

## Advanced Usage

### Verbose Mode (with Metadata)

```typescript
import { convertVerbose } from 'llm-md';

const result = convertVerbose(data, { verbose: true });

console.log(result.markdown);
console.log(result.metadata);
// {
//   strategy: 'table',
//   depth: 1,
//   confidence: 1,
//   tokensEstimate: 42
// }
```

### Force Strategy

Override automatic detection when you know best:

```typescript
import { convert } from 'llm-md';

// Force YAML even for shallow data
const markdown = convert(data, { forceStrategy: 'yaml-block' });
```

### Custom Options

Fine-tune the behavior:

```typescript
const markdown = convert(data, {
  maxTableColumns: 10,           // Limit table width (default: 15)
  yamlThreshold: 4,              // Depth to trigger YAML (default: 3)
  tableSimilarityThreshold: 0.9, // Key similarity for tables (default: 0.8)
  maxDepth: 20,                  // Max recursion depth (default: 10)
  verbose: true                  // Include metadata (default: false)
});
```

## API Reference

### `convert(data, options?): string`

Main conversion function. Returns markdown string.

```typescript
function convert(data: unknown, options?: ConversionOptions): string
```

### `convertVerbose(data, options?): ConversionResult`

Returns markdown with metadata about the conversion.

```typescript
function convertVerbose(
  data: unknown,
  options?: ConversionOptions
): ConversionResult
```

### `analyze(data, options?): AnalysisResult`

Analyze data structure without converting. Useful for understanding strategy selection.

```typescript
import { analyze } from 'llm-md';

const analysis = analyze(myData);
console.log(analysis.strategy);   // 'table' | 'key-value' | 'yaml-block' | ...
console.log(analysis.depth);      // Maximum nesting depth
console.log(analysis.uniformity); // 0-1 score for array uniformity
```

### Types

```typescript
interface ConversionOptions {
  verbose?: boolean;                 // Return metadata
  forceStrategy?: Strategy;          // Override detection
  maxTableColumns?: number;          // Table width limit
  maxDepth?: number;                 // Recursion limit
  yamlThreshold?: number;            // Depth for YAML
  tableSimilarityThreshold?: number; // Key similarity threshold
}

type Strategy =
  | 'table'
  | 'key-value'
  | 'yaml-block'
  | 'hybrid'
  | 'numbered-list';

interface ConversionResult {
  markdown: string;
  metadata?: {
    strategy: Strategy;
    depth: number;
    confidence: number;
    tokensEstimate?: number;
  };
}
```

## Real-World Examples

### API Response

```typescript
const apiResponse = {
  status: 'success',
  data: {
    users: [
      { id: 1, username: 'alice', email: 'alice@example.com' },
      { id: 2, username: 'bob', email: 'bob@example.com' }
    ],
    total: 2,
    page: 1
  }
};

console.log(llmd(apiResponse));
```

### Configuration File

```typescript
const config = {
  database: {
    host: 'localhost',
    port: 5432,
    credentials: { username: 'admin', password: '****' }
  },
  cache: { enabled: true, ttl: 3600 }
};

console.log(llmd(config));
```

### GitHub Repository Data

```typescript
const repo = {
  name: 'llm-md',
  description: 'Convert JSON to Markdown for LLMs',
  stars: 100,
  language: 'TypeScript',
  owner: { name: 'Developer', email: 'dev@example.com' }
};

console.log(llmd(repo));
```

## Research-Based Design

llm-md implements strategies based on empirical research:

- **Tables for uniform data**: 52% LLM comprehension accuracy
- **YAML for deep nesting**: 62% accuracy vs 43% for alternatives
- **Key-value for lookups**: 61% accuracy with bold key formatting
- **Avoid CSV**: Only 44% accuracy despite token efficiency
- **>80% similarity threshold**: Optimal point for table conversion
- **Depth > 3**: Automatic YAML switch for deeply nested objects
- **≤15 columns**: Tables become unwieldy beyond this width

These thresholds are
baked into the analyzer to ensure optimal LLM comprehension without configuration.

## Edge Cases Handled

- Circular references (detected and marked)
- Null and undefined values
- Boolean values (rendered as ✓/✗ in tables)
- Special characters (properly escaped)
- Mixed-type arrays
- Empty arrays and objects
- Deeply nested structures (up to configurable limit)

## Performance

- Converts 1MB JSON in <100ms
- Memory-efficient streaming for large datasets
- Token estimation included in verbose mode
- 89% test coverage with comprehensive test suite

## Development

```bash
# Install dependencies
npm install

# Run tests
npm test

# Run tests with coverage
npm run test:coverage

# Build
npm run build

# Lint
npm run lint
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT

## Acknowledgments

Based on research analyzing LLM format comprehension across 1000+ test cases and 11 data formats. See [research documentation](tmp/research.md) for details.

## Related Projects

- [json2md](https://github.com/IonicaBizau/json2md) - JSON to Markdown converter with custom templates
- [tablemark](https://github.com/citycide/tablemark) - Specialized Markdown table generator
- [TOON](https://github.com/toon-format/spec) - Token-oriented object notation for LLMs

---

**Made with ❤️ for better LLM interactions**