# llm-md
Convert JSON to Markdown optimized for LLM consumption. Automatically detects JSON structure and converts to the most LLM-friendly Markdown format based on research-backed strategies.
[npm](https://www.npmjs.com/package/llm-md) · [GitHub](https://github.com/mslavov/llm-md) · [License](LICENSE)
## Why llm-md?
Format choice significantly affects LLM performance: [empirical research](tmp/research.md) measured up to **48% accuracy variation** and **16-60% token efficiency differences** between formats. This library implements research-backed conversion strategies that optimize for:
- **Accuracy**: Markdown achieves 16% better comprehension than JSON
- **Efficiency**: Automatic format selection saves 16% of tokens on average
- **Intelligence**: YAML for deep nesting (62% accuracy), tables for uniform arrays (52% accuracy), key-value for lookups (61% accuracy)
## Installation
```bash
npm install llm-md
```
## Quick Start
```typescript
import llmd from 'llm-md';
// Simple conversion - automatic strategy detection
const markdown = llmd({ name: 'Alice', age: 30, active: true });
console.log(markdown);
// - **name**: Alice
// - **age**: 30
// - **active**: true
```
## CLI Usage
llm-md can be used from the command line to convert JSON files or stdin input to Markdown.
### Installation
```bash
# Install globally for CLI usage
npm install -g llm-md
```
### Basic Usage
```bash
# Convert from file to stdout
llm-md input.json
# Pipe JSON through llm-md
cat data.json | llm-md
# Save output to file
llm-md input.json -o output.md
llm-md input.json > output.md
# Pipe with redirection
cat api-response.json | llm-md > formatted.md
```
### CLI Options
```bash
Options:
  -o, --output <file>   Write output to file (default: stdout)
  --verbose             Include metadata about conversion strategy
  --strategy <name>     Force specific strategy (table, yaml-block, key-value,
                        hybrid, numbered-list)
  --max-depth <n>       Set maximum depth for analysis (default: 10)
  -h, --help            Show help message
  -v, --version         Show version number
```
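The flags above line up with fields of `ConversionOptions` from the API Reference. As a minimal sketch, assuming `--strategy` feeds `forceStrategy` and `--max-depth` feeds `maxDepth` (the README implies this but does not state it), an argument parser could look like the following. `parseCliArgs` and `CliRequest` are hypothetical names, not llm-md's actual CLI code.

```typescript
// Hypothetical sketch: map the documented CLI flags onto conversion options.
// Not llm-md's actual CLI code; the flag-to-option mapping is an assumption.
type Strategy = "table" | "key-value" | "yaml-block" | "hybrid" | "numbered-list";
const STRATEGIES: Strategy[] = ["table", "key-value", "yaml-block", "hybrid", "numbered-list"];

interface CliRequest {
  input?: string;  // positional file; undefined means read from stdin
  output?: string; // -o/--output; undefined means write to stdout
  help: boolean;
  version: boolean;
  options: { verbose?: boolean; forceStrategy?: Strategy; maxDepth?: number };
}

function isStrategy(name: string): name is Strategy {
  return (STRATEGIES as string[]).indexOf(name) !== -1;
}

function parseCliArgs(argv: string[]): CliRequest {
  const req: CliRequest = { help: false, version: false, options: {} };
  let i = 0;
  while (i < argv.length) {
    const arg = argv[i++];
    // Flags that take a value consume the following argument.
    const takeValue = (): string => {
      if (i >= argv.length) throw new Error(`${arg} expects a value`);
      return argv[i++];
    };
    switch (arg) {
      case "-o":
      case "--output":
        req.output = takeValue();
        break;
      case "--verbose":
        req.options.verbose = true;
        break;
      case "--strategy": {
        const name = takeValue();
        if (!isStrategy(name)) throw new Error(`unknown strategy: ${name}`);
        req.options.forceStrategy = name;
        break;
      }
      case "--max-depth": {
        const n = Number(takeValue());
        if (!Number.isInteger(n) || n < 1) throw new Error("--max-depth expects a positive integer");
        req.options.maxDepth = n;
        break;
      }
      case "-h":
      case "--help":
        req.help = true;
        break;
      case "-v":
      case "--version":
        req.version = true;
        break;
      default:
        if (arg.startsWith("-")) throw new Error(`unknown option: ${arg}`);
        req.input = arg;
    }
  }
  return req;
}
```

With no positional argument, `input` stays undefined, which corresponds to the piped `cat data.json | llm-md` form. Rejecting unknown strategy names up front gives a clear error instead of silently falling back to automatic detection.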
### CLI Examples
```bash
# Show conversion metadata
llm-md data.json --verbose
# Force table format
llm-md users.json --strategy table
# Limit depth for deep structures
llm-md config.json --max-depth 15
# Combine options
llm-md api-data.json --strategy yaml-block --verbose -o output.md
```
## Features
- 🎯 **Automatic Detection**: Analyzes JSON structure and picks optimal format
- 📊 **Smart Tables**: Converts uniform arrays to clean Markdown tables (>80% key similarity)
- 🌳 **Deep Nesting**: Uses YAML blocks for deeply nested objects (depth > 3)
- 🔑 **Key-Value Lists**: Formats simple objects with bold keys for clarity
- 📝 **Mixed Arrays**: Handles heterogeneous data with numbered lists
- 🚀 **Zero Config**: Works out of the box with sensible, research-backed defaults
- 🔒 **Type Safe**: Full TypeScript support with comprehensive type definitions
- ⚡ **Fast**: <100ms for 1MB JSON files
## Conversion Strategies
llm-md automatically selects from 5 conversion strategies based on your data structure:
### 1. Markdown Table (Uniform Arrays)
**When:** Array of objects with >80% key similarity
```typescript
const users = [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob', role: 'user' }
];
console.log(llmd(users));
```
Output:
```markdown
| id | name | role |
|----|-------|-------|
| 1 | Alice | admin |
| 2 | Bob | user |
```
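A table like the one above can be produced by a small renderer. This is a minimal sketch, not llm-md's implementation: `toMarkdownTable` and `formatCell` are hypothetical names, columns follow first-seen key order, booleans become ✓/✗ as listed under Edge Cases Handled, and pipes and newlines are escaped so a cell value cannot break the table. Unlike the example output, it does not pad columns; Markdown renderers align table cells either way.

```typescript
// Hypothetical sketch of rendering a uniform array as a Markdown table.
// Column order, cell formatting, and escaping rules are assumptions.
type Row = { [key: string]: unknown };

function formatCell(value: unknown): string {
  if (value === null || value === undefined) return "";     // missing cells stay empty
  if (typeof value === "boolean") return value ? "✓" : "✗"; // per "Edge Cases Handled"
  const text = typeof value === "object" ? JSON.stringify(value) : String(value);
  // A literal pipe would end the cell early, and a newline would end the row.
  return text.replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
}

function toMarkdownTable(rows: Row[]): string {
  // Union of keys in first-seen order, so rows missing a key still line up.
  const columns: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (columns.indexOf(key) === -1) columns.push(key);
    }
  }
  const line = (cells: string[]) => `| ${cells.join(" | ")} |`;
  return [
    line(columns.map((column) => formatCell(column))),
    line(columns.map(() => "---")),
    ...rows.map((row) => line(columns.map((key) => formatCell(row[key])))),
  ].join("\n");
}
```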
### 2. Key-Value List (Simple Objects)
**When:** Shallow objects with depth ≤ 1
```typescript
const user = {
  name: 'John Doe',
  email: 'john@example.com',
  active: true
};
console.log(llmd(user));
```
Output:
```markdown
- **name**: John Doe
- **email**: john@example.com
- **active**: true
```
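The bold-key list above, and the Quick Start output, can be reproduced with a one-bullet-per-key renderer. A minimal sketch, assuming keys are emitted in insertion order and any nested value is inlined as JSON; `toKeyValueList` is a hypothetical name, not an llm-md export.

```typescript
// Hypothetical sketch of the key-value strategy: one bullet per top-level key,
// in insertion order. Nested values are inlined as JSON (an assumption).
function toKeyValueList(obj: { [key: string]: unknown }): string {
  return Object.keys(obj)
    .map((key) => {
      const value = obj[key];
      const text = value !== null && typeof value === "object" ? JSON.stringify(value) : String(value);
      return `- **${key}**: ${text}`;
    })
    .join("\n");
}
```

A production version would also escape Markdown characters such as `*` inside keys, which this sketch leaves as-is.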
### 3. YAML Block (Very Deep Nesting)
**When:** Objects with depth > 15
```typescript
const config = {
  app: {
    settings: {
      theme: 'dark',
      features: {
        chat: true,
        video: false
      }
    }
  }
};
console.log(llmd(config));
```
Output:
````markdown
```yaml
app:
  settings:
    theme: dark
    features:
      chat: true
      video: false
```
````
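For nested objects whose leaves are scalars, YAML like the output above can be emitted by a short recursive function. This sketch is illustrative only: `toYaml` and `toYamlBlock` are hypothetical names, two-space indentation is assumed, and a real converter would rely on a full YAML serializer that handles arrays and quotes ambiguous strings.

```typescript
// Hypothetical sketch: emit YAML for nested objects whose leaves are scalars.
// Arrays and YAML-sensitive strings (e.g. "yes", "a: b") are not handled.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function toYaml(obj: { [key: string]: Json }, indent = 0): string {
  const pad = " ".repeat(indent);
  const lines: string[] = [];
  for (const key of Object.keys(obj)) {
    const value = obj[key];
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      const nested = toYaml(value, indent + 2);
      // An empty object needs an explicit {} to stay a mapping.
      lines.push(nested ? `${pad}${key}:\n${nested}` : `${pad}${key}: {}`);
    } else {
      lines.push(`${pad}${key}: ${String(value)}`);
    }
  }
  return lines.join("\n");
}

// Wrap the YAML in a fenced yaml block, as in the output above.
function toYamlBlock(obj: { [key: string]: Json }): string {
  const fence = "`".repeat(3);
  return `${fence}yaml\n${toYaml(obj)}\n${fence}`;
}
```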
### 4. Hybrid Strategy (Medium Depth)
**When:** Objects with depth 2-15 and multiple sections, or >10 keys
```typescript
const project = {
  info: {
    name: 'llm-md',
    version: '1.0.0'
  },
  author: {
    name: 'Dev Team',
    email: 'dev@example.com'
  }
};
console.log(llmd(project));
```
Output:
```markdown
## info
- **name**: llm-md
- **version**: 1.0.0
## author
- **name**: Dev Team
- **email**: dev@example.com
```
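The hybrid output above combines headings with bold-key lists. A minimal sketch, assuming each object-valued top-level key becomes a `## key` section, top-level scalars are listed before the sections, and anything deeper is inlined as JSON; `toHybrid` and its helpers are hypothetical names, not llm-md exports.

```typescript
// Hypothetical sketch of the hybrid strategy: each object-valued key becomes a
// "## key" section rendered as a bold-key list; top-level scalars come first.
type Obj = { [key: string]: unknown };

function isObj(value: unknown): value is Obj {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

function formatValue(value: unknown): string {
  // Anything deeper than one level is inlined as JSON in this sketch.
  return value !== null && typeof value === "object" ? JSON.stringify(value) : String(value);
}

function keyValueLines(obj: Obj): string[] {
  return Object.keys(obj).map((key) => `- **${key}**: ${formatValue(obj[key])}`);
}

function toHybrid(obj: Obj): string {
  const scalars: Obj = {};
  const sections: string[] = [];
  for (const key of Object.keys(obj)) {
    const value = obj[key];
    if (isObj(value)) sections.push([`## ${key}`, ...keyValueLines(value)].join("\n"));
    else scalars[key] = value;
  }
  const intro = keyValueLines(scalars);
  const parts = intro.length > 0 ? [intro.join("\n"), ...sections] : sections;
  return parts.join("\n");
}
```

The heading level is fixed at `##` here; embedding the result under an existing heading might call for adjusting it.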
### 5. Numbered List (Mixed Arrays)
**When:** Arrays with different types or non-uniform objects
```typescript
const mixed = [1, 'text', { key: 'value' }, [1, 2, 3], null];
console.log(llmd(mixed));
```
Output:
```markdown
1. `1`
2. `"text"`
3. `{"key":"value"}`
4. `[1,2,3]`
5. `null`
```
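The numbered-list output above is consistent with serializing each item via `JSON.stringify` and wrapping it in inline code, which keeps strings (shown quoted) distinguishable from numbers and `null`. A minimal sketch under that assumption; `toNumberedList` is a hypothetical name.

```typescript
// Hypothetical sketch of the numbered-list strategy: each item is serialized
// with JSON.stringify and wrapped in inline code so its type stays visible.
function toNumberedList(items: unknown[]): string {
  return items
    .map((item, i) => {
      // JSON.stringify returns undefined for undefined and functions at runtime.
      const json = JSON.stringify(item) ?? String(item);
      return `${i + 1}. \`${json}\``;
    })
    .join("\n");
}
```

A value whose JSON contains a backtick would need a longer inline-code delimiter, which this sketch does not handle.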
## Advanced Usage
### Verbose Mode (with Metadata)
```typescript
import { convertVerbose } from 'llm-md';
const data = [
  { id: 1, name: 'Alice' },
  { id: 2, name: 'Bob' }
];
const result = convertVerbose(data, { verbose: true });
console.log(result.markdown);
console.log(result.metadata);
// e.g. {
//   strategy: 'table',
//   depth: 1,
//   confidence: 1,
//   tokensEstimate: 42
// }
```
### Force Strategy
Override automatic detection when you know best:
```typescript
import { convert } from 'llm-md';
const data = { name: 'Alice', age: 30 };
// Force YAML even for shallow data
const markdown = convert(data, { forceStrategy: 'yaml-block' });
```
### Custom Options
Fine-tune the behavior:
```typescript
import { convert } from 'llm-md';
const data = { name: 'Alice', age: 30 };
const markdown = convert(data, {
  maxTableColumns: 10,           // Limit table width (default: 15)
  yamlThreshold: 4,              // Depth to trigger YAML (default: 3)
  tableSimilarityThreshold: 0.9, // Key similarity for tables (default: 0.8)
  maxDepth: 20,                  // Max recursion depth (default: 10)
  verbose: true                  // Include metadata (default: false)
});
```
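The defaults quoted in the comments above can be collected in one place. A minimal sketch, assuming unspecified options fall back to those documented defaults; `TuningOptions`, `DEFAULTS`, and `resolveOptions` are hypothetical names, and the range check is an illustrative addition rather than documented llm-md behavior.

```typescript
// Hypothetical sketch: merging user options with the documented defaults.
// Mirrors ConversionOptions minus forceStrategy for brevity.
interface TuningOptions {
  maxTableColumns: number;
  yamlThreshold: number;
  tableSimilarityThreshold: number;
  maxDepth: number;
  verbose: boolean;
}

const DEFAULTS: TuningOptions = {
  maxTableColumns: 15,
  yamlThreshold: 3,
  tableSimilarityThreshold: 0.8,
  maxDepth: 10,
  verbose: false,
};

function resolveOptions(options: Partial<TuningOptions> = {}): TuningOptions {
  // Per-field ?? so an explicitly undefined option falls back to its default.
  const resolved: TuningOptions = {
    maxTableColumns: options.maxTableColumns ?? DEFAULTS.maxTableColumns,
    yamlThreshold: options.yamlThreshold ?? DEFAULTS.yamlThreshold,
    tableSimilarityThreshold: options.tableSimilarityThreshold ?? DEFAULTS.tableSimilarityThreshold,
    maxDepth: options.maxDepth ?? DEFAULTS.maxDepth,
    verbose: options.verbose ?? DEFAULTS.verbose,
  };
  // Illustrative validation: a similarity score is a fraction in [0, 1].
  const t = resolved.tableSimilarityThreshold;
  if (t < 0 || t > 1) throw new RangeError("tableSimilarityThreshold must be between 0 and 1");
  return resolved;
}
```

Falling back field by field, rather than spreading `{ ...DEFAULTS, ...options }`, keeps an explicitly `undefined` option from overwriting its default.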
## API Reference
### `convert(data, options?): string`
Main conversion function. Returns a Markdown string.
```typescript
function convert(data: unknown, options?: ConversionOptions): string
```
### `convertVerbose(data, options?): ConversionResult`
Returns the Markdown output along with metadata about the conversion.
```typescript
function convertVerbose(
  data: unknown,
  options?: ConversionOptions
): ConversionResult
```
### `analyze(data, options?): AnalysisResult`
Analyze data structure without converting. Useful for understanding strategy selection.
```typescript
import { analyze } from 'llm-md';
const myData = [{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }];
const analysis = analyze(myData);
console.log(analysis.strategy); // 'table' | 'key-value' | 'yaml-block' | ...
console.log(analysis.depth); // Maximum nesting depth
console.log(analysis.uniformity); // 0-1 score for array uniformity
```
### Types
```typescript
interface ConversionOptions {
  verbose?: boolean;                 // Return metadata
  forceStrategy?: Strategy;          // Override detection
  maxTableColumns?: number;          // Table width limit
  maxDepth?: number;                 // Recursion limit
  yamlThreshold?: number;            // Depth for YAML
  tableSimilarityThreshold?: number; // Key similarity threshold
}
type Strategy =
  | 'table'
  | 'key-value'
  | 'yaml-block'
  | 'hybrid'
  | 'numbered-list';
interface ConversionResult {
  markdown: string;
  metadata?: {
    strategy: Strategy;
    depth: number;
    confidence: number;
    tokensEstimate?: number;
  };
}
```
## Real-World Examples
### API Response
```typescript
const apiResponse = {
  status: 'success',
  data: {
    users: [
      { id: 1, username: 'alice', email: 'alice@example.com' },
      { id: 2, username: 'bob', email: 'bob@example.com' }
    ],
    total: 2,
    page: 1
  }
};
console.log(llmd(apiResponse));
```
### Configuration File
```typescript
const config = {
  database: {
    host: 'localhost',
    port: 5432,
    credentials: {
      username: 'admin',
      password: '****'
    }
  },
  cache: {
    enabled: true,
    ttl: 3600
  }
};
console.log(llmd(config));
```
### GitHub Repository Data
```typescript
const repo = {
  name: 'llm-md',
  description: 'Convert JSON to Markdown for LLMs',
  stars: 100,
  language: 'TypeScript',
  owner: {
    name: 'Developer',
    email: 'dev@example.com'
  }
};
console.log(llmd(repo));
```
## Research-Based Design
llm-md implements strategies based on empirical research:
- **Tables for uniform data**: 52% LLM comprehension accuracy
- **YAML for deep nesting**: 62% accuracy vs 43% for alternatives
- **Key-value for lookups**: 61% accuracy with bold key formatting
- **Avoid CSV**: Only 44% accuracy despite token efficiency
- **>80% similarity threshold**: Optimal point for table conversion
- **Depth > 3**: Automatic YAML switch for deeply nested objects
- **≤15 columns**: Tables become unwieldy beyond this width
These thresholds are baked into the analyzer to ensure optimal LLM comprehension without configuration.
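The thresholds above can be read as a small decision procedure. The sketch below is a hypothetical reconstruction, not llm-md's analyzer: `chooseStrategy`, `depthOf`, and `keySimilarity` are illustrative names; depth is assumed to count one level per object or array, key similarity is assumed to be the share of keys present in every row, and the YAML cut-off uses the `yamlThreshold` default of 3 from Custom Options. The column limit and the `maxDepth` recursion cap are left out for brevity.

```typescript
// Hypothetical sketch of automatic strategy selection (not llm-md's analyzer).
// Thresholds are the documented defaults; depth and similarity are assumed definitions.
type Json = null | boolean | number | string | Json[] | JsonObject;
interface JsonObject { [key: string]: Json }
type Strategy = "table" | "key-value" | "yaml-block" | "hybrid" | "numbered-list";

function isPlainObject(value: Json): value is JsonObject {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Scalars have depth 0; every array or object level adds 1.
function depthOf(value: Json): number {
  if (value === null || typeof value !== "object") return 0;
  const children: Json[] = Array.isArray(value) ? value : Object.values(value);
  let deepest = 0;
  for (const child of children) deepest = Math.max(deepest, depthOf(child));
  return 1 + deepest;
}

// Key similarity: keys present in every row divided by all keys seen.
function keySimilarity(rows: JsonObject[]): number {
  const union: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) if (union.indexOf(key) === -1) union.push(key);
  }
  if (union.length === 0) return 0;
  const shared = union.filter((key) => rows.every((row) => key in row)).length;
  return shared / union.length;
}

function chooseStrategy(data: Json[] | JsonObject, yamlThreshold = 3, similarityThreshold = 0.8): Strategy {
  if (Array.isArray(data)) {
    // Tables need every element to be an object with mostly shared keys.
    const rows = data.filter(isPlainObject);
    const uniform = rows.length > 0 && rows.length === data.length && keySimilarity(rows) >= similarityThreshold;
    return uniform ? "table" : "numbered-list";
  }
  const obj: JsonObject = data;
  const depth = depthOf(obj);
  if (depth > yamlThreshold) return "yaml-block";
  const keys = Object.keys(obj);
  // "Sections" are object-valued keys that could each become a heading.
  const sections = keys.filter((key) => isPlainObject(obj[key])).length;
  if ((depth >= 2 && sections >= 2) || keys.length > 10) return "hybrid";
  return "key-value";
}
```

Under these assumptions, the examples in Conversion Strategies map to their documented strategies: the `users` array to `table`, `user` to `key-value`, `config` (depth 4) to `yaml-block`, `project` to `hybrid`, and the mixed array to `numbered-list`.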
## Edge Cases Handled
- ✅ Circular references (detected and marked)
- ✅ Null and undefined values
- ✅ Boolean values (rendered as ✓/✗ in tables)
- ✅ Special characters (properly escaped)
- ✅ Mixed-type arrays
- ✅ Empty arrays and objects
- ✅ Deeply nested structures (up to configurable limit)
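The list says circular references are detected and marked but does not show the marker. A minimal sketch of one way to do it, assuming a `[Circular]` placeholder string; `markCircular` is a hypothetical helper. It tracks only the objects on the current path, so a value referenced twice from different branches is copied rather than mistaken for a cycle.

```typescript
// Hypothetical sketch: replace circular references with a marker before
// conversion. Only ancestors on the current path count as cycles, so an object
// shared by two siblings is copied, not flagged. The marker text is assumed.
const CIRCULAR = "[Circular]";

function markCircular(value: unknown, ancestors: Set<object> = new Set<object>()): unknown {
  if (value === null || typeof value !== "object") return value;
  if (ancestors.has(value)) return CIRCULAR;
  ancestors.add(value);
  let result: unknown;
  if (Array.isArray(value)) {
    result = value.map((item) => markCircular(item, ancestors));
  } else {
    const source = value as { [key: string]: unknown };
    const copy: { [key: string]: unknown } = {};
    for (const key of Object.keys(source)) copy[key] = markCircular(source[key], ancestors);
    result = copy;
  }
  ancestors.delete(value); // leaving this branch: value is no longer an ancestor
  return result;
}
```

The result is a plain acyclic copy, so it can be passed to `JSON.stringify` or any of the renderers without infinite recursion.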
## Performance
- Converts 1MB JSON in <100ms
- Memory-efficient streaming for large datasets
- Token estimation included in verbose mode
- 89% test coverage with comprehensive test suite
## Development
```bash
# Install dependencies
npm install
# Run tests
npm test
# Run tests with coverage
npm run test:coverage
# Build
npm run build
# Lint
npm run lint
```
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
MIT
## Acknowledgments
Based on research analyzing LLM format comprehension across 1000+ test cases and 11 data formats. See [research documentation](tmp/research.md) for details.
## Related Projects
- [json2md](https://github.com/IonicaBizau/json2md) - JSON to Markdown converter with custom templates
- [tablemark](https://github.com/citycide/tablemark) - Specialized Markdown table generator
- [TOON](https://github.com/toon-format/spec) - Token-oriented object notation for LLMs
**Made with ❤️ for better LLM interactions**