# LLMXML

A library for converting between Markdown and LLM-friendly XML formats, with section extraction capabilities.

## Features

- Bidirectional conversion between Markdown and LLM-XML
- Fuzzy section matching and extraction
- Precise heading level control
- Configurable tag formatting and attribute output
- Automatic preservation of JSON structures
- Smart handling of code blocks

## Installation

```bash
npm install llmxml
```

## Quick Start

```typescript
import { createLLMXML } from 'llmxml';

const llmxml = createLLMXML();

// Convert Markdown to LLM-XML
const xml = await llmxml.toXML(`
# Title
## Section
Content with JSON: {"name":"John","age":30}
`);
// Result:
// <Title>
//   Content with JSON: {
//     "name": "John",
//     "age": 30
//   }
//   <Section>
//     Content
//   </Section>
// </Title>

// Convert LLM-XML to Markdown
const markdown = await llmxml.toMarkdown(xml);

// Extract sections
const section = await llmxml.getSection(markdown, 'Section');
```

## Section Extraction

Sections can be extracted by title, with fuzzy matching:

```typescript
// Extract a single section with options
const section = await llmxml.getSection(content, 'Setup Instructions', {
  level: 2,             // Only match h2 headers (1-6)
  exact: false,         // Set to true to require exact matches
  includeNested: true,  // Include subsections
  fuzzyThreshold: 0.8   // Minimum match score (0-1)
});

// Extract multiple matching sections
const sections = await llmxml.getSections(content, 'setup', {
  // Same options as getSection
  fuzzyThreshold: 0.7
});
```

## Configuration

Configure behavior when creating an instance:

```typescript
const llmxml = createLLMXML({
  // Default threshold for fuzzy matching (0-1)
  defaultFuzzyThreshold: 0.7,

  // Warning emission level: 'all' | 'none' | 'ambiguous-only'
  warningLevel: 'all',

  // Control XML attribute output
  includeTitle: false,   // Include title attribute (default: false)
  includeHlevel: false,  // Include hlevel attribute (default: false)
  verbose: false,        // Include both title and hlevel (default: false)

  // Tag name formatting (default: 'PascalCase'):
  // 'snake_case' | 'SCREAMING_SNAKE' | 'camelCase' | 'PascalCase' | 'UPPERCASE'
  tagFormat: 'PascalCase',
});

// Examples with different configurations:
const withAttributes = createLLMXML({ verbose: true });
const xml1 = await withAttributes.toXML('# Long Title');
// <LongTitle title="Long Title" hlevel="1">

const snakeCase = createLLMXML({ tagFormat: 'snake_case' });
const xml2 = await snakeCase.toXML('# Long Title');
// <long_title>
```

## Round-trip Conversions

To preserve document structure across a Markdown-to-XML-to-Markdown round trip:

```typescript
// Convert markdown to XML and back, preserving all structure
const roundTripped = await llmxml.roundTrip(`
# Title
## Section
Content
`);
```

## Warning System

The library emits warnings for potentially ambiguous situations:

```typescript
// Register warning handler
llmxml.onWarning(warning => {
  // Warning structure:
  // {
  //   code: 'AMBIGUOUS_MATCH' | 'UNKNOWN_WARNING' | etc,
  //   message: string,
  //   details: {
  //     matches?: Array<{
  //       title: string,
  //       score: {
  //         exactMatch: boolean,
  //         fuzzyScore: number,
  //         contextualScore: number,
  //         level: number,
  //         // ... other scoring details
  //       }
  //     }>,
  //   }
  // }
});
```

## Error Handling

The library throws typed errors for various failure conditions:

```typescript
try {
  const section = await llmxml.getSection(content, 'nonexistent');
} catch (error) {
  if (error.code === 'SECTION_NOT_FOUND') {
    console.log('Section not found:', error.message);
  }
  // Other error codes:
  // - PARSE_ERROR: Failed to parse document
  // - INVALID_FORMAT: Document format is invalid
  // - INVALID_LEVEL: Invalid header level
  // - INVALID_SECTION_OPTIONS: Invalid section extraction options
}
```

## Documentation

- [API Reference](docs/API.md)
- [LLM-XML Format](docs/LLMXML.md)

## Enhanced Error Handling and Diagnostics

When section extraction fails, the error carries the document's available headings and the closest matches:

```typescript
try {
  const section = await llmxml.getSection(content, 'nonexistent');
} catch (error) {
  if (error.code === 'SECTION_NOT_FOUND') {
    console.log('Error:', error.message);

    // Access available headings
    const headings = error.details.availableHeadings;
    console.log('Available sections:', headings.map(h => h.title).join(', '));

    // Access closest matches
    const suggestions = error.details.closestMatches;
    console.log('Did you mean:', suggestions.map(m =>
      `"${m.title}" (similarity: ${Math.round(m.similarity * 100)}%)`
    ).join(', '));
  }
}
```

## Document Structure Analysis

The library provides methods to analyze document structure:

```typescript
// Get all headings in a document with hierarchical information
const headings = await llmxml.getHeadings(content);
/* Returns:
[
  { title: 'Main Title', level: 1, path: ['Main Title'] },
  { title: 'Section One', level: 2, path: ['Main Title', 'Section One'] },
  { title: 'Subsection', level: 3, path: ['Main Title', 'Section One', 'Subsection'] },
  { title: 'Section Two', level: 2, path: ['Main Title', 'Section Two'] }
]
*/

// Display document outline
headings.forEach(h => {
  console.log(`${' '.repeat(h.level - 1)}${h.title}`);
});
```

## License

MIT