meld

# llmxml

## Overview

llmxml is a specialized library in our codebase that converts between Markdown and LLM-friendly XML formats. It plays a central role in our output formatting system, particularly in preparing content for large language model consumption.

## Role in Our Codebase

### Core Integration Points

1. **OutputService**
   - Primary integration point for llmxml
   - Uses llmxml to convert processed Meld content to LLM-friendly XML
   - Handles format selection between markdown and LLM XML
   - Manages error handling and format conversion failures

2. **CLIService**
   - Provides format selection options (`'markdown' | 'xml'`)
   - Routes output through OutputService for llmxml processing
   - Handles format aliases and defaults

3. **TestContext**
   - Provides testing utilities for XML conversion
   - Helps validate output formatting in tests
   - Supports snapshot testing of XML output

## Key Features We Use

1. **Bidirectional Conversion**
   - Markdown to LLM-XML conversion
   - XML to Markdown conversion (when needed)
   - Preservation of document structure

2. **Section Handling**
   - Fuzzy section matching capabilities
   - Precise heading level control
   - Section extraction for imports

3. **Format Preservation**
   - Maintains code blocks and their language specifications
   - Preserves text formatting and structure
   - Handles special characters and escaping

## Integration Details

### Output Service Implementation

```typescript
private async convertToLLMXML(
  nodes: MeldNode[],
  state: IStateService,
  options?: OutputOptions
): Promise<string> {
  // First convert to markdown format
  const markdown = await this.convertToMarkdown(nodes, state, options);

  // Use llmxml for XML conversion (imported dynamically, on demand)
  const { createLLMXML } = await import('llmxml');
  const llmxml = createLLMXML();
  return llmxml.toXML(markdown);
}
```

### CLI Format Options

```typescript
export interface CLIOptions {
  format?: 'markdown' | 'xml'; // 'xml' uses llmxml for conversion
  // ... other options
}
```

## XML Format Specification

The XML format produced by llmxml follows these conventions:

1. **Document Structure**

   ```xml
   <Title>
     Content
     <Section hlevel="2">
       Content
     </Section>
   </Title>
   ```

2. **Section Attributes**
   - `hlevel`: Indicates heading level (1-6)
   - `title`: Original section title
   - Nested sections maintain hierarchy

3. **Content Handling**
   - Code blocks preserved with language attributes
   - Special characters properly escaped
   - Whitespace and formatting maintained

## Error Handling

We wrap llmxml errors in our custom error types:

1. **MeldLLMXMLError**
   - Handles section extraction failures
   - Provides context for conversion errors
   - Includes fuzzy matching details

2. **Error Recovery**
   - Fallback to plain text when conversion fails
   - Preservation of original content structure
   - Detailed error reporting

## Version and Compatibility

We use llmxml version ^1.1.2, as specified in our package.json. The library is imported dynamically when needed to reduce loading time.

## Important Notes

1. **Performance Considerations**
   - Dynamic import for on-demand loading
   - Section extraction adds minimal overhead
   - Efficient handling of large documents

2. **Format Selection**
   - 'xml' format is the default in most cases
   - Format selection available via CLI and API
   - Backward compatibility with markdown output

3. **Testing Considerations**
   - Don't validate the exact XML formatting; validate only that content is preserved
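
One way to follow the testing guidance above is to compare text content and ignore tag layout. The sketch below is a minimal illustration, not part of our test suite: `extractText` and `missingContent` are hypothetical helpers, and `sampleXml` is a hand-written stand-in for llmxml output rather than a real conversion result.

```typescript
// Hypothetical helper: reduce XML to its text content so that assertions
// do not depend on tag names, attributes, or indentation.
function extractText(xml: string): string {
  return xml
    .replace(/<[^>]+>/g, ' ') // drop tags (assumes no raw '>' inside attribute values)
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&') // decode '&amp;' last so '&amp;lt;' becomes '&lt;', not '<'
    .replace(/\s+/g, ' ') // collapse whitespace runs
    .trim();
}

// Hypothetical helper: return the fragments that do NOT appear in the XML's text.
function missingContent(xml: string, fragments: string[]): string[] {
  const text = extractText(xml);
  return fragments.filter(
    (fragment) => !text.includes(fragment.replace(/\s+/g, ' ').trim())
  );
}

// Hand-written sample standing in for converted output (an assumption,
// not captured from llmxml).
const sampleXml = `
<Guide>
  Intro text
  <Setup hlevel="2">
    Run a &amp; b, then check x &lt; y.
  </Setup>
</Guide>`;

const missing = missingContent(sampleXml, [
  'Intro text',
  'Run a & b, then check x < y.',
]);

console.log(
  missing.length === 0 ? 'content preserved' : `missing: ${missing.join(', ')}`
);
```

Because the helper discards tags entirely, section titles carried in tag names or attributes are not checked by this sketch; a test that cares about them would need a separate assertion.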