meld

Version:

Meld: A template language for LLM prompts

85 lines (58 loc) • 4.07 kB

Markdown

# LLMXML Library Improvement Opportunities ## Background While implementing section extraction fixes in the EmbedDirectiveHandler, we identified several limitations and potential improvement areas in our llmxml library. This document captures these findings to help improve the core library rather than just implementing workarounds. ## Current Limitations ### 1. Section Extraction Reliability - **Issue**: The `getSection()` method fails silently on certain Markdown structures - **Details**: When extracting sections from content with standard markdown headings (e.g., "## Section Two"), the library sometimes fails to find sections that are clearly present - **Impact**: The EmbedDirectiveHandler needs to implement fallback section extraction to work around this reliability issue ### 2. Error Handling and Diagnostics - **Issue**: The library provides limited error information when section extraction fails - **Details**: When a section isn't found, it returns null without any context about available sections or why the match failed - **Impact**: Makes debugging difficult as users don't know if the section name is slightly different or not present at all ### 3. Fuzzy Matching Configurability - **Issue**: The fuzzy matching threshold is set at library creation time - **Details**: Can't configure fuzzy matching on a per-call basis without creating multiple instances - **Impact**: Reduced flexibility for callers who need different matching thresholds for different content types ### 4. Heading Identification - **Issue**: No mechanism to list available headings in content - **Details**: When extraction fails, it would be helpful to know what headings are available - **Impact**: Forces callers to implement their own heading extraction logic for error messages ## Suggested Improvements ### Short-term (High Priority) 1. **Enhanced Error Reporting** - Return structured errors with context about failure reasons - Include list of available sections in the document when section not found - Add diagnostic information about closest matches when using fuzzy matching 2. **Per-call Configuration** - Allow fuzzy threshold to be specified per method call, not just at initialization - Add optional parameters to control behavior like case sensitivity ### Medium-term (Medium Priority) 3. **Metadata Access Functions** - Add method to extract and return all headings in a document - Provide heading hierarchy information (levels, parent-child relationships) - Allow querying document structure without extraction 4. **Flexible Matching Options** - Support regular expressions for section names - Add options for partial matching beyond fuzzy matching - Support matching by heading level (e.g., "get the first h2 section") ### Long-term (Lower Priority) 5. **Performance Optimization** - Benchmark and optimize section extraction for larger documents - Consider caching document structure for repeated extractions - Implement streaming extraction for very large documents 6. **Library Integration** - Better integration with common Markdown parsers - Provide TypeScript type definitions for better developer experience - Add content transformation options (e.g., normalize whitespace) ## Implementation Notes The fallback section extraction we implemented in ResolutionService demonstrates some of these concepts: 1. We extract all headings to provide better error messages 2. We implement manual fuzzy matching based on Levenshtein distance 3. We properly handle heading levels and section boundaries 4. We provide detailed logging for diagnostics These techniques could be incorporated into the llmxml library to improve its capabilities rather than duplicating this logic in every consumer. ## Next Steps 1. Create GitHub issues in the llmxml repository for each major improvement area 2. Consider adding more comprehensive tests to verify behavior with various markdown structures 3. Prioritize the error reporting and per-call configuration improvements to support immediate needs