# LLMXML Integration Improvements
## Overview
This document outlines improvements made to the LLMXML integration in Meld, focusing on resolving issues with field access in data variables and ensuring proper handling of non-string values in the output pipeline.
## Background
The Meld system previously used a custom LLMXML wrapper to handle limitations in the llmxml library. With recent updates to the llmxml library (version 1.3.0+), many of these limitations have been addressed, allowing us to use the library directly without a custom wrapper.
## Issues Addressed
1. **Field Access in Data Variables**: When accessing fields of objects using dot notation (e.g., `{{person.name}}`), the system would encounter errors because non-string values weren't properly stringified before being passed to the LLMXML converter.
2. **LLMXML Conversion Errors**: The error `TypeError: result.replace is not a function` would occur when attempting to convert output containing object values.
3. **Promise-based API**: The llmxml library uses a Promise-based API for its `toXML` method, which requires proper handling with async/await or Promise chaining.
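To make the second issue concrete, here is a minimal sketch of how an object value can trigger that kind of `TypeError` in string-only code, and how stringifying first avoids it. `escapeAngleBrackets` and `toOutputString` are hypothetical stand-ins written for this illustration; they are not llmxml or Meld functions.

```typescript
// Hypothetical stand-in for a string-only step inside a converter (NOT llmxml's real code).
function escapeAngleBrackets(result: string): string {
  return result.replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Assumed helper: coerce any resolved value to a string before it reaches string-only code.
function toOutputString(value: unknown): string {
  if (value === undefined || value === null) return "";
  return typeof value === "object" ? JSON.stringify(value, null, 2) : String(value);
}

const person: unknown = { name: "John Doe" };

let failure = "";
try {
  // Simulates an object slipping through where a string is expected.
  escapeAngleBrackets(person as string);
} catch (error) {
  failure = error instanceof Error ? error.message : String(error);
}
console.log(failure); // a "... is not a function" TypeError message

// Stringifying first keeps the string-only step safe.
const safe = escapeAngleBrackets(toOutputString(person));
console.log(safe);
```

The type assertion (`person as string`) mirrors what happens at runtime when an untyped value bypasses the compiler's checks.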
## Implemented Solutions
### 1. Enhanced OutputService
We've improved the `OutputService.convertToXML` method so that it awaits the Promise-based `toXML` call and, if the first conversion fails, retries once after normalizing any JSON code blocks in the markdown:
```typescript
private async convertToXML(
  nodes: MeldNode[],
  state: IStateService,
  options?: OutputOptions
): Promise<string> {
  try {
    // First convert to markdown since XML is based on markdown
    const markdown = await this.convertToMarkdown(nodes, state, options);

    // Use llmxml directly with version 1.3.0+ which handles JSON content properly
    const { createLLMXML } = await import('llmxml');
    const llmxml = createLLMXML({
      defaultFuzzyThreshold: 0.7,
      includeHlevel: false,
      includeTitle: false,
      tagFormat: 'PascalCase',
      verbose: false,
      warningLevel: 'all'
    });

    try {
      // Note: toXML returns a Promise that resolves to a string
      return await llmxml.toXML(markdown);
    } catch (error) {
      // If conversion fails due to non-string values, try to preprocess JSON objects
      logger.warn('First attempt to convert to XML failed, attempting to preprocess markdown', {
        error: error instanceof Error ? error.message : String(error)
      });

      // Try to find and stringify any JSON objects in the markdown
      const processedMarkdown = markdown.replace(/```json\n([\s\S]*?)```/g, (match, jsonContent) => {
        try {
          // Parse and stringify the JSON to ensure it's valid
          const parsed = JSON.parse(jsonContent);
          return '```json\n' + JSON.stringify(parsed, null, 2) + '\n```';
        } catch (jsonError) {
          // If parsing fails, return the original content
          return match;
        }
      });

      // Try again with processed markdown; a second failure falls through to the outer catch
      return await llmxml.toXML(processedMarkdown);
    }
  } catch (error) {
    throw new MeldOutputError(
      'Failed to convert output',
      'xml',
      { cause: error instanceof Error ? error : undefined }
    );
  }
}
```
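The fallback step in the method above can be tried in isolation. The sketch below extracts the same parse-and-restringify idea into a standalone function; the name `normalizeJsonBlocks` and the `FENCE` and `JSON_BLOCK` constants are assumptions made for this example, and the fence marker is built programmatically only so the sample can sit inside a fenced block.

```typescript
// Three backticks, built at runtime so this sample stays valid inside a fenced block.
const FENCE = "`".repeat(3);

// Same pattern as in convertToXML: a json-tagged fence and its (lazily matched) body.
const JSON_BLOCK = new RegExp(FENCE + "json\\n([\\s\\S]*?)" + FENCE, "g");

// Hypothetical helper: re-serialize every valid JSON block, leave invalid ones untouched.
function normalizeJsonBlocks(markdown: string): string {
  return markdown.replace(JSON_BLOCK, (match: string, jsonContent: string) => {
    try {
      const parsed: unknown = JSON.parse(jsonContent);
      return FENCE + "json\n" + JSON.stringify(parsed, null, 2) + "\n" + FENCE;
    } catch {
      // Not valid JSON: keep the original block so no content is lost.
      return match;
    }
  });
}

const compact = "Config:\n" + FENCE + 'json\n{"a":1}\n' + FENCE;
console.log(normalizeJsonBlocks(compact));
```

Returning `match` on a parse failure means the retry can never make the markdown worse than the input that already failed.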
### 2. Improved Variable Resolution
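We've enhanced the `VariableReferenceResolver` to resolve dot-notation field access and to always return a string: an empty string for `null` or `undefined`, pretty-printed JSON for objects, and `String(value)` for everything else: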
```typescript
private async resolveVariable(varRef: string, context: ResolutionContext): Promise<string> {
  // Split by dot for field access
  const parts = varRef.split('.');
  const baseVar = parts[0];

  // Try to get variable from state
  let value = await this.getVariable(baseVar, context);

  // Handle field access (e.g., user.name)
  if (parts.length > 1 && typeof value === 'object' && value !== null) {
    try {
      // Resolve field access
      value = await this.resolveFieldAccess(value, parts.slice(1), context);
    } catch (error) {
      // Handle errors gracefully: the message becomes the resolved text
      const message = error instanceof Error ? error.message : String(error);
      return `Error accessing ${parts.slice(1).join('.')}: ${message}`;
    }
  }

  // Stringification logic - IMPORTANT for avoiding output conversion errors
  if (value === undefined || value === null) {
    return '';
  }
  if (typeof value === 'object') {
    // Whether we read the whole variable or one of its fields,
    // a result that is still an object is serialized as JSON
    return JSON.stringify(value, null, 2);
  }
  return String(value);
}
```
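The resolver above depends on `getVariable` and `resolveFieldAccess`, which are not shown here. As a rough, self-contained approximation of the same rules, the sketch below walks a dot path over plain data and applies the same stringification. The `Json` type, `getPath`, `stringifyValue`, `resolveReference`, and the sample `variables` are all assumptions for illustration, not Meld's actual implementation.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical helper: follow dot-separated fields; undefined when a segment is missing.
function getPath(root: Json | undefined, fields: string[]): Json | undefined {
  let current: Json | undefined = root;
  for (const field of fields) {
    if (current === null || typeof current !== "object" || Array.isArray(current)) {
      return undefined;
    }
    current = current[field];
  }
  return current;
}

// Same rules as the resolver: empty string, pretty-printed JSON, or String(value).
function stringifyValue(value: Json | undefined): string {
  if (value === undefined || value === null) return "";
  return typeof value === "object" ? JSON.stringify(value, null, 2) : String(value);
}

function resolveReference(varRef: string, vars: { [name: string]: Json }): string {
  const parts = varRef.split(".");
  const baseVar = parts[0] ?? "";
  return stringifyValue(getPath(vars[baseVar], parts.slice(1)));
}

// Sample data, invented for this example.
const variables: { [name: string]: Json } = {
  person: { name: "John Doe", age: 42, contact: { email: "john@example.com" } }
};

console.log(resolveReference("person.name", variables));
console.log(resolveReference("person.contact", variables));
```

Unlike the resolver above, this sketch returns an empty string for a missing field instead of an inline error message; which behavior is preferable depends on how visible template mistakes should be in the output.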
## Demonstration
We've created a demonstration script (`scripts/demo-field-access.js`) that shows how field access works with the direct llmxml library integration. The script:
1. Creates a sample data object with nested properties
2. Implements field access using dot notation
3. Processes variable references in a template
4. Converts the processed template to LLMXML format
The demonstration confirms that:
- Basic field access works correctly (e.g., `{{person.name}}` → `John Doe`)
- Nested field access works correctly (e.g., `{{person.contact.email}}` → `john@example.com`)
- Array access works with our implementation (e.g., `{{person.addresses[0].street}}` → `123 Main St`)
- Full objects are properly stringified when accessed directly
Key findings from the demonstration:
1. The llmxml library's `toXML` method returns a Promise, requiring async/await handling
2. The library correctly processes stringified JSON objects without errors
3. Field access must be implemented at the variable resolution level, before the content is passed to the LLMXML converter
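The first three steps of the demonstration can be sketched as follows. This is not the contents of `scripts/demo-field-access.js`, which is not reproduced in this document; `tokenizePath`, `lookup`, `render`, and the sample data are hypothetical names chosen for the example. The final step would pass the rendered text to `await llmxml.toXML(...)` and is left out so the sketch runs without the library installed.

```typescript
// Hypothetical tokenizer: "person.addresses[0].street" -> ["person", "addresses", 0, "street"]
function tokenizePath(path: string): Array<string | number> {
  const tokens: Array<string | number> = [];
  const pattern = /([^.\[\]]+)|\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(path)) !== null) {
    if (match[2] !== undefined) {
      tokens.push(Number(match[2])); // bracketed array index
    } else if (match[1] !== undefined) {
      tokens.push(match[1]); // plain field name
    }
  }
  return tokens;
}

// Walk the tokens through nested objects and arrays; undefined when something is missing.
function lookup(root: unknown, tokens: Array<string | number>): unknown {
  let current: unknown = root;
  for (const token of tokens) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string | number, unknown>)[token];
  }
  return current;
}

// Replace every {{path}} reference with its stringified value.
function render(template: string, data: unknown): string {
  return template.replace(/\{\{\s*([^}]+?)\s*\}\}/g, (_match: string, path: string) => {
    const value = lookup(data, tokenizePath(path));
    if (value === undefined || value === null) return "";
    return typeof value === "object" ? JSON.stringify(value, null, 2) : String(value);
  });
}

const data = {
  person: {
    name: "John Doe",
    contact: { email: "john@example.com" },
    addresses: [{ street: "123 Main St" }]
  }
};

console.log(render("{{person.name}} lives at {{person.addresses[0].street}}", data));
```

Because every reference is reduced to a string inside `render`, the converter only ever receives plain text, which is the point of the third finding.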
## Testing
We've created the following test file and scripts to verify the functionality:
1. `tests/field-access.test.js`: A test suite that verifies proper serialization of JSON objects and field access.
2. `scripts/test-field-access.js`: A script that demonstrates field access using both the direct implementation and the standard processor.
3. `scripts/demo-field-access.js`: A demonstration of field access with the direct llmxml library integration.
## Future Recommendations
1. **Complete Integration**: Fully integrate the field access improvements into the core Meld codebase, ensuring all components handle object values correctly.
2. **Array Access Support**: Bring square-bracket array access (e.g., `{{person.addresses[0].street}}`), which the demonstration already handles, into the core field access implementation.
3. **Error Handling**: Improve error messages for field access failures to provide more helpful debugging information.
4. **Performance Optimization**: Consider caching resolved field access results to improve performance for repeated access to the same fields.
5. **Promise Handling**: Ensure all interactions with the llmxml library properly handle its Promise-based API.
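As one possible direction for the error-handling recommendation, the sketch below reports which path segment failed and which keys were available at that point. `FieldAccessError` and `resolveFieldPath` are hypothetical names for this illustration and do not exist in Meld as far as this document shows.

```typescript
// Hypothetical error type that records where a field lookup went wrong.
class FieldAccessError extends Error {
  readonly failedSegment: string;

  constructor(path: string[], failedAt: number, availableKeys: string[]) {
    const segment = path[failedAt] ?? "";
    const keys = availableKeys.length > 0 ? availableKeys.join(", ") : "(none)";
    super(`Cannot access "${segment}" in "${path.join(".")}"; available keys: ${keys}`);
    this.name = "FieldAccessError";
    this.failedSegment = segment;
  }
}

// Walk the path and fail loudly, with context, at the first missing segment.
function resolveFieldPath(root: unknown, path: string[]): unknown {
  let current: unknown = root;
  path.forEach((key, index) => {
    if (current === null || typeof current !== "object") {
      // Reached a primitive (or null) before the path was exhausted.
      throw new FieldAccessError(path, index, []);
    }
    const record = current as Record<string, unknown>;
    if (!(key in record)) {
      throw new FieldAccessError(path, index, Object.keys(record));
    }
    current = record[key];
  });
  return current;
}

const sample = { person: { contact: { email: "john@example.com" } } };

try {
  resolveFieldPath(sample, ["person", "contact", "phone"]);
} catch (error) {
  console.log(error instanceof Error ? error.message : String(error));
}
```

Listing the available keys makes typos in templates easier to spot, at the cost of exposing data structure details in error output, which may not be acceptable in every deployment.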
## Conclusion
By directly using the updated llmxml library and ensuring proper stringification of values throughout the pipeline, we've resolved the field access issues in Meld. These improvements make the templating system more robust and flexible, allowing for more complex data structures to be used in templates.