# meld-ast

A spec-compliant AST parser for the Meld scripting language, built with Peggy. This parser produces AST nodes that strictly conform to the `meld-spec` type definitions.

## Features

- Full compliance with `meld-spec` type definitions
- Built with Peggy for robust parsing
- Environment-independent parser loading
- Comprehensive error handling with location information
- Validation against meld-spec types
- Direct access to parser components
- Support for both ESM and CommonJS
- Source location tracking for all nodes
- Comprehensive parsing for all Meld language constructs:
  - Text blocks
  - Code fences with advanced features:
    - Support for 3, 4, and 5 backticks
    - Proper nesting of code fences
    - Optional preservation of fence markers
    - Language identifier support
  - Comments (`>> comment`)
  - Variables:
    - Text variables (`{{var}}`) with format options (previously `${var}`)
    - Data variables (`{{data}}`) with fields (previously `#{data}`)
    - Array access for data variables (`{{array.0}}`, `{{data.users.0.name}}`)
    - Path variables (`$var`)
  - Directives:
    - `@run [command]`
    - `@import [path]`
    - `@import [https://example.com/file.md]` for URL imports
    - `@import [var1, var2] from [path.meld]` for named imports
    - `@import [var1, var2 as alias2] from [path.meld]` for named imports with aliases
    - `@define name = @directive [...]` with metadata fields
    - `@data identifier:schema = { ... }`
    - `@text name = "value"`
    - `@path name = "path"`
    - `@path name = "$HOMEPATH/path"` with special variables
    - `@embed [path]` for embedding file content
    - `@embed [https://example.com/content.md]` for URL content
    - `@embed [$path_variable]` for path variables
    - `@embed [$path_variable/{{variable}}]` for paths and regular variables
    - `@embed [{{variable}}]` for variables within brackets
    - `@embed {{variable}}` for direct variable embedding
    - `@embed [[...]]` for multiline content
  - Path directives with special variables
- Error recovery
- Extensible AST format
- Multi-file processing

## Installation

```bash
npm install meld-ast meld-spec
```

Note: `meld-spec` is a peer dependency and must be installed alongside `meld-ast`.

## Usage

### Basic Parsing

```typescript
import { parse } from 'meld-ast';

const input = `
>> This is a comment
Hello world
@run [echo "Hello"]
`;

const { ast } = parse(input);
```

### Advanced Usage with Options

The parser supports several configuration options:

```typescript
import { parse, ParserOptions, MeldAstError } from 'meld-ast';

const options: ParserOptions = {
  // Stop on first error (default: true)
  failFast: true,

  // Track source locations (default: true)
  trackLocations: true,

  // Validate nodes against meld-spec (default: true)
  validateNodes: true,

  // Preserve code fence markers in content (default: true)
  // When true, includes the opening/closing fence markers and language
  // When false, only includes content between fences
  preserveCodeFences: true,

  // Suppress warnings for undefined variables in paths (default: false)
  // When true, no warnings are emitted for undefined variables
  // When false, warnings are emitted for undefined variables
  variable_warning: false,

  // Custom error handler
  onError: (error: MeldAstError) => {
    console.warn(`Parse warning: ${error.toString()}`);
  }
};

try {
  const { ast, errors } = parse(input, options);

  // If failFast is false, errors array will contain any non-fatal errors
  if (errors) {
    console.warn('Parsing completed with warnings:', errors);
  }
} catch (error) {
  if (error instanceof MeldAstError) {
    console.error(
      `Parse error at line ${error.location?.start.line}, ` +
      `column ${error.location?.start.column}: ${error.message}`
    );
  }
}
```

### Error Handling

The parser provides detailed error information:

```typescript
import { parse, MeldAstError, ParseErrorCode } from 'meld-ast';

try {
  const { ast } = parse(input);
} catch (error) {
  if (error instanceof MeldAstError) {
    // Location information
    if (error.location) {
      console.error(
        `Error at line ${error.location.start.line}, ` +
        `column ${error.location.start.column}`
      );
    }

    // Error details
    console.error(`
      Message: ${error.message}
      Code: ${error.code}
      ${error.cause ? `Cause: ${error.cause.message}` : ''}
    `);

    // JSON representation
    console.error('Full error:', JSON.stringify(error.toJSON(), null, 2));
  }
}
```

Error codes indicate specific failure types:

- `SYNTAX_ERROR`: Basic syntax errors
- `VALIDATION_ERROR`: Node validation failures
- `INITIALIZATION_ERROR`: Parser initialization issues
- `GRAMMAR_ERROR`: Grammar-level problems

### Package Exports

The package provides direct access to its components:

```typescript
// Main parser
import { parse } from 'meld-ast';

// Direct parser access
import { parser } from 'meld-ast/parser';

// Grammar utilities
import { grammar } from 'meld-ast/grammar';

// Error types
import { MeldAstError } from 'meld-ast/errors';
```

## TypeScript Configuration

Configure your `tsconfig.json`:

```json
{
  "compilerOptions": {
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "esModuleInterop": true
  }
}
```

## Environment-Specific Behavior

The parser automatically detects and adapts to different environments:

### Development Environment

When used in development (source code):

```typescript
// Grammar is loaded from source directory
// src/grammar/meld.pegjs
```

### Production Environment

When installed as a dependency:

```typescript
// Pre-built parser is loaded from lib directory
// node_modules/meld-ast/lib/grammar/parser.cjs
// node_modules/meld-ast/lib/grammar/parser.js
```

The parser uses a robust fallback strategy:

1. Tries pre-built CJS parser first (most compatible)
2. Falls back to pre-built ESM parser
3. Falls back to grammar compilation if needed

### Debug Output

Enable detailed debug logging to see path resolution:

```bash
DEBUG=meld-ast:* node your-script.js
```

Example debug output:

```
Environment: {
  currentDir: '/path/to/node_modules/meld-ast/lib/grammar',
  isDev: false,
  pkgRoot: '/path/to/node_modules/meld-ast'
}
Looking for pre-built parsers...
Found pre-built parser at: /path/to/node_modules/meld-ast/lib/grammar/parser.cjs
Successfully loaded CJS parser
```

## Troubleshooting

### Common Issues

1. **Parser Initialization**

   ```
   Error: Failed to initialize parser
   ```

   - Check environment detection in debug output
   - Verify package root resolution
   - Check pre-built parser availability
   - Verify grammar file paths

2. **Syntax Errors**

   ```
   Error: Parse error: Unexpected token
   ```

   - Verify Meld syntax
   - Check directive formatting
   - Ensure proper code fence closure

3. **Validation Errors**

   ```
   Error: Node validation failed
   ```

   - Check meld-spec compliance
   - Verify required fields
   - Check field types

### Debug Mode

For verbose output:

```bash
DEBUG=meld-ast:* node your-script.js
```

## Contributing

1. Check current issues
2. Run tests: `npm test`
3. Add tests for new features
4. Submit a PR

## License

ISC

## Development Setup

### Prerequisites

- Node.js 16 or higher
- npm 7 or higher
- TypeScript 5.3 or higher

### Initial Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/adamavenir/meld-ast.git
   cd meld-ast
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Build the project:

   ```bash
   npm run build
   ```

The build process includes:

- Generating the parser from the PeggyJS grammar
- Building ESM and CommonJS versions
- Creating TypeScript declaration files
- Generating source maps
- Verifying the build output

### Project Structure

```
meld-ast/
├── src/
│   ├── grammar/          # PeggyJS grammar files
│   │   └── meld.pegjs    # Main grammar definition
│   ├── ast/              # AST type definitions
│   ├── parser/           # Parser implementation
│   └── index.ts          # Main entry point
├── lib/
│   └── grammar/          # Pre-built parser files
│       ├── parser.js     # ESM parser
│       └── parser.cjs    # CommonJS parser
├── dist/                 # Build output
├── test/                 # Test files
└── scripts/              # Build scripts
```

### Build Output Structure

The build process generates:

1. ESM Build (`dist/`):
   - Main entry point: `index.js`
   - Type definitions: `index.d.ts`
   - Source maps: `*.js.map`, `*.d.ts.map`
   - Generated parser: `grammar/parser.js`

2. CJS Build (`dist/cjs/`):
   - CommonJS entry: `index.js`
   - Type definitions: `index.d.ts`
   - Source maps: `*.js.map`, `*.d.ts.map`
   - Generated parser: `grammar/parser.js`

### Debugging

1. **Parser Generation**

   ```bash
   DEBUG=meld-ast:grammar npm run build:grammar
   ```

2. **Build Process**

   ```bash
   DEBUG=meld-ast:* npm run build
   ```

3. **Tests**

   ```bash
   DEBUG=meld-ast:* npm test
   ```

### Common Development Tasks

1. **Adding New Grammar Rules**
   1. Edit `src/grammar/meld.pegjs`
   2. Add test cases in `test/`
   3. Run `npm run build`
   4. Run `npm test`

2. **Modifying Parser Behavior**
   1. Edit relevant files in `src/`
   2. Update tests as needed
   3. Run `npm run build`
   4. Run `npm test`

3. **Updating Types**
   1. Ensure compatibility with `meld-spec`
   2. Update type definitions
   3. Run `npm run build`
   4. Verify with `npm test`

### Code Fences

Code blocks can be fenced with 3, 4, or 5 backticks. The number of backticks in the closing fence must match the opening fence. This allows for proper nesting of code blocks:

``````markdown
# 3 backticks (basic)
```python
print("hello")
```

# 4 backticks (can contain 3-backtick fences)
````markdown
Here's some code:
```python
print("hello")
```
````

# 5 backticks (can contain 3 and 4-backtick fences)
`````markdown
A complex example:
```python
print("hello")
```
````javascript
console.log("hi");
````
`````
``````

The parser will always capture the outermost fence and treat any inner fences as content.

### Multiline Embed Syntax

The parser supports multiline embed content using the double bracket syntax:

```markdown
@embed [[
This is a multiline
embed content that can
span multiple lines.
]]
```

You can use variable interpolation within multiline embeds:

```markdown
@embed [[
Hello, {{name}}!
This is a multi-line
content for embed.
]]
```

You can also specify sections within multiline embeds:

```markdown
@embed [[
#SectionName
This content will be embedded
from the specified section.
]]
```

The multiline embed syntax provides a cleaner way to include content directly in your Meld document without having to create separate files for small pieces of content.

### Embed Directive Syntax

The `@embed` directive supports several distinct syntax forms, each with specific semantics:

1. **Single Brackets for Paths**: `@embed [path/to/file.md]`
   - Used for embedding content from external files
   - Path can be a relative or absolute file path

2. **Path Variables in Single Brackets**: `@embed [$path_variable]`
   - References a path stored in a variable
   - The variable must be defined elsewhere in the document

3. **Double Brackets for Inline Content**: `@embed [[content goes here]]`
   - Used for embedding content directly in the document
   - Can span multiple lines
   - Variables within double brackets are treated as literal text

4. **Direct Variable Embedding**: `@embed {{variable}}`
   - Embeds the content of a variable directly
   - The variable is resolved at runtime
   - Shorthand alternative to `@embed [{{variable}}]`

5. **Variables in Brackets**: `@embed [{{variable}}]`
   - Embeds the content from a path stored in a variable
   - The variable is resolved at runtime
   - Path validation is applied to the resolved content

These different syntax forms make the `@embed` directive highly flexible for various content embedding scenarios.

### Code Fence Examples

Here are examples of how code fences are parsed with different options:

```typescript
import { parse } from 'meld-ast';
// CodeFenceNode is assumed to come from the meld-spec type definitions

// Default behavior (preserveCodeFences: true)
const input = '```javascript\nconsole.log("hello");\n```';
const { ast: preserved } = await parse(input);
console.log((preserved[0] as CodeFenceNode).content);
// Output: ```javascript\nconsole.log("hello");\n```

// Without fence preservation
const { ast: stripped } = await parse(input, { preserveCodeFences: false });
console.log((stripped[0] as CodeFenceNode).content);
// Output: console.log("hello");

// Nested fences (4 backticks containing 3 backticks)
const nested = '````markdown\n```js\nlet x = 1;\n```\n````';
const { ast: nestedAst } = await parse(nested);
// Preserves all fences in content
```

### Path Directives with Special Variables

Path directives support special variables for commonly used paths:

```markdown
# Home directory references
@path home_config = "$HOMEPATH/config"
@path home_alt = "$~/config"

# Project root references
@path project_config = "$PROJECTPATH/config"
@path project_alt = "$./config"
```

These special variables provide consistent ways to reference important paths:

| Special Variable | Alias | Description            |
|------------------|-------|------------------------|
| `$HOMEPATH`      | `$~`  | User's home directory  |
| `$PROJECTPATH`   | `$.`  | Project root directory |

The parser correctly sets the path structure with proper `base` and `segments` properties:

```typescript
// For @path config = "$HOMEPATH/config"
{
  type: "PathDirective",
  identifier: "config",
  value: {
    raw: "$HOMEPATH/config",
    structured: {
      base: "$HOMEPATH",
      segments: ["config"]
    }
  }
}
```

When using these paths in embed or import directives, the runtime system is expected to resolve the special variables to actual filesystem paths.

### URL Support in Path Directives

The parser now supports URLs in path directives, allowing you to reference remote content:

```markdown
# Import from remote URL
@import [https://example.com/docs/file.md]

# Embed content from remote URL
@embed [https://example.com/snippets/code.js]
```

URLs are detected by checking for the `http://` or `https://` prefix. The parser performs the following:

1. Validates that URLs are well-formed
2. Preserves URLs in their original form during path normalization
3. Adds a `url: true` property to the structured path object:

```typescript
// For @import [https://example.com/file.md]
{
  type: "Directive",
  directive: {
    kind: "import",
    path: {
      raw: "https://example.com/file.md",
      structured: {
        base: ".",
        segments: ["https:", "example.com", "file.md"],
        variables: {},
        url: true
      },
      normalized: "https://example.com/file.md"
    }
  }
}
```

When using URL paths, the runtime system is expected to retrieve the remote content via HTTP(S) requests.

> **Note**: Paths with slashes must either be URLs (starting with `http://` or `https://`) or use special variables (starting with `$`). This validation ensures paths are properly structured.

### Named Imports

The parser supports selective imports using named import syntax:

```markdown
# Basic named imports
@import [var1, var2] from [variables.meld]

# Named imports with aliases
@import [var1, var2 as alias2] from [variables.meld]

# Explicit wildcard import
@import [*] from [variables.meld]

# Traditional import (equivalent to wildcard)
@import [variables.meld]

# Empty import list
@import [] from [variables.meld]

# Named imports with variable path
@import [var1, var2] from {{path_variable}}
```

Named imports allow you to selectively import specific variables from a file instead of importing everything.

The AST structure for named imports includes an `imports` array with each import's name and optional alias:

```typescript
// For @import [var1, var2 as alias2] from [variables.meld]
{
  type: "Directive",
  directive: {
    kind: "import",
    path: {
      raw: "variables.meld",
      structured: {
        base: ".",
        segments: ["variables.meld"],
        variables: {},
        cwd: true
      },
      normalized: "./variables.meld"
    },
    imports: [
      { name: "var1", alias: null },
      { name: "var2", alias: "alias2" }
    ]
  }
}
```

The traditional import syntax (`@import [path.meld]`) is maintained for backward compatibility and is equivalent to a wildcard import (`@import [*] from [path.meld]`).

### Data Directives

Data directives allow you to define structured data, which can be referenced elsewhere.

```markdown
@data config = {
  "server": "localhost",
  "port": 8080
}

@data servers = [
  { "name": "prod", "url": "example.com" },
  { "name": "staging", "url": "staging.example.com" }
]
```

Data directives can also be loaded from external files:

```markdown
@data config = @embed [config.json]
```

### Variable Syntax

Meld supports different variable types with specific syntaxes:

1. **Text Variables**

   ```markdown
   Hello, {{name}}!
   ```

2. **Data Variables with Field Access**

   ```markdown
   User: {{user.name}}, Age: {{user.age}}
   ```

3. **Data Variables with Array Access**

   ```markdown
   First user: {{users.0.name}}
   Item by index: {{items.2}}
   Nested arrays: {{matrix.0.1}}
   Dynamic access: {{data[fieldName]}}
   ```

4. **Path Variables**

   ```markdown
   File path: $project_path
   ```

5. **Format Options**

   ```markdown
   Date: {{date>>formatName}}
   ```
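
To make the dotted field and array access forms above more concrete (`{{user.name}}`, `{{users.0.name}}`, `{{matrix.0.1}}`), here is a minimal sketch of how a consumer of the AST might resolve such a path against plain data. It is not part of `meld-ast`, which only parses: `resolveFieldPath` and the `Json` type are hypothetical names introduced for illustration, and the bracketed dynamic form (`{{data[fieldName]}}`) is deliberately not handled.

```typescript
// Hypothetical helper, NOT part of meld-ast: resolves a dotted field path
// such as "users.0.name" against plain JSON-like data.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function resolveFieldPath(root: Json, path: string): Json | undefined {
  let current: Json | undefined = root;
  for (const segment of path.split(".")) {
    if (current === null || typeof current !== "object") {
      return undefined; // cannot descend into a primitive
    }
    if (Array.isArray(current)) {
      // Only numeric segments index into arrays, e.g. the "0" in "users.0.name"
      if (!/^\d+$/.test(segment)) return undefined;
      current = current[Number(segment)];
    } else {
      current = Object.prototype.hasOwnProperty.call(current, segment)
        ? current[segment]
        : undefined;
    }
    if (current === undefined) return undefined; // missing key or index
  }
  return current;
}

const data: Json = {
  users: [{ name: "Ada", age: 36 }],
  matrix: [[1, 2], [3, 4]],
};

console.log(resolveFieldPath(data, "users.0.name")); // "Ada"
console.log(resolveFieldPath(data, "matrix.0.1"));   // 2
console.log(resolveFieldPath(data, "users.5.name")); // undefined
```

Returning `undefined` for a missing key or out-of-range index, rather than throwing, is a design choice of this sketch; a real runtime might prefer to report an error with the node's source location.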