# meld-ast
A spec-compliant AST parser for the Meld scripting language, built with Peggy. This parser produces AST nodes that strictly conform to the `meld-spec` type definitions.
## Features
- Full compliance with `meld-spec` type definitions
- Built with Peggy for robust parsing
- Environment-independent parser loading
- Comprehensive error handling with location information
- Validation against meld-spec types
- Direct access to parser components
- Support for both ESM and CommonJS
- Source location tracking for all nodes
- Comprehensive parsing for all Meld language constructs:
  - Text blocks
  - Code fences with advanced features:
    - Support for 3, 4, and 5 backticks
    - Proper nesting of code fences
    - Optional preservation of fence markers
    - Language identifier support
  - Comments (`>> comment`)
  - Variables:
    - Text variables (`{{var}}`) with format options (previously `${var}`)
    - Data variables (`{{data}}`) with fields (previously `#{data}`)
    - Array access for data variables (`{{array.0}}`, `{{data.users.0.name}}`)
    - Path variables (`$var`)
  - Directives:
    - `@run [command]`
    - `@import [path]`
    - `@import [https://example.com/file.md]` for URL imports
    - `@import [var1, var2] from [path.meld]` for named imports
    - `@import [var1, var2 as alias2] from [path.meld]` for named imports with aliases
    - `@define name = @directive [...]` with metadata fields
    - `@data identifier:schema = { ... }`
    - `@text name = "value"`
    - `@path name = "path"`
    - `@path name = "$HOMEPATH/path"` with special variables
    - `@embed [path]` for embedding file content
    - `@embed [https://example.com/content.md]` for URL content
    - `@embed [$path_variable]` for path variables
    - `@embed [$path_variable/{{variable}}]` for paths and regular variables
    - `@embed [{{variable}}]` for variables within brackets
    - `@embed {{variable}}` for direct variable embedding
    - `@embed [[...]]` for multiline content
- Path directives with special variables
- Error recovery
- Extensible AST format
- Multi-file processing
## Installation
```bash
npm install meld-ast meld-spec
```
Note: `meld-spec` is a peer dependency and must be installed alongside `meld-ast`.
## Usage
### Basic Parsing
```typescript
import { parse } from 'meld-ast';
const input = `
>> This is a comment
Hello world
@run [echo "Hello"]
`;
const { ast } = parse(input);
```
### Advanced Usage with Options
The parser supports several configuration options:
```typescript
import { parse, ParserOptions, MeldAstError } from 'meld-ast';
const options: ParserOptions = {
// Stop on first error (default: true)
failFast: true,
// Track source locations (default: true)
trackLocations: true,
// Validate nodes against meld-spec (default: true)
validateNodes: true,
// Preserve code fence markers in content (default: true)
// When true, includes the opening/closing fence markers and language
// When false, only includes content between fences
preserveCodeFences: true,
// Suppress warnings for undefined variables in paths (default: false)
// When true, no warnings are emitted for undefined variables
// When false, warnings are emitted for undefined variables
variable_warning: false,
// Custom error handler
onError: (error: MeldAstError) => {
console.warn(`Parse warning: ${error.toString()}`);
}
};
try {
const { ast, errors } = parse(input, options);
// If failFast is false, errors array will contain any non-fatal errors
if (errors) {
console.warn('Parsing completed with warnings:', errors);
}
} catch (error) {
if (error instanceof MeldAstError) {
console.error(
`Parse error at line ${error.location?.start.line}, ` +
`column ${error.location?.start.column}: ${error.message}`
);
}
}
```
### Error Handling
The parser provides detailed error information:
```typescript
import { parse, MeldAstError, ParseErrorCode } from 'meld-ast';
try {
const { ast } = parse(input);
} catch (error) {
if (error instanceof MeldAstError) {
// Location information
if (error.location) {
console.error(
`Error at line ${error.location.start.line}, ` +
`column ${error.location.start.column}`
);
}
// Error details
console.error(`
Message: ${error.message}
Code: ${error.code}
${error.cause ? `Cause: ${error.cause.message}` : ''}
`);
// JSON representation
console.error('Full error:', JSON.stringify(error.toJSON(), null, 2));
}
}
```
Error codes indicate specific failure types:
- `SYNTAX_ERROR`: Basic syntax errors
- `VALIDATION_ERROR`: Node validation failures
- `INITIALIZATION_ERROR`: Parser initialization issues
- `GRAMMAR_ERROR`: Grammar-level problems
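As a rough illustration of acting on these codes, the sketch below maps each one to the matching hints from the Troubleshooting section of this README. It assumes the codes are available as plain strings; `hintForErrorCode` is a hypothetical helper and not part of `meld-ast`.

```typescript
// Hypothetical helper, not part of meld-ast.
// Assumes error codes can be compared as plain strings.
type KnownErrorCode =
  | "SYNTAX_ERROR"
  | "VALIDATION_ERROR"
  | "INITIALIZATION_ERROR"
  | "GRAMMAR_ERROR";

const hints: Record<KnownErrorCode, string> = {
  SYNTAX_ERROR: "Verify Meld syntax, directive formatting, and code fence closure.",
  VALIDATION_ERROR: "Check meld-spec compliance, required fields, and field types.",
  INITIALIZATION_ERROR: "Check environment detection and pre-built parser availability.",
  GRAMMAR_ERROR: "Verify grammar file paths.",
};

function hintForErrorCode(code: string): string {
  // Only own keys count, so inherited names such as "toString" fall through.
  if (Object.prototype.hasOwnProperty.call(hints, code)) {
    return hints[code as KnownErrorCode];
  }
  return "Unknown error code: " + code;
}
```

Inside a `catch` block this could be called as `hintForErrorCode(String(error.code))` once the error is known to be a `MeldAstError`.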
### Package Exports
The package provides direct access to its components:
```typescript
// Main parser
import { parse } from 'meld-ast';
// Direct parser access
import { parser } from 'meld-ast/parser';
// Grammar utilities
import { grammar } from 'meld-ast/grammar';
// Error types
import { MeldAstError } from 'meld-ast/errors';
```
## TypeScript Configuration
Configure your `tsconfig.json`:
```json
{
"compilerOptions": {
"module": "NodeNext",
"moduleResolution": "NodeNext",
"esModuleInterop": true
}
}
```
## Environment-Specific Behavior
The parser automatically detects and adapts to different environments:
### Development Environment
When used in development (source code):
```typescript
// Grammar is loaded from source directory
// src/grammar/meld.pegjs
```
### Production Environment
When installed as a dependency:
```typescript
// Pre-built parser is loaded from lib directory
// node_modules/meld-ast/lib/grammar/parser.cjs
// node_modules/meld-ast/lib/grammar/parser.js
```
The parser uses a robust fallback strategy:
1. Tries pre-built CJS parser first (most compatible)
2. Falls back to pre-built ESM parser
3. Falls back to grammar compilation if needed
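The fallback order above can be pictured as a chain of loaders tried in sequence. The following is a minimal sketch of that pattern only; the `Loader` type, `loadWithFallback`, and the stand-in loaders are hypothetical and do not reflect the package's actual loading code.

```typescript
// Hypothetical sketch of a "first loader that succeeds" chain.
// The loader names mirror the three steps above; the real loaders are not shown.
type Loader<T> = { name: string; load: () => T };

function loadWithFallback<T>(loaders: Loader<T>[]): { name: string; value: T } {
  const failures: string[] = [];
  for (const { name, load } of loaders) {
    try {
      return { name, value: load() };
    } catch (err) {
      // Record the failure and move on to the next strategy.
      failures.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`Failed to initialize parser (${failures.join("; ")})`);
}

// Stand-in loaders: the first two fail, so the third one is used.
const result = loadWithFallback<string>([
  { name: "prebuilt-cjs", load: () => { throw new Error("not found"); } },
  { name: "prebuilt-esm", load: () => { throw new Error("not found"); } },
  { name: "compile-grammar", load: () => "parser" },
]);
```

Collecting the individual failure messages keeps the final initialization error informative when every strategy fails.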
### Debug Output
Enable detailed debug logging to see path resolution:
```bash
DEBUG=meld-ast:* node your-script.js
```
Example debug output:
```
Environment: {
currentDir: '/path/to/node_modules/meld-ast/lib/grammar',
isDev: false,
pkgRoot: '/path/to/node_modules/meld-ast'
}
Looking for pre-built parsers...
Found pre-built parser at: /path/to/node_modules/meld-ast/lib/grammar/parser.cjs
Successfully loaded CJS parser
```
## Troubleshooting
### Common Issues
1. **Parser Initialization**
```
Error: Failed to initialize parser
```
- Check environment detection in debug output
- Verify package root resolution
- Check pre-built parser availability
- Verify grammar file paths
2. **Syntax Errors**
```
Error: Parse error: Unexpected token
```
- Verify Meld syntax
- Check directive formatting
- Ensure proper code fence closure
3. **Validation Errors**
```
Error: Node validation failed
```
- Check meld-spec compliance
- Verify required fields
- Check field types
### Debug Mode
For verbose output:
```bash
DEBUG=meld-ast:* node your-script.js
```
## Contributing
1. Check current issues
2. Run tests: `npm test`
3. Add tests for new features
4. Submit a PR
## License
ISC
## Development Setup
### Prerequisites
- Node.js 16 or higher
- npm 7 or higher
- TypeScript 5.3 or higher
### Initial Setup
1. Clone the repository:
```bash
git clone https://github.com/adamavenir/meld-ast.git
cd meld-ast
```
2. Install dependencies:
```bash
npm install
```
3. Build the project:
```bash
npm run build
```
The build process includes:
- Generating the parser from the PeggyJS grammar
- Building ESM and CommonJS versions
- Creating TypeScript declaration files
- Generating source maps
- Verifying the build output
### Project Structure
```
meld-ast/
├── src/
│ ├── grammar/ # PeggyJS grammar files
│ │ └── meld.pegjs # Main grammar definition
│ ├── ast/ # AST type definitions
│ ├── parser/ # Parser implementation
│ └── index.ts # Main entry point
├── lib/
│ └── grammar/ # Pre-built parser files
│ ├── parser.js # ESM parser
│ └── parser.cjs # CommonJS parser
├── dist/ # Build output
├── test/ # Test files
└── scripts/ # Build scripts
```
### Build Output Structure
The build process generates:
1. ESM Build (`dist/`):
- Main entry point: `index.js`
- Type definitions: `index.d.ts`
- Source maps: `*.js.map`, `*.d.ts.map`
- Generated parser: `grammar/parser.js`
2. CJS Build (`dist/cjs/`):
- CommonJS entry: `index.js`
- Type definitions: `index.d.ts`
- Source maps: `*.js.map`, `*.d.ts.map`
- Generated parser: `grammar/parser.js`
### Debugging
1. **Parser Generation**
```bash
DEBUG=meld-ast:grammar npm run build:grammar
```
2. **Build Process**
```bash
DEBUG=meld-ast:* npm run build
```
3. **Tests**
```bash
DEBUG=meld-ast:* npm test
```
### Common Development Tasks
1. **Adding New Grammar Rules**
1. Edit `src/grammar/meld.pegjs`
2. Add test cases in `test/`
3. Run `npm run build`
4. Run `npm test`
2. **Modifying Parser Behavior**
1. Edit relevant files in `src/`
2. Update tests as needed
3. Run `npm run build`
4. Run `npm test`
3. **Updating Types**
1. Ensure compatibility with `meld-spec`
2. Update type definitions
3. Run `npm run build`
4. Verify with `npm test`
### Code Fences
Code blocks can be fenced with 3, 4, or 5 backticks. The number of backticks in the closing fence must match the opening fence. This allows for proper nesting of code blocks:
``````markdown
# 3 backticks (basic)
```python
print("hello")
```
# 4 backticks (can contain 3-backtick fences)
````markdown
Here's some code:
```python
print("hello")
```
````
# 5 backticks (can contain 3 and 4-backtick fences)
`````markdown
A complex example:
```python
print("hello")
```
````javascript
console.log("hi");
````
`````
``````
The parser will always capture the outermost fence and treat any inner fences as content.
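The matching rule can be illustrated with a small line scanner. This is a sketch of the rule as described here, not the Peggy grammar itself; `findClosingFence` is a hypothetical name, and the sketch assumes a closing fence is a line containing only the backtick run.

```typescript
// Hypothetical sketch, not the actual grammar: locate the closing fence for
// the fence that opens on the first line.
function findClosingFence(lines: string[]): number {
  const open = /^`{3,5}/.exec(lines[0] ?? "");
  if (!open) return -1; // First line does not open a fence
  const marker = open[0];
  for (let i = 1; i < lines.length; i++) {
    // Only a line made of the same number of backticks closes the fence;
    // shorter or longer runs are treated as content.
    if ((lines[i] ?? "").trim() === marker) return i;
  }
  return -1; // Unclosed fence
}

const doc = [
  "````markdown",
  "```python",
  'print("hello")',
  "```",
  "````",
];
const closeIndex = findClosingFence(doc); // 4: the inner 3-backtick fence is skipped
```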
### Multiline Embed Syntax
The parser supports multiline embed content using the double bracket syntax:
```markdown
@embed [[
This is a multiline
embed content that can span
multiple lines.
]]
```
You can use variable interpolation within multiline embeds:
```markdown
@embed [[
Hello, {{name}}!
This is a multi-line
content for embed.
]]
```
You can also specify sections within multiline embeds:
```markdown
@embed [[ #SectionName
This content will be embedded
from the specified section.
]]
```
The multiline embed syntax provides a cleaner way to include content directly in your Meld document without having to create separate files for small pieces of content.
### Embed Directive Syntax
The `@embed` directive supports several distinct syntax forms, each with specific semantics:
1. **Single Brackets for Paths**: `@embed [path/to/file.md]`
- Used for embedding content from external files
- Path can be a relative or absolute file path
2. **Path Variables in Single Brackets**: `@embed [$path_variable]`
- References a path stored in a variable
- The variable must be defined elsewhere in the document
3. **Double Brackets for Inline Content**: `@embed [[content goes here]]`
- Used for embedding content directly in the document
- Can span multiple lines
- Variables within double brackets are treated as literal text
4. **Direct Variable Embedding**: `@embed {{variable}}`
- Embeds the content of a variable directly
- The variable is resolved at runtime
- Shorthand alternative to `@embed [{{variable}}]`
5. **Variables in Brackets**: `@embed [{{variable}}]`
- Embeds the content from a path stored in a variable
- The variable is resolved at runtime
- Path validation is applied to the resolved content
These different syntax forms make the `@embed` directive highly flexible for various content embedding scenarios.
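To make the distinction between the forms concrete, here is a rough classifier over the text that follows `@embed`. It is an illustrative sketch only: `classifyEmbedArgument` and the form names are hypothetical, and the parser's own grammar may draw the boundaries differently.

```typescript
// Hypothetical classifier for the argument that follows "@embed".
type EmbedForm =
  | "inline-content"        // @embed [[...]]
  | "direct-variable"       // @embed {{variable}}
  | "path-variable"         // @embed [$path_variable]
  | "variable-in-brackets"  // @embed [{{variable}}]
  | "path";                 // @embed [path/to/file.md]

const variablePattern = /^\{\{[^{}]+\}\}$/;

function classifyEmbedArgument(arg: string): EmbedForm | null {
  const s = arg.trim();
  // Double brackets must be checked before single brackets.
  if (s.startsWith("[[") && s.endsWith("]]")) return "inline-content";
  if (variablePattern.test(s)) return "direct-variable";
  if (s.startsWith("[") && s.endsWith("]")) {
    const inner = s.slice(1, -1).trim();
    if (inner.startsWith("$")) return "path-variable";
    if (variablePattern.test(inner)) return "variable-in-brackets";
    return "path";
  }
  return null; // Not a recognized embed argument
}
```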
### Code Fence Examples
Here are examples of how code fences are parsed with different options:
```typescript
import { parse } from 'meld-ast';
// Default behavior (preserveCodeFences: true)
const input = '```javascript\nconsole.log("hello");\n```';
const { ast: preserved } = await parse(input);
console.log((preserved[0] as CodeFenceNode).content);
// Output: ```javascript\nconsole.log("hello");\n```
// Without fence preservation
const { ast: stripped } = await parse(input, { preserveCodeFences: false });
console.log((stripped[0] as CodeFenceNode).content);
// Output: console.log("hello");
// Nested fences (4 backticks containing 3 backticks)
const nested = '````markdown\n```js\nlet x = 1;\n```\n````';
const { ast: nestedAst } = await parse(nested);
// Preserves all fences in content
```
### Path Directives with Special Variables
Path directives support special variables for commonly used paths:
```markdown
# Home directory references
@path home_config = "$HOMEPATH/config"
@path home_alt = "$~/config"
# Project root references
@path project_config = "$PROJECTPATH/config"
@path project_alt = "$./config"
```
These special variables provide consistent ways to reference important paths:
| Special Variable | Alias | Description |
|------------------|-------|-------------|
| `$HOMEPATH` | `$~` | User's home directory |
| `$PROJECTPATH` | `$.` | Project root directory |
The parser correctly sets the path structure with proper `base` and `segments` properties:
```typescript
// For @path config = "$HOMEPATH/config"
{
type: "PathDirective",
identifier: "config",
value: {
raw: "$HOMEPATH/config",
structured: {
base: "$HOMEPATH",
segments: ["config"]
}
}
}
```
When using these paths in embed or import directives, the runtime system is expected to resolve the special variables to actual filesystem paths.
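A runtime might resolve the structured form roughly as follows. This is a sketch under stated assumptions: `resolveSpecialPath` is a hypothetical function, the home and project directories are passed in by the caller, and segments are joined with `/`.

```typescript
// Hypothetical runtime-side sketch; meld-ast only produces the structured path.
interface StructuredPath {
  base: string;
  segments: string[];
}

function resolveSpecialPath(
  path: StructuredPath,
  homeDir: string,
  projectRoot: string
): string {
  // Both the long names and their aliases from the table above.
  const bases = new Map<string, string>([
    ["$HOMEPATH", homeDir],
    ["$~", homeDir],
    ["$PROJECTPATH", projectRoot],
    ["$.", projectRoot],
  ]);
  // Unknown bases are left untouched.
  const root = bases.get(path.base) ?? path.base;
  return [root, ...path.segments].join("/");
}

const resolved = resolveSpecialPath(
  { base: "$HOMEPATH", segments: ["config"] },
  "/home/alice",
  "/work/app"
); // "/home/alice/config"
```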
### URL Support in Path Directives
The parser now supports URLs in path directives, allowing you to reference remote content:
```markdown
# Import from remote URL
@import [https://example.com/docs/file.md]
# Embed content from remote URL
@embed [https://example.com/snippets/code.js]
```
URLs are detected by checking for the `http://` or `https://` prefix. The parser performs the following:
1. Validates that URLs are well-formed
2. Preserves URLs in their original form during path normalization
3. Adds a `url: true` property to the structured path object:
```typescript
// For @import [https://example.com/file.md]
{
type: "Directive",
directive: {
kind: "import",
path: {
raw: "https://example.com/file.md",
structured: {
base: ".",
segments: ["https:", "example.com", "file.md"],
variables: {},
url: true
},
normalized: "https://example.com/file.md"
}
}
}
```
When using URL paths, the runtime system is expected to retrieve the remote content via HTTP(S) requests.
> **Note**: Paths with slashes must either be URLs (starting with `http://` or `https://`) or use special variables (starting with `$`). This validation ensures paths are properly structured.
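The prefix check and the resulting path object can be approximated as below. This is a sketch that reproduces the shape shown above, not the parser's implementation; `isUrlPath` and `toUrlPath` are hypothetical names, and well-formedness validation is omitted.

```typescript
// Hypothetical sketch of URL detection and the structured path it yields.
interface UrlPath {
  raw: string;
  structured: {
    base: string;
    segments: string[];
    variables: Record<string, string>;
    url: true;
  };
  normalized: string;
}

function isUrlPath(raw: string): boolean {
  return /^https?:\/\//.test(raw);
}

function toUrlPath(raw: string): UrlPath | null {
  if (!isUrlPath(raw)) return null;
  return {
    raw,
    structured: {
      base: ".",
      // Splitting on "/" leaves an empty string after the scheme; drop it.
      segments: raw.split("/").filter((segment) => segment.length > 0),
      variables: {},
      url: true,
    },
    normalized: raw, // URLs are kept in their original form
  };
}

const urlPath = toUrlPath("https://example.com/file.md");
// urlPath?.structured.segments is ["https:", "example.com", "file.md"]
```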
### Named Imports
The parser supports selective imports using named import syntax:
```markdown
# Basic named imports
@import [var1, var2] from [variables.meld]
# Named imports with aliases
@import [var1, var2 as alias2] from [variables.meld]
# Explicit wildcard import
@import [*] from [variables.meld]
# Traditional import (equivalent to wildcard)
@import [variables.meld]
# Empty import list
@import [] from [variables.meld]
# Named imports with variable path
@import [var1, var2] from {{path_variable}}
```
Named imports allow you to selectively import specific variables from a file instead of importing everything. The AST structure for named imports includes an `imports` array with each import's name and optional alias:
```typescript
// For @import [var1, var2 as alias2] from [variables.meld]
{
type: "Directive",
directive: {
kind: "import",
path: {
raw: "variables.meld",
structured: {
base: ".",
segments: ["variables.meld"],
variables: {},
cwd: true
},
normalized: "./variables.meld"
},
imports: [
{ name: "var1", alias: null },
{ name: "var2", alias: "alias2" }
]
}
}
```
The traditional import syntax (`@import [path.meld]`) is maintained for backward compatibility and is equivalent to a wildcard import (`@import [*] from [path.meld]`).
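To show how the bracketed list corresponds to the `imports` array, here is a small sketch that turns the list text into name/alias entries. `parseImportList` is a hypothetical helper written for illustration; the real work is done by the grammar.

```typescript
// Hypothetical helper: converts the text between the brackets of a named
// import into entries shaped like the `imports` array above.
interface ImportEntry {
  name: string;
  alias: string | null;
}

function parseImportList(list: string): ImportEntry[] {
  return list
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0) // An empty list yields no entries
    .map((item) => {
      const [name, alias] = item.split(/\s+as\s+/);
      return { name: name ?? item, alias: alias ?? null };
    });
}

const entries = parseImportList("var1, var2 as alias2");
// [{ name: "var1", alias: null }, { name: "var2", alias: "alias2" }]
```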
### Data Directives
Data directives allow you to define structured data, which can be referenced elsewhere.
```markdown
@data config = {
"server": "localhost",
"port": 8080
}
@data servers = [
{ "name": "prod", "url": "example.com" },
{ "name": "staging", "url": "staging.example.com" }
]
```
Data directives can also be loaded from external files:
```markdown
@data config = @embed [config.json]
```
### Variable Syntax
Meld supports different variable types with specific syntaxes:
1. **Text Variables**
```markdown
Hello, {{name}}!
```
2. **Data Variables with Field Access**
```markdown
User: {{user.name}}, Age: {{user.age}}
```
3. **Data Variables with Array Access**
```markdown
First user: {{users.0.name}}
Item by index: {{items.2}}
Nested arrays: {{matrix.0.1}}
Dynamic access: {{data[fieldName]}}
```
4. **Path Variables**
```markdown
File path: $project_path
```
5. **Format Options**
```markdown
Date: {{date>>formatName}}
```
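The dotted field and array access in items 2 and 3 can be read as a walk through the data, one key per segment. The sketch below shows that idea for a runtime that resolves such references; `resolveFieldPath` is a hypothetical function, and bracketed dynamic access and format options are not handled.

```typescript
// Hypothetical runtime-side sketch: resolve a dotted reference such as
// "users.0.name" against a data value.
function resolveFieldPath(data: unknown, path: string): unknown {
  let current: unknown = data;
  for (const key of path.split(".")) {
    // Stop as soon as the walk leaves objects and arrays.
    if (current === null || typeof current !== "object") return undefined;
    // Numeric segments index arrays because array indices are string keys.
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

const data = {
  users: [{ name: "Ada" }, { name: "Grace" }],
  matrix: [[1, 2], [3, 4]],
};
const firstUser = resolveFieldPath(data, "users.0.name"); // "Ada"
const cell = resolveFieldPath(data, "matrix.0.1"); // 2
```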