json-pipeline

# Data format for the compiler ## Conventions Opcode names MUST not contain `:` unless: * They are going to be consumed and renamed by one of the next stages * They represent platform specific low-level instructions. In this case the `<platform-name>:prefix` should be used. `platform-name` should not contain any characters except ones defined by the set `[a-zA-Z0-9_\-\.]` Every node may take several literal inputs, several regular inputs, and possibly a control input. ## Process Stage receive and produce data with indexes instead of pointers between structures, thus ensuring that the data might be serialized between all stages. Stage might transform input data to some internal representation with pointers instead of the indexes before performing any kind of operations. The compilation goes in a following steps. ### AST to SSA-less CFG AST is naively transformed into CFG graph with: * `ssa:load(index)` * `ssa:store(index, value)` for all local variable lookups. Context and global variable lookups should be generally done using language-specific opcodes, and are not subject to the SSA phase defined below. `index` is just a number literal (see literal inputs to nodes), and represents local variable stack cell to put/load the value from. SSA may replace some `ssa:load()`s with `ssa:undefined` if the value was not defined at that point. It is up to next phases to figure out what to do with such nodes. Every node (except region nodes) MUST be in the `nodes` list of one block. First block in `cfg` section is considered a `start` node. SSA-less CFG information is propagated to the next stage. ### SSA-less CFG to CFG+SSA Using dominator tree information, `ssa:load`s are replaced with proper `ssa:store`'s values or `ssa:phi` nodes. `ssa:phi` takes two inputs and has a control dependency on the CFG block that holds it. `start` and `region`'s `control` field points to the last control nodes in the predecessor blocks. `ssa:phi`, `jump`, and `if` nodes have the parent `region` or `start` node in the `control` field (or should have `region` or `start` node reachable through the `control` field chain). `ssa:phi` MUST be at the start of the block, or right after other `ssa:phi`. `if`, `jump` MUST be the last intruction in the block. CFG and dominance information is propagated to the next stage. ### CFG to Sea-of-Nodes Some optimizations may happen here. Nothing is propagated to the next stage, except the `nodes` DAG. ### Optimizations on Sea-of-Nodes Some optimizations may happen here. Propagate all inputs. ### Sea-of-Nodes to Low-level Sea-of-Nodes **Platform specific stage** Opcodes are replaced with their low-level counterparts, possibly generating more nodes that was previously in the graph. Propagate all inputs. ### Sea-of-Nodes to CFG Scheduling algorithm generates blocks and puts the nodes at proper positions into them, thus creating CFG out of DAG representation. Blocks are ordered to optimize amount of jumps. This stage propagates CFG and dominance information. ### Optional optimizations on CFG Some optimizations may happen here. Propagate all inputs. ### Register Allocation **Platform specific stage** Allocate register for each input/output of the node, and possibly additional temporary regiters. Propagate data in register allocator format (TBD). ### Generate machine code **Platform specific stage** Generate machine code using register allocator data. ## Representation ### JSON ```js { // Lookup by `node id` "nodes": [ { "opcode": "opcode id", "control": [ ...node ids... ], "literals": [ ...literal values... ], "inputs": [ ...node ids... ] } ], // Optional CFG information "cfg": { // Number of blocks MUST match number of `region` and `start` nodes "blocks": [ { "node": ...node id... // Points to region node "successors": [ ...node ids... ], "nodes": [ ...node ids... ] } ] }, // Optional dominance information "dominance": { // Number of blocks MUST match number of blocks in `cfg` section "blocks": [ { "node": ...node id... // Point to region node "parent": ...node id... or `null`, "frontier": [ [ ... ], ... [ ... ] ] } ] } } ``` ### Printable `iN` - where `N >= 0`, for each node `bN` - where `N >= 0`, for each block `iX = opcode <literals>, <nodes>` - for every node. Optionally nodes, might be placed into `block bN { ... }` section. `bX -> bY, ..., bZ` at the end of the block section to specify block successors. `bX => bY, ..., bZ` to specify children in dominance tree. It is in reverse in JSON only for the compactness of the representation. `bX ~> bY, ..., bZ` for dominance frontier ``` pipeline { # First block b0 { # NOTE: `^i1`/`^b0` specifies optional control dependency # (possibly multiple) i1 = opcode ^b0, 1, 2, ...literals, i2, i3, ... nodes } b0 -> b1, b2 b0 => b1, b2, b3 b0 ~> b1, b2, b3 b1 { ... } b1 -> b3 b2 { ... } b2 -> b3 b3 { ... } } ``` ### Binary To be defined