# HTML Validation

The [HTML spec](https://html.spec.whatwg.org/multipage/) defines what is valid HTML. There are many existing tools (both first-party and third-party) that can validate plain HTML. In `Closure Templates`, we also provide some HTML validation support by default. Different browsers treat invalid HTML snippets differently, but we want to enforce a strict HTML mode that throws a compiler error for invalid HTML.

NOTE: We do not support the full HTML spec. The HTML spec is actually a collection of separate specifications. Because `Closure Templates` itself is complex (it is Turing-complete after all), any HTML validation we implement must be useful yet not overly restrictive. Implementing the full HTML spec would be overly restrictive.

[TOC]

## Supported rules

For templates that enforce strict HTML mode, the following rules are supported by `Closure Templates`.

### Each block must contain balanced tags

```soy {.good}
// This is a valid HTML snippet.
{template .t}
  <div>
    {for $foo in $fooList}
      // Tags in this block should be closed within the same block. In this block,
      // <div> tag is closed and <input> tag is self-closing.
      <div>foo<p>bar</p></div><input>
    {/for}
    {call .foo}
      {param content kind="html"}
        // A param with kind="html" is also a block. Since the template sets
        // stricthtml to true, this part of the template should also close all
        // tags.
        <div><span><input></span></div>
      {/param}
    {/call}
  </div>
{/template}
```

This is an example of a valid HTML snippet. In each `for` block, every tag that is opened must be closed (`input` is a [void tag](https://www.w3.org/TR/HTML51/syntax.HTML#void-elements) and is self-closing). The `param` block that explicitly sets `kind="html"` must also close every tag it opens. Note that a template is also a block, and the `div` tag opened at the very beginning is closed at the end.

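
For contrast, here is a minimal sketch of a template that we would expect strict HTML mode to reject under the rule above. The parameter `fooList` and its type are assumptions made for illustration, and the exact error message is not shown because it depends on the compiler version.

**Bad code:**

```soy {.bad}
{template .t}
  {@param fooList: list<string>}
  <div>
    {for $foo in $fooList}
      // <span> is opened inside the for block but is not closed before {/for},
      // so this block is not balanced on its own.
      <span>{$foo}
    {/for}
    // Closing the tag outside the block does not help: the open tag and the
    // close tag live in different blocks.
    </span>
  </div>
{/template}
```

Moving `</span>` inside the `for` block, right after `{$foo}`, makes every block balanced again.
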
### Void elements

According to the [HTML spec](https://www.w3.org/TR/2016/REC-html51-20161101/syntax.html#void-elements), void elements only have a start tag; end tags must not be specified for void elements. The compiler will enforce the following rules:

1.  Void elements must only have a start tag without an end tag. `</input>` is invalid HTML.
2.  Start tags for void elements can be self-closing (for example, `<input/>`) or normal (for example, `<input>`). Both are valid HTML.
3.  Start tags for non-void elements can *not* be self-closing. `<div/>` is invalid HTML.

NOTE: For tags with dynamic tag names, it is a little more complicated. In particular, we assume users know what they are doing. See the Dynamic tag names section for detailed examples.

### Dynamic tag names

For strict HTML mode, `Closure Templates` supports dynamic tag names. The compiler enforces that an HTML open tag with a dynamic tag name (for example, `{$tagName1}`) must be closed by an HTML close tag with the same tag name in the same block.

**Good code:**

```soy {.good}
// This is a valid HTML snippet.
{template .t}
  {@param tagName1: string}
  {@param tagName2: Foo}
  <{$tagName1}>
    <{$tagName2|fooToTag}>
    </{$tagName2|fooToTag}>
  </{$tagName1}>
{/template}
```

Note that print directives must also match, since they might change the evaluated values at run time.

**Bad code:**

```soy {.bad}
// This is invalid since the close tag has an additional directive, while the
// open tag does not have any directives.
{template .t}
  {@param tagName: string}
  <{$tagName}>
  </{$tagName|fooToTag}>
{/template}
```

For self-closing tags with a dynamic tag name, we simply trust the user that the tag is valid.

**Good code:**

```soy {.good}
// This is valid HTML since we cannot statically decide if $tagName is a valid
// name for void elements. We simply trust the users that they are doing the
// right things.
{template .t}
  {@param tagName: string}
  // We trust users and assume that tagName is self-closing.
  <{$tagName}/>
{/template}
```

**Bad code:**

```soy {.bad}
{template .t}
  {@param tagName: string}
  // When we open $tagName (that is not self-closing), we assume that it is
  // not a void element, i.e., it must be closed.
  <{$tagName}>foo
{/template}
```

Matching raw text with dynamic expressions is not supported, even if the tag name is a compile-time constant.

**Bad code:**

```soy {.bad}
// This is invalid since we do not support matching static and dynamic tags.
{template .t}
  {let $tagName: 'div' /}
  <{$tagName}></div>
{/template}
```

### Control flow

The compiler supports tag balancing within control flows. The following example is a simple but common use case.

**Good code:**

```soy {.good}
// An example of if conditions.
{template .t}
  {@param b: bool}
  {@param i: bool}
  {@param em: bool}
  {if $b}<b>{/if}
  {if $i}<i>{/if}
  {if $em}<em>{/if}
  content
  {if $em}</em>{/if}
  {if $i}</i>{/if}
  {if $b}</b>{/if}
{/template}
```

**Good code:**

```soy {.good}
// An example of switch conditions.
{template .t}
  {@param foo: string}
  {@param a: string}
  {@param b: string}
  {switch $foo}
    {case $a}
      <div>
    {case $b}
      // <input> is a void element.
      <p><input>
    {default}
      // <a> has been closed within this block.
      <em><a></a>
  {/switch}
  {switch $foo}
    {case $a}
      </div>
    {case $b}
      </p>
    {default}
      // <span> has been closed within this block.
      // </em> matches with the <em> in the previous switch branch.
      <span></span></em>
  {/switch}
{/template}
```

We only perform static analysis, so mixing `if` and `switch` conditions is unsupported.

**Bad code:**

```soy {.bad}
{template .t}
  {@param case: int}
  {switch $case}
    {case 1}
    {case 2}
    {case 3}
      <em>
    {case 4}
  {/switch}
  {if $case == 3}</em>{/if}
{/template}
```

Also, we do not evaluate the expressions, so the following example will be treated as an error.

**Bad code:**

```soy {.bad}
// Although each pair of open tag and close tag has the same conditions, we do
// not evaluate the expressions and cannot decide if they match or not.
{template .t}
  {@param foo: bool}
  {@param bar: bool}
  {@param tag: string}
  {if $foo}
    <b>  // Condition: $foo
  {elseif $bar}
    <i>  // Condition: not $foo and $bar
  {else}
    <{$tag}>  // Condition: not $foo and not $bar
  {/if}
  {if not $foo and not $bar}
    </{$tag}>  // Condition: not $foo and not $bar
  {elseif $foo}
    </b>  // Condition: $foo
  {else}
    </i>  // Condition: not $foo and $bar
  {/if}
{/template}
```

**Bad code:**

```soy {.bad}
{template .t}
  {@param foo: bool}
  {let $bar: $foo /}
  {if $foo}<b>{/if}
  // We do not evaluate $bar to $foo, so we don't know if these two are the
  // same conditions at this point. The compiler will throw an error.
  {if $bar}</b>{/if}
{/template}
```

On the other hand, complicated nested control flows are supported, as long as all the expressions match by text.

**Good code:**

```soy {.good}
{template .t}
  {@param foo: bool}
  {@param bar: bool}
  {if $foo}
    <div>
    {if $bar}
      <p><input/>
    {/if}
  {/if}
  {if $foo}
    {if $bar}
      </p>
    {/if}
    </div>
  {/if}
{/template}
```

Also, the compiler is able to match common tags across all possible conditions.

**Good code:**

```soy {.good}
{template .t}
  {@param foo: string}
  {@param bar: bool}
  {@param a: string}
  {@param b: string}
  <div>
  // All possible branches have a close div tag.
  {if $foo}
    foo</div>
  {elseif $bar}
    bar</div>
  {else}
    xxx</div>
  {/if}
  // Similarly, all possible branches have an open div tag.
  {switch $foo}
    {case $a}
      <div>foo_a
    {case $b}
      <div>foo_b
    {default}
      <div>foo_x
  {/switch}
  </div>
{/template}
```

Beyond that, the analysis of control flow is purely static: all the conditions, and the tags within those conditions, must match exactly. Templates containing control blocks that only partially match earlier blocks, such as the following, are not supported.

**Bad code:**

```soy {.bad}
{template .t}
  {@param foo: bool}
  {if $foo}
    <div><div>
  {/if}
  {if $foo}
    </div>
  {/if}
  {if $foo}
    </div>
  {/if}
{/template}
```

**Bad code:**

```soy {.bad}
{template .t}
  {@param foo: bool}
  {@param bar: bool}
  {if $foo}
    <div>
  {else}
    {if $bar}
      <div><div>
    {else}
      <div>
    {/if}
  {/if}
  // This matches with the extra tag in the first IfNode
  {if $foo}
  {else}
    {if $bar}
      </div>
    {else}
    {/if}
  {/if}
  // This matches the common prefix in the first IfNode
  </div>
{/template}
```

### `for`

The following template is technically valid HTML. The `for` loop opens three `div` tags, and then we manually close all of them after the loop. However, we decided to not support this case since the conditions of `for` could be dynamic and there is no easy way to statically decide if this template is valid HTML. In this particular example, the compiler will report an error at the location of the `<div>` tag in the `for` loop, complaining that this HTML open tag is not closed within the current block.

**Bad code:**

```soy {.bad}
{template .t}
  {for $i in range(3)}
    <div>
  {/for}
  </div></div></div>
{/template}
```

You can use a recursive template to create nesting:

**Good code:**

```soy {.good}
{template .t}
  {@param level: int}
  {@param content: html}
  {if $level > 0}
    <div>
      {call .t}
        {param level: $level - 1 /}
        {param content: $content /}
      {/call}
    </div>
  {else}
    {$content}
  {/if}
{/template}
```

The following template is another example that is valid HTML but is not supported by the compiler.

**Bad code:**

```soy {.bad}
{template .t}
  {@param a: list<string>}
  {for $x in $a}
    {if isFirst($x)}<ul>{/if}
    <li>{$x}
    {if isLast($x)}</ul>{/if}
  {/for}
{/template}
```

`isFirst` and `isLast` are [built-in commands](control-flow#for) that check the position of the current iterator. Although this template produces valid HTML (it opens and closes `<ul>` exactly once if the list is non-empty), supporting this pattern adds an additional layer of complexity to the compiler.
It requires checking function names, validating this particular AST structure, and doing special handling for these functions. For this example, we recommend using the following template instead. It renders exactly the same HTML and is supported by the compiler.

**Good code:**

```soy {.good}
{template .t}
  {@param a: list<string>}
  {if length($a) > 0}
    <ul>
    {for $x in $a}
      <li>{$x}
    {/for}
    </ul>
  {/if}
{/template}
```

### Optional tags

According to the HTML spec, some tags [can be omitted](https://www.w3.org/TR/html5/syntax.html#optional-tags). In particular, if certain criteria are met, the HTML parser can imply the start tags and/or end tags. `Closure Templates` supports part of these rules. The major differences between what we support and the HTML spec:

*   **Start tags must be present.** In the HTML spec, it is possible to omit start tags under some circumstances. For example, `<html>` and `<head>` might be omitted. However, due to the rendering model of `Closure Templates`, we don't support that.
*   **Strict criteria are not enforced.** In the HTML spec, most of the end tags can be omitted based on the content model. For example, `</body>` may be omitted if the `<body>` is not immediately followed by a comment. `Closure Templates` assumes all unclosed optional tags are fine. Additional conditions such as "is not immediately followed by a comment" are not enforced.

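
To illustrate the first difference, here is a sketch of a template that relies on an implied start tag. This is an assumption-based illustration: a plain HTML parser would imply the missing `<body>`, whereas we would expect strict HTML mode to report the `</body>` close tag as unmatched. The exact diagnostic is not shown.

**Bad code:**

```soy {.bad}
{template .t}
  // There is no <body> start tag anywhere in this template. Plain HTML allows
  // the parser to imply it, but here the close tag below has nothing to match.
  <p>hello
  </body>
{/template}
```

Writing the `<body>` start tag explicitly, before `<p>`, avoids the problem; the unclosed `<p>` is fine because its end tag is optional.
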
Some examples are:

**Good code:**

```soy {.good}
{template .t}
  {@param foo: bool}
  {@param bar: bool}
  <html>
  <head>
  <ul>
    // optional tag that does not close
    <li>foo
    <li>
    // optional tag that closes and contains if nodes as its children
    <li>b{if $foo}<b>{/if}a{if $bar}<i>{/if}r{if $bar}</i>{/if}{if $foo}</b>{/if}</li>
    // optional tag that closes
    <li>baz</li>
    <li></li>
    // optional tag that does not close and contains if nodes as its children
    <li>b{if $foo}<b>{/if}a{if $bar}<i>{/if}r{if $bar}</i>{/if}{if $foo}</b>{/if}
  </ul>
  // html and head are automatically closed at the end
{/template}
```

Mixing complicated control flows and optional tags is also supported. For example...

**Good code:**

```soy {.good}
{template .t}
  {@param foo: bool}
  {@param bar: bool}
  <ul>
    <li>foo
    <li>b
      {if $foo}<b>{/if}
      a
      {if $bar}<i>{/if}
      r
      {if $bar}</i>{/if}
      {if $foo}</b>{/if}
    <li>baz</li>
    <li>
    {if $foo}<li>{/if}{if $foo}</li>{/if}
  </ul>
{/template}
```

### Foreign elements

HTML allows elements from non-HTML namespaces (such as MathML and SVG) to appear in some contexts. MathML is *not* supported since it is deprecated and has already been removed from Chrome. SVG is quite popular and is supported. Notably, SVG uses the XML spec, which means all tags must be closed and every tag is allowed to be self-closing.

**Good code:**

```soy {.good}
{template .foreign_elements_simple}
  <svg>
    <path/>
    <path></path>
    <rect/>
    <rect></rect>
  </svg>
{/template}
```

**Good code:**

```soy {.good}
{template .foreign_elements_control_flow}
  {@param foo: bool}
  <svg>
    <path/>
    <path></path>
    {if $foo}<rect/>{/if}
    <rect/>
    <rect></rect>
    {if $foo}<path>{/if}
    {if $foo}</path>{/if}
  </svg>
{/template}
```

We also enforce that SVG tags can only be opened and closed within the same block.

**Bad code:**

```soy {.bad}
{template .fail_foreign_elements_across_block}
  {@param foo: bool}
  {if $foo}
    <svg>
  {else}
    <svg>
  {/if}
  </svg>
{/template}
```

## Known Issues

As we said before, we do not support the full HTML spec.
There are several general limitations/complexities to keep in mind:

*   Users write recursive templates, so the depth and structure of the DOM tree is fully dynamic.
*   The `Closure Templates` rendering model provides very limited information about what we are rendering. For example, we don't know the current `DOCTYPE`, the current `Content-Type` headers, or which `<meta>` tags have been sent/specified. So validation cannot depend on anything controlled by these details.
*   Users render tag names dynamically, so we won't necessarily know the specific tag names statically.
*   `Closure Templates` allows for opaque content transclusions of HTML content, so we won't necessarily know (statically) the number of predecessors of a given tag.

On the other hand, there are some things we could theoretically validate, but we don't currently since we believe them to be of limited utility. An example is HTML [content models](https://www.w3.org/TR/html5/dom.html#content-models). Supporting full content models is infeasible in `Closure Templates`; we could still add separate rules such as "don't put `<div>` inside `<p>`" (a `p` tag should only [contain phrasing content](https://dev.w3.org/html5/spec-preview/the-p-element.html#the-p-element)), but enforcing this in `Closure Templates` does not bring many benefits.

## Disabling HTML Validation

HTML validation is enabled by default. To disable it:

*   Add `stricthtml="false"` to your templates or namespaces.
*   Setting strict HTML mode in namespaces will set a default value for all HTML templates in this file.
*   Each HTML template can still override the default value inherited from the namespace.
*   Non-HTML templates will not be affected at all.
*   The compiler will report errors for invalid HTML templates.

In the following example, template `foo` will enforce strict HTML mode, but templates `bar` and `baz` will not.

```soy
// By default, all HTML templates will enable stricthtml mode.
{namespace ns}

// This template does not override the default value, so it will have stricthtml
// mode enabled.
{template .foo}
  ...
{/template}

// This is a non-HTML template, and stricthtml mode does not apply to it.
{template .bar kind="text"}
  ...
{/template}

// This template overrides the default value and disables stricthtml mode.
{template .baz stricthtml="false"}
  ...
{/template}
```

To enforce `stricthtml` in all templates, add a custom [Soy conformance](../dev/conformance) rule `com.google.template.soy.conformance.RequireStrictHtml`.
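
The namespace-level default mentioned in the list above can be sketched as follows. This is an illustration under assumptions, not a verified configuration: the template names `.legacy` and `.checked` are hypothetical, and the `stricthtml="true"` form is inferred from the `stricthtml="false"` attribute described earlier.

```soy
// Assumption: setting the attribute on the namespace changes the default for
// every HTML template in this file.
{namespace ns stricthtml="false"}

// Inherits the namespace default, so strict HTML mode is off here.
{template .legacy}
  ...
{/template}

// Overrides the namespace default and turns strict HTML mode back on.
{template .checked stricthtml="true"}
  ...
{/template}
```

Opting out at the namespace level and opting individual templates back in may suit files that are being migrated one template at a time.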