closure-builder
Version:
Simple Closure, Soy and JavaScript Build system
647 lines (543 loc) • 17.1 kB
Markdown
# HTML Validation
The [HTML spec](https://html.spec.whatwg.org/multipage/) defines what is valid
HTML. There are many existing tools (both first-party and third-party) that can
validate plain HTML. In `Closure Templates`, we also provide some HTML validation
support by default.
Different browsers treat invalid HTML snippets differently, but we want to
enforce a strict HTML mode that throws a compiler error for invalid HTML.
NOTE: We do not support the full HTML spec. The HTML spec is actually a
collection of separate specifications. Because `Closure Templates` itself is
complex (it is turing-complete after all), any HTML validation we implement must
be useful yet not overly restrictive. Were we to implement the full HTML spec it
would be overly restrictive.
[TOC]
## Supported rules
For templates that enforce strict HTML mode, the following rules are supported
by `Closure Templates`.
### Each block must contain balanced tags
```soy {.good}
// This is a valid HTML snippet.
{template .t}
<div>
{for $foo in $fooList}
// Tags in this block should be closed within the same block. In this block,
// <div> tag is closed and <input> tag is self-closing.
<div>foo<p>bar</p></div><input>
{/for}
{call .foo}
{param content kind="html"}
// A param with kind="html" is also a block. Since the template sets
// stricthtml to true, this part of the template should also close all
// tags.
<div><span><input></span></div>
{/param}
{/call}
</div>
{/template}
```
This is an example of valid HTML snippet. In this example, in each `for` block,
every tag that is opened, must be closed (`input` is a [void
tag](https://www.w3.org/TR/HTML51/syntax.HTML#void-elements) and is
self-closing). The `param` block that explicitly sets `kind="html"` should also
contain self-closed tags. Note that a template is also a block and the `div` tag
at the very beginning has been closed at the end.
### Void elements
According to the [HTML
spec](https://www.w3.org/TR/2016/REC-html51-20161101/syntax.html#void-elements),
void elements only have a start tag; end tags must not be specified for void
elements. The compiler will enforce the following rules:
1. Void elements must only have a start tag without an end tag. `</input>` is
invalid HTML.
2. Start tags for void elements can be self-closing (for example, `<input/>`)
or normal (for example, `<input>`). Both are valid HTML.
3. Start tags for non-void elements can *not* be self-closing. `<div/>` is
invalid HTML.
NOTE: For tags with dynamic tag names, it is a little more complicated. In
particular, we assume users know what they are doing. See the Dynamic tag names
section for detailed examples.
### Dynamic tag names
For strict HTML mode, `Closure Templates` supports dynamic tag names. The compiler
enforces that a HTML open tag with dynamic tag name (for example, `{$tagName1}`)
must be closed by a HTML close tag with the same tag name in the same block.
**Good code:**
```soy
// This is a valid HTML snippet.
{template .t}
{ tagName1: string}
{ tagName2: Foo}
<{$tagName1}>
<{$tagName2|fooToTag}>
</{$tagName2|fooToTag}>
</{$tagName1}>
{/template}
```
Note that print directives should also match, since it might change the
evaluated values during run time.
**Bad code:**
```soy {.bad}
// This is invalid since the close tag has an additional directive, while the
// open tag does not have any directives.
{template .t}
{ tagName: string}
<{$tagName}>
</{$tagName|fooToTag}>
{/template}
```
For self-closing tags with dynamic tag name, we simply trust the users that it
is valid.
**Good code:**
```soy {.good}
// This is valid HTML since we cannot statically decide if $tagName is a valid
// name for void elements. We simply trust the users that they are doing the
// right things.
{template .t}
{ tagName: string}
// We trust users and assume that tagName is self-closing.
<{$tagName}/>
{/template}
```
**Bad code:**
```soy {.bad}
{template .t}
{ tagName: string}
// When we open $tagName (that is not self-closing), we assume that it is
// not a void element, i.e., it must be closed.
<{$tagName}>foo
{/template}
```
Matching raw text with dynamic expressions is not supported, even if the tag
name is compile-time constant.
**Bad code:**
```soy {.bad}
// This is invalid since we do not support matching static and dynamic tags.
{template .t}
{let tagName: "div" /}
<{$tagName}></div>
{/template}
```
### Control flow
The compiler supports tag balancing within control flows. The following example
is a simple but common use case.
**Good code:**
```soy {.good}
// An example of if conditions.
{template .t}
{ b: bool}
{ i: bool}
{ em: bool}
{if $b}<b>{/if}
{if $i}<i></if}
{if $em}<em>{/if}
content
{if $em}</em>{/if}
{if $i}</i>{/if}
{if $b}</b>{/if}
{/template}
```
**Good code:**
```soy {.good}
// An example of switch conditions.
{template .t}
{ foo: string}
{ a: string}
{ b: string}
{switch $foo}
{case $a}
<div>
{case $b}
// <input> is a void element.
<p><input>
{default}
// <a> has been closed within this block.
<em><a></a>
{/switch}
{switch $foo}
{case $a}
</div>
{case $b}
</p>
{default}
// <span> has been closed within this block.
// </em> matches with the <em> in the previous switch branch.
<span></span></em>
{/switch}
{/template}
```
We only perform static analysis, so mixing if and switch conditions are
unsupported.
**Bad code:**
```soy {.bad}
{template .t}
{ case: int}
{switch $case}
{case 1}
{case 2}
{case 3}
<em>
{case 4}
{/switch}
{if $case == 3}</em>{/if}
{/template}
```
Also, we do not evaluate the expressions, so the following example will be
treated as an error.
**Bad code:**
```soy {.bad}
// Although each pair of open tag and close tag has the same conditions, we do
// not evaluate the expressions and cannot decide if they match or not.
{template .t}
{ foo: bool}
{ bar: bool}
{ tag: string}
{if $foo}
<b> // Condition: $foo
{elseif $bar}
<i> // Condition: not $foo and $bar
{else}
<{$tag}> // Condition: not $foo and not $bar
{/if}
{if not $foo and not $bar}
</{$tag}> // Condition: not $foo and not $bar
{elseif $foo}
</b> // Condition: $foo
{else}
</i> // Condition: not $foo and $bar
{/if}
{/template}
```
**Bad code:**
```soy {.bad}
{template .t}
{ foo: bool}
{let $bar: $foo}
{if $foo}<b>{/if}
// We do not evaluate $bar to $foo, so we don't know if these two are the
// same conditions at this point. The compiler will throw an error.
{if $bar}</b>{/if}
{/template}
```
On the other hand, complicated nested control flows are supported, as long as
all the expressions match by text.
**Good code:**
```soy {.good}
{template .t}
{ foo: bool}
{ bar: bool}
{if $foo}
<div>
{if $bar}
<p><input/>
{/if}
{/if}
{if $foo}
{if $bar}
</p>
{/if}
</div>
{/if}
{/template}
```
Also, the compiler is able to match common tags across all possible conditions.
**Good code:**
```soy {.good}
{template .t}
{ foo: string}
{ bar: bool}
{ a: string}
{ b: string}
<div>
// All possible branches have a close div tag.
{if $foo}
foo</div>
{elseif $bar}
bar</div>
{else}
xxx</div>
{/if}
// Similarly, all possible branches have a open div tag.
{switch $foo}
{case $a}
<div>foo_a
{case $b}
<div>foo_b
{default}
<div>foo_x
{/switch}
</div>
{/template}
```
Besides that, in general, we do static analyses for the control flows, i.e., all
the conditions and tags within conditions should be exactly matched. The
following templates which contain control blocks that partially match the
previous blocks, are not supported.
**Bad code:**
```soy {.bad}
{template .t}
{ foo: bool}
{if $foo}
<div><div>
{/if}
{if $foo}
</div>
{/if}
{if $foo}
</div>
{/if}
{/template}
```
**Bad code:**
```soy {.bad}
{template .t}
{ foo: bool}
{ bar: bool}
{if $foo}
<div>
{else}
{if $bar}
<div><div>
{else}
<div>
{/if}
{/if}
// This matches with the extra tag in the first IfNode
{if $foo}
{else}
{if $bar}
</div>
{else}
{/if}
{/if}
// This matches the common prefix in the first IfNode
</div>
{/template}
```
### `for`
The following template is technically a valid HTML. The `for` loop opens three
div tags, and then we manually close all of them after the loop. However, we
decided to not support this case since the conditions of `for` could be dynamic
and there is no easy way to statically decide if this template a is valid HTML.
In this particular example, the compiler will report an exception at the
location of `<div>` tag in for loop. It will complain that this HTML open tag is
not closed within the current block.
**Bad code:**
```soy {.bad}
{template .t}
{for i in range(3)}
<div>
{/for}
</div></div></div>
{/template}
```
You can use a recursive template to create nesting:
**Good code:**
```soy {.good}
{template .t}
{ level: int}
{ content: html}
{if $level > 0}
<div>
{call .t}
{param level: $level - 1 /}
{param content: $content /}
{/call}
</div>
{else}
{$content}
{/if}
{/template}
```
The following template is another example that is a valid HTML but is not
supported by the compiler.
**Bad code:**
```soy {.bad}
{template .t}
{ a: list<string>}
{for $x in $a}
{if isFirst($x)}<ul>{/if}
<li>{$x}
{if isLast($x)}</ul>{/if}
{/for}
{/template}
```
`isFirst` and `isLast` are [built-in commands](control-flow#for) that check the
position of the current iterator. Although this template produces valid HTML (it
opens and closes `<ul>` exactly once if the list is non-empty), supporting this
pattern adds an additional layer of complexity to the compiler. It requires
checking function names, validating this particular AST structure, and do
special handling for these functions.
For this example, we recommend you to use the following template. It renders
exactly the same HTML, and is supported by the compiler.
**Good code:**
```soy {.good}
{template .t}
{ a: list<string>}
{if length($a) > 0}
<ul>
{for $x in $a}
<li>{$x}
{/for}
</ul>
{/if}
{/template}
```
### Optional tags
According to the HTML spec, some tags [can be
omitted](https://www.w3.org/TR/html5/syntax.html#optional-tags). In particular,
if certain criteria are met, the HTML parser can imply the start tags and/or end
tags.
`Closure Templates` supports part of these rules. The major differences between
what we support and the HTML spec:
* **Start tags must be presented.** In HTML spec, it is possible to omit start
tags under some circumstances. An example is that `<html>` and `<head>`
might be omitted. However, due to the rendering model of `Closure Templates`,
we don't support that.
* **Strict criteria are not enforced.** In HTML spec, most of the end tags can
be omitted based on the content model. For example, `</body>` may be omitted
if the `<body>` is not immediately followed by a comment. `Closure Templates`
assumes all unclosed optional tags are fine. Additional conditions such as
"is not immediately followed by a comment" are not enforced.
Some examples are:
**Good code:**
```soy {.good}
{template .t}
{ foo: bool}
{ bar: bool}
<html>
<head>
<ul>
// optional tag that does not close
<li>foo
<li>
// optional tag that closes and contains if nodes as its children
<li>b{if $foo}<b>{/if}a{if $bar}<i>{/if}r{if $bar}</i>{/if}{if $foo}</b>{/if}</li>
// optional tag that closes
<li>baz</li>
<li></li>
// optional tag that does not close and contains if nodes as its children
<li>b{if $foo}<b>{/if}a{if $bar}<i>{/if}r{if $bar}</i>{/if}{if $foo}</b>{/if}
</ul>
// html and head are automatically closed at the end
{/template}
```
Mixing complicated control flows and optional tags is also supported. For
example...
**Good code:**
```soy {.good}
{template .t}
{ foo: bool}
{ bar: bool}
<ul>
<li>foo
<li>b
{if $foo}<b>{/if}
a
{if $bar}<i>{/if}
r
{if $bar}</i>{/if}
{if $foo}</b>{/if}
<li>baz</li>
<li>
{if $foo}<li>{/if}{if $foo}</li>{/if}
</ul>
{/template}
```
### Foreign elements
HTML allows elements from non-HTML namespaces (such as MathML and SVG) to appear
in some contexts. MathML is *not* supported since it is deprecated and has
already been removed from Chrome. SVG is quite popular and supported.
Notably SVG uses XML spec, which means all tags must be closed and every tag is
allowed to be self-closing.
**Good code:**
```soy {.good}
{template .foreign_elements_simple}
<svg>
<path/>
<path></path>
<rect/>
<rect></rect>
</svg>
{/template}
```
**Good code:**
```soy {.good}
{template .foreign_elements_control_flow}
{ foo: bool}
<svg>
<path/>
<path></path>
{if $foo}<rect/>{/if}
<rect/>
<rect></rect>
{if $foo}<path>{/if}
{if $foo}</path>{/if}
</svg>
{/template}
```
We also enforce that SVG tags can only be opened and closed within the same
block.
**Bad code:**
```soy {.bad}
{template .fail_foreign_elements_across_block}
{ foo: bool}
{if $foo}
<svg>
{else}
<svg>
{/if}
</svg>
{/template}
```
## Known Issues
As we said before, we do not support full HTML spec. There are several general
limitations/complexities to keep in mind:
* Users write recursive templates, so the depth and structure of the DOM tree
is fully dynamic.
* The `Closure Templates` rendering model provides very limited information about
what we are rendering. For example, we don't know the current `DOCTYPE`, the
current `Content-Type headers`, or which `<meta>` tags have been
sent/specified. So we need to not care about the specifics that are
controlled by these details.
* Users render tag names dynamically, so we won't necessarily know the
specific tag names statically.
* `Closure Templates` allows for opaque content transclusions of HTML content, so
we won't necessarily know (statically) the number of predecessors of a given
tag.
On the other hand, there are some things we could theoretically validate, but we
don't currently since we believe them to be of limited utility. An example is
HTML [content models](https://www.w3.org/TR/html5/dom.html#content-models).
Supporting full content models is infeasible in `Closure Templates`; we can still
add separate rules such as don't put `<div>` inside `<p>` (`p` tag should only
[contain phrasing
contents](https://dev.w3.org/html5/spec-preview/the-p-element.html#the-p-element)),
but enforcing this in `Closure Templates` does not bring many benefits.
## Disabling HTML Validation
HTML validation is enabled by default. To disable it:
* Add `stricthtml="false"` to your templates or namespaces.
* Setting strict HTML mode in namespaces will set a default value for all
HTML templates in this file.
* Each HTML template can still override the default value inheriting from
the namespace.
* Non-HTML templates will not be affected at all.
* The compiler will report errors for invalid HTML templates.
In the following example, template `foo` will enforce strict HTML mode, but
templates `bar` and `baz` will not.
```soy
// By default, all HTML templates will enable stricthtml mode.
{namespace ns}
// This template does not override the default value, so it will have stricthtml
// mode enabled.
{template .foo}
...
{/template}
// This is a non-HTML template, and stricthtml mode does not apply for it.
{template .bar kind="text"}
...
{/template}
// This template override the default value and disable stricthtml mode.
{template baz stricthtml="false"}
...
{/template}
```
To enforce `stricthtml` in all templates, add a custom [Soy
conformance](../dev/conformance) rule
`com.google.template.soy.conformance.RequireStrictHtml`.