@worker-tools/parsed-html-rewriter
# Parsed HTML Rewriter
A DOM-based implementation of [Cloudflare Worker's `HTMLRewriter`](https://developers.cloudflare.com/workers/runtime-apis/html-rewriter).
***
___UPDATE: While this module works just fine, I've made [a new version](https://github.com/worker-tools/html-rewriter) that is WASM/streaming based for much better performance.___
***
Unlike the original, this implementation parses the entire DOM (provided by [`linkedom`](https://github.com/WebReflection/linkedom)),
and runs selectors against this representation. As a result, it is slower, more memory intensive, and can't process streaming data.
Note that this approach was chosen to quickly implement the functionality of `HTMLRewriter`, as there is currently no JS implementation available.
A better implementation would replicate the streaming approach of [`lol-html`](https://github.com/cloudflare/lol-html), or even use a WebAssembly version of it. _Update: [Now available here](https://github.com/worker-tools/html-rewriter)_.
However, this implementation should run in most JS contexts (including Web Workers, Service Workers and Deno) without modification and handle many, if not most, use cases of `HTMLRewriter`.
It should be good enough for testing and offline Workers development.
## Usage
This module can be used in two ways.
As a standalone module:
```ts
import { ParsedHTMLRewriter } from '@worker-tools/parsed-html-rewriter'

await new ParsedHTMLRewriter()
  .transform(new Response('<body></body>'))
  .text();
```
Or as a polyfill:
```ts
import '@worker-tools/parsed-html-rewriter/polyfill'

await new HTMLRewriter() // Will use the native version when running in a Worker
  .transform(new Response('<body></body>'))
  .text();
```
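The comment in the polyfill example implies that the global is only filled in when no native `HTMLRewriter` exists. The following is a minimal sketch of that pattern and makes no claim about how the module's polyfill is actually written; `installIfMissing`, `FakeRewriter`, and the two scope objects are hypothetical names used only for illustration:

```ts
// Hypothetical helper: define `name` on `target` only when it is not already present.
// Returns true when the fallback was installed, false when an existing value was kept.
function installIfMissing<T>(
  target: Record<string, unknown>,
  name: string,
  fallback: T,
): boolean {
  if (name in target) return false; // an existing (native) implementation wins
  target[name] = fallback;
  return true;
}

// Stand-in class for illustration; real code would pass the imported ParsedHTMLRewriter.
class FakeRewriter {}

// Simulated global scopes: one without and one with a native HTMLRewriter.
const plainScope: Record<string, unknown> = {};
const workerScope: Record<string, unknown> = { HTMLRewriter: class NativeRewriter {} };

const installedInPlain = installIfMissing(plainScope, 'HTMLRewriter', FakeRewriter);
const installedInWorker = installIfMissing(workerScope, 'HTMLRewriter', FakeRewriter);
```

In this sketch the plain scope receives the fallback, while the simulated Worker scope keeps its own class.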
### innerHTML
Unlike the current (March 2021) version on CF Workers, this implementation already supports the [proposed `innerHTML` handler](https://github.com/cloudflare/lol-html/issues/40#issuecomment-567126687).
Note that this feature is unstable and will likely change as the real version materializes.
```ts
await new HTMLRewriter()
  .on('body', {
    innerHTML(html) {
      console.log(html) // => '<div id="foo">bar</div>'
    },
  })
  .transform(new Response('<body><div id="foo">bar</div></body>'))
  .text();
```
## Caveats
- Because this version isn't based on streaming data, the order in which handlers are called can differ. Some measures have been taken to simulate the order, but differences may occur.
- Texts never arrive in chunks. There is always just one chunk, followed by an empty one with `lastInTextNode` set to `true`.
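
Since the chunking of text differs between this implementation and a streaming one, a `text` handler is more portable if it buffers until `lastInTextNode` is `true` rather than assuming a particular chunk count. The following is a minimal sketch; `collectText` and `TextChunkLike` are hypothetical names, and the two calls at the end only simulate the chunk sequence described above instead of running a real rewriter:

```ts
// Minimal shape of the chunk passed to a `text` handler.
// Only the two properties used here are declared.
interface TextChunkLike {
  readonly text: string;
  readonly lastInTextNode: boolean;
}

// Hypothetical helper: buffers chunks and calls `onText` once per complete text node.
function collectText(onText: (fullText: string) => void) {
  let buffer = '';
  return {
    text(chunk: TextChunkLike): void {
      buffer += chunk.text;
      if (chunk.lastInTextNode) {
        onText(buffer);
        buffer = ''; // reset for the next text node
      }
    },
  };
}

// Simulate the sequence from the caveat: one full chunk, then an empty final one.
const seen: string[] = [];
const handler = collectText((fullText) => seen.push(fullText));
handler.text({ text: 'bar', lastInTextNode: false });
handler.text({ text: '', lastInTextNode: true });
```

An object of this shape could be passed as the handler argument to `.on(selector, ...)`. Under a streaming implementation the same text node may arrive as several non-empty chunks, and the buffer would join them before `onText` runs.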