Rehype URL Inspector
==============================
### A [rehype](https://github.com/rehypejs/rehype) plugin to inspect, validate, or rewrite URLs anywhere in an HTML document
Features
--------------------------
- Inspect every URL on an HTML page and do whatever you want with it, such as:
  - Normalize URLs
  - Check for broken links
  - Replace URLs with different URLs
  - Add attributes (like `target="_blank"`) to certain links
- Finds **all types of URLs** by default, such as:
  - `<a href="http://example.com">`
  - `<img src="img/logo.png">`
  - `<link rel="stylesheet" href="/css/main.css">`
  - `<link rel="manifest" href="/site.manifest">`
  - `<meta rel="canonical" content="https://example.com/some/page/">`
  - `<meta property="og:image" content="img/logo.png">`
  - `<script src="//example.com/script.js">`
  - `<script type="application/ld+json">{"url": "www.example.com"}</script>`
  - `<style>body { background: url("/img/background.png"); }</style>`
- You can remove the built-in URL rules
- You can add your own **custom URL rules**
- You can abort the URL search at any time
Example
--------------------------
**example.html**<br>
This HTML file contains many different types of URLs:
```html
<html>
  <head>
    <link rel="canonical" href="http://example.com/some/page/">
    <link rel="manifest" href="/site.webmanifest">
    <link rel="icon" type="image/png" sizes="32x32" href="/img/favicon.png">
    <link rel="stylesheet" type="text/css" href="/css/main.css?v=5">

    <meta name="twitter:url" content="http://example.com/some/page/">
    <meta name="twitter:image" content="http://example.com/img/logo.png">

    <script type="application/ld+json">
      {
        "@context": "http://schema.org",
        "headline": "Hello, World!",
        "url": "http://example.com/some/page/",
        "image": "http://example.com/img/logo.png"
      }
    </script>

    <style>
      body {
        background: #ffffff url("img/background.png") center center no-repeat;
      }
    </style>
  </head>
  <body>
    <h1>
      <a href="/">
        <img src="/img/logo.png"> Hello World
      </a>
    </h1>
    <p>
      <a href="//external.com" target="_blank">Lorem ipsum</a> dolor sit amet,
      non dignissim dolor. Sed diam tellus, <a href="some-page.html">malesuada, dictum nulla</a>.
    </p>
    <script src="//external.com/script.js"></script>
  </body>
</html>
```
**example.js**<br>
This script reads the `example.html` file above and finds all the URLs in it. The script uses [unified](https://unifiedjs.com/), [rehype-parse](https://github.com/rehypejs/rehype/tree/master/packages/rehype-parse), [rehype-stringify](https://github.com/rehypejs/rehype/tree/master/packages/rehype-stringify), and [to-vfile](https://github.com/vfile/to-vfile).
```javascript
const unified = require("unified");
const parse = require("rehype-parse");
const inspectUrls = require("@jsdevtools/rehype-url-inspector");
const stringify = require("rehype-stringify");
const toVFile = require("to-vfile");

async function example() {
  // Create a Rehype processor with the inspectUrls plugin
  const processor = unified()
    .use(parse)
    .use(inspectUrls, {
      inspectEach({ url }) {
        // Log each URL
        console.log(url);
      }
    })
    .use(stringify);

  // Read the example HTML file
  let file = await toVFile.read("example.html");

  // Crawl the HTML file and find all the URLs
  await processor.process(file);
}

example();
```
Running this script produces the following output:
```
http://example.com/some/page/
/site.webmanifest
/img/favicon.png
/css/main.css?v=5
http://example.com/some/page/
http://example.com/img/logo.png
http://schema.org
http://example.com/some/page/
http://example.com/img/logo.png
img/background.png
/
/img/logo.png
//external.com
some-page.html
//external.com/script.js
```
Installation
--------------------------
You can install Rehype URL Inspector via [npm](https://docs.npmjs.com/about-npm/).
```bash
npm install @jsdevtools/rehype-url-inspector
```
You'll probably want to install [unified](https://unifiedjs.com/), [rehype-parse](https://github.com/rehypejs/rehype/tree/master/packages/rehype-parse), [rehype-stringify](https://github.com/rehypejs/rehype/tree/master/packages/rehype-stringify), and [to-vfile](https://github.com/vfile/to-vfile) as well.
```bash
npm install unified rehype-parse rehype-stringify to-vfile
```
Usage
--------------------------
Using the URL Inspector plugin requires an understanding of how to use Unified and Rehype. [Here is an excellent guide](https://unifiedjs.com/using-unified.html) to learn the basics.
The URL Inspector plugin works just like any other Rehype plugin. Pass it to [the `.use()` method](https://github.com/unifiedjs/unified#processoruseplugin-options) with an [options object](#options).
```javascript
const unified = require("unified");
const inspectUrls = require("@jsdevtools/rehype-url-inspector");

// Use the Rehype URL Inspector plugin with custom options
unified().use(inspectUrls, {
  // This function is called once with ALL of the URLs
  inspect(urls) { /* ... */ },

  // This function is called for each URL as it's found
  inspectEach(url) { /* ... */ },

  selectors: [
    "a[href]",          // Only search for links, not other types of URLs
    "div[data-image]"   // CSS selectors for custom URL attributes
  ]
});
```
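
As an illustration of the `inspect` option, the sketch below groups the collected URLs into rough categories. The `summarizeUrls` helper and its category names are hypothetical and not part of the plugin. It assumes only that each item in the array has a `url` string, as in the example script above. The sample array at the end is stand-in data, not real plugin output.

```javascript
// Hypothetical helper (not part of the plugin): classify each URL by its form
function summarizeUrls(urls) {
  const summary = { absolute: 0, protocolRelative: 0, relative: 0 };

  for (const { url } of urls) {
    if (/^[a-z][a-z0-9+.-]*:/i.test(url)) {
      summary.absolute++;           // has a scheme, e.g. "http://example.com"
    } else if (url.startsWith("//")) {
      summary.protocolRelative++;   // e.g. "//external.com/script.js"
    } else {
      summary.relative++;           // e.g. "/img/logo.png" or "some-page.html"
    }
  }

  return summary;
}

// Assumed wiring: pass a function as the `inspect` option, for example
//   unified().use(inspectUrls, { inspect(urls) { console.log(summarizeUrls(urls)); } })

// Stand-in data shaped like the array that `inspect` receives
const sampleMatches = [
  { url: "http://example.com/some/page/" },
  { url: "//external.com/script.js" },
  { url: "/img/logo.png" },
  { url: "some-page.html" },
];

console.log(summarizeUrls(sampleMatches));
```

Because `inspect` runs once with the full list, it suits whole-document reports like this one, while `inspectEach` is the better fit for per-URL decisions.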
Options
--------------------------
Rehype URL Inspector supports the following options:
|Option |Type |Default |Description
|:---------------------|:-------------------|:----------------------|:-----------------------------------------
|`selectors` |array of strings, objects, and/or functions |[built-in selectors](src/selectors.ts) |Selectors indicate where to look for URLs in the document. Each selector can be a CSS attribute selector string, like `a[href]` or `img[src]`, or a function that accepts a [HAST node](https://github.com/syntax-tree/hast) and returns its URL(s). See [`extractors.ts`](src/extractors.ts) for examples.
|`keepDefaultSelectors`|boolean |`false` |Whether to keep the default selectors in addition to any custom ones.
|`inspect` |function |no-op |A function that is called _once_ and receives an array containing all the URLs in the document.
|`inspectEach` |function |no-op |A function that is called for _each_ URL in the document as it's found. Return `false` to abort the search and skip the rest of the document.
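
To illustrate aborting the search, here is a minimal sketch of an `inspectEach` callback that stops at the first URL matching a condition. The `createFirstMatchFinder` helper is hypothetical. The loop at the end only mimics the behavior described in the table (stop when the callback returns `false`) so the sketch can run without the plugin.

```javascript
// Hypothetical helper: builds an `inspectEach` callback that stops at the first match
function createFirstMatchFinder(predicate) {
  const state = { found: undefined, visited: 0 };

  function inspectEach({ url }) {
    state.visited++;
    if (predicate(url)) {
      state.found = url;
      return false;   // returning false aborts the search
    }
  }

  return { inspectEach, state };
}

// Look for the first protocol-relative URL
const finder = createFirstMatchFinder((url) => url.startsWith("//"));

// Assumed wiring: unified().use(inspectUrls, { inspectEach: finder.inspectEach })

// Stand-in loop that imitates the plugin: stop as soon as the callback returns false
const urlsInDocumentOrder = [
  "/css/main.css?v=5",
  "/img/logo.png",
  "//external.com",
  "some-page.html",
];
for (const url of urlsInDocumentOrder) {
  if (finder.inspectEach({ url }) === false) break;
}

console.log(finder.state.found, finder.state.visited);
```

Keeping the result in a small `state` object, rather than a module-level variable, lets each processor run use its own finder.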
URL Objects
--------------------------
The `inspectEach()` function receives a [`UrlMatch` object](src/types.ts). The `inspect()` function receives an array of `UrlMatch` objects. Each object has the following properties:
|Property |Type |Description
|:----------------------|:--------------------|:------------------------------------
|`url` |string |The URL that was found
|`propertyName` |string or undefined |The name of the [HAST node property](https://github.com/syntax-tree/hast#properties) where the URL was found, such as `"src"` or `"href"`. If the URL was found in the text content of the node, then `propertyName` is `undefined`.
|`node` |object |The [HAST Element node](https://github.com/syntax-tree/hast#element) where the URL was found. **You can make changes to this node**, such as re-writing the URL, adding additional attributes, etc.
|`root` |object |The [HAST Root node](https://github.com/syntax-tree/hast#root). This gives you access to the whole document if you need it.
|`file` |object |The [File object](https://github.com/vfile/vfile) that gives you information about the HTML file itself, such as the path and file name.
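
Since `node` can be modified, an `inspectEach` callback can rewrite URLs in place. The sketch below resolves each URL against a base URL and marks off-site links to open in a new tab. `BASE_URL` and `absolutizeUrl` are hypothetical names, and the sketch assumes the URL lives at `node.properties[propertyName]`, as the table above describes. The objects at the end are hand-built stand-ins for `UrlMatch` objects, not real plugin output.

```javascript
// Hypothetical base URL for the page being processed
const BASE_URL = "http://example.com/some/page/";

// Sketch of an `inspectEach` callback: make URLs absolute and flag off-site links
function absolutizeUrl({ url, propertyName, node }) {
  // URLs found in text content (propertyName is undefined) are left alone in this sketch
  if (propertyName === undefined) return;

  let resolved;
  try {
    resolved = new URL(url, BASE_URL);   // WHATWG URL, a global in Node.js and browsers
  } catch {
    return;                              // not parseable as a URL, leave it unchanged
  }

  // Rewrite the URL on the HAST node
  node.properties[propertyName] = resolved.href;

  // Add attributes to links that point to a different origin
  if (node.tagName === "a" && resolved.origin !== new URL(BASE_URL).origin) {
    node.properties.target = "_blank";
    node.properties.rel = ["noopener", "noreferrer"];
  }
}

// Assumed wiring: unified().use(inspectUrls, { inspectEach: absolutizeUrl })

// Stand-in HAST element nodes
const internalLink = { type: "element", tagName: "a", properties: { href: "some-page.html" }, children: [] };
const externalLink = { type: "element", tagName: "a", properties: { href: "//external.com" }, children: [] };

absolutizeUrl({ url: "some-page.html", propertyName: "href", node: internalLink });
absolutizeUrl({ url: "//external.com", propertyName: "href", node: externalLink });

console.log(internalLink.properties);
console.log(externalLink.properties);
```

The callback returns `undefined` in every branch, so it never aborts the search. Handling URLs embedded in text content, such as those inside `<style>` or JSON-LD blocks, would require editing the node's text and is out of scope for this sketch.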
Contributing
--------------------------
Contributions, enhancements, and bug-fixes are welcome! [Open an issue](https://github.com/JS-DevTools/rehype-url-inspector/issues) on GitHub and [submit a pull request](https://github.com/JS-DevTools/rehype-url-inspector/pulls).
#### Building
To build the project locally on your computer:
1. __Clone this repo__<br>
`git clone https://github.com/JS-DevTools/rehype-url-inspector.git`
2. __Install dependencies__<br>
`npm install`
3. __Build the code__<br>
`npm run build`
4. __Run the tests__<br>
`npm test`
License
--------------------------
Rehype URL Inspector is 100% free and open-source, under the [MIT license](LICENSE). Use it however you want.
This package is [Treeware](http://treeware.earth). If you use it in production, then we ask that you [**buy the world a tree**](https://plant.treeware.earth/JS-DevTools/rehype-url-inspector) to thank us for our work. By contributing to the Treeware forest you’ll be creating employment for local families and restoring wildlife habitats.
Big Thanks To
--------------------------
Thanks to these awesome companies for their support of Open Source developers ❤
[Travis CI](https://travis-ci.com)
[Sauce Labs](https://saucelabs.com)
[Coveralls](https://coveralls.io)