metalsmith-broken-link-checker
Version:
Metalsmith plugin to check for internal broken links
159 lines (107 loc) • 5.79 kB
Markdown
# metalsmith-broken-link-checker
[![npm version][npm-badge]][npm-url]
[![Build Status][travis-badge]][travis-url]
[![Dependency Status][david-badge]][david-url]
[Metalsmith][] plugin to check for internal broken links
## About
Small typos can often result in *unexpected* broken links. This plugin aims to catch them as early as possible.
It checks for *relative* and *root-relative* links which do not have a corresponding file in the Metalsmith pipeline. It (currently) ignores all *absolute* links.
Any broken links will cause an `Error` to be thrown (or a warning to be printed if `options.warn` is set).
By default, all `href` attributes of `<a>` tags and all `src` attributes of `<img>` tags are checked.
This plugin uses [cheerio](https://www.npmjs.com/package/cheerio) to find link and image tags and [urijs](https://www.npmjs.com/package/urijs) to manipulate URLs.
I also wrote a [blog post about what I learned developing this plugin](https://davidxmoody.com/publishing-my-first-npm-package/).
## Example
In your Metalsmith source dir, you have the following file (`dir1/test-file.html`):
```html
<!DOCTYPE html>
<html>
<body>
<a href="/a.html">(Root-relative link) Error if 'a.html' not in files</a>
<a href="a.html">(Relative link) Error if 'dir1/a.html' not in files</a>
<a href="./a.html">(Relative link) Error if 'dir1/a.html' not in files</a>
<a href="../a.html">(Relative link) Error if 'a.html' not in files</a>
<a href="/">(Root-relative link to dir) Error if 'index.html' not in files</a>
<a href="dir2/">(Relative link to dir) Error if 'dir1/dir2/index.html' not in files</a>
<a href="#fragment">(Hash fragment link) Always valid</a>
<a href="/dir2/#fragment">(Hash fragment link) Error if 'dir2/index.html' not in files</a>
<a>Missing href attribute, always broken</a>
<img src="testimg.jpg" alt="(Relative link) Error if 'dir1/testimg.jpg' not in files">
<img src="/testimg.jpg" alt="(Root-relative link) Error if 'testimg.jpg' not in files">
</body>
</html>
```
Note that links to directories are allowed if they have a trailing slash (the `index.html` file will be looked for). However links to directories without a trailing slash are not allowed unless the `allowRedirects` option is set.
## Installation
```
$ npm install --save metalsmith-broken-link-checker
```
## CLI Usage
In `metalsmith.json`:
```json
{
"source": "src",
"destination": "build",
"plugins": {
"metalsmith-broken-link-checker": true
}
}
```
## JavaScript Usage
```javascript
var Metalsmith = require('metalsmith')
var blc = require('metalsmith-broken-link-checker')
Metalsmith(__dirname)
// Build your full site here...
.use(blc(options))
.build()
```
### Options
#### `allowRegex` (optional, default: *null* )
- Optional regex gets matched against every found URL
- Use it if you want to allow some specific URLs which would otherwise get recognised as broken
#### `allowRedirects` (optional, default: *false* )
- If *false* then links to directories will only be allowed if the link ends with a trailing slash (e.g. `dir1/`)
- If *true* then links to directories will be allowed with or without a trailing slash (e.g. `dir1/` or `dir1`)
#### `allowAnchors` (optional, default: *true* )
- An anchor is an `<a>` tag with a `name` attribute but no `href` attribute (used for jumping to elements on a page with hash fragments)
- For example, with `allowAnchors` set to `true`, the following would be allowed: `<a name="anchor">Anchor text</a>`
#### `checkImages` (optional, default: *true* )
- Specifies whether or not to check `src` attributes of `<img>` tags
#### `checkLinks` (optional, default: *true* )
- Specifies whether or not to check `href` attributes of `<a>` tags
#### `checkAnchors` (optional, default: *false* )
- Specifies whether or not to check the validity of hash fragments in links
- For example `file.html#someid` could link to a valid file but the file content could be missing the `someid` id or name on an element
#### `warn` (optional, default: *false* )
- If *false* then throw an `Error` when encountering the first broken link
- If *true* then print warnings to stderr for every broken link
#### `baseURL` (optional, default: *null* )
- Should start with a slash, e.g. `/base`
- Meant for sites which will be hosted within a subdirectory of another site
- For example, if the output of your Metalsmith build will be hosted at `http://example.com/base/` then links to `/base/dir/file.html` will be valid if `dir/file.html` exists in the metalsmith pipeline
## History
- **1.0.2** Handle spaces in filenames correctly
- **1.0.1** Add filenames to warnings
- **1.0.0**
- CoffeeScript -> JavaScript rewrite
- Requires Node v6+ to run (breaking change)
- Add anchor target checking
- **0.1.11** Allow anchors with id attributes
- **0.1.10** Add baseURL option
- **0.1.9** Add CI badges to README
- **0.1.8** Add allowRedirects option
- **0.1.7** Change URIjs to urijs
- **0.1.6** Fix forward slash Windows path bug
- **0.1.5** Add allowAnchors option
- **0.1.4** Normalize file paths for Windows
- **0.1.3** Fail in the correct way for missing href attribute
- **0.1.2** Updated README
- **0.1.1** Updated README
- **0.1.0** First release
[Metalsmith]: https://github.com/metalsmith/metalsmith
[npm-badge]: https://img.shields.io/npm/v/metalsmith-broken-link-checker.svg
[npm-url]: https://npmjs.com/package/metalsmith-broken-link-checker
[travis-badge]: https://travis-ci.org/davidxmoody/metalsmith-broken-link-checker.svg
[travis-url]: https://travis-ci.org/davidxmoody/metalsmith-broken-link-checker
[david-badge]: https://david-dm.org/davidxmoody/metalsmith-broken-link-checker.svg
[david-url]: https://david-dm.org/davidxmoody/metalsmith-broken-link-checker