# url-regex-unsafe
> Regular expression matching for URLs. A maintained and browser-friendly version of url-regex. This package is vulnerable to [CVE-2020-7661][cve]. Works in Node v10.12.0+ and browsers.
## Table of Contents
* [Foreword](#foreword)
* [Install](#install)
* [Usage](#usage)
  * [Node](#node)
  * [Browser](#browser)
* [Options](#options)
* [Quick tips and migration from url-regex](#quick-tips-and-migration-from-url-regex)
* [Contributors](#contributors)
* [License](#license)
## Foreword
url-regex-unsafe is a fork of [url-regex-safe][], which is a fork of [url-regex][]. [url-regex-safe][] has resolved [CVE-2020-7661][cve] on Node by including [RE2][] for Node.js usage. However, [RE2][] does not support lookahead assertions in regular expressions, which leads to some [limitations][url-regex-safe-limitations]. To avoid these limitations, url-regex-unsafe gets rid of [RE2][] and uses built-in RegExp instead. This means that url-regex-unsafe is still vulnerable to [CVE-2020-7661][cve].
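
Because the built-in RegExp engine backtracks, matching time can grow sharply on very long input, which is the class of problem behind [CVE-2020-7661][cve]. One common precaution, sketched here, is to cap the input length before matching. `matchUrlsBounded`, its `maxLength` default, and the stand-in pattern are hypothetical and not part of this package; the cap reduces exposure but does not remove the vulnerability.

```js
// Hypothetical helper (not part of url-regex-unsafe): skip matching when the
// input is longer than `maxLength`, so a backtracking regex never sees huge strings.
function matchUrlsBounded(text, regex, maxLength = 2000) {
  if (typeof text !== 'string') {
    throw new TypeError('text must be a string');
  }

  if (text.length > maxLength) {
    return null; // caller decides how to handle oversized input
  }

  // match() returns null when nothing matches; normalise that to an empty array
  return text.match(regex) || [];
}

// Stand-in pattern so the sketch runs without the package installed.
// In real code, pass `urlRegexUnsafe()` instead.
const standInRegex = /\b[a-z0-9-]+\.com\b/gi;

console.log(matchUrlsBounded('some long string with url.com in it', standInRegex));
console.log(matchUrlsBounded('x'.repeat(5000), standInRegex));
```

A suitable limit depends on the application; 2000 characters is only a placeholder.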
## Install
[npm][]:
```sh
npm install url-regex-unsafe
```
[yarn][]:
```sh
yarn add url-regex-unsafe
```
## Usage
### Node
```js
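const urlRegexUnsafe = require('url-regex-unsafe');

const str = 'some long string with url.com in it';

// match() returns null when nothing matches, so fall back to an empty array
const matches = str.match(urlRegexUnsafe()) || [];

for (const match of matches) {
  console.log('match', match);
}

// With `exact: true`, test whether the whole string is a URL
console.log(urlRegexUnsafe({ exact: true }).test('github.com'));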
```
### Browser
#### VanillaJS
This is the solution for you if you're just using `<script>` tags everywhere!
```html
<script src="https://unpkg.com/url-regex-unsafe"></script>
<script type="text/javascript">
  (function () {
    var str = 'some long string with url.com in it';
    // match() returns null when nothing matches, so fall back to an empty array
    var matches = str.match(urlRegexUnsafe()) || [];

    for (var i = 0; i < matches.length; i++) {
      console.log('match', matches[i]);
    }

    console.log(urlRegexUnsafe({ exact: true }).test('github.com'));
  })();
</script>
```
#### Bundler
Assuming you are using [browserify][], [webpack][], [rollup][], or another bundler, you can simply follow [Node](#node) usage above.
#### TypeScript
This package has built-in support for [TypeScript](https://www.typescriptlang.org/).
## Options
| Property | Type | Default Value | Description |
| --- | --- | --- | --- |
| `exact` | Boolean | `false` | Only match an exact String. Useful with `regex.test(str)` to check if a String is a URL. We set this to `false` by default in order to match String values such as `github.com` (as opposed to requiring a protocol or `www` subdomain). We feel this more closely resembles real-world intended usage of this package. |
| `strict` | Boolean | `false` | Force URLs to start with a valid protocol or `www` if set to `true`. If `true`, then it will allow any TLD as long as it is a minimum of 2 valid characters. If it is `false`, then it will match the TLD against the list of valid TLDs using [tlds](https://github.com/stephenmathieson/node-tlds#readme). |
| `auth` | Boolean | `false` | Match against Basic Authentication headers. We set this to `false` by default since [it was deprecated in Chromium](https://bugs.chromium.org/p/chromium/issues/detail?id=82250#c7), and otherwise it leaves the user with unwanted URL matches (defaulting to `false` also more closely resembles real-world intended usage of this package). |
| `localhost` | Boolean | `true` | Allows localhost in the URL hostname portion. See [test/test.js](test/test.js) for more insight into the localhost test and how it will return a value which may be unwanted. A pull request would be considered to resolve the "pic.jp" vs. "pic.jpg" issue. |
| `parens` | Boolean | `false` | Match against Markdown-style trailing parentheses. We set this to `false` because it should be up to the user to parse for Markdown URLs. |
| `apostrophes` | Boolean | `false` | Match against apostrophes. We set this to `false` because we don't want the String `background: url('http://example.com/pic.jpg');` to result in `http://example.com/pic.jpg'`. See this [issue](https://github.com/kevva/url-regex/pull/55) for more information. |
| `trailingPeriod` | Boolean | `false` | Match against trailing periods. We set this to `false` by default since real-world behavior would want `example.com` versus `example.com.` as the match (this differs from [url-regex][], which matches the trailing period). |
| `ipv4` | Boolean | `true` | Match against IPv4 URLs. |
| `ipv6` | Boolean | `true` | Match against IPv6 URLs. |
| `tlds` | Array | [tlds](https://github.com/stephenmathieson/node-tlds#readme) | Match against a specific list of TLDs, or the default list provided by [tlds](https://github.com/stephenmathieson/node-tlds#readme). |
| `returnString` | Boolean | `false` | Return the pattern as a String instead of a `RegExp` (useful for custom logic, such as we did with [Spam Scanner][spam-scanner]). |
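
As one illustration of the `returnString` option, the returned pattern can be embedded in a larger expression. The sketch below is hedged: `buildLabelledUrlRegex`, its `label` argument, and the stand-in pattern are hypothetical, and it assumes the returned String is a plain pattern source that `new RegExp` accepts.

```js
// Hypothetical helper: match "<label> <url>" and expose the URL as a named group.
// `urlPattern` is assumed to be a regex source string, such as the value
// returned by urlRegexUnsafe({ returnString: true }).
function buildLabelledUrlRegex(urlPattern, label) {
  // Escape the label so characters like "." or "?" are matched literally
  const escapedLabel = label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(escapedLabel + '\\s+(?<url>' + urlPattern + ')', 'i');
}

// Stand-in pattern so the sketch runs without the package installed.
// In real code, use urlRegexUnsafe({ returnString: true }) instead.
const standInPattern = 'https?://[^\\s]+';

const labelledRegex = buildLabelledUrlRegex(standInPattern, 'source:');
const result = labelledRegex.exec('Source: https://example.com/page is the reference');

console.log(result ? result.groups.url : null);
```

The named group avoids depending on how many capturing groups the embedded pattern contains.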
## Quick tips and migration from url-regex
You must override the default and set `strict: true` if you do not wish to match `github.com` by itself (though `www.github.com` will work if `strict: false`).

Compared to the deprecated and unmaintained package [url-regex][], we do a few things differently:

* We set `strict` to `false` by default ([url-regex][] had this set to `true`).
* We added an `auth` option, which is set to `false` by default ([url-regex][] matches against Basic Authentication, as if this were set to `true`; however, this is a deprecated behavior in Chromium).
* We added `parens` and `ipv6` options, which are set to `false` and `true` by default, respectively ([url-regex][] behaved as if `parens` were `true`, and `ipv6` was non-existent, in effect `false`).
* We added an `apostrophes` option, which is set to `false` by default ([url-regex][] had this set to `true`).
* We added a `trailingPeriod` option, which is set to `false` by default (which means matches won't contain trailing periods, whereas [url-regex][] had this set to `true`).
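
For code that relied on the old defaults, the differences listed above can be collected into one options object. This is a sketch rather than an official compatibility layer: the option names come from the Options table, the values mirror the [url-regex][] behavior as described in the list, and `legacyUrlRegexOptions` and `withLegacyDefaults` are hypothetical names.

```js
// Assumption: these values approximate url-regex behaviour as described in
// this README; they are not guaranteed to reproduce it exactly.
const legacyUrlRegexOptions = Object.freeze({
  strict: true,
  auth: true,
  parens: true,
  ipv6: false,
  apostrophes: true,
  trailingPeriod: true
});

// Hypothetical helper: caller-supplied options override the legacy defaults
function withLegacyDefaults(overrides = {}) {
  return { ...legacyUrlRegexOptions, ...overrides };
}

console.log(withLegacyDefaults({ exact: true }));

// Usage, with the package installed:
// const urlRegexUnsafe = require('url-regex-unsafe');
// const legacyRegex = urlRegexUnsafe(withLegacyDefaults());
```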
## Contributors
| Name | Website |
| -------------------- | ---------------------------- |
| **ocavue** | <https://github.com/ocavue/> |
| **Nick Baugh** | <http://niftylettuce.com/> |
| **Kevin Mårtensson** | |
| **Diego Perini** | |
## License
[MIT](LICENSE) © ocavue
##
[npm]: https://www.npmjs.com/
[yarn]: https://yarnpkg.com/
[cve]: https://nvd.nist.gov/vuln/detail/CVE-2020-7661
[re2]: https://github.com/uhop/node-re2
[browserify]: https://github.com/browserify/browserify
[webpack]: https://github.com/webpack/webpack
[rollup]: https://github.com/rollup/rollup
[url-regex]: https://github.com/kevva/url-regex
[url-regex-safe]: https://github.com/spamscanner/url-regex-safe
[url-regex-safe-limitations]: https://github.com/spamscanner/url-regex-safe/tree/v3.0.0#limitations
[spam-scanner]: https://spamscanner.net