normalize-url
Version:
396 lines (264 loc) • 10.2 kB
Markdown
# normalize-url [](https://codecov.io/gh/sindresorhus/normalize-url)
> [Normalize](https://en.wikipedia.org/wiki/URL_normalization) a URL
Useful when you need to display, store, deduplicate, sort, compare, etc, URLs.
> [!NOTE]
> This package does **not** do URL sanitization. [Garbage in, garbage out.](https://en.wikipedia.org/wiki/Garbage_in,_garbage_out) If you use this in a server context and accept URLs as user input, it's up to you to protect against invalid URLs, [path traversal attacks](https://owasp.org/www-community/attacks/Path_Traversal), etc.
## Install
```sh
npm install normalize-url
```
## Usage
```js
import normalizeUrl from 'normalize-url';
normalizeUrl('sindresorhus.com');
//=> 'http://sindresorhus.com'
normalizeUrl('//www.sindresorhus.com:80/../baz?b=bar&a=foo');
//=> 'http://sindresorhus.com/baz?a=foo&b=bar'
```
## API
### normalizeUrl(url, options?)
URLs with custom protocols are not normalized and just passed through by default. Supported protocols are: `https`, `http`, `file`, and `data`. Use the [`customProtocols`](#customprotocols) option to add support for additional protocols.
Human-friendly URLs with basic auth (for example, `user:password@sindresorhus.com`) are not handled because basic auth conflicts with custom protocols. [Basic auth URLs are also deprecated.](https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#access_using_credentials_in_the_url)
#### url
Type: `string`
URL to normalize, including [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URIs).
#### options
Type: `object`
##### defaultProtocol
Type: `string`\
Default: `'http'`\
Values: `'https' | 'http'`
##### customProtocols
Type: `string[]`\
Default: `undefined`
Protocols to normalize in addition to the built-in ones (`https`, `http`, `file`, `data`).
Useful for HTTP-like custom protocols such as Electron schemes or app-specific protocols.
The protocols should be specified without `:`.
```js
normalizeUrl('sindre://www.sorhus.com', {customProtocols: ['sindre']});
//=> 'sindre://sorhus.com'
normalizeUrl('sindre://www.sorhus.com/foo/', {customProtocols: ['sindre']});
//=> 'sindre://sorhus.com/foo'
```
##### normalizeProtocol
Type: `boolean`\
Default: `true`
Prepend `defaultProtocol` to the URL if it's protocol-relative.
```js
normalizeUrl('//sindresorhus.com');
//=> 'http://sindresorhus.com'
normalizeUrl('//sindresorhus.com', {normalizeProtocol: false});
//=> '//sindresorhus.com'
```
##### forceHttp
Type: `boolean`\
Default: `false`
Normalize HTTPS to HTTP.
```js
normalizeUrl('https://sindresorhus.com');
//=> 'https://sindresorhus.com'
normalizeUrl('https://sindresorhus.com', {forceHttp: true});
//=> 'http://sindresorhus.com'
```
##### forceHttps
Type: `boolean`\
Default: `false`
Normalize HTTP to HTTPS.
```js
normalizeUrl('http://sindresorhus.com');
//=> 'http://sindresorhus.com'
normalizeUrl('http://sindresorhus.com', {forceHttps: true});
//=> 'https://sindresorhus.com'
```
This option cannot be used with the `forceHttp` option at the same time.
##### stripAuthentication
Type: `boolean`\
Default: `true`
Strip the [authentication](https://en.wikipedia.org/wiki/Basic_access_authentication) part of the URL.
```js
normalizeUrl('https://user:password@sindresorhus.com');
//=> 'https://sindresorhus.com'
normalizeUrl('https://user:password@sindresorhus.com', {stripAuthentication: false});
//=> 'https://user:password@sindresorhus.com'
```
##### stripHash
Type: `boolean`\
Default: `false`
Strip the hash part of the URL.
```js
normalizeUrl('sindresorhus.com/about.html#contact');
//=> 'http://sindresorhus.com/about.html#contact'
normalizeUrl('sindresorhus.com/about.html#contact', {stripHash: true});
//=> 'http://sindresorhus.com/about.html'
```
##### stripProtocol
Type: `boolean`\
Default: `false`
Remove the protocol from the URL: `http://sindresorhus.com` → `sindresorhus.com`.
It will only remove `https://` and `http://` protocols.
```js
normalizeUrl('https://sindresorhus.com');
//=> 'https://sindresorhus.com'
normalizeUrl('https://sindresorhus.com', {stripProtocol: true});
//=> 'sindresorhus.com'
```
##### stripTextFragment
Type: `boolean`\
Default: `true`
Strip the [text fragment](https://web.dev/text-fragments/) part of the URL.
**Note:** The text fragment will always be removed if the `stripHash` option is set to `true`, as the hash contains the text fragment.
```js
normalizeUrl('http://sindresorhus.com/about.html#:~:text=hello');
//=> 'http://sindresorhus.com/about.html#'
normalizeUrl('http://sindresorhus.com/about.html#section:~:text=hello');
//=> 'http://sindresorhus.com/about.html#section'
normalizeUrl('http://sindresorhus.com/about.html#:~:text=hello', {stripTextFragment: false});
//=> 'http://sindresorhus.com/about.html#:~:text=hello'
normalizeUrl('http://sindresorhus.com/about.html#section:~:text=hello', {stripTextFragment: false});
//=> 'http://sindresorhus.com/about.html#section:~:text=hello'
```
##### stripWWW
Type: `boolean`\
Default: `true`
Remove `www.` from the URL.
```js
normalizeUrl('http://www.sindresorhus.com');
//=> 'http://sindresorhus.com'
normalizeUrl('http://www.sindresorhus.com', {stripWWW: false});
//=> 'http://www.sindresorhus.com'
```
##### removeQueryParameters
Type: `Array<RegExp | string> | boolean`\
Default: `[/^utm_\w+/i]`
Remove query parameters that matches any of the provided strings or regexes.
Global and sticky regex flags are stripped.
```js
normalizeUrl('www.sindresorhus.com?foo=bar&ref=test_ref', {
removeQueryParameters: ['ref']
});
//=> 'http://sindresorhus.com/?foo=bar'
```
If a boolean is provided, `true` will remove all the query parameters.
```js
normalizeUrl('www.sindresorhus.com?foo=bar', {
removeQueryParameters: true
});
//=> 'http://sindresorhus.com'
```
`false` will not remove any query parameter.
```js
normalizeUrl('www.sindresorhus.com?foo=bar&utm_medium=test&ref=test_ref', {
removeQueryParameters: false
});
//=> 'http://www.sindresorhus.com/?foo=bar&ref=test_ref&utm_medium=test'
```
##### keepQueryParameters
Type: `Array<RegExp | string>`\
Default: `undefined`
Keeps only query parameters that matches any of the provided strings or regexes.
**Note:** It overrides the `removeQueryParameters` option.
Global and sticky regex flags are stripped.
```js
normalizeUrl('https://sindresorhus.com?foo=bar&ref=unicorn', {
keepQueryParameters: ['ref']
});
//=> 'https://sindresorhus.com/?ref=unicorn'
```
##### removeTrailingSlash
Type: `boolean`\
Default: `true`
Remove trailing slash.
**Note:** Trailing slash is always removed if the URL doesn't have a pathname unless the `removeSingleSlash` option is set to `false`.
```js
normalizeUrl('http://sindresorhus.com/redirect/');
//=> 'http://sindresorhus.com/redirect'
normalizeUrl('http://sindresorhus.com/redirect/', {removeTrailingSlash: false});
//=> 'http://sindresorhus.com/redirect/'
normalizeUrl('http://sindresorhus.com/', {removeTrailingSlash: false});
//=> 'http://sindresorhus.com'
```
##### removeSingleSlash
Type: `boolean`\
Default: `true`
Remove a sole `/` pathname in the output. This option is independent of `removeTrailingSlash`.
```js
normalizeUrl('https://sindresorhus.com/');
//=> 'https://sindresorhus.com'
normalizeUrl('https://sindresorhus.com/', {removeSingleSlash: false});
//=> 'https://sindresorhus.com/'
```
##### removeDirectoryIndex
Type: `boolean | Array<RegExp | string>`\
Default: `false`
Removes the default directory index file from path that matches any of the provided strings or regexes.
When `true`, the regex `/^index\.[a-z]+$/` is used.
Global and sticky regex flags are stripped.
```js
normalizeUrl('www.sindresorhus.com/foo/default.php', {
removeDirectoryIndex: [/^default\.[a-z]+$/]
});
//=> 'http://sindresorhus.com/foo'
```
##### removeExplicitPort
Type: `boolean`\
Default: `false`
Removes an explicit port number from the URL.
Port 443 is always removed from HTTPS URLs and 80 is always removed from HTTP URLs regardless of this option.
```js
normalizeUrl('sindresorhus.com:123', {
removeExplicitPort: true
});
//=> 'http://sindresorhus.com'
```
##### sortQueryParameters
Type: `boolean`\
Default: `true`
Sorts the query parameters alphabetically by key.
```js
normalizeUrl('www.sindresorhus.com?b=two&a=one&c=three', {
sortQueryParameters: false
});
//=> 'http://sindresorhus.com/?b=two&a=one&c=three'
```
##### emptyQueryValue
Type: `string`\
Default: `'preserve'`\
Values: `'preserve' | 'always' | 'never'`
Controls how query parameters with empty values are formatted.
- `'preserve'` - Keep the original format (`?key` stays `?key`, `?key=` stays `?key=`). If the same key appears with both formats (`?a&a=`), all instances will use the format without `=`.
- `'always'` - Always include `=` for empty values (`?key` becomes `?key=`)
- `'never'` - Never include `=` for empty values (`?key=` becomes `?key`)
```js
normalizeUrl('www.sindresorhus.com?a&b=', {emptyQueryValue: 'always'});
//=> 'http://sindresorhus.com/?a=&b='
normalizeUrl('www.sindresorhus.com?a&b=', {emptyQueryValue: 'never'});
//=> 'http://sindresorhus.com/?a&b'
```
##### removePath
Type: `boolean`\
Default: `false`
Removes the entire URL path, leaving only the domain.
```js
normalizeUrl('https://example.com/path/to/page', {
removePath: true
});
//=> 'https://example.com'
```
##### transformPath
Type: `Function`\
Default: `false`
Custom function to transform the URL path components. The function receives an array of non-empty path components and should return a modified array.
```js
// Keep only the first path component
normalizeUrl('https://example.com/api/v1/users', {
transformPath: (pathComponents) => pathComponents.slice(0, 1)
});
//=> 'https://example.com/api'
// Remove specific components
normalizeUrl('https://example.com/admin/users', {
transformPath: (pathComponents) => pathComponents.filter(c => c !== 'admin')
});
//=> 'https://example.com/users'
```
## Related
- [compare-urls](https://github.com/sindresorhus/compare-urls) - Compare URLs by first normalizing them