# reregexp
Generate a string that matches a given regular expression, which is useful when you want to mock strings from a regexp rule. It strictly follows the standard JavaScript regex rules, but you still need to pay attention to the [Special cases](#special-cases).
## Goals
- Support named capture groups, e.g. `(?<named>\w)\k<named>`, and allow overriding them through the `namedGroupConf` config field.
- Support the unicode property class `\p{Lu}` by setting the static `UPCFactory` handler; see the example for more details.
- Support the `u` flag, so you can use unicode ranges.
- Allow you to get the capture group values.
## Installation
```bash
# npm
npm install --save reregexp
# or yarn
yarn add reregexp
```
## Usage
```javascript
// CommonJS module
const ReRegExp = require('reregexp').default;
// ES module (use only one of the import forms below)
// since v1.6.1
import ReRegExp from 'reregexp';
// before v1.6.1
import re from 'reregexp';
const ReRegExp = re.default;
// The first parameter of the constructor can be a regex literal or a RegExp string.
// If you need features that are not well supported by all browsers,
// such as named groups, always use a RegExp string.
// Example 1: use group reference
const r1 = new ReRegExp(/([a-z0-9]{3})_\1/);
r1.build(); // => 'a2z_a2z' '13d_13d'
// Example 2: use named group
const r2 = new ReRegExp(/(?<named>\w{1,2})_\1_\k<named>/);
r2.build(); // => 'b5_b5_b5' '9_9_9'
// Example 3: use a named group with the `namedGroupConf` config.
// The strings in the config are used instead of the string the named group would generate.
// An error is triggered if a string in the config does not match the rule of the named group.
const r3 = new ReRegExp('/(a)\\1(?<named>b)\\k<named>(?<override>\\w+)/', {
  namedGroupConf: {
    override: ['cc', 'dd'],
  },
});
r3.build(); // => "aabbcc" "aabbdd"
// Example 4: use a character set
const r4 = new ReRegExp(/[^\w\W]+/);
r4.build(); // throws an error, because [^\w\W] matches nothing.
// Example 5: a character set with the negation operator
const r5 = new ReRegExp(/[^a-zA-Z0-9_\W]/);
r5.build(); // throws an error, this is the same as [^\w\W]
// Example 6: with the `i` flag, case is ignored.
const r6 = new ReRegExp(/[a-z]{3}/i);
r6.build(); // => 'bZD' 'Poe'
// Example 7: with the `u` flag, e.g. generate some Chinese characters.
const r7 = new ReRegExp('/[\\u{4e00}-\\u{9fcc}]{5,10}/u');
r7.build(); // => '偤豄酌菵呑', '孜垟与醽奚衜踆猠'
// Example 8: set a global `maxRepeat` for quantifiers such as '*' and '+'.
ReRegExp.maxRepeat = 10;
const r8 = new ReRegExp(/a*/);
r8.build(); // => 'aaaaaaa', 'a' is repeated at most 10 times.
// Example 9: pass `maxRepeat` in the constructor config; it overrides the global `maxRepeat`.
const r9 = new ReRegExp(/a*/, {
  maxRepeat: 20,
});
r9.build(); // => 'aaaaaaaaaaaaaa', 'a' is repeated at most 20 times
// Example 10: use the `extractSetAverage` config for character sets.
const r10 = new ReRegExp(/[\Wa-z]/, {
  // \W is expanded into all the characters that match \W, so a-z no longer has the same chance as \W
  extractSetAverage: true,
});
// Example 11: use the `capture` config if you care about the capture data
const r11 = new ReRegExp(/(aa?)b(?<named>\w)/, {
  capture: true, // set `capture` to true to record the group capture data
});
r11.build(); // => 'abc'
console.log(r11.$1); // => 'a'
console.log(r11.$2); // => 'c'
console.log(r11.groups); // => {named: 'c'}
// Example 12: use the unicode property class by setting the `UPCFactory`
ReRegExp.UPCFactory = (data) => {
  /*
  `data` has the shape of UPCData: {
    negate: boolean; // true if the symbol is 'P'
    short: boolean; // true for a short form such as '\pL' (short for '\p{Letter}')
    key?: string; // the unicode property name, if any, such as `Script`
    value: string; // unicode property value, binary or non-binary
  }
  */
  // return an object that has a `generate` method.
  return {
    generate() {
      return 'x';
    },
  };
};
const r12 = new ReRegExp('/\\p{Lu}/u');
console.log(r12.build()); // => 'x', produced by the object returned from `UPCFactory`.
```
## Config
```typescript
// The meaning of the config fields can be seen in the examples.
{
  maxRepeat?: number;
  namedGroupConf?: {
    [index: string]: string[]|boolean;
  };
  extractSetAverage?: boolean;
  capture?: boolean;
}
```
## Supported flags
- `i` ignore case, `/[a-z]/i` is the same as `/[a-zA-Z]/`
- `u` unicode flag
- `s` dot all flag

The flags `g`, `m` and `y` are ignored.
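
To see at a glance which flags of a pattern take part in generation, the two lists above can be applied to the native `flags` string. This is only a sketch: `splitFlags`, `SUPPORTED` and `IGNORED` are hypothetical names used for illustration and are not part of reregexp.

```javascript
// Hypothetical helper, not part of reregexp: split a RegExp's flags into the
// ones this section lists as supported and the ones it lists as ignored.
const SUPPORTED = ['i', 'u', 's'];
const IGNORED = ['g', 'm', 'y'];

function splitFlags(regex) {
  // `regex.flags` is the built-in string of flags, e.g. 'gimsu'
  const flags = [...regex.flags];
  return {
    supported: flags.filter((f) => SUPPORTED.includes(f)),
    ignored: flags.filter((f) => IGNORED.includes(f)),
  };
}

console.log(splitFlags(/[a-z]+/gimsu));
// => { supported: [ 'i', 's', 'u' ], ignored: [ 'g', 'm' ] }
```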
## Methods
`.build()`
Builds a string that matches the regexp.
`.info()`
Returns the parsed queues, the flags, and the `lastRule` (the rule after the named captures are removed).
```javascript
{
  rule: '',
  context: '',
  flags: [],
  lastRule: '',
  queues: [],
}
```
## Build precautions: do not use any regexp anchors
1. `^` `$`: the start and end anchors are ignored.
2. `(?=)` `(?!)` `(?<=)` `(?<!)`: lookahead and lookbehind assertions throw an error when `build()` runs.
3. `\b` `\B`: the word boundary assertions are ignored.
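
Since the start and end anchors are ignored, a generated string is best checked against an anchored copy of the pattern, as the special cases in this README do with `/^...$/`. Below is a minimal sketch using only the built-in `RegExp`; `toFullMatch` is a hypothetical helper, not part of reregexp, and the sample strings are copied from the usage examples.

```javascript
// Hypothetical helper, not part of reregexp: wrap a pattern so that it must
// match the whole string.
function toFullMatch(regex) {
  // drop `g`, `m` and `y`, the flags that the library ignores
  const flags = regex.flags.replace(/[gmy]/g, '');
  // a non-capturing wrapper keeps the numbering of the original groups
  return new RegExp(`^(?:${regex.source})$`, flags);
}

const full = toFullMatch(/([a-z0-9]{3})_\1/);
console.log(full.test('a2z_a2z')); // true
console.log(full.test('a2z_a2z!')); // false
```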
## Special cases
1. `/\1(o)/`: the backreference `\1` matches the empty string, so `build()` just outputs `o`, and `/^\1(o)$/.test('o') === true`.
2. `/(o)\1\2/`: the backreference `\2` has no matching group and is treated as a unicode code point, so `build()` outputs `oo\u0002`, and `/^(o)\1\2$/.test('oo\u0002') === true`.
3. `/(o\1)/`: the backreference `\1` matches the empty string, so `build()` outputs `o`, and `/^(o\1)$/.test('o') === true`.
4. `/[]/`: an empty character class. `build()` throws an error, because no character can match it.
5. `/[^]/`: a negated empty character class. `build()` outputs any character.
6. `/[^\w\W]/`: for negated character sets, if all characters are eliminated, `build()` throws an error. The same applies to `/[^a-zA-Z0-9_\W]/`, `/[^\s\S]/`, and so on.
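
The native matches quoted in cases 1 to 5 can be reproduced without the library. The sketch below uses only the built-in `RegExp`; the `checks`, `results` and `emptyClassMatches` names are illustrative.

```javascript
// Reproduce, with the built-in RegExp only, the matches quoted in the list above.
const checks = [
  // case 1: a backreference placed before its group matches the empty string
  { source: '^\\1(o)$', input: 'o' },
  // case 2: `\2` has no matching group, so it is read as code point U+0002
  { source: '^(o)\\1\\2$', input: 'oo\u0002' },
  // case 3: a backreference inside its own group matches the empty string
  { source: '^(o\\1)$', input: 'o' },
  // case 5: `[^]` matches any character, including a line break
  { source: '^[^]$', input: '\n' },
];
const results = checks.map(({ source, input }) => new RegExp(source).test(input));
console.log(results); // every entry is true

// case 4: the empty class `[]` can never match, which is why nothing can be built from it
const emptyClassMatches = /[]/.test('abc');
console.log(emptyClassMatches); // false
```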
## Questions & Bugs?
You are welcome to open an [issue](https://github.com/suchjs/reregexp/issues) if you have a question or find a bug.
## License
[MIT License](./LICENSE).