RegExp

The RegExp object provides regular expression functionality.

Overview

A regular expression is a way of expressing text patterns (i.e. string structures). It works somewhat like a string template and is often used to match text against a given pattern. For example, a regular expression can describe the pattern of an email address and then be used to determine whether a string is an email address. JavaScript's regular expression system is modeled on Perl 5.

There are two ways to create a new regular expression. One is to use literals, with slashes indicating the beginning and the end.

var regex = /xyz/;

The other is to use the RegExp constructor.

var regex = new RegExp("xyz");

The two forms above are equivalent: both create a new regular expression object with the content xyz. The main difference is that the first form creates the regular expression when the engine compiles the code, whereas the second creates it at runtime, so the former is more efficient. The literal form is also more convenient and intuitive, so in practice regular expressions are almost always defined with literals.

The RegExp constructor can also accept a second parameter, which represents a modifier (see below for detailed explanation).

var regex = new RegExp("xyz", "i");
// Equivalent to
var regex = /xyz/i;

In the above code, the regular expression /xyz/ has a modifier i.
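The constructor form remains useful when the pattern is only known at runtime. The following is a minimal sketch, assuming a hypothetical variable keyword that holds plain letters with no metacharacters:

```javascript
// `keyword` is a hypothetical runtime value; it contains no metacharacters here.
var keyword = "xyz";
var dynamic = new RegExp(keyword, "i");

dynamic.test("wXYZw"); // true
dynamic.source; // "xyz"
dynamic.flags; // "i"
```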

Instance properties

The instance properties of regular objects are divided into two categories.

One type is modifier-related, which is used to understand what modifiers are set.

- RegExp.prototype.ignoreCase: returns a boolean value indicating whether the i modifier is set.
- RegExp.prototype.global: returns a boolean value indicating whether the g modifier is set.
- RegExp.prototype.multiline: returns a boolean value indicating whether the m modifier is set.
- RegExp.prototype.flags: returns a string containing all modifiers that have been set, sorted alphabetically.

The above four attributes are all read-only.

var r = /abc/gim;

r.ignoreCase; // true
r.global; // true
r.multiline; // true
r.flags; //'gim'

The other category is properties that are unrelated to modifiers, mainly the following two.

- RegExp.prototype.lastIndex: returns an integer indicating the position where the next search will start. This property is readable and writable, but it is only meaningful when performing consecutive searches; see below for details.
- RegExp.prototype.source: returns the string form of the regular expression (excluding the delimiting slashes and the modifiers). This property is read-only.

var r = /abc/gim;

r.lastIndex; // 0
r.source; // "abc"

Instance methods

RegExp.prototype.test()

The test method of the regular instance object returns a boolean value indicating whether the current pattern can match the parameter string.

/cat/.test("cats and dogs"); // true

The above code verifies whether the parameter string contains cat, and the result returns true.

If the regular expression has the g modifier, each call to the test method starts matching from the position where the previous match ended.

var r = /x/g;
var s = "_x_x";

r.lastIndex; // 0
r.test(s); // true

r.lastIndex; // 2
r.test(s); // true

r.lastIndex; // 4
r.test(s); // false

The regular expression in the above code uses the g modifier, which makes it a global search that can produce multiple results. The test method is then called three times, and each search starts from the position just after the previous match.

With the g modifier, you can specify the location to start the search through the lastIndex property of the regular object.

var r = /x/g;
var s = "_x_x";

r.lastIndex = 4;
r.test(s); // false

r.lastIndex; // 0
r.test(s); // true

The above code specifies to start the search from the fifth position of the string. This position is empty, so false is returned. At the same time, the lastIndex property is reset to 0, so the second execution of r.test(s) will return true.

Note that with the g modifier, the regular expression remembers the last lastIndex value. The string being matched should not be changed in the meantime, otherwise hard-to-detect errors can occur.

var r = /bb/g;
r.test("bb"); // true
r.test("-bb-"); // false

In the above code, because the regular expression r is matched from the previous lastIndex position, an unexpected result occurs when the test method is executed for the second time.
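One way to avoid this surprise is to reset lastIndex before testing a different string. A minimal sketch (the variable names first and second are only for illustration):

```javascript
var r = /bb/g;

var first = r.test("bb"); // true, lastIndex is now 2
r.lastIndex = 0; // reset before testing a different string
var second = r.test("-bb-"); // true
```

Dropping the g modifier has the same effect when only a yes/no answer is needed, since lastIndex is then not used.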

The lastIndex attribute is only valid for the same regular expression, so the following is wrong.

var count = 0;
while (/a/g.test("babaa")) count++;

The above code will cause an infinite loop, because each matching condition of the while loop is a new regular expression, resulting in the lastIndex property always being equal to 0.
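A corrected sketch creates the regular expression once, outside the loop, so that lastIndex is carried over between calls (the variable name re is only for illustration):

```javascript
var re = /a/g; // created once, so lastIndex persists between calls
var count = 0;
while (re.test("babaa")) count++;

count; // 3
```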

If the regular pattern is an empty string, all strings are matched.

new RegExp("").test("abc");
// true

RegExp.prototype.exec()

The exec() method of the regular instance object is used to return the matching result. If a match is found, an array is returned, and the members are substrings that match successfully, otherwise null is returned.

var s = "_x_x";
var r1 = /x/;
var r2 = /y/;

r1.exec(s); // ["x"]
r2.exec(s); // null

In the above code, the regular object r1 matches successfully and returns an array, and the members are the matching results; the regular object r2 fails to match and returns null.

If the regular expression contains parentheses (that is, contains "group matching"), the returned array will include multiple members. The first member is the result of the entire match, and the following members are the matched group corresponding to the parentheses. In other words, the second member corresponds to the first bracket, the third member corresponds to the second bracket, and so on. The length property of the entire array is equal to the number of group matches plus one.

var s = "_x_x";
var r = /_(x)/;

r.exec(s); // ["_x", "x"]

The exec() method of the above code returns an array. The first member is the result of the entire match, and the second member is the result of the parentheses match.

The return array of the exec() method also contains the following two attributes:

- input: the entire original string.
- index: the starting position of the successful match (counting from 0).

var r = /a(b+)a/;
var arr = r.exec("_abbba_aba_");

arr; // ["abbba", "bbb"]

arr.index; // 1
arr.input; // "_abbba_aba_"

The index attribute in the above code is equal to 1, because the match is successful starting from the second position of the original string.

If the regular expression has the g modifier, the exec() method can be called repeatedly, and each search starts from the position where the previous successful match ended.

var reg = /a/g;
var str = "abc_abc_abc";

var r1 = reg.exec(str);
r1; // ["a"]
r1.index; // 0
reg.lastIndex; // 1

var r2 = reg.exec(str);
r2; // ["a"]
r2.index; // 4
reg.lastIndex; // 5

var r3 = reg.exec(str);
r3; // ["a"]
r3.index; // 8
reg.lastIndex; // 9

var r4 = reg.exec(str);
r4; // null
reg.lastIndex; // 0

The above code calls the exec() method four times in a row, and each of the first three calls matches onward from the position where the previous match ended. After the third match, the rest of the string contains no further match, so the fourth call returns null and the lastIndex property of the regular instance object is reset to 0, which means that the next search will start from the beginning again.

Using the feature that the g modifier allows multiple matches, all matches can be completed in one loop.

var reg = /a/g;
var str = "abc_abc_abc";

while (true) {
  var match = reg.exec(str);
  if (!match) break;
  console.log("#" + match.index + ":" + match[0]);
}
// #0:a
// #4:a
// #8:a

In the above code, the loop continues as long as the exec() method does not return null, and outputs the position and text of each match.

The lastIndex property of a regular instance object is not only readable, but also writable. When the g modifier is set, as long as the value of lastIndex is manually set, the match will start from the specified position.
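As a small illustration of writing to lastIndex, the sketch below skips the first a by starting the search at index 2:

```javascript
var reg = /a/g;
var str = "abc_abc_abc";

reg.lastIndex = 2; // skip the first "a"
var m = reg.exec(str);

m.index; // 4
reg.lastIndex; // 5
```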

String instance methods

Among the string instance methods, four are related to regular expressions.

- String.prototype.match(): returns an array whose members are all the matched substrings.
- String.prototype.search(): searches according to the given regular expression and returns an integer indicating the position where the match starts.
- String.prototype.replace(): replaces according to the given regular expression and returns the resulting string.
- String.prototype.split(): splits the string according to the given rule and returns an array of the resulting parts.

String.prototype.match()

The match method of the string instance object performs regular matching on the string and returns the matching result.

var s = "_x_x";
var r1 = /x/;
var r2 = /y/;

s.match(r1); // ["x"]
s.match(r2); // null

As can be seen from the above code, the match method of a string is very similar to the exec method of a regular object: an array is returned if the match is successful, and null is returned if the match fails.

If the regular expression has the g modifier, this method will behave differently from the exec method of the regular object and will return all successful matching results at one time.

var s = "abba";
var r = /a/g;

s.match(r); // ["a", "a"]
r.exec(s); // ["a"]

Setting the lastIndex property of the regular expression is invalid for the match method. The match always starts from the first character of the string.

var r = /a|b/g;
r.lastIndex = 7;
"xaxb".match(r); // ['a','b']
r.lastIndex; // 0

The above code indicates that setting the lastIndex property of the regular object is invalid.

String.prototype.search()

The search method of the string object returns the position of the first match in the string. If there is no match, -1 is returned.

"_x_x".search(/x/);
// 1

In the above code, the first matching result appears at position 1 in the string.

String.prototype.replace()

The replace method of the string object can replace the matched value. It accepts two parameters, the first is a regular expression, which represents the search pattern, and the second is the content to be replaced.

str.replace(search, replacement);

If the regular expression does not have the g modifier, only the first match is replaced; otherwise, all matches are replaced.

"aaa".replace("a", "b"); // "baa"
"aaa".replace(/a/, "b"); // "baa"
"aaa".replace(/a/g, "b"); // "bbb"

In the above code, the last regular expression uses the g modifier, causing all a to be replaced.

One application of the replace method is to eliminate spaces at the beginning and end of a string.

var str = "  #id div.class  ";

str.replace(/^\s+|\s+$/g, "");
// "#id div.class"
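The same pattern can be wrapped in a small helper. This is only a sketch: trimSpaces is a hypothetical name, and the input deliberately has spaces at both ends so that the effect is visible.

```javascript
// trimSpaces is a hypothetical helper name used only for this illustration.
function trimSpaces(str) {
  // ^\s+ matches leading whitespace, \s+$ matches trailing whitespace
  return str.replace(/^\s+|\s+$/g, "");
}

var trimmed = trimSpaces("  #id div.class  ");
trimmed; // "#id div.class"
```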

The second parameter of the replace method can use the dollar sign $ to refer to parts of the match.

- $&: the matched substring.
- $`: the text before the match.
- $': the text after the match.
- $n: the content of the nth successfully matched group, where n is a natural number starting from 1.
- $$: a literal dollar sign $.

"hello world".replace(/(\w+)\s(\w+)/, "$2 $1");
// "world hello"

"abc".replace("b", "[$`-$&-$']");
// "a[a-b-c]c"

In the above code, the first example is to swap the positions of the matched group, and the second example is to rewrite the matched value.

The second parameter of the replace method can also be a function, which replaces every match with the return value of the function.

"3 and 5".replace(/[0-9]+/g, function (match) {
  return 2 * match;
});
// "6 and 10"

var a = "The quick brown fox jumped over the lazy dog.";
var pattern = /quick|brown|lazy/gi;

a.replace(pattern, function replacer(match) {
  return match.toUpperCase();
});
// The QUICK BROWN fox jumped over the LAZY dog.

When the second parameter of the replace method is a function, that function can accept multiple parameters. The first parameter is the matched text, and the parameters that follow are the group matches (one parameter per group). Two more parameters come at the end: the second-to-last is the position of the matched text in the whole string (for example, index 4 if the match starts at the fifth character), and the last is the original string. The following is an example of web page template replacement.

var prices = {
  p1: "$1.99",
  p2: "$9.99",
  p3: "$5.00",
};

var template =
  '<span id="p1"></span>' + '<span id="p2"></span>' + '<span id="p3"></span>';

template.replace(
  /(<span id=")(.*?)(">)(<\/span>)/g,
  function (match, $1, $2, $3, $4) {
    return $1 + $2 + $3 + prices[$2] + $4;
  }
);
// "<span id="p1">$1.99</span><span id="p2">$9.99</span><span id="p3">$5.00</span>"

The pattern in the above code has four pairs of parentheses, so it produces four group matches, represented by $1 to $4 in the replacement function. The function's role is to insert the price into the template.
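The trailing position and original-string parameters can be seen in a smaller sketch. The parameter names group1, offset and whole, and the log array, are chosen here only for illustration:

```javascript
var log = [];
var result = "a1b22".replace(/(\d+)/g, function (match, group1, offset, whole) {
  // offset is the index of the match within whole ("a1b22")
  log.push(match + "@" + offset);
  return "<" + group1 + ">";
});

result; // "a<1>b<22>"
log; // ["1@1", "22@3"]
```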

String.prototype.split()

The split method of the string object splits the string according to regular rules, and returns an array of the divided parts.

str.split(separator, [limit]);

This method accepts two parameters. The first parameter is a regular expression, which represents the separation rule, and the second parameter is the maximum number of members of the returned array.

// Split on a plain string
"a, b,c, d".split(",");
// ['a', ' b', 'c', ' d']

// Split on a regular expression, removing the extra spaces
"a, b,c, d".split(/, */);
// ['a', 'b', 'c', 'd']

// Limit the number of members in the returned array
"a, b,c, d".split(/, */, 2);
// ['a', 'b']

The above code uses regular expressions to remove the space after the comma in the substring.

// Example 1
"aaa*a*".split(/a*/);
// ['','*','*']

// Example 2
"aaa**a*".split(/a*/);
// ["", "*", "*", "*"]

The separator in the above code is zero or more occurrences of a. Because regular expressions are greedy by default, the first separator in Example 1 is aaa and the second is a, so the string is divided into three parts, including the empty string at the beginning. In Example 2, the first separator is aaa, the second is zero a (i.e. the empty string), and the third is a, so the string is divided into four parts.

If the regular expression has parentheses, the part that matches the parentheses is also returned as an array member.

"aaa*a*".split(/(a*)/);
// ['','aaa','*','a','*']

The regular expression in the code above uses parentheses. The first group match is aaa, and the second group match is a. They are all returned as array members.
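This behavior is handy when the separators themselves must be kept. A minimal sketch, using an arithmetic string chosen only for illustration:

```javascript
// The parentheses make split keep each operator as its own array member.
var tokens = "1+2-3".split(/([+-])/);
tokens; // ["1", "+", "2", "-", "3"]

// Without parentheses the operators are discarded.
var numbers = "1+2-3".split(/[+-]/);
numbers; // ["1", "2", "3"]
```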

Matching rules

The rules of regular expressions are very complicated. The following describes these rules one by one.

Literal characters and metacharacters

Most characters in regular expressions have literal meanings, such as /a/ matches a, /b/ matches b. If a character in a regular expression only represents its literal meaning (like the previous a and b), then they are called "literal characters".

/dog/.test("old dog"); // true

The dog in the regular expression in the code above is a literal character, so /dog/ matches old dog, because it means that the three letters d, o, and g are connected together.

In addition to literal characters, some characters have special meanings and do not represent literal meanings. They are called "metacharacters" and mainly include the following.

(1) Dot character (.)

The dot character (.) matches all characters except carriage return (\r), line feed (\n), line separator (\u2028), and paragraph separator (\u2029). Note that for characters with a code point greater than 0xFFFF, the dot character cannot be matched correctly, and it will be considered as two characters.

/c.t/;

In the above code, c.t matches any single character between c and t, as long as the three characters are on the same line, for example cat, c2t and c-t, but it does not match coot.

(2) Position character

The position character is used to indicate the position of the character, and there are mainly two characters.

- ^ indicates the start position of the string.
- $ indicates the end position of the string.

// test must appear at the beginning
/^test/.test('test123') // true

// test must appear at the end
/test$/.test('new test') // true

// There is only test from the start position to the end position
/^test$/.test('test') // true
/^test$/.test('test test') // false

(3) selector (|)

The vertical bar symbol (|) means "or relationship" (OR) in regular expressions, that is, cat|dog means matching cat or dog.

/11|22/.test("911"); // true

In the above code, the regular expression specifies that it must match 11 or 22.

Multiple selectors can be used in combination.

// Match one of fred, barney, betty
/fred|barney|betty/;

The selector takes in everything before and after it. For example, /ab|cd/ matches ab or cd, not b or c. To change this behavior, use parentheses.

/a( |\t)b/.test("a\tb"); // true

The above code means that there is a space or a tab between a and b.
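The difference in scope can be checked directly. In the sketch below (variable names are illustrative only), the first pattern alternates between the whole sequences ab and cd, while the parenthesized one alternates only between b and c:

```javascript
var whole1 = /ab|cd/.test("ab"); // true
var whole2 = /ab|cd/.test("ad"); // false: neither "ab" nor "cd" occurs

var grouped1 = /a(b|c)d/.test("acd"); // true
var grouped2 = /a(b|c)d/.test("ab"); // false: the trailing d is required
```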

Other metacharacters include \, *, +, ?, (), [], {}, etc., which will be explained below.

Escapes

For those metacharacters with special meaning in regular expressions, if you want to match them, you need to add a backslash in front of them. For example, to match +, it must be written as \+.

/1+1/.test('1+1')
// false

/1\+1/.test('1+1')
// true

In the above code, the first regular expression does not match because the plus sign is a metacharacter and does not represent itself. The second regular expression uses a backslash to escape the plus sign, and it matches successfully.

In regular expressions, a total of 12 characters need to be escaped with a backslash: ^, ., [, $, (, ), |, *, +, ?, { and \. It is important to note that if you use the RegExp constructor to generate a regular object, you need two backslashes for escaping, because the string is escaped first.

new RegExp("1\+1").test("1+1");
// false

new RegExp("1\\+1").test("1+1");
// true

In the above code, RegExp is used as the constructor and the parameter is a string. However, inside the string, the backslash is also an escape character, so it will be escaped by the backslash first, and then by the regular expression, so it needs two backslashes to escape.
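When the text to match comes from a variable, the escaping can be automated. The following is a sketch only: escapeRegExp is a hypothetical helper written for this illustration, not something defined elsewhere in this document.

```javascript
// escapeRegExp is a hypothetical helper for this illustration.
function escapeRegExp(text) {
  // Put a backslash in front of every metacharacter (] and } included for symmetry).
  return text.replace(/[\^.\[\]$()|*+?{}\\]/g, "\\$&");
}

var literal = new RegExp(escapeRegExp("1+1"));

literal.source; // "1\\+1" when written as a string literal
literal.test("1+1"); // true
literal.test("11"); // false
```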

Special characters

Regular expressions provide expression methods for some special characters that cannot be printed.

- \cX represents Ctrl-[X], where X is any letter from A to Z, and is used to match control characters.
- [\b] matches the backspace character (U+0008); not to be confused with \b.
- \n matches the line feed character.
- \r matches the carriage return character.
- \t matches the tab character (U+0009).
- \v matches the vertical tab character (U+000B).
- \f matches the form feed character (U+000C).
- \0 matches the null character (U+0000).
- \xhh matches a character represented by two hexadecimal digits (\x00-\xFF).
- \uhhhh matches a Unicode character represented by four hexadecimal digits (\u0000-\uFFFF).

Character class

Character class (class) means that there is a series of characters to choose from, as long as it matches one of them. All available characters are placed in square brackets. For example, [xyz] means any one of x, y, and z matches.

/[abc]/.test('hello world') // false
/[abc]/.test('apple') // true

In the above code, the string hello world does not contain any of the three letters a, b, and c, so it returns false; the string apple contains the letter a, so it returns true.

Two characters have special meanings in character classes.

(1) Caret (^)

If the first character in the square brackets is ^, the class matches any character except those listed. For example, [^xyz] matches any character other than x, y, and z.

/[^abc]/.test('bbc news') // true
/[^abc]/.test('bbc') // false

In the above code, the string bbc news contains characters other than a, b, and c, so it returns true; the string bbc contains no characters other than a, b, and c, so it returns false.

If there are no other characters in the square brackets, that is, only [^], it means that all characters are matched, including newline characters. In contrast, the period as a metacharacter (.) does not include a newline character.

var s = "Please yes\nmake my day!";

s.match(/yes.*day/); // null
s.match(/yes[^]*day/); // ['yes\nmake my day']

In the above code, the string s contains a newline character, and the period does not include the newline character, so the first regular expression fails to match; the second regular expression [^] contains all characters, so the match succeeds.

Note that the caret only has a special meaning in the first position of the character class, otherwise it has a literal meaning.
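A short sketch of the two positions (variable names are illustrative only):

```javascript
var literalCaret = /[a^]/.test("^"); // true: ^ is not first, so it is a literal caret
var negated = /[^a]/.test("a"); // false: ^ is first, so the class excludes a
var escaped = /[\^a]/.test("^"); // true: an escaped caret is literal even in first position
```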

(2) Hyphen (-)

In some cases, the hyphen (-) provides a shorthand for a continuous range of characters. For example, [abc] can be written as [a-c], and [0123456789] can be written as [0-9]. Similarly, [A-Z] means the 26 capital letters.

/a-z/.test('b') // false
/[a-z]/.test('b') // true

In the above code, when the hyphen does not appear inside square brackets, it is not a shorthand and only has its literal meaning, so the pattern does not match the character b. Only inside square brackets does the hyphen denote a continuous range of characters.

The following are all legal abbreviations for character classes.

[0-9.,]
[0-9a-fA-F]
[a-zA-Z0-9-]
[1-31]

The last character class [1-31] in the above code does not represent 1 to 31, but only represents 1 to 3.

Hyphens can also be used to specify the range of Unicode characters.

var str = "\u0130\u0131\u0132";
/[\u0128-\uFFFF]/.test(str);
// true

In the above code, \u0128-\uFFFF means to match all characters with code points between 0128 and FFFF.

In addition, do not overuse hyphens to set very large ranges, or unexpected characters may be included. The most typical example is [A-z]. On the surface it selects the 52 letters from uppercase A to lowercase z, but in ASCII there are other characters between the uppercase and lowercase letters, so the result can be unexpected.

/[A-z]/.test("\\"); // true

In the above code, the ASCII code of the backslash ('\') lies between the uppercase and lowercase letters, so it is matched.
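Writing the two letter ranges separately avoids the problem. A minimal comparison (variable names are illustrative only):

```javascript
var wide = /[A-z]/.test("_"); // true: underscore (0x5F) lies between Z (0x5A) and a (0x61)
var exact1 = /[A-Za-z]/.test("_"); // false
var exact2 = /[A-Za-z]/.test("\\"); // false
var exact3 = /[A-Za-z]/.test("q"); // true
```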

Predefined patterns

Predefined patterns are shorthands for some common patterns.

- \d matches any digit from 0 to 9; equivalent to [0-9].
- \D matches any character other than 0-9; equivalent to [^0-9].
- \w matches any letter, digit, or underscore; equivalent to [A-Za-z0-9_].
- \W matches any character other than letters, digits, and underscores; equivalent to [^A-Za-z0-9_].
- \s matches whitespace (including line breaks, tabs, spaces, etc.); roughly equivalent to [ \t\r\n\v\f].
- \S matches non-whitespace characters; roughly equivalent to [^ \t\r\n\v\f].
- \b matches a word boundary.
- \B matches a non-word boundary, that is, a position inside a word.

Here are some examples.

// Examples of \s
/\s\w*/.exec('hello world') // [" world"]

// \b example
/\bworld/.test('hello world') // true
/\bworld/.test('hello-world') // true
/\bworld/.test('helloworld') // false

// \B example
/\Bworld/.test('hello-world') // false
/\Bworld/.test('helloworld') // true

In the above code, \s matches whitespace, so the result includes the space. \b indicates a word boundary, so world only matches when it starts at the beginning of a word (whether it also ends the word is not specified). Likewise, \B indicates a non-boundary, so world only matches when it does not start at the beginning of a word.

Usually, the regular expression will stop matching when it encounters a newline character (\n).

var html = "<b>Hello</b>\n<i>world!</i>";

/.*/.exec(html)[0];
// "<b>Hello</b>"

In the above code, the string html contains a newline character. As a result, the dot character (.) does not match the newline character, which may not match the original intent. At this time, use the \s character class to include line breaks.

var html = "<b>Hello</b>\n<i>world!</i>";

/[\S\s]*/.exec(html)[0];
// "<b>Hello</b>\n<i>world!</i>"

In the above code, [\S\s] refers to all characters.

Repetition class

The exact number of times a pattern repeats is specified with braces ({}). {n} means exactly n times, {n,} means at least n times, and {n,m} means no fewer than n and no more than m times.

/lo{2}k/.test('look') // true
/lo{2,5}k/.test('looook') // true

In the above code, the first pattern specifies that o appears twice in a row, and the second pattern specifies that o appears between 2 and 5 times in a row.
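The open-ended form {n,} and a combination of several {n} counts can be sketched as follows. The date string is only an illustration; the pattern checks the shape of the text, not whether the date is valid.

```javascript
var atLeastTwo1 = /lo{2,}k/.test("loooooook"); // true
var atLeastTwo2 = /lo{2,}k/.test("lok"); // false: only one o

// Four digits, two digits, two digits, separated by hyphens.
var shape = /^\d{4}-\d{2}-\d{2}$/;
var shapeOk = shape.test("2024-01-31"); // true
var shapeBad = shape.test("2024-1-31"); // false
```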

Quantifier

The quantifier is used to set the number of occurrences of a pattern.

- ? The question mark means that a pattern appears 0 or 1 times; equivalent to {0,1}.
- * The asterisk means that a pattern appears 0 or more times; equivalent to {0,}.
- + The plus sign means that a pattern appears 1 or more times; equivalent to {1,}.

// t occurs 0 or 1 time
/t?est/.test('test') // true
/t?est/.test('est') // true

// t occurs 1 or more times
/t+est/.test('test') // true
/t+est/.test('ttest') // true
/t+est/.test('est') // false

// t occurs 0 or more times
/t*est/.test('test') // true
/t*est/.test('ttest') // true
/t*est/.test('tttest') // true
/t*est/.test('est') // true

Greedy Mode

The three quantifiers in the previous section are all the maximum possible matches by default, that is, they are matched until the next character does not satisfy the matching rules. This is called greedy mode.

var s = "aaa";
s.match(/a+/); // ["aaa"]

In the above code, the pattern is /a+/, which matches one or more a. How many will it match? Because the default is greedy mode, it keeps matching until the character a no longer appears, so the result is three a.

Besides greedy mode there is also non-greedy mode, which makes the smallest possible match: as soon as the condition is satisfied, the result is returned without going further. To change greedy mode to non-greedy mode, add a question mark after the quantifier.

var s = "aaa";
s.match(/a+?/); // ["a"]

In the above example, a question mark is added after the quantifier (/a+?/), switching it to non-greedy mode: once the condition is met, matching goes no further. +? means that as soon as one a is found, matching stops.

Besides the non-greedy plus sign (+?), there are also the non-greedy asterisk (*?) and the non-greedy question mark (??).

- +?: the pattern appears 1 or more times, matched non-greedily.
- *?: the pattern appears 0 or more times, matched non-greedily.
- ??: the pattern appears 0 or 1 times, matched non-greedily.

"abb".match(/ab*/); // ["abb"]
"abb".match(/ab*?/); // ["a"]

"abb".match(/ab?/); // ["ab"]
"abb".match(/ab??/); // ["a"]

In the above example, /ab*/ matches as many b as possible after a, whereas /ab*?/ matches as few b as possible, that is, zero b.
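The difference matters most when a pattern has a closing delimiter. In the sketch below (the markup string is only an illustration), the greedy version runs to the last closing tag, while the non-greedy version stops at the first one:

```javascript
var html = "<b>one</b><b>two</b>";

var greedy = html.match(/<b>.*<\/b>/)[0];
greedy; // "<b>one</b><b>two</b>"

var lazy = html.match(/<b>.*?<\/b>/)[0];
lazy; // "<b>one</b>"
```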

Modifiers

The modifier represents the additional rules of the pattern and is placed at the end of the regular pattern.

Modifiers can be used singly or multiple together.

// single modifier
var regex = /test/i;

// multiple modifiers
var regex = /test/gi;

(1) g modifier

By default, after the first match is successful, the regular object stops matching downwards. The g modifier represents global matching (global). After adding it, the regular object will match all the results that meet the conditions, mainly for search and replacement.

var regex = /b/;
var str = "abba";

regex.test(str); // true
regex.test(str); // true
regex.test(str); // true

In the above code, the regular pattern does not contain the g modifier, so every match starts from the beginning of the string. All three consecutive calls therefore return true.

var regex = /b/g;
var str = "abba";

regex.test(str); // true
regex.test(str); // true
regex.test(str); // false

In the above code, the regular pattern contains the g modifier, so each match starts from the position after the previous successful match. Because the string abba contains only two b, the first two calls return true and the third returns false.

(2) i modifier

By default, regular objects distinguish between uppercase and lowercase letters, and after adding the i modifier, it means ignore case (ignoreCase).

/abc/.test("ABC"); // false
/abc/i.test("ABC"); // true

The above code indicates that after the i modifier is added, case is not considered, so the pattern abc matches the string ABC.

(3) m modifier

The m modifier indicates multiline mode, which changes the behavior of ^ and $. By default (that is, without the m modifier), ^ and $ match the beginning and end of the whole string. With the m modifier, ^ and $ also match the beginning and end of each line, that is, they recognize the newline character (\n).

/world$/.test("hello world\n"); // false
/world$/m.test("hello world\n"); // true

In the above code, there is a newline character at the end of the string. If you do not add the m modifier, the match is not successful because the end of the string is not world; after adding it, $ can match the end of the line.

/^b/m.test("a\nb"); // true

The above code requires b to appear at the beginning of a line. Without the m modifier, b could only match at the beginning of the string. With the m modifier, the position right after the newline character \n also counts as the beginning of a line.
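Combining m with g makes it possible to collect something from every line. A minimal sketch, with a three-line string chosen only for illustration:

```javascript
var text = "a1\nb2\nc3";

var starts = text.match(/^\w/gm); // first character of each line
starts; // ["a", "b", "c"]

var withoutM = text.match(/^\w/g); // ^ only matches at the start of the string
withoutM; // ["a"]

var ends = text.match(/\d$/gm); // last digit of each line
ends; // ["1", "2", "3"]
```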

Group match

(1) Overview

The parentheses of the regular expression indicate group matching, and the pattern in the parentheses can be used to match the content of the group.

/fred+/.test("fredd"); // true
/(fred)+/.test("fredfred"); // true

In the above code, the first pattern has no parentheses, so + only repeats the letter d; the second pattern has parentheses, so + repeats the whole word fred.

The following is another example of group capture.

var m = "abcabc".match(/(.)b(.)/);
m;
// ['abc','a','c']

In the above code, the regular expression /(.)b(.)/ uses a total of two parentheses, the first parenthesis captures a, and the second parenthesis captures c.

Note that when using group matching, it is not advisable to use the g modifier at the same time, otherwise the match method will not capture the content of the group.

var m = "abcabc".match(/(.)b(.)/g);
m; // ['abc','abc']

The above code uses a regular expression with the g modifier. As a result, the match method only captures the part that matches the entire expression. At this time, the exec method of regular expressions must be used in conjunction with the loop to read each round of matching group captures.

var str = "abcabc";
var reg = /(.)b(.)/g;
while (true) {
  var result = reg.exec(str);
  if (!result) break;
  console.log(result);
}
// ["abc", "a", "c"]
// ["abc", "a", "c"]

Inside the regular expression, you can also use \n to quote the content matched by the parentheses. n is a natural number starting from 1, which represents the parentheses in the corresponding order.

/(.)b(.)\1b\2/.test("abcabc");
// true

In the above code, \1 represents the content matched by the first bracket (ie a), and \2 represents the content matched by the second bracket (ie c).

Here is another example.

/y(..)(.)\2\1/.test("yabccab"); // true

Parentheses can also be nested.

/y((..)\2)\1/.test("yabababab"); // true

In the above code, \1 points to the outer brackets, and \2 points to the inner brackets.
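A common use of such back-references is spotting an accidentally repeated word. The sketch below is only an illustration (the variable name doubled and the sample sentences are made up for it):

```javascript
// \1 must repeat exactly what the first group captured.
var doubled = /\b(\w+) \1\b/;

var found = doubled.exec("it was the the best");
found[0]; // "the the"
found[1]; // "the"

var clean = doubled.test("it was the best"); // false
```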

Group matching is very useful. Below is an example of matching webpage tags.

var tagName = /<([^>]+)>[^<]*<\/\1>/;

tagName.exec("<b>bold</b>")[1];
//'b'

In the above code, the parentheses match the tags in the angle brackets, and \1 means the corresponding closed tag.

The above code is slightly modified to capture tags with attributes.

var html = '<b class="hello">Hello</b><i>world</i>';
var tag = /<(\w+)([^>]*)>(.*?)<\/\1>/g;

var match = tag.exec(html);

match[1]; // "b"
match[2]; // ' class="hello"'
match[3]; // "Hello"

match = tag.exec(html);

match[1]; // "i"
match[2]; // ""
match[3]; // "world"

(2) Non-capturing group

(?:x) is called a non-capturing group: the content matched by the group is not returned, that is, the parentheses are not counted in the matching result.

To see why non-capturing groups are useful, consider a scenario where you need to match foo or foofoo. The regular expression could be written as /(foo){1,2}/, but this takes up a group match. Instead, you can use a non-capturing group and write /(?:foo){1,2}/, which matches the same text but does not output the content of the parentheses separately.

Please see the example below.

var m = "abc".match(/(?:.)b(.)/);
m; // ["abc", "c"]

The pattern in the code above uses a total of two parentheses. The first parenthesis is a non-capturing group, so there is no first parenthesis in the final returned result, only the content that matches the second parenthesis.
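The foo/foofoo scenario described earlier can be sketched the same way. The ^ and $ anchors are added here only to make the comparison exact:

```javascript
var capturing = "foofoo".match(/^(foo){1,2}$/);
capturing.length; // 2: the whole match plus one group
capturing[1]; // "foo"

var nonCapturing = "foofoo".match(/^(?:foo){1,2}$/);
nonCapturing.length; // 1: only the whole match
nonCapturing[0]; // "foofoo"
```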

Below is the regular expression used to decompose the URL.

// normal match
var url = /(http|ftp):\/\/([^/\r\n]+)(\/[^\r\n]*)?/;

url.exec("http://google.com/");
// ["http://google.com/", "http", "google.com", "/"]

// Non-capturing group matching
var url = /(?:http|ftp):\/\/([^/\r\n]+)(\/[^\r\n]*)?/;

url.exec("http://google.com/");
// ["http://google.com/", "google.com", "/"]

In the above code, the first regular expression is a normal match, and the first bracket returns the network protocol; the latter regular expression is a non-capturing match, and the returned result does not include the network protocol.

(3) Lookahead assertion

x(?=y) is called a positive lookahead: x matches only when it is followed by y, and y is not included in the returned result. For example, to match a number followed by a percent sign, write /\d+(?=%)/.

In a lookahead assertion, the part in parentheses is not returned.

var m = "abc".match(/b(?=c)/);
m; // ["b"]

The above code uses a look-ahead assertion. b is matched before c, but the c corresponding to the parenthesis will not be returned.
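The percent-sign pattern mentioned above behaves the same way. In this sketch (the sample string is only an illustration), the percent sign is required but not included in the result:

```javascript
var percent = "50% of 200".match(/\d+(?=%)/);

percent[0]; // "50"
percent.index; // 0
```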

(4) Negative lookahead assertion

x(?!y) is called a negative lookahead: x matches only when it is not followed by y, and y is not included in the returned result. For example, to match a number that is not followed by a percent sign, write /\d+(?!%)/.

/\d+(?!\.)/.exec("3.14");
// ["14"]

In the above code, the regular expression only matches digits that are not followed by a decimal point, so the returned result is 14.

In a negative lookahead assertion, the part in parentheses is not returned.

var m = "abd".match(/b(?!c)/);
m; // ['b']

The above code uses a negative lookahead. b is not followed by c, so it is matched, and the d that the parentheses were checked against is not returned.
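One caveat is worth a sketch: because a quantifier such as \d+ can give back characters, a negative assertion after it may succeed on a shorter match than intended. Excluding digits in the assertion as well is one possible way around this (the variable names and sample strings below are only illustrations):

```javascript
// \d+ first takes "50", sees "%", then backs off to "5", which is followed by "0".
var naive = /\d+(?!%)/.exec("50%");
naive[0]; // "5"

// Also refusing a following digit prevents the partial match.
var stricter = /\d+(?![\d%])/.exec("50% of 200");
stricter[0]; // "200"
stricter.index; // 7
```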