Explanation of regular expression usage (basics, metacharacters, pattern modifiers, atomic tables, atomic groups, repeated matching)

Fundamentals

Selector (alternation)

The | symbol is the alternation (selection) operator: the pattern matches if either the expression on its left or the expression on its right matches.

let tel = "010-12345678";
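//Wrong: | splits the whole pattern, so this matches either "010" or "020-" followed by 7-8 digits; here only "010" is returned
console.log(tel.match(/010|020\-\d{7,8}/)); 

//Correct: wrap the alternatives in an atomic group so | applies only to the area code
console.log(tel.match(/(010|020)\-\d{7,8}/));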
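The same precedence rule matters when the ^ and $ anchors are involved. A minimal sketch; the area codes and sample strings are only illustrative values:

```javascript
// Anchored phone pattern: the group limits | to the area code alternatives.
const phone = /^(010|020)-\d{7,8}$/;
console.log(phone.test("010-12345678")); // true
console.log(phone.test("030-12345678")); // false

// Without a group, ^ binds only to "010" and $ binds only to "020".
const loose = /^010|020$/;
console.log(loose.test("010-abc")); // true: "^010" matches on its own
```

Because | has the lowest precedence of all regex operators, grouping is the usual way to keep the alternatives local.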

Character Escape

The \ symbol is the escape character: it makes the special character to its right (such as /, $, ^ or .) be matched literally.

const url = "https://www.baidu.com";
console.log(/https:\/\//.test(url)); //true
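A short sketch of why escaping matters, plus the extra backslash needed when a pattern is written as a string for new RegExp(); the strings are made-up examples:

```javascript
// Unescaped "." is a metacharacter, so it matches any character.
console.log(/baidu.com/.test("baiduXcom"));  // true
// Escaped "\." only matches a literal dot.
console.log(/baidu\.com/.test("baiduXcom")); // false
// "$" and "." both need escaping to be matched literally.
console.log(/\$\d+\.\d{2}/.test("price: $19.99")); // true

// In a string passed to new RegExp(), the backslash itself must be doubled.
const dotted = new RegExp("baidu\\.com");
console.log(dotted.source); // baidu\.com
```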

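| Boundary character | Explanation |
| --- | --- |
| ^ | Matches the beginning of the string (with the m modifier, the beginning of each line) |
| $ | Matches the end of the string (with the m modifier, the end of each line) |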
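A minimal sketch of the boundary characters with and without the m modifier, and of using both anchors to validate a whole string; the sample text is arbitrary:

```javascript
const text = "first line\nsecond line";

// Without m, ^ and $ refer to the whole string only.
console.log(/^second/.test(text));  // false
// With m, ^ and $ also match next to each line break.
console.log(/^second/m.test(text)); // true
console.log(text.match(/line$/gm)); // ["line", "line"]

// Anchoring both ends checks the entire string, not just a part of it.
const threeDigits = /^\d{3}$/;
console.log(threeDigits.test("123"));  // true
console.log(threeDigits.test("1234")); // false
```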

Metacharacters

Common metacharacters:

| Metacharacter | Explanation | Equivalent |
| --- | --- | --- |
| \d | Matches any digit | [0-9] |
| \D | Matches any character except a digit | [^0-9] |
| \w | Matches any English letter, digit or underscore | [a-zA-Z0-9_] |
| \W | Matches any character except letters, digits and underscores | [^a-zA-Z0-9_] |
| \s | Matches any whitespace character, such as a space, tab \t or line break \n | [ \t\n\v\f\r] plus Unicode spaces such as \u00a0 |
| \S | Matches any character except whitespace | the complement of \s |
| . | Matches any character except line breaks (\n, \r, \u2028, \u2029) | |
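A quick sketch of these metacharacters on one made-up string that contains a space, a tab and a line break:

```javascript
const sample = "id_42 is\tready\nnow";

console.log(sample.match(/\d+/)[0]);     // "42"
console.log(sample.match(/\w+/g));       // ["id_42", "is", "ready", "now"]
console.log(sample.match(/\s/g).length); // 3 (space, tab, line break)
console.log(/ready.now/.test(sample));   // false: . does not match \n
console.log(/ready\snow/.test(sample));  // true: \s does
```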

Match all characters:

You can use [\s\S] or [\d\D] to match any character, including line breaks: each class combines a set with its complement.

let yq = `
  <span>
    @#&^&*!
    123
    asda
  </span>
`;
let res = yq.match(/<span>[\s\S]+<\/span>/);
console.log(res[0]);
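For comparison, a sketch of the same idea with a plain ., and with the s modifier (an ES2018 feature, so this assumes a reasonably modern engine):

```javascript
const block = "<span>\n  123\n</span>";

console.log(/<span>.+<\/span>/.test(block));      // false: . stops at \n
console.log(/<span>[\s\S]+<\/span>/.test(block)); // true
console.log(/<span>.+<\/span>/s.test(block));     // true: s lets . match \n
```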

Pattern Modifiers

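| Modifier | Explanation |
| --- | --- |
| i | Case-insensitive matching |
| g | Global search: find all matches instead of stopping at the first |
| m | Multiline mode: ^ and $ also match at the start and end of each line |
| s | Single-line ("dotAll") mode: . matches every character, including line breaks |
| y | Sticky mode: matching must start exactly at regexp.lastIndex |
| u | Unicode mode: correctly handles characters stored as four-byte UTF-16 surrogate pairs |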
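A minimal sketch of the i, g, y and u modifiers (m and s are shown with the boundary and "match all characters" sections); the strings are arbitrary examples:

```javascript
// i: case-insensitive
console.log("JavaScript".match(/script/i)[0]); // "Script"

// g: all matches instead of only the first
console.log("a1b2c3".match(/\d/g)); // ["1", "2", "3"]

// y: the match must start exactly at lastIndex
const sticky = /foo/y;
sticky.lastIndex = 3;
console.log(sticky.test("barfoo")); // true
sticky.lastIndex = 1;
console.log(sticky.test("barfoo")); // false: no "foo" starts at index 1

// u: U+1F600 (an emoji) is two UTF-16 code units but one code point
const smile = "\u{1F600}";
console.log(smile.length);        // 2
console.log(/^.$/.test(smile));   // false: . sees two units
console.log(/^.$/u.test(smile));  // true: . sees one code point
```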

lastIndex

The lastIndex property of a RegExp object reads or sets the index at which the next match attempt starts.

  • Only takes effect when the g or y modifier is set
  • Used by the exec() and test() methods
  • After a successful match, lastIndex moves to just past the match; when a search fails, it is reset to 0

let yq = `1234561`;
let reg = /1/g;
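reg.lastIndex = 1; //Search from index 1, so the "1" at index 0 is skipped
console.log(reg.exec(yq)); //["1", index: 6, input: "1234561", groups: undefined]
console.log(reg.lastIndex); //7, just past the match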
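Because exec() with the g modifier resumes from lastIndex, calling it in a loop walks through every match. A minimal sketch using the same sample string:

```javascript
const digits = "1234561";
const re = /1/g;
const positions = [];
let m;
// Each exec() call starts at re.lastIndex and moves it past the match.
while ((m = re.exec(digits)) !== null) {
  positions.push(m.index);
}
console.log(positions);    // [0, 6]
// The final, failing exec() reset lastIndex to 0.
console.log(re.lastIndex); // 0
```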

Atomic Table (character class)

To match one character out of a set, list the characters inside [] (square brackets); this is called an atomic table. Inside an atomic table many special characters lose their meaning and need no escaping; for example, . stands for a literal decimal point.

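| Atomic table | Explanation |
| --- | --- |
| [abc] | Matches exactly one of the listed characters |
| [^abc] | Matches any single character except the listed ones |
| [0-9] | Matches any digit from 0 to 9 |
| [a-z] | Matches any lowercase letter from a to z |
| [A-Z] | Matches any uppercase letter from A to Z |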
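A few atomic tables in action; the sample strings are arbitrary:

```javascript
// Inside [], "." is a literal dot
console.log("3.14".match(/[.]/).index); // 1
// A range: any of a, b, c ("-" itself is not included)
console.log("a-b".match(/[a-c]/g)); // ["a", "b"]
// Negated table: delete everything except vowels
console.log("hello".replace(/[^aeiou]/g, "")); // "eo"
// Several ranges in one table: hexadecimal digits
console.log(/^[0-9a-fA-F]+$/.test("1aF")); // true
```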

You can use [\s\S] or [\d\D] to match all characters, including line breaks

const reg = /[\s\S]+/g;

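The characters in a range must be in ascending order (by character code), otherwise the pattern is a SyntaxError. For a regex literal the error is raised when the script is parsed, so none of the script runs.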

const num = "2";
console.log(/[3-0]/.test(num)); //SyntaxError

const yq = "asdasdasd";
console.log(/[f-a]/.test(yq)); //SyntaxError
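A pattern built at runtime with new RegExp() throws the same SyntaxError when it is constructed, so it can be caught. A minimal sketch; the helper name isValidPattern is hypothetical:

```javascript
// Hypothetical helper: reports whether a pattern string compiles.
function isValidPattern(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

console.log(isValidPattern("[0-3]")); // true
console.log(isValidPattern("[3-0]")); // false: range out of order
console.log(isValidPattern("[a-f]")); // true
console.log(isValidPattern("[f-a]")); // false
```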

Atomic Groups (capturing groups)

Basic Use

Without the g modifier, match() returns only the first match, and the result contains the following data:

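| Property | Explanation |
| --- | --- |
| 0 | The complete matched text |
| 1, 2, ... | The text captured by each atomic group, in order |
| index | Position of the match in the original string |
| input | The original string |
| groups | Named groups (undefined if the pattern has none) |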

When the pattern contains atomic groups, match() also returns each group's captured text in the result. In the example below, 0 is the full match, 1 and 2 are the two groups, index is where the match starts, input is the original string, and groups holds named groups (there are none here).

let hd = "baidu.com";
console.log(hd.match(/bai(du)\.(com)/)); 
//["baidu.com", "du", "com", index: 0, input: "baidu.com", groups: undefined]
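With the g modifier, match() returns only the full matches and drops the group data. To get groups for every match, String.prototype.matchAll() (ES2020, requires the g modifier) can be used; a minimal sketch with made-up domains:

```javascript
const sites = "baidu.com google.com";

// With g, match() returns only the full matches.
console.log(sites.match(/(\w+)\.com/g)); // ["baidu.com", "google.com"]

// matchAll() yields one full result (groups, index, ...) per match.
for (const m of sites.matchAll(/(\w+)\.com/g)) {
  console.log(m[1], m.index); // "baidu" 0, then "google" 10
}
```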

Backreferences and group aliases

Use \n (\1, \2, ...) to refer back, within the same pattern, to the text captured by atomic group n. If a group should only take part in matching and not be returned in the result, make it non-capturing with (?:...). To make the returned group data easier to read, give a group an alias with the (?<name>...) syntax: its text is then stored in the groups property of the result, and in a replacement string it can be inserted with $<name>. The example below replaces each h1-h6 heading tag with a p tag.

let yq = `
  <h1>yq</h1>
  <span>20</span>
  <h2>180</h2>
`;
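let reg = /<(?<tag>h[1-6])>(?<con>[\s\S]*?)<\/\1>/gi; //\1 (or \k<tag>) requires the same tag name as the opening tag; *? stops at the first closing tag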
console.log(yq.replace(reg, `<p>$<con></p>`));
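A minimal sketch of named groups, a non-capturing group, backreferences and $<name> in a replacement; the date string is just an illustrative value, and named groups assume an ES2018+ engine:

```javascript
// Named groups are collected in the result's groups property.
const dateMatch = "2021-09-30".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(dateMatch.groups.year, dateMatch.groups.month, dateMatch.groups.day); // 2021 09 30

// (?:...) takes part in matching but adds no entry to the result.
const site = "baidu.com".match(/(?:bai)(du)\.com/);
console.log(site[1], site.length); // du 2

// \1 and \k<name> refer back to text captured earlier in the same match.
console.log("hello".match(/(\w)\1/)[0]);      // "ll"
console.log(/(?<ch>\w)\k<ch>/.test("abcd"));  // false: no doubled letter

// $<name> inserts a named group's text in a replacement.
const reordered = "2021-09-30".replace(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, "$<d>/$<m>/$<y>");
console.log(reordered); // 30/09/2021
```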

Repeated Matching

To match something repeatedly, put a repetition modifier (quantifier) after it. The available modifiers are:

| Symbol | Explanation |
| --- | --- |
| * | Repeat zero or more times |
| + | Repeat one or more times |
| ? | Repeat zero or one time |
| {n} | Repeat exactly n times |
| {n,} | Repeat n or more times |
| {n,m} | Repeat n to m times |
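A short sketch of each kind of repetition modifier; the strings are arbitrary examples:

```javascript
console.log("color colour".match(/colou?r/g)); // ["color", "colour"]
console.log("ab abbb a".match(/ab*/g));        // ["ab", "abbb", "a"]
console.log("ab abbb a".match(/ab+/g));        // ["ab", "abbb"]
console.log("aaaa".match(/a{2}/g));            // ["aa", "aa"]

const tel = /^\d{3}-\d{7,8}$/;
console.log(tel.test("010-12345678")); // true
console.log(tel.test("010-123"));      // false: too few digits
```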

Repetition modifiers are greedy by default: they match as many repetitions as possible. To match as few repetitions as possible instead (lazy matching), append ? to the modifier:

| Syntax | Explanation |
| --- | --- |
| *? | Repeat any number of times, but as few as possible |
| +? | Repeat one or more times, but as few as possible |
| ?? | Repeat zero or one time, preferring zero |
| {n,m}? | Repeat n to m times, but as few as possible |
| {n,}? | Repeat n or more times, but as few as possible |

let str = "aaa";
console.log(str.match(/a+/)); //aaa
console.log(str.match(/a+?/)); //a
console.log(str.match(/a{2,3}?/)); //aa
console.log(str.match(/a{2,}?/)); //aa
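The difference shows up clearly when extracting tags. A minimal sketch with a made-up HTML string:

```javascript
const html = "<b>one</b><b>two</b>";

// Greedy: .* runs to the end, then backtracks to the LAST </b>.
console.log(html.match(/<b>.*<\/b>/)[0]); // "<b>one</b><b>two</b>"

// Lazy: .*? stops at the FIRST </b>, so each tag pair is its own match.
console.log(html.match(/<b>.*?<\/b>/g));  // ["<b>one</b>", "<b>two</b>"]
```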


Added by sintax63 on Sun, 06 Feb 2022 19:12:30 +0200