A concise tutorial on regular expressions

Introduction and examples

A regular expression describes a string-matching pattern, which can be used to extract substrings of a specific format from a larger string. A regular expression is a text pattern composed of ordinary characters and special characters (metacharacters).

1. Extract the numeric part
// Extract the numeric part from the string "abc123def"
var str = "abc123def";
var patt1 = /[0-9]+/;
document.write(str.match(patt1));

// Output: 123

2. Find adjacent, identical words
// Sample sentence: Is is the cost of of gasoline going up up?
// Find every pair of adjacent words that are identical (case-insensitive)

var str = "Is is the cost of of gasoline going up up";
var patt1 = /\b([a-z]+) \1\b/ig;
document.write(str.match(patt1));

// Result (document.write prints the array of matches joined by commas)
Is is,of of,up up

# explain
 Two\b Indicates a word boundary;
[a-z]+ Represents a word;
([a-z]+) All words in the string will be matched and stored;
 \1 Indicates access to the first word stored above;
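
The same backreference can also drive a replacement. The following is a minimal sketch, written for Node.js with console.log in place of document.write; the variable names are illustrative and do not come from the example above.

```javascript
// Sketch: collapse doubled words using the pattern from the example above.
const sentence = "Is is the cost of of gasoline going up up";
const doubledWord = /\b([a-z]+) \1\b/ig;

// "$1" in the replacement string stands for the first captured group,
// so each "word word" pair is replaced by a single "word".
const cleaned = sentence.replace(doubledWord, "$1");
console.log(cleaned); // Is the cost of gasoline going up
```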
3. URL identification
var str = "http://www.runoob.com:80/html/html-tutorial.html";
var patt1 = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;
var arr = str.match(patt1);
for (var i = 0; i < arr.length ; i++) {
    document.write(arr[i]);
    document.write("<br>");
}
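
To make the role of each capture group explicit, the sketch below runs the same pattern in Node.js and names the pieces. The names whole, protocol, host, port and path are assumptions chosen for readability, not part of the original listing.

```javascript
// Sketch: name the capture groups of the URL pattern above.
const url = "http://www.runoob.com:80/html/html-tutorial.html";
const urlPattern = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;

// Element 0 is the whole match; elements 1 to 4 are the four groups.
const [whole, protocol, host, port, path] = url.match(urlPattern);

console.log(protocol); // http
console.log(host);     // www.runoob.com
console.log(port);     // :80  (the colon is part of group 3)
console.log(path);     // /html/html-tutorial.html
```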
4. Two ways to use regular expressions
<!DOCTYPE html>
<html>

<head>
    <meta charset="utf-8">
    <title>smallpdf.cn</title>
</head>

<body>

    <script>
        // Two equivalent ways to create a regular expression:
        // a literal (patt1) and the RegExp constructor (patt2)
        var str = "Is is the cost of of gasoline going up up";
        var patt1 = /\b([a-z]+) \1\b/ig;
        document.write("Example 1:", str.match(patt1));

        document.write("<br><br>");
        var patt2 = new RegExp("\\b([a-z]+) \\1\\b", "ig");
        document.write("Example 2:"+str.match(patt2));

    </script>

</body>

</html>
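
A quick way to check that the two forms really are equivalent is to compare their source and flags. The sketch below does that in Node.js; the runtime-built pattern at the end uses a hypothetical variable word and assumes it contains only letters, so no escaping is needed.

```javascript
// Sketch: a regex literal and the RegExp constructor produce the same pattern.
const literal = /\b([a-z]+) \1\b/ig;
// In a string, every backslash must be doubled so that the regex sees "\b" and "\1".
const constructed = new RegExp("\\b([a-z]+) \\1\\b", "ig");

console.log(literal.source === constructed.source); // true
console.log(literal.flags, constructed.flags);      // gi gi

// The constructor is useful when the pattern is only known at run time.
const word = "of"; // assumed to contain letters only
const dynamic = new RegExp("\\b" + word + " " + word + "\\b", "i");
const foundDoubled = dynamic.test("Is is the cost of of gasoline going up up");
console.log(foundDoubled); // true
```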
5. Global and non-global matching
<!DOCTYPE html>
<html>

<head>
    <meta charset="utf-8">
    <title>smallpdf.cn</title>
</head>

<body>

    <script>

        var str = "Google smallpdf.cn taobao smallpdf.cn";
        var n1 = str.match(/smallpdf\.cn/);   // Find the first match
        var n2 = str.match(/smallpdf\.cn/g);  // Find all matches

        document.write("Example 1:", n1);
        document.write("<br><br>");
        document.write("Example 2:", n2);

    </script>

</body>

</html>
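
The two calls return differently shaped results, which the page output does not show. The sketch below, assuming Node.js and console.log, inspects both results and also illustrates why the dot in a domain name is normally escaped.

```javascript
// Sketch: what match() returns with and without the g flag.
const text = "Google smallpdf.cn taobao smallpdf.cn";

// Without g: only the first match, plus extra details such as its index.
const first = text.match(/smallpdf\.cn/);
console.log(first[0], first.index); // smallpdf.cn 7

// With g: every match, but no index information.
const all = text.match(/smallpdf\.cn/g);
console.log(all.length); // 2

// An unescaped dot matches any character, so this also accepts "smallpdfXcn".
const looseMatch = /smallpdf.cn/.test("smallpdfXcn");
const strictMatch = /smallpdf\.cn/.test("smallpdfXcn");
console.log(looseMatch, strictMatch); // true false
```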
6. Match an e-mail address
<!DOCTYPE html>
<html>

<head>
    <meta charset="utf-8">
    <title>smallpdf.cn</title>
</head>

<body>

    <script>
        var str = "abcd test@runoob.com 1234";
        var patt1 = /\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b/g;
        document.write(str.match(patt1));
    </script>

</body>

</html>
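
The sketch below tries the same pattern on a longer text in Node.js. The sample text and addresses are made up for illustration, and the pattern is only a rough filter: it is not a full validator for e-mail addresses.

```javascript
// Sketch: extract address-like substrings with the pattern above.
const emailPattern = /\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b/g;
const message = "Contact test@runoob.com or admin@example.org, not foo@bar";

// "foo@bar" is skipped because nothing after the @ ends in a dot plus 2 to 6 letters.
const addresses = message.match(emailPattern);
console.log(addresses); // [ 'test@runoob.com', 'admin@example.org' ]
```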
Regular expression syntax
1. Locators (anchors)

Locators (anchors) fix a match to a position: the beginning or end of a line, or the beginning, end, or inside of a word. They cannot be combined with quantifiers. For example, ^* is an error: a string has only one start, so "zero or more starts" is meaningless.

Pattern  Meaning                                            String     Regex    Result
^        Matches the beginning of the string                "An E"     /^A/     'A'
$        Matches the end of the string                      "eat"      /t$/     't'
\b       Matches a word boundary (start or end of a word)   "moon"     /\bm/    'm' (a word beginning with m)
\B       Matches a position that is not a word boundary     "noonday"  /\Boo/   'oo' (oo is inside the word, not at a boundary)
/        Delimiter of a regular expression literal
\        Escape character
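
The rows above can be checked directly. The sketch below, assuming Node.js, runs each anchor example and also shows that a quantified anchor such as ^* is rejected when the pattern is compiled.

```javascript
// Sketch: the anchor examples from the table, plus the quantified-anchor error.
const startMatch = "An E".match(/^A/)[0];       // 'A'
const endMatch = "eat".match(/t$/)[0];          // 't'
const boundary = "moon".match(/\bm/)[0];        // 'm'
const nonBoundary = "noonday".match(/\Boo/)[0]; // 'oo'
console.log(startMatch, endMatch, boundary, nonBoundary);

// Building the pattern from a string lets us catch the error at run time.
let quantifiedAnchorError = null;
try {
  new RegExp("^*");
} catch (err) {
  quantifiedAnchorError = err;
}
console.log(quantifiedAnchorError instanceof SyntaxError); // true
```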
2. Ordinary characters
Pattern  Meaning                                                String                          Regex      Result
\d       Matches a digit; equivalent to [0-9]                   "B2 is the suite number."       /\d/       '2'
\D       Matches a non-digit; equivalent to [^0-9]              "B2 is the suite number."       /\D/       'B'
\w       Matches a letter, digit, or underscore: [A-Za-z0-9_]   "apple,"                        /\w/       'a'
\W       Matches any other character: [^A-Za-z0-9_]             "50%."                          /\W/       '%'
\s       Matches whitespace (space, tab, form feed, line feed)  "foo bar."                      /\s\w*/    ' bar'
\S       Matches a non-whitespace character                     "foo bar."                      /\S\w*/    'foo'
.        Matches any character except \n and \r: [^\n\r]        "nay, an apple is on the tree"  /.n/g      'an', 'on'
[abc]    Matches a, b, or c; * and . are literal inside [ ]     "asdfiobab"                     /[abc]/g   'a', 'b', 'a', 'b'
[^abc]   Matches any character other than a, b, or c
[A-Z]    Matches any uppercase letter from A to Z
[a-z]    Matches any lowercase letter from a to z
[0-9]    Matches any digit from 0 to 9
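
The table rows can be reproduced with a few lines of Node.js. In this sketch, firstMatch is a hypothetical helper introduced only to shorten the calls; the two multi-result rows use the g flag so that match() returns every hit.

```javascript
// Sketch: reproduce the character-class examples from the table.
// firstMatch is a hypothetical helper, not a built-in function.
const firstMatch = (text, pattern) => text.match(pattern)[0];

const digit = firstMatch("B2 is the suite number.", /\d/);    // '2'
const nonDigit = firstMatch("B2 is the suite number.", /\D/); // 'B'
const wordChar = firstMatch("apple,", /\w/);                  // 'a'
const nonWordChar = firstMatch("50%.", /\W/);                 // '%'
const spaceThenWord = firstMatch("foo bar.", /\s\w*/);        // ' bar'
const nonSpaceRun = firstMatch("foo bar.", /\S\w*/);          // 'foo'

// With the g flag, match() returns all matches instead of only the first.
const anyCharThenN = "nay, an apple is on the tree".match(/.n/g); // ['an', 'on']
const abcChars = "asdfiobab".match(/[abc]/g);                     // ['a', 'b', 'a', 'b']

console.log(digit, nonDigit, wordChar, nonWordChar);
console.log(JSON.stringify(spaceThenWord), nonSpaceRun);
console.log(anyCharThenN, abcChars);
```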
3. Quantifiers
Pattern  Meaning                                       String                  Regex      Result
?        Matches 0 or 1 times; equivalent to {0,1}     "angel"                 /e?le?/    'el'
*        Matches 0 or more times; equivalent to {0,}   "<h1>smallpdf.cn</h1>"  /<.*>/     '<h1>smallpdf.cn</h1>'
*?       Non-greedy *: matches as little as possible   "<h1>smallpdf.cn</h1>"  /<.*?>/g   '<h1>' and '</h1>'
+        Matches 1 or more times; equivalent to {1,}   "<h1>smallpdf.cn</h1>"  /<.+>/     '<h1>smallpdf.cn</h1>'
+?       Non-greedy +: matches as little as possible   "<h1>smallpdf.cn</h1>"  /<.+?>/g   '<h1>' and '</h1>'
{n}      n is a positive integer; matches exactly n times
{n,}     n is a positive integer; matches at least n times
{n,m}    Matches at least n and at most m times (n ≤ m; n may be 0)
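
The difference between greedy and non-greedy quantifiers is easiest to see side by side. The sketch below assumes Node.js and uses an h1 element around smallpdf.cn as the sample markup; the exact tag is an assumption made for the illustration.

```javascript
// Sketch: greedy versus non-greedy matching on a small piece of markup.
const markup = "<h1>smallpdf.cn</h1>"; // assumed sample string

// Greedy: .* runs to the last ">" in the string.
const greedy = markup.match(/<.*>/)[0];
console.log(greedy); // <h1>smallpdf.cn</h1>

// Non-greedy: .*? stops at the first ">" it can, so each tag is matched separately.
const lazy = markup.match(/<.*?>/g);
console.log(lazy); // [ '<h1>', '</h1>' ]

// Counted quantifiers: {2,3} takes as many as allowed, up to 3.
const bounded = "aaaa".match(/a{2,3}/)[0];
console.log(bounded); // aaa

// ? makes the preceding item optional.
const optional = "angel".match(/e?le?/)[0];
console.log(optional); // el
```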
4. Logical operations and groups
Pattern    Meaning                                             String                           Regex                 Result
x|y        Matches x or y                                      "red apple"                      /green|red/           'red'
(x)        Matches x and stores the match; \1 is the first stored value
\num       Backreference to stored value num (num ≥ 1)         "apple, orange, cherry, peach."  /apple(,)\sorange\1/  'apple, orange,'
(?:x)      Matches x without storing the match; industry|industries = industr(?:y|ies)
x(?=y)     Matches x only if followed by y (y is not matched)  "JackSpa"                        /Jack(?=Spa)/         'Jack'
x(?!y)     Matches x only if not followed by y                 "JackSp"                         /Jack(?!Spa)/         'Jack'
(?<=y)x    Matches x only if preceded by y (y is not matched)  "JackSpa"                        /(?<=Jack)Spa/        'Spa'
(?<!y)x    Matches x only if not preceded by y                 "JacSpa"                         /(?<!Jack)Spa/        'Spa'
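
The sketch below runs the group and lookaround examples in Node.js. It assumes an engine with lookbehind support (ES2018 or later); the variable names are illustrative.

```javascript
// Sketch: capturing groups, backreferences, non-capturing groups and lookarounds.
const backref = "apple, orange, cherry, peach.".match(/apple(,)\sorange\1/);
console.log(backref[0]); // apple, orange,
console.log(backref[1]); // ,   (the value stored by the group)

// A non-capturing group matches but stores nothing:
// the result array holds only the whole match.
const nonCapturing = "industries".match(/industr(?:y|ies)/);
console.log(nonCapturing[0], nonCapturing.length); // industries 1

// Lookarounds test the surrounding text without including it in the match.
const lookahead = "JackSpa".match(/Jack(?=Spa)/)[0];       // 'Jack'
const negLookahead = "JackSp".match(/Jack(?!Spa)/)[0];     // 'Jack'
const lookbehind = "JackSpa".match(/(?<=Jack)Spa/)[0];     // 'Spa'
const negLookbehind = "JacSpa".match(/(?<!Jack)Spa/)[0];   // 'Spa'
console.log(lookahead, negLookahead, lookbehind, negLookbehind);
```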
5. Non-printing characters
Pattern   Meaning
[\b]      Matches a backspace (U+0008)
\f        Matches a form feed (U+000C)
\n        Matches a line feed (U+000A)
\r        Matches a carriage return (U+000D)
\t        Matches a horizontal tab (U+0009)
\v        Matches a vertical tab (U+000B)
\0        Matches the NUL character (U+0000); do not follow it with another digit, because \0<digits> is an octal escape sequence
\xhh      Matches the character with the two-digit hexadecimal code hh (\x00-\xFF)
\uhhhh    Matches the UTF-16 code unit with the four-digit hexadecimal value hhhh
\u{hhhh}  Matches the Unicode character with the hexadecimal code point hhhh (requires the u flag)
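
A few of these escapes can be verified as follows. This is a sketch for Node.js; the object name checks and the chosen test strings are assumptions made for the illustration.

```javascript
// Sketch: test several non-printing and numeric escapes.
const checks = {
  backspace: /[\b]/.test("\b"),              // [\b] is a backspace; a bare \b is a word boundary
  tab: /\t/.test("a\tb"),
  lineFeed: /\n/.test("line1\nline2"),
  nul: /\0/.test("\0"),
  hex: /\x41/.test("A"),                     // U+0041 is "A"
  utf16: /\u0041/.test("A"),
  codePoint: /\u{1F600}/u.test("\u{1F600}"), // \u{...} needs the u flag
};
console.log(checks); // every entry is true
```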
6. Flags (mode settings)
Flag  Meaning
g     Global search: finds and returns all matches in the string instead of stopping at the first
i     Case-insensitive matching
m     Multiline search: ^ and $ also match at the start and end of each line
s     Allows . to match newline characters
u     Treats the pattern as a sequence of Unicode code points
y     Sticky search: matches only at the current position in the target string (the regex's lastIndex)
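
The effect of each flag is shown in the sketch below, assuming Node.js. All sample strings are made up for the illustration.

```javascript
// Sketch: one small check per flag.
const caseInsensitiveAll = "Cat cat CAT".match(/cat/gi); // g + i: all three words

// m: ^ matches at the start of each line, not only at the start of the string.
const withM = /^b/m.test("a\nb");   // true
const withoutM = /^b/.test("a\nb"); // false

// s: the dot also matches a newline.
const withS = /a.b/s.test("a\nb");   // true
const withoutS = /a.b/.test("a\nb"); // false

// u: the dot consumes a whole code point instead of one UTF-16 code unit.
const emoji = "\u{1F600}";
const lengthWithU = emoji.match(/./u)[0].length;   // 2 (both code units)
const lengthWithoutU = emoji.match(/./)[0].length; // 1 (half of the pair)

// y: the match must start exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
const stickyAt3 = sticky.test("barfoo"); // true
sticky.lastIndex = 0;
const stickyAt0 = sticky.test("barfoo"); // false

console.log(caseInsensitiveAll, withM, withoutM, withS, withoutS);
console.log(lengthWithU, lengthWithoutU, stickyAt3, stickyAt0);
```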
7. Operator precedence

Regular expressions are evaluated from left to right and follow an order of precedence, much like arithmetic expressions. Operators with higher precedence are applied first, and operators with equal precedence are applied from left to right. In the following table, precedence decreases from top to bottom, and operators in the same row have equal precedence:

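Operator                Description
\                       Escape
() (?:) (?=) []         Groups and brackets
* + ? {n} {n,} {n,m}    Quantifiers
^ $ \metacharacter      Anchors and character sequences
|                       Alternation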
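
Two consequences of this ordering are worth seeing in code. The sketch below, assuming Node.js, shows that alternation binds more loosely than anchors and sequences, and that a quantifier applies only to the item directly before it unless a group is used.

```javascript
// Sketch: how precedence changes what a pattern means.

// | has the lowest precedence, so /^ab|cd$/ means (^ab) or (cd$).
const looseAlternation = /^ab|cd$/.test("abxx");     // true: "^ab" matches
const groupedAlternation = /^(ab|cd)$/.test("abxx"); // false: whole string must be ab or cd

// A quantifier binds tighter than a sequence, so + applies to "b" only.
const plusOnLastChar = "abbb".match(/ab+/)[0];       // 'abbb'
const plusOnSequence = "ababab".match(/ab+/)[0];     // 'ab'
// Parentheses bind tighter still, so + applies to the whole group.
const plusOnGroup = "ababab".match(/(ab)+/)[0];      // 'ababab'

console.log(looseAlternation, groupedAlternation);
console.log(plusOnLastChar, plusOnSequence, plusOnGroup);
```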

Added by rishiraj on Fri, 19 Nov 2021 02:28:50 +0200