A concise regular expression tutorial

Introduction and examples

A regular expression describes a string-matching pattern. It can be used to test whether a string contains text of a given format and to extract the matching substrings from a larger string. Regular expressions are text patterns composed of ordinary characters and special characters (metacharacters).

1. Extract the numeric part

// Extract the numeric part from the string "abc123def"
var str = "abc123def";
var patt1 = /[0-9]+/;
document.write(str.match(patt1));

// Output: 123

2. Find adjacent, identical words

// Find every pair of adjacent identical words (case-insensitive) in:
// "Is is the cost of of gasoline going up up?"

var str = "Is is the cost of of gasoline going up up";
var patt1 = /\b([a-z]+) \1\b/ig;
document.write(str.match(patt1));

// Output: Is is,of of,up up

Explanation:
\b matches a word boundary;
[a-z]+ matches a word of one or more letters;
([a-z]+) matches a word and stores it in capture group 1;
\1 matches the same text that capture group 1 stored.
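
The same backreference can also drive a replacement, which is a common way to clean up doubled words. The following is a minimal sketch rather than part of the original example; the names `sentence`, `doubled`, and `cleaned` are made up for illustration.

```javascript
// Collapse each doubled word ("of of") into a single occurrence.
const sentence = "Is is the cost of of gasoline going up up";
const doubled = /\b([a-z]+) \1\b/gi;

// Inside the pattern, \1 refers to group 1; inside the replacement string, $1 does.
const cleaned = sentence.replace(doubled, "$1");
console.log(cleaned); // "Is the cost of gasoline going up"
```

The first word of each pair is the one that is kept, so "Is is" becomes "Is".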

3. URL recognition

var str = "http://www.runoob.com:80/html/html-tutorial.html";
var patt1 = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;
var arr = str.match(patt1);
// arr[0] is the whole match; arr[1] to arr[4] are the captured groups
for (var i = 0; i < arr.length ; i++) {
    document.write(arr[i]);
    document.write("<br>");
}
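
To make the role of each capture group explicit, the match array can be destructured into named parts. This is a sketch under the assumption that the input always matches; `match` returns null otherwise, and the destructuring would then throw. The names `protocol`, `host`, `port`, and `path` are illustrative.

```javascript
const url = "http://www.runoob.com:80/html/html-tutorial.html";
const urlPattern = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;

// Index 0 is the whole match; indexes 1 to 4 are the four groups, in order.
const [whole, protocol, host, port, path] = url.match(urlPattern);

console.log(protocol); // "http"
console.log(host);     // "www.runoob.com"
console.log(port);     // ":80"
console.log(path);     // "/html/html-tutorial.html"
```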

4. Two ways to create a regular expression

<!DOCTYPE html>
<html>

<head>
    <meta charset="utf-8">
    <title>smallpdf.cn</title>
</head>

<body>

    <script>
        // Two equivalent ways to create a regular expression:
        // a literal (patt1) and the RegExp constructor (patt2)
        var str = "Is is the cost of of gasoline going up up";
        var patt1 = /\b([a-z]+) \1\b/ig;
        document.write("Example 1:", str.match(patt1));

        document.write("<br><br>");
        // Inside a string, every backslash of the pattern must be doubled
        var patt2 = new RegExp("\\b([a-z]+) \\1\\b", "ig");
        document.write("Example 2:" + str.match(patt2));

    </script>

</body>

</html>
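
The constructor form is mainly useful when the pattern is only known at run time. In that case any special characters in the text usually need escaping first. The sketch below assumes a small helper, `escapeRegExp`, which is hypothetical and not a built-in function.

```javascript
// Hypothetical helper: put a backslash before every character that is special in a pattern.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const site = "smallpdf.cn";
const sitePattern = new RegExp(escapeRegExp(site), "g");

console.log(sitePattern.source);                  // "smallpdf\.cn"
const siteHits = "smallpdf.cn smallpdfxcn".match(sitePattern);
console.log(siteHits);                            // ["smallpdf.cn"]
```

Without the escaping step the dot would match any character, and "smallpdfxcn" would be reported as a match too.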

5. Global and non-global matching

<!DOCTYPE html>
<html>

<head>
    <meta charset="utf-8">
    <title>smallpdf.cn</title>
</head>

<body>

    <script>

        var str = "Google smallpdf.cn taobao smallpdf.cn";
        var n1 = str.match(/smallpdf\.cn/);   // Find the first match only
        var n2 = str.match(/smallpdf\.cn/g);  // Find all matches (g flag)

        document.write("Example 1:", n1);
        document.write("<br><br>");
        document.write("Example 2:", n2);

    </script>

</body>

</html>
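
The two calls also return differently shaped results, which the printed output hides. The following sketch illustrates this; the variable names are illustrative, and `matchAll` is shown as one way to get every match together with its position.

```javascript
const text = "Google smallpdf.cn taobao smallpdf.cn";

const first = text.match(/smallpdf\.cn/);   // one match plus metadata such as index
const all = text.match(/smallpdf\.cn/g);    // every match, as plain strings only

console.log(first[0], first.index); // "smallpdf.cn" 7
console.log(all.length);            // 2

// matchAll requires the g flag and yields one full match object per occurrence.
const positions = [...text.matchAll(/smallpdf\.cn/g)].map((m) => m.index);
console.log(positions);             // [7, 26]
```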

6. Match an e-mail address

<!DOCTYPE html>
<html>

<head>
    <meta charset="utf-8">
    <title>smallpdf.cn</title>
</head>

<body>

    <script>
        var str = "abcd test@runoob.com 1234";
        var patt1 = /\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b/g;
        document.write(str.match(patt1));
    </script>

</body>

</html>
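
The pattern above finds addresses inside a longer text. To check whether a whole string has the shape of an address, the word boundaries can be swapped for ^ and $. This is only a rough shape check under that assumption, not a full address validator, and `looksLikeEmail` is a made-up name.

```javascript
// Same character rules as above, but anchored to the whole string.
const emailShape = /^[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}$/;

function looksLikeEmail(candidate) {
    return emailShape.test(candidate);
}

console.log(looksLikeEmail("test@runoob.com"));           // true
console.log(looksLikeEmail("abcd test@runoob.com 1234")); // false
console.log(looksLikeEmail("test@runoob"));               // false
```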

Regular expression syntax

1. Anchors

Anchors pin a match to a position: the beginning or end of a string (or line), or the boundary of a word. An anchor matches a position rather than a character, so it cannot take a quantifier. For example, ^* is invalid, because a string has exactly one start, not zero or more.

Pattern	Meaning	Input string	Regular expression	Result
^	Matches the beginning of the string	"An E"	/^A/	'A'
$	Matches the end of the string	"eat"	/t$/	't'
\b	Matches a word boundary (the start or end of a word)	"moon"	/\bm/	'm' (finds a word beginning with m)
\B	Matches a position that is not a word boundary	"noonday"	/\Boo/	'oo' (the oo is inside the word, not at a boundary)
/	Delimiter that starts and ends a regular expression literal
\	Escape character
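
The rows of the anchor table can be checked directly. The sketch below reruns them and also shows that a JavaScript engine rejects a quantified anchor; the variable name `anchorError` is illustrative.

```javascript
console.log("An E".match(/^A/)[0]);       // "A"
console.log("eat".match(/t$/)[0]);        // "t"
console.log("moon".match(/\bm/)[0]);      // "m"
console.log("noonday".match(/\Boo/)[0]);  // "oo"
console.log(/\boo/.test("noonday"));      // false: the oo does not start a word

// An anchor cannot be quantified: building /^*/ throws a SyntaxError.
let anchorError = null;
try {
    new RegExp("^*");
} catch (err) {
    anchorError = err;
}
console.log(anchorError instanceof SyntaxError); // true
```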

2. Characters and character classes

Pattern	Meaning	Input string	Regular expression	Result
\d	Matches a digit; equivalent to [0-9]	"B2 is the suite number."	/\d/	'2'
\D	Matches a non-digit character; equivalent to [^0-9]	"B2 is the suite number."	/\D/	'B'
\w	Matches a word character (letter, digit, or underscore); equivalent to [A-Za-z0-9_]	"apple,"	/\w/	'a'
\W	Matches a non-word character; equivalent to [^A-Za-z0-9_]	"50%."	/\W/	'%'
\s	Matches a whitespace character (space, tab, form feed, line feed, and so on)	"foo bar."	/\s\w*/	' bar'
\S	Matches a non-whitespace character	"foo bar."	/\S\w*/	'foo'
.	Matches any character except line terminators such as \n and \r	"nay, an apple is on the tree"	/.n/g	'an', 'on'
[abc]	Matches any one of a, b, or c. Inside brackets, * and . stand for themselves and have no special meaning	"asdfiobab"	/[abc]/g	'a', 'b', 'a', 'b'
[^abc]	Matches any character other than a, b, or c
[A-Z]	Matches any uppercase letter from A to Z
[a-z]	Matches any lowercase letter from a to z
[0-9]	Matches any digit from 0 to 9
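
As a quick check of the character-class table, the sketch below reruns several rows. The g flag is added wherever the table lists more than one result, and the input strings are taken from the table except for the last two lines, which are illustrative.

```javascript
const suite = "B2 is the suite number.";
console.log(suite.match(/\d/)[0]);            // "2"
console.log(suite.match(/\D/)[0]);            // "B"
console.log("50%.".match(/\W/)[0]);           // "%"
console.log("foo bar.".match(/\s\w*/)[0]);    // " bar"

const dotHits = "nay, an apple is on the tree".match(/.n/g);
console.log(dotHits);                         // ["an", "on"]

const setHits = "asdfiobab".match(/[abc]/g);
console.log(setHits);                         // ["a", "b", "a", "b"]

// Inside brackets, * and . are literal characters.
console.log("a*b.c".match(/[*.]/g));          // ["*", "."]
// A negated set keeps everything that is not a, b, or c.
console.log("asdfiobab".match(/[^abc]/g).join("")); // "sdfio"
```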

3. Quantifiers

Pattern	Meaning	Input string	Regular expression	Result
?	Matches 0 or 1 times; equivalent to {0,1}	"angel"	/e?le?/	'el'
*	Matches 0 or more times; equivalent to {0,}	"<p>smallpdf.cn</p>"	/<.*>/	'<p>smallpdf.cn</p>'
*?	Non-greedy form of *: matches as few times as possible	"<p>smallpdf.cn</p>"	/<.*?>/g	'<p>' and '</p>'
+	Matches 1 or more times; equivalent to {1,}	"<p>smallpdf.cn</p>"	/<.+>/	'<p>smallpdf.cn</p>'
+?	Non-greedy form of +: matches as few times as possible	"<p>smallpdf.cn</p>"	/<.+?>/g	'<p>' and '</p>'
{n}	n is a non-negative integer; matches exactly n times
{n,}	n is a non-negative integer; matches at least n times
{n,m}	n and m are non-negative integers with n ≤ m; matches at least n and at most m times
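
The difference between greedy and non-greedy quantifiers is easiest to see side by side. The sketch below reuses the table's input; the date string is an illustrative addition for the counted forms.

```javascript
const html = "<p>smallpdf.cn</p>";

// Greedy: .* runs to the last ">" it can reach.
console.log(html.match(/<.*>/)[0]);      // "<p>smallpdf.cn</p>"

// Non-greedy: .*? stops at the first ">" that completes a match.
const tags = html.match(/<.*?>/g);
console.log(tags);                       // ["<p>", "</p>"]

console.log("angel".match(/e?le?/)[0]);  // "el"

// Counted quantifiers: between 2 and 4 digits, or exactly 4.
const dateParts = "2021-11-29".match(/\d{2,4}/g);
console.log(dateParts);                  // ["2021", "11", "29"]
console.log(/^\d{4}$/.test("2021"));     // true
console.log(/^\d{4}$/.test("20211"));    // false
```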

4. Alternation, groups, and assertions

Pattern	Meaning	Input string	Regular expression	Result
x|y	Matches x or y	"red apple"	/green|red/	'red'
(x)	Matches x and stores the match (capture group); \1, \2, and so on refer back to the stored values, \1 being the first	See the \num row below
\num	Matches the text stored by capture group num, where num is an integer starting from 1	"apple, orange, cherry, peach."	/apple(,)\sorange\1/	'apple, orange,'
(?:x)	Matches x without storing the match (non-capturing group); industry|industries is equivalent to industr(?:y|ies)
x(?=y)	Matches x only if it is followed by y; y is not part of the match	"JackSpa"	/Jack(?=Spa)/	'Jack'
x(?!y)	Matches x only if it is not followed by y; y is not part of the match	"JackSp"	/Jack(?!Spa)/	'Jack'
(?<=y)x	Matches x only if it is preceded by y; y is not part of the match	"JackSpa"	/(?<=Jack)Spa/	'Spa'
(?<!y)x	Matches x only if it is not preceded by y; y is not part of the match	"JacSpa"	/(?<!Jack)Spa/	'Spa'
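
The sketch below runs the group and assertion rows from the table and adds the negative cases, which show what each assertion rules out. Lookbehind needs a reasonably recent JavaScript engine; the variable names are illustrative.

```javascript
// Backreference: \1 must repeat whatever group 1 captured (here, a comma).
const fruit = "apple, orange, cherry, peach.";
console.log(fruit.match(/apple(,)\sorange\1/)[0]); // "apple, orange,"

// A non-capturing group groups without storing: the result has no group entries.
const industry = "industries".match(/industr(?:y|ies)/);
console.log(industry[0], industry.length);         // "industries" 1

// Lookahead and lookbehind test the surrounding text but leave it out of the match.
console.log("JackSpa".match(/Jack(?=Spa)/)[0]);    // "Jack"
console.log(/Jack(?!Spa)/.test("JackSpa"));        // false
console.log("JackSpa".match(/(?<=Jack)Spa/)[0]);   // "Spa"
console.log(/(?<!Jack)Spa/.test("JackSpa"));       // false
```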

5. Non-printing characters

Pattern	Meaning
[\b]	Matches a backspace (U+0008)
\f	Matches a form feed (U+000C)
\n	Matches a newline character (U+000A)
\r	Matches a carriage return (U+000D)
\t	Matches a horizontal tab (U+0009)
\v	Matches a vertical tab (U+000B)
\0	Matches the NUL character (U+0000). Do not follow it with another digit, because \0<digits> is an octal escape sequence
\xhh	Matches the character with the two-digit hexadecimal code hh (\x00 to \xFF)
\uhhhh	Matches the UTF-16 code unit with the four-digit hexadecimal code hhhh
\u{hhhh}	Matches the Unicode character with the hexadecimal code point hhhh (only when the u flag is set)
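
A few of these escapes in action, as a minimal sketch. The sample strings `row` and `emoji` are made up for illustration, and string escapes are used for the non-ASCII characters so the code does not depend on the file encoding.

```javascript
const row = "name\tvalue\r\n";
console.log(/\t/.test(row));                 // true
const cells = row.split(/\t|\r\n/);
console.log(cells);                          // ["name", "value", ""]

console.log(/\x41/.test("A"));               // true: \x41 is "A"
console.log(/\u00e9/.test("caf\u00e9"));     // true: \u00e9 is "é"
console.log(/[\b]/.test("\b"));              // true: backspace, not a word boundary
console.log(/\0/.test("a\0b"));              // true: NUL character

// \u{...} is only a code point escape when the u flag is set.
const emoji = "\u{1F600}";
console.log(/\u{1F600}/u.test(emoji));       // true
console.log(/\u{1F600}/.test(emoji));        // false
```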

6. Flags

Flag	Meaning
g	Global search: finds all matches in the string instead of stopping at the first one
i	Case-insensitive search
m	Multiline search: ^ and $ also match at the start and end of each line
s	Allows . to match newline characters
u	Unicode mode: treats the pattern as a sequence of Unicode code points
y	Sticky search: matches only at the position given by the regular expression's lastIndex property
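
The effect of each flag shows up most clearly when the same pattern is run with and without it. The following sketch uses a made-up two-line string, `lines`.

```javascript
const lines = "first line\nsecond line";

// m: ^ matches at the start of every line, not only at the start of the string.
console.log(lines.match(/^\w+/g));    // ["first"]
console.log(lines.match(/^\w+/gm));   // ["first", "second"]

// s: the dot may match the newline between the two lines.
console.log(/line.second/.test(lines));   // false
console.log(/line.second/s.test(lines));  // true

// i: letter case is ignored.
console.log(/SECOND/i.test(lines));       // true

// y: the match must begin exactly at lastIndex.
const sticky = /line/y;
sticky.lastIndex = 6;                     // "line" starts at index 6
console.log(sticky.test(lines));          // true
console.log(sticky.lastIndex);            // 10
sticky.lastIndex = 0;                     // index 0 holds "first"
console.log(sticky.test(lines));          // false
```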

7. Operator precedence

Regular expressions are evaluated from left to right, and operators with higher precedence are applied first. In the following table, precedence decreases from top to bottom; operators in the same row have the same precedence and are applied from left to right:

Operator	Description
\	Escape
() []	Groups and brackets
* + ? {n} {n,} {n,m}	Quantifiers
^ $ \metacharacter, any character	Anchors and character sequences
|	Alternation
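
Two consequences of this ordering are worth seeing in code: alternation binds more loosely than anchors and sequences, and a quantifier applies only to the item directly before it unless a group says otherwise. The inputs below are illustrative.

```javascript
// /^a|b$/ means "starts with a" OR "ends with b", not "is exactly a or b".
console.log(/^a|b$/.test("axx"));        // true
console.log(/^(a|b)$/.test("axx"));      // false
console.log(/^(a|b)$/.test("b"));        // true

// + applies to b alone; grouping makes it apply to the whole "ab".
console.log("abbb".match(/ab+/)[0]);     // "abbb"
console.log("ababb".match(/(ab)+/)[0]);  // "abab"
```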

Keywords: regex

Added by david4ie on Mon, 29 Nov 2021 02:55:52 +0200
