Introduction and examples
A regular expression describes a pattern for matching strings. It can be used to find or extract substrings of a specific format from a larger string. A regular expression is a text pattern made up of ordinary characters and special characters (metacharacters).
1. Extract the numeric part

    // Extract the numeric part from the string "abc123def"
    var str = "abc123def";
    var patt1 = /[0-9]+/;
    document.write(str.match(patt1));
    // Output: 123
2. Find adjacent identical words

    // Is is the cost of of gasoline going up up?
    // Find every pair of adjacent identical words in the string above (case-insensitive)
    var str = "Is is the cost of of gasoline going up up";
    var patt1 = /\b([a-z]+) \1\b/ig;
    document.write(str.match(patt1));
    // Output: Is is,of of,up up

Explanation: each \b marks a word boundary; [a-z]+ matches one word; the parentheses in ([a-z]+) capture the matched word; \1 refers back to the text captured by the first group, so the pattern matches only when the same word appears twice in a row.
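
The same back-reference can also drive a replacement. The following is a minimal sketch, assuming an environment with console.log (Node.js or a browser console); the names sentence, doubledWord, and deduped are illustrative and not part of the original example.

```javascript
// Hypothetical variable names; the pattern is the one used in the example.
const sentence = "Is is the cost of of gasoline going up up";
const doubledWord = /\b([a-z]+) \1\b/gi;

// "$1" in the replacement string stands for the text captured by group 1,
// so each doubled pair collapses to its first word.
const deduped = sentence.replace(doubledWord, "$1");
console.log(deduped); // "Is the cost of gasoline going up"
```

Because the i flag is set, \1 also matches case-insensitively, which is why "Is is" counts as a repeated word.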
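3. URL identification

    var str = "http://www.runoob.com:80/html/html-tutorial.html";
    // Group 1: protocol, group 2: host, group 3: port (optional), group 4: path
    var patt1 = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;
    var arr = str.match(patt1);
    // arr[0] is the whole match; arr[1] to arr[4] are the captured groups
    for (var i = 0; i < arr.length; i++) {
        document.write(arr[i]);
        document.write("<br>");
    }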
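
To make the role of each capture group easier to see, the match array can be unpacked into named variables. This is only a sketch: the names protocol, host, port, and path are assumptions chosen for illustration, and console.log stands in for document.write.

```javascript
const url = "http://www.runoob.com:80/html/html-tutorial.html";
const urlPattern = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;

// match() returns null when nothing matches, so fall back to an empty array.
const [whole, protocol, host, port, path] = url.match(urlPattern) ?? [];

console.log(protocol); // "http"
console.log(host);     // "www.runoob.com"
console.log(port);     // ":80" (the colon is part of group 3)
console.log(path);     // "/html/html-tutorial.html"
```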
4. Two ways to use regular expressions

    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8">
    <title>smallpdf.cn</title>
    </head>
    <body>
    <script>
    // Two equivalent ways to create a regular expression (patt1 is equivalent to patt2)
    var str = "Is is the cost of of gasoline going up up";
    var patt1 = /\b([a-z]+) \1\b/ig;                     // regular expression literal
    document.write("Example 1:", str.match(patt1));
    document.write("<br><br>");
    var patt2 = new RegExp("\\b([a-z]+) \\1\\b", "ig");  // RegExp constructor; backslashes are doubled inside the string
    document.write("Example 2:" + str.match(patt2));
    </script>
    </body>
    </html>
5. Global and non-global matching

    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8">
    <title>smallpdf.cn</title>
    </head>
    <body>
    <script>
    var str = "Google smallpdf.cn taobao smallpdf.cn";
    // The dot is escaped so that it matches a literal "." rather than any character
    var n1 = str.match(/smallpdf\.cn/);  // Find the first match
    var n2 = str.match(/smallpdf\.cn/g); // Find all matches
    document.write("Example 1:", n1);
    document.write("<br><br>");
    document.write("Example 2:", n2);
    </script>
    </body>
    </html>
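
The two calls also return differently shaped results, which the printed output hides. The sketch below assumes an ES2020 environment (for matchAll) and uses illustrative names (text, first, all, positions) that are not in the original.

```javascript
const text = "Google smallpdf.cn taobao smallpdf.cn";

// Without g: one match, plus details such as the index where it was found.
const first = text.match(/smallpdf\.cn/);
console.log(first[0], first.index); // "smallpdf.cn" 7

// With g: a plain array of every matched substring, with no index information.
const all = text.match(/smallpdf\.cn/g);
console.log(all.length); // 2

// matchAll() yields one detailed match object per occurrence.
const positions = [...text.matchAll(/smallpdf\.cn/g)].map((m) => m.index);
console.log(positions); // [7, 26]
```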
6. Match an e-mail address

    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8">
    <title>smallpdf.cn</title>
    </head>
    <body>
    <script>
    var str = "abcd test@runoob.com 1234";
    // local part, then @, then domain, then a dot and a 2 to 6 letter top-level domain
    var patt1 = /\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b/g;
    document.write(str.match(patt1));
    </script>
    </body>
    </html>
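
A pattern like this is a practical approximation rather than a full address validator. As a small sketch of one of its limits, the {2,6} bound rejects top-level domains longer than six letters; the address user@example.technology is a made-up test value, and the variable names are illustrative.

```javascript
const emailPattern = /\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b/g;

// The address from the example is found inside the surrounding text.
const found = "abcd test@runoob.com 1234".match(emailPattern);
console.log(found); // ["test@runoob.com"]

// Hypothetical address with a ten-letter top-level domain: no match at all.
const longTld = "user@example.technology".match(emailPattern);
console.log(longTld); // null
```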
Regular expression syntax
1. Anchors
Anchors (also called locators) pin a match to a position rather than to a character: the beginning or end of a line, or the beginning, end, or inside of a word. An anchor cannot be followed by a quantifier such as *. For example, ^* is an error, because a string has exactly one start, so "zero or more starts" is meaningless.
Pattern | Meaning | Input string | Regular expression | Result |
---|---|---|---|---|
^ | Matches the beginning of the string (or of each line when the m flag is set) | "An E" | /^A/ | 'A' |
$ | Matches the end of the string (or of each line when the m flag is set) | "eat" | /t$/ | 't' |
\b | Matches a word boundary (the start or end of a word) | "moon" | /\bm/ | 'm' (the m at the start of the word) |
\B | Matches a position that is not a word boundary | "noonday" | /\Boo/ | 'oo' (the oo is inside the word, not at a boundary) |
/ | Delimiter that starts and ends a regular expression literal | | | |
\ | Escape character | | | |
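
The rows of the table above can be checked directly. The following is a sketch that assumes console.log is available; the variable names are illustrative, and the last part shows the error raised when a quantifier follows an anchor.

```javascript
const startMatch = "An E".match(/^A/)[0];          // "A"
const endMatch = "eat".match(/t$/)[0];             // "t"
const boundaryMatch = "moon".match(/\bm/)[0];      // "m"
const insideMatch = "noonday".match(/\Boo/)[0];    // "oo"
console.log(startMatch, endMatch, boundaryMatch, insideMatch);

// With the m flag, ^ also matches right after each line break.
const lineStarts = "one\ntwo".match(/^\w/gm);
console.log(lineStarts); // ["o", "t"]

// A quantifier directly after an anchor is rejected when the pattern is compiled.
let anchorError = null;
try {
  new RegExp("^*");
} catch (err) {
  anchorError = err;
}
console.log(anchorError instanceof SyntaxError); // true
```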
2. Ordinary characters
Pattern | Meaning | Input string | Regular expression | Result |
---|---|---|---|---|
\d | Matches a digit; equivalent to [0-9] | "B2 is the suite number." | /\d/ | '2' |
\D | Matches a non-digit character; equivalent to [^0-9] | "B2 is the suite number." | /\D/ | 'B' |
\w | Matches a word character (letter, digit, or underscore); equivalent to [A-Za-z0-9_] | "apple," | /\w/ | 'a' |
\W | Matches a non-word character; equivalent to [^A-Za-z0-9_] | "50%." | /\W/ | '%' |
\s | Matches a whitespace character (space, tab, form feed, line feed) | "foo bar." | /\s\w*/ | ' bar' |
\S | Matches a non-whitespace character | "foo bar." | /\S\w*/ | 'foo' |
. | Matches any character except line breaks (\n, \r); roughly equivalent to [^\n\r] | "nay, an apple is on the tree" | /.n/g | 'an', 'on' |
[abc] | Matches any one of a, b, or c. Inside brackets, * and . stand for the literal characters and have no special meaning | "asdfiobab" | /[abc]/g | 'a', 'b', 'a', 'b' |
[^abc] | Matches any character other than a, b, or c | | | |
[A-Z] | Matches any character from A to Z | | | |
[a-z] | Matches any character from a to z | | | |
[0-9] | Matches any digit from 0 to 9 | | | |
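
The rows above can be reproduced in a console. This is a sketch with illustrative variable names; the g flag is added wherever the table lists more than one result, and the last line uses a made-up string to show that * and . are literal inside brackets.

```javascript
const suite = "B2 is the suite number.";
const digit = suite.match(/\d/)[0];                // "2"
const nonDigit = suite.match(/\D/)[0];             // "B"
const wordChar = "apple,".match(/\w/)[0];          // "a"
const nonWordChar = "50%.".match(/\W/)[0];         // "%"
const spaceThenWord = "foo bar.".match(/\s\w*/)[0]; // " bar"
const firstWord = "foo bar.".match(/\S\w*/)[0];    // "foo"

const dotMatches = "nay, an apple is on the tree".match(/.n/g); // ["an", "on"]
const classMatches = "asdfiobab".match(/[abc]/g);               // ["a", "b", "a", "b"]

// Inside brackets, * and . match themselves.
const literals = "a*b.c".match(/[*.]/g);                        // ["*", "."]

console.log(digit, nonDigit, wordChar, nonWordChar, firstWord);
console.log(dotMatches, classMatches, literals);
```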
3. Quantifiers
Pattern | Meaning | Input string | Regular expression | Result |
---|---|---|---|---|
? | Matches 0 or 1 times; equivalent to {0,1} | "angel" | /e?le?/ | 'el' |
* | Matches 0 or more times; equivalent to {0,} | "<p>smallpdf.cn</p>" | /<.*>/ | '<p>smallpdf.cn</p>' |
*? | Lazy (non-greedy) form of *: matches as few characters as possible | "<p>smallpdf.cn</p>" | /<.*?>/g | '<p>' and '</p>' |
+ | Matches 1 or more times; equivalent to {1,} | "<p>smallpdf.cn</p>" | /<.+>/ | '<p>smallpdf.cn</p>' |
+? | Lazy (non-greedy) form of +: matches as few characters as possible | "<p>smallpdf.cn</p>" | /<.+?>/g | '<p>' and '</p>' |
{n} | n is a non-negative integer; matches exactly n times | | | |
{n,} | n is a non-negative integer; matches at least n times | | | |
{n,m} | n and m are non-negative integers with n ≤ m; matches at least n and at most m times | | | |
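
The difference between greedy and lazy quantifiers is easiest to see side by side. The sketch below reuses the strings from the table; the date string and the variable names are assumptions added for illustration.

```javascript
const html = "<p>smallpdf.cn</p>";

// Greedy: .* runs to the last ">" in the string.
const greedy = html.match(/<.*>/g);  // ["<p>smallpdf.cn</p>"]

// Lazy: .*? stops at the first ">" it can reach.
const lazy = html.match(/<.*?>/g);   // ["<p>", "</p>"]

const optional = "angel".match(/e?le?/)[0];            // "el"
const bounded = "aaaa".match(/a{2,3}/)[0];             // "aaa" (greedy, capped at 3)
const isDate = /^\d{4}-\d{2}-\d{2}$/.test("2024-01-15"); // true

console.log(greedy, lazy, optional, bounded, isDate);
```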
4. Alternation, grouping, and lookaround
Pattern | Meaning | Input string | Regular expression | Result |
---|---|---|---|---|
x\|y | Matches x or y | "red apple" | /green\|red/ | 'red' |
(x) | Matches x and stores the matched text (capturing group). \1, \2, ... refer to the stored values; \1 is the first stored value. | See the \num row below | | |
\num | Refers back to the text stored by capturing group number num. num is an integer starting from 1. | "apple, orange, cherry, peach." | /apple(,)\sorange\1/ | 'apple, orange,' |
(?:x) | Matches x without storing the matched text (non-capturing group). For example, industry\|industries can be written as industr(?:y\|ies) | | | |
x(?=y) | Matches x only if it is followed by y; y is not part of the match and is not stored | "JackSpa" | /Jack(?=Spa)/ | 'Jack' |
x(?!y) | Matches x only if it is not followed by y; y is not stored | "JackSp" | /Jack(?!Spa)/ | 'Jack' |
(?<=y)x | Matches x only if it is preceded by y; y is not part of the match and is not stored | "JackSpa" | /(?<=Jack)Spa/ | 'Spa' |
(?<!y)x | Matches x only if it is not preceded by y; y is not stored | "JacSpa" | /(?<!Jack)Spa/ | 'Spa' |
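
A short sketch of the rows above, assuming an engine that supports lookbehind (ES2018 or later, for example a current Node.js or browser). The variable names are illustrative only.

```javascript
const either = "red apple".match(/green|red/)[0]; // "red"

// A capturing group stores its text; a non-capturing group does not.
const capturing = "industries".match(/industr(y|ies)/);
const nonCapturing = "industries".match(/industr(?:y|ies)/);
console.log(capturing[1]);        // "ies"
console.log(nonCapturing.length); // 1 (only the whole match)

// \1 repeats whatever group 1 captured, here the comma.
const backRef = "apple, orange, cherry, peach.".match(/apple(,)\sorange\1/)[0];
console.log(backRef); // "apple, orange,"

// Lookarounds test the surrounding text without including it in the match.
const lookahead = "JackSpa".match(/Jack(?=Spa)/)[0];     // "Jack"
const negLookahead = "JackSp".match(/Jack(?!Spa)/)[0];   // "Jack"
const lookbehind = "JackSpa".match(/(?<=Jack)Spa/)[0];   // "Spa"
const negLookbehind = "JacSpa".match(/(?<!Jack)Spa/)[0]; // "Spa"
console.log(lookahead, negLookahead, lookbehind, negLookbehind);
```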
5. Non-printing characters
Pattern | Meaning |
---|---|
[\b] | Matches a backspace (U+0008) |
\f | Matches a form feed (U+000C) |
\n | Matches a newline character (U+000A) |
\r | Matches a carriage return (U+000D) |
\t | Matches a horizontal tab (U+0009) |
\v | Matches a vertical tab (U+000B) |
\0 | Matches the NULL character (U+0000). Do not follow it with another digit, because \0<digits> is an octal escape sequence. |
\xhh | Matches the character with the two-digit hexadecimal code hh (\x00 to \xFF) |
\uhhhh | Matches the UTF-16 code unit with the four-digit hexadecimal value hhhh |
\u{hhhh} | Matches the Unicode character with the hexadecimal code point hhhh (only when the u flag is set) |
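
A few of these escapes in use, as a sketch with illustrative names. Note that "\b" inside a JavaScript string literal is a backspace, while \b in a pattern outside brackets is a word boundary.

```javascript
const hasTab = /\t/.test("name\tvalue");    // true
const hasBackspace = /[\b]/.test("\b");     // true
const hasNull = /\0/.test("\0");            // true

// \x41 and \u0041 both denote the letter "A".
const hexA = /\x41/.test("A");              // true
const unicodeA = /\u0041/.test("A");        // true

// \u{...} is only a code point escape when the u flag is present.
const face = "\u{1F600}";
const withU = /\u{1F600}/u.test(face);      // true
const withoutU = /\u{1F600}/.test(face);    // false

// Split on either Windows (\r\n) or Unix (\n) line endings.
const lineList = "a\r\nb\nc".split(/\r?\n/); // ["a", "b", "c"]

console.log(hasTab, hasBackspace, hasNull, hexA, unicodeA, withU, withoutU, lineList);
```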
6. Flags (mode settings)
Flag | Meaning |
---|---|
g | Global search: finds and returns all matches in the string instead of stopping after the first one. |
i | Case-insensitive matching |
m | Multiline search: ^ and $ also match at the start and end of each line |
s | Allows . to match newline characters |
u | Unicode mode: the pattern and the input are treated as sequences of Unicode code points |
y | Sticky search: matches only at the position given by lastIndex in the target string. |
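
The effect of each flag can be observed by running the same pattern with and without it. This is a sketch assuming an ES2018 or later engine (for the s flag); all names and test strings are illustrative.

```javascript
const twoLines = "first\nsecond";

const caseInsensitive = /abc/i.test("AbC");           // true

// m: ^ matches after the line break as well as at the start of the string.
const withoutM = /^second/.test(twoLines);            // false
const withM = /^second/m.test(twoLines);              // true

// s: the dot is allowed to match the newline.
const withoutS = /first.second/.test(twoLines);       // false
const withS = /first.second/s.test(twoLines);         // true

// u: the dot consumes a whole code point instead of one UTF-16 code unit.
const face = "\u{1F600}";
const unitLength = face.match(/./)[0].length;         // 1
const pointLength = face.match(/./u)[0].length;       // 2

// y: the match must start exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
const stickyHit = sticky.test("barfoo");              // true, lastIndex becomes 6
const stickyMiss = sticky.test("barfoo");             // false, lastIndex resets to 0

console.log(caseInsensitive, withoutM, withM, withoutS, withS);
console.log(unitLength, pointLength, stickyHit, stickyMiss);
```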
7. Operator precedence
A regular expression is evaluated from left to right, following an order of precedence: operators with higher precedence are applied first, and operators of equal precedence are applied from left to right. In the following table, precedence decreases from top to bottom, and operators in the same row have equal precedence:
|Operators (highest precedence first)|
| ---- |
| \ |
| () [] |
| * + ? {n} {n,} {n,m} |
| ^ $ \metacharacter, any ordinary character |
| \| |
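
Because alternation has the lowest precedence and quantifiers bind more tightly than sequences of characters, parentheses are often needed to get the intended grouping. The sketch below uses made-up test strings and illustrative names.

```javascript
// ^a|b$ reads as (^a) or (b$): "starts with a" OR "ends with b".
const looseApple = /^a|b$/.test("apple");     // true
const looseCrab = /^a|b$/.test("crab");       // true

// ^(a|b)$ requires the whole string to be exactly "a" or "b".
const strictApple = /^(a|b)$/.test("apple");  // false
const strictB = /^(a|b)$/.test("b");          // true

// ab+ applies + to "b" only; (ab)+ applies it to the whole group.
const lastCharRepeated = "abab".match(/ab+/)[0];   // "ab"
const groupRepeated = "abab".match(/(ab)+/)[0];    // "abab"

console.log(looseApple, looseCrab, strictApple, strictB);
console.log(lastCharRepeated, groupRepeated);
```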