Special symbols and characters
0x01 Match one of several patterns with the pipe symbol |
The pipe symbol | selects one of several alternative patterns. For example, the pattern at|home matches either at or home.
>>> import re
>>> m = re.match("at|home", "home")
>>> m.group()
'home'
>>> m = re.match("at|home", "at qaq")
>>> m.group()
'at'
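The examples above only succeed because the alternative sits at the very start of the string. The following is a small sketch (the test strings are made up for illustration) of how re.match and re.search differ with alternation, and how parentheses limit what the pipe separates:

```python
import re

# re.match only tries the pattern at the very start of the string.
at_start = re.match("at|home", "go home")
print(at_start)  # None

# re.search scans forward until one of the alternatives matches.
anywhere = re.search("at|home", "go home")
print(anywhere.group())  # home

# | has the lowest precedence, so parentheses limit what it separates.
grouped = re.match("(at|go) home", "go home")
print(grouped.group())  # go home
```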
0x02 Match any single character with the dot .
The dot (period) symbol . matches any single character except the newline \n (Python regular expressions have a compilation flag, S or DOTALL, that overrides this restriction so that the dot also matches a newline). Letters, digits, whitespace (other than the newline \n), printable characters, non-printable characters, and symbols can all be matched with a dot.
>>> m = re.match(".", "a")
>>> m.group()
'a'
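To see the effect of the DOTALL flag mentioned above, here is a minimal sketch; the pattern a.b and the sample text are chosen only for illustration:

```python
import re

text = "a\nb"

# By default the dot refuses to match the newline between a and b.
default = re.match("a.b", text)
print(default)  # None

# With re.DOTALL (alias re.S) the dot matches the newline as well.
dotall = re.match("a.b", text, re.DOTALL)
print(repr(dotall.group()))  # 'a\nb'
```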
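The common line-break and whitespace characters are summarized below. In testing, the dot matched all of them except \n.
- Line breaks: \v (vertical tab), \n (newline), \f (form feed). How \v and \f are displayed depends on the terminal.
>>> print("123\v456")
123
456
>>> print("123\n456")
123
456
>>> print("123\f456")
123
456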
- \t is a tab character, equivalent to pressing the Tab key; it advances to the next tab stop rather than inserting a fixed four spaces
>>> print("123\t456")
123     456
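- \r (carriage return) moves the cursor back to the beginning of the line
This can be used to implement a countdown on the command line:
import time

for i in range(10):
    # \r returns to the start of the line so each message overwrites the previous one;
    # flush=True forces the text out immediately because no newline is printed
    print("\rThe program will exit in %s seconds" % (9 - i), end="", flush=True)
    time.sleep(1)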
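The claim that the dot matches every character in this list except \n can be checked directly. A quick sketch (the dictionary name results is just for illustration):

```python
import re

# Map each whitespace character to whether a single dot matches it.
results = {ch: re.match(".", ch) is not None for ch in ["\v", "\f", "\t", "\r", "\n"]}

for ch, matched in results.items():
    print(repr(ch), matched)
# Only '\n' reports False; the other four report True.
```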
0x03 Match at the start, the end, and word boundaries of a string
- Match the start position: caret ^, or the special character \A
- Match the end position: dollar sign $, or the special character \Z
The latter forms (\A and \Z) are mainly useful on keyboards that lack a caret key, such as some international keyboards.
Match a string that starts with from:
>>> m = re.match("^from.*", "from home")
>>> m.group()
'from home'
Match a string that ends with end:
>>> m = re.match(".*end$", "123end")
>>> m.group()
'123end'
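One practical difference between ^ and \A shows up with multi-line text. The sketch below uses a made-up two-line string and the standard re.MULTILINE flag, which is not covered in the notes above:

```python
import re

text = "from home\nfrom work"

# Without flags, ^ matches only at the very start of the string.
plain = re.findall("^from", text)
print(plain)  # ['from']

# With re.MULTILINE, ^ also matches right after each newline.
multi = re.findall("^from", text, re.MULTILINE)
print(multi)  # ['from', 'from']

# \A ignores re.MULTILINE and still matches only at the start of the string.
start_only = re.findall(r"\Afrom", text, re.MULTILINE)
print(start_only)  # ['from']
```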
- \b matches a word boundary
- \B matches a position that is not a word boundary
Any word starting with the
\bthe
Match only the word the
\bthe\b
Any string that contains but does not start with the
\Bthe
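The three patterns above can be tried with re.search. In this sketch the sample sentences are invented, and the patterns are written as raw strings because in an ordinary Python string literal "\b" means a backspace character:

```python
import re

# \bthe: "the" at the start of a word.
word_start = re.search(r"\bthe", "bite the dog")
print(word_start.group())  # the

# \bthe\b: "the" as a whole word; neither "bathe" nor "there" qualifies.
whole_word = re.search(r"\bthe\b", "bathe there")
print(whole_word)  # None

# \Bthe: "the" that does not start a word, here inside "bathe".
inside = re.search(r"\Bthe", "bathe there")
print(inside.span())  # (2, 5)
```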
0x04 Character ranges and negation
Two characters joined by a hyphen inside square brackets specify a range of characters, such as A-Z, a-z, or 0-9. If a caret immediately follows the left square bracket, the set matches any character that is not listed.
Match the letter z, followed by any character, followed by a digit:
>>> re.match("z.[0-9]", "z=3").group()
'z=3'
>>> re.match("[a-b][deh-j][y-z]", "ahy").group()
'ahy'
Match non-vowel characters:
>>> re.match("[^aeiou]*", "pygb").group()
'pygb'
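A few more inputs make the behaviour of ranges and negated sets easier to see. This is only an illustrative sketch with made-up strings:

```python
import re

# The negated set stops at the first vowel it meets.
consonants = re.match("[^aeiou]*", "python").group()
print(consonants)  # pyth

# Because * allows zero repetitions, a leading vowel gives an empty match, not None.
empty = re.match("[^aeiou]*", "apple").group()
print(repr(empty))  # ''

# Several ranges can be combined inside one pair of brackets.
hex_digits = re.match("[a-fA-F0-9]+", "c0ffee!").group()
print(hex_digits)  # c0ffee
```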
0x04 Use closure operators to match existence and repetition
- * matches the expression on its left zero or more times.
- + matches the expression on its left one or more times.
- ? matches the expression on its left zero or one time.
- {N} matches the preceding regular expression exactly N times; {M,N} matches it between M and N times.
Match 15 or 16 digits
[0-9]{15,16}
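The repetition operators can be compared side by side. The sketch below uses re.fullmatch, so the whole string has to fit the pattern; the test strings are arbitrary:

```python
import re

digits = re.compile(r"[0-9]{15,16}")

# fullmatch requires the whole string to fit the pattern.
lengths_ok = {n: digits.fullmatch("1" * n) is not None for n in (14, 15, 16, 17)}
print(lengths_ok)  # {14: False, 15: True, 16: True, 17: False}

# match only anchors at the start, so it accepts the first 16 of 17 digits.
prefix_len = len(digits.match("1" * 17).group())
print(prefix_len)  # 16

# *, + and ? applied to the letter b:
star = [bool(re.fullmatch("ab*", s)) for s in ("a", "ab", "abb")]
plus = [bool(re.fullmatch("ab+", s)) for s in ("a", "ab", "abb")]
optional = [bool(re.fullmatch("ab?", s)) for s in ("a", "ab", "abb")]
print(star)      # [True, True, True]
print(plus)      # [False, True, True]
print(optional)  # [True, True, False]
```

Note that [0-9]{15,16} on its own does not reject longer digit runs; anchoring (or fullmatch) is needed for that.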
0x05 Special characters that represent character sets
- \d matches any decimal digit
- \w matches any alphanumeric character or underscore, [A-Za-z0-9_] (in Python 3, str patterns also match Unicode word characters unless re.ASCII is set)
- \s matches any whitespace character
- The uppercase versions negate the match. For example, \D matches any character that is not a decimal digit
Match the format of a US phone number, for example 800-555-1212
\d{3}-\d{3}-\d{4}
Match a QQ mailbox address (the dot is escaped so that it matches only a literal period)
\d{5,10}@qq\.com
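Both patterns can be checked with re.fullmatch. In this sketch the sample inputs are invented, and the dot before com is escaped so that it matches only a literal period rather than any character:

```python
import re

phone = re.compile(r"\d{3}-\d{3}-\d{4}")
qq_mail = re.compile(r"\d{5,10}@qq\.com")

phone_ok = phone.fullmatch("800-555-1212") is not None
phone_bad = phone.fullmatch("800-5555-121") is not None
print(phone_ok, phone_bad)  # True False

mail_ok = qq_mail.fullmatch("123456@qq.com") is not None
# With an unescaped dot this input would be accepted as well.
mail_bad = qq_mail.fullmatch("123456@qqxcom") is not None
print(mail_ok, mail_bad)  # True False
```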
0x06 Use parentheses to specify groups
Match an optional title followed by a name:
>>> m = re.match(r"(Mr?s?\.)?([A-Za-z]*[A-Za-z-]+)", "Mr.chen")
>>> m.group(0)
'Mr.chen'
>>> m.group(1)
'Mr.'
>>> m.group(2)
'chen'
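All groups can also be read at once with groups(). The following sketch reuses the pattern above and adds an input without a title (an invented case) to show what an optional group returns when it does not take part in the match:

```python
import re

pattern = re.compile(r"(Mr?s?\.)?([A-Za-z]*[A-Za-z-]+)")

with_title = pattern.match("Mr.chen").groups()
print(with_title)  # ('Mr.', 'chen')

# The optional first group did not participate, so it is reported as None.
without_title = pattern.match("chen").groups()
print(without_title)  # (None, 'chen')
```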
0x07 Extension notation
I did not understand much of this part.
Reference: Learning notes on Core Python Programming (I): regular expression extension notation
(?:\w+\.)* matches strings ending with a period, such as "google.", "twitter.", "facebook."; these matches are not saved for later use or retrieval
(?#comment) matches nothing; it serves only as a comment
(?=.com) matches only if the string is followed by ".com"; it does not consume any of the target string
(?!.net) matches only if the string is not followed by ".net"
(?<=800-) matches only if the string is preceded by "800-", presumably a telephone number; again, no input is consumed
(?<!192\.168\.) matches only if the string is not preceded by "192.168."; presumably used to filter out a group of class C IP addresses
(?(1)y|x) if group 1 exists and has matched, match y; otherwise match x
In summary, there are four assertions:
Positive lookahead (?=...) ## followed by a given string
Negative lookahead (?!...) ## not followed by a given string
Positive lookbehind (?<=...) ## preceded by a given string
Negative lookbehind (?<!...) ## not preceded by a given string
Lookahead and lookbehind simply mean checking what comes after and what comes before the current position.
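Since this part is the hardest to follow in prose, here is a small sketch of the four assertions plus the conditional match. The sample strings and the surrounding patterns are my own choices for illustration, not taken from the referenced notes:

```python
import re

hosts = "google.com twitter.net"

# Positive lookahead: a word only if ".com" follows; ".com" itself is not consumed.
before_com = re.findall(r"\w+(?=\.com)", hosts)
print(before_com)  # ['google']

# Negative lookahead: a dotted name whose suffix does not start with "net".
not_net = re.findall(r"\b\w+\.(?!net)\w+", hosts)
print(not_net)  # ['google.com']

# Positive lookbehind: digits only if "800-" comes immediately before them.
after_800 = re.search(r"(?<=800-)\d{3}-\d{4}", "800-555-1212")
print(after_800.group())  # 555-1212

# Negative lookbehind: the last two octets unless "192.168." precedes them.
tail = r"(?<!192\.168\.)\d+\.\d+$"
private = re.search(tail, "192.168.1.20")
other = re.search(tail, "10.0.1.20")
print(private)        # None
print(other.group())  # 1.20

# Conditional match: if group 1 (the x branch) matched, require y, otherwise x.
cond = re.compile(r"(?:(x)|y)(?(1)y|x)")
cond_results = [bool(cond.fullmatch(s)) for s in ("xy", "yx", "xx", "yy")]
print(cond_results)  # [True, True, False, False]
```

In every case the text inside the assertion is checked but never becomes part of the match, which is why the results above contain only the characters outside the lookaround.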