1 Regular expressions
1.1 Understanding regular expressions and their basic use in Python
1.1.1 Understanding regular expressions
A regular expression is a string pattern made up of two kinds of characters: ordinary text characters and special characters (metacharacters). Metacharacters have a special meaning in a regular expression and are what make it expressive. For example, in the regular expression r"a.d", the characters "a" and "d" are ordinary characters and "." is a metacharacter that stands for any single character (except a line break), so the pattern matches "a1d", "a2d", "acd", and so on.
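The behaviour of r"a.d" can be checked with a short sketch. The candidate strings and variable names here are made up for illustration, and re.fullmatch is used so that the whole string has to fit the pattern:

```python
import re

# "." stands for exactly one character between "a" and "d".
candidates = ["a1d", "a2d", "acd", "ad", "a12d"]

# Keep only the strings that the pattern matches in full.
matched = [s for s in candidates if re.fullmatch(r"a.d", s)]
print(matched)  # ['a1d', 'a2d', 'acd']
```

"ad" is rejected because "." needs one character, and "a12d" is rejected because "." accepts only one.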
1.1.2 Basic usage of regular expressions in Python
In Python, use the re module to process regular expressions, as follows:
    import re

    # String 1
    regx_string = 'aab'
    # String 2
    regx_string2 = 'anb'
    # Compile the pattern into a regular expression object
    pattern = re.compile('a.b')
    # Match string 1
    m1 = pattern.match(regx_string)
    print(m1)  # <_sre.SRE_Match object; span=(0, 3), match='aab'>
    # Match string 2
    m2 = pattern.match(regx_string2)
    print(m2)  # <_sre.SRE_Match object; span=(0, 3), match='anb'>
    # String 3
    regx_string3 = 'and'
    m3 = pattern.match(regx_string3)
    print(m3)  # None
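Note: when the string matches the regular expression, the match method returns a Match object; if it does not match, it returns None. (On Python 3.7 and later the object prints as <re.Match object; ...> instead of <_sre.SRE_Match object; ...>.)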
Another way to use regular expressions in Python is to call the module-level function directly, without compiling the pattern first:
    # match(pattern, string, flags=0)
    m4 = re.match('a.b', 'aab')
    print(m4)  # <_sre.SRE_Match object; span=(0, 3), match='aab'>
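Because match can return None, calling group() on the result without a check raises AttributeError. The following is a minimal sketch of one way to guard against that. The helper name first_match is hypothetical, not part of the re module. It also shows that match only tries the start of the string, whereas search scans the whole string:

```python
import re

pattern = re.compile(r"a.b")

def first_match(text):
    """Hypothetical helper: return the matched text, or None if there is no match."""
    m = pattern.match(text)  # only tries to match at the start of text
    return m.group() if m is not None else None

hit = first_match("aab")
miss = first_match("and")
print(hit)   # aab
print(miss)  # None

# match() fails when the pattern does not start at position 0; search() still finds it.
at_start = pattern.match("xxaab")
anywhere = pattern.search("xxaab")
print(at_start)          # None
print(anywhere.group())  # aab
```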
1.2 Metacharacters in Python regular expressions
1.2.1: "."
"." matches any single character except a line break: letters, digits, symbols, and whitespace characters.
1.2.2: "^"
"^" matches the start of the string.
    # Example:
    # Match a string starting with ab
    print(re.match('^ab.*', 'abccd$#2'))  # <_sre.SRE_Match object; span=(0, 8), match='abccd$#2'>
    # Match a string starting with ac
    print(re.match('^ac.*', 'abccd$#2'))  # None
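The ".*" in the example above consumes the rest of the line only, because "." stops at a line break. Here is a minimal sketch of that behaviour, assuming the standard re.DOTALL flag is acceptable when line breaks should be included (the variable names are illustrative):

```python
import re

text = "ab\ncd"

# By default "." does not match "\n", so ".*" stops at the end of the first line.
default = re.match(r"^ab.*", text)
# With re.DOTALL, "." also matches "\n" and ".*" runs to the end of the string.
dotall = re.match(r"^ab.*", text, re.DOTALL)

print(repr(default.group()))  # 'ab'
print(repr(dotall.group()))   # 'ab\ncd'
```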
1.2.3: "$"
"$" matches the end of the string, or just before a newline at the end of the string.
    # Example:
    # Match a string ending with ac
    print(re.match('.*ac$', 'adsfasdfac'))  # <_sre.SRE_Match object; span=(0, 10), match='adsfasdfac'>
    # A single newline at the very end is allowed
    print(re.match('.*ac$', 'adsfacdfac\n'))  # <_sre.SRE_Match object; span=(0, 10), match='adsfacdfac'>
    # A newline in the middle of the string is not
    print(re.match('.*ac$', 'adsfac\ndfac'))  # None
Note: by default "$" matches only at the end of the whole string, so the end of a line in the middle of a multi-line string does not match. The re.M (MULTILINE) flag makes "$" match at the end of every line:
    print(re.match('.*ac$', 'adsfac\ndfac', re.M))  # <_sre.SRE_Match object; span=(0, 6), match='adsfac'>
With re.M, the result is the same as matching the single line 'adsfac' on its own.
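The effect of re.M on "$" is easier to see when every line of the text is checked. This sketch uses re.findall on a made-up three-line string (the text and variable names are assumptions for illustration):

```python
import re

text = "adsfac\ndfac\nxyz"

# Without re.M, "$" matches only at the end of the whole string, which ends in "xyz".
without_flag = re.findall(r"\w*ac$", text)
# With re.M, "$" also matches just before each newline.
with_flag = re.findall(r"\w*ac$", text, re.M)

print(without_flag)  # []
print(with_flag)     # ['adsfac', 'dfac']
```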
1.2.4: "*"
"*" matches the preceding expression zero or more times (greedy mode).
Greedy mode: match as many repetitions as possible.
    # Example:
    # Match a string starting with a and ending with b
    print(re.match('a.*b', 'aaaadbdgdbddf546b'))  # <_sre.SRE_Match object; span=(0, 17), match='aaaadbdgdbddf546b'>
    # Limit the greed by adding ? (non-greedy mode)
    print(re.match('a.*?b', 'aaaadbdgdbddf546b'))  # <_sre.SRE_Match object; span=(0, 6), match='aaaadb'>
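The difference between the greedy ".*" and the non-greedy ".*?" matters most when a delimiter occurs several times. This sketch uses a made-up string with angle-bracket tags (the string and variable names are illustrative assumptions):

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: ".*" runs to the last ">" in the string, giving one long match.
greedy = re.findall(r"<.*>", text)
# Non-greedy: ".*?" stops at the first ">" it can, giving one match per tag.
lazy = re.findall(r"<.*?>", text)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```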
1.2.5: "+"
"+" matches the preceding expression one or more times (greedy mode).
    # Example: use * and + to match a string containing 'ab'
    # A string that contains 'ab': aaabcdbfdd
    print(re.match('^.*(ab)*.*$', 'aaabcdbfdd'))  # <_sre.SRE_Match object; span=(0, 10), match='aaabcdbfdd'>
    print(re.match('^.*(ab)+.*$', 'aaabcdbfdd'))  # <_sre.SRE_Match object; span=(0, 10), match='aaabcdbfdd'>
    # Strings that do not contain 'ab': aaadcdfcc and aaadcdfbb
    print(re.match('^.*(ab)*.*$', 'aaadcdfcc'))  # <_sre.SRE_Match object; span=(0, 9), match='aaadcdfcc'>
    print(re.match('^.*(ab)+.*$', 'aaadcdfbb'))  # None
1.2.6: "?"
"?" matches the preceding expression zero or one time (greedy mode).
    # Example: match the regular expression '^.*(ab)?.*$' against aaabcdbfdd and aaadcdfbb
    print(re.match('^.*(ab)?.*$', 'aaabcdbfdd'))  # <_sre.SRE_Match object; span=(0, 10), match='aaabcdbfdd'>
    print(re.match('^.*(ab)?.*$', 'aaadcdfbb'))  # <_sre.SRE_Match object; span=(0, 9), match='aaadcdfbb'>
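Since the pattern above is surrounded by ".*", it matches with or without 'ab', so the effect of "?" is hard to see. Here is a smaller sketch using the made-up pattern colou?r, in which the "u" is optional:

```python
import re

pattern = re.compile(r"colou?r")
words = ["color", "colour", "colouur"]

# "u?" allows zero or one "u", but not two.
results = [bool(pattern.fullmatch(w)) for w in words]
print(results)  # [True, True, False]
```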
1.2.7: "*?, +?, ??"
*?, +? and ?? are the non-greedy versions of *, + and ?.
    # Example: regular expressions '(ab)*', '(ab)*?', '(ab)+', '(ab)+?', '(ab)?', '(ab)??'
    # matched against the string 'ababababababababab'
    # Note: the group method of the Match object returns the entire matched string
    # when it is called with no argument or with 0
    print(re.match('(ab)*', 'ababababababababab').group())   # ababababababababab
    print(re.match('(ab)*?', 'ababababababababab').group())  # '' (empty string, zero repetitions)
    print(re.match('(ab)+', 'ababababababababab').group())   # ababababababababab
    print(re.match('(ab)+?', 'ababababababababab').group())  # ab
    print(re.match('(ab)?', 'ababababababababab').group())   # ab
    print(re.match('(ab)??', 'ababababababababab').group())  # '' (empty string, zero repetitions)
1.2.8: "{m}"
"{m}" matches the preceding expression exactly m times.
1.2.9: "{m,n}"
"{m,n}" matches the preceding expression at least m and at most n times (greedy mode).
1.2.10: "{m,n}?"
"{m,n}?" is the non-greedy version of {m,n}.
    # Example: regular expressions '(ab){1,3}', '(ab){1,3}?', '(ab){2,5}', '(ab){2,5}?'
    # matched against the string 'ababababababababab'
    print(re.match('(ab){1,3}', 'ababababababababab').group())   # ababab
    print(re.match('(ab){1,3}?', 'ababababababababab').group())  # ab
    print(re.match('(ab){2,5}', 'ababababababababab').group())   # ababababab
    print(re.match('(ab){2,5}?', 'ababababababababab').group())  # abab
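The examples above cover {m,n} and {m,n}? but not the exact count {m}. This is a minimal sketch of {m}, using a made-up date-like pattern (the pattern and variable names are assumptions for illustration; it checks only the digit counts, not whether the date is valid):

```python
import re

# Exactly 4 digits, a dash, exactly 2 digits, a dash, exactly 2 digits.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

ok = bool(date_pattern.fullmatch("2024-01-31"))
too_short = bool(date_pattern.fullmatch("24-1-31"))
print(ok)         # True
print(too_short)  # False

# {2} stops after exactly two repetitions even though more are available.
twice = re.match(r"(ab){2}", "ababab").group()
print(twice)      # abab
```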
1.2.11: "\"
"\" escapes special characters or introduces a special sequence.
    # Example: escaping special characters, to match the literal text .+?*
    print(re.match('.+?*', '.+?*').group())  # sre_constants.error: multiple repeat at position 3
Note: this error means that the expression contains several repetition metacharacters in a row, rather than the literal string ".+?*" that we intended.
    # Escape each metacharacter (a raw string keeps the backslashes intact)
    print(re.match(r'\.\+\?\*', '.+?*').group())  # .+?*
    # Special sequences: \d matches any digit, \w matches any word character
    # (letters, digits and underscore)
    print(re.match(r'\d*', '25*29').group())  # 25
    print(re.match(r'\w+', '1134afdads').group())  # 1134afdads
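Escaping each metacharacter by hand is error-prone. The standard library function re.escape does it automatically, as this sketch with the same literal text shows (the variable names are illustrative):

```python
import re

literal = ".+?*"

# re.escape puts a backslash in front of every regex metacharacter in the text.
escaped = re.escape(literal)
print(escaped)  # \.\+\?\*

# The escaped pattern matches the literal text and nothing else.
found = re.match(escaped, ".+?*").group()
print(found)    # .+?*
```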
1.2.12: "[]"
"[]" represents a set of characters. If "^" is the first character inside the brackets, it represents the complement of the set.
    # Example: match a string made of the digits 1-5
    print(re.match('[12345]+', '1235425422119877').group())  # 123542542211
    print(re.match('[1-5]+', '1235425422119877').group())  # 123542542211
    # Match any characters other than a, b and c
    print(re.match('[^abc]+', '155acdefafdf').group())  # 155
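Two further properties of character sets are worth a quick sketch: several ranges can be combined in one set, and most metacharacters lose their special meaning inside the brackets. The strings and variable names below are made up for illustration:

```python
import re

# Several ranges in one set: hexadecimal digits.
hex_prefix = re.match(r"[0-9a-fA-F]+", "ff09zz").group()
print(hex_prefix)  # ff09

# Inside [], the characters ".", "+" and "*" are ordinary characters.
symbols = re.findall(r"[.+*]", "a.b+c*d")
print(symbols)     # ['.', '+', '*']

# A leading "^" negates the set: everything except digits.
non_digits = re.findall(r"[^0-9]", "a1b2")
print(non_digits)  # ['a', 'b']
```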
1.2.13: "|"
"A|B" is alternation: it matches either A or B.
    # Example: match a string of lowercase letters [a-z] or of digits [2-9]
    print(re.match('[a-z]+|[2-9]+', 'abcdefga').group())  # abcdefga
    print(re.match('[a-z]+|[2-9]+', '32456546545').group())  # 32456546545
    print(re.match('[a-z]+|[2-9]+', 'adfasf32456546545').group())  # adfasf
    print(re.match('[a-z]+|[2-9]+', '2356safdsfa').group())  # 2356
    print(re.match('[a-z]+|[2-9]+', '12356safdsfa'))  # None
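"|" has the lowest precedence of the operators shown so far, so it splits the whole expression unless parentheses limit its scope. Alternatives are also tried from left to right. Both points are sketched below with made-up patterns (the variable names are illustrative):

```python
import re

# Without parentheses the alternatives are "ab" and "cd".
plain = [bool(re.fullmatch(r"ab|cd", s)) for s in ["ab", "cd", "acd"]]
print(plain)    # [True, True, False]

# With parentheses only "b" or "c" alternate, between "a" and "d".
grouped = [bool(re.fullmatch(r"a(b|c)d", s)) for s in ["abd", "acd", "ab"]]
print(grouped)  # [True, True, False]

# The first alternative that matches wins, even if a later one would match more.
leftmost = re.match(r"a|ab", "ab").group()
print(leftmost) # a
```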
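1.3 Grouping in Python regular expressions
1.3.1: (...)
(...) matches a group: the content inside the parentheses is treated as a single unit.
    # Example:
    # Without parentheses, * applies only to b
    print(re.match('ab*', 'abbb').group())  # abbb
    # With parentheses, * applies to the whole group ab
    print(re.match('(ab)*', 'abbb').group())  # ab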
1.3.2: (?aiLmsux)
(?aiLmsux) sets the A, I, L, M, S, U, or X flag for the expression. The meaning of each flag is explained in detail later.
    # Example: the I flag makes matching case-insensitive
    print(re.match('[A-Z]+', 'acdadsfadf'))  # None
    # The inline flag goes at the start of the expression
    # (Python 3.11 and later raise an error if it appears anywhere else)
    print(re.match('(?i)[A-Z]+', 'acdadsfadf'))  # <_sre.SRE_Match object; span=(0, 10), match='acdadsfadf'>
    # The flag can also be passed as an argument
    print(re.match('[A-Z]+', 'acdadsfadf', re.I))  # <_sre.SRE_Match object; span=(0, 10), match='acdadsfadf'>
1.3.3: (?:...)
(?:...) is a non-capturing group: it groups the expression but does not save the text it matched.
    # Example:
    print(re.match(r'(?:\w+) (?:\w+)', 'Eric Brown').group())  # Eric Brown
    # print(re.match(r'(?:\w+) (?:\w+)', 'Eric Brown').group(1))  # IndexError: no such group
    print(re.match(r'(\w+) (\w+)', 'Eric Brown').group())  # Eric Brown
    print(re.match(r'(\w+) (\w+)', 'Eric Brown').group(1))  # Eric
    print(re.match(r'(\w+) (\w+)', 'Eric Brown').group(2))  # Brown
Note: with non-capturing groups, a match returns only the overall result; the text matched by each group is not saved. With capturing groups, the overall result is saved and so is the text matched by each group in the expression. The groups of Match objects are described in detail later.
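Capturing and non-capturing groups can be mixed in one expression, in which case only the capturing groups are numbered. This is a minimal sketch using the groups() method of the Match object, which returns all saved groups as a tuple (the variable names are illustrative):

```python
import re

capturing = re.match(r"(\w+) (\w+)", "Eric Brown")
mixed = re.match(r"(?:\w+) (\w+)", "Eric Brown")

# Both words are saved.
print(capturing.groups())  # ('Eric', 'Brown')
# Only the second word is saved, and it becomes group 1.
print(mixed.groups())      # ('Brown',)
print(mixed.group(1))      # Brown
```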
1.3.4: (?P<name>...)
(?P<name>...) is a named group: the substring it matches can be accessed by name.
    # Example:
    print(re.match(r'(?P<first_name>\w+) (?P<last_name>\w+)', 'Eric Brown').group())  # Eric Brown
    print(re.match(r'(?P<first_name>\w+) (?P<last_name>\w+)', 'Eric Brown').group('first_name'))  # Eric
    print(re.match(r'(?P<first_name>\w+) (?P<last_name>\w+)', 'Eric Brown').group('last_name'))  # Brown
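Named groups can be read all at once with the groupdict() method of the Match object, and they remain accessible by number. A short sketch with the same pattern (the variable names are illustrative):

```python
import re

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Eric Brown")

# All named groups as a dictionary.
names = m.groupdict()
print(names)       # {'first_name': 'Eric', 'last_name': 'Brown'}

# A named group still has a number, counted from the left.
print(m.group(1))  # Eric
print(m.group(2))  # Brown
```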
1.3.5: (?P=name)
(?P=name) is a backreference to a named group: it matches the same text that the earlier group called name matched.
    # Example:
    pattern = re.compile(r'(?P<number>[1-9]){5}@(?P<letters>[a-z])+\.(?P=letters)+')
    m = pattern.match('12345@qq.qq')
    print(m.group())  # 12345@qq.qq
    # A repeated group keeps only the text of its last repetition
    print(m.group(1))  # 5
    print(m.group(2))  # q
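A common use of a backreference is to require that a closing delimiter be the same as the opening one. The sketch below uses a made-up pattern with a group named quote (the pattern, strings and variable names are assumptions for illustration):

```python
import re

# The opening quote is captured, and (?P=quote) requires the same character to close.
pattern = re.compile(r"(?P<quote>['\"]).*?(?P=quote)")

single = pattern.match("'hello' world")
double = pattern.match('"it\'s" fine')
mismatched = pattern.match("'hello\" world")

print(single.group())  # 'hello'
print(double.group())  # "it's"
print(mismatched)      # None
```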
1.3.6: (?=...)
(?=...) is a positive lookahead: the preceding expression matches only if ... matches immediately after it. The text matched by ... is not consumed.
    # Example:
    print(re.match(r'\w+@(?=\d+)', 'abcds@123456').group())  # abcds@
Note: the expression above requires that "@" be followed by digits. If it is not, the string does not match; if it is, the returned match is the preceding text up to and including "@", without the digits.
1.3.7: (?!...)
(?!...) is a negative lookahead: the preceding expression matches only if ... does not match immediately after it.
    # Example:
    print(re.match(r'\w+@(?!\d+)', 'abcds@dfa').group())  # abcds@
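Running the positive and negative lookahead over the same text shows that they select complementary positions. In this sketch the text and variable names are made up, and re.finditer is used to record where each match starts:

```python
import re

text = "foobar foobaz"

# "foo" only when "bar" follows: the first word.
followed = [m.start() for m in re.finditer(r"foo(?=bar)", text)]
# "foo" only when "bar" does not follow: the second word.
not_followed = [m.start() for m in re.finditer(r"foo(?!bar)", text)]

print(followed)      # [0]
print(not_followed)  # [7]
```

In both cases the match itself is just "foo", because the lookahead consumes nothing.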
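1.3.8: (?<=...)
(?<=...) is a positive lookbehind: it matches if the current position is preceded by a match for ... . The pattern inside must have a fixed length.
    # Example:
    print(re.match('(?<=abc)def', 'abcdef'))  # None
    print(re.search('(?<=abc)def', 'abcdef'))  # <_sre.SRE_Match object; span=(3, 6), match='def'>
Note: a pattern that begins with a lookbehind cannot match at the beginning of a string, because there is no preceding text to look back at. This is why re.match returns None above while re.search succeeds. With re.match, place the lookbehind after an expression that consumes the preceding text:
    print(re.match(r'(\w+)(?<=zhang)san', 'myzhangsan').group())  # myzhangsan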
1.3.9: (?<!...)
(?<!...) is a negative lookbehind: it matches if the current position is not preceded by a match for ... . The pattern inside must have a fixed length.
    print(re.match(r'(\w+)(?<!zhang)san', 'mylisan').group())  # mylisan
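Lookbehind is often used with search-style functions to select text by what comes before it. This sketch extracts numbers from a made-up string depending on whether a "$" sign precedes them (the text and variable names are assumptions for illustration):

```python
import re

text = "cost $30, discount 5, tax $2"

# Digits directly preceded by "$".
with_dollar = re.findall(r"(?<=\$)\d+", text)
# Whole numbers not preceded by "$"; \b stops the match starting inside "30".
without_dollar = re.findall(r"(?<!\$)\b\d+", text)

print(with_dollar)     # ['30', '2']
print(without_dollar)  # ['5']
```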
1.3.10: (?(id/name)yes|no)
(?(id/name)yes|no) is a conditional: if the group with the given id or name matched, the yes pattern is used to match the text that follows; otherwise the no pattern is used.
    # Example:
    # Check whether the brackets around a string are paired. Matching succeeds if
    # there are no brackets or a matching pair; otherwise it fails.
    pattern = re.compile(r'(?P<left_bracket>\()?\w+(?(left_bracket)\)|$)')
    # Brackets on both sides
    m = pattern.match('(ab123456)')
    print(m.group())  # (ab123456)
    # No brackets
    m = pattern.match('cdefghj')
    print(m.group())  # cdefghj
    # A bracket on one side only
    m = pattern.match('(abdcd')
    print(m)  # None