Metacharacters and grouping in Python regular expressions

1. Regular expressions
1.1 Understanding regular expressions and their basic use in Python
1.1.1 Understanding regular expressions

A regular expression is a string pattern built from two kinds of characters: ordinary text characters and special characters (metacharacters).
Metacharacters carry a special meaning, which is what makes regular expressions expressive. For example,
in the regular expression r"a.d", the characters "a" and "d" are ordinary characters, while "." is a metacharacter that stands for any single character,
so the pattern matches "a1d", "a2d", "acd", and so on.

1.1.2 Basic usage of regular expressions in Python

In Python, use the re module to process regular expressions, as follows:

import re
 
#String 1
regx_string='aab'
 
#String 2
regx_string2='anb'
 
#Compile the pattern into a regular expression object
pattern=re.compile('a.b')
 
#Match string 1
m1=pattern.match(regx_string)
 
print(m1)
# <_sre.SRE_Match object; span=(0, 3), match='aab'>
 
#Match string 2
m2=pattern.match(regx_string2)
 
print(m2)
# <_sre.SRE_Match object; span=(0, 3), match='anb'>
 
#String 3
regx_string3='and'
 
m3=pattern.match(regx_string3)
 
print(m3)
# None

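Note: when the start of the string matches the regular expression, the match method returns a Match object; otherwise it returns None. On Python 3.7 and later the object is printed as <re.Match object; span=(0, 3), match='aab'>.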
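
The note above can be sketched in code as follows. This is a minimal illustration, not part of the original listing: the helper name describe is made up for the example, and re.search appears only to contrast with match, which tries the pattern at the start of the string only.

```python
import re

pattern = re.compile('a.b')

def describe(text):
    """Return the matched text, or a fallback message when match() gives None."""
    m = pattern.match(text)
    # A Match object is always truthy, None is falsy
    if m:
        return m.group()
    return 'no match'

print(describe('aab'))   # aab
print(describe('and'))   # no match

# match() only tries the pattern at position 0; search() scans the string
print(pattern.match('xxaab'))            # None
print(pattern.search('xxaab').group())   # aab
```

Testing the result before calling group() avoids an AttributeError when the pattern does not match.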

The module-level functions offer another way to use regular expressions in Python:

# match(pattern, string, flags=0)
m4=re.match('a.b',regx_string)
 
print(m4)
# <_sre.SRE_Match object; span=(0, 3), match='aab'>

1.2 Metacharacters in Python regular expressions

1.2.1: "."

"." matches any single character except a line break: letters, digits, symbols and whitespace

1.2.2: "^"

"^" matches the start of the string

#Example:
#Match a string that starts with ab
print(re.match('^ab.*','abccd$#2'))
# <_sre.SRE_Match object; span=(0, 8), match='abccd$#2'>
 
#Match a string that starts with ac
print(re.match('^ac.*','abccd$#2'))
# None

1.2.3: "$"

"$" matches the end of the string or before the newline at the end of the string

#Example:
#Match an ac terminated string:
print(re.match('.*ac$','adsfasdfac'))
# <_sre.SRE_Match object; span=(0, 10), match='adsfasdfac'>
 
#A single newline at the very end of the string is allowed after the match
print(re.match('.*ac$','adsfacdfac\n'))
# <_sre.SRE_Match object; span=(0, 10), match='adsfacdfac'>
 
print(re.match('.*ac$','adsfac\ndfac'))
# None

Note: by default "$" matches only at the end of the string (or just before a trailing newline), so the multi-line string above does not match. With the re.M (MULTILINE) flag, "$" also matches at the end of each line:

print(re.match('.*ac$','adsfac\ndfac',re.M))
# <_sre.SRE_Match object; span=(0, 6), match='adsfac'>
# With re.M, "$" matches before the first newline, so the result is the same as matching 'adsfac' on its own
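
A small sketch of the same flag when several lines end in 'ac'. It relies on re.findall, which the article has not introduced, only to list every place where the pattern matches; the sample text is invented for the illustration.

```python
import re

text = 'adsfac\ndfac\nxyz'

# Without re.M, "$" can only match at the very end of the string
without_flag = re.findall('ac$', text)
print(without_flag)   # []

# With re.M, "$" also matches just before each line break
with_flag = re.findall('ac$', text, re.M)
print(with_flag)      # ['ac', 'ac']
```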

1.2.4 "*"

"*" matches the preceding expression zero or more times (greedy mode)

Greedy mode: match as many repetitions as possible.

#An example is as follows:
#Match a string starting with a and ending with b
print(re.match('a.*b','aaaadbdgdbddf546b'))
#<_sre.SRE_Match object; span=(0, 17), match='aaaadbdgdbddf546b'>
 
#Non-greedy: "*?" stops at the first b it can
print(re.match('a.*?b','aaaadbdgdbddf546b'))
#<_sre.SRE_Match object; span=(0, 6), match='aaaadb'>
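
The difference between the two modes shows up clearly when a delimiter occurs more than once. The sketch below uses an invented sample string and re.search, so that the match may start anywhere in the text.

```python
import re

text = '"first" and "second"'

# Greedy: ".*" runs to the last quote in the string
greedy = re.search('".*"', text).group()

# Non-greedy: ".*?" stops at the nearest closing quote
lazy = re.search('".*?"', text).group()

print(greedy)   # "first" and "second"
print(lazy)     # "first"
```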

1.2.5 "+"

"+" matches the preceding expression one or more times (greedy mode)

Example: use * and + to match a string that contains 'ab'
#Match string: aaabcdbfdd
 
print(re.match('^.*(ab)*.*$','aaabcdbfdd'))
# <_sre.SRE_Match object; span=(0, 10), match='aaabcdbfdd'>
 
print(re.match('^.*(ab)+.*$','aaabcdbfdd'))
# <_sre.SRE_Match object; span=(0, 10), match='aaabcdbfdd'>
 
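#Match string: aaadcdfcc (contains no 'ab')
 
#(ab)* allows zero repetitions, so the match still succeeds
print(re.match('^.*(ab)*.*$','aaadcdfcc'))
# <_sre.SRE_Match object; span=(0, 9), match='aaadcdfcc'>
 
#(ab)+ needs at least one 'ab', so the match fails
print(re.match('^.*(ab)+.*$','aaadcdfcc'))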
# None

1.2.6 "?"

"?" matches the preceding expression zero or one time (greedy mode)

#Example: match the regular expression '^.*(ab)?.*$' against aaabcdbfdd and aaadcdfbb
 
print(re.match('^.*(ab)?.*$','aaabcdbfdd'))
# <_sre.SRE_Match object; span=(0, 10), match='aaabcdbfdd'>
 
print(re.match('^.*(ab)?.*$','aaadcdfbb'))
# <_sre.SRE_Match object; span=(0, 9), match='aaadcdfbb'>

1.2.7 "*?,+?,??"

"*?", "+?" and "??" are the non-greedy forms of "*", "+" and "?": they match as few repetitions as possible

# Example: regular expressions '(ab)*','(ab)*?','(ab)+','(ab)+?','(ab)?','(ab)??', matching string: ababababababababab
#Note: the group method of the Match object returns the entire matched string when called with 0 or with no argument
print(re.match('(ab)*','ababababababababab').group())
# ababababababababab
 
print(re.match('(ab)*?','ababababababababab').group())
# (prints an empty line: zero repetitions are matched)
 
print(re.match('(ab)+','ababababababababab').group())
# ababababababababab
 
print(re.match('(ab)+?','ababababababababab').group())
# ab
 
print(re.match('(ab)?','ababababababababab').group())
# ab
 
print(re.match('(ab)??','ababababababababab').group())
# (prints an empty line: zero repetitions are matched)

1.2.8 "{m}"

"{m}" matches the preceding expression exactly m times

1.2.9 "{m,n}"

"{m,n}" matches the preceding expression from m to n times, as many as possible (greedy mode)

1.2.10 "{m,n}?"

"{m,n}?" is the non-greedy form of "{m,n}": it matches as few repetitions as possible

#Example: regular expressions '(ab){1,3}','(ab){1,3}?','(ab){2,5}','(ab){2,5}?', matching string: ababababababababab
print(re.match('(ab){1,3}','ababababababababab').group())
# ababab
 
print(re.match('(ab){1,3}?','ababababababababab').group())
# ab
 
print(re.match('(ab){2,5}','ababababababababab').group())
# ababababab
 
print(re.match('(ab){2,5}?','ababababababababab').group())
# abab
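
The examples above cover only the two-bound form. As a supplementary sketch, here is the exact-count form {m} together with the open-ended variants {m,} and {,n}, which Python's re module also accepts; the sample strings and variable names are invented for the illustration.

```python
import re

text = 'ababababab'   # five repetitions of 'ab'

# Exactly three repetitions
exact = re.match('(ab){3}', text).group()

# Only two repetitions are available, so {3} cannot match
too_few = re.match('(ab){3}', 'abab')

# {m,} has no upper bound; {,n} has a lower bound of zero
at_least = re.match('(ab){2,}', text).group()
at_most = re.match('(ab){,2}', text).group()

print(exact)      # ababab
print(too_few)    # None
print(at_least)   # ababababab
print(at_most)    # abab
```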

1.2.11 "\"

"\" escapes special characters or introduces a special sequence

#Example: escape special characters to match the literal string .+?*
 
print(re.match('.+?*','.+?*').group())
# sre_constants.error: multiple repeat at position 3

Note: this error means the unescaped characters were read as repetition metacharacters stacked on one another, not as the literal string ".+?*" we intended

print(re.match(r'\.\+\?\*','.+?*').group())
# .+?*
 
# Special sequences: \d matches any digit, \w matches any word character (letter, digit or underscore)
 
print(re.match(r'\d*','25*29').group())
# 25
 
print(re.match(r'\w+','1134afdads').group())
# 1134afdads
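
When the literal text is not known in advance, the backslashes do not have to be written by hand. The following sketch uses re.escape from the standard library, which the article does not cover, and an invented sample string for the last line.

```python
import re

literal = '.+?*'

# re.escape() inserts the backslashes needed to match the text literally
escaped = re.escape(literal)
print(escaped)                              # \.\+\?\*
print(re.match(escaped, literal).group())   # .+?*

# A raw string (r'...') passes backslashes to re unchanged
number = re.match(r'\d+\.\d+', '3.14 is pi').group()
print(number)                               # 3.14
```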

1.2.12 "[]"

"[]" defines a set of characters. If "^" is the first character inside the brackets, the set is negated and matches any character not listed

# Example: match a run of the digits 1-5
print(re.match('[12345]+','1235425422119877').group())
# 123542542211
 
print(re.match('[1-5]+','1235425422119877').group())
# 123542542211
 
#Match a run of characters other than a, b and c
print(re.match('[^abc]+','155acdefafdf').group())
# 155
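
A few further properties of character sets, sketched with invented sample strings: most metacharacters lose their special meaning inside the brackets, several ranges can be combined, and "^" negates the set only when it comes first.

```python
import re

# Inside [] the characters ".", "+" and "*" are literal
literal_set = re.match('[.+*]+', '.+*abc').group()
print(literal_set)   # .+*

# Several ranges in one set: hexadecimal digits
hex_digits = re.match('[0-9a-fA-F]+', 'ff00GG').group()
print(hex_digits)    # ff00

# "^" that is not the first character is an ordinary member of the set
caret_member = re.match('[a^]+', 'a^a^b').group()
print(caret_member)  # a^a^
```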

1.2.13 "|"

"A|B" is alternation: it matches either A or B

# Example: match a string of lowercase letters [a-z] or of digits [2-9]
 
print(re.match('[a-z]+|[2-9]+','abcdefga').group())
# abcdefga
 
print(re.match('[a-z]+|[2-9]+','32456546545').group())
# 32456546545
 
print(re.match('[a-z]+|[2-9]+','adfasf32456546545').group())
# adfasf
 
print(re.match('[a-z]+|[2-9]+','2356safdsfa').group())
# 2356
 
print(re.match('[a-z]+|[2-9]+','12356safdsfa'))
# None
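
Two details of alternation that the examples above do not show, sketched here with invented strings: the alternatives are tried from left to right, and "|" binds more loosely than everything around it.

```python
import re

# The first alternative that succeeds wins, even if a later one is longer
short_first = re.match('ab|abc', 'abc').group()
long_first = re.match('abc|ab', 'abc').group()
print(short_first)   # ab
print(long_first)    # abc

# "|" has the lowest precedence: this reads as "cat" or "dogs"
low_precedence = re.match('cat|dogs', 'cats').group()
print(low_precedence)   # cat
```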

1.3 Grouping in Python regular expressions
1.3.1: (...)

(...) defines a group: the content inside the parentheses is treated as a single unit

# Example:
print(re.match('ab*','abbb').group())
# abbb
 
print(re.match('(ab)*','abbb').group())
# ab

1.3.2: (?aiLmsux)

(?aiLmsux) sets the A, I, L, M, S, U, or X flag for the expression. The meaning of each flag will be explained in detail later

#Example: the I flag makes matching case-insensitive
 
print(re.match('[A-Z]+','acdadsfadf'))
# None
 
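#Inline flags must come at the start of the pattern (Python 3.11 and later reject them anywhere else)
print(re.match('(?i)[A-Z]+','acdadsfadf'))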
# <_sre.SRE_Match object; span=(0, 10), match='acdadsfadf'>
 
#It can also be set in this way
print(re.match('[A-Z]+','acdadsfadf',re.I))
# <_sre.SRE_Match object; span=(0, 10), match='acdadsfadf'>
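
Several flags can be set at once. The sketch below combines re.I with re.M, and also shows re.S (DOTALL), which lets "." match a line break; the sample text and variable names are invented for the illustration.

```python
import re

text = 'first line\nSECOND line'

# Flags passed as arguments are combined with "|"
by_flags = re.search('^second', text, re.I | re.M)
print(by_flags.group())   # SECOND

# The same two flags written inline at the start of the pattern
inline = re.search('(?im)^second', text)
print(inline.group())     # SECOND

# Without re.S, "." stops at the line break; with it, "." crosses it
one_line = re.match('first.*line', text).group()
all_lines = re.match('first.*line', text, re.S).group()
print(one_line)           # first line
print(all_lines == text)  # True
```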

1.3.3: (?:...)

(?:...) is a non-capturing group: it groups like (...) but does not save the text it matches

#Example:
print(re.match(r'(?:\w+) (?:\w+)','Eric Brown').group())
# Eric Brown
 
# print(re.match(r'(?:\w+) (?:\w+)','Eric Brown').group(1))
# IndexError: no such group
 
print(re.match(r'(\w+) (\w+)','Eric Brown').group())
# Eric Brown
 
print(re.match(r'(\w+) (\w+)','Eric Brown').group(1))
# Eric
 
print(re.match(r'(\w+) (\w+)','Eric Brown').group(2))
# Brown

Note: with non-capturing groups, only the overall match is returned; the text matched by each group in the expression is not saved.

With capturing groups, the overall match is saved and the text matched by each group is saved separately as well. The groups of a Match object will be described in detail later
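
As a brief preview, and only as a sketch, the saved groups can also be read all at once with Match.groups() or by passing several indexes to group(). The variable names are invented for the example.

```python
import re

m = re.match(r'(\w+) (\w+)', 'Eric Brown')

# All captured groups as a tuple, convenient for unpacking
print(m.groups())       # ('Eric', 'Brown')
first, last = m.groups()
print(last, first)      # Brown Eric

# Several groups in one call
print(m.group(1, 2))    # ('Eric', 'Brown')

# A non-capturing group leaves no entry in the tuple
m2 = re.match(r'(?:\w+) (\w+)', 'Eric Brown')
print(m2.groups())      # ('Brown',)
```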

1.3.4: (?P<name>...)

(?P<name>...) is a named group: the substring it matches can be accessed by name

# Example:
print(re.match(r'(?P<first_name>\w+) (?P<last_name>\w+)','Eric Brown').group())
# Eric Brown
 
print(re.match(r'(?P<first_name>\w+) (?P<last_name>\w+)','Eric Brown').group('first_name'))
# Eric
 
print(re.match(r'(?P<first_name>\w+) (?P<last_name>\w+)','Eric Brown').group('last_name'))
# Brown
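
Named groups can also be collected into a dictionary. The following sketch compiles the pattern once and uses Match.groupdict(); the subscript form m['name'] assumes Python 3.6 or later.

```python
import re

name_pattern = re.compile(r'(?P<first_name>\w+) (?P<last_name>\w+)')
m = name_pattern.match('Eric Brown')

# Every named group, keyed by its name
print(m.groupdict())     # {'first_name': 'Eric', 'last_name': 'Brown'}

# Named groups are still numbered like ordinary groups
print(m.group(1))        # Eric

# Subscript access to a group (Python 3.6+)
print(m['last_name'])    # Brown
```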

1.3.5: (?P=name)

(?P=name) is a backreference to a named group: it matches the same text that the earlier group called name matched

#Example:
pattern=re.compile(r'(?P<number>[1-9]){5}@(?P<letters>[a-z])+\.(?P=letters)+')
 
m=pattern.match('12345@qq.qq')
 
print(m.group())
# 12345@qq.qq
 
#A repeated group keeps only the text of its last repetition
print(m.group(1))
# 5
 
print(m.group(2))
# q
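
A backreference repeats the text that the group matched, not the group's pattern. The sketch below makes that visible with two invented patterns; the group names digit and word are chosen for the example only.

```python
import re

# Two identical digits in a row: the second must equal the first
same_digit = re.compile(r'(?P<digit>\d)(?P=digit)')
print(same_digit.match('11').group())   # 11
print(same_digit.match('12'))           # None

# A word followed by a space and the same word again
doubled = re.compile(r'(?P<word>\w+) (?P=word)')
hit = doubled.search('it was was fine')
print(hit.group())         # was was
print(hit.group('word'))   # was
```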

1.3.6: (?=...)

(?=...) is a positive lookahead: the preceding expression matches only if it is followed by ..., and the text checked by the lookahead is not part of the match

#Example:
 
print(re.match(r'\w+@(?=\d+)','abcds@123456').group())
# abcds@

Note: the expression above requires "@" to be followed by digits. If it is not, the string does not match; if it is, the returned match is the preceding text plus "@"

1.3.7: (?!...)

(?!...) is a negative lookahead: the preceding expression matches only if it is not followed by ...

#Example:
print(re.match(r'\w+@(?!\d+)','abcds@dfa').group())
# abcds@
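
Both kinds of lookahead side by side, as a sketch with invented sample strings. re.search is used so that the match may start anywhere in the text.

```python
import re

# Positive lookahead: digits only if "px" follows; "px" itself is not consumed
pixels = re.search(r'\d+(?=px)', 'width: 100px').group()
print(pixels)      # 100

# Negative lookahead: digits followed by neither "px" nor another digit
not_pixels = re.search(r'\d+(?!px|\d)', '100px 200em').group()
print(not_pixels)  # 200
```

The extra \d in the negative lookahead keeps the engine from settling for a shorter prefix such as "10", which is followed by "0" and not by "px".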

1.3.8: (?<=...)

(?<=...) is a positive lookbehind: it matches at a position only if the text just before that position matches .... The lookbehind pattern must have a fixed length

#Example:
print(re.match('(?<=abc)def', 'abcdef'))
#None
 
print(re.search('(?<=abc)def', 'abcdef'))
#<_sre.SRE_Match object; span=(3, 6), match='def'>

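Note: a pattern that begins with a lookbehind cannot match at the start of the string, where nothing precedes it, so re.match returns None while re.search succeeds further along. A lookbehind can still be used with re.match when it comes later in the pattern:

print(re.match(r'(\w+)(?<=zhang)san', 'myzhangsan').group())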
# myzhangsan

1.3.9: (?<!...)

(?<!...) is a negative lookbehind: it matches at a position only if the text just before that position does not match .... The lookbehind pattern must have a fixed length

print(re.match(r'(\w+)(?<!zhang)san', 'mylisan').group())
# mylisan
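
A short sketch of both lookbehind forms with re.search, plus the fixed-length restriction mentioned above. The sample strings and variable names are invented for the illustration.

```python
import re

# Positive lookbehind: digits that come right after a dollar sign
after_dollar = re.search(r'(?<=\$)\d+', 'cost: $42').group()
print(after_dollar)   # 42

# Negative lookbehind: digits preceded by neither "$" nor another digit
plain_number = re.search(r'(?<![$\d])\d+', '$42 and 17').group()
print(plain_number)   # 17

# A variable-length lookbehind such as (?<=a+) is rejected at compile time
try:
    re.compile(r'(?<=a+)b')
    fixed_width_required = False
except re.error:
    fixed_width_required = True
print(fixed_width_required)   # True
```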

1.3.10: (?(id/name)yes|no)

(?(id/name)yes|no) is a conditional: if the group with the given id or name matched, the yes pattern is used to match what follows; otherwise the no pattern is used

#Example:
#The following example checks whether the brackets on both sides of a string are paired. If there are no brackets, or a matching pair, the match succeeds; otherwise it fails
pattern=re.compile(r'(?P<left_bracket>\()?\w+(?(left_bracket)\)|$)')
 
#Brackets around
m=pattern.match('(ab123456)')
 
print(m.group())
# (ab123456)
 
#No parentheses around
m=pattern.match('cdefghj')
 
print(m.group())
# cdefghj
 
#Brackets on one side
m=pattern.match('(abdcd')
 
print(m)
# None
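
The same check can be wrapped in a small helper. This is a sketch under stated assumptions rather than the article's own code: the names opening, paired, by_number and is_paired are hypothetical, and fullmatch() replaces the "$" branch by requiring the whole string to match. The second pattern refers to the group by its number instead of its name.

```python
import re

# Condition on a named group; the "no" branch is left out (matches nothing extra)
paired = re.compile(r'(?P<opening>\()?\w+(?(opening)\))')

# The same condition written with the group number
by_number = re.compile(r'(\()?\w+(?(1)\))')

def is_paired(text, regex=paired):
    """True if text is a word with either both brackets or none."""
    return regex.fullmatch(text) is not None

for candidate in ['(ab123456)', 'cdefghj', '(abdcd', 'abdcd)']:
    print(candidate, is_paired(candidate), is_paired(candidate, by_number))
```

The first two candidates give True and the last two give False with either pattern.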

Keywords: Python

Added by orbitalnets on Sat, 27 Jun 2020 07:15:29 +0300
