Content
match
Matches only at the beginning of the string. Returns a Match object on success and None on failure; only one match is returned.
search
Searches anywhere in the string (not limited to the beginning). Returns a Match object on success and None on failure; only the first match is returned.
findall
Finds every non-overlapping match in the string and returns them as a list. If the pattern contains groups (the parts enclosed in parentheses), each list item holds the text of the groups instead of the whole match.
1. match
re.match() always matches from the beginning of the string and returns a Match object for the matched text. So when the text I am looking for is not at the beginning of the string, re.match() returns None.
e.g.1
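In this example, only string3 produces a match (on 'p'); for the other strings re.match() returns None.

    import re

    string1 = 'I love python but hate pig'
    string2 = 'I love python'
    string3 = 'python'
    string4 = '123'

    result = re.match(r'[p]', string1)
    print(result)  # None: string1 does not start with 'p'

    result = re.match(r'[p]', string3)
    print(result.group())  # p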
    import re

    # Compile the regular expression into a Pattern object
    pattern = re.compile(r'hello')

    # Match the text against the pattern; the result is None if there is no match
    match = pattern.match('hello world, hello word')

    if match:
        # Use the Match object to get the matched text
        print(match.group())  # hello
On its own, re.match() has a limited purpose: it matches only at the start of the string and returns only one match.
Depending on your needs, that can still be useful. The following subsections introduce a variety of matching patterns.
1.1 Matching characters between a and z
    string3 = 'python'
    string4 = '123'
    result = re.match(r'[a-z]', string3)
    print(result.group())  # p
1.2 Matching characters between A and Z
    string3 = 'Python'
    string4 = '123'
    result = re.match(r'[A-Z]', string3)
    print(result.group())  # P
1.3 Match characters between 0 and 9
    # string4 = '123', as defined above
    ma = re.match(r'[0-9]', string4)
    print(ma.group())  # 1
1.4 a-z, A-Z, and 0-9 can be combined
    string3 = 'python'
    string4 = '123'
    result = re.match(r'[a-zA-Z0-9]', string3)
    print(result.group())  # p
\w and \W are opposites: \w matches a word character (for ASCII text, [a-zA-Z0-9_], which includes the underscore) and \W matches any non-word character.
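A minimal sketch of the difference between \w and \W. The sample strings '_name' and '@home' are made up for illustration and do not come from the examples above.

```python
import re

# \w matches a word character: a letter, a digit, or the underscore.
word = re.match(r'\w', '_name')
# \W matches any character that is not a word character.
non_word = re.match(r'\W', '@home')
# '@' is not a word character, so \w fails at the start of the string.
no_match = re.match(r'\w', '@home')

print(word.group())      # _
print(non_word.group())  # @
print(no_match)          # None
```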
1.5 Matching numbers/non-numbers
    string2 = 'I love python'
    string4 = '[];;:'
    ma1 = re.match(r'\D', string4)  # match a non-digit
    ma2 = re.match(r'\d', string2)  # match a digit; returns None here
    print(ma1.group())  # [
    # print(ma2.group())  # would raise AttributeError, because ma2 is None
1.6 Match whitespace and non-whitespace characters
\s matches a whitespace character (space, tab, newline, and so on) and \S matches any non-whitespace character. They are used in the same way as \d and \D above.
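Since this subsection has no example of its own, here is a small sketch. The sample string ' python' (one leading space) is an assumption chosen for illustration.

```python
import re

text = ' python'  # hypothetical sample: one leading space, then a word

space = re.match(r'\s', text)      # matches the leading space
non_space = re.match(r'\S', text)  # None: the first character is whitespace
both = re.match(r'\s\S+', text)    # one whitespace character, then the word

print(repr(space.group()))  # ' '
print(non_space)            # None
print(repr(both.group()))   # ' python'
```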
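1.7 Match zero or more times: * (asterisk)
    # One lowercase letter followed by zero or more lowercase letters.
    # With string1 = 'I love python but hate pig', ma is None,
    # because the string starts with an uppercase 'I'.
    ma = re.match(r'[a-z][a-z]*', string1)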
1.8 Match one or more times: + (plus sign)
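A short sketch contrasting + with *. The sample strings '123abc' and 'abc123' are assumptions made for this illustration.

```python
import re

digits_first = '123abc'   # hypothetical sample strings
letters_first = 'abc123'

# + needs at least one digit, then takes as many as it can.
plus_hit = re.match(r'[0-9]+', digits_first)
plus_miss = re.match(r'[0-9]+', letters_first)
# * also accepts zero digits, so it succeeds with an empty match.
star_empty = re.match(r'[0-9]*', letters_first)

print(plus_hit.group())          # 123
print(plus_miss)                 # None
print(repr(star_empty.group()))  # ''
```

The empty match is the practical difference: a pattern made only of * never fails, so + is usually the safer choice when at least one character is required.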
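1.9 Match between m and n times: {m,n}
    ma = re.match(r'[\w]{1,4}', string1)
    print(ma.group())  # I (with string1 = 'I love python but hate pig')
This matches a word character (letter, digit, or underscore) that occurs 1 to 4 times in a row.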
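To make the counting visible, here is a sketch with a longer word. The inputs 'python3' and '7 days' are hypothetical samples, not part of the examples above.

```python
import re

word = 'python3'  # hypothetical sample input

# {1,4} is greedy: it takes up to four word characters.
up_to_four = re.match(r'\w{1,4}', word)
# {2} means exactly two repetitions.
exactly_two = re.match(r'\w{2}', word)
# {2,3} needs at least two digits; '7 days' starts with only one.
too_few = re.match(r'\d{2,3}', '7 days')

print(up_to_four.group())   # pyth
print(exactly_two.group())  # py
print(too_few)              # None
```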
2. search
re.search() searches anywhere in the string (not limited to the beginning). It returns a Match object on success and None on failure, and it returns only the first match.
The metacharacters are the same as before.
For more metacharacters, see the following blog post:
https://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html
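A minimal side-by-side sketch of re.match() and re.search() on the same input, using the 'I love python' string from section 1.

```python
import re

text = 'I love python'

# match only looks at the start of the string, so it fails here.
at_start = re.match(r'python', text)
# search scans forward and finds the word wherever it occurs.
anywhere = re.search(r'python', text)

print(at_start)          # None
print(anywhere.group())  # python
print(anywhere.span())   # (7, 13)
```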
e.g.
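    import re

    string5 = 'alibetjgi676$gjgk@126.com'
    ma6 = re.search(r'[\d]+', string5)  # match a run of digits
    print(ma6)
    print(ma6.group())

    output:
    <_sre.SRE_Match object; span=(9, 12), match='676'>
    676

Newer Python versions print the first line as <re.Match object; span=(9, 12), match='676'>.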
    import re

    string1 = 'I love python but hate pig'
    string2 = 'I love python'
    string3 = 'python'
    string4 = '123'

    result = re.search(r'[\w]+', string3)
    print(result.group())  # python

    result2 = re.search(r'[\w]+', string2)
    print(result2.group())  # I
Notice that for string2 the result of the search is 'I'. The space between 'I' and 'love' is not a word character, so \w+ cannot continue past it.
In other words, the match ends at the first character the pattern cannot consume, and because search returns only the first match, the rest of the string is ignored.
    string4 = '123 45'
    result = re.search(r'[\d]+', string4)
    print(result.group())  # 123
Of course, you can write the pattern so that it also matches the separator and continues past it:
    str2 = 'char|johljh'
    ma6 = re.search(r'char[\W][\w]+', str2)
    print(ma6.group())  # char|johljh
    # str1 is used instead of str to avoid shadowing the built-in str type
    str1 = 'oajfs|char|dhddfgdfg'
    str2 = 'char|johljh|jjgkhk'
    str3 = 'dlkngldnfk|flmgkdm|char'

    ma6 = re.search(r'char[\W][\w]+', str1)  # this pattern matches both str1 and str2
    print(ma6.group())  # char|dhddfgdfg

    ma6 = re.search(r'[\W]char', str3)  # this pattern matches str3
    print(ma6.group())  # |char
3. findall
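re.findall() finds every non-overlapping match in the string and returns them as a list. If the pattern contains groups (the parts enclosed in parentheses), each list item holds the text of the groups instead of the whole match; with more than one group, each item is a tuple.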
    import re

    string1 = 'I love python but hate pig'
    string2 = 'I love python'
    string3 = 'python'
    string4 = '123 45'

    result0 = re.findall(r'[p]+', string1)
    result1 = re.findall(r'[p][a-z]+', string1)
    result2 = re.findall(r'[\w]+', string2)
    result4 = re.findall(r'[\d]+', string4)

    print(result0)
    print(result1)
    print(result2)
    print(result4)

output:

    ['p', 'p']
    ['python', 'pig']   ---> highly recommended: result1 = re.findall(r'[p][a-z]+', string1)
    ['I', 'love', 'python']
    ['123', '45']
    string2 = '1,2,3,4'
    ma = re.findall(r'\d+', string2)
    print(ma)  # ['1', '2', '3', '4']
    import re

    p = re.compile(r'\d+')
    print(p.findall('one1two2three3four456'))

    ### output ###
    # ['1', '2', '3', '456']
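The examples in this section use patterns without parentheses. The sketch below shows how the result changes once groups are added; the input 'x=1, y=22' is a hypothetical sample chosen for illustration.

```python
import re

text = 'x=1, y=22'  # hypothetical sample input

# No groups: each item is the whole match.
no_group = re.findall(r'[a-z]=\d+', text)
# One group: each item is the text captured by that group.
one_group = re.findall(r'[a-z]=(\d+)', text)
# Two groups: each item is a tuple with one entry per group.
two_groups = re.findall(r'([a-z])=(\d+)', text)

print(no_group)    # ['x=1', 'y=22']
print(one_group)   # ['1', '22']
print(two_groups)  # [('x', '1'), ('y', '22')]
```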
References:
https://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html
https://blog.csdn.net/ali197294332/article/details/50894419