Introduction to Python regular expressions
Python has included the re module since version 1.5. It provides Perl-style regular expression patterns.
The re module gives Python full regular expression support.
Before using regular expressions in Python, you need to import the re module:
import re
The main functions of the module are:
Purpose | Functions |
---|---|
Match string | match() ; search() ; findall() ; finditer() |
Replace string | sub() |
Split string | split() |
Match string
re.match()
Matches only at the start of the string
re.match(pattern, string[, flags=0])
Parameter Description:
Parameter | Description |
---|---|
pattern | The regular expression to match |
string | The string to search |
flags | Flags that control how the regular expression is matched, such as case sensitivity or multi-line matching. See the flag table at the end for details. |
If the match succeeds, a Match object is returned; otherwise, None is returned.
    import re

    s = 'green grassland where the children are playing.'
    pattern = r'green.*land'
    # re.match only succeeds if the pattern matches at the start of the string
    match = re.match(pattern, s, 0)
    print('Match object:', match)
    # Output:
    # Match object: <re.Match object; span=(0, 15), match='green grassland'>
Methods of the Match object
group() returns the matched string
span() returns a tuple containing the (start, end) positions of the match
start() returns the position where the match starts
end() returns the position where the match ends
Note: before calling these methods, make sure a Match object was actually returned. If the result is None, calling them raises an AttributeError.
    import re

    s = 'green grassland where the children are playing.'
    pattern = r'green.*land'
    match = re.match(pattern, s, 0)
    print('Matching data:', match.group())
    print('Match location:', match.span())
    print('Start position:', match.start())
    print('End position:', match.end())
    # Output:
    # Matching data: green grassland
    # Match location: (0, 15)
    # Start position: 0
    # End position: 15
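The note above about None results can be handled with a simple check before calling any Match method. The following is a minimal sketch; the sample sentence and the variable names are chosen for illustration only.

```python
import re

text = 'The children are playing on the green grass.'

# 'green' does not appear at the start of the string, so re.match returns None.
match = re.match(r'green', text)

if match is not None:
    result = match.group()
else:
    # Calling match.group() here would raise AttributeError.
    result = None

print(result)  # None
```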
re.search()
Scans the whole string and returns the first match
re.search(pattern, string[, flags=0])
If a match is found, a Match object is returned; otherwise, None is returned.
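To illustrate the difference from re.match(), here is a short sketch that runs both functions on the same string. The sample sentence and pattern are assumptions made for the example.

```python
import re

text = 'The children are playing on the green grass.'
pattern = r'gr\w+'

# re.match only looks at the start of the string, so it finds nothing here.
at_start = re.match(pattern, text)

# re.search scans forward and stops at the first match, wherever it is.
anywhere = re.search(pattern, text)

print(at_start)          # None
print(anywhere.group())  # green
print(anywhere.span())   # (32, 37)
```

Only the first match ('green') is returned, even though 'grass' would also match the pattern.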
re.findall()
Finds all matches and returns them as a list
re.findall(pattern, string[, flags=0])
Finds every substring of the string that the regular expression matches and returns them as a list. If nothing matches, an empty list is returned.
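A brief sketch of re.findall() in use, including the empty-list case. The input strings are invented for illustration. When the pattern contains capturing groups, the list holds the group contents instead of the whole match.

```python
import re

text = 'Order 66 shipped 3 boxes on day 12.'

# Every run of digits in the string, in order of appearance.
numbers = re.findall(r'\d+', text)
print(numbers)  # ['66', '3', '12']

# No match yields an empty list, not None.
missing = re.findall(r'[A-Z]{3}', text)
print(missing)  # []

# With two capturing groups, each list item is a tuple of the groups.
pairs = re.findall(r'(\w+)=(\d+)', 'a=1, b=2')
print(pairs)  # [('a', '1'), ('b', '2')]
```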
Extension: re.finditer()
Similar to findall(), it finds every substring of the string that the regular expression matches, but returns them as an iterator of Match objects.
re.finditer(pattern, string[, flags=0])
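Because re.finditer() yields Match objects, the position of each match is available as well as its text. A minimal sketch with an invented input string:

```python
import re

text = 'cat bat rat'

# Collect the matched text and its (start, end) span for every match.
results = [(m.group(), m.span()) for m in re.finditer(r'[a-z]at', text)]

for word, span in results:
    print(word, span)
# cat (0, 3)
# bat (4, 7)
# rat (8, 11)
```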
Replace string
re.sub()
Replaces matches in the string
re.sub(pattern, repl, string[, count=0, flags=0])
Parameter | Description |
---|---|
pattern | The regular expression pattern. |
repl | The replacement string. It can also be a function. |
string | The original string to search and replace in. |
count | The maximum number of matches to replace. The default value of 0 replaces all matches. |
flags | Flags that control matching, passed as an integer (one or more flag constants combined). |
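The effect of the count parameter can be sketched as follows. The date string and the 'YYYY' placeholder are assumptions made for the example.

```python
import re

text = '2024-01-15 and 2025-02-20'

# count defaults to 0, so every four-digit run is replaced.
all_replaced = re.sub(r'\d{4}', 'YYYY', text)
print(all_replaced)  # YYYY-01-15 and YYYY-02-20

# count=1 replaces only the first match.
first_only = re.sub(r'\d{4}', 'YYYY', text, count=1)
print(first_only)  # YYYY-01-15 and 2025-02-20
```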
Using a function as the repl parameter
When repl is a function, it is called with each Match object and its return value is used as the replacement, for example:
    import re

    def double(matched):
        # Return the text captured by the named group 'string', repeated twice
        value = matched.group('string')
        return value * 2

    s = 'A snowflake fell from the sky'
    print(re.sub('(?P<string>snow)', double, s))
    # Output: A snowsnowflake fell from the sky
Split string
re.split()
Splits the string at every substring the pattern matches and returns a list
re.split(pattern, string[, maxsplit=0, flags=0])
Parameter | Description |
---|---|
pattern | The regular expression that matches the separators |
string | The string to split. |
maxsplit | The maximum number of splits; maxsplit=1 splits once. The default value of 0 means no limit. |
flags | Flags that control how the regular expression is matched, such as case sensitivity or multi-line matching. See the flag table below for details. |
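A short sketch of re.split() with and without maxsplit. The input string and the separator pattern are assumptions chosen for illustration.

```python
import re

text = 'apple, banana;cherry  date'

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r'[,;\s]+', text)
print(parts)  # ['apple', 'banana', 'cherry', 'date']

# maxsplit=1 splits once and leaves the rest of the string untouched.
first_split = re.split(r'[,;\s]+', text, maxsplit=1)
print(first_split)  # ['apple', 'banana;cherry  date']
```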
Regular expression modifier - optional flag
Flags control the matching mode. Multiple flags can be combined with the bitwise OR operator (flag1 | flag2). For example, re.I | re.M sets both the I and M flags:
Modifier | Description |
---|---|
re.I | Makes matching case-insensitive |
re.L | Performs locale-aware matching |
re.M | Multi-line matching, which affects ^ and $ |
re.S | Makes . match all characters, including line breaks |
re.U | Interprets characters according to the Unicode character set. This flag affects \w, \W, \b, \B. In Python 3 it is the default for str patterns. |
re.X | Verbose mode: whitespace and comments are allowed inside the pattern, so that regular expressions can be written in a more readable layout. |
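The flags in the table can be tried out with a sketch like the one below. The sample text, patterns, and variable names are illustrative assumptions.

```python
import re

text = 'first line\nSecond line\nTHIRD line'

# re.I: ignore case.
ignore_case = re.findall(r'second', text, re.I)
print(ignore_case)  # ['Second']

# re.M: ^ matches at the start of every line, not only the start of the string.
line_starts = re.findall(r'^\w+', text, re.M)
print(line_starts)  # ['first', 'Second', 'THIRD']

# re.S: . also matches newline characters.
without_s = re.search(r'first.*THIRD', text)
with_s = re.search(r'first.*THIRD', text, re.S)
print(without_s)       # None
print(with_s.span())   # (0, 28)

# Flags are combined with |.
combined = re.findall(r'^third', text, re.I | re.M)
print(combined)  # ['THIRD']

# re.X: whitespace and comments inside the pattern are ignored.
date_pattern = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.X)
date_groups = date_pattern.search('Released 2024-05').groups()
print(date_groups)  # ('2024', '05')
```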