Sharing some clever ways to use Python regular expressions

Reposted from: Micro Reading (https://www.weidianyuedu.com)

Introduction

A regular expression captures a rule found in strings and expresses it with "abstract" symbols. As an analogy, take the number sequence 2, 5, 10, 17, 26, 37. To calculate its seventh value, you first have to find the rule behind the sequence and describe it with the expression n^2 + 1, which gives 7^2 + 1 = 50 for the seventh value. Matching strings works the same way: discovering the rule is the first step. This article uses regular expressions to query, replace, and split strings.
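Python can check the sequence analogy. The sketch below is illustrative only, and its variable names are hypothetical. It extracts the numbers from a text version of the sequence with re.findall (one of the functions discussed below), confirms that every term equals n^2 + 1, and then computes the seventh term.

```python
import re

# The sequence from the introduction, written as ordinary text
text = "2, 5, 10, 17, 26, 37"

# \d+ matches each run of digits; findall returns them as strings
values = [int(v) for v in re.findall(r"\d+", text)]

# Check the rule n^2 + 1 against every known term, counting n from 1
rule_holds = all(v == n ** 2 + 1 for n, v in enumerate(values, start=1))

# Apply the rule at n = 7 to get the next term
seventh = 7 ** 2 + 1

print(values)      # [2, 5, 10, 17, 26, 37]
print(rule_holds)  # True
print(seventh)     # 50
```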

Common regular expression symbols

Before moving on to string matching, let's first look at the common regular expression symbols, shown in the following table:
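You can try several of the most common symbols directly with re.findall. The sketch below runs on a made-up sample sentence. It covers only a representative selection of symbols, so read it as an illustration and not as a copy of the table.

```python
import re

# A made-up sample string used only for illustration
sample = "Order #42 shipped on 2018-01-01 to Bob; order #7 is pending."

# \d matches a digit, and + means "one or more of the preceding item"
numbers = re.findall(r"\d+", sample)                    # ['42', '2018', '01', '01', '7']

# {n} repeats the preceding item exactly n times: a YYYY-MM-DD date
dates = re.findall(r"\d{4}-\d{2}-\d{2}", sample)        # ['2018-01-01']

# [...] is a character class and * means "zero or more": capitalised words
capitalised = re.findall(r"[A-Z][a-z]*", sample)        # ['Order', 'Bob']

# ^ anchors the match at the start of the string; \w matches a word character
first_word = re.findall(r"^\w+", sample)                # ['Order']

# $ anchors at the end of the string; \. is an escaped, literal dot
last_word = re.findall(r"\w+\.$", sample)               # ['pending.']

# | means "either ... or ..."
states = re.findall(r"shipped|pending", sample)         # ['shipped', 'pending']

# ? makes the preceding item optional
spellings = re.findall(r"colou?r", "color or colour")   # ['color', 'colour']

# . matches any single character except a newline
names = re.findall(r"B.b", sample)                      # ['Bob']

for result in (numbers, dates, capitalised, first_word, last_word, states, spellings, names):
    print(result)
```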

If you are comfortable with the symbols in the table above, you should be able to handle most string-processing tasks with ease. As mentioned earlier, this article uses regular expressions to query, replace, and split strings. All of these operations require importing the re module and using the functions below.

Matching and querying strings

The findall function in the re module scans the given string, collects every substring that matches the regular expression, and returns them as a list. Its signature and parameters are as follows:

findall(pattern, string, flags=0)

pattern: the regular expression to match.
string: the string to be processed.
flags: the matching mode. The most common values are re.I, re.M, re.S and re.X. You can combine several with |, as in re.I | re.M.
re.I makes the regular expression case-insensitive.
re.M (multiline) makes ^ and $ match at the start and end of every line, not only at the start and end of the whole string.
re.S makes the symbol . match any character, including the newline \n.
re.X (verbose) lets you write a regular expression in a more readable form. The pattern can span several lines, whitespace inside it is ignored, and it can contain comments.
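Each flag's effect is easiest to see side by side. The following small sketch runs on a made-up three-line string. The strings and variable names are for illustration only.

```python
import re

# A made-up string with three lines
text = "Python\npython\nPYTHON"

# Case-sensitive by default: only the exact spelling matches
plain = re.findall(r"python", text)                         # ['python']
# re.I: ignore case
ignore_case = re.findall(r"python", text, flags=re.I)       # ['Python', 'python', 'PYTHON']

# Without re.M, ^ only matches at the very start of the string
first_line = re.findall(r"^\w+", text)                      # ['Python']
# re.M: ^ also matches right after every newline
every_line = re.findall(r"^\w+", text, flags=re.M)          # ['Python', 'python', 'PYTHON']

# Without re.S, . stops at a newline, so nothing spans all three lines
no_dotall = re.findall(r"Python.*PYTHON", text)             # []
# re.S: . also matches \n
dotall = re.findall(r"Python.*PYTHON", text, flags=re.S)    # ['Python\npython\nPYTHON']

# re.X: whitespace in the pattern is ignored and # starts a comment
verbose_pattern = r"""
    \d{4}    # year
    -\d{2}   # month
    -\d{2}   # day
"""
dates = re.findall(verbose_pattern, "2018-01-01 and 2018-01-02", flags=re.X)

# Flags combined with |: every whole line, in any case
combined = re.findall(r"^python$", text, flags=re.I | re.M)

print(plain, ignore_case, first_line, every_line, sep="\n")
print(no_dotall, dotall, dates, combined, sep="\n")
```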

Matching and replacing strings

The sub function in the re module performs replacement, much like the string method replace. The difference is that a regular expression locates the text to replace, and sub replaces each match with repl. Its signature and parameters are as follows:

sub(pattern, repl, string, count=0, flags=0)

pattern: the same as pattern in findall.
repl: the new value to substitute for each match.
string: the same as string in findall.
count: the maximum number of replacements. The default of 0 replaces every match.
flags: the same as flags in findall.
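Here is a brief sketch of count and flags in action, on made-up sample text:

```python
import re

text = "Sunny, sunny, SUNNY"

# count=0 (the default) replaces every match; re.I ignores case
all_replaced = re.sub(r"sunny", "rain", text, flags=re.I)         # 'rain, rain, rain'

# count=1 stops after the first replacement
first_only = re.sub(r"sunny", "rain", text, count=1, flags=re.I)  # 'rain, sunny, SUNNY'

# Unlike str.replace, the target is a pattern: every run of digits becomes "#"
masked = re.sub(r"\d+", "#", "Room 101, floor 7")                 # 'Room #, floor #'

print(all_replaced)
print(first_only)
print(masked)
```

This code passes count and flags as keyword arguments. Python 3.13 deprecated passing them positionally.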

Matching and splitting strings

The split function in the re module splits a string wherever the given regular expression matches, much like the string method split. Its signature and parameters are as follows:

split(pattern, string, maxsplit=0, flags=0)

pattern: the same as pattern in findall.
string: the same as string in findall.
maxsplit: the maximum number of splits. The default of 0 splits at every match.
flags: the same as flags in findall.
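The following short sketch shows maxsplit and flags on a made-up record string:

```python
import re

record = "2018-01-01|Sunny|Light pollution"

# Split on "-" or "|"; inside [...] the "|" is an ordinary character
all_parts = re.split(r"[-|]", record)            # ['2018', '01', '01', 'Sunny', 'Light pollution']

# maxsplit=2 performs at most two splits and keeps the remainder intact
limited = re.split(r"[-|]", record, maxsplit=2)  # ['2018', '01', '01|Sunny|Light pollution']

# flags work as in findall: split on "and" in any case, absorbing nearby spaces
items = re.split(r"\s*and\s*", "salt AND pepper and oil", flags=re.I)  # ['salt', 'pepper', 'oil']

print(all_parts)
print(limited)
print(items)
```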

Practical examples

Once you understand what the three functions above do, the following examples put them into practice:

# Import the re module for regular expressions
import re

# Extract every weather condition from string8
string8 = '{ymd:"2018-01-01",tianqi:"Sunny",aqiInfo:"Light pollution"},{ymd:"2018-01-02",tianqi:"Overcast~light rain",aqiInfo:"excellent"},{ymd:"2018-01-03",tianqi:"light rain~moderate rain",aqiInfo:"excellent"},{ymd:"2018-01-04",tianqi:"moderate rain~light rain",aqiInfo:"excellent"}'
# Use findall with a regular expression
print(re.findall(r'tianqi:"(.*?)"', string8))
# ['Sunny', 'Overcast~light rain', 'light rain~moderate rain', 'moderate rain~light rain']

# Extract every word in string9 that contains the letter o
string9 = "Together, we discovered that a free market only thrives when there are rules to ensure competition and fair play, Our celebration of initiative and enterprise"
# Use findall with a regular expression; re.I makes the match case-insensitive
print(re.findall(r"\w*o\w*", string9, flags=re.I))
# ['Together', 'discovered', 'only', 'to', 'competition', 'Our', 'celebration', 'of']

# Delete the punctuation marks and digits in string10
string10 = "It is reported that the four steam condensers shipped this time belong to the international thermonuclear fusion experimental reactor (ITER). The nuclear secondary pressure equipment of the project has successively completed acceptance tests such as pressure test, vacuum test, helium leak detection test, jack test, lifting lug load test and stacking test."
# Use sub with a regular expression
print(re.sub(r"[,.()0-9]", "", string10))
# It is reported that the four steam condensers shipped this time belong to the international thermonuclear fusion experimental reactor ITER The nuclear secondary pressure equipment of the project has successively completed acceptance tests such as pressure test vacuum test helium leak detection test jack test lifting lug load test and stacking test

# Split string11 into its individual parts
string11 = "2 rooms 2 halls | 101.62 sqm | Low floor/7 floors | Facing south \n Shanghai Future - Pudong - Jinyang - Built in 2005"
# Use split with a regular expression: break on "-", "|" or a newline
parts = re.split(r"[-|\n]", string11)
print(parts)
# ['2 rooms 2 halls ', ' 101.62 sqm ', ' Low floor/7 floors ', ' Facing south ', ' Shanghai Future ', ' Pudong ', ' Jinyang ', ' Built in 2005']
# Clean up the split results by stripping the surrounding whitespace
parts_stripped = [p.strip() for p in parts]
print(parts_stripped)
# ['2 rooms 2 halls', '101.62 sqm', 'Low floor/7 floors', 'Facing south', 'Shanghai Future', 'Pudong', 'Jinyang', 'Built in 2005']

The results show that the first example extracts the target data with the regular expression tianqi:"(.*?)". Without the parentheses, findall would return whole matches such as tianqi:"Sunny" and tianqi:"Overcast~light rain". The parentheses form a group, and findall then returns only the text captured by that group. The ? after .* makes the match non-greedy, so each match stops at the first closing quotation mark and does not run on to the last one in the string.

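The pattern in the second example, \w*o\w*, contains no parentheses, so findall returns each complete match, that is, every word that contains the letter o. Because the re.I flag makes the match case-insensitive, "Our" is included. Wrapping the whole pattern in parentheses would return the same result, because the single group would then cover the entire match. In short, findall returns a list of every match. If the pattern contains parentheses, it returns only the text matched inside them.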
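The following sketch uses a shortened, made-up version of string8 to compare four patterns: one without a group, one with a single group, one with two groups, and one with a greedy .* in place of the non-greedy .*?. With two or more groups, findall returns one tuple per match.

```python
import re

# A shortened, made-up version of string8
data = '{ymd:"2018-01-01",tianqi:"Sunny"},{ymd:"2018-01-02",tianqi:"Overcast"}'

# No group: each complete match is returned
whole = re.findall(r'tianqi:".*?"', data)
# ['tianqi:"Sunny"', 'tianqi:"Overcast"']

# One group: only the text inside the parentheses is returned
inner = re.findall(r'tianqi:"(.*?)"', data)
# ['Sunny', 'Overcast']

# Parentheses around the whole pattern give the same result as no group
wrapped = re.findall(r'(tianqi:".*?")', data)

# Two groups: each match becomes a tuple of the captured texts
pairs = re.findall(r'ymd:"(.*?)",tianqi:"(.*?)"', data)
# [('2018-01-01', 'Sunny'), ('2018-01-02', 'Overcast')]

# Greedy .* runs on to the last quotation mark it can reach
greedy = re.findall(r'tianqi:"(.*)"', data)
# one long match spanning both records

print(whole, inner, wrapped, pairs, greedy, sep="\n")
```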

The third example uses sub to replace every punctuation mark with an empty string, which effectively deletes them.
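Listing punctuation marks by hand in a character class is easy to get wrong. One common alternative, sketched here on made-up text, builds the class from Python's string.punctuation, which covers ASCII punctuation only. It also calls re.escape so that characters such as ] and - are treated literally inside the brackets.

```python
import re
import string

text = "Pressure test, vacuum test (ITER); done!"

# string.punctuation holds the ASCII punctuation characters;
# re.escape backslash-escapes the ones that are special in a pattern
punct_class = "[" + re.escape(string.punctuation) + "]"

cleaned = re.sub(punct_class, "", text)
print(cleaned)  # Pressure test vacuum test ITER done
```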

The fourth example splits a string. Splitting directly on the pattern [-|\n] (a hyphen, a vertical bar, or a newline) leaves whitespace at the edges of the resulting elements, such as the trailing space after "2 rooms 2 halls". To remove this leading and trailing whitespace, a list comprehension applies the string method strip to every element.
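As a variation on the fourth example, the pattern itself can absorb the whitespace around the separators, which removes the need for a separate strip step. This sketch is an alternative, not the approach shown above. It runs on listing text modelled on string11. It trims only whitespace next to a separator, so whitespace at the very start or end of the whole string would remain.

```python
import re

listing = "2 rooms 2 halls | 101.62 sqm | Low floor/7 floors | Facing south \n Shanghai Future - Pudong - Jinyang - Built in 2005"

# \s* on both sides swallows the spaces around each "-", "|" or newline separator
fields = re.split(r"\s*[-|\n]\s*", listing)
print(fields)
# ['2 rooms 2 halls', '101.62 sqm', 'Low floor/7 floors', 'Facing south', 'Shanghai Future', 'Pudong', 'Jinyang', 'Built in 2005']
```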

 

Added by urgido on Tue, 08 Feb 2022 13:54:19 +0200