Python: rewriting a method to extract administrative divisions from a string

Background

As a beginner, I recently came across a column of data like this during a data analysis, as shown in the figure.

What I want to do is:

  1. Extract the administrative divisions and their names, e.g. Jiangsu Province, Beijing. This goal was achieved in a previous article.
  2. Just as, after pd.to_datetime(), you can call time_series.dt.year to get the corresponding Series of years, I want my area_series to return the corresponding province, city, and municipal district / county Series. That is what this article covers.
  3. Note: in this article, municipal districts and counties are treated as the same level, which makes extraction easier. If this is wrong, please correct me.
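For reference, here is the pandas behavior the second goal imitates: pd.to_datetime() turns a column of date strings into datetime values, and the .dt accessor then returns components such as .year as a new Series. The dates below are made up for illustration.

```python
import pandas as pd

# Made-up sample dates, for illustration only
time_series = pd.to_datetime(pd.Series(['2021-10-30', '2020-01-15']))

# .dt.year returns a new Series of years, aligned with the original index
years = time_series.dt.year
print(years.tolist())  # [2021, 2020]
```

The goal is the same shape for addresses: one column in, one Series per administrative level out.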

If you think this article is well written, please like it and leave a comment.
A follow would be welcome too 🤗

Analysis: writing the class

Returning a single result

First, we write a class that works as follows:

  1. Printing the object displays the original string; this is done by defining the __str__ method.
  2. The province, city and county properties return the corresponding data; they are created with the @property decorator.
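A minimal sketch of the two mechanisms in this list, using a hypothetical Place class that is not part of the article's code (here the caller passes the province in, instead of parsing it): __str__ (and __repr__) decide what printing the object shows, and @property turns a method into a read-only attribute.

```python
class Place:
    """Hypothetical example class, not the article's ToCityOne."""

    def __init__(self, word, province):
        self.word = word
        self.result = {'province': province}

    @property
    def province(self):
        # Read as p.province (no parentheses); with no setter defined,
        # assigning p.province = ... raises AttributeError
        return self.result['province']

    def __str__(self):
        # print(p) shows the original string
        return self.word

    __repr__ = __str__


p = Place('江苏省南京市', '江苏省')
print(p)           # 江苏省南京市
print(p.province)  # 江苏省
```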

The code is as follows:

import re

# Suffixes that mark each level. The address strings are Chinese, so the suffixes are Chinese characters:
# 省 province, 自治区 autonomous region, 市 city, 区 district, 旗 banner, 县 county
divisions = {
    'province': ['省', '自治区'],
    'city': ['市'],
    'county': ['区', '旗', '县'],
}


class ToCityOne:
    def __init__(self, word):
        self.word = word
        self.result = {}
        # Parse once, when the object is created
        self.__extract()

    def __check_exist(self, area, names):
        # Return the piece if it contains one of this level's suffixes, else ''
        for name in names:
            if name in area:
                return area
        return ''

    # Same as the previous article
    def __extract(self):
        # r'\S+省|\S+自治区|\S+市|\S+区|\S+旗|\S+县'
        pattern = '|'.join('|'.join(r'\S+' + char for char in chars) for chars in divisions.values())
        result = re.findall(pattern, self.word)

        count = 0

        for key in divisions:

            if count != len(result):
                test_exist = self.__check_exist(result[count], divisions[key])
                self.result[key] = test_exist

                # Move on to the next piece only if this level matched
                if test_exist:
                    count += 1
            else:
                # No pieces left, so this level is absent
                self.result[key] = ''

    @property
    def province(self):
        return self.result['province']

    @property
    def city(self):
        return self.result['city']

    @property
    def county(self):
        return self.result['county']

    def __str__(self):
        return self.word

    __repr__ = __str__

Here:

  • __extract parses the string.
  • province, city and county are read-only properties created with the @property decorator. See a property tutorial for details.
  • __str__ and __repr__ control what is displayed when the object is printed or shown directly.


As you can see, some results are empty strings, because the input string does not contain that level of division.
The results look good.
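To make the parsing step easier to follow, here is the same idea as a plain function, split_divisions, which is a hypothetical name rather than part of the article's code. It assumes the source strings are Chinese, so the suffixes are 省 (province), 自治区 (autonomous region), 市 (city), 区 (district), 旗 (banner) and 县 (county). findall cuts the string into pieces, and each level takes the next piece only if that piece carries one of its suffixes, which is why missing levels end up as empty strings.

```python
import re

# Assumed suffix table for Chinese address strings
DIVISIONS = {
    'province': ['省', '自治区'],
    'city': ['市'],
    'county': ['区', '旗', '县'],
}

# One alternative per suffix: \S+省|\S+自治区|\S+市|\S+区|\S+旗|\S+县
PATTERN = '|'.join(r'\S+' + s for suffixes in DIVISIONS.values() for s in suffixes)


def split_divisions(word):
    """Hypothetical helper: the parsing step as a plain function."""
    pieces = re.findall(PATTERN, word)
    result, i = {}, 0
    for level, suffixes in DIVISIONS.items():
        # Assign the next piece to this level only if it contains one of its suffixes
        if i < len(pieces) and any(s in pieces[i] for s in suffixes):
            result[level] = pieces[i]
            i += 1
        else:
            result[level] = ''
    return result


print(split_divisions('江苏省南京市鼓楼区'))
# {'province': '江苏省', 'city': '南京市', 'county': '鼓楼区'}
print(split_divisions('北京市朝阳区'))
# {'province': '', 'city': '北京市', 'county': '朝阳区'}
```

Because the pieces are consumed in order, this relies on the string listing levels from largest to smallest, as Chinese addresses normally do.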

Returning Series results

ToCityOne works well, but it only parses one string at a time, so handling a whole column means combining it with the map function.
To process a whole Series and get a new Series back directly through an attribute, we wrap that map call in a class.
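Before wrapping anything in a class, the map-based approach looks roughly like this: map each string to a parsed object, then map again to pull out one attribute. This is a sketch under assumptions: parse_address is a hypothetical stand-in for ToCityOne, and it uses a single anchored regex with optional named groups instead of the article's findall loop.

```python
import re
from types import SimpleNamespace

import pandas as pd

# Hypothetical stand-in for ToCityOne: optional province, city, county parts in order
ADDRESS_RE = re.compile(
    r'(?P<province>\S+?(?:省|自治区))?(?P<city>\S+?市)?(?P<county>\S+?(?:区|旗|县))?$'
)


def parse_address(word):
    m = ADDRESS_RE.match(word)
    groups = m.groupdict(default='') if m else {}
    # Missing levels become empty strings, as in the article
    return SimpleNamespace(
        province=groups.get('province', ''),
        city=groups.get('city', ''),
        county=groups.get('county', ''),
    )


area_series = pd.Series(['江苏省南京市鼓楼区', '北京市朝阳区'])
parsed = area_series.map(parse_address)          # step 1: string -> object
provinces = parsed.map(lambda x: x.province)     # step 2: object -> one attribute
print(provinces.tolist())  # ['江苏省', '']
```

Repeating step 2 for every level and every column is the boilerplate a wrapper class removes.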

The code is as follows:

import pandas as pd


class ToCity(pd.Series):
    def __init__(self, area_series):
        # In order to directly return the Series object, make this class inherit from pandas.Series
        super().__init__(area_series)
        self.series = area_series
        self.result = {}
        # Parse the whole column once, when the object is created
        self.__process()

    def __process(self):
        # Gets the object after parsing each string
        result = self.series.map(ToCityOne)
        for key in divisions:
            # Get the corresponding result Series (getattr instead of eval)
            s = result.map(lambda x: getattr(x, key))
            self.result[key] = s

    @property
    def province(self):
        return self.result['province']

    @property
    def city(self):
        return self.result['city']

    @property
    def county(self):
        return self.result['county']


It looks good: the goal has been achieved!
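Since the goal is to imitate the .dt accessor, pandas also provides a hook for registering a custom accessor: pandas.api.extensions.register_series_accessor. The sketch below is an alternative to subclassing pd.Series, not the author's method. The accessor name `area` and the anchored regex are assumptions, and Series.str.extract parses the whole column in one call instead of building one object per row.

```python
import pandas as pd

# Assumed pattern: optional province, city and county parts, in that order
AREA_PATTERN = r'^(?P<province>\S+?(?:省|自治区))?(?P<city>\S+?市)?(?P<county>\S+?(?:区|旗|县))?$'


@pd.api.extensions.register_series_accessor('area')  # hypothetical accessor name
class AreaAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the Series the accessor is used on; named groups become columns
        self._parts = pandas_obj.str.extract(AREA_PATTERN).fillna('')

    @property
    def province(self):
        return self._parts['province']

    @property
    def city(self):
        return self._parts['city']

    @property
    def county(self):
        return self._parts['county']


area_series = pd.Series(['江苏省南京市鼓楼区', '北京市朝阳区'])
print(area_series.area.province.tolist())  # ['江苏省', '']
print(area_series.area.county.tolist())    # ['鼓楼区', '朝阳区']
```

With an accessor, any ordinary Series gains `.area.province` without being converted to a subclass first, which is closer to how `.dt.year` is used.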


If you want to learn Python together, feel free to send me a private message to join the group.

That is all I want to share. My knowledge is still limited, so there may be shortcomings; corrections are welcome.
If you have any questions, you can also leave a comment.

Keywords: Python Data Analysis

Added by Zyxist on Sat, 30 Oct 2021 17:39:58 +0300