Background
As a beginner, I recently came across a column of data like the one shown in the figure while doing some data analysis.
What I want to do is:
- Extract the administrative division names, e.g. Jiangsu Province, Beijing. This goal has already been achieved; see the previous article.
- Just as time_series.dt.year can be called after pd.to_datetime() to return the corresponding Series of years, I want my area_series to return the corresponding province, city, and municipal district / county. This article covers that part.
- Note: in this article, municipal districts and counties are treated as the same level, which makes the data easier to extract. If anything is wrong, please correct me.
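For reference, the pandas behaviour that the second goal imitates looks roughly like this. The sample dates are made up for illustration; only pd.to_datetime and the .dt.year accessor come from pandas itself.

```python
import pandas as pd

# Hypothetical sample data, not taken from the article's dataset.
time_series = pd.to_datetime(pd.Series(["2021-01-05", "2020-12-31"]))

# The .dt accessor exposes datetime components as a new Series.
years = time_series.dt.year
print(years.tolist())  # [2021, 2020]
```

The aim of this article is an object that offers province, city and county in the same spirit.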
If you find this article useful, please give it a like and leave a comment.
A follow would not hurt either 🤗
Analysis: writing the classes
Returning a single result
First, we write a class that works as follows:
- Printing the object displays the original string, which is done by defining the __str__ method.
- The properties province, city and county return the corresponding data, using the @property decorator.
The code is as follows:
```python
import re

# Suffix characters that mark each administrative level in a Chinese place name.
divisions = {
    'province': ['省', '自治区'],  # province, autonomous region
    'city': ['市'],                # city
    'county': ['区', '旗', '县'],  # district, banner, county
}


class ToCityOne:
    def __init__(self, word):
        self.word = word
        self.result = {}
        self.__extract()

    def __check_exist(self, area, names):
        # Return area if it contains any of the suffixes in names, else ''.
        for name in names:
            if name in area:
                return area
        return ''

    # Same as the previous article
    def __extract(self):
        # Pattern built here: r'\S+省|\S+自治区|\S+市|\S+区|\S+旗|\S+县'
        pattern = '|'.join(
            r'\S+' + char for chars in divisions.values() for char in chars
        )
        result = re.findall(pattern, self.word)
        # Walk the levels in order, consuming one match whenever it fits the level.
        count = 0
        for key in divisions:
            if count != len(result):
                test_exist = self.__check_exist(result[count], divisions[key])
                self.result[key] = test_exist
                if test_exist:
                    count += 1
            else:
                self.result[key] = ''

    @property
    def province(self):
        return self.result['province']

    @property
    def city(self):
        return self.result['city']

    @property
    def county(self):
        return self.result['county']

    def __str__(self):
        return self.word

    __repr__ = __str__
```
In this code:
- The __extract method parses the string.
- province, city and county are read-only properties created by the property decorator. See the property tutorial here.
- __str__ and __repr__ determine what is displayed when the object is printed directly.
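The two mechanisms in this list can be seen in isolation in the following minimal sketch. Label is a hypothetical toy class, not part of the article's code; it only shows that a property without a setter cannot be assigned to, and that __repr__ can reuse __str__.

```python
class Label:
    """Hypothetical toy class, used only to illustrate the mechanics."""

    def __init__(self, text):
        self._text = text

    @property
    def upper(self):
        # Computed on access; no setter is defined, so it is read-only.
        return self._text.upper()

    def __str__(self):
        return self._text

    # Show the same text in the interactive shell and inside containers.
    __repr__ = __str__


label = Label("nanjing")
print(label)        # nanjing
print(label.upper)  # NANJING

# Assigning to a property that has no setter raises AttributeError.
try:
    label.upper = "x"
except AttributeError:
    read_only = True
else:
    read_only = False
print(read_only)    # True
```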
Test:
As you can see, some results are empty strings because the input strings do not contain the corresponding level.
The results were good.
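To make the empty strings concrete, here is a small sketch of the regular-expression step on its own. It assumes the suffix lists hold the Chinese characters 省, 自治区, 市, 区, 旗 and 县, and the two sample addresses are made up for illustration rather than taken from the article's data.

```python
import re

# Assumed suffixes: province, autonomous region, city, district, banner, county.
suffixes = ["省", "自治区", "市", "区", "旗", "县"]
pattern = "|".join(r"\S+" + s for s in suffixes)

# Hypothetical sample addresses.
full = re.findall(pattern, "江苏省南京市鼓楼区")
municipality = re.findall(pattern, "北京市朝阳区")

print(full)          # ['江苏省', '南京市', '鼓楼区']
print(municipality)  # ['北京市', '朝阳区']
```

The second address yields no province-level match, which is why the province of such a string comes back empty.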
Returning a Series
ToCityOne works well, but it only extracts information from a single string, so it has to be combined with the map function to process a whole column.
To process a whole Series and get a new Series back from each attribute, wrap the map call in a class.
The code is as follows:
```python
import pandas as pd


# Relies on divisions and ToCityOne defined above.
class ToCity(pd.Series):
    def __init__(self, area_series):
        # Inherit from pandas.Series so that the object itself behaves like a Series
        super().__init__(area_series)
        self.series = area_series
        self.result = {}
        self.__process()

    def __process(self):
        # Parse every string into a ToCityOne object
        result = self.series.map(ToCityOne)
        for key in divisions:
            # Collect the attribute named key from every parsed object
            self.result[key] = result.map(lambda x, key=key: getattr(x, key))

    @property
    def province(self):
        return self.result['province']

    @property
    def city(self):
        return self.result['city']

    @property
    def county(self):
        return self.result['county']
```
Test:
It looks good. The goal has been achieved!
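One detail in ToCity worth isolating is the lookup of an attribute whose name is held in a variable. The built-in getattr does this directly, without building and evaluating a code string. Place below is a hypothetical stand-in for the parsed objects, and the sample values are made up for illustration.

```python
class Place:
    """Hypothetical stand-in for a parsed address object."""

    def __init__(self, province, city, county):
        self.province = province
        self.city = city
        self.county = county


places = [Place("江苏省", "南京市", "鼓楼区"), Place("", "北京市", "朝阳区")]

# Build one list per attribute name, looking each attribute up by name.
columns = {
    key: [getattr(place, key) for place in places]
    for key in ("province", "city", "county")
}
print(columns["city"])  # ['南京市', '北京市']
```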
Closing
If you want to learn Python together, you can message me privately to join the group.
That is all I wanted to share. My knowledge is still limited, so there are bound to be shortcomings; corrections are welcome.
If you have any questions, you can also leave a message in the comment area.