Fundamentals of python grammar • python procedural programming and functional programming • python skills • a wonderful use of python decorator

Well, I know it's midnight... But I still think it's worth spending half an hour to share this latest idea and get straight to the point

Let's simulate a scenario. You need to grab a page, and then there are many URLs on this page, and after entering these sub URLs, there is data to grab. Simply put, let's look at the three layers, and our code is as follows:

def func_top(url):
    data_dict= {}
 
    #Get sub url on page
    sub_urls = xxxx
 
    data_list = []
    for it in sub_urls:
        data_list.append(func_sub(it))
 
    data_dict['data'] = data_list
 
    return data_dict
 
def func_sub(url):
    data_dict= {}
 
    #Get sub url on page
    bottom_urls = xxxx
 
    data_list = []
    for it in bottom_urls:
        data_list.append(func_bottom(it))
 
    data_dict['data'] = data_list
 
    return data_dict
 
def func_bottom(url):
    #get data
    data = xxxx
    return data

func_top is the processing function of the upper page, func_sub is the processing function of the sub page, func_bottom is the processing function of the deepest page, func_top will traverse and call func after getting the sub page url_ sub,func_ So is sub.

Under normal circumstances, this really meets the demand, but the website you want to capture may be extremely unstable and often can't be linked, resulting in the lack of data.

So you have two choices at this time:

Stop when you encounter an error, and then run again from the broken position
Continue when you encounter an error, but run it again later. At this time, you don't want to pull the existing data from the website again, but only the data that you haven't got
The first scheme is basically impossible to implement, because if the URLs of other websites are adjusted in order, your recorded location will be invalid. Then there is only the second solution. To put it bluntly, it is to cache the obtained data and directly get it from the cache when necessary. Finally, if your time is not very tight and you want to improve quickly, the most important thing is not afraid of hardship. It is recommended that you can price @ 762459510. That's really good. Many people make rapid progress and need you to be not afraid of hardship! You can add it and have a look~

OK, the goal already exists. How can we achieve it?

If it is in C + +, this is a very troublesome thing, and the code written must be very ugly. However, fortunately, we use python, which has decorators for functions.

Therefore, there are implementation schemes:

Define a decorator. If you get data before, you can directly get the data of the cache; If it is not retrieved before, it is pulled from the website and stored in the cache

The code is as follows:

import os
import hashlib
 
def deco_args_recent_cache(category='dumps'):
    '''
    Decorator, return to the latest cache Data
    '''
    def deco_recent_cache(func):
        def func_wrapper(*args, **kargs):
            sig = _mk_cache_sig(*args, **kargs)
            data = _get_recent_cache(category, func.__name__, sig)
            if data is not None:
                return data
 
            data = func(*args, **kargs)
            if data is not None:
                _set_recent_cache(category, func.__name__, sig, data)
            return data
 
        return func_wrapper
 
    return deco_recent_cache
 
def _mk_cache_sig(*args, **kargs):
    '''
    Generate unique identification by passing in parameters
    '''
    src_data = repr(args) + repr(kargs)
    m = hashlib.md5(src_data)
    sig = m.hexdigest()
    return sig
 
def _get_recent_cache(category, func_name, sig):
    full_file_path = '%s/%s/%s' % (category, func_name, sig)
    if os.path.isfile(full_file_path):
        return eval(file(full_file_path,'r').read())
    else:
        return None
 
def _set_recent_cache(category, func_name, sig, data):
    full_dir_path = '%s/%s' % (category, func_name)
    if not os.path.isdir(full_dir_path):
        os.makedirs(full_dir_path)
 
    full_file_path = '%s/%s/%s' % (category, func_name, sig)
    f = file(full_file_path, 'w+')
    f.write(repr(data))
    f.close()

Then, we just need to be in each func_top,func_sub,func_bottom plus Deco_ args_ recent_ Just cache this decorator~~

Done! The biggest advantage of this is that each layer of top, sub and bottom will dump data, so for example, after the data of a sub layer is dumped, it will not go to the corresponding bottom layer at all, reducing a lot of overhead!

OK, that's it ~ life is short, I use python!

Keywords: Python Go Programming Pycharm Programmer

Added by Aaron_Escobar on Fri, 08 Oct 2021 12:23:57 +0300