Advanced Python language

Important knowledge points

  • Usage of comprehensions (list/set/dict derivations)

    prices = {
        'AAPL': 191.88,
        'GOOG': 1186.96,
        'IBM': 149.24,
        'ORCL': 48.44,
        'ACN': 166.89,
        'FB': 208.09,
        'SYMC': 21.29
    }
    # Build a new dictionary containing only the stocks priced above 100
    prices2 = {key: value for key, value in prices.items() if value > 100}
    print(prices2)
    

    Description: comprehensions (derivations) can be used to generate lists, sets, and dictionaries.
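
    The three comprehension forms can be sketched side by side (a minimal illustration, not part of the original examples):

```python
nums = [1, 2, 3, 4, 5]
squares_list = [n * n for n in nums]          # list comprehension
even_set = {n for n in nums if n % 2 == 0}    # set comprehension
square_map = {n: n * n for n in nums}         # dict comprehension
print(squares_list)  # [1, 4, 9, 16, 25]
print(even_set)      # {2, 4}
print(square_map)    # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
```

    Note that (n * n for n in nums) would be a generator expression, not a "tuple comprehension".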

  • The pitfall of nested lists

    names = ['Guan Yu', 'Fei Zhang', 'Zhao Yun', 'ma chao', 'Huang Zhong']
    courses = ['language', 'mathematics', 'English']
    # Enter the scores of five students in three courses
    # error - reference resources http://pythontutor.com/visualize.html#mode=edit
    # scores = [[None] * len(courses)] * len(names)
    scores = [[None] * len(courses) for _ in range(len(names))]
    for row, name in enumerate(names):
        for col, course in enumerate(courses):
            scores[row][col] = float(input(f"Enter {name}'s {course} score: "))
    print(scores)
    


  • heapq module (heap sort)

    """
    Find the largest or smallest from the list N Elements
     Heap structure(Big root pile/Small root pile)
    """
    import heapq
    
    list1 = [34, 25, 12, 99, 87, 63, 58, 78, 88, 92]
    list2 = [
        {'name': 'IBM', 'shares': 100, 'price': 91.1},
        {'name': 'AAPL', 'shares': 50, 'price': 543.22},
        {'name': 'FB', 'shares': 200, 'price': 21.09},
        {'name': 'HPQ', 'shares': 35, 'price': 31.75},
        {'name': 'YHOO', 'shares': 45, 'price': 16.35},
        {'name': 'ACME', 'shares': 75, 'price': 115.65}
    ]
    print(heapq.nlargest(3, list1))
    print(heapq.nsmallest(3, list1))
    print(heapq.nlargest(2, list2, key=lambda x: x['price']))
    print(heapq.nlargest(2, list2, key=lambda x: x['shares']))
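
    Beyond nlargest/nsmallest, heapq maintains a min heap directly on a plain list; a minimal sketch of the core operations (not from the original examples):

```python
import heapq

data = [34, 25, 12, 99, 87]
heapq.heapify(data)            # rearrange the list into heap order in O(n)
heapq.heappush(data, 5)        # push preserves the heap invariant
smallest = heapq.heappop(data) # pop always returns the smallest element
print(smallest)                # 5
```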
    
  • itertools module

    """
    Iteration tool module
    """
    import itertools
    
    # Generate all permutations of ABCD
    itertools.permutations('ABCD')
    # Generate all 3-element combinations of ABCDE
    itertools.combinations('ABCDE', 3)
    # Generate the Cartesian product of ABCD and 123
    itertools.product('ABCD', '123')
    # Generate an infinite cyclic sequence of A, B, C
    itertools.cycle(('A', 'B', 'C'))
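
    Note that the calls above return lazy iterator objects; nothing is produced until they are consumed, e.g. with list() or a for loop. A small sketch:

```python
import itertools

print(list(itertools.permutations('AB')))      # [('A', 'B'), ('B', 'A')]
print(list(itertools.combinations('ABC', 2)))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(list(itertools.product('AB', '12')))     # [('A', '1'), ('A', '2'), ('B', '1'), ('B', '2')]
# cycle() is infinite, so take only a few elements with islice
print(list(itertools.islice(itertools.cycle('ABC'), 5)))  # ['A', 'B', 'C', 'A', 'B']
```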
    
  • collections module

    Common tools:

    • namedtuple: named tuple; a class factory that takes a type name and a list of field names and creates a class.
    • deque: double-ended queue, an alternative to list. Python's list is backed by an array, while deque is backed by a doubly linked list, so deque performs better when elements are added or removed at both ends, with O(1) amortized time complexity.
    • Counter: a subclass of dict whose keys are elements and whose values are their counts; its most_common() method returns the most frequent elements. (Arguably, by the composite aggregate reuse principle, the relationship between Counter and dict would be better designed as association rather than inheritance.)
    • OrderedDict: a subclass of dict that records the insertion order of key-value pairs; it behaves like both a dictionary and a linked list.
    • defaultdict: like dict, but a default factory function supplies the value for a missing key; this is more efficient than the dictionary's setdefault() method.
    """
    Find the most frequent element in the sequence
    """
    from collections import Counter
    
    words = [
        'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
        'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around',
        'the', 'eyes', "don't", 'look', 'around', 'the', 'eyes',
        'look', 'into', 'my', 'eyes', "you're", 'under'
    ]
    counter = Counter(words)
    print(counter.most_common(3))
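
    The defaultdict described above deserves its own minimal sketch (the grouping example here is illustrative, not from the original):

```python
from collections import defaultdict

# Group words by their first letter; the list factory supplies missing values.
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)   # no KeyError: list() is called for a new key
print(dict(groups))  # {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```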
    

Data structure and algorithm

  • Algorithm: methods and steps to solve problems

  • Evaluate the quality of the algorithm: asymptotic time complexity and asymptotic space complexity.

  • Big-O notation for asymptotic time complexity:

    • O(1) - constant time complexity - Bloom filter / hash lookup
    • O(log N) - logarithmic time complexity - binary search (half-interval search)
    • O(N) - linear time complexity - sequential search / counting sort
    • O(N log N) - log-linear time complexity - advanced sorting algorithms (merge sort, quick sort)
    • O(N²) - quadratic time complexity - simple sorting algorithms (selection sort, insertion sort, bubble sort)
    • O(N³) - cubic time complexity - Floyd's algorithm / matrix multiplication
    • O(2^N) - exponential time complexity - Tower of Hanoi
    • O(N!) - factorial time complexity - traveling salesman problem (NP-complete)

    (Figures: algorithm complexity comparison charts - ./res/algorithm_complexity_1.png and ./res/algorithm_complexity_2.png)

  • Sorting algorithms (selection, bubble, and merge) and search algorithms (sequential and binary)

    def select_sort(items, comp=lambda x, y: x < y):
        """Simple selection sort"""
        items = items[:]
        for i in range(len(items) - 1):
            min_index = i
            for j in range(i + 1, len(items)):
                if comp(items[j], items[min_index]):
                    min_index = j
            items[i], items[min_index] = items[min_index], items[i]
        return items
    
    def bubble_sort(items, comp=lambda x, y: x > y):
        """Bubble sorting"""
        items = items[:]
        for i in range(len(items) - 1):
            swapped = False
            for j in range(len(items) - 1 - i):
                if comp(items[j], items[j + 1]):
                    items[j], items[j + 1] = items[j + 1], items[j]
                    swapped = True
            if not swapped:
                break
        return items
    
    def cocktail_sort(items, comp=lambda x, y: x > y):
        """Cocktail (shaker) sort - an upgraded bubble sort"""
        items = items[:]
        for i in range(len(items) - 1):
            swapped = False
            for j in range(len(items) - 1 - i):
                if comp(items[j], items[j + 1]):
                    items[j], items[j + 1] = items[j + 1], items[j]
                    swapped = True
            if swapped:
                swapped = False
                for j in range(len(items) - 2 - i, i, -1):
                    if comp(items[j - 1], items[j]):
                        items[j], items[j - 1] = items[j - 1], items[j]
                        swapped = True
            if not swapped:
                break
        return items
    
    def merge(items1, items2, comp=lambda x, y: x < y):
        """merge(Merge two ordered lists into one ordered list)"""
        items = []
        index1, index2 = 0, 0
        while index1 < len(items1) and index2 < len(items2):
            if comp(items1[index1], items2[index2]):
                items.append(items1[index1])
                index1 += 1
            else:
                items.append(items2[index2])
                index2 += 1
        items += items1[index1:]
        items += items2[index2:]
        return items
    
    
    def merge_sort(items, comp=lambda x, y: x < y):
        return _merge_sort(list(items), comp)
    
    
    def _merge_sort(items, comp):
        """Merge sort"""
        if len(items) < 2:
            return items
        mid = len(items) // 2
        left = _merge_sort(items[:mid], comp)
        right = _merge_sort(items[mid:], comp)
        return merge(left, right, comp)
    
    def seq_search(items, key):
        """Sequential search"""
        for index, item in enumerate(items):
            if item == key:
                return index
        return -1
    
    def bin_search(items, key):
        """Half search"""
        start, end = 0, len(items) - 1
        while start <= end:
            mid = (start + end) // 2
            if key > items[mid]:
                start = mid + 1
            elif key < items[mid]:
                end = mid - 1
            else:
                return mid
        return -1
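
    As a sanity check (a sketch, not part of the original examples), hand-written sorts and searches like the ones above can be cross-checked against the standard library's sorted() and the bisect module:

```python
import bisect
import random

# Generate random data and sort it with the built-in Timsort (O(N log N)).
items = [random.randint(0, 100) for _ in range(20)]
ordered = sorted(items)

# bisect performs binary search on an already-sorted sequence.
target = ordered[7]
index = bisect.bisect_left(ordered, target)
assert ordered[index] == target   # bisect_left finds the leftmost equal element
print(ordered)
```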
    
  • Common algorithms:

    • Exhaustive method - also known as brute force cracking method, verifies all possibilities until the correct answer is found.
    • Greedy method - always make what looks like the best choice at the moment, seeking a satisfactory solution quickly rather than pursuing the optimal solution.
    • Divide and Conquer - divide a complex problem into two or more identical or similar subproblems, then divide the subproblem into smaller subproblems until it can be solved directly, and finally merge the solutions of the subproblems to obtain the solution of the original problem.
    • Backtracking method - also known as trial-and-error search; it searches forward according to selection criteria, and when it finds that the current choice is not promising or cannot reach the goal, it steps back and chooses again.
    • Dynamic programming - the basic idea is to decompose the problem to be solved into several sub problems, solve and save the solutions of these sub problems first, so as to avoid a large number of repeated operations.

    Examples of the exhaustive method: buying 100 chickens with 100 yuan, and five men dividing fish.

    # A rooster costs 5 yuan, a hen 3 yuan, and chicks are 3 for 1 yuan
    # Buy 100 chickens with 100 yuan; how many roosters / hens / chicks?
    for x in range(20):
        for y in range(33):
            z = 100 - x - y
            if 5 * x + 3 * y + z // 3 == 100 and z % 3 == 0:
                print(x, y, z)
    
    # A, B, C, D and E went fishing together one night and, exhausted, fell asleep
    # The next day A woke up first, divided the fish into five piles, threw away the one left over and took his own pile
    # B woke up second, divided the remaining fish into five piles, threw away the one left over and took his own pile
    # C, D and E then woke up in turn and divided the fish the same way. At least how many fish did they catch?
    fish = 6
    while True:
        total = fish
        enough = True
        for _ in range(5):
            if (total - 1) % 5 == 0:
                total = (total - 1) // 5 * 4
            else:
                enough = False
                break
        if enough:
            print(fish)
            break
        fish += 5
    

    Example of greedy method: suppose the thief has a backpack that can hold up to 20 kilograms of stolen goods. He breaks into a house and finds the items shown in the table below. Obviously, he can't put all his items in his backpack, so he must determine what items to take away and what items to leave behind.

    name          price (USD)   weight (kg)
    computer      200           20
    radio         20            4
    clock         175           10
    vase          50            2
    book          10            1
    oil painting  90            9
    """
    Greedy method: when solving problems, we always make the best choice at present, do not pursue the optimal solution, and quickly find a satisfactory solution.
    Input:
    20 6
     Computer 200 20
     Radio 20 4
     Clock 175 10
     Vase 50 2
     Book 10 1
     Oil painting 90 9
    """
    class Thing(object):
        """goods"""
    
        def __init__(self, name, price, weight):
            self.name = name
            self.price = price
            self.weight = weight
    
        @property
        def value(self):
            """Price weight ratio"""
            return self.price / self.weight
    
    
    def input_thing():
        """Enter item information"""
        name_str, price_str, weight_str = input().split()
        return name_str, int(price_str), int(weight_str)
    
    
    def main():
        """Main function"""
        max_weight, num_of_things = map(int, input().split())
        all_things = []
        for _ in range(num_of_things):
            all_things.append(Thing(*input_thing()))
        all_things.sort(key=lambda x: x.value, reverse=True)
        total_weight = 0
        total_price = 0
        for thing in all_things:
            if total_weight + thing.weight <= max_weight:
                print(f'The thief took the {thing.name}')
                total_weight += thing.weight
                total_price += thing.price
        print(f'Total value: {total_price} dollars')
    
    
    if __name__ == '__main__':
        main()
    

    Examples of divide and Conquer: Quick sort.

    """
    Quick sort - pick a pivot and partition the elements: smaller than the pivot on the left, larger on the right
    """
    def quick_sort(items, comp=lambda x, y: x <= y):
        items = list(items)
        _quick_sort(items, 0, len(items) - 1, comp)
        return items
    
    
    def _quick_sort(items, start, end, comp):
        if start < end:
            pos = _partition(items, start, end, comp)
            _quick_sort(items, start, pos - 1, comp)
            _quick_sort(items, pos + 1, end, comp)
    
    
    def _partition(items, start, end, comp):
        pivot = items[end]
        i = start - 1
        for j in range(start, end):
            if comp(items[j], pivot):
                i += 1
                items[i], items[j] = items[j], items[i]
        items[i + 1], items[end] = items[end], items[i + 1]
        return i + 1
    

    Example of the backtracking method: the knight's tour.

    """
    Recursive backtracking, also called trial-and-error search: search forward according to selection criteria; when the current choice turns out not to be promising or cannot reach the goal, step back and choose again. Classic problems include the knight's tour, the eight queens puzzle, and maze pathfinding.
    """
    
    SIZE = 5
    total = 0
    
    
    def print_board(board):
        for row in board:
            for col in row:
                print(str(col).center(4), end='')
            print()
    
    
    def patrol(board, row, col, step=1):
        if row >= 0 and row < SIZE and \
            col >= 0 and col < SIZE and \
            board[row][col] == 0:
            board[row][col] = step
            if step == SIZE * SIZE:
                global total
                total += 1
                print(f'Solution {total}:')
                print_board(board)
            patrol(board, row - 2, col - 1, step + 1)
            patrol(board, row - 1, col - 2, step + 1)
            patrol(board, row + 1, col - 2, step + 1)
            patrol(board, row + 2, col - 1, step + 1)
            patrol(board, row + 2, col + 1, step + 1)
            patrol(board, row + 1, col + 2, step + 1)
            patrol(board, row - 1, col + 2, step + 1)
            patrol(board, row - 2, col + 1, step + 1)
            board[row][col] = 0
    
    
    def main():
        board = [[0] * SIZE for _ in range(SIZE)]
        patrol(board, SIZE - 1, SIZE - 1)
    
    
    if __name__ == '__main__':
        main()
    

    Dynamic programming example: the maximum sum of a sublist's elements.

    Note: a sublist is a list formed by elements at consecutive indexes (subscripts); the elements are of type int and may include positive integers, zero, and negative integers. The program reads the list elements and outputs the maximum sublist sum, for example:

    Input: 1 -2 3 5 -3 2

    Output: 8

    Input: 0 -2 3 5 -1 2

    Output: 9

    Input: -9 -2 -3 -5 -3

    Output: -2

    def main():
        items = list(map(int, input().split()))
        overall = partial = items[0]
        for i in range(1, len(items)):
            partial = max(items[i], partial + items[i])
            overall = max(partial, overall)
        print(overall)
    
    
    if __name__ == '__main__':
        main()
    

    Note: the easiest solution to this problem is a double loop, but its time performance is poor. Using the idea of dynamic programming, at the cost of just two extra variables, the O(N²) problem becomes O(N).
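
    To make the note concrete, the double-loop version the note refers to might look like the following (a sketch with hypothetical helper names max_sub_sum_slow / max_sub_sum_fast):

```python
def max_sub_sum_slow(items):
    """Brute force O(N**2): try every sublist start index and extend it."""
    best = items[0]
    for i in range(len(items)):
        total = 0
        for j in range(i, len(items)):
            total += items[j]
            best = max(best, total)
    return best


def max_sub_sum_fast(items):
    """Dynamic programming O(N): extend the running sublist or restart it."""
    overall = partial = items[0]
    for item in items[1:]:
        partial = max(item, partial + item)
        overall = max(partial, overall)
    return overall


# Both versions agree on the sample inputs above.
for data in ([1, -2, 3, 5, -3, 2], [0, -2, 3, 5, -1, 2], [-9, -2, -3, -5, -3]):
    assert max_sub_sum_slow(data) == max_sub_sum_fast(data)
print(max_sub_sum_fast([1, -2, 3, 5, -3, 2]))  # 8
```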

How functions are used

  • Treat functions as "first-class citizens"

    • Functions can be assigned to variables
    • Functions can be used as arguments to functions
    • Function can be used as the return value of a function
  • Usage of higher-order functions (filter, map and their substitutes)

    items1 = list(map(lambda x: x ** 2, filter(lambda x: x % 2, range(1, 10))))
    items2 = [x ** 2 for x in range(1, 10) if x % 2]
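
    functools.reduce is the third classic higher-order function alongside map and filter; a minimal sketch (not from the original examples):

```python
from functools import reduce

# reduce folds a sequence into a single value with a binary function.
total = reduce(lambda x, y: x + y, range(1, 101))   # 1 + 2 + ... + 100
product = reduce(lambda x, y: x * y, range(1, 6))   # 5! = 120
print(total)    # 5050
print(product)  # 120
```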
    
  • Positional parameters, variadic parameters, keyword parameters, and keyword-only parameters

  • Parameter meta-information (annotations; improves code readability)

  • Usage of anonymous and inline functions (lambda function)

  • Closure and scope issues

    • LEGB order in which Python looks up variables (Local > Enclosing > Global > Built-in)

    • Role of global and nonlocal keywords

      global: declares or defines a global variable (either use an existing variable from the global scope directly, or define a new variable into the global scope).

      nonlocal: declares a variable from an enclosing (nested) scope (the variable must already exist in the enclosing scope, otherwise an error is raised).
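
    A minimal closure sketch showing why nonlocal is needed (illustrative, not from the original):

```python
def make_counter():
    """A closure: count lives in the enclosing scope of increment."""
    count = 0

    def increment():
        nonlocal count   # without nonlocal, the assignment would create a new local
        count += 1
        return count

    return increment


counter = make_counter()
print(counter(), counter(), counter())  # 1 2 3
```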

  • Decorators (applying a decorator and cancelling it)

    Example: decorator that outputs function execution time.

    from functools import wraps
    from time import time
    
    
    def record_time(func):
        """A decorator that records a function's execution time"""
        
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time()
            result = func(*args, **kwargs)
            print(f'{func.__name__}: {time() - start} seconds')
            return result
            
        return wrapper
    

    If you do not want the decorator coupled to the print function, you can write a parameterized decorator.

    from functools import wraps
    from time import time
    
    
    def record(output):
        """A parameterized decorator"""
    
        def decorate(func):
    
            @wraps(func)
            def wrapper(*args, **kwargs):
                start = time()
                result = func(*args, **kwargs)
                output(func.__name__, time() - start)
                return result
    
            return wrapper
    
        return decorate
    
    from functools import wraps
    from time import time
    
    
    class Record:
        """A decorator defined as a class"""
    
        def __init__(self, output):
            self.output = output
    
        def __call__(self, func):
    
            @wraps(func)
            def wrapper(*args, **kwargs):
                start = time()
                result = func(*args, **kwargs)
                self.output(func.__name__, time() - start)
                return result
    
            return wrapper
    

    Note: since the @wraps decorator was applied inside the decorating function, you can use the function's __wrapped__ attribute to get the original function or class, and thereby cancel the decoration.
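
    A minimal sketch of this (using a hypothetical noisy decorator) showing how __wrapped__ recovers the undecorated function:

```python
from functools import wraps


def noisy(func):
    """An illustrative decorator that tags the result."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        return f'decorated:{func(*args, **kwargs)}'

    return wrapper


@noisy
def greet():
    return 'hello'


print(greet())              # decorated:hello
print(greet.__wrapped__())  # hello - @wraps exposes the original function
```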

    Example: implementing the singleton pattern with a decorator.

    from functools import wraps
    
    
    def singleton(cls):
        """Decorative ornament"""
        instances = {}
    
        @wraps(cls)
        def wrapper(*args, **kwargs):
            if cls not in instances:
                instances[cls] = cls(*args, **kwargs)
            return instances[cls]
    
        return wrapper
    
    
    @singleton
    class President:
        """president(Singleton class)"""
        pass
    

    Tip: the above code uses a closure - have you noticed? There is one small problem, though: the code above does not implement a thread-safe singleton. What should you do to make it thread safe?

    Thread safe singleton decorator.

    from functools import wraps
    from threading import RLock
    
    
    def singleton(cls):
        """Thread safe singleton decorator"""
        instances = {}
        locker = RLock()
    
        @wraps(cls)
        def wrapper(*args, **kwargs):
            if cls not in instances:
                with locker:
                    if cls not in instances:
                        instances[cls] = cls(*args, **kwargs)
            return instances[cls]
    
        return wrapper
    

    Tip: the above code uses the with context-manager syntax for the lock, because a lock object is itself a context manager (it supports the __enter__ and __exit__ magic methods). In the wrapper function we first check without the lock, then check again with the lock held (double-checked locking). This beats locking unconditionally: if the instance already exists, we return it directly without acquiring the lock at all.

Object oriented knowledge

  • Three pillars: encapsulation, inheritance and polymorphism

    Example: salary settlement system.

    """
    Monthly salary settlement system - Department Manager 15000 programmers per month 200 salesmen per hour 1800 base salary plus sales 5%Commission
    """
    from abc import ABCMeta, abstractmethod
    
    
    class Employee(metaclass=ABCMeta):
        """staff(abstract class)"""
    
        def __init__(self, name):
            self.name = name
    
        @abstractmethod
        def get_salary(self):
            """Settle monthly salary(Abstract method)"""
            pass
    
    
    class Manager(Employee):
        """division manager"""
    
        def get_salary(self):
            return 15000.0
    
    
    class Programmer(Employee):
        """programmer"""
    
        def __init__(self, name, working_hour=0):
            self.working_hour = working_hour
            super().__init__(name)
    
        def get_salary(self):
            return 200.0 * self.working_hour
    
    
    class Salesman(Employee):
        """salesperson"""
    
        def __init__(self, name, sales=0.0):
            self.sales = sales
            super().__init__(name)
    
        def get_salary(self):
            return 1800.0 + self.sales * 0.05
    
    
    class EmployeeFactory:
        """Create a factory for employees (factory mode) - Decoupling between object users and objects through factories)"""
    
        @staticmethod
        def create(emp_type, *args, **kwargs):
            """Create employee"""
            all_emp_types = {'M': Manager, 'P': Programmer, 'S': Salesman}
            cls = all_emp_types.get(emp_type.upper())
            return cls(*args, **kwargs) if cls else None
    
    
    def main():
        """Main function"""
        emps = [
            EmployeeFactory.create('M', 'Cao Cao'), 
            EmployeeFactory.create('P', 'Xun Yu', 120),
            EmployeeFactory.create('P', 'Guo Jia', 85), 
            EmployeeFactory.create('S', 'Dianwei', 123000),
        ]
        for emp in emps:
            print(f'{emp.name}: {emp.get_salary():.2f} yuan')
    
    
    if __name__ == '__main__':
        main()
    
  • Relationship between classes

    • is-a relationship: Inheritance
    • has-a relationship: Association / aggregation / composition
    • use-a relationship: dependency

    Example: poker game.

    """
    Experience: symbolic constants are always better than literal constants, and enumeration types are the best choice for defining symbolic constants
    """
    from enum import Enum, unique
    
    import random
    
    
    @unique
    class Suite(Enum):
        """Decor"""
    
        SPADE, HEART, CLUB, DIAMOND = range(4)
    
        def __lt__(self, other):
            return self.value < other.value
    
    
    class Card():
        """brand"""
    
        def __init__(self, suite, face):
            """Initialization method"""
            self.suite = suite
            self.face = face
    
        def show(self):
            """Display board"""
            suites = ['♠︎', '♥︎', '♣︎', '♦︎']
            faces = ['', 'A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
            return f'{suites[self.suite.value]}{faces[self.face]}'
    
        def __repr__(self):
            return self.show()
    
    
    class Poker():
        """poker"""
    
        def __init__(self):
            self.index = 0
            self.cards = [Card(suite, face)
                          for suite in Suite
                          for face in range(1, 14)]
    
        def shuffle(self):
            """Shuffle (random disorder)"""
            random.shuffle(self.cards)
            self.index = 0
    
        def deal(self):
            """Licensing"""
            card = self.cards[self.index]
            self.index += 1
            return card
    
        @property
        def has_more(self):
            return self.index < len(self.cards)
    
    
    class Player():
        """game player"""
    
        def __init__(self, name):
            self.name = name
            self.cards = []
    
        def get_one(self, card):
            """Touch a card"""
            self.cards.append(card)
    
        def sort(self, comp=lambda card: (card.suite, card.face)):
            """Tidy up your cards"""
            self.cards.sort(key=comp)
    
    
    def main():
        """Main function"""
        poker = Poker()
        poker.shuffle()
        players = [Player('East evil'), Player('Western poison'), Player('South Emperor'), Player('Northern beggar')]
        while poker.has_more:
            for player in players:
                player.get_one(poker.deal())
        for player in players:
            player.sort()
            print(player.name, end=': ')
            print(player.cards)
    
    
    if __name__ == '__main__':
        main()
    

    Note: Emoji characters are used above to represent the four suits of playing cards; they may not display on systems without Emoji support.

  • Copying objects (deep copy / deep clone versus shallow copy / shallow clone)
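
    A minimal sketch of the difference using the copy module (the Box class is a hypothetical example):

```python
import copy


class Box:
    def __init__(self, items):
        self.items = items


original = Box([1, 2, 3])
shallow = copy.copy(original)     # a new Box, but sharing the same inner list
deep = copy.deepcopy(original)    # a new Box and a new inner list

original.items.append(4)
print(shallow.items)  # [1, 2, 3, 4] - shared with the original
print(deep.items)     # [1, 2, 3]   - fully independent
```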

  • Garbage collection, circular references, and weak references

    Python uses automatic memory management based on reference counting, supplemented by the "mark and sweep" and "generational collection" strategies.

    typedef struct _object {
        /* Reference count */
        int ob_refcnt;
        /* Pointer to the type object */
        struct _typeobject *ob_type;
    } PyObject;
    
    /* Macro that increments the reference count */
    #define Py_INCREF(op)   ((op)->ob_refcnt++)
    /* Macro that decrements the reference count */
    #define Py_DECREF(op) \
        if (--(op)->ob_refcnt != 0) \
            ; \
        else \
            __Py_Dealloc((PyObject *)(op))
    

    Conditions that increase the reference count by 1:

    • The object is created, for example, a = 23
    • Object is referenced, e.g. b = a
    • Object is passed as a parameter into a function, such as f(a)
    • Object is stored in a container as an element, for example list1 = [a, a]

    Conditions that decrease the reference count by 1:

    • The alias of the object is explicitly destroyed, such as del a
    • The alias of the object is given to the new object, for example, a = 24
    • An object goes out of scope; for example, when function f finishes executing, its local variables are destroyed (global variables are not)
    • The container in which the object is located is destroyed or the object is deleted from the container

    Reference counting alone cannot handle circular references, which cause memory leaks, as the code below shows. To solve this problem, Python also introduces "mark and sweep" and "generational collection". A newly created object is placed in the first generation; if it survives a garbage inspection of the first generation it is moved to the second generation, and likewise if it survives an inspection of the second generation it is moved to the third.

    # Circular references can lead to memory leaks - besides reference counting, Python introduces mark-and-sweep and generational collection
    # Before Python 3.6, overriding the __del__ magic method would defeat circular-reference processing
    # If you want to avoid circular references altogether, you can use weak references
    list1 = []
    list2 = [] 
    list1.append(list2)
    list2.append(list1)
    

    The following conditions can lead to garbage collection:

    • gc.collect() is called explicitly
    • A counter of the gc module reaches its threshold
    • The program exits

    If both objects in a circular reference define the __del__ method, the gc module will not destroy these unreachable objects, because it cannot tell which object's __del__ should be called first. This problem was solved in Python 3.6.

    You can also solve the circular-reference problem by using weak references from the weakref module.
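
    A minimal weakref sketch (the Node class is hypothetical; the immediate collection after del relies on CPython's reference counting):

```python
import weakref


class Node:
    def __init__(self, name):
        self.name = name


node = Node('a')
ref = weakref.ref(node)   # a weak reference does not increase the refcount
print(ref().name)         # a
del node                  # the only strong reference is gone
print(ref())              # None - the object has been collected
```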

  • Magic attributes and methods (please refer to the Python magic method Guide)

    There are a few small questions for you to think about:

    • Can custom objects be used with operators?
    • Can custom objects be put into a set? Can they be de-duplicated?
    • Can custom objects be used as dict keys?
    • Can custom objects use the context-manager (with) syntax?
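
    One way to answer all four questions affirmatively is a sketch like the following hypothetical Point class:

```python
class Point:
    """Operators, hashing, and context syntax, all enabled by magic methods."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):    # enables the + operator
        return Point(self.x + other.x, self.y + other.y)

    def __eq__(self, other):     # needed for set de-duplication and dict keys
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):          # must be consistent with __eq__
        return hash((self.x, self.y))

    def __enter__(self):         # enables the with statement
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return False


p = Point(1, 2) + Point(3, 4)
print(len({Point(1, 2), Point(1, 2), p}))   # 2 - duplicates are removed
with Point(0, 0) as origin:
    print(origin.x, origin.y)               # 0 0
```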
  • Mixins

    Example: a custom dictionary that only allows setting a key-value pair when the key does not already exist.

    class SetOnceMappingMixin:
        """Custom blend class"""
        __slots__ = ()
    
        def __setitem__(self, key, value):
            if key in self:
                raise KeyError(str(key) + ' already set')
            return super().__setitem__(key, value)
    
    
    class SetOnceDict(SetOnceMappingMixin, dict):
        """Custom dictionary"""
        pass
    
    
    my_dict = SetOnceDict()
    try:
        my_dict['username'] = 'jackfrued'
        my_dict['username'] = 'hellokitty'
    except KeyError:
        pass
    print(my_dict)
    
  • Metaprogramming and metaclasses

    Objects are created by classes, and classes are created by metaclasses; the metaclass provides the meta-information used to create a class. All classes inherit directly or indirectly from object, and all metaclasses inherit directly or indirectly from type.

    Example: implementing the singleton pattern with a metaclass.

    import threading
    
    
    class SingletonMeta(type):
        """Custom metaclass"""
    
        def __init__(cls, *args, **kwargs):
            cls.__instance = None
            cls.__lock = threading.RLock()
            super().__init__(*args, **kwargs)
    
        def __call__(cls, *args, **kwargs):
            if cls.__instance is None:
                with cls.__lock:
                    if cls.__instance is None:
                        cls.__instance = super().__call__(*args, **kwargs)
            return cls.__instance
    
    
    class President(metaclass=SingletonMeta):
        """president(Singleton class)"""
        
        pass
    
  • Object oriented design principles

    • Single Responsibility Principle (SRP) - a class should do only what it is supposed to do (class design should be highly cohesive)
    • Open-Closed Principle (OCP) - software entities should be open for extension but closed for modification
    • Dependency Inversion Principle (DIP) - program to abstractions (weakened in weakly typed languages)
    • Liskov Substitution Principle (LSP) - subclass objects can substitute for parent class objects at any time
    • Interface Segregation Principle (ISP) - interfaces should be small and specialized rather than large and all-encompassing (there is no formal interface concept in Python)
    • Composite/Aggregate Reuse Principle (CARP) - prefer composing strongly related code over inheritance for reuse
    • Law of Demeter (Principle of Least Knowledge, LoD) - do not send messages to objects you have no direct connection with

    Note: the first five principles above are collectively known as the object-oriented SOLID principles.
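
    To make the Liskov Substitution Principle concrete, here is a minimal sketch (the Shape/Rectangle names are illustrative, not from the original): any function written against the parent class keeps working when handed subclass objects.

```python
class Shape:
    """Parent class: declares the interface clients rely on"""

    def area(self):
        raise NotImplementedError


class Rectangle(Shape):
    """Subclass that can stand in wherever a Shape is expected"""

    def __init__(self, width, height):
        self.width, self.height = width, height

    def area(self):
        return self.width * self.height


def total_area(shapes):
    # Written against Shape only - substituting subclasses is safe (LSP)
    return sum(shape.area() for shape in shapes)


print(total_area([Rectangle(2, 3), Rectangle(4, 5)]))  # 26
```
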

  • GoF design patterns

    • Creational patterns: singleton, factory, builder, prototype
    • Structural patterns: adapter, facade, proxy
    • Behavioral patterns: iterator, observer, state, strategy

    Example: a pluggable hashing algorithm (strategy pattern).

    class StreamHasher:
        """Stream hasher (digest generator)"""

        def __init__(self, alg='md5', size=4096):
            self.size = size
            # Look up the requested algorithm in hashlib by name
            self.hasher = getattr(__import__('hashlib'), alg.lower())()
    
        def __call__(self, stream):
            return self.to_digest(stream)
    
        def to_digest(self, stream):
            """Generate a summary in hexadecimal form"""
            for buf in iter(lambda: stream.read(self.size), b''):
                self.hasher.update(buf)
            return self.hasher.hexdigest()
    
    def main():
        """Main function"""
        hasher1 = StreamHasher()
        with open('Python-3.7.6.tgz', 'rb') as stream:
            print(hasher1.to_digest(stream))
        hasher2 = StreamHasher('sha1')
        with open('Python-3.7.6.tgz', 'rb') as stream:
            print(hasher2(stream))
    
    
    if __name__ == '__main__':
        main()
    

Iterators and generators

  • An object is an iterator if it implements the iterator protocol.

    • Python has no keyword such as protocol or interface for defining protocols.
    • In Python, protocols are represented by magic (dunder) methods.
    • The __iter__ and __next__ magic methods make up the iterator protocol.
    class Fib(object):
        """iterator """
        
        def __init__(self, num):
            self.num = num
            self.a, self.b = 0, 1
            self.idx = 0
       
        def __iter__(self):
            return self
    
        def __next__(self):
            if self.idx < self.num:
                self.a, self.b = self.b, self.a + self.b
                self.idx += 1
                return self.a
            raise StopIteration()
    
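    Driving the iterator with list() or a for loop looks like this (a standalone sketch that restates the class above):

```python
class Fib(object):
    """Iterator over the first num Fibonacci numbers (as above)"""

    def __init__(self, num):
        self.num = num
        self.a, self.b = 0, 1
        self.idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.idx < self.num:
            self.a, self.b = self.b, self.a + self.b
            self.idx += 1
            return self.a
        raise StopIteration()


print(list(Fib(8)))  # [1, 1, 2, 3, 5, 8, 13, 21]
```
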
  • Generators are syntactically simplified versions of iterators.

    def fib(num):
        """generator """
        a, b = 0, 1
        for _ in range(num):
            a, b = b, a + b
            yield a
    
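    Usage is the same as with the iterator class, because calling a generator function returns an object that already implements the iterator protocol (standalone sketch restating fib):

```python
def fib(num):
    """Generator of the first num Fibonacci numbers (as above)"""
    a, b = 0, 1
    for _ in range(num):
        a, b = b, a + b
        yield a


print(list(fib(10)))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```
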
  • A generator can evolve into a coroutine.

    A generator object can receive data through its send() method; the data sent becomes the value of the yield expression inside the generator function. In this way a generator can be used as a coroutine - simply put, a subroutine that can collaborate with others.

    def calc_avg():
        """Flow calculation average"""
        total, counter = 0, 0
        avg_value = None
        while True:
            value = yield avg_value
            total, counter = total + value, counter + 1
            avg_value = total / counter
    
    
    gen = calc_avg()
    next(gen)            # Prime the coroutine (advance to the first yield)
    print(gen.send(10))  # 10.0
    print(gen.send(20))  # 15.0
    print(gen.send(30))  # 20.0
    

Concurrent programming

There are three approaches to concurrent programming in Python: multithreading, multiprocessing and asynchronous I/O. The advantage of concurrent programming is that it can improve a program's execution efficiency and responsiveness; the disadvantage is that concurrent programs are harder to develop and debug, and can be unfriendly to other programs sharing the machine.

  • Multithreading: Python provides the Thread class, supplemented by Lock, Condition, Event, Semaphore and Barrier. Python has the GIL (Global Interpreter Lock), which prevents multiple threads from executing Python bytecode at the same time. This lock is necessary for CPython because CPython's memory management is not thread-safe, and because of the GIL, multithreading cannot exploit the multi-core capability of the CPU.

    """
    Interview question: the difference and connection between process and thread?
    process - The basic unit of memory allocated by the operating system - A process can contain one or more threads
     thread  - Operating system allocation CPU Basic unit of
     Concurrent programming( concurrent programming)
    1. Improve execution performance - So that the non causal part of the program can be executed concurrently
    2. Improve user experience - So that the time-consuming operation will not cause the fake death of the program
    """
    import glob
    import os
    import threading
    
    from PIL import Image
    
    PREFIX = 'thumbnails'
    
    
    def generate_thumbnail(infile, size, format='PNG'):
        """Generate a thumbnail of the specified picture file"""
        file, ext = os.path.splitext(infile)
        file = file[file.rfind('/') + 1:]
        # ext already contains the leading dot from splitext
        outfile = f'{PREFIX}/{file}_{size[0]}_{size[1]}{ext}'
        img = Image.open(infile)
        img.thumbnail(size, Image.LANCZOS)
        img.save(outfile, format)


    def main():
        """Main function"""
        if not os.path.exists(PREFIX):
            os.mkdir(PREFIX)
        for infile in glob.glob('images/*.png'):
            for size in (32, 64, 128):
                # Create and start a thread for each thumbnail
                threading.Thread(
                    target=generate_thumbnail,
                    args=(infile, (size, size))
                ).start()


    if __name__ == '__main__':
        main()
    

    Multiple threads compete for resources.

    """
    Multithreaded programs are usually easier to handle if they do not compete for resources
     When multiple threads compete for critical resources, the lack of necessary protection measures will lead to data disorder
     Note: critical resources are resources competed by multiple threads
    """
    import time
    import threading
    
    from concurrent.futures import ThreadPoolExecutor
    
    
    class Account(object):
        """bank account"""
    
        def __init__(self):
            self.balance = 0.0
            self.lock = threading.Lock()
    
        def deposit(self, money):
            # Protect critical resources through locks
            with self.lock:
                new_balance = self.balance + money
                time.sleep(0.001)
                self.balance = new_balance
    
    
    class AddMoneyThread(threading.Thread):
        """Custom thread class"""
    
        def __init__(self, account, money):
            self.account = account
            self.money = money
            # The initialization method of the parent class must be called in the initialization method of the custom thread
            super().__init__()
    
        def run(self):
            # Actions to be performed after the thread starts
            self.account.deposit(self.money)
    
    def main():
        """Main function"""
        account = Account()
        # Create thread pool
        pool = ThreadPoolExecutor(max_workers=10)
        futures = []
        for _ in range(100):
            # The first way to create a thread
            # threading.Thread(
            #     target=account.deposit, args=(1, )
            # ).start()
            # The second way to create a thread
            # AddMoneyThread(account, 1).start()
            # The third way to create a thread
            # Call threads in the thread pool to perform specific tasks
            future = pool.submit(account.deposit, 1)
            futures.append(future)
        # Close thread pool
        pool.shutdown()
        for future in futures:
            future.result()
        print(account.balance)
    
    
    if __name__ == '__main__':
        main()
    

    Modify the above program: start five threads that deposit money into the account and ten threads that withdraw money from it. When a withdrawing thread finds the balance insufficient, it pauses and waits. To achieve this, the depositing and withdrawing threads must be scheduled: when the balance is insufficient, a withdrawing thread pauses and releases the lock; after a depositing thread deposits money, it notifies the withdrawing threads to wake them from the suspended state. This scheduling can be implemented with the Condition class of the threading module, which is also built on top of a lock. The code is as follows:

    """
    Multiple threads compete for a resource - Protect critical resources - Lock( Lock/RLock)
    Multiple threads compete for multiple resources (number of threads)>Number of resources) - Semaphore( Semaphore)
    Scheduling of multiple threads - Pause thread execution/Wake up waiting threads - Condition
    """
    from concurrent.futures import ThreadPoolExecutor
    from random import randint
    from time import sleep
    
    import threading
    
    
    class Account:
        """bank account"""
    
        def __init__(self, balance=0):
            self.balance = balance
            lock = threading.RLock()
            self.condition = threading.Condition(lock)
    
        def withdraw(self, money):
            """Withdraw money"""
            with self.condition:
                while money > self.balance:
                    self.condition.wait()
                new_balance = self.balance - money
                sleep(0.001)
                self.balance = new_balance
    
        def deposit(self, money):
            """save money"""
            with self.condition:
                new_balance = self.balance + money
                sleep(0.001)
                self.balance = new_balance
                self.condition.notify_all()
    
    
    def add_money(account):
        while True:
            money = randint(5, 10)
            account.deposit(money)
            print(threading.current_thread().name, 
                  ':', money, '====>', account.balance)
            sleep(0.5)
    
    
    def sub_money(account):
        while True:
            money = randint(10, 30)
            account.withdraw(money)
            print(threading.current_thread().name, 
                  ':', money, '<====', account.balance)
            sleep(1)
    
    
    def main():
        account = Account()
        with ThreadPoolExecutor(max_workers=15) as pool:
            for _ in range(5):
                pool.submit(add_money, account)
            for _ in range(10):
                pool.submit(sub_money, account)
    
    
    if __name__ == '__main__':
        main()
    
  • Multiprocessing: multiple processes can effectively work around the GIL problem. The main class for multiprocessing is Process; the other auxiliary classes are similar to those in the threading module. Pipes, sockets, etc. can be used to share data between processes. The multiprocessing module also has a Queue class: a queue shared by multiple processes, built on pipes and a locking mechanism. Below is an example of multiple processes and a process pool from the official documentation.

    """
    Use of multiple processes and process pools
     Multithreading because GIL The existence of can not play a role CPU Multi-core characteristics of
     For computationally intensive tasks, you should consider using multiple processes
    time python3 example22.py
    real    0m11.512s
    user    0m39.319s
    sys     0m0.169s
     After using multiple processes, the actual execution time is 11.512 Seconds, while user time 39.319 Seconds is about 4 times the actual execution time
     This proves that our program is used through multiple processes CPU And this computer is configured with a 4-core CPU
    """
    import concurrent.futures
    import math
    
    PRIMES = [
        1116281,
        1297337,
        104395303,
        472882027,
        533000389,
        817504243,
        982451653,
        112272535095293,
        112582705942171,
        112272535095293,
        115280095190773,
        115797848077099,
        1099726899285419
    ] * 5
    
    
    def is_prime(n):
        """Check whether n is prime"""
        if n < 2:
            return False
        if n == 2:
            return True
        if n % 2 == 0:
            return False
        sqrt_n = int(math.floor(math.sqrt(n)))
        for i in range(3, sqrt_n + 1, 2):
            if n % i == 0:
                return False
        return True
    
    
    def main():
        """Main function"""
        with concurrent.futures.ProcessPoolExecutor() as executor:
            for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
                print('%d is prime: %s' % (number, prime))
    
    
    if __name__ == '__main__':
        main()
    

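    The multiprocessing Queue mentioned above can be sketched like this (a minimal standalone example; the worker function and values are illustrative): a child process puts a result into the shared queue and the parent process reads it back.

```python
import multiprocessing


def worker(queue):
    """Child process: put a computed result into the shared queue"""
    queue.put(sum(range(100)))


def main():
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=worker, args=(queue,))
    process.start()
    process.join()
    print(queue.get())  # 4950


if __name__ == '__main__':
    main()
```
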
    Key points: comparison between multithreading and multiprocessing.

    Multithreading is required in the following cases:

    1. The program needs to maintain a lot of shared state (especially mutable state). Individual operations on Python's lists, dictionaries and sets are thread-safe thanks to the GIL, so the cost of using threads rather than processes to maintain shared state is relatively low.
    2. The program will spend a lot of time on I/O operation. There is not much need for parallel computing and does not need to occupy too much memory.

    Multiple processes are required for the following situations:

    1. The program performs computing intensive tasks (such as bytecode operation, data processing, scientific computing).
    2. The input of the program can be divided into blocks in parallel, and the operation results can be combined.
    3. The program has no restrictions on memory usage and does not rely strongly on I/O operations (such as reading and writing files, sockets, etc.).
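
    Since ThreadPoolExecutor and ProcessPoolExecutor share the same concurrent.futures interface, switching between the two cases above is usually a one-line change (a minimal sketch; cpu_task is an illustrative name, not from the original):

```python
import concurrent.futures


def cpu_task(n):
    """A computation-heavy task: sum of squares below n"""
    return sum(i * i for i in range(n))


def main():
    # Swap in ProcessPoolExecutor for CPU-bound work; the API is identical
    with concurrent.futures.ThreadPoolExecutor() as executor:
        print(list(executor.map(cpu_task, (10, 100, 1000))))


if __name__ == '__main__':
    main()
```
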
  • Asynchronous processing: the scheduler picks tasks from its task queue and executes them in an interleaved fashion. There is no guarantee that tasks run in any particular order, because the order depends on whether a task in the queue is willing to yield CPU time to another. Asynchronous tasks are usually implemented through cooperative multitasking (coroutines). Because execution time and order are uncertain, task results must be obtained through callback-style programming or future objects. Python 3 supports asynchronous processing through the asyncio module and the async and await keywords (officially listed as keywords since Python 3.7).

    """
    asynchronous I/O - async / await
    """
    import asyncio
    
    
    def num_generator(m, n):
        """Number generator for specified range"""
        yield from range(m, n + 1)
    
    
    async def prime_filter(m, n):
        """Prime Number Sieves """
        primes = []
        for i in num_generator(m, n):
            flag = True
            for j in range(2, int(i ** 0.5 + 1)):
                if i % j == 0:
                    flag = False
                    break
            if flag:
                print('Prime =>', i)
                primes.append(i)
    
            await asyncio.sleep(0.001)
        return tuple(primes)
    
    
    async def square_mapper(m, n):
        """Square mapper"""
        squares = []
        for i in num_generator(m, n):
            print('Square =>', i * i)
            squares.append(i * i)
    
            await asyncio.sleep(0.001)
        return squares
    
    
    def main():
        """Main function"""
        loop = asyncio.get_event_loop()
        future = asyncio.gather(prime_filter(2, 100), square_mapper(1, 100))
        future.add_done_callback(lambda x: print(x.result()))
        loop.run_until_complete(future)
        loop.close()
    
    
    if __name__ == '__main__':
        main()
    

    Note: the above code uses the get_event_loop function to obtain the system's default event loop. The gather function returns a future object, whose add_done_callback method registers a callback to run on completion; the loop object's run_until_complete method waits until the coroutines' results are available through the future object.

    Python has a third-party library called aiohttp, which provides an asynchronous HTTP client and server. It works with the asyncio module and supports Future objects. The async and await syntax was introduced in Python 3.5 to define asynchronously executed functions and create asynchronous contexts, and the two officially became keywords in Python 3.7. The following code asynchronously fetches pages from five URLs and extracts each site's title through a named capture group in a regular expression.

    import asyncio
    import re
    
    import aiohttp
    
    PATTERN = re.compile(r'\<title\>(?P<title>.*)\<\/title\>')
    
    
    async def fetch_page(session, url):
        async with session.get(url, ssl=False) as resp:
            return await resp.text()
    
    
    async def show_title(url):
        async with aiohttp.ClientSession() as session:
            html = await fetch_page(session, url)
            print(PATTERN.search(html).group('title'))
    
    
    def main():
        urls = ('https://www.python.org/',
                'https://git-scm.com/',
                'https://www.jd.com/',
                'https://www.taobao.com/',
                'https://www.douban.com/')
        loop = asyncio.get_event_loop()
        cos = [show_title(url) for url in urls]
        loop.run_until_complete(asyncio.wait(cos))
        loop.close()
    
    
    if __name__ == '__main__':
        main()
    

    Key points: comparison between asynchronous I/O and multi process.

    asyncio is a good choice when a program does not need true concurrency or parallelism but depends heavily on asynchronous processing and callbacks. If a program spends a lot of time waiting or sleeping, you should also consider asyncio; it is well suited to writing web application servers without real-time data-processing requirements.

    Python also has many third-party libraries for parallel tasks, such as joblib and PyMP. In real development there are usually two ways to improve a system's scalability and concurrency: vertical scaling (increasing the processing capacity of a single node) and horizontal scaling (turning a single node into multiple nodes). Applications can be decoupled through a message queue, which is effectively an extended version of the multithreaded synchronized queue: applications on different machines play the role of threads, and the shared distributed message queue plays the role of the Queue in the original program. The most popular and standardized implementation of message-oriented middleware is AMQP (Advanced Message Queuing Protocol), which originated in the financial industry and provides queuing, routing, reliable delivery, security and other features. The best-known implementations include Apache ActiveMQ, RabbitMQ and so on.

    To make tasks asynchronous, you can use a third-party library called Celery. Celery is a distributed task queue written in Python that works through distributed messages and can use RabbitMQ or Redis as its message broker.


Added by flashbot on Mon, 31 Jan 2022 08:10:05 +0200