Python Basics

Contents

Processes and threads

Multiple processes

os.fork (the fork system call provided by Unix/Linux)

multiprocessing (the cross-platform multiprocess module)

Pool

Subprocess

Interprocess communication

Multithreading (data sharing)

threading.Thread

Lock

Multi-core CPUs (because of the GIL, CPython threads cannot run Python code in parallel)

ThreadLocal

Processes and threads

  • Multiple processes

    • Unix/Linux (including macOS) provides a fork() system call; Windows does not support fork.

      An ordinary function is called once and returns once. fork(), however, is called once and returns twice: the operating system makes a copy of the current process (the parent process), called the child process, and fork() then returns in both the parent and the child.

      In the child, fork() always returns 0; in the parent, it returns the child's process ID. The parent needs this ID because it may fork many children, whereas a child can always find its parent with getppid().

    • os.fork (the fork system call provided by Unix/Linux)

      • Python's os module wraps common system calls, including fork (not available on Windows)
      • With fork, a process that receives a new task can copy itself into a child process to handle it. A classic example is the Apache server: the parent process listens on a port and, whenever a new HTTP request arrives, forks a child process to handle that request
      • """
        A child process only needs to call getppid() to get its parent's process ID
        
        getpid(): returns the ID of the current process
        getppid(): returns the ID of the parent of the current process
        """
        
        import os
        
        print('Process (%s) start...' % os.getpid())
        # Only works on Unix/Linux/Mac:
        pid = os.fork()
        if pid == 0:
            print('I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid()))
        else:
            print('I (%s) just created a child process (%s).' % (os.getpid(), pid))
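The Apache-style pattern described above (fork one child per task) can be sketched as follows. This is a minimal sketch, assuming a Unix-like system; handle() and fork_per_task() are hypothetical names, and each child reports its result only through its exit status to keep the example short. Unlike the example above, the parent also calls os.waitpid() to reap every child.

```python
import os


def handle(task):
    # Hypothetical handler: a real server would serve the request here.
    # An exit status is limited to 0-255, so return a small number.
    return len(task) % 256


def fork_per_task(tasks):
    """Fork one child per task (Unix/Linux/macOS only) and collect exit codes."""
    children = {}
    for task in tasks:
        pid = os.fork()
        if pid == 0:
            # Child: fork() returned 0. Handle the task, then leave with os._exit()
            # so the child never continues into the parent's loop or cleanup code.
            code = 1  # non-zero status if handle() raises
            try:
                code = handle(task)
            finally:
                os._exit(code)
        # Parent: fork() returned the child's PID; remember which task it got.
        children[pid] = task
    results = {}
    for pid, task in children.items():
        # waitpid() blocks until that child exits and reaps it (no zombie is left).
        _, status = os.waitpid(pid, 0)
        results[task] = os.WEXITSTATUS(status)
    return results


if __name__ == '__main__':
    print(fork_per_task(['a', 'bb', 'ccc']))  # {'a': 1, 'bb': 2, 'ccc': 3}
```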
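    • multiprocessing (the cross-platform multiprocess module)

      • Because Windows has no fork(), the multiprocessing module provides a Process class that works on all platforms: create a Process with a target function and its arguments, call start() to launch it, and call join() to wait for it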
      • from multiprocessing import Process
        import os
        
        # Code to be executed by child process
        def run_proc(name):
            print('Run child process %s (%s)...' % (name, os.getpid()))
        
        if __name__=='__main__':
            p = Process(target=run_proc, args=('test',))  # the child will run run_proc('test')
            p.start()  # launch the child process
            p.join()   # wait for the child to finish before continuing; used to synchronize processes
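Because each Process runs in its own address space, a child that changes a global variable changes only its own copy. The sketch below illustrates this; counter, increment() and run_child() are illustrative names, not part of the original example.

```python
from multiprocessing import Process

counter = 0  # module-level variable; each child process works on its own copy


def increment():
    global counter
    counter += 100  # changes only the child's copy
    print('child sees counter =', counter)


def run_child():
    p = Process(target=increment)
    p.start()
    p.join()
    # exitcode is 0 when the target function returned normally
    return p.exitcode, counter


if __name__ == '__main__':
    exitcode, parent_counter = run_child()
    print('exit code:', exitcode)
    print('parent sees counter =', parent_counter)
```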
        
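    • Pool

      • To start a large number of child processes, create them in batches with a process pool. In the code below, Pool(4) runs at most 4 tasks at a time, so one of the 5 tasks has to wait until a worker is free (the default pool size is the number of CPU cores)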
      • from multiprocessing import Pool
        import os, time, random
        
        def long_time_task(name):
            print('Run task %s (%s)...' % (name, os.getpid()))
            start = time.time()
            time.sleep(random.random() * 3)
            end = time.time()
            print('Task %s runs %0.2f seconds.' % (name, (end - start)))
        
        if __name__=='__main__':
        
            p = Pool(4)
            for i in range(5):
                p.apply_async(long_time_task, args=(i,))
        
            p.close()  # close() must be called before join(); after close(), no new tasks can be submitted
            p.join()   # wait for all worker processes to finish
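The loop above discards each task's return value. A minimal sketch, assuming you want the results back: apply_async() returns an AsyncResult whose get() blocks until the value is ready, and Pool.map() returns results in input order. Using the pool as a context manager terminates it on exit, so all results are fetched inside the with block. square() and collect_results() are hypothetical helpers.

```python
from multiprocessing import Pool


def square(x):
    # Runs in a worker process; the return value is pickled back to the parent
    return x * x


def collect_results(n):
    with Pool(4) as pool:
        # apply_async() returns immediately; get() waits for that task's result
        async_results = [pool.apply_async(square, (i,)) for i in range(n)]
        via_async = [r.get() for r in async_results]
        # map() splits the inputs across the workers and keeps the input order
        via_map = pool.map(square, range(n))
    return via_async, via_map


if __name__ == '__main__':
    print(collect_results(5))  # ([0, 1, 4, 9, 16], [0, 1, 4, 9, 16])
```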
    • Subprocess

      • Often the child process is not a copy of the current program but an external program. After creating such a child process, you also need to control its input and output
      • The subprocess module makes it easy to start a child process and then control its input and output
      • """
        Run the command nslookup www.python.org from Python code;
        the effect is the same as running it directly on the command line
        """
        import subprocess
        
        r = subprocess.call(['nslookup', 'www.python.org'])  # returns the exit code
        print('Exit code:', r)
        
        """
        If the child process needs input, pass it with communicate()
        
        The code below is equivalent to running nslookup on the command line and then typing:
        set q=mx
        python.org
        exit
        """
        
        p = subprocess.Popen(['nslookup'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        output, err = p.communicate(b'set q=mx\npython.org\nexit\n')
        print(output.decode('utf-8'))
        print('Exit code:', p.returncode)
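Since Python 3.5, subprocess.run() starts the process, feeds its stdin and collects its output in one call (capture_output and text need 3.7+). This sketch launches another Python interpreter (sys.executable) instead of nslookup, an assumption made so that it runs on any machine; run_python() is a hypothetical helper.

```python
import subprocess
import sys


def run_python(code, stdin_text=''):
    # Start a separate Python interpreter as the external process
    result = subprocess.run(
        [sys.executable, '-c', code],
        input=stdin_text,     # written to the child's stdin, which is then closed
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # exchange str instead of bytes
    )
    return result.returncode, result.stdout


if __name__ == '__main__':
    print(run_python('import sys; print(sys.stdin.read().upper())', 'hello'))  # (0, 'HELLO\n')
```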
    • Interprocess communication

      • Python's multiprocessing module wraps the underlying mechanisms and provides several ways to exchange data, such as Queue and Pipe
      • """
        Using Queue as an example: the parent process creates two child processes,
        one that writes data to the Queue and one that reads data from it
        """
        
        from multiprocessing import Process, Queue
        import os, time, random
        
        # Code executed by the writer process:
        def write(q):
            print('Process to write: %s' % os.getpid())
            for value in ['A', 'B', 'C']:
                print('Put %s to queue...' % value)
                q.put(value)
                time.sleep(random.random())
        
        # Code executed by the reader process:
        def read(q):
            print('Process to read: %s' % os.getpid())
            while True:
                value = q.get(True)
                print('Get %s from queue.' % value)
        
        if __name__=='__main__':
            # The parent process creates a Queue and passes it to each child process:
            q = Queue()
            pw = Process(target=write, args=(q,))
            pr = Process(target=read, args=(q,))
            # Start subprocess pw, write:
            pw.start()
            # Start subprocess pr, read:
            pr.start()
            # Wait pw end:
            pw.join()
            # pr runs an infinite loop and never ends on its own, so it has to be terminated forcibly:
            pr.terminate()
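The module also offers Pipe(), which connects two endpoints directly. A minimal sketch, assuming a simple echo protocol: the parent sends strings through one end, the child replies through the other, and a None sentinel tells the child to stop, so no terminate() call is needed. worker() and echo_via_pipe() are illustrative names.

```python
from multiprocessing import Pipe, Process


def worker(conn):
    # Child side: echo each message back upper-cased until the None sentinel arrives
    while True:
        msg = conn.recv()
        if msg is None:
            break
        conn.send(msg.upper())
    conn.close()


def echo_via_pipe(messages):
    parent_conn, child_conn = Pipe()  # duplex by default: both ends can send and recv
    p = Process(target=worker, args=(child_conn,))
    p.start()
    replies = []
    for m in messages:
        parent_conn.send(m)
        replies.append(parent_conn.recv())
    parent_conn.send(None)  # ask the child to exit cleanly
    p.join()
    return replies


if __name__ == '__main__':
    print(echo_via_pipe(['a', 'b', 'c']))  # ['A', 'B', 'C']
```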

  • Multithreading (data sharing)

    • threading.Thread

      • A process consists of one or more threads; every process has at least one. Threads are the unit of execution that the operating system supports directly, which is why high-level languages usually have built-in multithreading support

      • Python's standard library provides two modules, _thread and threading. _thread is the low-level module, and threading is a high-level module that wraps _thread; in most cases you only need threading

      • """
        To start a thread, pass a function to create a Thread instance, then call start() to begin execution
        """
        import time, threading
        
        # Code executed by the new thread:
        def loop():
            print('thread %s is running...' % threading.current_thread().name)
            n = 0
            while n < 5:
                n = n + 1
                print('thread %s >>> %s' % (threading.current_thread().name, n))
                time.sleep(1)
            print('thread %s ended.' % threading.current_thread().name)
        
        print('thread %s is running...' % threading.current_thread().name)
        
        
        t = threading.Thread(target=loop, name='LoopThread')
        t.start()
        t.join()
        print('thread %s ended.' % threading.current_thread().name)
        
        
        """
        output:
        thread MainThread is running...
        thread LoopThread is running...
        thread LoopThread >>> 1
        thread LoopThread >>> 2
        thread LoopThread >>> 3
        thread LoopThread >>> 4
        thread LoopThread >>> 5
        thread LoopThread ended.
        thread MainThread ended.
        
        Every process starts one thread by default, called the main thread.
        
        The main thread can start new threads. The threading module's current_thread() function always returns the Thread instance of the current thread.
        
        The main thread's instance is named MainThread. A child thread's name is set when it is created; here we named it LoopThread.
        
        The name is only used for display when printing and has no other meaning. If you don't name a thread, Python names it automatically: Thread-1, Thread-2, and so on (since Python 3.10 the target function's name is appended, e.g. Thread-1 (loop)).
        """
    • Lock

      • The biggest difference between multithreading and multiprocessing is that with multiple processes each process has its own copy of every variable, so changes in one do not affect the others, while with multiple threads all variables are shared by all threads
      • Sharing data between threads is therefore error-prone, because a single high-level statement is executed as several lower-level steps. Even a simple calculation such as:
        balance = balance + n
        
        """
        It takes two steps:
        
        1. compute balance + n and store the result in a temporary variable;
        2. assign the temporary value to balance.
        
        If several threads interleave these steps, one thread's update can overwrite another's, and balance ends up with a wrong value
        """
      • So change_it() has to be protected with a lock
      • """
        While one thread holds the lock, other threads cannot execute change_it() at the same time; they must wait until the lock is released and they have acquired it themselves.
        
        Because there is only one lock, no matter how many threads there are, at most one thread holds it at any moment, so modifications cannot conflict
        
        A lock is created with threading.Lock()
        """
        
        import time, threading
        
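        balance = 0  # global variable shared by all threads
        
        def change_it(n):
            # Deposit then withdraw the same amount, so the result should be 0:
            global balance
            balance = balance + n
            balance = balance - n
        
        
        lock = threading.Lock()
        
        def run_thread(n):
            for i in range(100000):
                # To acquire a lock:
                lock.acquire()
                try:
                    # Change it safely:
                    change_it(n)
                finally:
                    # Release the lock after modification:
                    lock.release()
        
        # Run two threads that update balance concurrently
        t1 = threading.Thread(target=run_thread, args=(5,))
        t2 = threading.Thread(target=run_thread, args=(8,))
        t1.start()
        t2.start()
        t1.join()
        t2.join()
        print(balance)  # 0, because the lock serializes every change_it() call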
      • The advantage of locking: it ensures that a critical piece of code is executed from start to finish by only one thread at a time

        Disadvantages:

        First, it prevents concurrent execution: code inside the lock effectively runs in single-threaded mode, which can greatly reduce efficiency.

        Second, a program can have several locks. If different threads each hold one lock and then try to acquire a lock held by another, they can deadlock: the threads hang, can neither continue nor exit, and can only be killed by the operating system
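The claim that balance = balance + n is several steps can be checked in CPython with the dis module, which lists the bytecode instructions of a function. Instruction names are CPython-specific and vary by version (for example BINARY_ADD before 3.11, BINARY_OP from 3.11), so this sketch only checks that reading balance (LOAD_GLOBAL) and writing it back (STORE_GLOBAL) are separate instructions. The language does not guarantee that such a sequence runs atomically; how often a thread switch actually lands between them depends on the interpreter version. add() and opnames() are illustrative names.

```python
import dis

balance = 0


def add(n):
    global balance
    balance = balance + n


def opnames(func):
    # Names of the bytecode instructions CPython executes for func, in order
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == '__main__':
    dis.dis(add)  # print the full disassembly
    ops = opnames(add)
    # The read and the write of balance are distinct instructions with the addition between them
    print(ops.index('LOAD_GLOBAL') < ops.index('STORE_GLOBAL'))
```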

    • Multi-core CPUs (because of the GIL, CPython threads cannot run Python code in parallel)

      • With two threads running infinite loops on a multi-core CPU, a monitor would normally show 200% CPU usage, i.e. two cores fully busy. To keep all cores of an N-core CPU busy, you would start N infinite-loop threads

      • In Python, starting as many busy threads as there are CPU cores (as in the code below) shows only about 102% CPU usage on a 4-core machine, i.e. only one core is used

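      • import threading, multiprocessing
        
        # Warning: these threads loop forever; stop the program by killing the process
        def loop():
            x = 0
            while True:
                x = x ^ 1
        
        # Start one busy thread per CPU core
        for i in range(multiprocessing.cpu_count()):
            t = threading.Thread(target=loop)
            t.start()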
      • Rewriting the same infinite loop in C, C++ or Java drives all cores to full load: 400% on 4 cores, 800% on 8 cores. Python cannot do this.
        • The reason is the GIL (Global Interpreter Lock)
          • In CPython's multithreaded environment, a thread switch is triggered when the running thread blocks on I/O or when its time slice runs out. Python 2 switched every 100 bytecode instructions (the "check interval"); since Python 3.2 the switch is time-based, 5 ms by default (see sys.getswitchinterval()). At that point thread A releases the GIL and thread B acquires it, taking over the right to execute in the interpreter
          • The GIL ensures that only one thread executes Python bytecode at a time; every thread must acquire the GIL before it can run
          • The GIL only limits CPU-bound programs and has little effect on I/O-bound ones, because a thread releases the GIL while it is blocked on I/O
          • The GIL belongs to CPython, the reference interpreter and the default Python implementation in most environments
      • The difference between threading.Lock and the GIL: a Lock protects a specific piece of code that you choose inside your threads, while the GIL protects the interpreter itself (the right to run bytecode). The GIL is a coarse-grained lock and does not make your own multi-step operations atomic, so shared data still needs a Lock
      • For CPU-bound programs, the options include: 1. switching to an interpreter without a GIL, 2. moving the hot code into a C extension, 3. using multiple processes instead of threads
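Option 3 can be sketched with a process pool: each worker process has its own interpreter and its own GIL, so pure-Python loops can run on several cores at once. This is a minimal sketch; count_down() and run_cpu_bound() are hypothetical, and the loop sizes are arbitrary.

```python
import os
from multiprocessing import Pool


def count_down(n):
    # Pure-Python CPU-bound loop; in a thread it would hold the GIL while running
    while n > 0:
        n -= 1
    return os.getpid()  # report which worker process did the work


def run_cpu_bound(jobs=4, n=5_000_000):
    # One worker per core; each process has its own GIL, so the loops can run in parallel
    with Pool(processes=os.cpu_count()) as pool:
        return pool.map(count_down, [n] * jobs)


if __name__ == '__main__':
    pids = run_cpu_bound()
    print('worker PIDs:', sorted(set(pids)))
```

Watching a CPU monitor while this runs should show more than one core busy, in contrast with the threaded loop above.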
    • ThreadLocal

      • In a multithreaded program, each thread can keep its own, unshared data in local variables, but passing those variables through every function call is tedious:
        • def process_student(name):
              std = Student(name)  # std is a local variable, but every function needs it, so it has to be passed in
              do_task_1(std)
              do_task_2(std)
          
          def do_task_1(std):
              do_subtask_1(std)
              do_subtask_2(std)
          
          def do_task_2(std):
              do_subtask_2(std)
              do_subtask_2(std)
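      • threading.local (ThreadLocal) solves this. Although a ThreadLocal object is a global variable, each thread reads and writes only its own independent copy of its attributes, so threads do not interfere with each other
        • ThreadLocal removes the need to pass per-thread parameters from function to function within a thread; typical uses are binding a database connection, HTTP request or user identity to each thread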
      • import threading
            
        # Create a global ThreadLocal object
        local_school = threading.local()
        
        def process_student():
            # Get the student associated with the current thread:
            std = local_school.student
            print('Hello, %s (in %s)' % (std, threading.current_thread().name))
        
        def process_thread(name):
            # Bind ThreadLocal student:
            local_school.student = name
            process_student()
        
        t1 = threading.Thread(target=process_thread, args=('Alice',), name='Thread-A')
        t2 = threading.Thread(target=process_thread, args=('Bob',), name='Thread-B')
        t1.start()
        t2.start()
        t1.join()
        t2.join()
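The isolation can be checked directly. In the sketch below, with hypothetical worker() and run_threadlocal_demo() helpers, two threads bind different values to the same threading.local attribute, and the main thread, which never set it, is then shown to have no such attribute.

```python
import threading

local_data = threading.local()  # one global object, a separate attribute namespace per thread


def worker(name, seen):
    local_data.student = name  # bound only in the current thread
    seen[threading.current_thread().name] = local_data.student


def run_threadlocal_demo():
    seen = {}
    threads = [
        threading.Thread(target=worker, args=('Alice', seen), name='Thread-A'),
        threading.Thread(target=worker, args=('Bob', seen), name='Thread-B'),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The main thread never assigned student, so its own view has no such attribute
    return seen, hasattr(local_data, 'student')


if __name__ == '__main__':
    print(run_threadlocal_demo())
```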

Keywords: Python

Added by BigBadKev on Sat, 22 Jan 2022 18:13:41 +0200