Python learning: processes, threads and coroutines

Multithreading

Multithreading is similar to running several different programs at the same time.

  • Each thread has its own entry point, sequence of execution and exit point. However, a thread cannot run on its own: it must live inside an application (a process), which controls the execution of its threads.
  • Each thread has its own set of CPU registers, called the thread context, which reflects the state of the CPU registers when the thread last ran.
  • The instruction pointer and the stack pointer are the two most important registers in the thread context. A thread always runs in the context of a process, so these registers hold addresses within the address space of the process that owns the thread.
  • Threads can be preempted (interrupted).
  • While other threads are running, a thread can be temporarily suspended (also called sleeping); this is known as yielding.

Advantages

  • With threads, long-running tasks can be moved to the background.
  • The user interface can stay responsive. For example, when the user clicks a button that triggers some processing, a progress bar can be shown to display the progress.
  • The program may run faster.
  • Threads are especially useful for tasks that spend their time waiting, such as user input, file reading and writing, and sending and receiving data over the network. While one thread waits, resources such as the CPU and memory can be released for other work.
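
The benefit for waiting tasks can be seen in a small timing sketch. It assumes that time.sleep stands in for a blocking read or download; the names fake_io, run_sequential and run_threaded are made up for this illustration.

```python
import threading
import time


def fake_io(seconds):
    # time.sleep stands in for a blocking read, download or user prompt
    time.sleep(seconds)


def run_sequential(delays):
    # Perform the waits one after another and return the elapsed time
    start = time.perf_counter()
    for d in delays:
        fake_io(d)
    return time.perf_counter() - start


def run_threaded(delays):
    # Perform each wait in its own thread and return the elapsed time
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(d,)) for d in delays]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == '__main__':
    delays = [0.2, 0.2, 0.2]
    print('sequential: %.2fs' % run_sequential(delays))
    print('threaded:   %.2fs' % run_threaded(delays))
```

The sequential version takes roughly the sum of the delays, while the threaded version takes roughly as long as the longest single delay, because the waits overlap.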

Classification

  • Kernel threads: created and destroyed by the operating system kernel.
  • User threads: implemented in the user program without kernel support.

States

  • New
  • Ready
  • Running
  • Blocked
  • Terminated
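
The life cycle listed above can be partly observed from Python. The sketch below is only an illustration: Thread.is_alive() does not distinguish ready, running and blocked, it only reports whether the thread has been started and has not yet ended. The function name observe_states and the thread name 'state-demo' are made up for this example.

```python
import threading


def observe_states():
    gate = threading.Event()
    # The worker blocks in gate.wait() until the main thread calls gate.set()
    worker = threading.Thread(target=gate.wait, name='state-demo')

    seen = [('new', worker.is_alive())]          # created, not yet started
    worker.start()
    seen.append(('started', worker.is_alive()))  # ready, running or blocked
    gate.set()                                   # unblock the worker
    worker.join()                                # wait for it to end
    seen.append(('ended', worker.is_alive()))
    return seen


if __name__ == '__main__':
    for label, alive in observe_states():
        print(label, alive)
```

is_alive() returns False before start() and after the thread has ended, and True in between.
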
import threading
from time import sleep


def download(n):
    img = ['a.jpg', 'b.jpg', 'c.jpg']
    for i in img:
        print('downloading:', i)
        sleep(n)  # while sleeping, this thread gives up the CPU to other threads
        print('donei', i)


def listenmusic():
    musics = ['11', '22', '33', '44']
    for m in musics:
        print('listening:', m)
        sleep(0.5)
        print('donem', m)


if __name__ == '__main__':
    # args must be a tuple, hence the trailing comma
    t1 = threading.Thread(target=download, name='aa', args=(1,))
    t1.start()

    t2 = threading.Thread(target=listenmusic, name='bb')
    t2.start()
"""
downloading: a.jpg
listening: 11
donem 11
listening: 22
donei a.jpg
downloading: b.jpg
donem 22
listening: 33
donem 33
listening: 44
donei b.jpg
downloading: c.jpg
donem 44
donei c.jpg

"""

Shared data

Threads can share global variables. When data is shared, its safety must be considered.

Shared data:

  • If several threads modify the same data at once, unexpected results may occur. To keep the data correct, the threads need to be synchronized.

Synchronization:
Threads take turns: one must finish before the next may enter. This lowers efficiency.

The advantage of multithreading is that it can run multiple tasks at the same time. (In CPython, however, the global interpreter lock lets only one thread execute Python bytecode at any moment, which is why Python threads are sometimes described as pseudo multithreading.)

import threading

money =1000
def run1():
    global money
    for i in range(100):
        money-=1

if __name__ == '__main__':

    t1 = threading.Thread(target=run1,name='aa')
    t2 = threading.Thread(target=run1,name='bb')

    t1.start()
    t2.start()
    t1.join()
    t2.join()
    print(money)  # 800

Lock

When threads share data, their updates can interfere with each other. Locks were introduced to prevent this.
CPython has a global interpreter lock (GIL), which allows only one thread to execute Python bytecode at a time. The GIL protects the interpreter, not your data: the interpreter switches between threads at regular intervals, and a statement such as money += 1 is a read, an add and a write, so a switch can occur partway through it.
When each thread does little work, it often finishes before a switch happens and the result looks correct. When the work is large enough, the threads interleave and updates can be lost, as in the following example, where the expected total is 20000000. The exact numbers vary from run to run and may differ between Python versions.

import threading

money =0
def run1():
    global money
    for i in range(10000000):
        money+=1
    print('1:',money)
def run2():
    global money
    for i in range(10000000):
        money+=1
    print('2:',money)

if __name__ == '__main__':

    t1 = threading.Thread(target=run1,name='aa')
    t2 = threading.Thread(target=run2,name='bb')

    t1.start()
    t2.start()
    t1.join()
    t2.join()
    print(money)
"""
1: 11693709
2: 12647085
12647085
"""

Use processes for computation-intensive (CPU-bound) work.
Use threads for work that spends most of its time waiting, such as crawlers, downloads and other I/O operations.
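
As a rough sketch of the process side of this rule, the standard library's concurrent.futures.ProcessPoolExecutor can spread CPU-bound work over several worker processes. The function names count_primes and count_in_processes and the workload itself are made up for this illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """Count the primes below limit by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_in_processes(limits):
    # Each limit is handled in a worker process with its own interpreter
    # and its own GIL, so the calculations can run in parallel
    with ProcessPoolExecutor() as pool:
        return list(pool.map(count_primes, limits))


if __name__ == '__main__':
    # The __main__ guard matters here: worker processes may import this module
    print(count_in_processes([2000, 2000, 2000, 2000]))
```

Starting processes has a cost, so this only pays off when each piece of work is large enough to outweigh it.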

Thread synchronization

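Simple thread synchronization can be achieved with the Lock and RLock objects of the threading module.

  • lock.acquire(): acquires the lock; blocks if another thread already holds it
  • lock.release(): releases the lock

Because of the GIL, code that does only a small amount of work often appears to behave correctly with or without an explicit lock. This is not guaranteed, so use a lock whenever several threads modify the same data.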
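
A minimal sketch of a shared counter protected by threading.Lock, assuming the same kind of increment loop as the earlier money example. The names counter, counter_lock, add_many and run are illustrative, and the with statement is used in place of explicit acquire() and release() calls.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    global counter
    for _ in range(times):
        # Only one thread at a time may run the read-modify-write below;
        # "with" acquires the lock and always releases it afterwards
        with counter_lock:
            counter += 1


def run(times=100000):
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == '__main__':
    print(run())  # 200000
```

The total is now the same on every run, at the price of acquiring and releasing the lock on each iteration.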

import threading
import time

lock = threading.Lock()
l1 = [0]*10
def task1():
    #Get the thread lock. If it is locked, block and wait for the lock to be released
    lock.acquire()#block
    for i in range(len(l1)):
        l1[i]=1
        time.sleep(0.5)
    lock.release()
def task2():
    lock.acquire()#block
    for i in range(len(l1)):
        print('-------->',l1[i])
        time.sleep(0.5)
    lock.release()

if __name__ == '__main__':
    t1 = threading.Thread(target=task1)
    t2 = threading.Thread(target=task2)


    t2.start()
    t1.start()

    t2.join()
    t1.join()

    print(l1)
"""
--------> 0
--------> 0
--------> 0
--------> 0
--------> 0
--------> 0
--------> 0
--------> 0
--------> 0
--------> 0
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
"""

Deadlock

If two threads each hold some resources and at the same time wait for the resources held by the other, a deadlock occurs.

Ways to avoid deadlock

  1. Refactor the code
  2. Use lock.acquire(timeout=seconds), so that a thread stops waiting after the timeout and can release the locks it holds
  3. Smash the computer
import time
from threading import Thread, Lock

lock1 = Lock()
lock2 = Lock()


class MyThread1(Thread):
    def run(self):
        if lock1.acquire():
            print(self.name + 'get 1')
            time.sleep(0.1)
            # Wait at most 2 seconds for lock2 instead of blocking forever
            if lock2.acquire(timeout=2):
                print(self.name + 'get 1 2 ')
                lock2.release()
            # Reached after success or timeout, so lock1 is always released
            lock1.release()


class MyThread2(Thread):
    def run(self):
        if lock2.acquire():
            print(self.name + 'get 2')
            time.sleep(0.1)
            if lock1.acquire():  # no timeout: waits until thread 1 releases lock1
                print(self.name + 'get 1 2 ')
                lock1.release()
            lock2.release()


if __name__ == '__main__':
    t1 = MyThread1()
    t2 = MyThread2()
    t1.start()
    t2.start()
# Thread 1: takes lock 1, then waits for lock 2; if the wait times out, it releases lock 1
# Thread 2: takes lock 2, then waits for lock 1
"""
Thread-1get 1
Thread-2get 2
Thread-2get 1 2 
"""

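One possible reading of the first avoidance method (refactoring) is to make every thread acquire the locks in the same order, so that no thread can hold the second lock while waiting for the first. This is a sketch under that assumption, not the author's code; worker, run_workers and finished are made-up names.

```python
import threading
import time

lock1 = threading.Lock()
lock2 = threading.Lock()
finished = []


def worker(name):
    # Every worker takes lock1 first and lock2 second, so a worker that
    # holds lock2 is never waiting for lock1
    with lock1:
        time.sleep(0.1)
        with lock2:
            finished.append(name)


def run_workers():
    finished.clear()
    threads = [threading.Thread(target=worker, args=(n,)) for n in ('t1', 't2')]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)  # safety net so the demo cannot hang
    return sorted(finished)


if __name__ == '__main__':
    print(run_workers())  # ['t1', 't2']
```
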
Communication between two threads: the producer-consumer pattern
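
A minimal sketch of this pattern using the standard library's queue.Queue, which is thread-safe and blocks the producer when it is full and the consumer when it is empty. The names producer, consumer, run_pipeline and the SENTINEL stop marker are assumptions made for this illustration.

```python
import queue
import threading

SENTINEL = None  # made-up marker that tells the consumer to stop


def producer(q, items):
    for item in items:
        q.put(item)  # blocks while the queue is full
    q.put(SENTINEL)


def consumer(q, results):
    while True:
        item = q.get()  # blocks until an item is available
        if item is SENTINEL:
            break
        results.append(item * 2)


def run_pipeline(items):
    q = queue.Queue(maxsize=2)
    results = []
    p = threading.Thread(target=producer, args=(q, items))
    c = threading.Thread(target=consumer, args=(q, results))
    p.start()
    c.start()
    p.join()
    c.join()
    return results


if __name__ == '__main__':
    print(run_pipeline([1, 2, 3]))  # [2, 4, 6]
```

The two threads never touch a shared variable directly; all data passes through the queue, which does its own locking.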

Coroutines

A coroutine is also called a micro-thread.

Suitable for time-consuming operations: network requests, network downloads (crawlers), I/O operations
Makes efficient use of the CPU

Implemented with generators

import time


def task1():
    for i in range(3):
        print('A' + str(i))
        yield  # pause here and hand control back to the caller
        time.sleep(1)


def task2():
    for i in range(3):
        print('B' + str(i))
        yield
        time.sleep(2)


if __name__ == '__main__':
    g1 = task1()
    g2 = task2()
    while True:
        try:
            next(g1)
            next(g2)
        except StopIteration:  # raised when a generator is exhausted
            break
"""
A0
B0
A1
B1
A2
B2
"""

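The loop above stops as soon as either generator is exhausted. As a sketch of a slightly more general approach, a small round-robin scheduler can keep the generators in a queue and drop each one when it finishes. The names task and run_round_robin are made up for this illustration.

```python
from collections import deque


def task(name, steps, log):
    for i in range(steps):
        log.append(name + str(i))
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)  # run the task up to its next yield
        except StopIteration:
            continue  # the task has finished: do not reschedule it
        ready.append(current)  # still has work: go to the back of the queue


if __name__ == '__main__':
    log = []
    run_round_robin([task('A', 3, log), task('B', 2, log)])
    print(log)  # ['A0', 'B0', 'A1', 'B1', 'A2']
```
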
Using greenlet

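greenlet does not switch automatically; switching between coroutines must be done by hand.

  • g1 = greenlet(a): wraps the function a in a greenlet
  • g1.switch(): switches execution to that greenlet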
import time

from greenlet import greenlet
def a():
    for i in range(5):
        print('A' + str(i))
        g2.switch()
        time.sleep(0.5)
def b():
    for i in range(5):
        print('B' + str(i))
        g3.switch()
        time.sleep(0.5)
def c():
    for i in range(5):
        print('C' + str(i))
        g1.switch()
        time.sleep(0.5)
if __name__ == '__main__':
    g1 = greenlet(a)
    g2 = greenlet(b)
    g3 = greenlet(c)
    g1.switch()

Using gevent

gevent is built on top of greenlet and can switch between coroutines automatically
g1 = gevent.spawn(a)

Monkey patch

After patching, blocking standard-library calls such as time.sleep yield to gevent, which switches to another greenlet automatically

  • from gevent import monkey
  • monkey.patch_all()
from gevent import monkey
monkey.patch_all()  # patch first, before importing modules that may block

import time
import gevent
def a():
    for i in range(5):
        print('A' + str(i))

        time.sleep(0.5)
def b():
    for i in range(5):
        print('B' + str(i))

        time.sleep(0.5)
def c():
    for i in range(5):
        print('C' + str(i))

        time.sleep(0.5)
if __name__ == '__main__':
    g1 = gevent.spawn(a)
    g2 = gevent.spawn(b)
    g3 = gevent.spawn(c)
    g1.join()
    g2.join()
    g3.join()

Example

from gevent import monkey
monkey.patch_all()  # patch first, before urllib is imported

import urllib.request
import gevent


def download(url):
    response = urllib.request.urlopen(url)  # blocking network call; gevent switches here
    content = response.read()
    print('Downloaded {}, length: {}'.format(url, len(content)))
if __name__ == '__main__':
    urls = ['https://www.qq.com','https://www.baidu.com','https://www.weibo.com']
    g1 = gevent.spawn(download,urls[0])
    g2 = gevent.spawn(download, urls[1])
    g3 = gevent.spawn(download, urls[2])
    g1.join()
    g2.join()
    g3.join()

Keywords: Python

Added by rkm11 on Sat, 25 Dec 2021 15:56:10 +0200