GIL global interpreter lock and deadlock

Global interpreter lock GIL

The execution of Python code is controlled by the Python virtual machine (also known as the interpreter main loop). Python was designed so that only one thread executes in the main loop at a time. Although multiple threads can be "running" in the Python interpreter, only one of them actually runs at any given moment.
Access to the Python virtual machine is controlled by the global interpreter lock (GIL), which ensures that only one thread runs at a time.

In a multithreaded environment, the Python virtual machine executes as follows:

a. acquire the GIL;

b. switch to a thread to run;

c. run a specified number of bytecode instructions, or run until the thread voluntarily gives up control (for example by calling time.sleep(0));

d. set the thread back to the sleeping state;

e. release the GIL;

f. repeat all the above steps.
When calling external code (such as a C/C++ extension function), the GIL is held until the function returns (since no Python bytecode runs during this period, no thread switch happens). Programmers writing extensions can release the GIL explicitly.

What is the GIL lock

GIL stands for Global Interpreter Lock.
The GIL is not a feature of the Python language; it is a concept introduced by the implementation of the Python interpreter. The GIL exists only in the CPython interpreter.
However, CPython is the most widely used Python interpreter, so we inevitably run into the GIL.
When a mutex is used to solve resource competition in code, a thread locks the globally shared resource before executing and unlocks it when it finishes, so that other threads can use the resource.
The GIL works much like a mutex: it solves the problem of resource competition among multiple threads inside the interpreter.

GIL lock: a global interpreter lock. It is one big lock on the interpreter; a thread must acquire it before it can execute. It applies only to the CPython interpreter.

Why the global interpreter lock (GIL) was introduced

1 Global interpreter lock, the GIL (a CPython interpreter issue)
 Multiple threads can be opened in a process at the same time, but only one thread can execute.
 The GIL was originally introduced for the sake of garbage collection. At the time there were only single-core CPUs, so even if multiple threads were opened, they could not be spread across multiple CPUs; a thread could not run until it obtained the lock, and with a single CPU this caused no problem. With multi-core CPUs it matters: suppose a computer has four cores and a process has four threads. In theory each thread could run on its own core, but in Python opening four threads does not mean they will be run by four cores; only one thread can run at a time, because of the GIL.
    - Python needs to do garbage collection (gc)
    - a garbage-collection thread performs the collection
    - a big lock (the GIL) was designed; only the thread that holds the lock can execute
    - multiple threads can be opened in a process at the same time, but only one thread can execute
    - therefore Python cannot take advantage of multiple cores
  

### Only for the CPython interpreter (not for other interpreters, nor other languages)
2 If the task is compute-intensive: start processes
3 If the task is IO-intensive: start threads

Verify the existence of the GIL lock

from threading import Thread
import time

m = 100

def test():
    global m
    tmp = m      # read the shared variable
    tmp -= 1
    m = tmp      # write it back: a non-atomic read-modify-write

for i in range(100):
    t = Thread(target=test)
    t.start()

time.sleep(3)    # wait for all threads to finish
print(m)


result:
    0
    
"""
Although there are multiple threads in the same process GIL The existence of does not have the effect of parallelism
 But if there is IO The operation will still cause data disorder. At this time, we need to add additional mutexes
"""

The GIL is released in two cases

1. Active release

The GIL is released voluntarily when an IO operation is encountered, or when the allocated CPU time slice runs out.

Note that the point of the GIL is to maintain thread safety. If an assignment such as x = 10 were treated like an ordinary IO operation and the GIL were voluntarily handed over during it, the data would become unsafe. Therefore x = 10 must be treated differently.

As for how x = 10 is distinguished, it is actually easy to understand: every IO operation issues a system call to the operating system, i.e. calls an operating-system interface. File reads and writes call such an interface, and network IO does too. This provides the basis for differentiated treatment: a plain variable assignment is just interpreter bytecode and does not fall into the category of active release, and so the GIL can guarantee thread safety for it.
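One quick way to convince yourself that x = 10 is plain bytecode rather than an IO system call is to disassemble it (an illustrative check, not part of the original text):

```python
import dis

def f():
    x = 10   # an ordinary assignment

# The disassembly shows only LOAD_CONST / STORE_FAST bytecode:
# no system call is involved, so the GIL is not voluntarily released here.
dis.dis(f)
```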

2. Passive release

Since Python 3.2, a global variable is defined:

/* Python/ceval.c */
...
static volatile int gil_drop_request = 0;

Note that when there is only one thread, that thread runs continuously and never releases the GIL. It is different with multiple threads, say thread1 and thread2.

If thread1 never actively released the GIL, the interpreter would not simply let it run forever. In fact, while thread1 is running, thread2 executes cv_wait(gil, TIMEOUT) (the default TIMEOUT is 5 milliseconds, but it can be changed). Once the timeout expires, the global variable is set to gil_drop_request = 1;, thread1 is forced to release the GIL, and thread2 starts running and returns an ack to thread1. Thread1 in turn starts calling cv_wait(gil, TIMEOUT), waiting for its next chance to run.
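In CPython 3.2+ the 5 ms TIMEOUT described above is exposed to Python code as the "switch interval", which can be read and tuned via the sys module:

```python
import sys

# The default switch interval is 5 ms, i.e. the TIMEOUT described above.
print(sys.getswitchinterval())

# It can be tuned: a smaller value switches threads more often (better
# responsiveness), a larger one switches less often (less overhead).
sys.setswitchinterval(0.001)
print(sys.getswitchinterval())
```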

Deadlock

What is a deadlock

A deadlock is the phenomenon in which two or more processes or threads wait for each other while competing for resources during execution; without outside intervention, none of them can make progress. The system is then said to be in a deadlock state, and the processes that keep waiting for each other are called deadlocked processes. The code below produces such a deadlock.

from threading import Thread, Lock
import time

A = Lock()
B = Lock()


class MyThread(Thread):
    def run(self):
        self.func1()
        self.func2()

    def func1(self):
        A.acquire()
        print('%s Got it A lock' % self.name)  # current_thread().name get thread name
        B.acquire()
        print('%s Got it B lock' % self.name)
        time.sleep(1)
        B.release()
        print('%s Released B lock' % self.name)
        A.release()
        print('%s Released A lock' % self.name)

    def func2(self):
        B.acquire()
        print('%s Got it B lock' % self.name)
        A.acquire()
        print('%s Got it A lock' % self.name)
        A.release()
        print('%s Released A lock' % self.name)
        B.release()
        print('%s Released B lock' % self.name)

for i in range(10):
    obj = MyThread()
    obj.start()
 
"""Even if you know the characteristics and usage of the lock, don't use it easily, because it is easy to cause deadlock"""

Thread 1 executes func1 first, acquires locks A and B in turn, and releases them both.
Thread 1 then executes func2: it acquires lock B first, then goes to sleep.
Meanwhile, thread 2 acquires lock A.
Now there is a stalemate: thread 2 wants lock B, which is held by thread 1, while thread 1 wants lock A, which is held by thread 2.

Solving the deadlock problem

### To solve the deadlock problem, use RLock: a reentrant lock. The same thread can acquire it repeatedly, and must release it as many times as it acquired it
from threading import Thread, RLock
import time


A = RLock()   # solves the deadlock problem
B = A


class MyThread(Thread):
    def run(self):
        self.func1()
        self.func2()

    def func1(self):
        A.acquire()
        print('%s Got it A lock' % self.name)  # current_thread().name get thread name
        B.acquire()
        print('%s Got it B lock' % self.name)
        time.sleep(1)
        B.release()
        print('%s Released B lock' % self.name)
        A.release()
        print('%s Released A lock' % self.name)

    def func2(self):
        B.acquire()
        print('%s Got it B lock' % self.name)
        A.acquire()
        print('%s Got it A lock' % self.name)
        A.release()
        print('%s Released A lock' % self.name)
        B.release()
        print('%s Released B lock' % self.name)

for i in range(10):
    obj = MyThread()
    obj.start()
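
Besides RLock, another common way to avoid this particular deadlock is to always acquire the locks in the same global order in every function (a minimal sketch, not from the original code; the `done` list is only there to observe completion):

```python
from threading import Thread, Lock

A = Lock()
B = Lock()

done = []   # list.append is thread-safe, used here just to count completions

def func1():
    with A:          # always acquire A first ...
        with B:      # ... then B
            done.append('func1')

def func2():
    with A:          # same order as func1, so no circular wait is possible
        with B:
            done.append('func2')

threads = [Thread(target=f) for f in (func1, func2) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()
print(len(done))     # all 10 calls complete; no deadlock
```

A fixed lock ordering removes the circular-wait condition, which is one of the four necessary conditions for deadlock.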
 

Summary

  • The GIL lock exists only in the Python interpreter (the GIL is an interpreter concern)
  • In essence, the GIL is also a mutex (concurrency becomes serial execution, sacrificing efficiency for safety)
  • The GIL exists because only single-core computers existed when the CPython interpreter was created, and memory management (garbage collection) in CPython is not thread-safe
  • Multiple threads can be opened in a process at the same time, but only one thread can execute
  • In Python, multiple threads in the same process cannot run in parallel (they can run concurrently)
  • Do not use locks lightly; they easily cause deadlocks

Is Python multithreading useless?

# Whether it is useful depends on the situation (the type of program)
# IO-intensive
    e.g. four tasks, each taking 10s
        Opening multiple processes has no great advantage: 42s+
            on IO the process has to switch anyway, and creating a process also requires allocating memory space and copying code
        Multithreading has the advantage: no extra resources consumed, 2s+
# Compute-intensive
    e.g. four tasks, each taking 10s
        Compute-intensive tasks are characterized by a large amount of calculation
        Multiprocessing can take advantage of multiple cores: 5s+
        Multithreading cannot take advantage of multiple cores: 23s+
"""
Combining multiprocessing and multithreading
can handle both compute-intensive and IO-intensive tasks
"""
"""IO Intensive"""
# from multiprocessing import Process
# from threading import Thread
# import threading
# import os,time
# def work():
#     time.sleep(2)
#
#
# if __name__ == '__main__':
#     l=[]
#     print(os.cpu_count()) #This machine is 6-core
#     start=time.time()
#     for i in range(400):
#         # p=Process(target=work) # takes about 42.54s, most of it spent creating the processes
#         p=Thread(target=work) # takes about 2.08s
#         l.append(p)
#         p.start()
#     for p in l:
#         p.join()
#     stop=time.time()
#     print('run time is %s' %(stop-start))


"""Compute intensive"""
from multiprocessing import Process
from threading import Thread
import os,time
def work():
    res=0
    for i in range(100000000):
        res*=i
if __name__ == '__main__':
    l=[]
    print(os.cpu_count())  # This machine is 6-core
    start=time.time()
    for i in range(6):
        # p=Process(target=work) # takes about 5.35s
        p=Thread(target=work) # takes about 23.37s
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop=time.time()
    print('run time is %s' %(stop-start))

Conclusion: whether multithreading is useless depends on the type of program. Processes and threads can be combined to achieve the best efficiency.

IO-intensive (e.g. sockets, web crawlers, web services): use multithreading
Compute-intensive (e.g. financial analysis, which requires a lot of calculation): use multiprocessing

Added by whistler on Thu, 20 Jan 2022 00:19:39 +0200