In the past, when I wrote multithreaded or multiprocess code, each thread or process usually completed its own task and had little contact with the others. When they did need to communicate, I used a queue or a database. Recently, though, while writing some multithreaded and multiprocess code, I found that there are a few points to watch out for when threads or processes need to use shared variables.
Shared data between multiple threads
Standard data type sharing between threads
Look at the following code
#coding:utf-8
import threading

def test(name,data):
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))

if __name__ == '__main__':
    d = 5
    name = "Yang Yan Xing"
    for i in range(5):
        th = threading.Thread(target=test,args=(name,d))
        th.start()
Here I create a global int variable d, whose value is 5. When I call the test function in the 5 threads, I pass d as a parameter. Do these 5 threads have the same d? I print their IDs through id(data) in the test function and get the following results
in thread <Thread(Thread-1, started 6624)> name is Yang Yan Xing
data is 5 id(data) is 1763791776
in thread <Thread(Thread-2, started 8108)> name is Yang Yan Xing
data is 5 id(data) is 1763791776
in thread <Thread(Thread-3, started 3356)> name is Yang Yan Xing
data is 5 id(data) is 1763791776
in thread <Thread(Thread-4, started 13728)> name is Yang Yan Xing
data is 5 id(data) is 1763791776
in thread <Thread(Thread-5, started 3712)> name is Yang Yan Xing
data is 5 id(data) is 1763791776
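From the result, we can see that the id of data is 1763791776 in all five sub-threads. The object created in the main thread is therefore the very same object seen by each sub-thread, because threads live in one process and share its memory. Note that an int is immutable, so a thread cannot change its value in place; assigning a new value to data inside test only rebinds the local name in that thread. For a mutable shared object (a list, a dict, a custom object), however, a change made in one thread is visible to all the others. Such modifications are not thread-safe, so you need to protect them with a lock.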
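To make the difference between rebinding and in-place modification concrete, here is a minimal sketch. The names rebind, add_to_counter and run_demo are made up for this illustration and do not appear in the code above. One thread rebinds an int parameter, which the main thread never sees; four threads update the same dict under a lock, which the main thread does see.

```python
import threading

def rebind(data):
    # Assigning to the parameter only rebinds the local name in this thread
    data = data + 1

def add_to_counter(shared, lock, count):
    for _ in range(count):
        # += on a dict entry is a read followed by a write, so guard it
        with lock:
            shared["count"] += 1

def run_demo():
    d = 5
    t = threading.Thread(target=rebind, args=(d,))
    t.start()
    t.join()

    shared = {"count": 0}
    lock = threading.Lock()
    threads = [threading.Thread(target=add_to_counter, args=(shared, lock, 100))
               for _ in range(4)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return d, shared["count"]

if __name__ == "__main__":
    d, count = run_demo()
    print(d)      # the int held by the main thread is unchanged
    print(count)  # updates from all four threads are visible here
```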
Custom type object sharing between threads
What if we want to customize a class and pass an object as a variable in a sub thread? What would be the effect?
#coding:utf-8
import threading

class Data:
    def __init__(self,data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self,data):
        self.data = data

def test(name,data):
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data.get(),id(data)))

if __name__ == '__main__':
    d = Data(10)
    name = "Yang Yan Xing"
    print("in main thread id(data) is {}".format(id(d)))
    for i in range(5):
        th = threading.Thread(target=test,args=(name,d))
        th.start()
Here I define a simple class. In the main thread, I initialize an object d of this type, and then pass it as a parameter to the sub thread. The main thread and the sub thread print the id of this object respectively. Let's see the result
in main thread id(data) is 2849240813864
in thread <Thread(Thread-1, started 11648)> name is Yang Yan Xing
data is 10 id(data) is 2849240813864
in thread <Thread(Thread-2, started 11016)> name is Yang Yan Xing
data is 10 id(data) is 2849240813864
in thread <Thread(Thread-3, started 10416)> name is Yang Yan Xing
data is 10 id(data) is 2849240813864
in thread <Thread(Thread-4, started 8668)> name is Yang Yan Xing
data is 10 id(data) is 2849240813864
in thread <Thread(Thread-5, started 4420)> name is Yang Yan Xing
data is 10 id(data) is 2849240813864
We see that in the main thread and the sub thread, the id of this object is the same, indicating that they use the same object.
Whether it is a standard data type or a custom class, the same object is shared among multiple threads. But what about multiple processes?
Shared data between multiple processes
Standard data type sharing between processes
Starting from the code above, let's first look at how a variable of type int behaves across subprocesses
#coding:utf-8
import threading
import multiprocessing

def test(name,data):
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))

if __name__ == '__main__':
    d = 10
    name = "Yang Yan Xing"
    print("in main thread id(data) is {}".format(id(d)))
    for i in range(5):
        pro = multiprocessing.Process(target=test,args=(name,d))
        pro.start()
The result is
in main thread id(data) is 1763791936
in thread <_MainThread(MainThread, started 9364)> name is Yang Yan Xing
data is 10 id(data) is 1763791936
in thread <_MainThread(MainThread, started 9464)> name is Yang Yan Xing
data is 10 id(data) is 1763791936
in thread <_MainThread(MainThread, started 3964)> name is Yang Yan Xing
data is 10 id(data) is 1763791936
in thread <_MainThread(MainThread, started 10480)> name is Yang Yan Xing
data is 10 id(data) is 1763791936
in thread <_MainThread(MainThread, started 13608)> name is Yang Yan Xing
data is 10 id(data) is 1763791936
We can see that their IDs are the same, which at first glance suggests that they use the same variable. But when I change d from an int to a string, the IDs are different again
if __name__ == '__main__':
    d = 'yangyanxing'
    name = "Yang Yan Xing"
    print("in main thread id(data) is {}".format(id(d)))
    for i in range(5):
        pro = multiprocessing.Process(target=test,args=(name,d))
        pro.start()
The result is
in main thread id(data) is 2629633397040
in thread <_MainThread(MainThread, started 9848)> name is Yang Yan Xing
data is yangyanxing id(data) is 1390942032880
in thread <_MainThread(MainThread, started 988)> name is Yang Yan Xing
data is yangyanxing id(data) is 2198251377648
in thread <_MainThread(MainThread, started 3728)> name is Yang Yan Xing
data is yangyanxing id(data) is 2708672287728
in thread <_MainThread(MainThread, started 5288)> name is Yang Yan Xing
data is yangyanxing id(data) is 2376058999792
in thread <_MainThread(MainThread, started 12508)> name is Yang Yan Xing
data is yangyanxing id(data) is 2261044040688
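So I tried list, tuple and dict as well, and the IDs were different in each subprocess. I then went back and tried list, tuple and dict in the multithreaded version, and there the IDs were all the same.
There is an interesting point here. If the value is an int less than or equal to 256, its ID is the same in every process; if it is greater than 256, the IDs differ. The likely reason is that CPython preallocates the small integers from -5 to 256 inside the interpreter itself, and id() in CPython is a memory address, so each freshly started interpreter can end up with these objects at the same address. Each process still has its own address space, so an equal ID here does not mean that the object is shared between processes.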
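The boundary at 256 can be observed inside a single process. The sketch below relies on a CPython implementation detail (the cache of small integers from -5 to 256), so other interpreters may behave differently; same_object is a helper name invented for this example. On CPython it is expected to report True for 256 and False for 257.

```python
def same_object(text):
    # Build the same value twice at run time and compare object identity
    a = int(text)
    b = int(text)
    return a is b

if __name__ == "__main__":
    print(same_object("256"))  # inside CPython's small-integer cache
    print(same_object("257"))  # outside the cache: two separate objects
```

Identity of a cached integer within one interpreter says nothing about memory being shared across processes; each process has its own copy of the cache.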
Sharing custom type objects between processes
#coding:utf-8
import threading
import multiprocessing

class Data:
    def __init__(self,data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self,data):
        self.data = data

def test(name,data):
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data.get(),id(data)))

if __name__ == '__main__':
    d = Data(10)
    name = "Yang Yan Xing"
    print("in main thread id(data) is {}".format(id(d)))
    for i in range(5):
        pro = multiprocessing.Process(target=test,args=(name,d))
        pro.start()
The result is
in main thread id(data) is 1927286591728
in thread <_MainThread(MainThread, started 2408)> name is Yang Yan Xing
data is 10 id(data) is 1561177927752
in thread <_MainThread(MainThread, started 5728)> name is Yang Yan Xing
data is 10 id(data) is 2235260514376
in thread <_MainThread(MainThread, started 1476)> name is Yang Yan Xing
data is 10 id(data) is 2350586073040
in thread <_MainThread(MainThread, started 996)> name is Yang Yan Xing
data is 10 id(data) is 2125002248088
in thread <_MainThread(MainThread, started 10740)> name is Yang Yan Xing
data is 10 id(data) is 1512231669656
You can see that their IDs are different, that is, different objects.
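One way to picture why the objects differ: with the "spawn" start method (the default on Windows, which the output above appears to come from), the arguments of a Process are pickled in the parent and rebuilt in the child. The sketch below imitates that round trip inside one process using a list; round_trip is a hypothetical helper, and a Data instance passed to a Process would go through the same kind of copy.

```python
import pickle

def round_trip(obj):
    # Serialize to bytes and rebuild, roughly what happens to Process
    # arguments under the "spawn" start method
    return pickle.loads(pickle.dumps(obj))

if __name__ == "__main__":
    original = [1, 2, 3]
    rebuilt = round_trip(original)
    rebuilt.append(4)                # only the rebuilt copy changes
    print(rebuilt is original)
    print(original, rebuilt)
```

The rebuilt object starts with equal contents but is a separate object, so later changes to it never reach the original.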
How to share data among multiple processes
We can see that data is not shared among multiple processes (the matching IDs of small ints are only equal addresses in separate processes, not real sharing). So what should we do when we want to share a data object between the main process and the subprocesses?
Before we look at this problem, let's modify the previous multithreaded code
#coding:utf-8
import threading
import multiprocessing

class Data:
    def __init__(self,data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self,data):
        self.data = data

def test(name,data,lock):
    lock.acquire()
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))
    data.set(data.get()+1)
    lock.release()

if __name__ == '__main__':
    d = Data(0)
    thlist = []
    name = "yang"
    lock = threading.Lock()
    for i in range(5):
        th = threading.Thread(target=test,args=(name,d,lock))
        th.start()
        thlist.append(th)
    for i in thlist:
        i.join()
    print(d.get())
The purpose of our code is to use a custom data type object. After five sub threads operate, each sub thread adds 1 to its data value, and finally prints the data value of the object in the main thread.
The output is as follows
in thread <Thread(Thread-1, started 3296)> name is yang
data is <__main__.Data object at 0x000001A451139198> id(data) is 1805246501272
in thread <Thread(Thread-2, started 9436)> name is yang
data is <__main__.Data object at 0x000001A451139198> id(data) is 1805246501272
in thread <Thread(Thread-3, started 760)> name is yang
data is <__main__.Data object at 0x000001A451139198> id(data) is 1805246501272
in thread <Thread(Thread-4, started 1952)> name is yang
data is <__main__.Data object at 0x000001A451139198> id(data) is 1805246501272
in thread <Thread(Thread-5, started 5988)> name is yang
data is <__main__.Data object at 0x000001A451139198> id(data) is 1805246501272
5
You can see that 5 is printed at the end by the main thread, as expected. But what happens if we move this to multiple processes? Since each subprocess holds a different object, each one operates on its own copy of the Data object, which should have no effect on the Data object in the main process. Let's take a look at the result
#coding:utf-8
import threading
import multiprocessing

class Data:
    def __init__(self,data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self,data):
        self.data = data

def test(name,data,lock):
    lock.acquire()
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))
    data.set(data.get()+1)
    lock.release()

if __name__ == '__main__':
    d = Data(0)
    thlist = []
    name = "yang"
    lock = multiprocessing.Lock()
    for i in range(5):
        th = multiprocessing.Process(target=test,args=(name,d,lock))
        th.start()
        thlist.append(th)
    for i in thlist:
        i.join()
    print(d.get())
Its output is:
in thread <_MainThread(MainThread, started 7604)> name is yang
data is <__mp_main__.Data object at 0x000001D110130EB8> id(data) is 1997429477048
in thread <_MainThread(MainThread, started 12108)> name is yang
data is <__mp_main__.Data object at 0x000002C4E88E0E80> id(data) is 3044738469504
in thread <_MainThread(MainThread, started 3848)> name is yang
data is <__mp_main__.Data object at 0x0000027827270EF0> id(data) is 2715076202224
in thread <_MainThread(MainThread, started 12368)> name is yang
data is <__mp_main__.Data object at 0x000002420EA80E80> id(data) is 2482736991872
in thread <_MainThread(MainThread, started 4152)> name is yang
data is <__mp_main__.Data object at 0x000001B1577F0E80> id(data) is 1861188783744
0
The final output is 0, which shows that the changes the subprocesses make to the Data objects they receive do not affect the object in the main process. What do we need to do so that the subprocesses can operate on the main process's object? We can use BaseManager from multiprocessing.managers
#coding:utf-8
import threading
import multiprocessing
from multiprocessing.managers import BaseManager

class Data:
    def __init__(self,data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self,data):
        self.data = data

BaseManager.register("mydata",Data)

def test(name,data,lock):
    lock.acquire()
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))
    data.set(data.get()+1)
    lock.release()

def getManager():
    m = BaseManager()
    m.start()
    return m

if __name__ == '__main__':
    manager = getManager()
    d = manager.mydata(0)
    thlist = []
    name = "yang"
    lock = multiprocessing.Lock()
    for i in range(5):
        th = multiprocessing.Process(target=test,args=(name,d,lock))
        th.start()
        thlist.append(th)
    for i in thlist:
        i.join()
    print(d.get())
After importing BaseManager with from multiprocessing.managers import BaseManager and defining the Data type, we register the type with BaseManager.register("mydata",Data) under the name mydata. We can then create the object by calling that name on a started BaseManager instance; the call returns a proxy for an object that lives in the manager's server process. Let's take a look at the output
C:\Python35\python.exe F:/python/python3Test/multask.py
in thread <_MainThread(MainThread, started 12244)> name is yang
data is <__mp_main__.Data object at 0x000001FE1B7D9668> id(data) is 2222932504080
in thread <_MainThread(MainThread, started 2860)> name is yang
data is <__mp_main__.Data object at 0x000001FE1B7D9668> id(data) is 1897574510096
in thread <_MainThread(MainThread, started 2748)> name is yang
data is <__mp_main__.Data object at 0x000001FE1B7D9668> id(data) is 2053415775760
in thread <_MainThread(MainThread, started 7812)> name is yang
data is <__mp_main__.Data object at 0x000001FE1B7D9668> id(data) is 2766155820560
in thread <_MainThread(MainThread, started 2384)> name is yang
data is <__mp_main__.Data object at 0x000001FE1B7D9668> id(data) is 2501159890448
5
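We see that although each subprocess holds a different proxy object (the IDs differ), they all operate on the same underlying Data object (the address in its repr is identical), so its value is effectively "shared".
Standard data types can also be shared through the Value object in the multiprocessing library, for example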
#coding:utf-8
import threading
import multiprocessing

def test(name,data,lock):
    lock.acquire()
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))
    data.value += 1
    lock.release()

if __name__ == '__main__':
    # "l" is the type code for a C long; 10 is the initial value
    d = multiprocessing.Value("l",10)
    thlist = []
    name = "yang"
    lock = multiprocessing.Lock()
    for i in range(5):
        th = multiprocessing.Process(target=test,args=(name,d,lock))
        th.start()
        thlist.append(th)
    for i in thlist:
        i.join()
    print(d.value)
In this case, d = multiprocessing.Value("l",10) creates a shared numeric object, a synchronized wrapper for c_long. When multiprocessing.Value is initialized, the first parameter is the type and the second is the initial value. The type can be either a ctypes type or one of the one-character type codes used by the array module, for example "i" for a C int, "l" for a C long and "d" for a C double.
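A Value created with the default lock=True also carries its own lock, available through get_lock(), so a separate multiprocessing.Lock is not strictly required for a simple counter. The following is a minimal sketch under that assumption; add_n and the iteration counts are invented for illustration.

```python
import multiprocessing

def add_n(counter, n):
    for _ in range(n):
        # += is a read followed by a write, so hold the Value's own lock
        with counter.get_lock():
            counter.value += 1

if __name__ == "__main__":
    counter = multiprocessing.Value("l", 10)
    workers = [multiprocessing.Process(target=add_n, args=(counter, 1000))
               for _ in range(5)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)
```

Holding the lock around the whole read-modify-write step is what prevents two processes from overwriting each other's increments.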
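You can also pass a ctypes type instead of a type code, for example c_char_p to hold a string. Note that c_char_p stores a pointer, and a pointer is only meaningful in the address space of the process that created it, so treat this as something that works within one process rather than a reliable way to pass strings between processes.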
>>> import multiprocessing
>>> from ctypes import c_char_p
>>> s = multiprocessing.Value(c_char_p, b'\xd1\xee\xd1\xe5\xd0\xc7')
>>> print(s.value.decode('gbk'))
杨彦星
You can also use a Manager object to create a shared list, dict, and so on
#coding:utf-8
import multiprocessing

def func(mydict, mylist):
    # The child process changes the dict, and the main process sees the change
    mydict["index1"] = "aaaaaa"
    mydict["index2"] = "bbbbbb"
    # The child process changes the list, and the main process sees the change
    mylist.append(11)
    mylist.append(22)
    mylist.append(33)

if __name__ == "__main__":
    # The main process and the child process share this dict
    mydict = multiprocessing.Manager().dict()
    # The main process and the child process share this list
    mylist = multiprocessing.Manager().list(range(5))
    p = multiprocessing.Process(target=func, args=(mydict, mylist))
    p.start()
    p.join()
    print(mylist)
    print(mydict)
In fact, the sharing discussed here is only the sharing of data values. Because each process holds its own objects, keeping state in sync requires an indirect route such as a manager or a Value. This is fine for small projects. For larger projects, I advise against sharing data this way: it greatly increases the coupling between the parts of the program, and the logic becomes complex and hard to follow. It is better to use queues or a database as the communication channel.
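As an illustration of the queue-based alternative, here is a minimal sketch in which each subprocess sends its result back as a message instead of modifying shared state. The worker and collect functions and the squaring task are hypothetical, chosen only to keep the example short.

```python
import multiprocessing

def worker(task_id, results):
    # Send the result back as a message instead of mutating shared state
    results.put((task_id, task_id * task_id))

def collect(results, count):
    # Gather exactly `count` messages; arrival order is not guaranteed
    return dict(results.get() for _ in range(count))

if __name__ == "__main__":
    results = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(i, results))
             for i in range(5)]
    for p in procs:
        p.start()
    # Drain the queue before joining so no worker blocks on a full pipe
    squares = collect(results, len(procs))
    for p in procs:
        p.join()
    print(sorted(squares.items()))
```

The main process owns the collected data outright, so no lock or manager is needed and the workers stay independent of one another.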