Multithreading and data sharing in Python

In the past, when I wrote multithreaded or multiprocess code, each sub thread or subprocess usually completed its own task and had little contact with the others. If they needed to communicate, I used a queue or a database. Recently, however, while writing some multithreaded and multiprocess code, I found that there are a few points to pay attention to when threads or processes need to use shared variables.

Shared data between multiple threads

Standard data type sharing between threads

Look at the following code

#coding:utf-8
import threading

def test(name,data):
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))


if __name__ == '__main__':
    d = 5
    name = "Yang Yan Xing"
    for i in range(5):
        th = threading.Thread(target=test,args=(name,d))
        th.start()

Here I create an int variable d in the main thread with the value 5, and pass d as an argument when starting the test function in each of the 5 threads. Do these 5 threads see the same d? I print its identity with id(data) in the test function and get the following results

in thread <Thread(Thread-1, started 6624)> name is Yang Yan Xing
data is 5 id(data) is 1763791776
in thread <Thread(Thread-2, started 8108)> name is Yang Yan Xing
data is 5 id(data) is 1763791776
in thread <Thread(Thread-3, started 3356)> name is Yang Yan Xing
data is 5 id(data) is 1763791776
in thread <Thread(Thread-4, started 13728)> name is Yang Yan Xing
data is 5 id(data) is 1763791776
in thread <Thread(Thread-5, started 3712)> name is Yang Yan Xing
data is 5 id(data) is 1763791776

From the result, we can see that the id of data is 1763791776 in all five sub threads, which indicates that the object created in the main thread is shared with the sub threads rather than copied. An int is immutable, so a sub thread cannot change it in place, but when the shared object is mutable (a list, a dict, or a custom object), a change made in one sub thread is visible to all the others. Modifying shared data from several threads is therefore not thread safe, and you need to protect it with a lock.
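As a small illustration of what "shared" means for threads, the sketch below passes an int and a list to several threads. The helper names (rebind, mutate, run_demo) are invented for this example and are not part of the original code. Rebinding the int inside a thread leaves the caller's variable alone, while appending to the list changes the one object that every thread refers to.

```python
import threading


def rebind(value):
    # ints are immutable: this only rebinds the local name "value"
    value += 1


def mutate(items):
    # list.append changes the single list object every thread refers to
    items.append(threading.current_thread().name)


def run_demo():
    number = 5
    shared_list = []
    threads = [threading.Thread(target=rebind, args=(number,)) for _ in range(3)]
    threads += [threading.Thread(target=mutate, args=(shared_list,)) for _ in range(3)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return number, len(shared_list)


if __name__ == "__main__":
    number, appended = run_demo()
    print("number is still {}, list received {} items".format(number, appended))
```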

Custom type object sharing between threads

What if we define our own class and pass an instance of it to the sub threads? What would happen?

#coding:utf-8
import threading

class Data:
    def __init__(self,data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self,data):
        self.data = data

def test(name,data):
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data.get(),id(data)))


if __name__ == '__main__':
    d = Data(10)
    name = "Yang Yan Xing"
    print("in main thread id(data) is {}".format(id(d)))
    for i in range(5):
        th = threading.Thread(target=test,args=(name,d))
        th.start()

Here I define a simple class. In the main thread, I initialize an object d of this type, and then pass it as a parameter to the sub thread. The main thread and the sub thread print the id of this object respectively. Let's see the result

in main thread id(data) is 2849240813864
in thread <Thread(Thread-1, started 11648)> name is Yang Yan Xing
data is 10 id(data) is 2849240813864
in thread <Thread(Thread-2, started 11016)> name is Yang Yan Xing
data is 10 id(data) is 2849240813864
in thread <Thread(Thread-3, started 10416)> name is Yang Yan Xing
data is 10 id(data) is 2849240813864
in thread <Thread(Thread-4, started 8668)> name is Yang Yan Xing
data is 10 id(data) is 2849240813864
in thread <Thread(Thread-5, started 4420)> name is Yang Yan Xing
data is 10 id(data) is 2849240813864

We see that in the main thread and the sub thread, the id of this object is the same, indicating that they use the same object.

Whether it is a standard data type or a custom class, the same object is shared among multiple threads. But what about multiple processes?

Shared data between multiple processes

Standard data type sharing between processes

Using code similar to the above, let's first look at how a variable of type int behaves across subprocesses

#coding:utf-8
import threading
import multiprocessing

def test(name,data):
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))


if __name__ == '__main__':
    d = 10
    name = "Yang Yan Xing"
    print("in main thread id(data) is {}".format(id(d)))
    for i in range(5):
        pro = multiprocessing.Process(target=test,args=(name,d))
        pro.start()

The result is

in main thread id(data) is 1763791936
in thread <_MainThread(MainThread, started 9364)> name is Yang Yan Xing
data is 10 id(data) is 1763791936
in thread <_MainThread(MainThread, started 9464)> name is Yang Yan Xing
data is 10 id(data) is 1763791936
in thread <_MainThread(MainThread, started 3964)> name is Yang Yan Xing
data is 10 id(data) is 1763791936
in thread <_MainThread(MainThread, started 10480)> name is Yang Yan Xing
data is 10 id(data) is 1763791936
in thread <_MainThread(MainThread, started 13608)> name is Yang Yan Xing
data is 10 id(data) is 1763791936

We can see that their IDs are the same, which looks as if they use the same variable. Each process has its own memory, though, so an equal id alone does not prove that anything is shared. When I change d from int to string, the IDs are different again

if __name__ == '__main__':
    d = 'yangyanxing'
    name = "Yang Yan Xing"
    print("in main thread id(data) is {}".format(id(d)))
    for i in range(5):
        pro = multiprocessing.Process(target=test,args=(name,d))
        pro.start()

The result is

in main thread id(data) is 2629633397040
in thread <_MainThread(MainThread, started 9848)> name is Yang Yan Xing
data is yangyanxing id(data) is 1390942032880
in thread <_MainThread(MainThread, started 988)> name is Yang Yan Xing
data is yangyanxing id(data) is 2198251377648
in thread <_MainThread(MainThread, started 3728)> name is Yang Yan Xing
data is yangyanxing id(data) is 2708672287728
in thread <_MainThread(MainThread, started 5288)> name is Yang Yan Xing
data is yangyanxing id(data) is 2376058999792
in thread <_MainThread(MainThread, started 12508)> name is Yang Yan Xing
data is yangyanxing id(data) is 2261044040688

So I tried list, tuple and dict as well, and the IDs were different in every subprocess. I went back and tried list, tuple and dict with multithreading, and there the IDs were the same in every thread.
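One way to picture why the ids differ: with the spawn start method (the only one available on Windows, where these outputs were produced), the arguments of a Process are pickled in the parent and rebuilt in the child. The sketch below imitates that round trip inside a single process. simulate_transfer is a made-up helper for illustration, not a multiprocessing API.

```python
import pickle


def simulate_transfer(obj):
    # Serialize and rebuild the object, roughly as happens to Process
    # arguments under the spawn start method (an illustration only).
    return pickle.loads(pickle.dumps(obj))


def demo():
    original = [1, 2, 3]
    received = simulate_transfer(original)
    received.append(4)  # changes only the rebuilt copy
    return original, received, received is original


if __name__ == "__main__":
    original, received, same = demo()
    print(original, received, same)
```

The rebuilt list has equal contents at first but is a separate object, so later changes on one side never reach the other.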

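There is an interesting point here. For an int whose value is less than or equal to 256, the IDs are the same across processes; if it is greater than 256, the IDs differ. I did not find the reason at the time. A likely explanation is that CPython preallocates the integers from -5 to 256 inside the interpreter itself, so they tend to sit at the same address in every process started from the same interpreter. Each process still has its own copy, so nothing is actually shared.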
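The 256 boundary matches a CPython implementation detail: the C API documentation describes an array of preallocated integer objects for -5 through 256. The sketch below checks object identity inside one process. same_object is a hypothetical helper, int(text) is used so that the compiler cannot merge equal literals, and the second number is deliberately far above the boundary. Other interpreters, or other CPython versions, may behave differently.

```python
def same_object(text):
    # Build the same number twice at run time and compare identities
    first = int(text)
    second = int(text)
    return first is second


if __name__ == "__main__":
    for text in ("100", "123456789012"):
        print(text, same_object(text))
```

On CPython the small value is expected to report True (both names refer to the cached object) and the large one False (two separate objects with equal values).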

Sharing custom type objects between processes

#coding:utf-8
import threading
import multiprocessing

class Data:
    def __init__(self,data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self,data):
        self.data = data

def test(name,data):
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data.get(),id(data)))


if __name__ == '__main__':
    d = Data(10)
    name = "Yang Yan Xing"
    print("in main thread id(data) is {}".format(id(d)))
    for i in range(5):
        pro = multiprocessing.Process(target=test,args=(name,d))
        pro.start()

The result is

in main thread id(data) is 1927286591728
in thread <_MainThread(MainThread, started 2408)> name is Yang Yan Xing
data is 10 id(data) is 1561177927752
in thread <_MainThread(MainThread, started 5728)> name is Yang Yan Xing
data is 10 id(data) is 2235260514376
in thread <_MainThread(MainThread, started 1476)> name is Yang Yan Xing
data is 10 id(data) is 2350586073040
in thread <_MainThread(MainThread, started 996)> name is Yang Yan Xing
data is 10 id(data) is 2125002248088
in thread <_MainThread(MainThread, started 10740)> name is Yang Yan Xing
data is 10 id(data) is 1512231669656

You can see that their IDs are different, that is, different objects.

How to share data among multiple processes

We can see that data is not shared among multiple processes (the matching ids of small ints do not mean that they are shared either). What should we do, then, when we want to share a data object between the main process and a subprocess?

Before we look at this problem, let's modify the previous multithreaded code

#coding:utf-8
import threading
import multiprocessing

class Data:
    def __init__(self,data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self,data):
        self.data = data

def test(name,data,lock):
    lock.acquire()
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))
    data.set(data.get()+1)
    lock.release()


if __name__ == '__main__':
    d = Data(0)
    thlist = []
    name = "yang"
    lock = threading.Lock()
    for i in range(5):
        th = threading.Thread(target=test,args=(name,d,lock))
        th.start()
        thlist.append(th)
    for i in thlist:
        i.join()
    print(d.get())

The purpose of this code is to share one object of a custom type among five sub threads. Each sub thread adds 1 to its data value while holding the lock, and the main thread prints the final value after all of them have finished.
The output is as follows

in thread <Thread(Thread-1, started 3296)> name is yang
data is <__main__.Data object at 0x000001A451139198> id(data) is 1805246501272
in thread <Thread(Thread-2, started 9436)> name is yang
data is <__main__.Data object at 0x000001A451139198> id(data) is 1805246501272
in thread <Thread(Thread-3, started 760)> name is yang
data is <__main__.Data object at 0x000001A451139198> id(data) is 1805246501272
in thread <Thread(Thread-4, started 1952)> name is yang
data is <__main__.Data object at 0x000001A451139198> id(data) is 1805246501272
in thread <Thread(Thread-5, started 5988)> name is yang
data is <__main__.Data object at 0x000001A451139198> id(data) is 1805246501272
5

You can see that 5 is printed at the end by the main thread, which matches our expectation. But what if we do the same with multiple processes? Because each subprocess holds a different object, each one operates on its own Data object, which should have no effect on the Data object in the main process. Let's take a look at the result

#coding:utf-8
import threading
import multiprocessing

class Data:
    def __init__(self,data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self,data):
        self.data = data

def test(name,data,lock):
    lock.acquire()
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))
    data.set(data.get()+1)
    lock.release()


if __name__ == '__main__':
    d = Data(0)
    thlist = []
    name = "yang"
    lock = multiprocessing.Lock()
    for i in range(5):
        th = multiprocessing.Process(target=test,args=(name,d,lock))
        th.start()
        thlist.append(th)
    for i in thlist:
        i.join()
    print(d.get())

Its output is:

in thread <_MainThread(MainThread, started 7604)> name is yang
data is <__mp_main__.Data object at 0x000001D110130EB8> id(data) is 1997429477048
in thread <_MainThread(MainThread, started 12108)> name is yang
data is <__mp_main__.Data object at 0x000002C4E88E0E80> id(data) is 3044738469504
in thread <_MainThread(MainThread, started 3848)> name is yang
data is <__mp_main__.Data object at 0x0000027827270EF0> id(data) is 2715076202224
in thread <_MainThread(MainThread, started 12368)> name is yang
data is <__mp_main__.Data object at 0x000002420EA80E80> id(data) is 2482736991872
in thread <_MainThread(MainThread, started 4152)> name is yang
data is <__mp_main__.Data object at 0x000001B1577F0E80> id(data) is 1861188783744
0

The final output is 0, which shows that operations performed by a subprocess on the Data object it received do not affect the object in the main process. What do we need to do so that a subprocess can operate on an object owned by the main process? We can use BaseManager from multiprocessing.managers

#coding:utf-8
import threading
import multiprocessing
from multiprocessing.managers import BaseManager

class Data:
    def __init__(self,data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self,data):
        self.data = data
        
BaseManager.register("mydata",Data)

def test(name,data,lock):
    lock.acquire()
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))
    data.set(data.get()+1)
    lock.release()



def getManager():
    m = BaseManager()
    m.start()
    return m


if __name__ == '__main__':
    manager = getManager()
    d = manager.mydata(0)
    thlist = []
    name = "yang"
    lock = multiprocessing.Lock()
    for i in range(5):
        th = multiprocessing.Process(target=test,args=(name,d,lock))
        th.start()
        thlist.append(th)
    for i in thlist:
        i.join()
    print(d.get())

After importing BaseManager with from multiprocessing.managers import BaseManager and defining the Data type, use BaseManager.register("mydata",Data) to register the Data type with BaseManager under the name mydata. Once the manager has been started, calling manager.mydata(0) creates a Data object inside the manager's process and returns a proxy for it. Let's take a look at the output

C:\Python35\python.exe F:/python/python3Test/multask.py
in thread <_MainThread(MainThread, started 12244)> name is yang
data is <__mp_main__.Data object at 0x000001FE1B7D9668> id(data) is 2222932504080
in thread <_MainThread(MainThread, started 2860)> name is yang
data is <__mp_main__.Data object at 0x000001FE1B7D9668> id(data) is 1897574510096
in thread <_MainThread(MainThread, started 2748)> name is yang
data is <__mp_main__.Data object at 0x000001FE1B7D9668> id(data) is 2053415775760
in thread <_MainThread(MainThread, started 7812)> name is yang
data is <__mp_main__.Data object at 0x000001FE1B7D9668> id(data) is 2766155820560
in thread <_MainThread(MainThread, started 2384)> name is yang
data is <__mp_main__.Data object at 0x000001FE1B7D9668> id(data) is 2501159890448
5

We see that each subprocess holds a different object (a proxy), yet all of them forward get and set to the single Data object kept by the manager, so its value is effectively "shared".
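A variation on the registration step, offered as a sketch rather than as the author's method: the multiprocessing documentation registers types on a subclass of BaseManager instead of on BaseManager itself, and a manager can be used as a context manager so that its server process is shut down automatically. DataManager and increment are names invented for this example.

```python
import multiprocessing
from multiprocessing.managers import BaseManager


class Data:
    def __init__(self, data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self, data):
        self.data = data


class DataManager(BaseManager):
    """Manager subclass so the registration stays local to this class."""


# Adds a DataManager.Data(...) factory that returns a proxy to a Data
# object living in the manager's server process.
DataManager.register("Data", Data)


def increment(shared, lock):
    # get() followed by set() is two separate calls, so guard the pair
    with lock:
        shared.set(shared.get() + 1)


if __name__ == "__main__":
    with DataManager() as manager:
        d = manager.Data(0)
        lock = multiprocessing.Lock()
        workers = [multiprocessing.Process(target=increment, args=(d, lock))
                   for _ in range(5)]
        for p in workers:
            p.start()
        for p in workers:
            p.join()
        print(d.get())
```

Using with lock: instead of separate acquire and release calls also guarantees that the lock is released if the body raises an exception.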

Standard data types can be shared through the Value object in the multiprocessing library, for example

#coding:utf-8
import threading
import multiprocessing

def test(name,data,lock):
    lock.acquire()
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))
    data.value +=1
    lock.release()


if __name__ == '__main__':
    d = multiprocessing.Value("l",10) # "l" is the type code for a signed long (c_long)
    print(d)
    thlist = []
    name = "yang"
    lock = multiprocessing.Lock()
    for i in range(5):
        th = multiprocessing.Process(target=test,args=(name,d,lock))
        th.start()
        thlist.append(th)
    for i in thlist:
        i.join()
    print(d.value)

In this case, d = multiprocessing.Value("l",10) is used to create a shared number. The returned object is a synchronized wrapper for c_long. When multiprocessing.Value is initialized, the first parameter is the type and the second is the initial value. The type can be either a ctypes type or a one-character type code of the kind used by the array module, such as "i" for a signed int or "d" for a double.
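To make the type argument concrete, here is a small sketch with two array-style type codes and one ctypes type. The helper add_many and the variable names are invented for the example. get_lock() returns the lock that a Value carries by default, which can stand in for a separately created lock.

```python
import ctypes
import multiprocessing


def add_many(counter, times):
    for _ in range(times):
        # "+=" on .value is a read followed by a write, so hold the lock
        with counter.get_lock():
            counter.value += 1


if __name__ == "__main__":
    count = multiprocessing.Value("i", 0)               # signed int
    ratio = multiprocessing.Value("d", 0.5)             # double
    flag = multiprocessing.Value(ctypes.c_bool, True)   # a ctypes type

    workers = [multiprocessing.Process(target=add_many, args=(count, 1000))
               for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(count.value, ratio.value, flag.value)
```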

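You can also pass a ctypes class such as c_char_p to hold a string. Note that c_char_p stores a pointer, and the multiprocessing documentation warns that a pointer is likely to be invalid in another process, so this is only safe within the process that created it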

>>> import multiprocessing
>>> from ctypes import c_char_p
>>> s = multiprocessing.Value(c_char_p, b'\xd1\xee\xd1\xe5\xd0\xc7')
>>> print(s.value.decode('gbk'))
杨彦星

You can also use a Manager object to create a shared list, dict, and so on

#coding:utf-8
import multiprocessing


def func(mydict, mylist):
    # The child process changes the dict, and the main process sees the change
    mydict["index1"] = "aaaaaa"
    mydict["index2"] = "bbbbbb"
    # The child process changes the list, and the main process sees the change
    mylist.append(11)
    mylist.append(22)
    mylist.append(33)


if __name__ == "__main__":
    # One manager process hosts both shared objects
    manager = multiprocessing.Manager()
    # The main process and the child process share this dictionary
    mydict = manager.dict()
    # The main process and the child process share this list
    mylist = manager.list(range(5))

    p = multiprocessing.Process(target=func, args=(mydict, mylist))
    p.start()
    p.join()

    print(mylist)
    print(mydict)

In fact, the sharing discussed here is only the sharing of data values. Because each process holds its own objects, synchronizing state between processes always takes a roundabout route, such as a manager process or shared memory. For a small project this is simple enough to use. For larger projects, I advise against sharing data this way: it greatly increases the coupling between the parts of the program, and the logic becomes complex and hard to understand. It is better to use queues or databases as the communication channel.
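As a rough sketch of the queue-based alternative, the child process below receives work on one queue and reports results on another, so no object has to be shared. The names worker, tasks and results, and the use of None as a stop marker, are choices made for this example.

```python
import multiprocessing


def worker(tasks, results):
    # Read numbers until the None stop marker arrives, reply with squares
    while True:
        item = tasks.get()
        if item is None:
            break
        results.put(item * item)


if __name__ == "__main__":
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(tasks, results))
    p.start()
    for n in range(5):
        tasks.put(n)
    tasks.put(None)  # tell the worker to stop
    # Drain the results before join() so the child is never left
    # blocked with unsent data in its queue
    squares = [results.get() for _ in range(5)]
    p.join()
    print(squares)
```

Each side only ever touches its own objects plus the two queues, which keeps the coupling between the processes limited to the messages they exchange.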


Added by Rokboy on Fri, 20 Mar 2020 17:02:01 +0200