Concurrent reading and writing of vector
As we all know, std::vector in the C++ standard library does not guarantee thread safety. When we read and write a vector concurrently, we face two main risks:
- Inconsistent data: for example, one thread reads while another writes, or two threads write at the same time, which can leave a single element in an inconsistent (torn) state.
- Segmentation faults when the vector grows: during reallocation the vector copies all of its contents to a new memory area and frees the original one, so the data's address changes. If a thread is still reading or writing the old memory area at that moment, the program crashes.
Take a simple example:
#include <cstdlib>
#include <thread>
#include <vector>
using namespace std;

vector<int> vec;

void add_vector(int range, unsigned int seed){
    srand(seed);
    for(int i = 0; i < range; i++){
        vec.push_back(rand());
    }
}

int main(){
    vec.reserve(100);
    thread t1 = thread(add_vector, 1000, 2);
    thread t2 = thread(add_vector, 1000, 1);
    t1.join();
    t2.join();
}
Both threads add elements to vec; without any protection, the program easily crashes for the second reason above. This kind of concurrent write appears in many business scenarios. For example, in a recommendation system, to improve throughput each thread produces recall candidates according to a different strategy, and the threads then merge their results into the same array, from which the best recommendations are selected.
This article presents three schemes for appending elements to a vector concurrently, so that elements are added correctly under multithreaded concurrency. The project is in safe_vector.
Multi-thread safe vector design -- lock design
The simplest way to solve the concurrency problem is locking. Here we use the std::mutex provided by the standard library to protect the push_back critical section.
#include <mutex>
#include <vector>
using std::vector;

template<typename T>
class lock_vector{
    std::mutex mlock;    // guards mvec
    vector<T> mvec;
public:
    T operator[](unsigned int idx){
        return mvec[idx];
    }
    lock_vector()=default;
    lock_vector(lock_vector<T>& vec){
        vec.getVector(mvec);
    }
    lock_vector(lock_vector<T>&& vec){
        vec.getVector(mvec);
    }
    void push_back(const T& value) noexcept{
        mlock.lock();
        mvec.push_back(value);
        mlock.unlock();
    }
    void getVector(vector<T>& res){
        res = mvec;
    }
};
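For illustration, the earlier two-thread test can be rewritten against lock_vector roughly as follows (a minimal sketch; the instance and function names here are my own, not part of the safe_vector project):

#include <cstdlib>
#include <thread>
using std::thread;

lock_vector<int> safe_vec;   // hypothetical instance name

void add_safe(int range, unsigned int seed){
    srand(seed);
    for(int i = 0; i < range; i++){
        safe_vec.push_back(rand());   // each push_back is serialized by the mutex
    }
}

int main(){
    thread t1(add_safe, 1000, 2);
    thread t2(add_safe, 1000, 1);
    t1.join();
    t2.join();

    vector<int> res;
    safe_vec.getVector(res);   // res now holds all 2000 elements
}

The cas_vector of the next section exposes the same push_back interface, so it can be dropped into this test unchanged.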
Multi-thread safe vector design -- lock-free design
Besides a mutex, thread synchronization can also be achieved with a lock-free style of design. One common idea is CAS (compare and swap). C++ atomic variables provide compare_exchange_weak and compare_exchange_strong to implement the CAS mechanism. A minimal illustration of their semantics comes first, followed by the full multithread-safe scheme based on CAS.
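A small standalone sketch (my own example, not from the safe_vector project) showing that compare_exchange overwrites its expected argument on failure, which is why the spinlock below resets expect on every pass through the loop:

#include <atomic>
#include <cassert>

int main(){
    std::atomic<int> value{5};
    int expected = 5;
    // Succeeds: value equals expected, so value is set to 7 and true is returned.
    bool ok = value.compare_exchange_strong(expected, 7);
    assert(ok && value.load() == 7);

    expected = 5;
    // Fails: value is 7, not 5; expected is overwritten with the current value (7).
    ok = value.compare_exchange_strong(expected, 9);
    assert(!ok && expected == 7);
}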
#include <atomic>
#include <utility>
#include <vector>
using std::vector;

template<typename T>
class cas_vector{
    std::atomic<bool> flag;   // acts as a spinlock: true means "locked"
    vector<T> mvec;
    void lock(){
        bool expect = false;
        // spin until this thread swaps flag from false to true
        while(!flag.compare_exchange_weak(expect, true)){
            expect = false;   // compare_exchange overwrote expect with the current value
        }
    }
    void unlock(){
        flag.store(false);
    }
public:
    T operator[](unsigned int idx){
        return mvec[idx];
    }
    cas_vector(){
        flag.store(false);
    }
    cas_vector(const cas_vector& vec){
        mvec = vec.mvec;
        flag.store(false);
    }
    cas_vector(cas_vector&& vec){
        mvec = std::move(vec.mvec);
        flag.store(false);
    }
    void replace(const int idx, const T& value) noexcept{
        lock();
        mvec[idx] = value;
        unlock();
    }
    void push_back(const T& value) noexcept{
        lock();
        mvec.push_back(value);
        unlock();
    }
};
Multi-thread safe vector design -- with the help of thread_local variables
Introduction to thread_local variables
thread_local is a keyword introduced in C++11. A thread_local variable is created and destroyed together with its thread and can only be accessed by the thread that created it; in other words, thread_local variables are thread safe. Each thread keeps its own copy of the thread_local variable in its thread information block (TIB).
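A minimal sketch (my own example) showing that each thread sees its own copy of a thread_local variable:

#include <iostream>
#include <thread>

thread_local int counter = 0;   // every thread gets an independent copy

void work(const char* name){
    for(int i = 0; i < 3; i++) counter++;
    std::cout << name << " counter = " << counter << "\n";   // prints 3 in each worker
}

int main(){
    std::thread t1(work, "t1");
    std::thread t2(work, "t2");
    t1.join();
    t2.join();
    std::cout << "main counter = " << counter << "\n";       // still 0 in the main thread
}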
Implementation scheme based on thread_local variables
The code for this scheme is shown below. vector_thl is the type whose element insertion is multithread safe. In my implementation each object holds two vectors; one of them is thread_local, and every push_back appends to it. Because each thread has its own copy of this variable, no thread synchronization is needed. However, to obtain the final result, the copies scattered across the threads must be combined, so vector_thl also provides a merge interface that merges these thread-local vectors.
#include <mutex>
#include <utility>
#include <vector>
using std::mutex;
using std::vector;

template<typename T>
class vector_thl{
    vector<T> mvec;    // merged result, shared by all threads
    mutex lock;        // protects mvec while threads merge into it
public:
    thread_local static vector<T> vec;   // per-thread buffer, no synchronization needed
    vector_thl()=default;
    vector_thl(const vector_thl& other){
        mvec = other.mvec;
    }
    vector_thl(vector_thl&& other){
        mvec = std::move(other.mvec);
    }
    void push_back(const T& value) noexcept{
        vec.push_back(value);   // only touches this thread's copy
    }
    void merge(){
        std::lock_guard<mutex> guard(lock);   // several threads may merge at the same time
        mvec.insert(mvec.end(), vec.begin(), vec.end());
    }
    void getVector(vector<T>& res){
        res = mvec;
    }
};

template<typename T>
thread_local vector<T> vector_thl<T>::vec(0);
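To show where the merge step fits, here is a sketch of how the earlier test could drive vector_thl (my own example; each worker calls merge() before it exits so that its thread-local buffer is folded into the shared result):

#include <cstdlib>
#include <thread>
#include <vector>
using std::thread;
using std::vector;

vector_thl<int> thl_vec;   // hypothetical instance name

void add_thl(int range, unsigned int seed){
    srand(seed);
    for(int i = 0; i < range; i++){
        thl_vec.push_back(rand());   // goes into this thread's local vector
    }
    thl_vec.merge();                 // fold the local vector into the shared result
}

int main(){
    thread t1(add_thl, 1000, 2);
    thread t2(add_thl, 1000, 1);
    t1.join();
    t2.join();

    vector<int> result;
    thl_vec.getVector(result);       // result now holds all 2000 elements
}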
Performance comparison
Benchmarking the three implementations gives the following results:
Run on (12 X 2994.27 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 512 KiB (x6)
  L3 Unified 4096 KiB (x1)
Load Average: 0.00, 0.09, 0.22
--------------------------------------------------------
Benchmark               Time             CPU   Iterations
--------------------------------------------------------
BM_VEC_LOC/10     4574464 ns      1072230 ns          639
BM_VEC_LOC/2      6627843 ns       176688 ns         1000
BM_VEC_CAS/10     9280705 ns      1027921 ns          683
BM_VEC_CAS/2      7537580 ns       180541 ns         1000
BM_VEC_THL/10     1108654 ns       993997 ns          687
BM_VEC_THL/2       693148 ns       123333 ns         5723
As the numbers show, the thread_local implementation is the most efficient: roughly four times faster than the mutex scheme and eight times faster than the lock-free scheme. It also makes the best use of CPU time.
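For reference, a harness along these lines could produce numbers of this shape (a hypothetical sketch using Google Benchmark, not the project's actual benchmark code; the argument is assumed to be the number of writer threads, and the per-thread element count is an arbitrary choice):

#include <benchmark/benchmark.h>
#include <thread>
#include <vector>

// Assumes vector_thl<T> from above is available in this translation unit.
static void BM_VEC_THL(benchmark::State& state){
    for(auto _ : state){
        vector_thl<int> v;
        std::vector<std::thread> workers;
        for(int t = 0; t < state.range(0); t++){
            workers.emplace_back([&v]{
                for(int i = 0; i < 10000; i++) v.push_back(i);   // 10000 is arbitrary
                v.merge();
            });
        }
        for(auto& w : workers) w.join();
    }
}
BENCHMARK(BM_VEC_THL)->Arg(2)->Arg(10);
BENCHMARK_MAIN();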
Summary
In this article we implemented three vectors that are safe for concurrent element insertion. Among them, the thread_local-based scheme is the most efficient, mainly because every thread has its own copy of the thread_local variable, so there are no concurrent writes and no lock contention.
However, this scheme is not perfect: while elements are still being added, each thread's thread_local vector holds only part of the data, so the elements cannot be read correctly. They can only be read after every thread has finished adding elements and the thread-local copies have been merged.