Ten minutes to an in-depth understanding of multithreading: lock optimization

1. Some suggestions for improving lock performance

Lock contention inevitably degrades a program's overall performance. To minimize this side effect, here are some suggestions for using locks that should help you write higher-performance programs.

1. Reduce lock holding time

For applications that use locks for concurrency control, the time each thread holds a lock directly affects system performance: the longer a thread holds the lock, the more intense the contention for it. Imagine 100 people who must each fill in a form with their identity information but have to share a single pen. If everyone holds the pen for a long time, the whole process takes very long. With one pen for 100 people, each person should hold it as briefly as possible: think about what to write before picking up the pen, rather than picking it up and then working out how to fill in the form. Program development is similar. We should minimize how long a lock is held, to reduce the likelihood of mutual exclusion between threads. Take the following code snippet as an example:

public synchronized void syncMethod() {
    othercode1();
    mutextMethod();
    othercode2();
}

In syncMethod(), assume that only mutextMethod() needs synchronization, while othercode1() and othercode2() do not. If othercode1() and othercode2() are heavyweight methods that consume a lot of CPU time, synchronizing the whole method under high concurrency causes the number of waiting threads to grow sharply, because a thread acquires the intrinsic lock on entering the method and does not release it until all of the work is done.
A better solution is to synchronize only where necessary, which can significantly reduce the time threads hold the lock and improve system throughput.

public void syncMethod2() {
    othercode1();
    synchronized (this) {
        mutextMethod();
    }
    othercode2();
}

In the improved code, only mutextMethod() is synchronized, so the lock is held for a relatively short time and the code achieves a higher degree of parallelism. The same technique is easy to find in the JDK source, for example in the Pattern class, which handles regular expressions:

public Matcher matcher(CharSequence input) {
    if (!compiled) {
        synchronized (this) {
            if (!compiled)
                compile();
        }
    }
    Matcher m = new Matcher(this, input);
    return m;
}

The matcher() method acquires the lock only conditionally: the lock is requested only when the expression has not yet been compiled. Once the pattern is compiled, every later call skips synchronization entirely, which greatly improves the execution efficiency of matcher().
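The conditional locking in matcher() is a form of double-checked locking. Below is a minimal sketch of the same pattern applied to a hypothetical LazyRegex class (not part of the JDK), where compile() is only a stand-in for expensive one-time work. The sketch assumes the compiled flag is declared volatile, which the Java memory model requires for this pattern: without it, a thread could observe compiled == true before it observes the results of compile().

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class illustrating conditional (double-checked) locking
class LazyRegex {
    private final String source;
    // volatile: a thread that reads compiled == true also sees what compile() wrote
    private volatile boolean compiled = false;
    private String compiledForm;                              // stand-in for real compiled state
    final AtomicInteger compileCount = new AtomicInteger();   // how many times compile() ran

    LazyRegex(String source) {
        this.source = source;
    }

    private void compile() {
        compiledForm = "compiled:" + source;   // expensive work in a real implementation
        compileCount.incrementAndGet();
        compiled = true;                       // volatile write publishes compiledForm
    }

    String matcher() {
        if (!compiled) {                       // fast path: no lock once compiled
            synchronized (this) {
                if (!compiled)                 // re-check: another thread may have compiled meanwhile
                    compile();
            }
        }
        return compiledForm;
    }
}

class LazyRegexDemo {
    public static void main(String[] args) throws InterruptedException {
        LazyRegex regex = new LazyRegex("a+b");
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(regex::matcher);   // all threads race on the first call
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(regex.compileCount.get()); // prints 1
    }
}
```

Once compiled is true, callers never touch the lock; the second check inside the synchronized block ensures that threads racing on the very first call still compile only once.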
Note: reducing the holding time of locks helps to reduce the possibility of lock conflict and improve the concurrency of the system.
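The syncMethod()/syncMethod2() contrast can be made runnable with a small sketch. ReportCollector and buildReport() are hypothetical: buildReport() plays the role of othercode1(), a heavyweight computation that uses only local variables, while appending to the shared list plays the role of mutextMethod(), the only step that touches shared state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: only the update of shared state needs the lock
class ReportCollector {
    private final List<String> reports = new ArrayList<>();   // shared, guarded by "this"

    // Coarse version: the lock is held during the expensive formatting too
    public synchronized void addCoarse(int id) {
        String report = buildReport(id);
        reports.add(report);
    }

    // Narrow version: the thread-local work runs outside the lock
    public void addNarrow(int id) {
        String report = buildReport(id);   // like othercode1(): no shared state
        synchronized (this) {              // like mutextMethod(): the only shared mutation
            reports.add(report);
        }
    }

    public synchronized int size() {
        return reports.size();
    }

    // Simulates heavyweight work that uses only local variables
    private static String buildReport(int id) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000; i++) sb.append(id % 10);
        return "report-" + id + ":" + sb.length();
    }
}

class ReportCollectorDemo {
    public static void main(String[] args) throws InterruptedException {
        ReportCollector collector = new ReportCollector();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            final int base = t * 1000;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) collector.addNarrow(base + i);
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        System.out.println(collector.size());   // prints 4000
    }
}
```

Both methods produce the same result; the difference is that with addNarrow() other threads can enter the lock while one thread is still building its report.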

2. Reduce lock granularity

Reducing lock granularity is another effective way to weaken lock contention between threads. A typical use of this technique is the implementation of the ConcurrentHashMap class.
For a HashMap, the two most important methods are get() and put(). The most natural way to make it thread-safe is to lock the entire HashMap, but then the lock granularity is too coarse. ConcurrentHashMap instead subdivides itself internally into several smaller hash tables called segments; by default it has 16 segments. (This describes the JDK 7 implementation; JDK 8 dropped segments in favor of CAS operations and locking on individual hash bins.)
To add a new entry to a ConcurrentHashMap, it does not lock the whole map. It first uses the hash code to find the segment where the entry belongs, locks only that segment, and then completes the put() operation. When multiple threads call put() at the same time, they run truly in parallel as long as their entries fall into different segments.
Since there are 16 segments by default, if you are lucky, ConcurrentHashMap can accept insertions from 16 threads at once (provided they go to different segments), which greatly improves its throughput. The following code from the JDK 7 source shows the put() operation: it computes the hash of the key and derives from it the index j of the corresponding segment, then fetches that segment (creating it with ensureSegment() if it does not exist yet) and inserts the data into it.

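public V put(K key, V value) {
    Segment<K,V> s;
    if (value == null)
        throw new NullPointerException();
    int hash = hash(key);
    // Use the high bits of the hash to choose the segment index j
    int j = (hash >>> segmentShift) & segmentMask;
    // Read segment j; create it if it does not exist yet
    if ((s = (Segment<K,V>)UNSAFE.getObject
         (segments, (j << SSHIFT) + SBASE)) == null)
        s = ensureSegment(j);
    // Lock only this segment and insert the entry
    return s.put(key, hash, value, false);
}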
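To make the segment lookup concrete: with the default concurrency level of 16, the JDK 7 constructor rounds the level up to a power of two and sets segmentShift to 28 and segmentMask to 15, so the segment index is simply the top four bits of the hash. The sketch below reproduces only that arithmetic; the class name SegmentIndex is hypothetical, and its input is assumed to be the already re-hashed value returned by hash(key), not the raw hashCode().

```java
// Hypothetical helper reproducing the segment-selection arithmetic of put()
class SegmentIndex {
    static final int SEGMENT_SHIFT;
    static final int SEGMENT_MASK;

    static {
        int concurrencyLevel = 16;          // the default number of segments
        int sshift = 0, ssize = 1;
        while (ssize < concurrencyLevel) {  // round up to a power of two, counting the bits
            ++sshift;
            ssize <<= 1;
        }
        SEGMENT_SHIFT = 32 - sshift;        // 28
        SEGMENT_MASK = ssize - 1;           // 15, binary 1111
    }

    // The high-order 4 bits of the hash pick one of the 16 segments
    static int segmentFor(int hash) {
        return (hash >>> SEGMENT_SHIFT) & SEGMENT_MASK;
    }

    public static void main(String[] args) {
        System.out.println(segmentFor(0xA0000000)); // prints 10
        System.out.println(segmentFor(0x0FFFFFFF)); // prints 0
        System.out.println(segmentFor(-1));         // prints 15
    }
}
```

Using the high bits rather than the low bits matters because the low bits are later used to pick a bucket inside the segment.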

However, reducing lock granularity introduces a new problem: when the system needs a global lock, acquiring it costs more resources. Take ConcurrentHashMap again. Although its put() method separates locks well, accessing global information about the map requires holding the locks of all segments at once. For example, size() returns the number of valid entries in the map, that is, the sum of the valid entries in all segments. To compute it, size() may need to acquire the locks of every segment, so part of its code looks like this:

sum = 0;
// Lock all segments
for (int i = 0; i < segments.length; ++i)
    segments[i].lock();
// Sum the counts
for (int i = 0; i < segments.length; ++i)
    sum += segments[i].count;
// Release all locks
for (int i = 0; i < segments.length; ++i)
    segments[i].unlock();

As you can see, the locks of all segments are acquired before the counts are summed. However, size() does not always run this way. It first tries to sum the counts without locking, and only if that fails (because the segments keep changing while it reads them) does it fall back to this locking approach. Even so, under high concurrency the size() method of ConcurrentHashMap still performs worse than that of a synchronized HashMap.
Therefore, reducing lock granularity truly improves system throughput only when methods that gather global information, such as size(), are not called frequently.
Note: reducing lock granularity means reducing the scope of locked objects, so as to reduce the possibility of lock conflict and improve the concurrency of the system.
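The segment design and the two-phase size() strategy described above can be combined into a small lock-striped map. This is a simplified sketch modeled loosely on the JDK 7 design, not the JDK code: StripedMap, its Stripe class, and the retry limit UNLOCKED_ATTEMPTS are hypothetical names and values. put() locks only one stripe; size() first sums without locks and accepts the result if the total modification count did not change between two consecutive passes, and otherwise locks every stripe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical lock-striped map, loosely modeled on JDK 7 segments
class StripedMap<K, V> {
    // Each stripe is a small HashMap guarded by its own lock
    private static final class Stripe<K, V> extends ReentrantLock {
        final Map<K, V> map = new HashMap<>();
        volatile int count;     // entries in this stripe; written only while holding the lock
        volatile int modCount;  // bumped on every change so size() can detect concurrent updates
    }

    private static final int UNLOCKED_ATTEMPTS = 3;   // assumption: lock-free passes before locking
    private final Stripe<K, V>[] stripes;

    @SuppressWarnings("unchecked")
    StripedMap(int stripeCount) {
        stripes = (Stripe<K, V>[]) new Stripe[stripeCount];
        for (int i = 0; i < stripeCount; i++) stripes[i] = new Stripe<>();
    }

    private Stripe<K, V> stripeFor(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 16);                                // mix high bits into low bits
        return stripes[Math.floorMod(h, stripes.length)];
    }

    V put(K key, V value) {
        Stripe<K, V> s = stripeFor(key);
        s.lock();                  // only this stripe is locked; other stripes stay available
        try {
            V old = s.map.put(key, value);
            if (old == null) {
                s.count = s.count + 1;
                s.modCount = s.modCount + 1;
            }
            return old;
        } finally {
            s.unlock();
        }
    }

    int size() {
        // 1) Optimistic: sum without locks; accept if two consecutive passes saw the same modCount total
        long previousMods = -1;
        for (int attempt = 0; attempt < UNLOCKED_ATTEMPTS; attempt++) {
            long sum = 0, mods = 0;
            for (Stripe<K, V> s : stripes) {
                sum += s.count;
                mods += s.modCount;
            }
            if (mods == previousMods) return (int) sum;
            previousMods = mods;
        }
        // 2) Fallback: lock every stripe (the expensive global operation), sum, then unlock
        for (Stripe<K, V> s : stripes) s.lock();
        try {
            int sum = 0;
            for (Stripe<K, V> s : stripes) sum += s.count;
            return sum;
        } finally {
            for (Stripe<K, V> s : stripes) s.unlock();
        }
    }
}

class StripedMapDemo {
    public static void main(String[] args) throws InterruptedException {
        StripedMap<Integer, String> map = new StripedMap<>(16);
        Thread[] writers = new Thread[4];
        for (int t = 0; t < writers.length; t++) {
            final int base = t * 10_000;
            writers[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) map.put(base + i, "v");
            });
            writers[t].start();
        }
        for (Thread w : writers) w.join();
        System.out.println(map.size());   // prints 40000
    }
}
```

The fallback path locks the stripes in array order, so two concurrent size() calls cannot deadlock, and since put() holds at most one stripe lock and never waits for another, it cannot deadlock with size() either.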

3. Replace exclusive locks with read-write locks

As mentioned earlier, using the read-write lock ReadWriteLock can improve system performance. Replacing an exclusive lock with a read-write lock is a special case of reducing lock granularity: whereas reducing lock granularity divides the data structure, a read-write lock divides the system's functionality into reading and writing.
In read-mostly scenarios, read-write locks benefit performance considerably. If the system uses only an exclusive lock for both reads and writes, no real concurrency is possible between reads and writes, between reads and other reads, or between writes and other writes; every operation must wait for the others. Yet a read by itself does not affect data integrity or consistency, so in theory multiple threads should usually be allowed to read at the same time, and that is exactly what a read-write lock allows. Since read-write locks were introduced in Chapter 3, we will not repeat the details here.
Note: in read-mostly scenarios, a read-write lock can effectively improve the concurrency of the system.
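As a minimal sketch of the idea, here is a read-mostly cache built on the JDK's ReentrantReadWriteLock; the ReadMostlyCache class itself is hypothetical. get() takes the shared read lock, so readers do not block each other, while put() takes the exclusive write lock, which blocks readers and other writers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical read-mostly cache guarded by a read-write lock
class ReadMostlyCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final Lock readLock = rwLock.readLock();
    private final Lock writeLock = rwLock.writeLock();

    V get(K key) {
        readLock.lock();        // shared: many readers may hold it at the same time
        try {
            return map.get(key);
        } finally {
            readLock.unlock();
        }
    }

    void put(K key, V value) {
        writeLock.lock();       // exclusive: waits until no reader or writer holds the lock
        try {
            map.put(key, value);
        } finally {
            writeLock.unlock();
        }
    }
}

class ReadMostlyCacheDemo {
    public static void main(String[] args) throws InterruptedException {
        ReadMostlyCache<String, Integer> cache = new ReadMostlyCache<>();
        cache.put("answer", 42);
        Runnable reader = () -> System.out.println(cache.get("answer"));
        Thread r1 = new Thread(reader);
        Thread r2 = new Thread(reader);
        r1.start();             // the two get() calls may overlap under the shared read lock
        r2.start();
        r1.join();
        r2.join();
    }
}
```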

4. Lock separation

Extending the idea of the read-write lock further leads to lock separation. A read-write lock separates locking according to the different functions of reads and writes. Following the functional characteristics of an application, an exclusive lock can be split in a similar way. A typical case is the implementation of java.util.concurrent.LinkedBlockingQueue.
In LinkedBlockingQueue, take() retrieves data from the queue and put() adds data to it. Although both modify the queue, LinkedBlockingQueue is based on a linked list, and the two operations act on the head and the tail of the list respectively, so in theory they do not conflict.
If a single exclusive lock were used, both operations would need to acquire the queue's lock, so take() and put() could never run truly concurrently; at run time each would wait for the other to release the lock. Lock contention would then be relatively fierce and would hurt performance under high concurrency.
Therefore, the JDK does not take this approach. Instead, it uses two different locks to separate take() from put():

/** Lock held by take, poll, etc */
private final ReentrantLock takeLock = new ReentrantLock();   // take() must hold takeLock

/** Wait queue for waiting takes */
private final Condition notEmpty = takeLock.newCondition();

/** Lock held by put, offer, etc */
private final ReentrantLock putLock = new ReentrantLock();    // put() must hold putLock

/** Wait queue for waiting puts */
private final Condition notFull = putLock.newCondition();

The fields above define takeLock and putLock, used by take() and put() respectively. As a result, take() and put() are independent of each other and do not contend for a lock: take() calls compete only with other take() calls for takeLock, and put() calls compete only with other put() calls for putLock. This weakens lock contention.
The implementation of take() is shown below. The comments in the code explain it in detail, so we will not repeat the explanation in the text.

public E take() throws InterruptedException {
    E x;
    int c = -1;
    final AtomicInteger count = this.count;
    final ReentrantLock takeLock = this.takeLock;
    takeLock.lockInterruptibly();       // Two threads cannot take data at the same time
    try {
        try {
            while (count.get() == 0)    // If no data is currently available, wait
                notEmpty.await();       // Wait for a notification from put()
        } catch (InterruptedException ie) {
            notEmpty.signal();          // Pass the notification on to a non-interrupted thread
            throw ie;
        }
        x = extract();                  // Remove the first element
        c = count.getAndDecrement();    // Decrement atomically, because put() accesses count concurrently;
                                        // note that c is the value of count before the decrement
        if (c > 1)
            notEmpty.signal();          // Notify other take() operations
    } finally {
        takeLock.unlock();              // Release the lock
    }
    if (c == capacity)
        signalNotFull();                // Tell put() that free space is available
    return x;
}

The implementation of put() is as follows:

public void put(E e) throws InterruptedException {
    if (e == null) throw new NullPointerException();
    int c = -1;
    final ReentrantLock putLock = this.putLock;
    final AtomicInteger count = this.count;
    putLock.lockInterruptibly();        // Two threads cannot put() at the same time
    try {
        try {
            while (count.get() == capacity)  // If the queue is full,
                notFull.await();             // wait
        } catch (InterruptedException ie) {
            notFull.signal();           // Notify a non-interrupted thread
            throw ie;
        }
        insert(e);                      // Insert the data
        c = count.getAndIncrement();    // Update the total; c is the value of count before the increment
        if (c + 1 < capacity)
            notFull.signal();           // There is still space: notify other put() threads
    } finally {
        putLock.unlock();               // Release the lock
    }
    if (c == 0)
        signalNotEmpty();               // After a successful insertion, tell take() to fetch data
}

With takeLock and putLock, LinkedBlockingQueue separates taking data from putting data, so the two operations can run truly concurrently.
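To see the two-lock structure in one place, here is a simplified, self-contained queue sketch that follows the take()/put() logic above. TwoLockQueue is a hypothetical name, and it omits most of what LinkedBlockingQueue provides (poll(), offer(), iteration, and so on). A dummy head node lets put() touch only the tail (last) and take() touch only the head, so the only state the two sides share is the AtomicInteger count; the atomic count also makes a node linked in by put() visible to the thread that later takes it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical simplified two-lock bounded queue
class TwoLockQueue<E> {
    private static final class Node<E> {
        E item;
        Node<E> next;
        Node(E item) { this.item = item; }
    }

    private final int capacity;
    private final AtomicInteger count = new AtomicInteger();  // shared by both sides, so atomic
    private Node<E> head = new Node<>(null);  // dummy node, touched only under takeLock
    private Node<E> last = head;              // touched only under putLock

    private final ReentrantLock takeLock = new ReentrantLock();
    private final Condition notEmpty = takeLock.newCondition();
    private final ReentrantLock putLock = new ReentrantLock();
    private final Condition notFull = putLock.newCondition();

    TwoLockQueue(int capacity) { this.capacity = capacity; }

    void put(E e) throws InterruptedException {
        if (e == null) throw new NullPointerException();
        int c = -1;
        putLock.lockInterruptibly();          // competes only with other put() calls
        try {
            while (count.get() == capacity)
                notFull.await();
            last = last.next = new Node<>(e); // link at the tail
            c = count.getAndIncrement();      // c is the count before this insertion
            if (c + 1 < capacity)
                notFull.signal();             // room left: wake another producer
        } finally {
            putLock.unlock();
        }
        if (c == 0) signalNotEmpty();         // queue was empty: wake a consumer
    }

    E take() throws InterruptedException {
        E x;
        int c = -1;
        takeLock.lockInterruptibly();         // competes only with other take() calls
        try {
            while (count.get() == 0)
                notEmpty.await();
            Node<E> first = head.next;        // unlink at the head
            head = first;                     // first becomes the new dummy node
            x = first.item;
            first.item = null;
            c = count.getAndDecrement();      // c is the count before this removal
            if (c > 1)
                notEmpty.signal();            // items left: wake another consumer
        } finally {
            takeLock.unlock();
        }
        if (c == capacity) signalNotFull();   // queue was full: wake a producer
        return x;
    }

    private void signalNotEmpty() {
        takeLock.lock();                      // a Condition is signalled while holding its lock
        try { notEmpty.signal(); } finally { takeLock.unlock(); }
    }

    private void signalNotFull() {
        putLock.lock();
        try { notFull.signal(); } finally { putLock.unlock(); }
    }
}

class TwoLockQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        TwoLockQueue<Integer> q = new TwoLockQueue<>(10);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 1000; i++) q.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long sum = 0;
        for (int i = 0; i < 1000; i++) sum += q.take();  // runs concurrently with put()
        producer.join();
        System.out.println(sum);   // prints 500500
    }
}
```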

5. Lock coarsening

In general, to ensure effective concurrency between threads, each thread should hold a lock for as short a time as possible, that is, it should release the lock as soon as it has finished using the shared resource. Only then can other threads waiting on the lock obtain the resource and proceed as soon as possible. However, everything has its limits. If the same lock is requested, synchronized on, and released over and over again, that itself consumes valuable system resources and works against performance.
Therefore, when the virtual machine encounters a series of operations that repeatedly request and release the same lock, it may merge them into a single lock request, reducing the number of lock synchronizations. This optimization is called lock coarsening.

public void demoMethod() {
    synchronized (lock) {
        // do sth.
    }
    // Do other work that does not need synchronization but completes quickly
    synchronized (lock) {
        // do sth.
    }
}

The code above may be merged into the following form:

public void demoMethod() {
    // Merged into a single lock request
    synchronized (lock) {
        // do sth.
        // Do other work that does not need synchronization but completes quickly
        // do sth.
    }
}

During development, we should also coarsen locks deliberately where it is reasonable, especially when a lock is requested inside a loop. In the following example, every iteration of the loop acquires and releases the lock, which is clearly unnecessary here:

for (int i = 0; i < CIRCLE; i++) {
    synchronized (lock) {
        // do sth.
    }
}

Therefore, a more reasonable approach is to request the lock only once, outside the loop:

synchronized (lock) {
    for (int i = 0; i < CIRCLE; i++) {
        // do sth.
    }
}
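The loop example can be turned into a runnable sketch. CoarseningDemo is a hypothetical class; both methods perform the same increments, but lockPerIteration() acquires and releases the monitor CIRCLE times while lockOnce() does so only once. The printed timings are only indicative: the JIT may coarsen the per-iteration lock on its own, which is exactly the optimization described above, so results vary by JVM and warm-up.

```java
// Hypothetical illustration: a lock per iteration vs. one lock around the whole loop
class CoarseningDemo {
    private static final int CIRCLE = 1_000_000;
    private final Object lock = new Object();
    private long counter = 0;   // guarded by lock

    // Fine-grained: CIRCLE separate acquire/release pairs
    void lockPerIteration() {
        for (int i = 0; i < CIRCLE; i++) {
            synchronized (lock) {
                counter++;
            }
        }
    }

    // Coarsened: a single acquire/release around the whole loop
    void lockOnce() {
        synchronized (lock) {
            for (int i = 0; i < CIRCLE; i++) {
                counter++;
            }
        }
    }

    long counter() {
        synchronized (lock) {
            return counter;
        }
    }

    public static void main(String[] args) {
        CoarseningDemo d = new CoarseningDemo();
        long t0 = System.nanoTime();
        d.lockPerIteration();
        long t1 = System.nanoTime();
        d.lockOnce();
        long t2 = System.nanoTime();
        System.out.println("per-iteration: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("once:          " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("counter = " + d.counter());   // prints counter = 2000000
    }
}
```

The trade-off is that while lockOnce() runs, every other thread that needs lock must wait for the entire loop to finish.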

Note: performance optimization is a trade-off among resources, based on how the program actually behaves at run time. Lock coarsening is the opposite of reducing lock holding time, and each is effective in different situations, so weigh them against the actual circumstances.

Excerpted from the book Java High Concurrency Programming (recommended).

Added by MiniMonty on Mon, 14 Feb 2022 14:27:52 +0200
