It is said that HashMap is thread unsafe. Where is it reflected?

Foreword: we all know that HashMap is thread unsafe and is not recommended to be used in multi-threaded environment, but where is its thread unsafe? This paper will decrypt this problem.

1.jdk1. HashMap in 7

In jdk1 Many optimizations have been made to HashMap in 8. Here, we first analyze it in jdk1 7, I believe you all know in jdk1 7. HashMap is prone to dead loops in multithreaded environment. Here we first use code to simulate the occurrence of dead loops:

public class HashMapTest {

    public static void main(String[] args) {
        HashMapThread thread0 = new HashMapThread();
        HashMapThread thread1 = new HashMapThread();
        HashMapThread thread2 = new HashMapThread();
        HashMapThread thread3 = new HashMapThread();
        HashMapThread thread4 = new HashMapThread();
        thread0.start();
        thread1.start();
        thread2.start();
        thread3.start();
        thread4.start();
    }
}

class HashMapThread extends Thread {
    private static AtomicInteger ai = new AtomicInteger();
    private static Map<Integer, Integer> map = new HashMap<>();

    @Override
    public void run() {
        while (ai.get() < 1000000) {
            map.put(ai.get(), ai.get());
            ai.incrementAndGet();
        }
    }
}

The above code is relatively simple, that is, open multiple threads to continuously carry out put operation, and HashMap and AtomicInteger are shared globally. After running the code several times, the following dead cycle occurs:

There are several times when the array is out of bounds: Here we focus on the analysis of why there is an endless loop, and check the endless loop through the naming of jps and jstack. The results are as follows: you can see the location of the endless loop from the stack information. Through this information, you can clearly know that the endless loop occurs in the capacity expansion function of HashMap, and the root is in the transfer function, jdk1 The transfer function of HashMap in 7 is as follows:

void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

Summarize the main functions of this function:

After expanding the table to newTable, you need to transfer the original data to newTable. Pay attention to 10-12 lines of code. Here you can see that in the process of transferring elements, the header insertion method is used, that is, the order of the linked list will be reversed. This is also the key point to form an endless loop. The following is a detailed analysis.

1.1 analysis process of dead cycle caused by capacity expansion

prerequisite:

Here assume

  1. hash algorithm is a simple key mod linked list size.

  2. At first, if the hash table size=2 and key=3,7,5, they are all in table[1].

  3. Then resize to make size 4.

The data structure before resizing is as follows:

If you are in a single threaded environment, the final results are as follows: the transfer process here will not be described in detail. As long as you understand what the transfer function is doing, its transfer process and how to reverse the linked list should not be difficult.

Then, in A multithreaded environment, suppose that two threads A and B are put ting. Thread A hangs when it executes the code in line 11 of the transfer function. Because the function plays A very important role in the analysis here, it is posted again.

At this time, the running results in thread A are as follows: after thread A is suspended, thread B executes normally and completes the resize operation. The results are as follows: since thread B has finished executing, according to the Java memory model, the entries in newTable and table are the latest values in main memory: 7 next=3,3.next=null.

At this time, switch to thread A. when thread a is suspended, the memory value is as follows: e=3, next=7, newTable[3]=null. The code execution process is as follows:

newTable[3]=e ----> newTable[3]=3
e=next ----> e=7

The results are as follows:

Continue cycle:

e=7
next=e.next ----> next=3[[value from main memory]
e.next=newTable[3] ----> e.next=3[[value from main memory]
newTable[3]=e ----> newTable[3]=7
e=next ----> e=3

The results are as follows:

Cycle again:

e=3
next=e.next ----> next=null
e.next=newTable[3] ----> e.next=7 Namely: 3.next=7
newTable[3]=e ----> newTable[3]=3
e=next ----> e=null

Note that this cycle: e.next=7, while in the last cycle, 7 Next = 3, a circular linked list appears, and the e=null loop ends.

The results are as follows:

In subsequent operations, as long as polling the hashmap data structure is involved, a life and death cycle will occur here, resulting in tragedy.

1.2 analysis process of data loss caused by capacity expansion

Following the above analysis process, initially:Thread A and thread B perform put operation. Similarly, thread A hangs:

At this time, the running results of thread A are as follows: at this time, thread B has obtained the CPU time slice and completed the resize operation: also note that since the execution of thread B is completed, both newTable and table are the latest values: 5 next=null.

At this time, switch to thread A. when thread a is suspended: e=7, next=5, newTable[3]=null.

When newtable[i]=e is executed, 7 is placed in the position of table[3], and next=5. Then proceed to the next cycle:

e=5
next=e.next ----> next=null,Fetch from main memory
e.next=newTable[1] ----> e.next=5,Fetch from main memory
newTable[1]=e ----> newTable[1]=5
e=next ----> e=null

Place 5 in the position of table[1]. At this time, the e=null cycle ends, 3 elements are lost, and a circular linked list is formed. And cause dead loop in subsequent operation of hashmap.

2.jdk1. HashMap in 8

In jdk1 HashMap is optimized in 8. In case of hash collision, the head insertion method is no longer used, but the tail of the linked list is directly inserted. Therefore, there will be no ring linked list, but it is still unsafe in the case of multithreading. Here we look at jdk1 Source code of put operation of HashMap in 8:

This is jdk1 The main function of put operation in HashMap in 8. Pay attention to the code in line 6. If there is no hash collision, the element will be inserted directly. If thread A and thread B perform put operation at the same time, the hash values of these two different data are the same, and the location data is null, so both threads A and B will enter line 6 code.

Suppose that thread A is suspended before data insertion, while thread B executes normally to insert data. Then thread A obtains the CPU time slice. At this time, thread A does not need to hash judgment. The problem occurs: thread A will overwrite the data inserted by thread B, resulting in thread insecurity.

Here is just a brief analysis of jdk1 The thread insecurity problem in HashMap in 8 will be summarized in the follow-up, and specific analysis will be carried out at that time.

summary

Firstly, HashMap is thread unsafe, which is mainly reflected in:

  1. In jdk1 In 7, in a multithreaded environment, capacity expansion will cause ring chain or data loss.

  2. In jdk1 In 8, data coverage occurs in a multithreaded environment.

Keywords: Java Interview security

Added by kovudalion on Mon, 28 Feb 2022 02:41:31 +0200