High paid programmer & interview question Series 50. Do you know how concurrent HashMap is expanded?

I Interview questions and analysis

1. Today's interview question

What is the underlying principle of ConcurrentHashMap?

Do you know how concurrent HashMap is expanded?

.......

2. Topic analysis

In the previous four articles, Yige introduced the general functions and features of ConcurrentHashMap, the underlying data structure of ConcurrentHashMap in JDK 7 and 8, as well as the core attributes in ConcurrentHashMap. The links are as follows:

High paid programmer & interview question series 45 are you familiar with concurrent HashMap?

High paid programmer & interview question series 46: what are the underlying principles and data structures of ConcurrentHashMap in JDK7

High paid programmer & interview question series 47 talk about the underlying principle of ConcurrentHashMap in JDK8. What is the difference between HashMap and ConcurrentHashMap?

High paid programmer & interview question series 48 what does sizeCtl of ConcurrentHashMap in JDK8 mean? What is the maximum capacity and load factor?

High paid programmer & interview questions series 49: talk about the source code and data addition process of ConcurrentHashMap#put method

Starting from this article, brother Yi will focus on analyzing the source code of the ConcurrentHashMap#transfer() method in JDK 8, focusing on the capacity expansion mechanism. This is the high-frequency test site during our interview.

II Implementation logic of transfer() expansion method [key points]

1. Capacity expansion mechanism of concurrenthashmap

During our interview, another common test point in ConcurrentHashMap is about the capacity expansion mechanism of arrays, so next I will focus on how to expand ConcurrentHashMap.

When the capacity of ConcurrentHashMap is insufficient, the table array needs to be expanded. In fact, the basic logic of capacity expansion is very similar to that of HashMap, and the capacity expansion of ConcurrentHashMap is also doubled. After capacity expansion, the array capacity is twice that of the original. However, because it supports concurrent capacity expansion, it is more complex because it supports multi-threaded capacity expansion without locking.

The purpose of this is not only to meet the requirements of concurrency, but also to reduce the time impact of capacity expansion. Because during capacity expansion, it always involves copying from one "array" to another. If this operation can be carried out concurrently, it is very efficient.

2. Implementation steps of capacity expansion mechanism

The implementation steps of concurrent HashMap expansion can be divided into two parts, which are as follows:

Part 1 starts by building a nextTable array. Its capacity is twice that of the original. This operation is completed in a single thread. This operation will use RESIZE_STAMP_SHIFT constant is realized through one operation.

The second part is to copy the data from the original table array to the nextTable array, where multi-threaded operation is allowed.

2.1 constructing nextTable array in single thread

The general implementation process of the array is actually a process of traversal and replication.

In this process, the number of times i to be traversed will be obtained according to the operation, and then the element at position i will be obtained by using the tabAt() method, and the following judgment will be made:

  • If this location is empty, the forwardNode node is placed at the i location in the original table, which is also the key point to trigger concurrent capacity expansion;
  • If this position is a Node node (FH > = 0) and is the head Node of a linked list, construct an inverted linked list and place them at the i and i+n positions of the nextTable respectively;
  • If this position is a TreeBin node (FH < 0), do a reverse order processing, judge whether untreefi is required, and put the processing results at the i and i+n positions of the nextTable respectively.

After traversing all nodes, the data replication is completed. At this time, let nextTable be the new table, and update sizeCtl to 0.75 times the new capacity to complete the capacity expansion.

2.2 node traversal in multithreading

There is a judgment in the source code of the transfer() method. If the traversed node is a forward node, it will continue to traverse backward. Coupled with the locking operation to the node, the multithreading control is completed. When multithreading traverses a node, it processes a node and sets the value of the corresponding node to forward; When another thread sees forward, it will traverse backward. In this way, the cross operation completes the replication work, and also solves the problem of thread safety.

3. Transfer expansion method

The transfer() capacity expansion method is a very efficient and complex method. Before reading the specific transfer source code, let's first understand when the capacity expansion operation of ConcurrentHashMap will be triggered. There are two situations in which the capacity expansion operation may be triggered:

  • After calling the put() method to add an element, the addCount() method will be called to update the size and check whether expansion is needed. When the number of array elements reaches the expansion threshold, the transfer() method will be triggered;
  • Trigger the tryprevize() operation, which will trigger capacity expansion. There are two cases that trigger the tryprevize operation. The first case: when the number of linked list elements on a node reaches the threshold of 8, it is necessary to turn the linked list into a red black tree. Before structure conversion, it will first judge whether the array length n is less than the threshold min_ TREEIFY_ Capability, the default value is 64. If it is less than, it will call the tryprevize method to expand the array length to twice the original length, trigger the transfer() method, and then readjust the position of the node. The second case: the tryprevize operation will be triggered first during the putAll operation.

4. transfer() source code

Next, let's take a look at the source code of the transfer() method and see how the concurrent HashMap achieves capacity expansion.

/**
* A transitional table can only be used during capacity expansion
*/
private transient volatile Node<K,V>[] nextTable;

 /**
     * Moves and/or copies the nodes in each bin to new table. See
     * above for explanation.
     */
    private final void transfer(Node<K,V>[] tab, Node<K,V>[] nextTab) {
        int n = tab.length, stride;
        if ((stride = (NCPU > 1) ? (n >>> 3) / NCPU : n) < MIN_TRANSFER_STRIDE)
            stride = MIN_TRANSFER_STRIDE; // subdivide range
        if (nextTab == null) {            // initiating
            try {
                @SuppressWarnings("unchecked")
                Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n << 1];//Construct a nextTable object with twice the original capacity
                nextTab = nt;
            } catch (Throwable ex) {      // try to cope with OOME
                sizeCtl = Integer.MAX_VALUE;
                return;
            }
            nextTable = nextTab;
            transferIndex = n;
        }
        int nextn = nextTab.length;
        ForwardingNode<K,V> fwd = new ForwardingNode<K,V>(nextTab);//Construct a connected node pointer for flag bit
        boolean advance = true;//If the key attribute of concurrent capacity expansion is equal to true, it indicates that this node has been processed
        boolean finishing = false; // to ensure sweep before committing nextTab
        for (int i = 0, bound = 0;;) {
            Node<K,V> f; int fh;
            //The function of the while loop body is to control i -- through i -- to traverse the nodes in the original hash table in turn
            while (advance) {
                int nextIndex, nextBound;
                if (--i >= bound || finishing)
                    advance = false;
                else if ((nextIndex = transferIndex) <= 0) {
                    i = -1;
                    advance = false;
                }
                else if (U.compareAndSwapInt
                         (this, TRANSFERINDEX, nextIndex,
                          nextBound = (nextIndex > stride ?
                                       nextIndex - stride : 0))) {
                    bound = nextBound;
                    i = nextIndex - 1;
                    advance = false;
                }
            }
            if (i < 0 || i >= n || i + n >= nextn) {
                int sc;
                if (finishing) {
                 //If all nodes have finished copying, assign nextTable to table and empty the temporary object nextTable
                    nextTable = null;
                    table = nextTab;
                    sizeCtl = (n << 1) - (n >>> 1);//The capacity expansion threshold is set to 1.5 times the original capacity, which is still 0.75 times the current capacity
                    return;
                }
                //The CAS method is used to update the capacity expansion threshold, where the sizectl value is reduced by one, indicating that a new thread is added to participate in the capacity expansion operation
                if (U.compareAndSwapInt(this, SIZECTL, sc = sizeCtl, sc - 1)) {
                    if ((sc - 2) != resizeStamp(n) << RESIZE_STAMP_SHIFT)
                        return;
                    finishing = advance = true;
                    i = n; // recheck before commit
                }
            }
            //If the traversed node is null, it is put into the ForwardingNode pointer
            else if ((f = tabAt(tab, i)) == null)
                advance = casTabAt(tab, i, null, fwd);
            //If you traverse the ForwardingNode node, it indicates that this point has been processed. Skip it directly. This is the core of controlling concurrent expansion
            else if ((fh = f.hash) == MOVED)
                advance = true; // already processed
            else {
              //Node locking
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        Node<K,V> ln, hn;
                        //If FH > = 0, it proves that this is a Node node
                        if (fh >= 0) {
                            int runBit = fh & n;
                            //The following work is to construct two linked lists. One is the original linked list, and the other is the reverse order of the original linked list
                            Node<K,V> lastRun = f;
                            for (Node<K,V> p = f.next; p != null; p = p.next) {
                                int b = p.hash & n;
                                if (b != runBit) {
                                    runBit = b;
                                    lastRun = p;
                                }
                            }
                            if (runBit == 0) {
                                ln = lastRun;
                                hn = null;
                            }
                            else {
                                hn = lastRun;
                                ln = null;
                            }
                            for (Node<K,V> p = f; p != lastRun; p = p.next) {
                                int ph = p.hash; K pk = p.key; V pv = p.val;
                                if ((ph & n) == 0)
                                    ln = new Node<K,V>(ph, pk, pv, ln);
                                else
                                    hn = new Node<K,V>(ph, pk, pv, hn);
                            }
                            //Insert a linked list at the i position of the nextTable
                            setTabAt(nextTab, i, ln);
                            //Insert another linked list at the i+n position of nextTable
                            setTabAt(nextTab, i + n, hn);
                            //Inserting a forwardNode node at the i position of the table indicates that the node has been processed
                            setTabAt(tab, i, fwd);
                            //Set advance to true and return to the while loop above to perform i-- operations
                            advance = true;
                        }
                        //Processing the TreeBin object is similar to the above procedure
                        else if (f instanceof TreeBin) {
                            TreeBin<K,V> t = (TreeBin<K,V>)f;
                            TreeNode<K,V> lo = null, loTail = null;
                            TreeNode<K,V> hi = null, hiTail = null;
                            int lc = 0, hc = 0;
                            //Construct positive order and reverse order linked lists
                            for (Node<K,V> e = t.first; e != null; e = e.next) {
                                int h = e.hash;
                                TreeNode<K,V> p = new TreeNode<K,V>
                                    (h, e.key, e.val, null, null);
                                if ((h & n) == 0) {
                                    if ((p.prev = loTail) == null)
                                        lo = p;
                                    else
                                        loTail.next = p;
                                    loTail = p;
                                    ++lc;
                                }
                                else {
                                    if ((p.prev = hiTail) == null)
                                        hi = p;
                                    else
                                        hiTail.next = p;
                                    hiTail = p;
                                    ++hc;
                                }
                            }
                            //If the tree structure is no longer needed after capacity expansion, it is inversely converted to a linked list structure
                            ln = (lc <= UNTREEIFY_THRESHOLD) ? untreeify(lo) :
                                (hc != 0) ? new TreeBin<K,V>(lo) : t;
                            hn = (hc <= UNTREEIFY_THRESHOLD) ? untreeify(hi) :
                                (lc != 0) ? new TreeBin<K,V>(hi) : t;
                             //Insert a linked list at the i position of the nextTable    
                            setTabAt(nextTab, i, ln);
                            //Insert another linked list at the i+n position of nextTable
                            setTabAt(nextTab, i + n, hn);
                             //Inserting a forwardNode node at the i position of the table indicates that the node has been processed
                            setTabAt(tab, i, fwd);
                            //Set advance to true and return to the while loop above to perform i-- operations
                            advance = true;
                        }
                    }
                }
            }
        }
    }

The transfer() method is really complex. I believe many people don't have the patience to read it line by line. Even after reading it for a while, they don't understand what it means, so brother Yi will split and interpret the above source code for you.

5. Interpretation of the source code of transfer [ key ]

5.1 set step stripe

In the transfer() method, a step size stripe will be set first, and the minimum value of this stripe is equal to MIN_TRANSFER_STRIDE=16

int n = tab.length, stride;
if ((stride = (NCPU > 1) ? (n >>> 3) / NCPU : n) < MIN_TRANSFER_STRIDE)
    stride = MIN_TRANSFER_STRIDE; // subdivide range

5.2 initialize the nextTab array

Then judge whether the nextTab is empty. If the incoming nextTab is empty, it will be initialized. The periphery will ensure that when the first thread initiating migration calls this method, nextTab=null, and when subsequent threads call again, nextTab will no longer be null. Then assign nextTab to the global variable nextTable.

if (nextTab == null) {// initiating
   try {
       @SuppressWarnings("unchecked")
       Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n << 1];
       nextTab = nt;
    } catch (Throwable ex) {      // try to cope with OOME
         sizeCtl = Integer.MAX_VALUE;
         return;
    }
         nextTable = nextTab;
         transferIndex = n;
}

5.3 create ForwardingNode node

Next, a new ForwardingNode is created. The hash value of this node = MOVED=-1.

int nextn = nextTab.length;
ForwardingNode<K,V> fwd = new ForwardingNode<K,V>(nextTab);
boolean advance = true;
boolean finishing = false; // to ensure sweep before committing nextTab

5.4 execute a dead cycle

Then, the state will be changed in an endless loop. There may be three situations here.

In the first case, - I > = bound or finishing is True:

After subtracting 1 from i, judge whether i is greater than the minimum bound of the boundary. If it is less than bound, it indicates that the last interval task received has been completed and the next interval task needs to be received. Normally, when entering the loop for the first time, i this judgment will fail, and the following nextIndex assignment operation will be performed.

In the second case, if it is less than or equal to 0, it means that there is no interval, then change i to - 1, and the advance status becomes false, so it will no longer advance, indicating that the expansion has ended and the current thread can exit. This - 1 will be judged in the following if block to enter the completion state.

The third case is to get the task interval of the thread. The lower limit is bound, the upper limit is i, bound = nextindex stripe, i=nextIndex.

for (int i = 0, bound = 0;;) {
    Node<K,V> f; int fh;
    while (advance) {
        int nextIndex, nextBound;
        if (--i >= bound || finishing)
            advance = false;
        else if ((nextIndex = transferIndex) <= 0) {
            i = -1;
            advance = false;
        }
        else if (U.compareAndSwapInt
                 (this, TRANSFERINDEX, nextIndex,
                  nextBound = (nextIndex > stride ?
                               nextIndex - stride : 0))) {
            bound = nextBound;
            i = nextIndex - 1;
            advance = false;
        }
}

5.5 judge whether to end the capacity expansion task of the current thread

If i is less than 0 or not within the subscript range of the tab array, get the last interval task according to the above judgment and end the capacity expansion task of the current thread.

if (i < 0 || i >= n || i + n >= nextn) {
    int sc;
    if (finishing) {
      //Assign the new nextTab to the table attribute to complete the migration
        nextTable = null;
        table = nextTab; // Update table
      //Recalculate sizeCtl: n is the length of the original array, so the value obtained by sizeCtl will be 0.75 times the length of the new array
        sizeCtl = (n << 1) - (n >>> 1);
        return;
    }
    if (U.compareAndSwapInt(this, SIZECTL, sc = sizeCtl, sc - 1)) {
        if ((sc - 2) != resizeStamp(n) << RESIZE_STAMP_SHIFT)
            return;
        finishing = advance = true;
        i = n; // recheck before commit
    }
}

5.6 create a fwd node for occupation

Next, judge that if the value at position i in the old tab is null, CAS will be used to write a fwd node for occupation.

else if ((f = tabAt(tab, i)) == null)
    advance = casTabAt(tab, i, null, fwd);

5.7 lock to add data

Next, this part is the key code for capacity expansion.

else {// Here, it means that this position has an actual value and is not a placeholder. Lock this node. Why is it locked to prevent inserting data into the linked list when putVal
            synchronized (f) {
                // Judge whether the bucket node at i subscript is the same as f
                if (tabAt(tab, i) == f) {
                    Node<K,V> ln, hn;// Low, high bucket
                    // If the hash value of f is greater than 0. The hash of TreeBin is - 2
                    if (fh >= 0) {
                        // Sum the old length (the nth of the first operand is in the nth bit of the second operand. If both are 1, the nth of the result is also 1, otherwise it is 0)
                        // Since the length of the Map is to the power of 2 (numbers such as 00000, 1000), there are only two results taken from the length, one is 0 and the other is 1
                        //  If the result is 0, Doug Lea puts it in the low position and vice versa. The purpose is to re hash the linked list and put it in the corresponding position so that the new take in algorithm can hit him.
                        int runBit = fh & n;
                        Node<K,V> lastRun = f; // The hash value of the tail node is not equal to that of the head node
                        // Traverse this bucket
                        for (Node<K,V> p = f.next; p != null; p = p.next) {
                            // Take the hash value of each node in the bucket
                            int b = p.hash & n;
                            // If the hash value of the node is different from that of the first node
                            if (b != runBit) {
                                runBit = b; // Update runBit to determine whether lastRun should be assigned to ln or hn.
                                lastRun = p; // This lastRun ensures that the values of the following nodes are the same as their own values, so as to avoid unnecessary loops later
                            }
                        }
                        if (runBit == 0) {// If the last updated runBit is 0, set the lower node
                            ln = lastRun;
                            hn = null;
                        }
                        else {
                            hn = lastRun; // If the last updated runBit is 1, set the high node
                            ln = null;
                        }// Loop again and generate two linked lists with lastRun as the stop condition, so as to avoid unnecessary loops (the same results are taken after lastRun)
                        for (Node<K,V> p = f; p != lastRun; p = p.next) {
                            int ph = p.hash; K pk = p.key; V pv = p.val;
                            // If the result of the sum operation is 0, it is still in the low order
                            if ((ph & n) == 0) // If it is 0, create a low node
                                ln = new Node<K,V>(ph, pk, pv, ln);
                            else // 1 creates a high order
                                hn = new Node<K,V>(ph, pk, pv, hn);
                        }
                        // In fact, this is similar to hashMap 
                        // Set the lower linked list to the i of the new linked list
                        setTabAt(nextTab, i, ln);
                        // Set the high linked list and add n to the original length
                        setTabAt(nextTab, i + n, hn);
                        // Set the old linked list as a placeholder
                        setTabAt(tab, i, fwd);
                        // Keep pushing back
                        advance = true;

The above code is difficult to understand. We can cooperate with the following figure to help understand it. In the figure, the gray circle represents FH & n = 0, and the green circle represents FH & n= 0 After a cycle, if LastRun=8 and Runbit=0, ln=LastRun(8 nodes) and hn=null. After another cycle, you can get both ln low-level linked list and hn high-level linked list.

5.8 turn linked list into red black tree

The last piece of code is to judge whether the red black tree transfer is required.

else if (f instanceof TreeBin) {
    TreeBin<K,V> t = (TreeBin<K,V>)f;
    TreeNode<K,V> lo = null, loTail = null;
    TreeNode<K,V> hi = null, hiTail = null;
    int lc = 0, hc = 0;
    // ergodic
    for (Node<K,V> e = t.first; e != null; e = e.next) {
        int h = e.hash;
        TreeNode<K,V> p = new TreeNode<K,V>
            (h, e.key, e.val, null, null);
        // The same judgment as the linked list, and the operation = = 0 are placed in the low order
        if ((h & n) == 0) {
            if ((p.prev = loTail) == null)
                lo = p;
            else
                loTail.next = p;
            loTail = p;
            ++lc;
        } // Not 0, put it high
        else {
            if ((p.prev = hiTail) == null)
                hi = p;
            else
                hiTail.next = p;
            hiTail = p;
            ++hc;
        }
    }
    // If the number of nodes in the tree is less than or equal to 6, it will be converted to a linked list. On the contrary, a new tree will be created
    ln = (lc <= UNTREEIFY_THRESHOLD) ? untreeify(lo) :
        (hc != 0) ? new TreeBin<K,V>(lo) : t;
    hn = (hc <= UNTREEIFY_THRESHOLD) ? untreeify(hi) :
        (lc != 0) ? new TreeBin<K,V>(hi) : t;
    // Low tree
    setTabAt(nextTab, i, ln);
    // High order number
    setTabAt(nextTab, i + n, hn);
    // Set old as placeholder
    setTabAt(tab, i, fwd);
    // Keep pushing back
    advance = true;
}
}
}

6. Transfer summary [ key points ]

This transfer() method is really too complex for us to remember, so brother Yi will give you a brief summary. We just need to remember that transfer() roughly performs the following functions:

  • Step 1: calculate the number of bit buckets that each thread can process each time. According to the length of the Map, calculate the number of bit buckets (table array) to be processed by each thread (CPU). By default, each thread processes 16 bit buckets at a time. If it is less than 16, it is forced to become 16 bit buckets.
  • Step 2: initialize nextTab. If the new nextTab passed in is empty, the nextTab is initialized, which is twice the original table by default.
  • Step 3: introduce ForwardingNode, advance and finishing variables to assist in capacity expansion. ForwardingNode indicates that the node has been processed and does not need to be processed again; Advance indicates whether the thread can move down to the next bit bucket (true: indicates it can move down); Finishing indicates whether to end capacity expansion (true: end capacity expansion, false: not end capacity expansion).
  • Step 4: lock before data migration. In the process of data transfer, a synchronized lock will be added to lock the head node and synchronize the operation to prevent inserting data into the linked list during putVal.
  • Step 5: perform data migration. If the node on the bit bucket is a linked list or red black tree, the node data will be divided into low and high. The calculation rule is to sum the hash value of the node with the length of the table container before capacity expansion. If the result is 0, the data will be placed in the low order of the new table (the i-th position in the current table, or the i-th position in the new table); If the result is not 0, it will be placed in the high position of the new table (the i-th position in the current table, and the position in the new table is I + the length of the current table container).
  • Step 6: judge whether tree is needed. If the red black tree is mounted on the bit bucket, it is not only necessary to separate the low and high nodes, but also to judge whether the low and high nodes are stored in the form of linked list or red black tree in the new table.

III epilogue

This article {focuses on the source code of the concurrent hashmap#transfer() expansion method of JDK 8 and the process of the expansion mechanism. Next, I} will explain the source code of the ConcurrentHashMap#get() method in JDK 8 through another article. Please get ready.

Keywords: Java Interview

Added by huzefahusain on Wed, 05 Jan 2022 06:08:59 +0200