Analysis and interpretation of the most easy to understand HashMap source code

The article has been included in Github.com/niumoo/JavaNotes , and Java programmers need to master the core knowledge. Welcome to Star and advice.
Welcome to my official account , articles are updated weekly.

As one of the most commonly used collection classes, HashMap needs to be understood in a simple way. This article will go deep into the HashMap source code, analyze its storage structure and working mechanism.

1. Storage structure of HashMap

The data storage structure of HashMap is a node < K, V > array. In Java 7, it is an entry < K, V > array, but the structure is the same

public class HashMap<k,v> extends AbstractMap<k,v> implements Map<k,v>, Cloneable, Serializable {
    // array
    transient Node<k,v>[] table;    
	static class Node<k,v> implements Map.Entry<k,v> {
        final int hash;
        final K key;
        V value;
        // Linked list
        Node<k,v> next;
        ....
	}
	.....
}

The storage structure is mainly an array linked list, like the figure below.

2. put() of HashMap

In Java 8, the put method of HashMap is as follows, and I have commented on the important code in detail.

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
// Calculate hash value and (& amp;), non (~), or (|), exclusive or (^)
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
   
    Node<k,v>[] tab; Node<k,v> p; int n, i;
    // Initialize resize() if the array is empty
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If Node does not exist in the calculated location, create Node insertion directly
    if ((p = tab[i = (n - 1) &amp; hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        // If Node exists in the calculated location, linked list processing
        Node<k,v> e; K k;
        // If the hash value and k value are exactly the same, directly overwrite
        if (p.hash == hash &amp;&amp;((k = p.key) == key || (key != null &amp;&amp; key.equals(k))))
            e = p;
        // If the index location element already exists and is a red black tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<k,v>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // If the value to be put this time does not exist
            for (int binCount = 0; ; ++binCount) {
                // Tail insertion
                if ((e = p.next) == null) {
                    // Find the node whose next is empty in the node list, and create a new node to insert
                    p.next = newNode(hash, key, value, null);
                    // If the number of nodes in the list exceeds treeify_ Treehold (8), converted to red black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // If the same key is found in the node list
                if (e.hash == hash &amp;&amp;
                    ((k = e.key) == key || (key != null &amp;&amp; key.equals(k))))
                    break;
                p = e;
            }
        }
        // If node e has a value, put it into the array table []
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
 
    ++modCount;
    // Current size is larger than critical size, expand capacity
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

For example, if the put key is the letter a, the current HashMap capacity is the initial capacity 16, and the calculated location is 1.

# int hash = key.hashCode()
# hash = hash ^ (hash >>> 16)
# The formula index = (n - 1) & hash / / N is capacity
    
hash HEX(97)  = 0110 0001‬
n-1  HEX(15)  = 0000 1111
--------------------------
         //Result = 0000 0001
# The calculated position is 1

Summarize the HashMap put process.

  1. Calculate the hash value of the key.

    The calculation method is (key = = null)? 0: (H= key.hashCode ()) ^ (h >>> 16);

  2. Check whether the current array is empty. If it is empty, initialization is required. The initialization capacity is 16, and the default load factor is 0.75.

  3. Calculates the coordinates of the key in the array.

    Calculation method: (capacity - 1) & hash

    Because capacity is always the power of 2, the binary value of - 1 is always all 1. It is convenient to perform and operate with hash value.

  4. If the calculated coordinate element is empty, create a node to join, and put ends.

    1. If the current array capacity is larger than the capacity set by load factor, expand the capacity.
  5. If the calculated coordinate element has a value.

    1. If the element value on the coordinate is exactly the same as the value key to be added, overwrite the original value.

    2. If the element on the coordinate is a red black tree, add the value and key to the red black tree.

    3. If the element on the coordinate is different from the element to be added (the tailing method is added).

      1. If the next node is empty, add the value and key to the next node.

      2. If the next node is not empty, loop through the next node.

        If it is found that the key of the next node is the same as the key to be added, the corresponding value is replaced with the new value.

      3. If the next node of the loop looks up more than 8 layers and is not empty, convert this location element to a red black tree.

3. get() of HashMap

In Java 8, the source code of get method is as follows, and I have made comments.

public V get(Object key) {
    Node<k,v> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<k,v> getNode(int hash, Object key) {
    Node<k,v>[] tab; Node<k,v> first, e; int n; K k;
    // Enter this if only if the storage array already exists
    if ((tab = table) != null &amp;&amp; (n = tab.length) > 0 &amp;&amp;
        (first = tab[(n - 1) &amp; hash]) != null) {
        // first is the coordinate element obtained
        if (first.hash == hash &amp;&amp; // always check first node
            ((k = first.key) == key || (key != null &amp;&amp; key.equals(k))))
            // The same key indicates that first is the desired element, and returns
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                // If it is a red black tree, find the result from the red black tree
                return ((TreeNode<k,v>)first).getTreeNode(hash, key);
            do {
                // Loop traversal search
                if (e.hash == hash &amp;&amp;
                    ((k = e.key) == key || (key != null &amp;&amp; key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
 }

get method process summary.

  1. Calculate the hash value of the key.
  2. If the storage array is not empty and the element on the calculated position is not empty. Continue, otherwise, return Null.
  3. If the key value of the obtained element is equal, it means that it is found and the element is returned.
  4. If the key value of the obtained element is not equal, look for the element of the next node.
    1. If the element is a red black tree, look in the red black tree.
    2. It is not a red black tree. Traverse the next node to find it. If it is found, it will return.

4. Hash rules of HashMap

  1. Calculate hash value int hash= key.hashCode ().
  2. The and or up hash value is moved 16 bits to the right without sign. hash = hash ^ (hash >>> 16).
  3. The location calculation formula index = (n - 1) & hash, where n is the capacity.

Some students may have doubts about hash ^ (hash > > > 16). I'm curious about why I want to use hash value exclusive or move up the hash value to the right without sign by 16 bits? Here is an example to demonstrate the reason.

Assume that the hash value is 0001 0100 1100 0010 0110 0001‬0010 0000, and the current capacity is 16.

hash        = 0001 0100 1100 0010 0110 0001‬ 0010 0000  ---
                                                         |And or calculation
hash >>> 16 = 0000 0000 0000 0000 0001 0100 1100 0010  ---
------------------------------------------------------ 
hash result = 0001 0100 1100 0010 0111 0101 1110 0100---
                                                         |& and operations
 Capacity-1 = 0000 0000 0000 0000 0000 0000 0000 1111---
------------------------------------------------------
#Get position = 0000 0000 0000 0000 0000 0000 0000 0100 get position is 4

If you add another data, you will get a hash value of 0100 0000 1110 0010 1010 0010 ‬ 0001 0000, capacity or 16. What is the location of the calculation?

hash        = 0100 0000 1110 0010 1010 0010‬ 0001 0000  ---
                                                         |And or calculation
hash >>> 16 = 0000 0000 0000 0000 0001 0100 1100 0010  ---
------------------------------------------------------ 
hash result = 0100 0000 1110 0010 1011 0110 1101 0010---
                                                         |& and operations
 Capacity-1 = 0000 0000 0000 0000 0000 0000 0000 1111---
------------------------------------------------------
#Get position = 0000 0000 0000 0000 0000 0000 0000 0010 get position is 2

In the above two examples, we get that one is 4 and the other is 2, which are just two binary numbers I input casually. If these two numbers don't go through hash ^ (hash > > > 16) operation, how will the position change?

hash        = 0001 0100 1100 0010 0110 0001‬ 0010 0000
 Capacity-1 = 0000 0000 0000 0000 0000 0000 0000 1111
------------------------------------------------------
        Result = 0000 0000 0000 0000 0000 0000 0000
 #Get position 0 
hash        = 0100 0000 1110 0010 1010 0010‬ 0001 0000
 Capacity-1 = 0000 0000 0000 0000 0000 0000 0000 1111
------------------------------------------------------
        Result = 0000 0000 0000 0000 0000 0000 0000
 #Get position 0 

It can be found that the positions are all 0, and the conflict probability is increased. It can be seen that hash ^ (hash > > > 16) can make the high 16 bits and the low 16 bits of the hash value of the data be mixed with or, which can reduce the probability of data insertion conflict when the low order is the same.

5. Initialization size of HashMap

  1. Initialization size is 16, why 16?

    This may be because each expansion is twice the capacity. Choosing the power 16 of 2 as the initial capacity is beneficial to re calculate the location of Hash. Why is 16? I think it's an empirical value. In theory, as long as it's the power of 2, there's no problem.

6. Capacity expansion method of HashMap

What is the load factor? The load factor is 0.75.

What is the expansion method? See the source code description.

  /**
     * Initializes or doubles table size.  If null, allocates in
     * accord with initial capacity target held in field threshold.
     * Otherwise, because we are using power-of-two expansion, the
     * elements from each bin must either stay at same index, or move
     * with a power of two offset in the new table.
     *
     * @return the table
     */
    final Node<k,v>[] resize() {
        Node<k,v>[] oldTab = table;
        // Existing capacity
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        // Existing capacity expansion threshold
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            // If the current length is already greater than the maximum capacity. End expansion
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // If it is smaller than the maximum capacity after doubling, and the existing capacity is greater than or equal to the initial capacity, it will be doubled
            else if ((newCap = oldCap &lt;&lt; 1) &lt; MAXIMUM_CAPACITY &amp;&amp; oldCap >= DEFAULT_INITIAL_CAPACITY)
               // Double the capacity expansion threshold
                newThr = oldThr &lt;&lt; 1; // double threshold
        }
        // Current capacity = 0, but the current record capacity > 0, obtain the current record capacity.
       else if (oldThr > 0) // initial capacity was placed in threshold
            // Enter here, and the explanation is by specifying the constructor of capacity and load factor
            newCap = oldThr;
        else {    	           // zero initial threshold signifies using defaults
            // Enter here to explain that it is through nonparametric construction
            // New capacity
            newCap = DEFAULT_INITIAL_CAPACITY;
            // New threshold
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            // Calculate the expansion threshold
            newThr = (newCap &lt; MAXIMUM_CAPACITY &amp;&amp; ft &lt; (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<k,v>[] newTab = (Node<k,v>[])new Node[newCap];
        table = newTab;
        // If oldtab! = null, it means capacity expansion; otherwise, it means initialization and return directly
        if (oldTab != null) {
            for (int j = 0; j &lt; oldCap; ++j) {
                Node<k,v> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // If there is no element in the next node of the current element, the recalculation position of the current element is directly put into
                    if (e.next == null)
                        newTab[e.hash &amp; (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        // If the current node is a red black tree
                        ((TreeNode<k,v>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<k,v> loHead = null, loTail = null;
                        Node<k,v> hiHead = null, hiTail = null;
                        Node<k,v> next;
                        do {
                            next = e.next;
                            // ==0, position unchanged
                            if ((e.hash &amp; oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            // e. Hash & oldcap! = 0, position changed to: position + capacity before expansion
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

How to determine the position of elements in the array during the expansion is determined by if ((e.hash & oldcap) = = 0).

hash HEX(97)  = 0110 0001‬
n    HEX(16)  = 0001 0000
--------------------------
         //Result = 0000 0000
# e. Hash & oldcap = 0 calculate location or pre expansion location
    
hash HEX(17)  = 0001 0001‬
n    HEX(16)  = 0001 0000
--------------------------
         //Result = 00010000
#  e. Hash & oldcap! = 0, the calculated position is pre expansion position + pre expansion capacity

It can also be seen from the above analysis that if ((e.hash & oldcap) = = 0) can be used to determine the position after capacity expansion only when the capacity is the power of 2 each time.

7. Red black tree in HashMap

The implementation of HashMap in Java 8 adds a red black tree. When there are eight linked list nodes, the linked list will be converted into a red black tree. When there are less than six, the linked list will be returned. The reason is that when there are too many nodes, using red black tree can find nodes more efficiently. After all, the red black tree is a binary search tree.

  1. When the number of nodes is, the list will turn into a red black tree.

    When the number of nodes in the list is greater than or equal to 8, the list will be converted into a tree structure.

  2. When the number of nodes is, the red black tree will return to the list.

    When the number of nodes is less than or equal to 6, the tree will be transformed into a linked list.

  3. Why there is a difference between transition conditions 8 and 6.

    If there is no difference, it's 8. If you insert and delete elements frequently, and the number of linked lists just hovers around 8, then you will frequently turn the linked list to tree, tree to linked list.

8. Why are all capacities powers of two?

When the capacity is the power of 2, the hash value of key and then & amp; (capacity-1) determine the position, the collision probability will be relatively low, because when the capacity is the power of 2, the binary number after subtracting 1 is all 1, so the result of the operation is equal to the number of bits after the hash value and 1.

Here is an example.

hash HEX(97)  = 0110 0001‬
n-1  HEX(15)  = 0000 1111
--------------------------
         //Result = 0000 0001
# The calculated position is 1
hash HEX(99)  = 0110 0011‬
n-1  HEX(15)  = 0000 1111
--------------------------
         //Result = 0000 0011
# The calculated position is 3
hash HEX(101)  = 0110 0101‬
n-1  HEX(15)   = 0000 1111
--------------------------
         //Result = 0000 0101
# The calculated position is 5

If it is another capacity value, assuming it is 9, the probability of collision with the calculation result is relatively large.

hash HEX(97)  = 0110 0001‬
n-1  HEX(09)  = 0000 1001
--------------------------
         //Result = 0000 0001
# The calculated position is 1
hash HEX(99)  = 0110 0011‬
n-1  HEX(09)  = 0000 1001
--------------------------
         //Result = 0000 0001
# The calculated position is 1
hash HEX(101)  = 0110 0101‬
n-1  HEX(09)   = 0000 1001
--------------------------
         //Result = 0000 0001
# The calculated position is 1

In addition, each time it is a power of 2, it is also convenient to recalculate the location when the HashMap is expanded.

hash HEX(97)  = 0110 0001‬
n-1  HEX(15)  = 0000 1111
--------------------------
         //Result = 0000 0001
# The calculated position is 1
    
hash HEX(97)  = 0110 0001‬
n-1  HEX(31)  = 0001 1111
--------------------------
         //Result = 0000 0001
# The calculated position is 1

9. fail fast

HashMap traversal uses a fast failure mechanism, which is a universal mechanism in Java unsafe collections. This mechanism can make the collection change, delete and add operations by threads during traversal, which will trigger concurrent modification exceptions.

Its implementation mechanism is to save a copy of modCount before traversal, and each time the next element to be traversed is obtained, it will compare whether the current modCount and the saved modCount are equal.

Fast failure can also be regarded as a security mechanism. In this way, when multiple threads operate unsafe collections, because of the fast failure mechanism, exceptions will be thrown.

10. Thread safe Map

  1. use Collections.synchronizedMap(Map) create a thread safe Map

    Implementation principle: there is a variable final Object mutex;, and the operation method adds the synchronized (mutex) exclusive lock.

  2. Using Hashtable

  3. Using ConcurrentHashMap

Last words

The article has been included in Github.com/niumoo/JavaNotes , welcome Star and advice. There are also a lot of interview points for large factories, core knowledge that Java programmers need to master and other articles. I have also sorted out many of my words. Welcome Star and perfection, and hope we can become excellent together.

If the article is helpful, you can click "like" or "share". It's all support. I like it!
This article is updated every week. In order to pay attention to my updated articles and share dry cargo, I can pay attention to the unread code official account or the public number. My blog.

Keywords: Programming Java github less

Added by mediabob on Thu, 11 Jun 2020 06:09:09 +0300