HashMap source code analysis (1.8)

HashMap source code analysis (JDK1.8)

1, Implementation interface

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {}

HashMap inherits the AbstractMap class and implements the Map interface

public abstract class AbstractMap<K,V> implements Map<K,V> {}

The AbstractMap class implements the Map interface. This is a mistake admitted by the author, but it has little impact, so it has not been modified (capricious)

2, Constructor

1. Parameterless constructor

    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; 
    }

Set the default loading factor to 0.75f for subsequent capacity expansion calculation

2. Constructor of one parameter

    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

Pass the input parameters and the default load factor into the constructor of the two parameters

3. Constructor of two parameters

    public HashMap(int initialCapacity, float loadFactor) {
        //Some robust code is omitted here to enhance readability
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

The loading factor is not always 0.75f. If you call the construction method with two parameters, you can choose the expansion factor you want.
Threshold is the threshold (the array starts to expand when it reaches this value), but this is used to save the size of the initial container.

4.tableSizeFor() method

    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

I was stunned to see these operations of shifting right (without sign bit) and or (|). At this time, you can enter a value to calculate it. After trying several times, you will find that no matter how much (x) you enter, it will return you a number larger than X and to the power of N-1 of 2. (input 3 returns 3, input 6 returns 7, input 14 returns 15, and so on). The principle is that after converting a number into binary, move the highest 1 to the right and up the previous number. This operation ensures that the binary of the number is all 1 (n-power-1 of 2). Finally, after + 1, it becomes N-power of 2. Then assign this value to threshold.

I'm finished

After reading version 1.7, I should be very confused. At present, only loadFactor and threshold are assigned, and the initialization size of the container is not given. Here should be the optimization point: when you create a HashMap, you are not used (put the value inside). Why should I give you the initialization size. I feel a little lazy.

3, Encapsulate node < K, V > objects

We all know that HashMap is a data structure of array and linked list. The problem is that arrays can only put the same data type. What type is the array of HashMap?

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
    }

The above code is the simplified code. This is the element of the array in 1.8 (the Entry object in 1.7)

transient Node<K,V>[] table;

This is the underlying array of HashMap. transient seems to prevent serialization and can be ignored for the time being

4, Everything is ready. I only owe you

Personally, in 1.8, the core code should be put (one method solves all problems). There is not much nonsense. Look at the code directly

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

This is always the case when looking at the source code. There is only one method in one method. You have to click another layer each time to see (dolls), but here is a hash (key) method, which is almost missing...

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

The of this method is to calculate the hashcode value of the currently entered key (used for the subsequent array position calculation)

5.putVal() method

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0) // Step 1
            n = (tab = resize()).length;// Step 2
        if ((p = tab[i = (n - 1) & hash]) == null)// Step 3
            tab[i] = newNode(hash, key, value, null); // Step 4
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

I will simplify some of the above codes. Similarly, this is also simplified, but it is not simplified at all. Each code is the essence

When putting the first key value pair

Overall process: when a HashMap is just created and the first key value pair is put into it, first enter through step 1, then initialize the size of the collection through the resize() method, and then enter step 4 through step 3 to put the Node into the array. So you should look at the resize () method first

6.resize() method (one out of three plus 1)

The resize () method is also relatively long. I've summarized here. It's about choosing one of the three ifs and then adding an if.

Preparation before one out of three

        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;

First choice

        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }

It is mainly the code in else if. Here, it is specified that the size of array expansion is < < 1, and the original size is multiplied by 2. At the same time, the new threshold is also < < 1

The second option (actually else if)

        else if (oldThr > 0)
            newCap = oldThr; 

In the preparation of one out of three, oldThr is the result (threshold) of the n-power of 2 obtained by the above stack of shift right operations. This result is used here to assign a value to newCap (actually the size of the set initialization)

Third choice (last else of three)

        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }

When new HashMap(), if the parameterless constructor is selected, the third selection will be executed, and the initialization collection size is
DEFAULT_ INITIAL_ Capability (1 < < 4 = 16), threshold is (16 * 0.75 = 12)

Finally, the explanation of adding 1 is a little complicated, which is left to everyone to read by themselves.

Hash conflict problem (one out of three plus one)

When there are other key value pairs at the position of the newly placed key value pair:

First choice

            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;

First compare the hash values of the original key and the newly added key. If they are the same, judge whether the key is the same object. If not, compare whether the key values are the same. (there is a hidden optimization point here. When the key is of String type, = = comparison will overwrite the subsequent equals comparison to improve efficiency)

Keywords: Java data structure

Added by daltman1967 on Thu, 23 Sep 2021 17:20:05 +0300