The underlying implementation principle of HashMap

Summary

The following is based on JDK 1.8

data structure

HashMap is actually an "array + linked list" data structure.

In the put operation, the subscript of the array is found by the internal definition algorithm, and the data is directly put into the element of the array. If the element of the array obtained by the algorithm already has elements (commonly known as hash conflict, the practical significance of the chain structure is to solve the problem of hash conflict). The list on this array element will be traversed, and the new data will be placed at the end of the list.

Objects that store data

Node objects of nodes stored in arrays or linked lists, the following is the source code

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

The storage object Node actually implements the Map.Entry object interface.

There are four attributes

  • final int hash;
    This property stores the hash value of the object's key, which is implemented as follows:
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
    
    There are also different ways to implement hashCode() depending on the data type of the key s that are passed in, which are not listed here.
  • final K key;
    key of data
  • V value;
    value of data
  • Node<K,V> next;
    Point to the next object, and when a hash conflict occurs, the array element appears in a linked list structure that uses next to point to the next element object
Problems caused by using linked list structure

The corresponding subscripts can be found efficiently by hashing algorithm from search to stop, but with the increase of data, hash collisions are too many. When searching for data, finding the linked list will search the corresponding data through traversal, which will make get data more and more inefficient. In jdk1.8, if the number of elements in the list is greater than or equal to 8, the list structure will be reorganized into a "red-black tree" structure, which makes the get efficiency much higher than that of the list when there are too many hash collisions.

Data insertion, put ()

Several important variables

  • transient Node<K,V>[] table;
    • With the increase of data, the number of arrays storing data keeps increasing. The length of the chef is 16:
      static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
      
  • final float loadFactor;
    • Load factor, multiplied by the original length of the array whenever expansion occurs, defaults to 0.75:
      static final float DEFAULT_LOAD_FACTOR = 0.75f;
      
  • int threshold;
    • If the maximum amount of data allowed to be stored exceeds this threshold, the resize() expansion will be triggered.
  • transient int modCount;
    • Record the number of changes in the internal structure, put operation (no coverage value is calculated), and other ___________.
  • transient int size;
    • Number of elements actually stored

Insert operation put()

    public V put(K key, V value) {
        // Pass in the hash value of key, key, value,
        return putVal(hash(key), key, value, false, true);
    }
    // Truly executed insertion methods
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        // Construct element nodes           
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // Check whether the array is empty or if the length of the array is 0, and expand it.
        // An empty array length initializes because it is not initialized when HashMap is instantiated
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        
        // According to the hash value of the calculated key, the index points of the array that should be inserted are obtained.    
        // New elements are inserted when empty
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        // The location that should be inserted is not empty. Data already exists.
        else {
            Node<K,V> e; K k;
            // Find the existing array elements, and if hash is equal and key is equal, override them directly
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // Whether the node is TreeNode (red-black tree structure), is that the red-black tree interpolates directly according to the key value pair.
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // This found element is neither the same as the element to be inserted nor the structure of the red-black tree, indicating that in the list, the next step is to traverse the list insertion.
            else {
                // Traversal list
                for (int binCount = 0; ; ++binCount) {
                    // Insert the list, loop until the last node of the list has not found the key, then insert after the last node
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // If the length of the list is longer than 8, it will be converted into a red-black tree structure.
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // Link list insertion, if key exists, directly overrides value
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        // Number of Structural Changes +
        ++modCount;
        // When the actual stored element is larger than the critical value, the capacity is expanded (the critical value is obtained by the array length * load factor)
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

The main operations are:

  1. Initialization
    • The first initialization is to determine whether the table is empty and the array length is empty. (In instantiating HashMap, an array is not initialized)
  2. insert
    • This value does not exist and the new node is inserted directly into the array
    • This value already exists in the array:
      • New values and existing values in arrays, key s and hash es are equal, covering directly
      • The node is a red-black tree structure, which calls the red-black tree function to insert
      • This interpolation is different from the original element, and it is not a red-black tree structure, so it is found in the list.
        • Traverse the list and find that the element exists and override it
        • After traversing the list, it is found that the element does not exist and inserted into the end of the list. Check if the length is longer than 8, then turn the list into a red-black tree.
  3. Judging whether to expand:
    • If the condition satisfies the threshold of element size > the maximum number of elements allowed, the expansion operation is performed again. Each expansion operation, the new array size will be twice the original array length.

Expansion operation resize()

Static languages have to be initialized with array lengths, but put operations make it hard to assemble arrays, so they need to be expanded.
The condition for triggering expansion is that the current actual number of elements > array len gt h * load factor, i.e. size > threshold, threshold is the threshold.

resize() source code
    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        // Initial capacity is the maximum allowable quantity
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            // The old capacity is larger than the maximum allowable limit and does not expand
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // Double expansion of original length
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        // First initialization, the old threshold is threshold 
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        
        // First initialization
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        
        // New Maximum Allowable Element Quantity Value
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        // New Array
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            // Traversing through old arrays
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // Put it directly into the new array according to the original index
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // Traversal list
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // Put it in the index
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            // Original Index + oldCap
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

Fetch data get()

Source code

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // Array subscripts are obtained by hashCode of key and addressing algorithm
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // If the key and hash in an array element are equal, they are returned directly
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            // If not equal
            if ((e = first.next) != null) {
                // It's the structure of the mangrove tree. Look for it in the mangrove tree.
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // Traversing list elements
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

Keywords: JDK

Added by Dixen on Fri, 19 Jul 2019 12:57:49 +0300