Analysis of jdk8 HashMap source code

General narration

HashMap is the next implementation of Map interface, which provides the storage of key value pairs. JDK1.7 and before, data storage was realized through array + linked list, while JDK8 began to add red black tree, and the storage structure was optimized, that is, there are two storage methods: array + linked list or array + red black tree.

How to implement storage

Before you start, you can look at the relevant contents of the Hash table.

  1. Hash table: it is understood as a table containing multiple grids, and each grid stores elements with different hash values. For example, grid 1 stores objects with a hash value of 50, and grid 2 stores objects with a hash value of 100
  2. Hash collision: hash values calculated from different values should be different in general, so they will be stored in different grids in the hash table. However, if the amount of data is enlarged to a certain order of magnitude, different objects will have the same calculated hash value. Therefore, there may be multiple elements in the same grid in the hash table at the same time, which is called hash collision. It can also be used to explain that two objects with the same hash value in Java are not necessarily the same object, but the hash value of the same object must be the same. Only when both hash value and equals are true can it be said to be the same object.

After understanding the Hash table, it will be much clearer to look at the HashMap. How to decide which linked list to insert into which position of the array, in fact, the essence is why there is a Hash in the HashMap, because it is realized with the help of the Hash table, that is, the Hash value of the key to be stored is calculated, and the subscript value is obtained through a certain logical conversion, so the value corresponding to the key is stored in the subscript position of the array. If it is stored continuously according to this logic, it is bound to appear that the subscript positions calculated by multiple keys are the same, and the subscript position of the array already has elements. Therefore, in order to solve the "collision" problem, HashMap stores the elements with the same subscript after calculation through the linked list, then puts the head node of the linked list into the array, and solves the "collision" through the continuous pointing of the node.

This is the storage mode of array + linked list; The way of array + red black tree is just to replace the linked list with red black tree (the red black tree is also based on the linked list structure)

Source code

Class inheritance graph

Implementation of internal class linked list

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;// Calculate hash value
    final K key;// Key used to calculate hash
    V value;// Stored value
    Node<K,V> next;// Point to the next node (form a linked list)
    ...
}

field

// Serialization id
private static final long serialVersionUID = 362498820763181265L;
// Default initialization capacity: 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
// Maximum capacity
static final int MAXIMUM_CAPACITY = 1 << 30;
// When no load factor is specified in the constructor, make the default load factor used
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// A threshold for the transformation of linked list into red black tree
static final int TREEIFY_THRESHOLD = 8;
// Threshold value of red black tree to linked list
static final int UNTREEIFY_THRESHOLD = 6;
// Minimum red black tree capacity
static final int MIN_TREEIFY_CAPACITY = 64;
// Linked list array
transient Node<K,V>[] table;
// Can be used to get iterators
transient Set<Map.Entry<K,V>> entrySet;
// Number of key value pairs actually stored in HashMap
transient int size;
// It is used to record the structural modification times of the existing Has and Map (the number of key value pairs and reHash will be recorded)
// Field is used to check thread insecurity and provide a fast failure mechanism
transient int modCount;
// The size after the next expansion, and the threshold of this penalty expansion
int threshold;
// Load factor
final float loadFactor;

Construction method

  • Construction with initial capacity and load factor
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
  • Construction with initialization capacity only (using default load factor)
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
  • Parameterless construction (all use default)
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
  • Construction method with Map parameters (often used when a new HashMap with the same key value pair is required)
public HashMap(Map<? extends K, ? extends V> m) {
	// Use default load (0.75)
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    // assignment
    putMapEntries(m, false);
}

final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
	// Gets the size of the original map
    int s = m.size();
    if (s > 0) {
    	// table is still null when it comes in through construction
        if (table == null) { 
        	// It is calculated that the load threshold is just not reached
            float ft = ((float)s / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                     (int)ft : MAXIMUM_CAPACITY);
            if (t > threshold)
            	// Get a number larger than T and a power of 2, such as t=10, return 16, and complete it by constantly shifting
                threshold = tableSizeFor(t); 
        }
        else if (s > threshold)
            resize();
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
             // The mapping of the old map is continuously assigned to the new map through a loop
            putVal(hash(key), key, value, false, evict);
        }
    }
}

// Suppose cap is 10
static final int tableSizeFor(int cap) {
    int n = cap - 1; // n = 9 1001
    n |= n >>> 1; // n = (n | (n >>> 1)) = (1001 | 0100) = 1101 = 13
    n |= n >>> 2; // n = (n | (n >>> 2)) = (1101 | 0011) = 1110 = 15
    n |= n >>> 4; // n = (n | (n >>> 4)) = (1110 | 0000) = 1110 = 15
    n |= n >>> 8; // n = (n | (n >>> 4)) = (1110 | 0000) = 1110 = 15
    n |= n >>> 16; // n = (n | (n >>> 4)) = (1110 | 0000) = 1110 = 15
    // Return n + 1 = 16
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1; 
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // When the table is not initialized, it is initialized
    if ((tab = table) == null || (n = tab.length) == 0)
    	// Initialization completed, get size
        n = (tab = resize()).length;
        // Calculate (according to the hash value calculated by the key) & (n-1) calculate the subscript and judge null
    if ((p = tab[i = (n - 1) & hash]) == null)
    	// null is assigned
        tab[i] = newNode(hash, key, value, null); 
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Will modify record + 1
    ++modCount;
    // Judge whether capacity expansion is required
    if (++size > threshold) 
        resize();
    afterNodeInsertion(evict);
    return null;
}

Common API

Save values in HashMap

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)// When the table is not initialized, it is initialized
            n = (tab = resize()).length;// Initialization completed, get size
        if ((p = tab[i = (n - 1) & hash]) == null)// Calculate (according to the hash value calculated by the key) & (n-1) calculate the subscript and judge null
            tab[i] = newNode(hash, key, value, null); // null is assigned
        else {// Current location is not null (value already exists)
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;// The key is the same, the hash value is the same, and the value is the same. It is directly overwritten (if it is null, it is stored in the linked list first)
            else if (p instanceof TreeNode)// If it is a red black tree node, it is stored in a red black tree
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {// It is not a root node or a red black tree. It is stored in a linked list
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // Judge whether the conditions for turning to red black tree are met
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;// If the hash value is the same and the key is the same, it will be overwritten directly
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;// Will modify record + 1
        if (++size > threshold) // Judge whether capacity expansion is required
            resize();
        afterNodeInsertion(evict);
        return null;
    }
  • Get value from HashMap
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {// Get from the tree node or the linked list according to the situation
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
  • Delete elements by key
public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
  • Empty HashMap
public void clear() {
    Node<K,V>[] tab;
    modCount++;
    if ((tab = table) != null && size > 0) {// Check whether the map is empty
        size = 0;
        for (int i = 0; i < tab.length; ++i)// Just set all arrays to nul and wait for GC to complete recycling
            tab[i] = null;
    }
}
  • Determine whether value is included
public boolean containsValue(Object value) {
    Node<K,V>[] tab; V v;
    if ((tab = table) != null && size > 0) {
        for (int i = 0; i < tab.length; ++i) {// Traversal array
            for (Node<K,V> e = tab[i]; e != null; e = e.next) {// Traversal linked list
                if ((v = e.value) == value ||
                    (value != null && value.equals(v)))
                    return true;
            }
        }
    }
    return false;
}
  • Determine whether the key is included
public boolean containsKey(Object key) {
    return getNode(hash(key), key) != null;
}

Attention to detail

Why should the in the hashCode method be multiplied by a constant?

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

First, multiply by a number to hash the hash value, and multiply by 31 is a statistical result. There are fewer hash collisions under this number, so it is written dead directly.

Added by svenski on Sun, 30 Jan 2022 03:47:06 +0200