Ten minutes to understand the HashMap source code in depth. Think you will get it after one read? I reckon one more minute and you will have mastered it completely!
We finally arrive at the more complex HashMap. Its many internal variables, inner classes, and methods mean it cannot be covered as linearly as ArrayList, so we will approach it from several specific angles.
Bucket structure
Each storage slot of a HashMap is also called a bucket. When a key & value pair enters the map, it is assigned to a bucket according to the key's hash value.
Look at the definition of the buckets: the table field is the so-called bucket structure; in other words, an array of nodes.
transient Node<K,V>[] table;
transient int size;
node
HashMap is a map structure, which differs from a Collection structure: it does not store single objects, but key-value pairs.
So the most basic internal storage unit is the node: Node.
Nodes are defined as follows:
class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}
As you can see, besides the three values key, value, and hash, a Node also holds a next pointer, so multiple Nodes can form a singly linked list. This is one way to resolve hash collisions: if several nodes are assigned to the same bucket, they are chained into a list.
There is another type of node inside HashMap called TreeNode:
class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
}
TreeNode derives from Node (via LinkedHashMap.Entry) and can form a red-black tree. Why does it exist? As mentioned above, if many nodes hash into the same bucket, they form a very long list and access efficiency drops sharply. When a bucket's list grows too long, HashMap converts it into a balanced binary tree (a red-black tree) to recover some efficiency; keys that implement Comparable are ordered by comparison, the rest by a tie-breaking rule. In practice, we hope this never happens; a sketch of a key that forces it follows the constants below.
With this knowledge, you can now make sense of several of HashMap's constant definitions:
static final int TREEIFY_THRESHOLD = 8;
static final int UNTREEIFY_THRESHOLD = 6;
static final int MIN_TREEIFY_CAPACITY = 64;
- TREEIFY_THRESHOLD: when the number of nodes in one bucket reaches this value, the list may be converted into a tree.
- UNTREEIFY_THRESHOLD: when the number of nodes in one bucket drops to this value or below, the tree is converted back into a linked list.
- MIN_TREEIFY_CAPACITY: if the number of buckets is below this value, the table is resized first instead of converting the list into a tree.
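To make this concrete, here is a minimal sketch of the kind of key that triggers treeification. BadKey is a hypothetical class invented for illustration; it is not from the JDK.

// A deliberately bad key type (hypothetical, for illustration only):
// every instance collides into the same bucket.
final class BadKey implements Comparable<BadKey> {
    final int id;
    BadKey(int id) { this.id = id; }
    @Override public int hashCode() { return 42; } // constant hash: all keys collide
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
    @Override public int compareTo(BadKey o) { return Integer.compare(id, o.id); }
}
// Inserting many BadKey entries chains them into one bin. Once the bin holds
// TREEIFY_THRESHOLD (8) nodes and the table has at least MIN_TREEIFY_CAPACITY (64)
// buckets, HashMap converts the bin to a red-black tree; with fewer buckets it
// resizes instead. Implementing Comparable lets the tree order the keys directly.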
put method: Key & Value
Insert interface:
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
The put method delegates to the private method putVal. Note that the key's hash is not its hashCode used directly: the final hash = hashCode ^ (hashCode >>> 16).
When the hash is mapped to a bucket position, only its low bits are used, so without this mixing step a batch of keys whose hash codes differ only in the high bits would all pile into the same bucket. (This happens, for example, when the bucket count is small and the keys are Floats holding consecutive integral values, because Float.hashCode keeps that variation in the high bits.) Folding the high bits into the low ones avoids it.
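A minimal sketch of why the mixing matters (the class name and hash values here are made up for illustration): hash codes that differ only above bit 16 all map to bucket 0 without the XOR fold, and spread out with it.

public class HashSpreadDemo {
    // The same mixing step that HashMap.hash() applies to key.hashCode().
    static int spread(int h) { return h ^ (h >>> 16); }

    public static void main(String[] args) {
        int n = 16; // bucket count, always a power of two
        for (int i = 0; i < 4; i++) {
            int h = i << 16; // hash codes differing only in the high bits
            System.out.println("raw -> bucket " + (h & (n - 1))
                             + ", mixed -> bucket " + (spread(h) & (n - 1)));
        }
        // raw always lands in bucket 0; mixed lands in buckets 0, 1, 2, 3.
    }
}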
Now the insertion procedure itself:
V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Code snippet 1
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // Code snippet 2
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Code snippet 3
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Code snippet 4
            for (int binCount = 0; ; ++binCount) {
                // Code snippet 4.1
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Code snippet 4.2
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // Code snippet 5
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Code snippet 6
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
- First, if the bucket array has not been allocated yet, resize() allocates it;
- Code snippet 1: i = (n - 1) & hash computes the bucket index for this hash; since n is a power of 2, this is an efficient modulo operation. If the slot is empty, a new Node is created and placed there, and we are done; otherwise the node already occupying the slot is recorded as p;
- Code snippet 2: if node p's key equals the incoming key, the existing node is recorded as e and the new value will be written into it;
- Code snippet 3: if p is a tree node, insert the key & value into the red-black tree;
- Code snippet 4: p is the head of a linked list (possibly a single node); either way, scanning the list handles it;
- Code snippet 4.1: if the end of the list is reached, append a new node, and convert the list into a tree if it has grown long enough;
- Code snippet 4.2: if a node with the same key is found in the list, it is handled exactly like code snippet 2;
- Code snippet 5: store value into the existing node e and return the old value (see the small example after this list);
- Code snippet 6: if size exceeds threshold, the number of buckets is adjusted. The resize strategy is described below.
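The effect of snippets 2 and 5 is visible from the return value of put; a quick check:

import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> m = new HashMap<>();
        System.out.println(m.put("a", 1)); // null: no previous mapping for "a"
        System.out.println(m.put("a", 2)); // 1: old value returned, existing node reused
        System.out.println(m.get("a"));    // 2: the value was overwritten in place
    }
}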
remove method
Once the put method is understood, remove comes easily. We go straight to the private method removeNode.
public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

Node<K,V> removeNode(int hash, Object key, Object value,
                     boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    // Code snippet 1
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        // Code snippet 2
        Node<K,V> node = null, e; K k; V v;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        // Code snippet 3
        else if ((e = p.next) != null) {
            // Code snippet 3.1
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                // Code snippet 3.2
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // Code snippet 4
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            // Code snippet 4.1
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            // Code snippet 4.2
            else if (node == p)
                tab[index] = node.next;
            // Code snippet 4.3
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
- Code snippet 1: this condition checks whether the bucket for hash is empty; if so, the key cannot be in the map. Otherwise the first node of the bucket is recorded as p;
- Code snippet 2: if node p's key equals the parameter key, the node to remove has been found; record it as node;
- Code snippet 3: otherwise, scan the remaining nodes in the bucket;
- Code snippet 3.1: if the bucket holds a tree, run the tree lookup logic;
- Code snippet 3.2: otherwise run the linked-list scan;
- Code snippet 4: if a node was found, try to delete it;
- Code snippet 4.1: if it is a tree node, run the tree's node-deletion logic;
- Code snippet 4.2: if node is the head of the list, putting node.next into the bucket is all that is needed;
- Code snippet 4.3: otherwise unlink the node from the middle of the list. (A quick check of the removal variants follows this list.)
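The matchValue parameter is what powers the two user-visible flavors of removal; a quick check of both:

import java.util.HashMap;

public class RemoveDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        System.out.println(m.remove("a", 2)); // false: matchValue variant, 1 != 2, nothing removed
        System.out.println(m.remove("a"));    // 1: plain removal returns the old value
        System.out.println(m.remove("a"));    // null: the key is already gone
    }
}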
rehash
A rehash reallocates the bucket array and redistributes the existing nodes into their new bucket positions.
First, look at the two member variables related to the number of buckets:
final float loadFactor;
int threshold;
- loadFactor is set when the HashMap is created. It is the upper limit on the ratio between the number of entries in the map and the number of buckets; once the map's load reaches it, the bucket count must be expanded.
- threshold: when the map's size exceeds this value, the buckets are expanded. Its value is essentially bucket capacity * loadFactor; it appears to be a cached value that speeds up the check so the product is not recomputed on every insertion (a quick numeric check follows this list).
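Concretely, with the default settings (16 buckets, load factor 0.75), the arithmetic works out as follows:

// Default HashMap: 16 buckets, loadFactor 0.75
// threshold = 16 * 0.75 = 12
// Insertions 1..12 fit; inserting the 13th entry makes size > threshold,
// so resize() doubles the table to 32 buckets (new threshold: 32 * 0.75 = 24).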
The bucket-sizing strategy is the function below: for a requested capacity cap, the actual capacity is the smallest power of two greater than or equal to cap. This guarantees that every expansion exactly doubles the capacity.
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
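A worked trace for cap = 17 shows the trick: the OR-shifts smear the highest set bit downward until every lower bit is 1, and adding 1 then yields the next power of two.

// Trace of tableSizeFor(17):
// n = 17 - 1 = 16  -> 0b10000
// n |= n >>> 1     -> 0b11000
// n |= n >>> 2     -> 0b11110
// n |= n >>> 4     -> 0b11111   (the >>> 8 and >>> 16 steps change nothing here)
// return n + 1     -> 0b100000 = 32
// Likewise tableSizeFor(10) == 16, and tableSizeFor(16) == 16
// (the initial cap - 1 keeps exact powers of two unchanged).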
Here is the actual expansion logic:
Node<K,V>[] resize() {
    // The logic that computes newCap (and captures oldTab/oldCap) is omitted here.
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // Branch 1
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // Branch 2
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // Branch 3
                else { // preserve order
                    // The list-splitting logic is omitted here.
                }
            }
        }
    }
    return newTab;
}
- First, a new bucket array is allocated;
- Then the old buckets are scanned and their elements migrated;
- Branch 1: the bucket holds a single node, which can be placed directly at its position in the new array;
- Branch 2: the bucket holds a tree; run the tree-splitting logic;
- Branch 3: the bucket holds a linked list; run the list-splitting logic.
Since the number of new buckets is twice the number of old ones, each old bucket maps onto exactly two new buckets (indices j and j + oldCap), and different old buckets never interfere. That is why the migration logic above never needs to check whether a new bucket is already occupied.
Clearly, a rehash costs a lot, so it is better to set an appropriate capacity at construction time and avoid rehashing altogether.
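A common way to presize, assuming you know roughly how many entries are coming (the numbers here are only an example): divide the expected size by the load factor so that threshold ends up above it.

import java.util.HashMap;

public class PresizeDemo {
    public static void main(String[] args) {
        int expected = 1000;
        // A capacity request of expected / 0.75 + 1 = 1334 is rounded up to
        // 2048 buckets, giving threshold 2048 * 0.75 = 1536 >= 1000: no rehash
        // happens while the expected entries are inserted.
        HashMap<Integer, String> m = new HashMap<>((int) (expected / 0.75f) + 1);
        for (int i = 0; i < expected; i++) m.put(i, "v" + i);
    }
}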
Finally, although the code above does not show it, the number of buckets only ever grows during the lifetime of a HashMap; it never shrinks.
iterator
The core of all iterators is this HashIterator
abstract class HashIterator {
    Node<K,V> next;        // next entry to return
    Node<K,V> current;     // current entry
    int expectedModCount;  // for fast-fail
    int index;             // current slot

    final Node<K,V> nextNode() {
        Node<K,V>[] t;
        Node<K,V> e = next;
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        if (e == null)
            throw new NoSuchElementException();
        if ((next = (current = e).next) == null && (t = table) != null) {
            do {} while (index < t.length && (next = t[index++]) == null);
        }
        return e;
    }
}
For simplicity, only the nextNode part of the code is kept. The principle is simple: next points to the node that will be returned next, and that node always lives in some bucket (index tracks the current slot). If the same bucket contains further nodes, they are reached along next.next, whether the bin is a list or a tree (TreeNode preserves its next links); otherwise the scan moves on to the next non-empty bucket.
With this node iterator in place, the user-visible iterators are all implemented on top of it:
final class KeyIterator extends HashIterator
    implements Iterator<K> {
    public final K next() { return nextNode().key; }
}

final class ValueIterator extends HashIterator
    implements Iterator<V> {
    public final V next() { return nextNode().value; }
}

final class EntryIterator extends HashIterator
    implements Iterator<Map.Entry<K,V>> {
    public final Map.Entry<K,V> next() { return nextNode(); }
}
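The expectedModCount check is what makes iteration fail fast: removing through the iterator keeps the counters in sync, while removing through the map does not. A small demonstration:

import java.util.HashMap;
import java.util.Iterator;

public class FailFastDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        for (Iterator<String> it = m.keySet().iterator(); it.hasNext(); ) {
            String k = it.next();
            if (k.equals("a"))
                it.remove();   // safe: the iterator updates expectedModCount
            // m.remove(k);    // unsafe: the next call to next() would throw
            //                 // ConcurrentModificationException
        }
        System.out.println(m); // {b=2}
    }
}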
view
Part of the KeySet code: this is not a free-standing Set but a view; its methods internally operate on the HashMap's own data.
final class KeySet extends AbstractSet<K> {
    public final int size() { return size; }
    public final void clear() { HashMap.this.clear(); }
    public final Iterator<K> iterator() { return new KeyIterator(); }
    public final boolean contains(Object o) { return containsKey(o); }
    public final boolean remove(Object key) {
        return removeNode(hash(key), key, null, false, true) != null;
    }
}
The EntrySet and Values views are similar to KeySet and need not be repeated.
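Because the views delegate to the map, changes flow in both directions; a quick check:

import java.util.HashMap;
import java.util.Set;

public class ViewDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        Set<String> keys = m.keySet();
        keys.remove("a");          // removing through the view...
        System.out.println(m);     // {b=2} ...removes from the map itself
        m.put("c", 3);             // and the view sees later map changes:
        System.out.println(keys);  // [b, c]
    }
}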
Summary of main points
1. Keys & values are stored in nodes.
2. A node may be a linked-list node or a tree node.
3. Nodes are distributed into buckets according to the key's hash value.
4. If a bucket holds more than one node, the nodes form either a list or a tree.
5. The load factor limits the ratio of nodes to buckets; when necessary, the bucket count is expanded.
6. The number of buckets is always a power of two, and each expansion doubles it. Redistributing the buckets is called rehash, and it is an expensive operation.