Implementation principle of HashMap (just read this article)


HashMap is a collection container that senior java engineers must be proficient in. Its importance is almost equal to the importance of Volatile in concurrent programming (visibility and order).

Through the detailed explanation of graphic source code, this article deeply analyzes the important kernel knowledge of HashMap, which is easy to read, learn and understand. It's always good to collect and learn more in case of being asked in the interview.

I'm Mike, who has spent more than 10 years teaching the architecture technology of BAT first-line large factories. Key points of this article:

  1. Data structure of HashMap
  2. HashMap core members
  3. Node array of HashMap
  4. Data storage of HashMap
  5. Hash function of HashMap
  6. Hash conflict: linked hash table
  7. get method of HashMap: hash function
  8. put method of HashMap
  9. Why must 2^n be used for the number of slots?

Data structure of HashMap

Firstly, from the perspective of data structure, HashMap is the data structure of array + linked list + red black tree (red black tree is added to JDK1.8), as shown below:

Here we need to understand two questions:

*What is the underlying storage of data?
What are the advantages of this storage method*

Default initial capacity(Array default size):16,2 Integer power of

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; 

 

 Maximum capacity

static final int MAXIMUM_CAPACITY = 1 << 30;

 

Default load factor

static final float DEFAULT_LOAD_FACTOR = 0.75f;

Load factor is used to measure HashMap The degree of fullness indicates that when map The data stored in the collection reaches 75% of the current array size%Capacity expansion is required

 

Linked list to red black tree boundary

static final int TREEIFY_THRESHOLD = 8;

 

The red black tree turns away from the boundary of the linked list

static final int UNTREEIFY_THRESHOLD = 6;

 

Hash bucket array

transient Node<K,V>[] table;

 

Number of elements actually stored

transient int size;

 

When map The data inside is larger than this threshold It will be expanded

int threshold   threshold = table.length * loadFactor

2.Node array

It can be seen from the source code that there is a very important field in the HashMap class, that is, Node[] table, that is, hash bucket array. Obviously, it is an array of nodes.

static class Node<K,V> implements Map.Entry<K,V> {

    final int hash;//Used to locate the array index position

    final K key;

    V value;

    Node<K,V> next;//The next Node in the linked list

 

    Node(int hash, K key, V value, Node<K,V> next) {

        this.hash = hash;

        this.key = key;

        this.value = value;

        this.next = next;

    }

 

    public final K getKey()        { return key; }

    public final V getValue()      { return value; }

    public final String toString() { return key + "=" + value; }

 

    public final int hashCode() {

        return Objects.hashCode(key) ^ Objects.hashCode(value);

    }



    public final V setValue(V newValue) {

        V oldValue = value;

        value = newValue;

        return oldValue;

    }



    public final boolean equals(Object o) {

        if (o == this)

            return true;

        if (o instanceof Map.Entry) {

            Map.Entry<?,?> e = (Map.Entry<?,?>)o;

            if (Objects.equals(key, e.getKey()) &&

                Objects.equals(value, e.getValue()))

                return true;

        }

        return false;

    }

}

Node is an internal class of HashMap, which implements map The entry interface is essentially a mapping (key value pair).

Data storage of HashMap

1. Hash table to store

HashMap uses a hash table to store data.

Hash table (also known as hash table) is a data structure accessed directly according to the key value. As long as you enter the value to be found, that is, key, you can find its corresponding value.

Hash table is actually an extension of array, which evolved from array. It can be said that if there is no array, there is no hash table.

2. Hash function

The element in the hash table is determined by the hash function. The Key key of the data element is taken as the independent variable. Through a certain functional relationship (called hash function), the calculated value is the storage address of the element.

Expressed as: Addr = H (key), as shown in the following figure:

The design of hash function in hash table is very important, which is also one of the key problems in the process of building hash table.

3. Core issues

Before creating a hash table, two main problems need to be solved:

1) Construct an appropriate hash function, and the values of uniformity H (key) are evenly distributed in the hash table

2) Conflict handling

Conflict: in a hash table, different keyword values correspond to the same storage location.

4. Hash conflict: linked hash table

In order to solve the conflict of hash table, address method and chain address method can be used to solve the problem. HashMap in Java adopts chain address method.

Chain address method, in short, is the combination of array and linked list, as shown in the following figure:

Hash function of HashMap

/**

* Recalculate hash value

*/

static final int hash(Object key) {

    

    int h;

 

     // h = key.hashCode() is the first step to get the hashCode value

     // H ^ (H > > > 16) is the high-order participation operation in the second step

    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);

}

//Calculate array slot

(n - 1) & hash

hashCode the key to obtain a 32-bit int value H, and then use h XOR H > > > 16 bits. At jdk1 In the implementation of 8, the algorithm of high-order operation is optimized, which is realized by the high 16 bit XOR low 16 bit of hashCode():
(h = k.hashCode()) ^ (h >>> 16).

The benefits are:

The high-order and low-order values of hashcode can be mixed for XOR operation, and after mixing, the high-order information is added to the low-order information, so that the high-order information is retained in disguise.

It means that the high 16 bits of hash are also involved in the calculation of subscript. If there are more doped elements, the randomness of the generated hash value will increase and the hash collision will be reduced.

remarks:

 - ^XOR: 1 for different and 0 for the same
 -  >>> : Move right without symbol: fill 0 on the right
 - &Operation: if two digits are "1" at the same time, the result will be "1", otherwise it will be 0

H & (table. Length - 1) to get the save bit of the object, and the length of the underlying array of HashMap is always the nth power of 2.

Why must slot number use 2^n

1. To make the hash result more uniform

If the number of slots is not 16 but 17, the slot calculation formula becomes: (17 – 1) & hash

As can be seen from the above, the calculation results will converge greatly. After hashcode participates in the & operation, it is shielded by more zeros, and there are only two calculation results: 0 and 16, which is a disaster for hashmap. 2. Equivalent to length modulus

When length is always to the nth power of 2, H & (length-1) operation is equivalent to modulo length, that is, h%length, but & is more efficient than%.

Bit operation is more efficient than arithmetic operation because arithmetic operation will still be converted into bit operation.

The ultimate goal is to make the hash result more uniform, reduce hash collision and improve the operation efficiency of hashmap.

How to analyze HashMap:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,

               boolean evict) {

    Node<K,V>[] tab; Node<K,V> p; int n, i;

    

    // When the array of the current object is null or the array length is 0, the array needs to be initialized

    if ((tab = table) == null || (n = tab.length) == 0) {

        n = (tab = resize()).length;

    }

    

    // Use the hash and the value of the array length minus one to XOR to obtain the scattered array subscript, which indicates that the current

    // The key will be stored in this location. If there is no value in this location, you can directly create a new k-v node to store it

    // Where length n is a power of 2

    if ((p = tab[i = (n - 1) & hash]) == null) {

        tab[i] = newNode(hash, key, value, null);

    }

    

    // If you go to else, it indicates that there is already content in the array position indexed by the key, that is, there is a collision

    // At this time, we need more complex ways to deal with collisions, such as linked lists and trees

    else {

        Node<K,V> e; K k;

       

        //Node key exists, directly overwriting value

        if (p.hash == hash &&

            ((k = p.key) == key || (key != null && key.equals(k)))) {

            e = p;

        }

        // Judge that the chain is a red black tree

        else if (p instanceof TreeNode) {

            // Where this represents the current HashMap and tab is the array in the map

            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);

        }

        else {  // Judge the chain as a linked list

            for (int binCount = 0; ; ++binCount) {

                // If the current collision node has no subsequent nodes, create a new node directly and append it

                if ((e = p.next) == null) {

                    p.next = newNode(hash, key, value, null);

                    // TREEIFY_THRESHOLD = 8

                    // Starting from 0, if it reaches 7, it means it is full of 8. At this time, it needs to be transferred

                    // Re determine whether to expand capacity or switch to red black tree

                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st

                        treeifyBin(tab, hash);

                    break;

                }

                // If a node with exactly the same key among the collision nodes is found, the old node is replaced with the new node

                if (e.hash == hash &&

                    ((k = e.key) == key || (key != null && key.equals(k))))

                    break;

                p = e;

            }

        }

        // At this time, e is the saved collided node, that is, the old node

        if (e != null) { // existing mapping for key

            V oldValue = e.value;

            // onlyIfAbsent is the call parameter of the method, indicating whether to replace the existing value,

            // In the default put method, this value is false, so the old value will be replaced with the new value

            if (!onlyIfAbsent || oldValue == null)

                e.value = value;

            // Callbacks to allow LinkedHashMap post-actions

            afterNodeAccess(e);

            return oldValue;

        }

    }

    // map changeability operation counter

    // For example, the structural changes of map, such as content increase or decrease or rehash, will directly lead to the concurrency of external map

    // Iteration causes a fail fast problem, and this value is the basis of comparison

    ++modCount;

   

     // size is the number of k-v included in the map

   // Capacity expansion when the maximum capacity is exceeded

    if (++size > threshold)

        resize();

    // Callbacks to allow LinkedHashMap post-actions

    afterNodeInsertion(evict);

    return null;

}

The overall execution process of the put method of HashMap is as follows:

① Judge whether the key value is empty or null for the array table[i]. Otherwise, execute resize() to expand the capacity;

② Calculate the hash value to the inserted array index I according to the key value. If table[i]==null, directly create a new node and add it;

③ Judge whether the first element of table[i] is the same as key. If it is the same, directly overwrite value;

④ Judge whether table[i] is a treeNode, that is, whether table[i] is a red black tree. If it is a red black tree, directly insert key value pairs into the tree;

⑤ Traverse the table[i] to determine whether the length of the linked list is greater than 8. If it is greater than 8, convert the linked list into a red black tree and perform the insertion operation in the red black tree. Otherwise, perform the insertion operation of the linked list; If the key is found to exist during traversal, you can directly overwrite the value;

⑥ After successful insertion, judge whether the actual number of key value pairs exceeds the maximum capacity threshold. If so, expand the capacity.

HashMap summary

HashMap underlying structure? Based on the implementation of Map interface and the structure of array + linked list, red black tree is added after JDK 1.8. The length of linked list > 8 becomes red black tree and < 6 becomes linked list

What happens when two objects have the same hashcode? Hash conflicts. HashMap solves hash conflicts through linked lists

What are the functions of equals() and hashCode() in HashMap? When adding and obtaining a HashMap, you need to hash() through the hashCode() of the key, and then calculate the subscript (n-1 & hash) to obtain the same location to be found. When a conflict (collision) occurs, use key The equals() method goes to the linked list or tree to find the corresponding node

When will HashMap be expanded? When the element of put reaches the capacity multiplied by the load factor, it defaults to 16 * 0.75

Implementation of hash? h = key. hashCode()) ^ (H > > > 16). The hashCode shifts 16 bits to the right without symbols, and then performs bitwise XOR to obtain the hash value of this key. Since the capacity of the hash table is the nth power of 2, at present, the lower and lower bits of the element's hashCode () are the same in many cases, which will lead to conflict (collision), Therefore, after 1.8, a shift operation was performed: the hashCode () of the element and the result of shifting itself to the right by 16 bits were different or different

Is HashMap thread safe? HashMap has high reading and writing efficiency, but because it is asynchronous, that is, reading and writing operations are not protected by locks, it is unsafe in multi-threaded scenarios and prone to data inconsistency. It is highly recommended in single threaded scenarios.

The above is the introduction of HashMap. The video version of this article is explained in detail, and messages can be obtained.

---END--

About the author: mikechen , more than ten years of BAT architecture experience, senior technical expert, once worked in Alibaba, Taobao and Baidu.

Welcome to the official account of individuals: mikechen's Internet architecture, and over ten years of experience in BAT architecture.

In the official account menu bar dialog box, reply to the key words of the structure, you can see my original 300 phase +BAT architecture technology series and 1000+ big factory interview questions answer set.

Keywords: Java MySQL Redis Spring Spring Boot

Added by pietbez on Mon, 17 Jan 2022 20:29:41 +0200