HashMap source code analysis

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    static final int hash(Object key) {
        int h;
        // The final hash value is obtained by XORing the high 16 bits of the key's hashCode into its low 16 bits
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

For the incoming key, hash() calls its hashCode() method to obtain a 32-bit hash value h, then computes h >>> 16 (an unsigned right shift by 16 bits: the high 16 bits move into the low 16 positions, the original low 16 bits are discarded, and the vacated high 16 bits are filled with 0). Finally it XORs (^) the original value with the shifted value, where each result bit is 1 only if the two input bits differ. The result is the final hash value. A null key always hashes to 0.
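The steps above can be traced on a concrete value. The following is a minimal sketch, not JDK code: the class name HashSpreadDemo and the helpers spread and bits are hypothetical names chosen for illustration, and the input is an arbitrary sample hash code. It applies the same expression as hash() directly to an int and prints each intermediate value as 32 binary digits.

```java
// Hypothetical demo class; only the expression h ^ (h >>> 16) comes from HashMap.hash().
class HashSpreadDemo {
    // Same expression as HashMap.hash(), applied to an int instead of a key.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Render an int as 32 binary digits, zero-padded on the left.
    static String bits(int v) {
        return String.format("%32s", Integer.toBinaryString(v)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int h = 0x04B3DFE1; // 78897121, an arbitrary sample hash code
        System.out.println("h         = " + bits(h));
        System.out.println("h >>> 16  = " + bits(h >>> 16));
        System.out.println("spread(h) = " + bits(spread(h)));
    }
}
```

Note that the high 16 bits of the result are unchanged (they are XORed with zeros); only the low 16 bits pick up information from the high half.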

Why are the high and low bits of the hash value XORed together?

Principle analysis: HashMap stores its data in an array, and the index of each element is determined by its hash value.

tab[i = (n - 1) & hash], where n is the length of the array and tab is the array of nodes. For a non-negative hash, (n - 1) & hash is effectively hash % n, because n is always a power of 2.

Assume the array length is 8 (the actual default initial capacity is 16) and the hashCode() method of the key returns 78897121, which is 100 1011 0011 1101 1111 1110 0001 in binary. ANDing n - 1 = 7 with this value gives:

0000 0000 0000 0000 0000 0000 0000 0111

0000 0100 1011 0011 1101 1111 1110 0001

result

0000 0000 0000 0000 0000 0000 0000 0001

Since & yields 1 only where both bits are 1, only the low bits 001 of the hash value actually take part in the calculation. (How many bits take part depends on the length of the array, but the capacity of a HashMap is generally not very large.) The high bits have no effect on the result, so keys whose hash codes differ only in the high bits would always land in the same slot. To reduce the probability of hash collisions, the high 16 bits of the key's hash code are XORed into the low 16 bits, so that both the high and low bits contribute to the index calculation.
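A small experiment makes the effect visible. This is a sketch under stated assumptions: HighBitsDemo, spread and indexFor are hypothetical names, the table length 16 is the default capacity, and the two hash codes are hand-picked so that they differ only in their high 16 bits.

```java
// Hypothetical demo; indexFor mirrors the (n - 1) & hash expression used by HashMap.
class HighBitsDemo {
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;          // table length, a power of 2
        int a = 0x00010001;  // same low 16 bits as b ...
        int b = 0x00020001;  // ... different high 16 bits

        // Without spreading, only the low 4 bits matter: both map to slot 1.
        System.out.println(indexFor(a, n) + " " + indexFor(b, n));                 // prints "1 1"

        // With spreading, the high bits influence the slot: 0 and 3.
        System.out.println(indexFor(spread(a), n) + " " + indexFor(spread(b), n)); // prints "0 3"
    }
}
```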

Why are the high and low bits combined with XOR rather than & or |?

This is because & produces 1 only when both bits are 1, which biases each bit of the result toward 0, while | produces 1 when either bit is 1, which biases each bit toward 1. XOR (^) produces 1 only when the two bits differ, so for evenly distributed input bits 0 and 1 are equally likely, and the characteristics of both the high and low bits are retained as far as possible.
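The bias can be checked by brute force. The sketch below is illustrative only (BitBiasDemo and countOnes are hypothetical names): it combines every pair of 8-bit values with a given operator and counts how many of the resulting bits are 1.

```java
// Hypothetical demo: measures how often each operator produces a 1 bit.
class BitBiasDemo {
    // Apply op to all 256 x 256 pairs of 8-bit values and count the 1 bits in the results.
    static long countOnes(java.util.function.IntBinaryOperator op) {
        long ones = 0;
        for (int a = 0; a < 256; a++) {
            for (int b = 0; b < 256; b++) {
                ones += Integer.bitCount(op.applyAsInt(a, b) & 0xFF);
            }
        }
        return ones;
    }

    public static void main(String[] args) {
        long totalBits = 256L * 256L * 8L; // 8 result bits per pair
        System.out.println("&  ones: " + countOnes((a, b) -> a & b) + " of " + totalBits); // 25% are 1
        System.out.println("|  ones: " + countOnes((a, b) -> a | b) + " of " + totalBits); // 75% are 1
        System.out.println("^  ones: " + countOnes((a, b) -> a ^ b) + " of " + totalBits); // 50% are 1
    }
}
```

Over uniformly distributed inputs, only XOR keeps 0 and 1 balanced, which is the property wanted when mixing the two halves of a hash.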

Why is the HashMap capacity always a power of 2?

The purpose is to make computing an element's index position more efficient. When a hash value is used to compute an element's index in the array, the obvious formula is the modulo operation e.hash % capacity. HashMap is based on the same formula, but because bit operations are much cheaper for the computer than arithmetic operators such as %, it uses e.hash & (capacity - 1) in place of e.hash % capacity. The bit operation can replace the modulo operation only because of the special design decision that capacity is always a power of 2.

e.hash & (capacity - 1) = e.hash % capacity

From the binary point of view, e.hash / capacity = e.hash / 2ⁿ = e.hash >> n: shifting e.hash right by n bits yields the quotient of e.hash / 2ⁿ. The part that is shifted out (the low n bits) is e.hash % 2ⁿ, the remainder. The key is therefore how to obtain the low n bits of the hash value efficiently.

The binary form of 2ⁿ is a 1 followed by n zeros, so the binary form of 2ⁿ - 1 is n ones.
E.g. 8 = 2³, whose binary form is 1000, and 7 = 2³ - 1, whose binary form is 111.

Take a non-negative e.hash as an example (the derivation for negative numbers is more complex and is not discussed here). By the definition of bitwise AND (&), e.hash & (2ⁿ - 1) keeps exactly the low n bits of e.hash, which is the remainder. For example, with capacity 8 (mask 7):

0000 0000 0000 0000 0000 0000 0000 0111

0000 0100 1011 0011 1101 1111 1110 0101

result

0000 0000 0000 0000 0000 0000 0000 0101

Therefore, e.hash & (capacity - 1) = e.hash % capacity.
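The equality can be checked directly. The following sketch uses hypothetical names (MaskVsModDemo, byMask, byMod) and also shows two boundary cases that the derivation sets aside: a capacity that is not a power of 2, and a negative hash, for which Java's % returns a negative remainder while the mask still yields a valid index.

```java
// Hypothetical demo comparing the bit mask with the modulo operator.
class MaskVsModDemo {
    static int byMask(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    static int byMod(int hash, int capacity) {
        return hash % capacity;
    }

    public static void main(String[] args) {
        int hash = 0x04B3DFE5; // the value from the worked example, low bits ...0101

        // Power-of-2 capacity, non-negative hash: both give 5.
        System.out.println(byMask(hash, 8) + " " + byMod(hash, 8)); // prints "5 5"

        // Capacity 6 is not a power of 2, so the mask no longer matches the remainder.
        System.out.println(byMask(6, 6) + " " + byMod(6, 6));       // prints "4 0"

        // Negative hash: % yields -7, while the mask yields 9, the same as Math.floorMod.
        System.out.println(byMask(-7, 16) + " " + byMod(-7, 16) + " " + Math.floorMod(-7, 16)); // prints "9 -7 9"
    }
}
```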

During capacity expansion (resize), the index of each element in the new array must be recalculated. However, HashMap does not evaluate e.hash & (capacity - 1) again for this. Instead it relies on a simple rule:

When e.hash & oldCap == 0, the index of the node in the new array is the same as its old index.
When e.hash & oldCap != 0, the index of the node in the new array is the old index + the capacity of the old array.

This is another benefit of keeping the capacity a power of 2.

Let x be the index of node e in the old array before expansion, and let y be the index of node e in the new array after expansion.

When e.hash & oldCap == 0, y = x

In the old array, the index formula is e.hash & (oldCap - 1) = x, where oldCap = 2ⁿ, and 2ⁿ - 1 in binary is n ones. By the & operation (1 only where both bits are 1), x is the low n bits of the hash value.

In the new array, the index formula is e.hash & (newCap - 1) = y, where newCap = 2 * oldCap = 2 * 2ⁿ = 2ⁿ⁺¹, and newCap - 1 in binary is n+1 ones. By the & operation, y is the low n+1 bits of the hash value.

For y = x, the low n+1 bits of the hash value must have the same value as its low n bits, which requires the (n+1)-th bit of the hash value to be 0. For example, 0111 is equal to 111.

The condition e.hash & oldCap = e.hash & 2ⁿ = 0 says exactly that the (n+1)-th bit of e.hash is 0: 2ⁿ in binary is a 1 followed by n zeros, so its only 1 bit sits in the (n+1)-th position, and the & result is 0 only if the (n+1)-th bit of the hash value is 0.

When e.hash & oldCap != 0, y = x + oldCap

By the previous derivation, when e.hash & oldCap = e.hash & 2ⁿ != 0, the (n+1)-th bit of the hash value must be 1. In e.hash & (newCap - 1) = y, with newCap = 2 * oldCap = 2 * 2ⁿ, the result y is the low n+1 bits of the hash value, and its (n+1)-th bit is 1.

oldCap = 2ⁿ in binary is a 1 followed by n zeros, and x is the low n bits of the hash value, so y = x + 2ⁿ = x + oldCap.
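The derivation can be verified with a short program. This is a sketch of the index rule only, not the actual resize() code; ResizeSplitDemo and newIndex are hypothetical names, and oldCap = 16 is an assumed starting capacity. It compares the rule-based index with a direct recomputation against the doubled capacity.

```java
// Hypothetical demo of the resize index rule: y = x or y = x + oldCap.
class ResizeSplitDemo {
    // New index derived from the old index and the single bit tested by (hash & oldCap).
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap * 2;
        int[] hashes = {5, 21, 37, 53}; // all share old index 5

        for (int h : hashes) {
            int direct = h & (newCap - 1); // full recomputation, for comparison
            System.out.println("hash=" + h
                    + " old=" + (h & (oldCap - 1))
                    + " rule=" + newIndex(h, oldCap)
                    + " direct=" + direct);
        }
        // 5 and 37 stay at index 5; 21 and 53 move to index 5 + 16 = 21.
    }
}
```

Because the rule only inspects one bit, the nodes of an old bucket can be divided into a group that stays and a group that moves by oldCap, without recomputing each full index.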