# HashMap source code analysis

```    public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}

static final int hash(Object key) {
int h;
//The final hash value is obtained by performing or calculating the high and low bits of the hash value of the key
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}```

For the incoming key, call its hashcode() method to calculate the 32-bit hash value, move the hash value > > > 16 (move the unsigned right 16 bits, move the high 16 bits to the low bit, discard the low bit, and supplement the high 16 bits with 0), and then perform ^ (or) operation between the hash value and the moved hash value (1 if different). Get the final hash value.

## Why do I have to perform or operation on the high and low hash values

Principle analysis: Hashmap uses an array to save data, and the index bit of the element is determined by the hash value.

tab [i = (n - 1) & hash] n indicates the length of the array. tab is an array of nodes. (n - 1) & hash (actually hash%length)

Assuming that the length of the prime group is 8 (the actual default minimum value is 16), the hash value calculated by the hashcode() method of key is 78897121, and the binary conversion is 100101100111101111100001. The & operations of the two are as follows

0000 0000 0000 0000 0000 0000 0000 0111

&

0000 0100 1011 0011 1101 1111 1110 0001

result

0000 0000 0000 0000 0000 0000 0000 0001

Since the & operation is the same as 1 to get 1, in fact, only the low bit 001 of the hash value participates in the operation (the number of bits involved depends on the length of the number, but generally the capacity of hashmap is not particularly large), which will affect the operation result. In order to minimize the probability of hash collision, the high and low hash values of the key are also or calculated, so that both the high and low hash values can participate in the index position operation.

## Why are the high and low bits also or operations rather than & operations or | operations

This is because the & operation of 1 of the same 1 will bias the value on the bit bit of the hash value to and 0, and the operation of 1 to 1 will bias the value on the bit bit to 1^ Or the operation is different to 1, which retains the characteristics of high and low as much as possible.

## Why is the HashMap capacity to the n th power of 2

In order to improve the operation efficiency of calculating the index position of elements. Using hash value to calculate the index position of elements in the array, we can easily think of the formula to determine the index position through modular operation e.hash% capacity. HashMap also uses this formula to consider that the computer median operation efficiency is much higher than that of mathematical operators. Therefore, HashMap uses e.hash & (capacity - 1) to replace the modular formula e.hash% capacity. Bit operation can be used to replace modular operation. The key lies in the special design that the size of capacity is the n-power of 2.

### E.hash & (capacity - 1) = e.hash% capacity

From the binary point of view, e.hash / capacity = e.hash / 2 ⁿ = e.hash > > n, that is, move e.hash to the right by n bits, and the quotient of e.hash / 2 ⁿ is obtained. The removed part (lower n bits) is e.hash% 2 ⁿ, that is, the remainder. The key is how to efficiently obtain the low n bits of hash value.

Given that the binary form of 2 ⁿ is 1 followed by N zeros, the binary form of 2 ⁿ - 1 is n ones.
E.g. 8 = 2 ³， Its binary form is 1000, 7 = 2 ³ - 1. Its binary form is 111.

Taking e.hash as a positive number as an ex amp le (the derivation process of negative numbers is relatively complex and will not be discussed). According to the understanding of bitwise and (&) operation, e.hash & (2 ⁿ - 1) is to obtain the low n bits of e.hash, which is also a remainder.

0000 0000 0000 0000 0000 0000 0000 0111

&

0000 0100 1011 0011 1101 1111 1110 0101

have to

0000 0000 0000 0000 0000 0000 0000 0101

Therefore, we can deduce e.hash & (capacity - 1) = e.hash% capacity.

During capacity expansion, the index position of elements in the new array needs to be recalculated. However, e. hash & (capacity - 1) is not used again in hashmap, but a special law is given

When e.hash & oldcap = = 0, the index value of the node in the new array is the same as the old index value.
When e.hash & oldcap= 0, the index value of the node in the new array is the old index value + the capacity of the old array.

This also reflects the cleverness that the size of capacity is the n-th power of 2.

Set: before capacity expansion, the index value of node e in the old array is x; After capacity expansion, the index value of node e in the new array is y

### When e.hash & oldcap = = 0, y = x

In the old array, the modular formula is e.hash & (oldcap - 1) = x, oldCap = 2 ⁿ, and 2 ⁿ - 1 is converted into binary representation as n 1s. The result obtained from the & operation (1 of the same 1) is the lower n bits of the hash value.

In the new array, the modular formula is e.hash & (newcap - 1) = y, newCap = 2oldCap = 2*2 ⁿ, which is converted into binary into n+1 ones. According to the operation of &, the lower n+1 bits of hash value are obtained.

If you want y=x, the value of the lower n bit of the hash value is the same as the value of the lower n+1 bit, then the n+1 bit of the hash value must be 0. 111 is equal to 0111, for example.

When e.hash & oldcap = e.hash & 2 ⁿ = 0, the nth + 1st bit of e.hash is 0. 2 ⁿ is converted into binary representation. 1 is followed by n zeros. According to the & operation, if you want to be equal to 0, the n+1 bit of the hash value must be equal to 0.

### Deduce e.hash & oldcap= At 0, y = x + oldCap

According to the previous derivation, when e.hash & oldcap = e.hash & 2 ⁿ= 0, then the n+1 bit of the hash value must be equal to 1. Then E. hash & (newcap - 1) = y, newCap = 2oldCap = 2*2 ⁿ the result y is the lower n+1 bit of the hash value, and the n+1 bit is 1.

oldCap = 2 ⁿ converted to binary is 1 followed by n zeros, x = the lower n bits of the hash value, so y = x+2 ⁿ (oldCap)

## HashMap source code analysis

```    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
//p represents the old node
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
//1. The table array is empty. You need to call resize() to initialize the capacity
n = (tab = resize()).length;
//2. (n-1) & hash calculates the index value. If the index position key is null, it is directly stored in the index position
if ((p = tab[i = (n - 1) & hash]) == null)
//The index position is null and can be stored directly
tab[i] = newNode(hash, key, value, null);
else {
//3. If key exists
Node<K,V> e; K k;
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
//3.1 if the index position key exists, overwrite the value value
e = p;
else if (p instanceof TreeNode)
//3.2 there is a tree at the index position, so it is necessary to traverse the tree node. If the same key is found in the traversal process, the old value will be returned. Otherwise, it will be added to the tree and the new balance tree will be created.
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
//If you traverse the list and find that the position of key 3.3 is not the same, you need to insert it into the list. If the linked list length > = 8 and the array length is greater than 64, convert the linked list into a tree
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
//If the linked list length > = 8 and the array length is greater than 64, convert the linked list into a tree
treeifyBin(tab, hash);
break;
}
//Find the same key and jump out of the loop
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
if (e != null) { // existing mapping for key
//e!=null indicates that the same ket is found and the value is overwritten
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
if (++size > threshold)
//If the number of key value pairs in hashmap is greater than the expansion threshold, expand the array
resize();
afterNodeInsertion(evict);
return null;
}```
```    final Node<K,V>[] resize() {
Node<K,V>[] oldTab = table;
int oldCap = (oldTab == null) ? 0 : oldTab.length;
int oldThr = threshold;
int newCap, newThr = 0;
if (oldCap > 0) {
//ps1
//If there are already elements in the original array
if (oldCap >= MAXIMUM_CAPACITY) {
//If the capacity of the original array has reached the maximum capacity and cannot be expanded, return it as it is
threshold = Integer.MAX_VALUE;
return oldTab;
}
//Double the capacity and expansion threshold
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
newThr = oldThr << 1; // double threshold
}
else if (oldThr > 0) // initial capacity was placed in threshold
//ps2
//If the initial capacity is specified when constructing the map, it will enter here when the array is empty
newCap = oldThr;
else {               // zero initial threshold signifies using defaults
//ps3
//Initialization capacity and expansion threshold are the default values
newCap = DEFAULT_INITIAL_CAPACITY;
}
if (newThr == 0) {
//In the case of ps2, initialize the expansion threshold
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
threshold = newThr;
@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
if (oldTab != null) {
// If the old array is not empty, it indicates that it is a capacity expansion operation, which involves the transfer of elements
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;
// If the current position element is not empty, it needs to be transferred to the new array
if ((e = oldTab[j]) != null) {
//Release the elements in the old array position, and the elements will be garbage collected
oldTab[j] = null;
if (e.next == null)
//If there is no next element in this element, it means that there is no linked list in this position,
//Map directly to the new array using hash
//When the capacity of HashMap is the nth power of 2, the modular operation can be replaced by bitwise sum operation in both modular formula and expansion formula, which greatly improves the operation efficiency
newTab[e.hash & (newCap - 1)] = e;
else if (e instanceof TreeNode)
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
else { // preserve order
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
// The low order above refers to 0 to oldCap-1 of the new array, and the high order specifies oldCap to newCap - 1
Node<K,V> next;
do {
next = e.next;
//When e.hash & oldcap = = 0, the index value of the node in the new array is the same as the old index value.
//The low linked list stores the linked list on the index value of the old array, and then puts the low linked list at the index value position of the new array
if ((e.hash & oldCap) == 0) {
if (loTail == null)
else
loTail.next = e;
loTail = e;
}
else {
if (hiTail == null)
else
hiTail.next = e;
hiTail = e;
}
} while ((e = next) != null);
if (loTail != null) {
loTail.next = null;
//The new array directly stores the linked list at the index position j of the old array
}
if (hiTail != null) {
hiTail.next = null;
//The new array j + oldCap is stored on the index bit, and the linked list on the index bit J of the old array
}
}
}
}
}
return newTab;
}```
```        //Threshold value of red black tree back to linked list
static final int UNTREEIFY_THRESHOLD = 6;
/** This method will be called when HashMap is expanded to: ((treenode < K, V >) e) split(this, newTab, j, oldCap);
* @param map Represents the HashMap to be expanded
* @param tab Represents the newly created array, which is used to store the data migrated from the old array
* @param index Represents the index of the old array
* @param bit Represents the length of the old array
*/
final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
//Is ((treenode < K, V >) e) this object calls the split() method, so this refers to the (treenode < K, V >) e object
TreeNode<K,V> b = this;
// Relink into lo and hi lists, preserving order
TreeNode<K,V> loHead = null, loTail = null;
TreeNode<K,V> hiHead = null, hiTail = null;
//Used to determine whether the linked list needs to be converted into a red black tree
int lc = 0, hc = 0;
//Traverse the entire tree
for (TreeNode<K,V> e = b, next; e != null; e = next) {
//Cache the next node of the currently traversed node
next = (TreeNode<K,V>)e.next;
//Buffer the next node and set the reference of the next node to be empty to facilitate GC recycling
e.next = null;
if ((e.hash & bit) == 0) {
//When e.hash & oldcap = = 0, the index value of the node in the new array is the same as the old index value.
//Put these nodes into the low-level linked list, and then directly put the linked list in the corresponding index position of the new array
if ((e.prev = loTail) == null)
else
loTail.next = e;
loTail = e;
//The length of the linked list increases automatically
++lc;
}
else {
//Put these nodes into the low linked list,
if ((e.prev = hiTail) == null)
else
hiTail.next = e;
hiTail = e;
++hc;
}
}

if (lc <= UNTREEIFY_THRESHOLD)
//If the threshold is not reached, convert the tree node into a linked list node
//Because the element in the linked list is TreeNode, TreeNode is converted into Node here
else {
//The length of the low linked list reaches the threshold of converting red black tree
//If the high-level linked list is not empty, it indicates that the original red black tree has been split into two linked lists, and the red black tree needs to be reconstructed
//If the high-order linked list is empty, the low-order linked list is the previous red black tree
}
}
//ditto

if (hc <= UNTREEIFY_THRESHOLD)
else {
}
}
}```

## Reasons for unsafe HahsMap threads

In putval method, judge whether to develop hash collision. If it does not occur, it will be directly inserted into the array. The code is as follows:

```        if ((p = tab[i = (n - 1) & hash]) == null)
//The index position is null and can be stored directly
tab[i] = newNode(hash, key, value, null);```

If two threads call the put method at the same time and the storage index location is the same, it is judged that there are no elements in the location, and overwriting will occur.

And + + size in the source code are not atomic operations, which will lead to concurrency problems

Reference articles

https://blog.csdn.net/weixin_52801742/article/details/114162110

https://blog.csdn.net/weixin_38106322/article/details/104422422

Keywords: Java source code analysis HashMap

Added by Salis on Wed, 09 Feb 2022 05:04:35 +0200