Data structure STL -- implementation of hashMap HashMap by golang

Storage address of GitHub warehouse: https://github.com/hlccd/goSTL

summary

hash map is a two-layer structure, that is, the first layer uses dynamic array as bucket to store elements, and the second layer stores elements with conflicting hash values.

For any element inserted into it, you can calculate the hash value of its key, map it to the corresponding position in the bucket, and then insert it.

The biggest feature of hash mapping is that its lookup, insertion and deletion are O (1), but there may be problems of capacity expansion and shrinkage, and its time overhead will increase.

principle

For hash mapping, it mainly needs to hash the key, map the calculated value to the corresponding position of the hash bucket and insert it.

In this implementation, the dynamic array, that is, the hash bucket reuses the previously implemented vector, and when a hash conflict occurs, the conflicting values are stored in the avl tree.

For the expansion and contraction of its bucket, you can rely on the vector. After the expansion and contraction, you can insert all key values again.

Of course, for hash mapping, the best case is that all hashes will not conflict, that is, the avl tree has only root nodes, which will be faster for insertion and deletion. Therefore, it is necessary to avoid hash conflict as much as possible, that is, to design a good hash function.

Generally speaking, the hash function can adopt the square half method and bit by bit cumulative addition. Considering that the traversal of the numerical class will consume a lot of time in bit accumulation, the square half method is adopted for the numerical type and bit by bit cumulative addition is adopted for the string type.

The initial capacity and minimum capacity of vector are 16. This value is an empirical value. When it is set to 16, the hash conflict is the smallest.

For capacity expansion and capacity reduction, a capacity expansion factor is required, that is, when the current storage capacity reaches the multiple of the capacity expansion factor of the total capacity, capacity expansion is required. On the contrary, capacity reduction is required when the current storage capacity is less than the multiple of the capacity expansion factor after capacity reduction. For the capacity expansion factor, if it is set too small, although the probability of conflict decreases, the space utilization also decreases sharply. If it is set too large, the space utilization increases, but the hash conflict increases and the time cost increases. In this implementation, the expansion factor is set to 0.75. This value is the optimal solution with the least possibility of conflict calculated according to Poisson distribution and exponential distribution.

realization

hashMap hash mapping structure. This instance stores the pointer of the first layer vector and saves the hash function. The hash function in the hash map is passed in when creating. If it is not passed in, it is searched from the default hash function when inserting the first key value.

type hashMap struct {
	arr   *vector.Vector   //vector of the first layer
	hash  algorithm.Hasher //hash function 
	size  uint64           //Current storage quantity
	cap   uint64           //Capacity of vector
	mutex sync.Mutex       //Concurrency control lock
}

The index structure stores the key value structure.

type node struct {
	value interface{} //Elements stored in nodes
	num   int         //Number of this element
	depth int         //The depth of the node
	left  *node       //Left node pointer
	right *node       //Right node pointer
}

Interface

type hashMaper interface {
	Iterator() (i *Iterator.Iterator)        //Returns an iterator containing all value s in the hashMap container
	Size() (num uint64)                      //Returns the number of elements that have been stored in the hashMap
	Cap() (num uint64)                       //Returns the capacity of the storage space in the hashMap
	Clear()                                  //Empty hashMap
	Empty() (b bool)                         //Returns whether the hashMap is empty
	Insert(key, value interface{}) (b bool)  //Insert a value indexed by key into the hashMap. If it exists, it will be overwritten
	Erase(key interface{}) (b bool)          //Delete the value indexed by key in the hashMap
	GetKeys() (keys []interface{})           //Returns all keys in the hashMap
	Get(key interface{}) (value interface{}) //Find vlue with key as index
}

New

Create a hashMap hash mapping container and return it. The initial vector length is 16. If there is an incoming hash function, set the first hash function as the hash function of the hash mapping.

func New(hash ...algorithm.Hasher) (hm *hashMap) {
	var h algorithm.Hasher
	if len(hash) == 0 {
		h = nil
	} else {
		h = hash[0]
	}
	cmp := func(a, b interface{}) int {
		ka, kb := a.(*indexes), b.(*indexes)
		return comparator.GetCmp(ka.key)(ka.key, kb.key)
	}
	//Create a new vector and expand it to 16
	v := vector.New()
	for i := 0; i < 16; i++ {
		//Nested avl tree in vector
		v.PushBack(avlTree.New(false, cmp))
	}
	return &hashMap{
		arr:   v,
		hash:  h,
		size:  0,
		cap:   16,
		mutex: sync.Mutex{},
	}
}

Iterator

Take the hashMap hash map as the receiver, and put the value s of all saved index pointers in the hashMap into the iterator.

func (hm *hashMap) Iterator() (i *Iterator.Iterator) {
	if hm == nil {
		return nil
	}
	if hm.arr == nil {
		return nil
	}
	hm.mutex.Lock()
	//Retrieve all value s stored in the hashMap
	values := make([]interface{}, 0, 1)
	for i := uint64(0); i < hm.arr.Size(); i++ {
		avl := hm.arr.At(i).(*avlTree.AvlTree)
		ite := avl.Iterator()
		es := make([]interface{}, 0, 1)
		for j := ite.Begin(); j.HasNext(); j.Next() {
			idx := j.Value().(*indexes)
			es = append(es, idx.value)
		}
		values = append(values, es...)
	}
	//Put all value s into the iterator
	i = Iterator.New(&values)
	hm.mutex.Unlock()
	return i
}

Size

Take the hashMap hash map as the receiver and return the number of elements currently contained in the container. If the container is nil, return 0.

func (hm *hashMap) Size() (num uint64) {
	if hm == nil {
		return 0
	}
	return hm.size
}

Cap

Take the hashMap hash map as the receiver and return the current capacity of the container. If the container is nil, return 0.

func (hm *hashMap) Cap() (num uint64) {
	if hm == nil {
		return 0
	}
	return hm.cap
}

Clear

Take the hashMap hash map as the receiver, empty the elements carried in the container, set the size of the container to 0, set the capacity to 16, and rebuild and expand the vector to 16.

func (hm *hashMap) Clear() {
	if hm == nil {
		return
	}
	hm.mutex.Lock()
	//Rebuild the vector and expand it to 16
	v := vector.New()
	cmp := func(a, b interface{}) int {
		ka, kb := a.(*indexes), b.(*indexes)
		return comparator.GetCmp(ka.key)(ka.key, kb.key)
	}
	for i := 0; i < 16; i++ {
		v.PushBack(avlTree.New(false, cmp))
	}
	hm.arr = v
	hm.size = 0
	hm.cap = 16
	hm.mutex.Unlock()
}

Empty

Take the hashMap hash map as the receiver to judge whether the hash map contains elements. If it contains elements, it is not empty and returns false. If it does not contain elements, it is empty and returns true. If the container does not exist, it returns true.

func (hm *hashMap) Empty() (b bool) {
	if hm == nil {
		return false
	}
	return hm.size > 0
}

Insert

Take the hashMap hash map as the receiver and inject element e into the hash map. If the same key already exists, overwrite it. The overwrite is still regarded as successful insertion. If the same key does not exist, insert it directly.

func (hm *hashMap) Insert(key, value interface{}) (b bool) {
	if hm == nil {
		return false
	}
	if hm.arr == nil {
		return false
	}
	if hm.hash == nil {
		hm.hash = algorithm.GetHash(key)
	}
	if hm.hash == nil {
		return false
	}
	hm.mutex.Lock()
	//Calculate the hash value and find the corresponding avl tree
	hash := hm.hash(key) % hm.cap
	avl := hm.arr.At(hash).(*avlTree.AvlTree)
	idx := &indexes{
		key:   key,
		value: value,
	}
	//Determine whether it exists in the avl tree
	if avl.Count(idx) == 0 {
		//The same key does not exist in the avl tree. Just insert it
		avl.Insert(idx)
		hm.size++
		if hm.size >= hm.cap/4*3 {
			//Expand the capacity when the expansion conditions are met
			hm.expend()
		}
	} else {
		//cover
		avl.Insert(idx)
	}
	hm.mutex.Unlock()
	return true
}
expend

The hashMap hash map is used as the receiver to expand the capacity of the original vector, take out all the key values, let the vector expand the capacity by itself and empty the original nodes, and re insert all the key values into the vector after capacity expansion.

func (hm *hashMap) expend() {
	//Take out all key values
	idxs := make([]*indexes, 0, hm.size)
	for i := uint64(0); i < hm.arr.Size(); i++ {
		avl := hm.arr.At(i).(*avlTree.AvlTree)
		ite := avl.Iterator()
		for j := ite.Begin(); j.HasNext(); j.Next() {
			idxs = append(idxs, j.Value().(*indexes))
		}
	}
	cmp := func(a, b interface{}) int {
		ka, kb := a.(*indexes), b.(*indexes)
		return comparator.GetCmp(ka.key)(ka.key, kb.key)
	}
	//Expand the capacity of vector to its maximum capacity
	hm.arr.PushBack(avlTree.New(false, cmp))
	for i := uint64(0); i < hm.arr.Size()-1; i++ {
		hm.arr.At(i).(*avlTree.AvlTree).Clear()
	}
	for i := hm.arr.Size(); i < hm.arr.Cap(); i++ {
		hm.arr.PushBack(avlTree.New(false, cmp))
	}
	//Set the vector capacity to the hashMap capacity
	hm.cap = hm.arr.Cap()
	//Re insert all key values into the hashMap
	for i := 0; i < len(idxs); i++ {
		key, value := idxs[i].key, idxs[i].value
		hash := hm.hash(key) % hm.cap
		avl := hm.arr.At(hash).(*avlTree.AvlTree)
		idx := &indexes{
			key:   key,
			value: value,
		}
		avl.Insert(idx)
	}
}

Erase

Using the hashMap hash map as the receiver, delete the value with key as the index from the hashMap. If it exists, delete it. Otherwise, it ends directly. After the deletion is successful, size-1 may appear. After the deletion, size < 0.75 / 2 * cap and greater than 16 may appear. At this time, reduce the size.

func (hm *hashMap) Erase(key interface{}) (b bool) {
	if hm == nil {
		return false
	}
	if hm.arr == nil {
		return false
	}
	if hm.hash == nil {
		return false
	}
	hm.mutex.Lock()
	//Calculate the hash value of the key
	hash := hm.hash(key) % hm.cap
	avl := hm.arr.At(hash).(*avlTree.AvlTree)
	idx := &indexes{
		key:   key,
		value: nil,
	}
	//Delete the key value from the corresponding avl tree
	b = avl.Erase(idx)
	if b {
		//The deletion is successful. At this time, size-1 is used, and the shrinkage judgment is performed at the same time
		hm.size--
		if hm.size < hm.cap/8*3 && hm.cap > 16 {
			hm.shrink()
		}
	}
	hm.mutex.Unlock()
	return b
}
shrink

Take hashMap hash map as the receiver, shrink the original vector, take out all key values, let the vector shrink itself and empty all nodes. When the vector capacity is different from the beginning of shrink, it is regarded as the end of shrink, and then re insert all key values into the vector.

func (hm *hashMap) shrink() {
	//Take out all key values
	idxs := make([]*indexes, 0, hm.size)
	for i := uint64(0); i < hm.arr.Size(); i++ {
		avl := hm.arr.At(i).(*avlTree.AvlTree)
		ite := avl.Iterator()
		for j := ite.Begin(); j.HasNext(); j.Next() {
			idxs = append(idxs, j.Value().(*indexes))
		}
	}
	//Shrink the volume. When the cap of the vector is different from the initial, it indicates that the shrink volume is over
	cap := hm.arr.Cap()
	for ; cap == hm.arr.Cap(); {
		hm.arr.PopBack()
	}
	hm.cap = hm.arr.Cap()
	//Put all the key values back into the hashMap
	for i := 0; i < len(idxs); i++ {
		key, value := idxs[i].key, idxs[i].value
		hash := hm.hash(key) % hm.cap
		avl := hm.arr.At(hash).(*avlTree.AvlTree)
		idx := &indexes{
			key:   key,
			value: value,
		}
		avl.Insert(idx)
	}
}

GetKeys

Take the hashMap hash map as the receiver and return all key s in the hashMap.

func (hm *hashMap) GetKeys() (keys []interface{}) {
	if hm == nil {
		return nil
	}
	if hm.arr == nil {
		return nil
	}
	hm.mutex.Lock()
	keys = make([]interface{}, 0, 1)
	for i := uint64(0); i < hm.arr.Size(); i++ {
		avl := hm.arr.At(i).(*avlTree.AvlTree)
		ite := avl.Iterator()
		es := make([]interface{}, 0, 1)
		for j := ite.Begin(); j.HasNext(); j.Next() {
			idx := j.Value().(*indexes)
			es = append(es, idx.key)
		}
		keys = append(keys, es...)
	}
	hm.mutex.Unlock()
	return keys
}

Get

Take the hashMap hash map as the receiver, find the corresponding value with key and return it.

func (hm *hashMap) Get(key interface{}) (value interface{}) {
	if hm == nil {
		return
	}
	if hm.arr == nil {
		return
	}
	if hm.hash == nil {
		hm.hash = algorithm.GetHash(key)
	}
	if hm.hash == nil {
		return
	}
	hm.mutex.Lock()
	//Calculate hash value
	hash := hm.hash(key) % hm.cap
	//Find the key value corresponding to the hash value from the avl tree
	info := hm.arr.At(hash).(*avlTree.AvlTree).Find(&indexes{key: key, value: nil})
	hm.mutex.Unlock()
	if info == nil {
		return nil
	}
	return info.(*indexes).value
}

Use example

package main

import (
	"fmt"
	"github.com/hlccd/goSTL/data_structure/hashMap"
)

func main() {
	m:=hashMap.New()
	for i:=1;i<=17;i++{
		m.Insert(i,i)
	}
	fmt.Println("size=",m.Size())
	keys:=m.GetKeys()
	fmt.Println(keys)
	for i:=0;i< len(keys);i++{
		fmt.Println(m.Get(keys[i]))
	}
	for i:=m.Iterator().Begin();i.HasNext();i.Next(){
		fmt.Println(i.Value())
	}
	for i:=0;i< len(keys);i++{
		m.Erase(keys[i])
	}
}

Note: since the addition and deletion processes in the process are executed concurrently, the results are not exactly the same as those in the following example

size= 17
[1 8 16 2 14 3 4 9 12 5 15 17 6 10 13 7 11]
1
8
16
2
14
3
4
9
12
5
15
17
6
10
13
7
11
1
8
16
2
14
3
4
9
12
5
15
17
6
10
13
7
11

Time cost

Note: random numbers are inserted, that is, the problem of repeated insertion coverage may occur, so the value will be ≤ max

package main

import (
	"fmt"
	"github.com/hlccd/goSTL/data_structure/hashMap"
	"math/rand"
	"time"
)

func main() {
	max := 3000000
	num := 0
	r := rand.New(rand.NewSource(time.Now().UnixNano()))
	tm := time.Now()
	m := make(map[int]bool)
	for i := 0; i < max; i++ {
		num=r.Intn(4294967295)
		m[num] = true
	}
	fmt.Println("map Time consuming:", time.Since(tm))
	thm := time.Now()
	hm := hashMap.New()
	for i := 0; i < max; i++ {
		num=r.Intn(4294967295)
		hm.Insert(uint64(num), true)
	}
	fmt.Println("hashmap Add elapsed time:", time.Since(thm))
	thmd := time.Now()
	keys:=hm.GetKeys()
	for i := 0; i < len(keys); i++ {
		hm.Get(keys[i])
	}
	fmt.Println("size=",hm.Size())
	fmt.Println("hashmap Traversal time consumption:", time.Since(thmd))
}

map consumption time: 484.6992ms
hashmap addition time: 4.6206379s
size= 2999043
hashmap traversal time: 2.2020792s

Keywords: Go data structure

Added by always_confused on Sat, 25 Dec 2021 20:31:30 +0200