Storage address of GitHub warehouse: https://github.com/hlccd/goSTL
summary
hash map is a two-layer structure, that is, the first layer uses dynamic array as bucket to store elements, and the second layer stores elements with conflicting hash values.
For any element inserted into it, you can calculate the hash value of its key, map it to the corresponding position in the bucket, and then insert it.
The biggest feature of hash mapping is that its lookup, insertion and deletion are O (1), but there may be problems of capacity expansion and shrinkage, and its time overhead will increase.
principle
For hash mapping, it mainly needs to hash the key, map the calculated value to the corresponding position of the hash bucket and insert it.
In this implementation, the dynamic array, that is, the hash bucket reuses the previously implemented vector, and when a hash conflict occurs, the conflicting values are stored in the avl tree.
For the expansion and contraction of its bucket, you can rely on the vector. After the expansion and contraction, you can insert all key values again.
Of course, for hash mapping, the best case is that all hashes will not conflict, that is, the avl tree has only root nodes, which will be faster for insertion and deletion. Therefore, it is necessary to avoid hash conflict as much as possible, that is, to design a good hash function.
Generally speaking, the hash function can adopt the square half method and bit by bit cumulative addition. Considering that the traversal of the numerical class will consume a lot of time in bit accumulation, the square half method is adopted for the numerical type and bit by bit cumulative addition is adopted for the string type.
The initial capacity and minimum capacity of vector are 16. This value is an empirical value. When it is set to 16, the hash conflict is the smallest.
For capacity expansion and capacity reduction, a capacity expansion factor is required, that is, when the current storage capacity reaches the multiple of the capacity expansion factor of the total capacity, capacity expansion is required. On the contrary, capacity reduction is required when the current storage capacity is less than the multiple of the capacity expansion factor after capacity reduction. For the capacity expansion factor, if it is set too small, although the probability of conflict decreases, the space utilization also decreases sharply. If it is set too large, the space utilization increases, but the hash conflict increases and the time cost increases. In this implementation, the expansion factor is set to 0.75. This value is the optimal solution with the least possibility of conflict calculated according to Poisson distribution and exponential distribution.
realization
hashMap hash mapping structure. This instance stores the pointer of the first layer vector and saves the hash function. The hash function in the hash map is passed in when creating. If it is not passed in, it is searched from the default hash function when inserting the first key value.
type hashMap struct { arr *vector.Vector //vector of the first layer hash algorithm.Hasher //hash function size uint64 //Current storage quantity cap uint64 //Capacity of vector mutex sync.Mutex //Concurrency control lock }
The index structure stores the key value structure.
type node struct { value interface{} //Elements stored in nodes num int //Number of this element depth int //The depth of the node left *node //Left node pointer right *node //Right node pointer }
Interface
type hashMaper interface { Iterator() (i *Iterator.Iterator) //Returns an iterator containing all value s in the hashMap container Size() (num uint64) //Returns the number of elements that have been stored in the hashMap Cap() (num uint64) //Returns the capacity of the storage space in the hashMap Clear() //Empty hashMap Empty() (b bool) //Returns whether the hashMap is empty Insert(key, value interface{}) (b bool) //Insert a value indexed by key into the hashMap. If it exists, it will be overwritten Erase(key interface{}) (b bool) //Delete the value indexed by key in the hashMap GetKeys() (keys []interface{}) //Returns all keys in the hashMap Get(key interface{}) (value interface{}) //Find vlue with key as index }
New
Create a hashMap hash mapping container and return it. The initial vector length is 16. If there is an incoming hash function, set the first hash function as the hash function of the hash mapping.
func New(hash ...algorithm.Hasher) (hm *hashMap) { var h algorithm.Hasher if len(hash) == 0 { h = nil } else { h = hash[0] } cmp := func(a, b interface{}) int { ka, kb := a.(*indexes), b.(*indexes) return comparator.GetCmp(ka.key)(ka.key, kb.key) } //Create a new vector and expand it to 16 v := vector.New() for i := 0; i < 16; i++ { //Nested avl tree in vector v.PushBack(avlTree.New(false, cmp)) } return &hashMap{ arr: v, hash: h, size: 0, cap: 16, mutex: sync.Mutex{}, } }
Iterator
Take the hashMap hash map as the receiver, and put the value s of all saved index pointers in the hashMap into the iterator.
func (hm *hashMap) Iterator() (i *Iterator.Iterator) { if hm == nil { return nil } if hm.arr == nil { return nil } hm.mutex.Lock() //Retrieve all value s stored in the hashMap values := make([]interface{}, 0, 1) for i := uint64(0); i < hm.arr.Size(); i++ { avl := hm.arr.At(i).(*avlTree.AvlTree) ite := avl.Iterator() es := make([]interface{}, 0, 1) for j := ite.Begin(); j.HasNext(); j.Next() { idx := j.Value().(*indexes) es = append(es, idx.value) } values = append(values, es...) } //Put all value s into the iterator i = Iterator.New(&values) hm.mutex.Unlock() return i }
Size
Take the hashMap hash map as the receiver and return the number of elements currently contained in the container. If the container is nil, return 0.
func (hm *hashMap) Size() (num uint64) { if hm == nil { return 0 } return hm.size }
Cap
Take the hashMap hash map as the receiver and return the current capacity of the container. If the container is nil, return 0.
func (hm *hashMap) Cap() (num uint64) { if hm == nil { return 0 } return hm.cap }
Clear
Take the hashMap hash map as the receiver, empty the elements carried in the container, set the size of the container to 0, set the capacity to 16, and rebuild and expand the vector to 16.
func (hm *hashMap) Clear() { if hm == nil { return } hm.mutex.Lock() //Rebuild the vector and expand it to 16 v := vector.New() cmp := func(a, b interface{}) int { ka, kb := a.(*indexes), b.(*indexes) return comparator.GetCmp(ka.key)(ka.key, kb.key) } for i := 0; i < 16; i++ { v.PushBack(avlTree.New(false, cmp)) } hm.arr = v hm.size = 0 hm.cap = 16 hm.mutex.Unlock() }
Empty
Take the hashMap hash map as the receiver to judge whether the hash map contains elements. If it contains elements, it is not empty and returns false. If it does not contain elements, it is empty and returns true. If the container does not exist, it returns true.
func (hm *hashMap) Empty() (b bool) { if hm == nil { return false } return hm.size > 0 }
Insert
Take the hashMap hash map as the receiver and inject element e into the hash map. If the same key already exists, overwrite it. The overwrite is still regarded as successful insertion. If the same key does not exist, insert it directly.
func (hm *hashMap) Insert(key, value interface{}) (b bool) { if hm == nil { return false } if hm.arr == nil { return false } if hm.hash == nil { hm.hash = algorithm.GetHash(key) } if hm.hash == nil { return false } hm.mutex.Lock() //Calculate the hash value and find the corresponding avl tree hash := hm.hash(key) % hm.cap avl := hm.arr.At(hash).(*avlTree.AvlTree) idx := &indexes{ key: key, value: value, } //Determine whether it exists in the avl tree if avl.Count(idx) == 0 { //The same key does not exist in the avl tree. Just insert it avl.Insert(idx) hm.size++ if hm.size >= hm.cap/4*3 { //Expand the capacity when the expansion conditions are met hm.expend() } } else { //cover avl.Insert(idx) } hm.mutex.Unlock() return true }
expend
The hashMap hash map is used as the receiver to expand the capacity of the original vector, take out all the key values, let the vector expand the capacity by itself and empty the original nodes, and re insert all the key values into the vector after capacity expansion.
func (hm *hashMap) expend() { //Take out all key values idxs := make([]*indexes, 0, hm.size) for i := uint64(0); i < hm.arr.Size(); i++ { avl := hm.arr.At(i).(*avlTree.AvlTree) ite := avl.Iterator() for j := ite.Begin(); j.HasNext(); j.Next() { idxs = append(idxs, j.Value().(*indexes)) } } cmp := func(a, b interface{}) int { ka, kb := a.(*indexes), b.(*indexes) return comparator.GetCmp(ka.key)(ka.key, kb.key) } //Expand the capacity of vector to its maximum capacity hm.arr.PushBack(avlTree.New(false, cmp)) for i := uint64(0); i < hm.arr.Size()-1; i++ { hm.arr.At(i).(*avlTree.AvlTree).Clear() } for i := hm.arr.Size(); i < hm.arr.Cap(); i++ { hm.arr.PushBack(avlTree.New(false, cmp)) } //Set the vector capacity to the hashMap capacity hm.cap = hm.arr.Cap() //Re insert all key values into the hashMap for i := 0; i < len(idxs); i++ { key, value := idxs[i].key, idxs[i].value hash := hm.hash(key) % hm.cap avl := hm.arr.At(hash).(*avlTree.AvlTree) idx := &indexes{ key: key, value: value, } avl.Insert(idx) } }
Erase
Using the hashMap hash map as the receiver, delete the value with key as the index from the hashMap. If it exists, delete it. Otherwise, it ends directly. After the deletion is successful, size-1 may appear. After the deletion, size < 0.75 / 2 * cap and greater than 16 may appear. At this time, reduce the size.
func (hm *hashMap) Erase(key interface{}) (b bool) { if hm == nil { return false } if hm.arr == nil { return false } if hm.hash == nil { return false } hm.mutex.Lock() //Calculate the hash value of the key hash := hm.hash(key) % hm.cap avl := hm.arr.At(hash).(*avlTree.AvlTree) idx := &indexes{ key: key, value: nil, } //Delete the key value from the corresponding avl tree b = avl.Erase(idx) if b { //The deletion is successful. At this time, size-1 is used, and the shrinkage judgment is performed at the same time hm.size-- if hm.size < hm.cap/8*3 && hm.cap > 16 { hm.shrink() } } hm.mutex.Unlock() return b }
shrink
Take hashMap hash map as the receiver, shrink the original vector, take out all key values, let the vector shrink itself and empty all nodes. When the vector capacity is different from the beginning of shrink, it is regarded as the end of shrink, and then re insert all key values into the vector.
func (hm *hashMap) shrink() { //Take out all key values idxs := make([]*indexes, 0, hm.size) for i := uint64(0); i < hm.arr.Size(); i++ { avl := hm.arr.At(i).(*avlTree.AvlTree) ite := avl.Iterator() for j := ite.Begin(); j.HasNext(); j.Next() { idxs = append(idxs, j.Value().(*indexes)) } } //Shrink the volume. When the cap of the vector is different from the initial, it indicates that the shrink volume is over cap := hm.arr.Cap() for ; cap == hm.arr.Cap(); { hm.arr.PopBack() } hm.cap = hm.arr.Cap() //Put all the key values back into the hashMap for i := 0; i < len(idxs); i++ { key, value := idxs[i].key, idxs[i].value hash := hm.hash(key) % hm.cap avl := hm.arr.At(hash).(*avlTree.AvlTree) idx := &indexes{ key: key, value: value, } avl.Insert(idx) } }
GetKeys
Take the hashMap hash map as the receiver and return all key s in the hashMap.
func (hm *hashMap) GetKeys() (keys []interface{}) { if hm == nil { return nil } if hm.arr == nil { return nil } hm.mutex.Lock() keys = make([]interface{}, 0, 1) for i := uint64(0); i < hm.arr.Size(); i++ { avl := hm.arr.At(i).(*avlTree.AvlTree) ite := avl.Iterator() es := make([]interface{}, 0, 1) for j := ite.Begin(); j.HasNext(); j.Next() { idx := j.Value().(*indexes) es = append(es, idx.key) } keys = append(keys, es...) } hm.mutex.Unlock() return keys }
Get
Take the hashMap hash map as the receiver, find the corresponding value with key and return it.
func (hm *hashMap) Get(key interface{}) (value interface{}) { if hm == nil { return } if hm.arr == nil { return } if hm.hash == nil { hm.hash = algorithm.GetHash(key) } if hm.hash == nil { return } hm.mutex.Lock() //Calculate hash value hash := hm.hash(key) % hm.cap //Find the key value corresponding to the hash value from the avl tree info := hm.arr.At(hash).(*avlTree.AvlTree).Find(&indexes{key: key, value: nil}) hm.mutex.Unlock() if info == nil { return nil } return info.(*indexes).value }
Use example
package main import ( "fmt" "github.com/hlccd/goSTL/data_structure/hashMap" ) func main() { m:=hashMap.New() for i:=1;i<=17;i++{ m.Insert(i,i) } fmt.Println("size=",m.Size()) keys:=m.GetKeys() fmt.Println(keys) for i:=0;i< len(keys);i++{ fmt.Println(m.Get(keys[i])) } for i:=m.Iterator().Begin();i.HasNext();i.Next(){ fmt.Println(i.Value()) } for i:=0;i< len(keys);i++{ m.Erase(keys[i]) } }
Note: since the addition and deletion processes in the process are executed concurrently, the results are not exactly the same as those in the following example
size= 17
[1 8 16 2 14 3 4 9 12 5 15 17 6 10 13 7 11]
1
8
16
2
14
3
4
9
12
5
15
17
6
10
13
7
11
1
8
16
2
14
3
4
9
12
5
15
17
6
10
13
7
11
Time cost
Note: random numbers are inserted, that is, the problem of repeated insertion coverage may occur, so the value will be ≤ max
package main import ( "fmt" "github.com/hlccd/goSTL/data_structure/hashMap" "math/rand" "time" ) func main() { max := 3000000 num := 0 r := rand.New(rand.NewSource(time.Now().UnixNano())) tm := time.Now() m := make(map[int]bool) for i := 0; i < max; i++ { num=r.Intn(4294967295) m[num] = true } fmt.Println("map Time consuming:", time.Since(tm)) thm := time.Now() hm := hashMap.New() for i := 0; i < max; i++ { num=r.Intn(4294967295) hm.Insert(uint64(num), true) } fmt.Println("hashmap Add elapsed time:", time.Since(thm)) thmd := time.Now() keys:=hm.GetKeys() for i := 0; i < len(keys); i++ { hm.Get(keys[i]) } fmt.Println("size=",hm.Size()) fmt.Println("hashmap Traversal time consumption:", time.Since(thmd)) }
map consumption time: 484.6992ms
hashmap addition time: 4.6206379s
size= 2999043
hashmap traversal time: 2.2020792s