Go language Map basic principle entry level

Map underlying principle

map is a data structure used to store a series of unordered key value pairs, which are stored based on keys, so that we can quickly find the corresponding values through keys.

Introduction to internal implementation

The bottom layer of Go is a Hash list. The Hash list contains a group of pokes. When storing, deleting and searching key value pairs, all operations need to select a poke. The corresponding poke can be found by passing the key specified during operation mapping to the mapped hash function for calculation. The distribution of key value pairs is balanced by a reasonable number of buckets, which greatly improves the search efficiency.

Chestnuts:

p := map[string]string{"Red":"#da23"}

The above declares a map, and the key values are all string types. First, take a look at how the key is a string and stored at the bottom of the map.

Taking the string as the key of the map, the bottom layer will calculate the hash value through the hash function. However, the hash value represents the poke sequence number that can be used for storage within the sequence number range of the map. The obtained hash value is used to select the bucket, and is also used to store and find the specified key value pair, which is very convenient.

Deep analysis of map underlying

Go's map has its own underlying implementation principle, of which the core is implemented by hmap and bmap.

(1) What actions will the bottom layer do when Map is initialized?

Suppose you initialize a map with a capacity of 5 elements

mymap := make(map[string]string,10)

After you create a map, the bottom layer will create an hmap structure object, and then the hamp structure will be initialized, such as generating a hash factor hash and assigning it to the hamp object, initializing count to 0, and calculating the number of pokes. The source code location is: go / SRC / Runtime / map go

Source code:

func makemap(t *maptype, hint int, h *hmap) *hmap {
	mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
	if overflow || mem > maxAlloc {
		hint = 0
	}

	// Initialize Hmap initialize Hmap, that is, create an Hmap
	if h == nil {
		h = new(hmap)
	}
  //Generate hash factor
	h.hash0 = fastrand()

	// Find the size parameter B which will hold the requested # of elements.
	// For hint < 0 overLoadFactor returns false since hint < bucketCnt.
  //Calculate B according to the parameters you sent, that is, after calculating B, you can know how many pokes you need to create
	B := uint8(0) //The default value is 0, and then the poked value is calculated through the overload factor
	for overLoadFactor(hint, B) {
		B++
	}
	h.B = B

	// allocate initial hash table
	// if B == 0, the buckets field is allocated lazily later (in mapassign)
	// If hint is large zeroing this memory could take a while.
  //Create poke
	if h.B != 0 {
		var nextOverflow *bmap
		h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
		if nextOverflow != nil {
			h.extra = new(mapextra)
			h.extra.nextOverflow = nextOverflow
		}
	}

	return h
}

(2) What happens when you add data?

mymap := make(map[string]string,10)
mymap["name"] = "Zhuge Liang"

Step 1: first, the hash value will be generated by combining the key you passed with the hash factor

Step 2: after the hash value is obtained, the B bit (the hash value is binary) determines where the data should be stored (Bmap)

Step 3: after confirming the poke, you can add data. The top 8 bits are stored in the tophash in Bmap. When the poke is full, you will find the overflow poke through overflow.

Source code:

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	//... Omit some codes
  //Compute hash
	hash := t.hasher(key, uintptr(h.hash0))
	h.flags ^= hashWriting
	if h.buckets == nil {
		h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
	}
  
again:
	bucket := hash & bucketMask(h.B)
	if h.growing() {
		growWork(t, h, bucket)
	}
	b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
  // Calculate top
	top := tophash(hash)

	var inserti *uint8
	var insertk unsafe.Pointer
	var elem unsafe.Pointer

  //... Omit some codes
  

(3) What happens when reading data?

mymap := make(map[string]string,10)
mymap["name"] = "Zhuge Liang"
//Read data
value := mymap["name"]

Step 1: combine hash factor and key to generate hash value

Step 2: determine the corresponding BMAP of the key storage through the value of the last B bit of the hash value

Step 3: query the corresponding data according to the tophash (high 8 bits)

(4) Map expansion

When to start the capacity expansion mechanism?

Expansion conditions:

  • If the total number of map data / poke number > 6.5, it will trigger doubling and capacity expansion
  • Use too many overflow pokes (too many overflow pokes will slow down map processing)
    • When B < = 15, the number of overflow pokes used > = 2 to the power of B, an equal capacity expansion is triggered
    • When b > 15, the number of overflow pokes used > = the 15th power of 2, an equal capacity expansion is triggered

Source code:

func hashGrow(t *maptype, h *hmap) {
	// If we've hit the load factor, get bigger.
	// Otherwise, there are too many overflow buckets,
	// so keep the same number of buckets and "grow" laterally.
	bigger := uint8(1)
	if !overLoadFactor(h.count+1, h.B) {
		bigger = 0
		h.flags |= sameSizeGrow
	}
	oldbuckets := h.buckets
	newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

	flags := h.flags &^ (iterator | oldIterator)
	if h.flags&iterator != 0 {
		flags |= oldIterator
	}
	// commit the grow (atomic wrt gc)
	h.B += bigger
	h.flags = flags
	h.oldbuckets = oldbuckets
	h.buckets = newbuckets
	h.nevacuate = 0
	h.noverflow = 0

	if h.extra != nil && h.extra.overflow != nil {
		// Promote current overflow buckets to the old generation.
		if h.extra.oldoverflow != nil {
			throw("oldoverflow is not nil")
		}
		h.extra.oldoverflow = h.extra.overflow
		h.extra.overflow = nil
	}
	if nextOverflow != nil {
		if h.extra == nil {
			h.extra = new(mapextra)
		}
		h.extra.nextOverflow = nextOverflow
	}

	// the actual copying of the hash table data is done incrementally
	// by growWork() and evacuate().
}

(5) What happens to data migration after capacity expansion?

Double capacity expansion: in case of double capacity expansion, the migration is to import the old data into the new data and migrate according to the hash value.

Traverse the old poke

(6) Other extensions

What are the characteristics of map as a function parameter?

func changeMap(m map[string]string) {
	m["name"] = "Cao Cao"
	fmt.Printf("m=m%p,m=%v",m,m)
}
func main() {
	m1 := map[string]string{"name":"Zhuge Liang"}
	fmt.Printf("m1=%p,m1=%v\n",m1,m1)
	changeMap(m1)
}

Note: This chestnut proves that map, as a function parameter, is address transfer, not value transfer.

(7) map features

  • Key cannot be repeated
  • The key must be hashable (that is, it must be a computed hash)
  • disorder

Many details of Map are worth exploring. Some of them have not been mentioned here, such as Map concurrency security?

Partners who are used to reading on wechat can pay attention.

Keywords: Go

Added by joozt on Fri, 17 Dec 2021 12:44:36 +0200