go arrays and slices

Strictly speaking, there are three first-class citizen container types in Go: array, slice and mapping. Sometimes, we can think of string type and channel type as container type. This article mainly talks about arrays and slices.

In this article, we will first introduce the overview of container types and container values, then compare the similarities and differences between arrays and slices, as well as the scenario of slices passing as function parameters, and attach simple examples. Finally, we will discuss the expansion of slices.

1. Container type and container value

Each container (value) is used to represent and store a sequence or collection of elements. All elements in a container are of the same type. This same type is called the element type of this container's type (or simply the element type of this container).

Each element value stored in a container is associated with a key. Each element can be accessed through its key value. The key value type of a mapping type must be a comparable type.

Comparable type means that = = and are supported= The type of operation identifier comparison. In Go, other types except slice type, mapping type, function type, any structure type containing fields of non comparable type, and any array type whose element type is non comparable type are called comparable types.

The key value types of array and slice types are built-in type int. The key value corresponding to an element of an array or slice is always a non negative integer subscript. This non negative integer represents the sequential position of the element in all elements of the array or slice. This nonnegative integer subscript is also often referred to as an element index.

Each container value has a length attribute that indicates how many elements are currently stored in this container. The legal value range of non negative integer index key value associated with each element in an array or slice is left closed and right open interval [0, the length of this array or slice).

2. Similarities and differences between array and slice

2.1. Similarities and differences

The underlying data of slice is array. Slice is the encapsulation of array, which describes the fragment of an array. Both can access individual elements through subscripts.
The array is of fixed length. After the length is defined, it cannot be changed. In Go, array is uncommon because its length is part of the type, which limits its expressive ability. For example, [3]int and [4]int are different types.
Slicing is very flexible and can expand capacity dynamically. The type of slice is independent of the length.
Array is a continuous piece of memory. slice is actually a structure that contains three fields: length, capacity and underlying array.

2.2. Slice structure and example

slice data structure is as follows:

// runtime/slice.go
type slice struct {
    array unsafe.Pointer // Element pointer
    len   int // length 
    cap   int // capacity
}

Or represented by a diagram:

Note: the underlying array can be pointed to by multiple slices at the same time, so the operation on the elements of one slice may affect other slices.

1. Example 1

[3] Are int and [4]int of the same type?

no Because the length of the array is part of the type, which is different from slice.

2. Example 2

Is the processing of nil Slice and empty Slice consistent in Go?

2.1. First, Go's JSON standard library does not handle nil slice and empty slice consistently:

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

func main(){
	var s1 []int
	s2 := []int{}
	b, err := json.Marshal(s1)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(b))
	b, err = json.Marshal(s2)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(b))
}

Output result:

null
[]

2.2. Generally, the wrong usage will report the error that the array is out of bounds, because it only declares slice but does not give the instantiated object:

var slice []int
slice[1] = 0

At this time, the slice value is nil, which can be used for functions that need to return slice. When the function is abnormal, it is guaranteed that the function will still have the return value of nil.

2.3.empty slice means that slice is not nil, but slice has no value. The underlying space of slice is empty. At this time, it is defined as follows:

slice := make([]int,0)
slice := []int{}

This is very useful when we query or process an empty list. It will tell us that we are returning a list, but there is no value in the list.

In short, nil slice and empty slice are different things, which need to be distinguished

3. Example 3

What is the output of the following code?

package main

import (
	"fmt"
)

func main() {
	slice := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	s1 := slice[2:5]
	/*
	For slice interception s[low:high:max]:
	1. The first number (low) represents the starting point of the subscript (intercepted from this position),
	If the value of low is 0, it means to intercept from the first element;
	2. The second number (high) indicates where to end, that is, the end of the subscript (excluding this position),
	According to the formula (len = high low), that is, the second number minus the first number, and the difference is the data length.
	Here, the length can be understood as the number of data taken out.
	3. The third number is used to calculate the capacity. The so-called capacity refers to the maximum number of elements that the slice can hold at present.
	Calculate by formula (Cap = max low), that is, subtract the first number from the third number.
	*/
	s2 := s1[2:6:7]

	s2 = append(s2, 100)
	s2 = append(s2, 200)

	s1[2] = 20

	fmt.Println(s1)
	fmt.Println(s2)
	fmt.Println(slice)

}

Output result:

[2 3 20]
[4 5 6 7 100 200]
[0 1 2 3 20 5 6 7 100 9]

s1 from slice index 2 (closed interval) to index 5 (open interval, the element is actually taken to index 4), the length is 3, and the capacity defaults to the end of the array, which is 8. s2 is 5 from index 2 (closed interval) to index 6 (open interval, the element is really taken to index 5) of s1 and from capacity to index 7 (open interval, really to index 6).

Next, an element 100 is added to the tail of s2:

s2 = append(s2, 100)

The s2 capacity is just enough, so it can be added directly. However, this modifies the element at the corresponding position of the original array. This change can be seen in both array and s1.

Add element 200 to s2 again:

s2 = append(s2, 100)

At this time, the capacity of s2 is not enough, so it's time to expand the capacity. Therefore, s2 set up a new stove and copy the original elements to a new location to expand their capacity. In addition, in order to cope with another capacity expansion caused by possible append in the future, s2 will leave more buffer s during this capacity expansion to expand the new capacity to twice the original capacity, that is, 10.

Finally, modify the element whose s1 index is 2:

s1[2] = 20

This time, only the elements in the corresponding position of the original array will be affected. It doesn't affect s2 anymore. People have gone away.

In addition, when printing s1, only elements within the length of s1 will be printed. Therefore, only three elements will be printed, although its underlying array has more than three elements.

3. The slice is passed as a function parameter

As we mentioned earlier, slice is actually a structure that contains three members: len, cap and array. Respectively represent the slice length, capacity and the address of the underlying data.

When slice is used as a function parameter, it is an ordinary structure. In fact, it is easy to understand: if slice is directly passed, in the view of the caller, the actual parameter slice will not be changed by the operation in the function; If the pointer of slice is passed, the original slice will be changed in the view of the caller.

Note that no matter whether the slice or slice pointer is passed, if the data of the slice underlying array is changed, the underlying data of the argument Slice will be reflected. Why can the data of the underlying array be changed? Good understanding: the underlying data is a pointer in the slice structure, although the slice structure itself will not be changed, that is, the underlying data address will not be changed; However, through the pointer to the underlying data, the underlying data of the slice can be changed. There is no problem.

You can get the address of the array through the array field of slice. In the code, the value of the underlying array element of slice is changed directly through an operation like s[i]=10.

In addition, it is worth noting that the function parameters of Go language are passed only by value, not by reference.

Take a look at a code snippet:

package main

func main() {
    s := []int{1, 1, 1}
    f(s)
    fmt.Println(s)
}

func f(s []int) {
    // i is just a copy and cannot change the value of the element in s
    /*for _, i := range s {
        i++
    }
    */

    for i := range s {
        s[i] += 1
    }
}

Run it and the program output:

[2 2 2]

The way of subscript reference really changes the underlying data of the original slice.

Here, when calling the f function, a copy of slice is passed, that is, in the f function, s is only a copy of s in the main function. Inside the f function, the effect on s will not change the s of the outer main function. As mentioned above, the slice structure itself will not change in the f function. Even if we change the underlying data of the slice by subscript reference, the underlying data address will remain unchanged.

To really change the outer slice, you can only assign the returned new slice to the original slice, or pass a pointer to the slice to the function. Let's take another example:

package main

import "fmt"

func myAppend(s []int) []int {
    // Although s is changed here, it will not affect the s of the outer function
    s = append(s, 100)
    return s
}

func myAppendPtr(s *[]int) {
    // Will change the outer layer s itself
    *s = append(*s, 100)
    return
}

func main() {
    s := []int{1, 1, 1}
    newS := myAppend(s)

    fmt.Println(s)
    fmt.Println(newS)

    s = newS

    myAppendPtr(&s)
    fmt.Println(s)
}

Operation results:

[1 1 1]
[1 1 1 100]
[1 1 1 100 100]

In the myAppend function, although s is changed, it is only a value transfer and will not affect the s of the outer layer. Therefore, the result printed in the first line is still [1].

newS is a new slice, which is based on s. Therefore, it prints the result after adding a 100: [1 100].

Finally, assign newS to s, and s really becomes a new slice. Then, pass an s pointer to the myAppendPtr function, and this time it is really changed: [1 100 100].

4. Expansion of slices

Generally, the expansion is only caused after adding elements to slice. The append element calls the append function.

Let's take a look at the prototype of the append function:

func append(slice []Type, elems ...Type) []Type

The parameter length of the append function is variable, so you can append multiple values to slice. You can also use Pass in slice and directly append a slice:

slice = append(slice, elem1, elem2)
slice = append(slice, anotherSlice...)

The return value of the append function is a new slice. The Go compiler does not allow you to call the append function without using the return value:

append(slice, elem1, elem2)
append(slice, anotherSlice...)

Therefore, the above usage is wrong and cannot be compiled.

Use append to append elements to slice, which is actually adding elements to the underlying array. However, the length of the underlying array is fixed. If the element pointed to by the index len-1 is the last element of the underlying array, it cannot be added.

At this time, slice will migrate to a new memory location, and the length of the new underlying array will increase, so that the new elements can be placed. At the same time, in order to cope with the append operation that may occur again in the future, a certain buffer is reserved for the length of the new underlying array, that is, the capacity of the new slice. Otherwise, every time you add elements, migration will occur, and the cost is too high.

The buffer size reserved by the new slice is regular. Most articles on the Internet describe this:

When the capacity of the original slice is less than 1024, the capacity of the new slice becomes twice that of the original slice; The original slice capacity exceeds 1024, and the new slice capacity becomes 1.25 times of the original.

Here's the conclusion: the above description is wrong.

To illustrate that the above rule is wrong, look at the following code:

package main

import "fmt"

func main() {
	s := make([]int, 0)

	oldCap := cap(s)

	for i := 0; i < 2048; i++ {
		s = append(s, i)

		newCap := cap(s)

		if newCap != oldCap {
			fmt.Printf("[%d -> %4d] cap = %-4d  |  after append %-4d  cap = %-4d\n",
				0, i-1, oldCap, i, newCap)
			oldCap = newCap
		}
	}
}

First, create an empty slice, and then continue to append new elements in a loop. Then record the change of capacity, and whenever the capacity changes, record the old capacity and the capacity after adding elements, and record the elements in the slice at this time. In this way, I can observe the capacity changes of new and old slices, so as to find out the law.

Operation results:

[0 ->   -1] cap = 0     |  after append 0     cap = 1
[0 ->    0] cap = 1     |  after append 1     cap = 2
[0 ->    1] cap = 2     |  after append 2     cap = 4
[0 ->    3] cap = 4     |  after append 4     cap = 8
[0 ->    7] cap = 8     |  after append 8     cap = 16
[0 ->   15] cap = 16    |  after append 16    cap = 32
[0 ->   31] cap = 32    |  after append 32    cap = 64
[0 ->   63] cap = 64    |  after append 64    cap = 128
[0 ->  127] cap = 128   |  after append 128   cap = 256
[0 ->  255] cap = 256   |  after append 256   cap = 512
[0 ->  511] cap = 512   |  after append 512   cap = 1024
[0 -> 1023] cap = 1024  |  after append 1024  cap = 1280
[0 -> 1279] cap = 1280  |  after append 1280  cap = 1696
[0 -> 1695] cap = 1696  |  after append 1696  cap = 2304

When the capacity of the old slice is less than 1024, the capacity of the new slice is indeed twice that of the old slice. At present, it is still correct.

However, when the capacity of the old slice is greater than or equal to 1024, the situation changes. When the element 1280 is added to the slice, the capacity of the old slice is 1280, and then it becomes 1696. The relationship between the two is not 1.25 times (1696 / 1280 = 1.325). After adding 1696, the new capacity 2304 is certainly not 1.25 times that of 1696.

It can be seen that the expansion strategy in various online articles is not correct. We directly move out the source code:

When adding elements to slice, if the capacity is not enough, the growslice function will be called, so let's look at its code directly.

// go 1.14.6 src/runtime/slice.go:76
// et:slice element type, old: old slice,cap: new minimum capacity required
func growslice(et *_type, old slice, cap int) slice {
	// ......
	newcap := old.cap
	doublecap := newcap + newcap
	if cap > doublecap {
		newcap = cap
	} else {
		if old.len < 1024 {
			newcap = doublecap
		} else {
			// Check 0 < newcap to detect overflow
			// and prevent an infinite loop.
			for 0 < newcap && newcap < cap {
				newcap += newcap / 4
			}
			// Set newcap to the requested cap when
			// the newcap calculation overflowed.
			if newcap <= 0 {
				newcap = cap
			}
		}
	}
	// ......
}

If you only look at the first half, the law of newcap mentioned in various articles on the Internet is right. The reality is that the latter part also makes a memory alignment for newcap, which is related to the memory allocation strategy. After memory alignment, the capacity of the new slice should be greater than or equal to 2 times or 1.25 times that of the old slice.

After that, apply for memory from the Go memory manager, copy the data in the old slice, and add the element of append to the new underlying array.

Finally, a new slice is returned to the caller of the growslice function. The length of this slice has not changed, but the capacity has increased.

Here are some examples:

1. Example 1

package main

import "fmt"

func main() {
    s := []int{5}
    s = append(s, 7)
    s = append(s, 9)
    x := append(s, 11)
    y := append(s, 12)
    fmt.Println(s, x, y)
}

Explanation:

code	Slice corresponding state
s := []int{5}	s has only one element, [5]
s = append(s, 7)	s capacity expansion, capacity becomes 2, [5, 7]
s = append(s, 9)	s expansion, capacity becomes 4, [5, 7, 9]. Note that at this time, the length of s is 3 and there are only 3 elements
x := append(s, 11)	Because the underlying array of s still has space, it will not be expanded. In this way, the underlying array becomes [5, 7, 9, 11]. Note that at this time, s = [5, 7, 9], the capacity is 4; x = [5, 7, 9, 11], capacity 4. Here s does not change
y := append(s, 12)	Here, an element is added at the end of the s element. Since the length of s is 3 and the capacity is 4, 12 is directly filled in where the index of the underlying array is 3. Results: s = [5, 7, 9], y = [5, 7, 9, 12], x = [5, 7, 9, 12], the length and capacity of X and y were 4

Therefore, the final execution result of the program is:

[5 7 9] [5 7 9 12] [5 7 9 12]

It should be noted here that after the append function is executed, a new slice will be returned, and it will not affect the incoming slice.

2. Example 2

For append, let's take a look at this example:

package main

import "fmt"

func main() {
    s := []int{1,2}
    s = append(s,4,5,6)
    fmt.Printf("len=%d, cap=%d",len(s),cap(s))
}

The operation result is:

len=5, cap=6

As summarized in various articles on the Internet: when the length of the slice is less than 1024, the capacity will be doubled each time. When element 4 is added, the capacity becomes 4; When adding element 5, it remains unchanged; When element 6 is added, the capacity is doubled to 8. The running result of the above code should be:

len=5, cap=8

This is obviously wrong. Let's take a closer look at why. Move out the source code again:

// go 1.14.6 src/runtime/slice.go:76
// et:slice element type, old: old slice,cap: new minimum capacity required
func growslice(et *_type, old slice, cap int) slice {
	// ......
	newcap := old.cap
	doublecap := newcap + newcap
	if cap > doublecap {
		newcap = cap
	} else {
		// ......
	}
	// ......
	// Specialize for common values of et.size.
	// For 1 we don't need any division/multiplication.
	// For sys.PtrSize, compiler will optimize division/multiplication into a shift by a constant.
	// For powers of 2, use a variable shift.
	switch {
	case et.size == 1:
		// ......
	case et.size == sys.PtrSize:
		// ......
		capmem = roundupsize(uintptr(newcap) * sys.PtrSize)
		// ......
		newcap = int(capmem / sys.PtrSize)
	case isPowerOfTwo(et.size):
		// ......
	default:
		// ......
	}
}

The parameters of this function are the type of element, the old slice and the minimum capacity required by the new slice.

In the example, s used to have only two elements. len and cap are both 2. After three elements are attached, the length becomes 5 and the minimum capacity becomes 5. That is, when calling the growslice function, the third parameter passed in should be 5. cap=5. On the one hand, doublecap is twice the capacity of the original slice, which is equal to 4. The first if condition is satisfied, so newcap becomes 5.

Then the roundupsize function is called and 40 is passed in. (in the code, ptrSize refers to the size of a pointer, and the size of int type is 8 on 64 bit machines)

Let's look at memory alignment and move out the code of roundupsize function:

// Returns size of the memory block that mallocgc will allocate if you ask for the size.
func roundupsize(size uintptr) uintptr {
	if size < _MaxSmallSize {
		if size <= smallSizeMax-8 {
			return uintptr(class_to_size[size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]])
		} else {
			//......
		}
	}
	//......
}

const _MaxSmallSize = 32768
const smallSizeMax = 1024
const smallSizeDiv = 8

Obviously, we will eventually return the result of this formula:

class_to_size[size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]]

These are two slice s about memory allocation in the Go source code. class_to_size get the object size divided by span through spanClass. And size_to_class8 means to get its spanClass through size.

var size_to_class8 = [smallSizeMax/smallSizeDiv + 1]uint8{0, 1, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 24, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31}

var class_to_size = [_NumSizeClasses]uint16{0, 8, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 288, 320, 352, 384, 416, 448, 480, 512, 576, 640, 704, 768, 896, 1024, 1152, 1280, 1408, 1536, 1792, 2048, 2304, 2688, 3072, 3200, 3456, 4096, 4864, 5376, 6144, 6528, 6784, 6912, 8192, 9472, 9728, 10240, 10880, 12288, 13568, 14336, 16384, 18432, 19072, 20480, 21760, 24576, 27264, 28672, 32768}

The size we pass in is equal to 40. So (size+smallSizeDiv-1)/smallSizeDiv = 5; Get size_ to_ The element with index 5 in the Class8 array is 4; Get class_ to_ The element with index 4 in size is 48. Finally, the capacity of the new slice is 6:

newcap = int(capmem / ptrSize) // 6

As for the origin of the above two magic arrays, it will not be expanded.

3. Example 3

What happens when you add elements to a slice of nil? Why?

In fact, nil slice or empty slice can obtain the expansion of the underlying array by calling the append function. Finally, mallocgc is called to apply for a piece of memory from Go's memory manager, and then assigned to the original nil slice or empty slice, and then changed into a "real" slice.

To sum up, this article introduces the relevant knowledge of array and slice, and focuses on the analysis of the expansion principle of slice. Understanding these is also helpful for us to properly use slice for development and avoid some slicing problems caused by the change of underlying array.

Reference article:

https://qcrao91.gitbook.io/go/shu-zu-he-qie-pian

https://gfw.go101.org/article/container.html

Added by refiking on Tue, 08 Mar 2022 01:32:30 +0200

Programming VIP