Go series: the underlying implementation of string

String


The Go standard library package builtin documents all built-in types. Its source is located in src/builtin/builtin.go, which describes string as follows:

  • string is the set of all strings of 8-bit bytes
  • The bytes are conventionally, but not necessarily, UTF-8 encoded text
  • Two further points are stated:
    • A string may be empty (length 0), but not nil
    • Values of string type are immutable
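
The points above can be checked with a short program. This is a minimal sketch; the helper name isEmpty is made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isEmpty (hypothetical helper) reports whether s has length 0.
// There is no nil check to make: `s == nil` does not compile for a string.
func isEmpty(s string) bool {
	return len(s) == 0
}

// validUTF8 reports whether the bytes of s form valid UTF-8.
func validUTF8(s string) bool {
	return utf8.ValidString(s)
}

func main() {
	var s string // the zero value is the empty string, not nil
	fmt.Println(isEmpty(s), s == "")

	// A string may hold arbitrary bytes; these two are not valid UTF-8.
	raw := "\xff\xfe"
	fmt.Println(len(raw), validUTF8(raw))
}
```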

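Data structure

In the runtime package (src/runtime/string.go), a string is represented by the stringStruct structure, which has two fields:

  • stringStruct.str: the address of the first byte of the string

  • stringStruct.len: the length of the string in bytes

  • The layout is similar to that of a slice, except that a slice also has a member representing its capacity

  • In practice, string and slice (to be exact, byte slice) values are often converted to each other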
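
To make the layout concrete, the sketch below mirrors the two fields in a local type and reads the length back through it. The type stringHeader and the helper names are assumptions made for this example; they are not part of the runtime API, and code like this should stay in experiments.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringHeader is a hypothetical local mirror of the runtime's stringStruct:
// a data pointer followed by a length. For illustration only.
type stringHeader struct {
	str unsafe.Pointer
	len int
}

// headerLen reads the length field by reinterpreting the string's memory.
func headerLen(s string) int {
	return (*stringHeader)(unsafe.Pointer(&s)).len
}

// stringWords and sliceWords return the size of a string value and of a
// []byte value, measured in machine words.
func stringWords() uintptr {
	return unsafe.Sizeof("") / unsafe.Sizeof(uintptr(0))
}

func sliceWords() uintptr {
	return unsafe.Sizeof([]byte(nil)) / unsafe.Sizeof(uintptr(0))
}

func main() {
	s := "Hello World"
	fmt.Println(headerLen(s) == len(s)) // the len field is what len(s) reports

	// A string value is pointer + length; a slice adds a capacity word.
	fmt.Println(stringWords(), sliceWords())
}
```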

String operations

Declaration

// A string variable can be declared and then assigned an initial value:
var str string
str = "Hello World"

// At run time, a string is built by filling in a stringStruct and then reinterpreting it as a string.
// The conversion in the runtime package looks like this (excerpt; stringStruct and findnull are runtime internals):
import "unsafe"

// gostringnocopy builds a string from the address of a NUL-terminated byte sequence without copying it.
func gostringnocopy(str *byte) string {
	// First construct the stringStruct; findnull returns the number of bytes before the terminating NUL.
	ss := stringStruct{str: unsafe.Pointer(str), len: findnull(str)}
	// Then reinterpret the stringStruct as a string.
	s := *(*string)(unsafe.Pointer(&ss))
	return s
}
  • string
    • Inside the runtime package it is represented as stringStruct
    • Outside the runtime it is presented as string

[]byte to string

func GetStringBySlice(s []byte) string {
	return string(s)
}
  • This conversion requires a memory copy
  • The conversion proceeds as follows:
    • Allocate memory according to the length of the slice
      • Suppose the address of the new memory is p and the slice length is len(s)
    • Build the string
      • (string.str = p; string.len = len(s))
    • Copy the data (the bytes in the slice are copied into the newly allocated memory)
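
The effect of the copy can be observed from ordinary code: changing the slice after the conversion leaves the string untouched. A minimal sketch, with bytesToString as a made-up helper name:

```go
package main

import "fmt"

// bytesToString (hypothetical helper) uses the built-in conversion,
// which gives the string its own copy of the bytes.
func bytesToString(b []byte) string {
	return string(b)
}

func main() {
	b := []byte("hello")
	s := bytesToString(b)

	b[0] = 'j' // modify the slice after the conversion

	fmt.Println(s)         // hello: the string kept its own copy
	fmt.Println(string(b)) // jello
}
```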

[Figure: []byte to string conversion]

string to []byte

func GetSliceByString(str string) []byte {
	return []byte(str)
}
  • This conversion also requires a memory copy
  • The process is as follows:
    • Allocate memory for the slice
    • Copy the string's bytes into the slice
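
Because the slice is a copy, it can be edited freely without affecting the original string. The sketch below uses that to build a changed string; upperFirst is a hypothetical helper and handles ASCII letters only.

```go
package main

import "fmt"

// upperFirst (hypothetical helper) returns s with its first byte
// upper-cased. It only handles ASCII lower-case letters.
func upperFirst(s string) string {
	if s == "" {
		return s
	}
	b := []byte(s) // b is a copy of the string's bytes
	if b[0] >= 'a' && b[0] <= 'z' {
		b[0] -= 'a' - 'A'
	}
	return string(b)
}

func main() {
	orig := "hello"
	fmt.Println(upperFirst(orig)) // Hello
	fmt.Println(orig)             // hello: the original is unchanged
}
```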

[Figure: string to []byte conversion]

String concatenation

// Strings can be concatenated easily with the + operator:
str := "Str1" + "Str2" + "Str3"
  • Even when many strings are concatenated in one expression, performance stays good
    • The memory for the new string is allocated once
    • The cost is therefore dominated by copying the data
  • For a single concatenation expression whose operands are not all constants (an all-constant expression such as the one above is folded by the compiler)
    • The compiler collects the operands into a slice
    • The concatenation traverses the slice twice
    • The first pass computes the total length and allocates memory accordingly
    • The second pass copies the strings in one by one
// Pseudocode for string concatenation:
func concatstring(a []string) string {
	// Total length of the concatenated string
	length := 0

	for _, str := range a {
		length += len(str)
	}

	// Allocate a string of the given size; rawstring returns a string and a slice that share the same memory
	s, b := rawstring(length)

	for _, str := range a {
		// The string cannot be modified directly, so the data is written through the slice
		copy(b, str)
		b = b[len(str):]
	}
	return s
}
// Because the string s cannot be modified directly,
// rawstring() is used to allocate a string of the specified size.
// It also returns a slice that shares the same memory,
// so copying data into the slice indirectly fills in the string.



// The source code of rawstring() is as follows:
// Generate a new string, and the returned string and slice share the same space
func rawstring(size int) (s string, b []byte) {
	p := mallocgc(uintptr(size), nil, false)

	stringStructOf(&s).str = p
	stringStructOf(&s).len = size

	*(*slice)(unsafe.Pointer(&b)) = slice{p, size, size}
	return
}
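
The same two-pass idea can be written in ordinary user code. The sketch below is not what the compiler emits for the + operator; it substitutes strings.Builder for the runtime-internal rawstring, and the function name concat is made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// concat (hypothetical helper) joins parts in two passes:
// pass 1 sums the lengths, pass 2 copies the data.
func concat(parts []string) string {
	total := 0
	for _, p := range parts {
		total += len(p)
	}

	var sb strings.Builder
	sb.Grow(total) // reserve the full capacity once, up front
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

func main() {
	fmt.Println(concat([]string{"Str1", "Str2", "Str3"}))
}
```

Reserving the capacity first means the copy loop never has to grow the buffer, which is the same reason the runtime computes the total length before allocating.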

Why does string not support modification?

  • In C++, a string (std::string) owns its memory, and modifying the string is supported

  • In the Go implementation, however

    • A string owns no memory of its own; it holds only a pointer to memory

      • The advantage is that string values are very lightweight
      • They can be passed around cheaply
      • There is no need to worry about memory copies
    • A string usually points to a string literal

      • String literals are stored in a read-only segment, not on the heap or the stack
      • This is why strings are defined to be unmodifiable
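
A small experiment shows both sides of this design: assigning a string copies only the pointer and the length, and a "modified" string is really a new one built from a byte copy. The helpers dataPtr and sameData are assumptions for this sketch; they rely on the pointer being the first word of a string value, as described in the data structure section.

```go
package main

import (
	"fmt"
	"unsafe"
)

// dataPtr (hypothetical helper) returns the first word of the string value,
// which is the pointer to its bytes. For illustration only.
func dataPtr(s string) unsafe.Pointer {
	return *(*unsafe.Pointer)(unsafe.Pointer(&s))
}

// sameData reports whether two strings point at the same first byte.
func sameData(a, b string) bool {
	return dataPtr(a) == dataPtr(b)
}

// lowerFirstH returns a new string with a leading 'H' replaced by 'h'.
// Writing s[0] = 'h' would not compile, so the edit goes through a copy.
func lowerFirstH(s string) string {
	buf := []byte(s)
	if len(buf) > 0 && buf[0] == 'H' {
		buf[0] = 'h'
	}
	return string(buf)
}

func main() {
	a := "Hello World"
	b := a // copies the pointer and the length, not the bytes
	fmt.Println(sameData(a, b))

	c := lowerFirstH(a)
	fmt.Println(a, "/", c, sameData(a, c))
}
```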

Does converting []byte to string always copy memory?

  • In some cases, converting a byte slice to string does not copy memory
  • Instead, a string is produced directly
  • Its pointer (string.str) points to the memory of the slice
  • For example, the compiler recognizes the following temporary uses:
    • Looking up a map with m[string(b)] (the map has string keys, and slice b is converted to string only for the lookup)
    • String concatenation, such as "<" + string(b) + ">"
    • String comparison: string(b) == "foo"
  • Because the string exists only for the duration of the expression, the byte slice cannot be modified while the string is still in use, so the string can never be seen to change. Copying memory to create a new string is therefore unnecessary in these cases
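
The three patterns look like this in practice. The function names are made up for the example, and whether the copy is actually skipped depends on the compiler; it can be measured with a benchmark or testing.AllocsPerRun rather than assumed.

```go
package main

import "fmt"

// lookup uses string(key) only as a temporary map key.
func lookup(m map[string]int, key []byte) (int, bool) {
	v, ok := m[string(key)]
	return v, ok
}

// wrap uses string(b) only as an operand of a concatenation.
func wrap(b []byte) string {
	return "<" + string(b) + ">"
}

// isFoo uses string(b) only as an operand of a comparison.
func isFoo(b []byte) bool {
	return string(b) == "foo"
}

func main() {
	m := map[string]int{"foo": 1}
	b := []byte("foo")

	v, ok := lookup(m, b)
	fmt.Println(v, ok)
	fmt.Println(wrap(b))
	fmt.Println(isFoo(b))
}
```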

How to choose between string and []byte

Because their data structures differ, the operations they support also differ, so the choice should follow the actual use case

  • Scenarios where string fits well

    • Scenarios requiring string comparison
    • Scenarios where a nil value is not needed
  • Scenarios where []byte fits well

    • Modifying the content, especially at a granularity of 1 byte
    • A function return value needs to be able to express nil
    • Scenarios requiring slice operations
  • Although string suits fewer scenarios than []byte

    • It is intuitive, so it is still used very widely in application code
    • []byte is used more in low-level implementations
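
Two of the []byte scenarios can be sketched as follows. Both function names are hypothetical, chosen only for this example: maskDigits edits bytes in place, and firstField uses a nil return to signal "not found", which a string return value could not express on its own.

```go
package main

import (
	"bytes"
	"fmt"
)

// maskDigits overwrites every ASCII digit in place.
// Byte-granular edits like this are where []byte is the natural choice.
func maskDigits(b []byte) {
	for i, c := range b {
		if c >= '0' && c <= '9' {
			b[i] = '*'
		}
	}
}

// firstField returns the bytes before the first comma, or nil if there is
// no comma, so "no field" and "empty field" stay distinguishable.
func firstField(line []byte) []byte {
	i := bytes.IndexByte(line, ',')
	if i < 0 {
		return nil
	}
	return line[:i]
}

func main() {
	card := []byte("card 1234")
	maskDigits(card)
	fmt.Println(string(card))

	fmt.Println(firstField([]byte("ab")) == nil) // no comma: nil
	fmt.Println(firstField([]byte(",b")) == nil) // empty field: not nil

	// Strings, by contrast, compare directly with ==.
	x, y := "foo", "foo"
	fmt.Println(x == y)
}
```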

Keywords: Go Back-end

Added by Person on Sat, 11 Dec 2021 03:13:06 +0200