Go Language Core 36 Lectures (Go Language Practice and Application 21): Learning Notes

43 | Data types in the bufio package (Part 2)

In the previous article, I mentioned that the main data types in the bufio package are Reader, Scanner, Writer and ReadWriter, and I focused on the bufio.Reader and bufio.Writer types. Today we continue with bufio.Reader.

Knowledge expansion

Question: what are the different reading methods of the bufio.Reader type?

The bufio.Reader type has many pointer methods for reading data. Four of them can serve as representatives of the different reading processes: Peek, Read, ReadSlice and ReadBytes.

The Peek method of a Reader value returns the first n unread bytes in its buffer, starting from the index position given by the read count.

When the buffer is not full and the number of unread bytes is less than n, the method calls the fill method to start filling the buffer. However, if an error occurred the last time the buffer was filled, it will not fill it again.

If the n given by the caller is greater than the length of the buffer, or the number of unread bytes in the buffer is still less than n after filling, the Peek method returns "a sequence of all unread bytes" as the first result value.

At the same time, it returns an error as the second result value. When n is greater than the length of the buffer, this is the value of the bufio.ErrBufferFull variable (hereinafter referred to as the buffer-full error), which indicates that even after the buffer has been compacted and filled, the request still cannot be satisfied. When the data ran short because filling hit an error such as io.EOF, that error is returned instead.

Only when none of the above conditions occurs can the Peek method return: "n bytes starting with the read count" and "nil indicating that no error has occurred".
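The two short-read cases can be checked with a small program. This is a minimal sketch, not part of the original notes: the helper names peekBeyondBuffer and peekPastEOF are made up for illustration, and the 16-byte buffer is chosen only because it is the smallest size bufio.NewReaderSize will create.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// peekBeyondBuffer (hypothetical helper) asks Peek for more bytes
// than the buffer can hold.
func peekBeyondBuffer() (data string, bufferFull bool) {
	// 16 bytes is the smallest buffer that bufio.NewReaderSize will create.
	r := bufio.NewReaderSize(strings.NewReader(strings.Repeat("x", 40)), 16)
	b, err := r.Peek(20)
	return string(b), err == bufio.ErrBufferFull
}

// peekPastEOF (hypothetical helper) asks Peek for more bytes than the
// underlying reader can supply.
func peekPastEOF() (data string, hitEOF bool) {
	r := bufio.NewReader(strings.NewReader("abc"))
	b, err := r.Peek(5)
	return string(b), err == io.EOF
}

func main() {
	data, full := peekBeyondBuffer()
	fmt.Printf("n > buffer size: got %d bytes, ErrBufferFull=%v\n", len(data), full)

	data, eof := peekPastEOF()
	fmt.Printf("short input: got %q, io.EOF=%v\n", data, eof)
}
```

In both cases the first result still carries whatever unread bytes were available, so a caller should look at the data as well as the error.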

The Peek method of the bufio.Reader type has a distinctive feature: even though it reads data from the buffer, it does not change the value of the read count.

This is not the case with the other reading methods of this type. Take the Read method as an example. It sometimes copies the unread bytes in the buffer into the byte slice represented by its parameter p, and immediately increases the read count by the number of bytes actually copied.
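A quick way to observe this is to call Peek twice and watch the Buffered method, which reports the number of unread bytes in the buffer. The sketch below is illustrative only; the helper name peekTwiceThenRead and the sample text are assumptions, and a Read call is included at the end purely for contrast.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// peekTwiceThenRead (hypothetical helper) peeks the same bytes twice and
// then consumes them with Read, reporting the unread count before and after.
func peekTwiceThenRead() (first, second string, beforeRead, afterRead int, err error) {
	r := bufio.NewReader(strings.NewReader("hello, bufio"))

	p, err := r.Peek(5)
	if err != nil {
		return
	}
	first = string(p)

	// Peek did not move the read count, so the same bytes come back.
	p, err = r.Peek(5)
	if err != nil {
		return
	}
	second = string(p)
	beforeRead = r.Buffered()

	// Read copies the bytes out and advances the read count.
	buf := make([]byte, 5)
	if _, err = r.Read(buf); err != nil {
		return
	}
	afterRead = r.Buffered()
	return
}

func main() {
	first, second, before, after, err := peekTwiceThenRead()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("first peek: %q, second peek: %q\n", first, second)
	fmt.Printf("unread bytes before Read: %d, after Read: %d\n", before, after)
}
```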

The method does this when there are unread bytes in the buffer. At other times, however, the read count of the value equals the write count, which means there are no unread bytes in the buffer.

When there are no unread bytes in the buffer, the Read method first checks whether the length of parameter p is greater than or equal to the length of the buffer. If so, it gives up filling the buffer altogether and instead reads data from the underlying reader directly into p. In other words, it bypasses the buffer completely and connects the supplier and the consumer of the data directly.
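The bypass can be made visible by watching how many bytes the underlying reader has left after each call. This is a rough sketch under stated assumptions: the helper name readLargeThenSmall, the 40-byte input and the 16-byte buffer are all chosen for illustration, and the second call with a short destination is included only for contrast.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readLargeThenSmall (hypothetical helper) reads once with a destination
// larger than the buffer and once with a smaller one. For each call it
// reports the bytes left in the source and the unread bytes in the buffer.
func readLargeThenSmall() (srcLeft1, buffered1, srcLeft2, buffered2 int, err error) {
	src := strings.NewReader(strings.Repeat("a", 40))
	r := bufio.NewReaderSize(src, 16)

	// len(large) >= buffer size and the buffer is empty:
	// the data goes straight from src into large.
	large := make([]byte, 32)
	if _, err = r.Read(large); err != nil {
		return
	}
	srcLeft1, buffered1 = src.Len(), r.Buffered()

	// len(small) < buffer size: the buffer is filled first,
	// then part of it is copied into small.
	small := make([]byte, 4)
	if _, err = r.Read(small); err != nil {
		return
	}
	srcLeft2, buffered2 = src.Len(), r.Buffered()
	return
}

func main() {
	s1, b1, s2, b2, err := readLargeThenSmall()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("after large read: source left=%d, buffered=%d\n", s1, b1)
	fmt.Printf("after small read: source left=%d, buffered=%d\n", s2, b2)
}
```

After the large read the buffer is still empty, while the small read pulls the remaining 8 bytes into the buffer and hands out only 4 of them.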

Note that the Peek method behaves differently in a similar situation (which of the two approaches is better depends on the specific use case).

The Peek method fills the buffer when the conditions are met, and it simply returns all unread bytes in the buffer when it finds that n is greater than the length of the buffer.

If the buffer we set up is very large, Peek may take relatively long to execute in this case, mainly because filling the buffer takes a long time.

According to the process performed by the fill method, it will try to fill the writable space in the buffer. However, in most cases, the Read method will not write data to the buffer, especially in the case described above, that is, there are no unread bytes in the buffer, and the length of parameter p is greater than or equal to the length of the buffer.

At this time, the method will read the data directly from the underlying reader, so the reading speed of the data has become the decisive factor in the execution time of the method in this case.

Of course, what I'm talking about here is just where time-consuming operations are more likely to occur in some cases. All conclusions should be based on the objective results of performance testing.

Let's return to the internal process of the Read method. If there are no unread bytes in the buffer, but the buffer is longer than parameter p, the method first resets the read count and the write count to 0, and then tries to fill the buffer from the beginning with data obtained from the underlying reader.

Note, however, that this attempt is made only once, regardless of whether any data is obtained and whether an error occurs. The fill method is different: as long as no error occurs, it retries a number of times when a read returns no data, so it is more likely to get some data.

However, the two methods have one thing in common: whenever they write obtained data into the buffer, they update the write count right away.
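One way to see the single attempt is to wrap the source in a reader that hands out one byte per call. The sketch below assumes iotest.OneByteReader from the standard testing/iotest package as that slow source; the helper names readOnce and peekFour are made up. Peek is shown next to Read because it keeps calling fill until it has the bytes it was asked for.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"testing/iotest"
)

// readOnce (hypothetical helper) calls Read with room for 4 bytes on a
// source that supplies only one byte per underlying read.
func readOnce() (int, error) {
	r := bufio.NewReader(iotest.OneByteReader(strings.NewReader("abcdef")))
	p := make([]byte, 4)
	return r.Read(p)
}

// peekFour (hypothetical helper) asks Peek for 4 bytes from the same
// kind of source.
func peekFour() (string, error) {
	r := bufio.NewReader(iotest.OneByteReader(strings.NewReader("abcdef")))
	b, err := r.Peek(4)
	return string(b), err
}

func main() {
	n, err := readOnce()
	fmt.Printf("Read returned %d byte(s), err=%v\n", n, err)

	s, err := peekFour()
	fmt.Printf("Peek returned %q, err=%v\n", s, err)
}
```

Read comes back with a single byte even though p has room for four, which is why callers that need an exact amount usually loop or use io.ReadFull.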

Now let's talk about the ReadSlice and ReadBytes methods. Broadly speaking, both methods keep reading data until they encounter the delimiter given by the caller.

The ReadSlice method first looks for a delimiter in the unread portion of its buffer. If it cannot be found and the buffer is not full, the method will first fill the buffer by calling the fill method, and then look for it again.

If an error occurs during the filling process, it will return the unread part of the buffer as a result and return the corresponding error value.

Note that during this process, it is possible that the delimiter still cannot be found even though the buffer has been filled completely.

In that case, the ReadSlice method returns the entire buffer (that is, the byte slice represented by the buf field) as the first result value, and the buffer-full error (that is, the value of the bufio.ErrBufferFull variable) as the second result value.

The buffer filled by the fill method must contain only unread bytes from beginning to end, so this is reasonable.

Of course, once the ReadSlice method finds the delimiter, it will cut out the corresponding byte slice containing the delimiter on the buffer and return the slice as the result value. This method correctly sets the value of the read count whether the delimiter is found or not.

For example, before returning all unread bytes in the buffer, or the byte slice representing the whole buffer, it assigns the value of the write count to the read count to indicate that there are no unread bytes left in the buffer.

If ReadSlice is a method that gives up easily, the ReadBytes method can be called quite persistent.
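Both outcomes of ReadSlice can be reproduced with a buffer that is shorter than the first line of the input. This is a small sketch with assumed names and data: the helper readSliceTwice, the sample text and the 16-byte buffer are chosen only for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readSliceTwice (hypothetical helper) calls ReadSlice twice on input whose
// first line is longer than the 16-byte buffer.
func readSliceTwice() (first string, firstFull bool, second string, left int, err error) {
	r := bufio.NewReaderSize(strings.NewReader("0123456789abcdefghij\nrest"), 16)

	// No '\n' within the first 16 bytes: the whole buffer comes back
	// together with the buffer-full error.
	line, err1 := r.ReadSlice('\n')
	// Convert right away: the slice is only valid until the next read.
	first, firstFull = string(line), err1 == bufio.ErrBufferFull

	// The next call refills the buffer and finds the delimiter.
	line, err = r.ReadSlice('\n')
	second = string(line)
	left = r.Buffered()
	return
}

func main() {
	first, full, second, left, err := readSliceTwice()
	fmt.Printf("1st call: %q, ErrBufferFull=%v\n", first, full)
	fmt.Printf("2nd call: %q, err=%v, unread bytes left: %d\n", second, err, left)
}
```

The second result includes the delimiter, and the unread count afterwards covers only the bytes that follow it.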

The ReadBytes method reads data from the buffer again and again by calling the ReadSlice method until the delimiter is found.

During this process, the ReadSlice method may return all the bytes it has read together with the buffer-full error, but the ReadBytes method always ignores that particular error and calls ReadSlice again, so that the latter keeps filling the buffer and looking for the delimiter in it.

This process will never end unless the error value returned by the ReadSlice method does not represent an error that the buffer is full, or it finds a delimiter.

If the search process is over, whether the delimiter is found or not, the ReadBytes method will assemble all bytes read in this process into a byte slice according to the reading order, and take it as the first result value. If the process ends because of an error, it will also take the error value as the second result value.
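The persistence of ReadBytes shows up when the same kind of input is read through a buffer that is too small for the first line. The following is a sketch only; the helper name readBytesAcrossRefills and the sample data are assumptions.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readBytesAcrossRefills (hypothetical helper) reads a line that is longer
// than the 16-byte buffer, then reads the tail that has no delimiter.
func readBytesAcrossRefills() (line, tail string, tailIsEOF bool, err error) {
	r := bufio.NewReaderSize(strings.NewReader("0123456789abcdefghij\nrest"), 16)

	// The buffer fills up before '\n' is seen, but ReadBytes keeps going
	// and assembles the pieces in reading order.
	b, err := r.ReadBytes('\n')
	if err != nil {
		return
	}
	line = string(b)

	// No delimiter in the remaining input: the bytes read so far are
	// returned together with the error that ended the search.
	b, tailErr := r.ReadBytes('\n')
	tail, tailIsEOF = string(b), tailErr == io.EOF
	return
}

func main() {
	line, tail, eof, err := readBytesAcrossRefills()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("line: %q\n", line)
	fmt.Printf("tail: %q, io.EOF=%v\n", tail, eof)
}
```

Unlike the result of ReadSlice, the slice returned by ReadBytes is a freshly assembled copy, so it stays valid after later reads.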

Among the many reading methods of the bufio.Reader type, both the ReadBytes method and the ReadLine method rely on the ReadSlice method. However, ReadLine has nothing special in its reading process, so I won't go into it here.

In addition, the ReadString method of this type completely depends on the ReadBytes method. The former only makes a simple type conversion on the result value returned by the latter.
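These dependencies can be observed from the outside. In the sketch below (helper names and data are made up for illustration), ReadLine reports a line that does not fit into a 16-byte buffer as a prefix, which mirrors the buffer-full case of ReadSlice, and ReadString returns the same content as ReadBytes, only as a string.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

const sample = "0123456789abcdefghij\nrest"

// readLineLong (hypothetical helper) reads one over-long line in two steps.
func readLineLong() (part1 string, prefix1 bool, part2 string, prefix2 bool, err error) {
	r := bufio.NewReaderSize(strings.NewReader(sample), 16)

	b, isPrefix, err := r.ReadLine()
	if err != nil {
		return
	}
	// Convert right away: the slice is only valid until the next read.
	part1, prefix1 = string(b), isPrefix

	b, isPrefix, err = r.ReadLine()
	if err != nil {
		return
	}
	part2, prefix2 = string(b), isPrefix
	return
}

// sameStringAndBytes (hypothetical helper) compares ReadString and ReadBytes
// on two readers over the same input.
func sameStringAndBytes() (bool, error) {
	s, err := bufio.NewReader(strings.NewReader(sample)).ReadString('\n')
	if err != nil {
		return false, err
	}
	b, err := bufio.NewReader(strings.NewReader(sample)).ReadBytes('\n')
	if err != nil {
		return false, err
	}
	return s == string(b), nil
}

func main() {
	p1, pre1, p2, pre2, err := readLineLong()
	fmt.Printf("%q isPrefix=%v, %q isPrefix=%v, err=%v\n", p1, pre1, p2, pre2, err)

	same, err := sameStringAndBytes()
	fmt.Printf("ReadString matches ReadBytes: %v, err=%v\n", same, err)
}
```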

Finally, I would like to remind you of a security issue that needs your attention. The Peek, ReadSlice and ReadLine methods of the bufio.Reader type may all leak the contents of the buffer.

This is mainly because they normally return byte slices that are cut directly from the buffer. I explained what a content leak is when discussing the bytes.Buffer type. You can go back and have a look.

Through the result values returned by these methods, the caller can access other parts of the buffer and even modify its contents. This is usually dangerous.
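Before the full demonstration, here is a compact sketch of the leak together with one possible precaution, copying the bytes out before passing them on. The precaution is my own suggestion rather than something from the lecture, and the helper name leakAndCopy and the sample data are made up.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// leakAndCopy (hypothetical helper) shows what a peeked slice exposes and
// how an independent copy behaves.
func leakAndCopy() (exposed, next, safeCopy string, err error) {
	r := bufio.NewReaderSize(strings.NewReader("secret-token;public"), 32)

	peeked, err := r.Peek(6)
	if err != nil {
		return
	}

	// Re-slicing up to the capacity reaches the rest of the internal buffer.
	exposed = string(peeked[:cap(peeked)][:r.Buffered()])

	// An independent copy has no access to the buffer.
	safe := make([]byte, len(peeked))
	copy(safe, peeked)
	safe[0] = 'X' // changes only the copy

	// Writing through the peeked slice changes what later reads return.
	peeked[0] = 'S'
	next, err = r.ReadString(';')
	return exposed, next, string(safe), err
}

func main() {
	exposed, next, safe, err := leakAndCopy()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("exposed through the peeked slice: %q\n", exposed)
	fmt.Printf("next read after modification: %q\n", next)
	fmt.Printf("modified copy: %q\n", safe)
}
```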

package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	comment := "Package bufio implements buffered I/O. " +
		"It wraps an io.Reader or io.Writer object, " +
		"creating another object (Reader or Writer) that " +
		"also implements the interface but provides buffering and " +
		"some help for textual I/O."
	basicReader := strings.NewReader(comment)
	fmt.Printf("The size of basic reader: %d\n", basicReader.Size())

	size := 300
	fmt.Printf("New a buffered reader with size %d ...\n", size)
	reader1 := bufio.NewReaderSize(basicReader, size)
	fmt.Println()

	fmt.Print("[ About 'Peek' method ]\n\n")
	// Example 1.
	peekNum := 38
	fmt.Printf("Peek %d bytes ...\n", peekNum)
	bytes, err := reader1.Peek(peekNum)
	if err != nil {
		fmt.Printf("error: %v\n", err)
	}
	fmt.Printf("Peeked contents(%d): %q\n", len(bytes), bytes)
	fmt.Printf("The number of unread bytes in the buffer: %d\n", reader1.Buffered())
	fmt.Println()

	fmt.Print("[ About 'Read' method ]\n\n")
	// Example 2.
	readNum := 38
	buf1 := make([]byte, readNum)
	fmt.Printf("Read %d bytes ...\n", readNum)
	n, err := reader1.Read(buf1)
	if err != nil {
		fmt.Printf("error: %v\n", err)
	}
	fmt.Printf("Read contents(%d): %q\n", n, buf1)
	fmt.Printf("The number of unread bytes in the buffer: %d\n", reader1.Buffered())
	fmt.Println()

	fmt.Print("[ About 'ReadSlice' method ]\n\n")
	// Example 3.
	fmt.Println("Reset the basic reader ...")
	basicReader.Reset(comment)
	fmt.Println("Reset the buffered reader ...")
	reader1.Reset(basicReader)
	fmt.Println()

	delimiter := byte('(')
	fmt.Printf("Read slice with delimiter %q...\n", delimiter)
	line, err := reader1.ReadSlice(delimiter)
	if err != nil {
		fmt.Printf("error: %v\n", err)
	}
	fmt.Printf("Read contents(%d): %q\n", len(line), line)
	fmt.Printf("The number of unread bytes in the buffer: %d\n", reader1.Buffered())
	fmt.Println()

	delimiter = byte('[')
	fmt.Printf("Read slice with delimiter %q...\n", delimiter)
	line, err = reader1.ReadSlice(delimiter)
	if err != nil {
		fmt.Printf("error: %v\n", err)
	}
	fmt.Printf("Read contents(%d): %q\n", len(line), line)
	fmt.Printf("The number of unread bytes in the buffer: %d\n", reader1.Buffered())
	fmt.Println()

	// Example 4.
	fmt.Println("Reset the basic reader ...")
	basicReader.Reset(comment)
	size = 200
	fmt.Printf("New a buffered reader with size %d ...\n", size)
	reader2 := bufio.NewReaderSize(basicReader, size)
	fmt.Println()

	delimiter = byte('[')
	fmt.Printf("Read slice with delimiter %q...\n", delimiter)
	line, err = reader2.ReadSlice(delimiter)
	if err != nil {
		fmt.Printf("error: %v\n", err)
	}
	fmt.Printf("Read contents(%d): %q\n", len(line), line)
	fmt.Printf("The number of unread bytes in the buffer: %d\n", reader2.Buffered())
	fmt.Println()

	fmt.Print("[ About 'ReadBytes' method ]\n\n")
	// Example 5.
	fmt.Println("Reset the basic reader ...")
	basicReader.Reset(comment)
	size = 200
	fmt.Printf("New a buffered reader with size %d ...\n", size)
	reader3 := bufio.NewReaderSize(basicReader, size)
	fmt.Println()

	delimiter = byte('[')
	fmt.Printf("Read bytes with delimiter %q...\n", delimiter)
	line, err = reader3.ReadBytes(delimiter)
	if err != nil {
		fmt.Printf("error: %v\n", err)
	}
	fmt.Printf("Read contents(%d): %q\n", len(line), line)
	fmt.Printf("The number of unread bytes in the buffer: %d\n", reader3.Buffered())
	fmt.Println()

	// Examples 6 and 7.
	fmt.Print("[ About contents leak ]\n\n")
	showContentsLeak(comment)
}

func showContentsLeak(comment string) {
	// Example 6.
	basicReader := strings.NewReader(comment)
	fmt.Printf("The size of basic reader: %d\n", basicReader.Size())

	size := len(comment)
	fmt.Printf("New a buffered reader with size %d ...\n", size)
	reader4 := bufio.NewReaderSize(basicReader, size)
	fmt.Println()

	peekNum := 7
	fmt.Printf("Peek %d bytes ...\n", peekNum)
	bytes, err := reader4.Peek(peekNum)
	if err != nil {
		fmt.Printf("error: %v\n", err)
	}
	fmt.Printf("Peeked contents(%d): %q\n", len(bytes), bytes)
	fmt.Println()

	// Just expand the previously obtained byte slice bytes,
	// You can use it to read or even modify subsequent content in the buffer.
	bytes = bytes[:cap(bytes)]
	fmt.Printf("The all of the contents in the buffer:\n%q\n", bytes)
	fmt.Println()

	blank := byte(' ')
	fmt.Println("Set blanks into the contents in the buffer ...")
	for _, i := range []int{55, 56, 57, 58, 66, 67, 68} {
		bytes[i] = blank
	}
	fmt.Println()

	peekNum = size
	fmt.Printf("Peek %d bytes ...\n", peekNum)
	bytes, err = reader4.Peek(peekNum)
	if err != nil {
		fmt.Printf("error: %v\n", err)
	}
	fmt.Printf("Peeked contents(%d):\n%q\n", len(bytes), bytes)
	fmt.Println()

	// Example 7.
	// The ReadSlice method has the same problem.
	delimiter := byte(',')
	fmt.Printf("Read slice with delimiter %q...\n", delimiter)
	line, err := reader4.ReadSlice(delimiter)
	if err != nil {
		fmt.Printf("error: %v\n", err)
	}
	fmt.Printf("Read contents(%d): %q\n", len(line), line)
	fmt.Println()

	line = line[:cap(line)]
	fmt.Printf("The all of the contents in the buffer:\n%q\n", line)
	fmt.Println()

	underline := byte('_')
	fmt.Println("Set underlines into the contents in the buffer ...")
	for _, i := range []int{89, 92, 103} {
		line[i] = underline
	}
	fmt.Println()

	peekNum = size
	fmt.Printf("Peek %d bytes ...\n", peekNum)
	bytes, err = reader4.Peek(peekNum)
	if err != nil {
		fmt.Printf("error: %v\n", err)
	}
	fmt.Printf("Peeked contents(%d): %q\n", len(bytes), bytes)
}

Summary

We have covered the data types in the bufio package at some length, with the focus on the bufio.Reader type.

The bufio.Reader type represents the reader that carries the buffer. Its value needs to accept an underlying reader when it is initialized, and the type of the latter must be the implementation of the io.Reader interface.

The buffer in a Reader value is in effect an intermediary for data storage that sits between the underlying reader on one side and the reading methods and their callers on the other. Generally, the reading methods of such a value first read data from its buffer and, when necessary, read some data from the underlying reader in advance and put it into the buffer for later use. Filling the buffer is usually done by the fill method of the value.

During the filling process, the fill method sometimes compresses the buffer. Among the many reading methods owned by the Reader value, there are four methods that can be used as representatives of different reading processes: Peek, Read, ReadSlice and ReadBytes.

The Peek method is characterized by the fact that it does not change the read count even though it reads data from the buffer. When the length of parameter p is greater than or equal to the length of the buffer and there are no unread bytes in the buffer, the Read method bypasses the buffer and asks the underlying reader for data directly.

The ReadSlice method looks for the given delimiter in the unread part of the buffer and fills the buffer if necessary.

If the delimiter still cannot be found after the buffer has been filled completely, the method returns the entire buffer as the first result value together with the buffer-full error.

The ReadBytes method fills the buffer again and again by calling the ReadSlice method and looks for the delimiter in it. This process continues until an unexpected error occurs or the delimiter is found.

The ReadLine method of the Reader value depends on its ReadSlice method, while its ReadString method depends entirely on the ReadBytes method.

In addition, it is worth noting that the Peek method, ReadSlice method and ReadLine method of the Reader value may cause the disclosure of the contents in its buffer.

Finally, let's talk about the bufio.Writer type. Writing the data held in the buffer of such a value to its underlying writer is mainly done by its Flush method.

All data writing methods of such values call the Flush method when necessary. Generally, these write methods first write data into the buffer of the value they belong to, and then increase the write count in that value. Sometimes, however, the Write method and the ReadFrom method also bypass the buffer and write data directly to the underlying writer.

Remember that although these write methods call the Flush method from time to time, it is always safest to explicitly call this method after writing all the data.
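The effect of Flush, and the bypass taken by large writes, can be seen by comparing the writer's buffered count with the length of the destination. This is a minimal sketch: the helper names bufferedWrite and largeWrite are made up, and the 16-byte buffer is chosen only to keep the numbers small.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// bufferedWrite (hypothetical helper) writes 4 bytes into a 16-byte buffer
// and reports where they are before and after Flush.
func bufferedWrite() (buffered, inDest, inDestAfterFlush int, err error) {
	var dest strings.Builder
	w := bufio.NewWriterSize(&dest, 16)

	if _, err = w.WriteString("abcd"); err != nil {
		return
	}
	// The data sits in the buffer; the destination has seen nothing yet.
	buffered, inDest = w.Buffered(), dest.Len()

	if err = w.Flush(); err != nil {
		return
	}
	inDestAfterFlush = dest.Len()
	return
}

// largeWrite (hypothetical helper) writes 32 bytes through an empty
// 16-byte buffer.
func largeWrite() (buffered, inDest int, err error) {
	var dest strings.Builder
	w := bufio.NewWriterSize(&dest, 16)

	// The data is larger than the buffer and the buffer is empty,
	// so it is passed straight to the underlying writer.
	if _, err = w.Write([]byte(strings.Repeat("z", 32))); err != nil {
		return
	}
	return w.Buffered(), dest.Len(), nil
}

func main() {
	b, d, f, err := bufferedWrite()
	fmt.Printf("small write: buffered=%d, in dest=%d, after Flush=%d, err=%v\n", b, d, f, err)

	b, d, err = largeWrite()
	fmt.Printf("large write: buffered=%d, in dest=%d, err=%v\n", b, d, err)
}
```

The small write reaches the destination only after Flush, which is why an explicit Flush at the end is the safe habit.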

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

func main() {
	comment := "Writer implements buffering for an io.Writer object. " +
		"If an error occurs writing to a Writer, " +
		"no more data will be accepted and all subsequent writes, " +
		"and Flush, will return the error. After all data has been written, " +
		"the client should call the Flush method to guarantee all data " +
		"has been forwarded to the underlying io.Writer."
	basicWriter1 := &strings.Builder{}

	size := 300
	fmt.Printf("New a buffered writer with size %d ...\n", size)
	writer1 := bufio.NewWriterSize(basicWriter1, size)
	fmt.Println()

	// Example 1.
	begin, end := 0, 53
	fmt.Printf("Write %d bytes into the writer ...\n", end-begin)
	writer1.WriteString(comment[begin:end])
	fmt.Printf("The number of buffered bytes: %d\n", writer1.Buffered())
	fmt.Printf("The number of unused bytes in the buffer: %d\n",
		writer1.Available())
	fmt.Println("Flush the buffer in the writer ...")
	writer1.Flush()
	fmt.Printf("The number of buffered bytes: %d\n", writer1.Buffered())
	fmt.Printf("The number of unused bytes in the buffer: %d\n",
		writer1.Available())
	fmt.Println()

	// Example 2.
	begin, end = 0, 326
	fmt.Printf("Write %d bytes into the writer ...\n", end-begin)
	writer1.WriteString(comment[begin:end])
	fmt.Printf("The number of buffered bytes: %d\n", writer1.Buffered())
	fmt.Printf("The number of unused bytes in the buffer: %d\n",
		writer1.Available())
	fmt.Println("Flush the buffer in the writer ...")
	writer1.Flush()
	fmt.Println()

	// Example 3.
	basicWriter2 := &bytes.Buffer{}
	fmt.Printf("Reset the writer with a bytes buffer(an implementation of io.ReaderFrom) ...\n")
	writer1.Reset(basicWriter2)
	reader := strings.NewReader(comment)
	fmt.Println("Read data from the reader ...")
	writer1.ReadFrom(reader)
	fmt.Printf("The number of buffered bytes: %d\n", writer1.Buffered())
	fmt.Printf("The number of unused bytes in the buffer: %d\n",
		writer1.Available())
}

Thinking questions

Today's question is: what are the main functions of the bufio.Scanner type? What are its characteristics?

Source code for these notes

https://github.com/MingsonZheng/go-core-demo

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

You are welcome to reprint, use and republish this article, provided that you keep the attribution to Zheng Ziming (including the link: http://www.cnblogs.com/MingsonZheng/ ), do not use it for commercial purposes, and distribute any works derived from this article under the same license.

Added by Snake PHP on Mon, 06 Dec 2021 02:45:15 +0200
