Understand WebSocket thoroughly

Preface

Before WebSocket emerged, front ends and back ends usually communicated through HTTP APIs called via Ajax. For projects with real-time requirements, such as PvP battles in games or message push in chat rooms, the front end had to poll the back end at regular intervals. Polling too often puts heavy pressure on the back-end service, while polling too rarely hurts real-time performance. WebSocket provides two-way communication between a browser (or other client) and a server: it maintains a long-lived connection between the two, over which either side can push messages.

What is WebSocket

WebSocket, like HTTP, belongs to the seventh (application) layer of the OSI model, supports two-way communication, and runs over TCP. WebSocket is a protocol in its own right (RFC 6455), but a connection starts as an HTTP request that is then upgraded, so by default it uses the same ports as HTTP: 80 for ws, or 443 for wss when the upgrade happens over HTTPS. WebSocket Secure (wss) is the encrypted version of WebSocket (ws). The following figure shows the establishment and communication diagram of WebSocket.

Special attention should be paid to

Quick start

(Note: this article uses Go, but the principles apply to any language.)

There are usually two ways to write a WebSocket server in Go. One is to build it directly on Go's built-in net/http library, handling the handshake and framing yourself; the other is to use a third-party library such as gorilla/websocket, which is widely used but is not part of the official Go distribution. GitHub address: gorilla/websocket (other WebSocket libraries exist as well). The gorilla library provides a chat room demo, and this article starts with that example.

1. Start service

gorilla/websocket chat room README.md

$ go get github.com/gorilla/websocket
$ cd `go list -f '{{.Dir}}' github.com/gorilla/websocket/examples/chat`
$ go run *.go

2. Open the browser page and enter http://localhost:8080/

As an example, I opened two pages, as shown in the following figure

Enter "hello,myname is james" in the left window. Both windows display the sentence at the same time. Press F12 to open the developer tools: under the WS filter of the Network tab, the left connection shows both an uplink and a downlink message, while the right connection shows only a downlink message. This indicates that the message was sent to the server over the left connection and then broadcast to all clients.

3. How to establish a connection

Similarly, in the F12 developer tools above, open the request headers of the connection under the WS filter of the Network tab

Request message:

GET ws://localhost:8080/ws HTTP/1.1
Host: localhost:8080
Connection: Upgrade
Pragma: no-cache
Cache-Control: no-cache
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36
Upgrade: websocket
Origin: http://localhost:8080
Sec-WebSocket-Version: 13
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Cookie: Goland-cd273d2a=102d1f43-0418-4ea3-9959-2975794fdfe3
Sec-WebSocket-Key: 2e1HXejEZhjvYEEVOEE79g==
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits

The GET request line is shown by the developer tools with a full ws:// URL, unlike the plain /path of an ordinary HTTP request
Upgrade: websocket and Connection: Upgrade indicate that the client wants to upgrade the HTTP connection to WebSocket
Sec-WebSocket-Key: 2e1HXejEZhjvYEEVOEE79g== is the base64 encoding of 16 random bytes. It is a nonce used to verify the handshake, not for encryption
Sec-WebSocket-Version: 13 specifies the protocol version of WebSocket

Reply message:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: WPaVPwi6nk4cFFxS8NJ3BIwAtNE=

101 indicates that the protocol of this connection is being switched. The new protocol is WebSocket, as specified by Upgrade: websocket

Sec-WebSocket-Accept: WPaVPwi6nk4cFFxS8NJ3BIwAtNE= is computed by the server. It takes the Sec-WebSocket-Key value from the request header as-is (still base64 encoded, without decoding it), appends the fixed GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, computes the SHA-1 digest of the result, encodes the digest with base64, and fills it in the Sec-WebSocket-Accept field

Therefore, Sec-WebSocket-Key and Sec-WebSocket-Accept mainly serve to prevent accidental upgrades: they verify that both sides really performed a WebSocket handshake, so that a server does not accept non-WebSocket clients (such as plain HTTP clients). For details, see RFC 6455 or "What is Sec-WebSocket-Key for?"
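
The derivation of Sec-WebSocket-Accept can be sketched in a few lines of Go. This is a minimal illustration using only the standard library, not the code gorilla/websocket uses internally; the name computeAcceptKey is chosen here for illustration. The first key is the sample from RFC 6455, the second is the one captured in the request above.

```go
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
)

// websocketGUID is the fixed GUID defined by RFC 6455.
const websocketGUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

// computeAcceptKey (hypothetical name) derives the Sec-WebSocket-Accept
// value from the Sec-WebSocket-Key header. The key is used as received,
// still base64 encoded.
func computeAcceptKey(challengeKey string) string {
	sum := sha1.Sum([]byte(challengeKey + websocketGUID))
	return base64.StdEncoding.EncodeToString(sum[:])
}

func main() {
	// Sample key from RFC 6455; the RFC gives s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
	fmt.Println(computeAcceptKey("dGhlIHNhbXBsZSBub25jZQ=="))
	// Key from the request captured in this article.
	fmt.Println(computeAcceptKey("2e1HXejEZhjvYEEVOEE79g=="))
}
```

If the steps are followed correctly, the second printed line should match the Sec-WebSocket-Accept value in the reply message above.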

4. gorilla/websocket code

The chat demo code above is good material for learning

var addr = flag.String("addr", ":8080", "http service address")

func serveHome(w http.ResponseWriter, r *http.Request) {
    log.Println(r.URL)
    if r.URL.Path != "/" {
        http.Error(w, "Not found", http.StatusNotFound)
        return
    }
    if r.Method != "GET" {
        http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
        return
    }
    http.ServeFile(w, r, "home.html")
}

func main() {
    flag.Parse()
    hub := newHub()
    go hub.run()
    http.HandleFunc("/", serveHome)
    http.HandleFunc("/ws", func(w http.ResponseWriter, r *http.Request) {
        serveWs(hub, w, r)
    })
    err := http.ListenAndServe(*addr, nil)
    if err != nil {
        log.Fatal("ListenAndServe: ", err)
    }
}

serveHome returns the HTML page of the chat room, while serveWs accepts the client's request to upgrade from HTTP to WebSocket and handles the broadcasting of chat messages to all WebSocket connections in the room

<!DOCTYPE html>
<html lang="en">
<head>
<title>Chat Example</title>
<script type="text/javascript">
window.onload = function () {
    var conn;
    var msg = document.getElementById("msg");
    var log = document.getElementById("log");

    function appendLog(item) {
        var doScroll = log.scrollTop > log.scrollHeight - log.clientHeight - 1;
        log.appendChild(item);
        if (doScroll) {
            log.scrollTop = log.scrollHeight - log.clientHeight;
        }
    }

    document.getElementById("form").onsubmit = function () {
        if (!conn) {
            return false;
        }
        if (!msg.value) {
            return false;
        }
        conn.send(msg.value);
        msg.value = "";
        return false;
    };

    if (window["WebSocket"]) {
        conn = new WebSocket("ws://" + document.location.host + "/ws");
        conn.onclose = function (evt) {
            var item = document.createElement("div");
            item.innerHTML = "<b>Connection closed.</b>";
            appendLog(item);
        };
        conn.onmessage = function (evt) {
            var messages = evt.data.split('\n');
            for (var i = 0; i < messages.length; i++) {
                var item = document.createElement("div");
                item.innerText = messages[i];
                appendLog(item);
            }
        };
    } else {
        var item = document.createElement("div");
        item.innerHTML = "<b>Your browser does not support WebSockets.</b>";
        appendLog(item);
    }
};
</script>
<style type="text/css">
html {
    overflow: hidden;
}
...

</style>
</head>
<body>
<div id="log"></div>
<form id="form">
    <input type="submit" value="Send" />
    <input type="text" id="msg" size="64" autofocus />
</form>
</body>
</html>

The above is home.html. The key parts are conn.onclose, which handles the connection closing; conn.onmessage, which handles received messages; and conn.send, which sends messages. There is also conn.onopen for handling connection establishment, which is not covered here

// serveWs handles websocket requests from the peer.
func serveWs(hub *Hub, w http.ResponseWriter, r *http.Request) {
    conn, err := upgrader.Upgrade(w, r, nil)
    if err != nil {
        log.Println(err)
        return
    }
    client := &Client{hub: hub, conn: conn, send: make(chan []byte, 256)}
    client.hub.register <- client

    // Allow collection of memory referenced by the caller by doing all work in
    // new goroutines.
    go client.writePump()
    go client.readPump()
}

conn, err := upgrader.Upgrade(w, r, nil) upgrades the HTTP connection to WebSocket. Under the hood it uses http.Hijacker to take over the underlying TCP connection, which can then be used for two-way communication

// Hub maintains the set of active clients and broadcasts messages to the
// clients.
type Hub struct {
   // Registered clients.
   clients map[*Client]bool

   // Inbound messages from the clients.
   broadcast chan []byte

   // Register requests from the clients.
   register chan *Client

   // Unregister requests from clients.
   unregister chan *Client
}

Hub maintains all connections in the room. When a client connects, it is added to the clients map; when the connection is closed, it is removed from the clients map
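
To make the role of the Hub concrete, here is a simplified, self-contained sketch of a hub event loop modelled on the pattern of the chat example. It is an approximation for illustration, not a copy of the example's code: the client type here is reduced to a send channel and has no WebSocket connection.

```go
package main

import "fmt"

// client is a stand-in for the example's Client; only the send queue is kept.
type client struct {
	send chan []byte
}

type hub struct {
	clients    map[*client]bool
	broadcast  chan []byte
	register   chan *client
	unregister chan *client
}

func newHub() *hub {
	return &hub{
		clients:    make(map[*client]bool),
		broadcast:  make(chan []byte),
		register:   make(chan *client),
		unregister: make(chan *client),
	}
}

// run owns the clients map, so no mutex is needed: every change to the map
// arrives as a message on one of the channels.
func (h *hub) run() {
	for {
		select {
		case c := <-h.register:
			h.clients[c] = true
		case c := <-h.unregister:
			if h.clients[c] {
				delete(h.clients, c)
				close(c.send)
			}
		case msg := <-h.broadcast:
			for c := range h.clients {
				select {
				case c.send <- msg:
				default:
					// The client's queue is full: treat it as dead.
					delete(h.clients, c)
					close(c.send)
				}
			}
		}
	}
}

func main() {
	h := newHub()
	go h.run()
	a := &client{send: make(chan []byte, 1)}
	b := &client{send: make(chan []byte, 1)}
	h.register <- a
	h.register <- b
	h.broadcast <- []byte("hello")
	fmt.Printf("a got %q, b got %q\n", <-a.send, <-b.send)
}
```

Because a single goroutine owns the map, registration, removal, and broadcasting are serialized without explicit locking.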

go client.readPump() reads the messages sent by the client and writes them to the hub's broadcast channel. The hub then copies each broadcast message to the send channel of every client recorded in clients, and go client.writePump() writes the messages from that client's send channel to its WebSocket connection. The details of the WebSocket protocol are discussed next, before we continue with the source code.

WebSocket protocol

The previous section gave a quick introduction to ws. How is the WebSocket communication format defined? This section introduces it

Figure source: rfc6455#section-5.2

FIN: 1 bit

1 indicates that this frame is the final fragment of a message

RSV1, RSV2, RSV3: 1 bit each

Generally all 0. When the client and server negotiate the extension, the value is defined by the negotiation

Opcode: 4 bits

The Opcode value determines how the subsequent data payload should be parsed

%x0: represents a continuation frame. When Opcode is 0, the message is fragmented, and the current frame continues the fragments received before it.
%x1: indicates that this is a text frame
%x2: indicates that this is a binary frame
%x3-7: reserved for non-control frames to be defined in the future.
%x8: indicates that the connection is being closed.
%x9: indicates that this is a ping.
%xA: indicates that this is a pong.
%xB-F: reserved for control frames to be defined in the future

Mask: 1 bit

Whether the data payload is masked. Only frames sent from the client to the server are masked; frames sent from the server to the client are not.

If Mask is 1, a masking key is carried in the Masking-key field and is used to unmask the data payload. For all data frames sent from the client to the server, Mask is 1.

Masking mainly defends against cache poisoning. Without it, a malicious client could craft data that an intermediary (such as a caching or reverse proxy) mistakes for an HTTP response and caches as a resource of another website, so that users of that website would be served malicious content planted by the attacker.

Payload length: the length of the data payload, in bytes. 7 bits, 7+16 bits, or 7+64 bits.

Suppose the 7-bit Payload length field has the value x:

x is 0 ~ 125: the length of the data is x bytes.
x is 126: the next 2 bytes represent a 16-bit unsigned integer whose value is the length of the data.
x is 127: the next 8 bytes represent a 64-bit unsigned integer (the highest bit must be 0) whose value is the length of the data.

In addition, if the payload length occupies more than one byte, it is encoded in network byte order (big endian, most significant byte first).
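
The three length encodings can be illustrated with a small Go sketch. The function name encodePayloadLen is hypothetical, and the Mask bit (the top bit of the first returned byte) is left at 0 for simplicity.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodePayloadLen returns the length bytes of a frame header for a payload
// of n bytes, with the Mask bit left at 0.
func encodePayloadLen(n uint64) []byte {
	switch {
	case n <= 125:
		// The 7-bit field holds the length directly.
		return []byte{byte(n)}
	case n <= 0xFFFF:
		// 126 announces a 16-bit big-endian length.
		b := []byte{126, 0, 0}
		binary.BigEndian.PutUint16(b[1:], uint16(n))
		return b
	default:
		// 127 announces a 64-bit big-endian length.
		b := make([]byte, 9)
		b[0] = 127
		binary.BigEndian.PutUint64(b[1:], n)
		return b
	}
}

func main() {
	for _, n := range []uint64{5, 125, 126, 65535, 65536} {
		fmt.Printf("%6d bytes -> % x\n", n, encodePayloadLen(n))
	}
}
```

Note how 125 still fits in one byte, while 126 already needs the 16-bit form because the value 126 itself is reserved as a marker.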

Masking key: 0 or 4 bytes (32 bit)

All data frames transmitted from the client to the server are masked. The Mask is 1 and carries a 4-byte masking key. If Mask is 0, there is no masking key.

Note: the payload length does not include the length of the masking key.

Payload data: (x+y) bytes

Payload data: consists of extension data and application data, where the extension data is x bytes and the application data is y bytes.

Extension data: if no extension was negotiated, the extension data is 0 bytes. Every extension must declare the length of its extension data, or how that length can be calculated. In addition, how an extension is used must be negotiated during the handshake phase. If extension data exists, the payload length includes the length of the extension data.

Application data: the application data occupies the rest of the data frame after the extension data (if any). Its length is the payload length minus the length of the extension data
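
Putting the fields together, the following is a minimal sketch of a parser for a single frame held entirely in memory. It assumes no extensions were negotiated (so the payload is all application data) and ignores the RSV bits; the names frame and parseFrame are chosen for illustration and are not gorilla/websocket APIs. The sample bytes are the masked "Hello" text frame from RFC 6455.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

type frame struct {
	fin     bool
	opcode  byte
	masked  bool
	payload []byte // already unmasked
}

var errTruncated = errors.New("frame is truncated")

func parseFrame(b []byte) (frame, error) {
	if len(b) < 2 {
		return frame{}, errTruncated
	}
	f := frame{
		fin:    b[0]&0x80 != 0, // top bit of the first byte
		opcode: b[0] & 0x0F,    // low four bits of the first byte
		masked: b[1]&0x80 != 0, // top bit of the second byte
	}
	length := uint64(b[1] & 0x7F)
	pos := 2
	switch length {
	case 126: // 16-bit extended length follows
		if len(b) < pos+2 {
			return frame{}, errTruncated
		}
		length = uint64(binary.BigEndian.Uint16(b[pos:]))
		pos += 2
	case 127: // 64-bit extended length follows
		if len(b) < pos+8 {
			return frame{}, errTruncated
		}
		length = binary.BigEndian.Uint64(b[pos:])
		pos += 8
	}
	var key [4]byte
	if f.masked {
		if len(b) < pos+4 {
			return frame{}, errTruncated
		}
		copy(key[:], b[pos:pos+4])
		pos += 4
	}
	if uint64(len(b)-pos) < length {
		return frame{}, errTruncated
	}
	f.payload = append([]byte(nil), b[pos:pos+int(length)]...)
	if f.masked {
		// Unmasking XORs byte i of the payload with byte i mod 4 of the key.
		for i := range f.payload {
			f.payload[i] ^= key[i%4]
		}
	}
	return f, nil
}

func main() {
	raw := []byte{0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58}
	f, err := parseFrame(raw)
	fmt.Printf("fin=%v opcode=%#x masked=%v payload=%q err=%v\n",
		f.fin, f.opcode, f.masked, f.payload, err)
}
```

Masking and unmasking are the same XOR operation, so applying the loop a second time with the same key would restore the masked bytes.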

Example

Are you still unsure how FIN and the continuation opcode work together? The following example illustrates it.

First message

FIN=1, indicating the last data frame of the current message. After receiving the current data frame, the server can process the message. opcode=0x1, indicating that the client sends text type.

Second message

FIN=0, opcode=0x1, indicating that the message is sent in text type, and the message has not been sent yet, and there are subsequent data frames.
FIN=0, opcode=0x0, indicating that the message has not been sent, and there are subsequent data frames. The current data frame needs to be connected after the previous data frame.
FIN=1, opcode=0x0, indicating that the message has been sent and there is no subsequent data frame. The current data frame needs to be connected after the previous data frame. The server can assemble the associated data frames into a complete message.

Client: FIN=1, opcode=0x1, msg="hello"
Server: (process complete message immediately) Hi.
Client: FIN=0, opcode=0x1, msg="and a"
Server: (listening, new message containing text started)
Client: FIN=0, opcode=0x0, msg="happy new"
Server: (listening, payload concatenated to previous message)
Client: FIN=1, opcode=0x0, msg="year!"
Server: (process complete message) Happy new year to you too!
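
The exchange above can be modelled with a small reassembly sketch. The types frame and assembler are hypothetical and only cover data frames; control frames, which may legally arrive between fragments, are rejected here to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	opContinuation = 0x0
	opText         = 0x1
	opBinary       = 0x2
)

type frame struct {
	fin     bool
	opcode  byte
	payload string
}

// assembler collects the fragments of one message at a time.
type assembler struct {
	started bool
	buf     []byte
}

// push adds a frame and reports the complete message once FIN=1 arrives.
func (a *assembler) push(f frame) (msg string, complete bool, err error) {
	switch f.opcode {
	case opContinuation:
		if !a.started {
			return "", false, errors.New("continuation frame without a first fragment")
		}
	case opText, opBinary:
		if a.started {
			return "", false, errors.New("new data frame before the previous message finished")
		}
		a.started = true
	default:
		return "", false, fmt.Errorf("opcode %#x is not handled by this sketch", f.opcode)
	}
	a.buf = append(a.buf, f.payload...)
	if !f.fin {
		return "", false, nil // keep listening for more fragments
	}
	msg = string(a.buf)
	a.buf, a.started = a.buf[:0], false
	return msg, true, nil
}

func main() {
	a := &assembler{}
	frames := []frame{
		{true, opText, "hello"},
		{false, opText, "and a"},
		{false, opContinuation, " happy new "},
		{true, opContinuation, "year!"},
	}
	for _, f := range frames {
		msg, complete, err := a.push(f)
		fmt.Printf("fin=%v opcode=%#x -> complete=%v msg=%q err=%v\n",
			f.fin, f.opcode, complete, msg, err)
	}
}
```

Only the first fragment carries the message type; every later fragment uses opcode 0x0 and is appended to the message in progress.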

Source code analysis of gorilla/websocket

Read message

gorilla/websocket provides the helper method (c *Conn) ReadMessage() (messageType int, p []byte, err error) to quickly read a message. If a message consists of multiple data frames, they are joined into a complete message before it is returned to the business layer

// ReadMessage is a helper method for getting a reader using NextReader and
// reading from that reader to a buffer.
func (c *Conn) ReadMessage() (messageType int, p []byte, err error) {
    var r io.Reader
    messageType, r, err = c.NextReader()
    if err != nil {
        return messageType, nil, err
    }
    p, err = ioutil.ReadAll(r)
    return messageType, p, err
}

This method mainly obtains a Reader, and then reads all the data in the Reader

func (c *Conn) NextReader() (messageType int, r io.Reader, err error) {
   // Close previous reader, only relevant for decompression.
   if c.reader != nil {
      c.reader.Close()
      c.reader = nil
   }

   c.messageReader = nil
   c.readLength = 0

   for c.readErr == nil {
      frameType, err := c.advanceFrame()
      if err != nil {
         c.readErr = hideTempErr(err)
         break
      }

      if frameType == TextMessage || frameType == BinaryMessage {
         c.messageReader = &messageReader{c}
         c.reader = c.messageReader
         if c.readDecompress {
            c.reader = c.newDecompressionReader(c.reader)
         }
         return frameType, c.reader, nil
      }
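   }

   // (bookkeeping for repeated reads on a failed connection omitted)
   return noFrame, nil, c.readErr
}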

Since the Reader is not safe for concurrent use, read operations on a connection must be performed by one goroutine at a time. c.advanceFrame() is the core code: it parses the next frame and reports its type. If a message is divided into multiple frames, the message type is given in the first frame. The frame format it parses matches the protocol description above; see the source code for details, which are not repeated here.

Are you curious why ioutil.ReadAll(r) can read the data of the whole message, even when the message is divided into multiple data frames? Let's analyze the source code together

func (r *messageReader) Read(b []byte) (int, error) {
   c := r.c
   if c.messageReader != r {
      return 0, io.EOF
   }

   for c.readErr == nil {

      if c.readRemaining > 0 {
         if int64(len(b)) > c.readRemaining {
            b = b[:c.readRemaining]
         }
         n, err := c.br.Read(b)
         c.readErr = hideTempErr(err)
         if c.isServer {
            c.readMaskPos = maskBytes(c.readMaskKey, c.readMaskPos, b[:n])
         }
         rem := c.readRemaining
         rem -= int64(n)
         c.setReadRemaining(rem)
         if c.readRemaining > 0 && c.readErr == io.EOF {
            c.readErr = errUnexpectedEOF
         }
         return n, c.readErr
      }

      if c.readFinal {
         c.messageReader = nil
         return 0, io.EOF
      }

      frameType, err := c.advanceFrame()
      switch {
      case err != nil:
         c.readErr = hideTempErr(err)
      case frameType == TextMessage || frameType == BinaryMessage:
         c.readErr = errors.New("websocket: internal error, unexpected text or binary in Reader")
      }
   }

   err := c.readErr
   if err == io.EOF && c.messageReader == r {
      err = errUnexpectedEOF
   }
   return 0, err
}

ioutil.ReadAll(r) returns when it encounters io.EOF. The Read method above loops over frames until the last frame has been read, and only then returns io.EOF and sets c.messageReader back to nil so that NextReader can install a new reader. If the last frame is delayed on its way to the server due to network conditions, the method blocks until the deadline set with func (c *Conn) SetReadDeadline(t time.Time) error expires, at which point a timeout error is returned to the caller

 if c.readFinal {
     c.messageReader = nil
     return 0, io.EOF
}
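
The same idea can be shown in isolation. The sketch below is not gorilla's implementation: it uses hypothetical fragment and fragmentReader types that hold pre-parsed frames in memory, and it shows why io.ReadAll sees one continuous stream, since io.EOF is returned only after the final fragment has been consumed.

```go
package main

import (
	"fmt"
	"io"
)

// fragment is one already-parsed data frame of a message.
type fragment struct {
	final   bool
	payload []byte
}

// fragmentReader exposes a sequence of fragments as a single io.Reader.
type fragmentReader struct {
	pending []fragment // frames not yet started
	current []byte     // unread bytes of the frame in progress
	final   bool       // whether the frame in progress has FIN=1
}

func (r *fragmentReader) Read(b []byte) (int, error) {
	for {
		if len(r.current) > 0 {
			n := copy(b, r.current)
			r.current = r.current[n:]
			return n, nil
		}
		if r.final {
			// The last frame is exhausted: the message is complete.
			return 0, io.EOF
		}
		if len(r.pending) == 0 {
			// The stream ended before a FIN=1 frame arrived.
			return 0, io.ErrUnexpectedEOF
		}
		next := r.pending[0]
		r.pending = r.pending[1:]
		r.current, r.final = next.payload, next.final
	}
}

// readWhole reads a complete message the way ReadMessage does: with ReadAll.
func readWhole(frames []fragment) (string, error) {
	p, err := io.ReadAll(&fragmentReader{pending: frames})
	return string(p), err
}

func main() {
	msg, err := readWhole([]fragment{
		{false, []byte("and a")},
		{false, []byte(" happy new ")},
		{true, []byte("year!")},
	})
	fmt.Printf("%q %v\n", msg, err)
}
```

In the real library the next frame comes from the network rather than from a slice, which is why a delayed final frame makes Read block instead of returning an error immediately.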

Write message

If the data is large, it needs to be split into multiple frames. The principle is similar to reading messages, so it will not be repeated.

w, err := c.conn.NextWriter(websocket.TextMessage)
if err != nil {
   return
}
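w.Write(message)
// Close the writer (not the connection) to flush the message to the peer.
if err := w.Close(); err != nil {
   return
}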
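
As an illustration of the splitting step only, here is a sketch that cuts a message into frames of at most maxPayload bytes and sets FIN and Opcode as the protocol requires. It is a simplified model with hypothetical names (frame, fragmentMessage); it does not reflect how gorilla/websocket buffers and flushes data internally.

```go
package main

import "fmt"

const (
	opContinuation = 0x0
	opText         = 0x1
)

type frame struct {
	fin     bool
	opcode  byte
	payload []byte
}

// fragmentMessage splits msg into frames carrying at most maxPayload bytes.
// The first frame carries the real opcode, later frames use the continuation
// opcode, and only the last frame has FIN set.
func fragmentMessage(opcode byte, msg []byte, maxPayload int) []frame {
	if maxPayload <= 0 || len(msg) <= maxPayload {
		return []frame{{fin: true, opcode: opcode, payload: msg}}
	}
	var frames []frame
	for start := 0; start < len(msg); start += maxPayload {
		end := start + maxPayload
		if end > len(msg) {
			end = len(msg)
		}
		op := byte(opContinuation)
		if start == 0 {
			op = opcode
		}
		frames = append(frames, frame{fin: end == len(msg), opcode: op, payload: msg[start:end]})
	}
	return frames
}

func main() {
	for _, f := range fragmentMessage(opText, []byte("hello world"), 4) {
		fmt.Printf("FIN=%v opcode=%#x payload=%q\n", f.fin, f.opcode, f.payload)
	}
}
```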

Keep connected - heartbeat mechanism

In order to detect whether the TCP connection between the client and the server is still alive, WebSocket uses a heartbeat mechanism. If no response is received within the timeout period, the connection is considered broken, and it is closed and its resources are released. The process is as follows

Sender -> receiver: ping
Receiver -> sender: pong

The ping and pong operations correspond to two WebSocket control frames, whose opcodes are 0x9 and 0xA respectively.

Analysis of gorilla/websocket code

func (c *Client) writePump() {
    ticker := time.NewTicker(pingPeriod)
    defer func() {
        ticker.Stop()
        c.conn.Close()
    }()
    for {
        select {
        case message, ok := <-c.send:
            // Broadcast chat information, omitted
        case <-ticker.C:
            // Send heartbeat regularly
            c.conn.SetWriteDeadline(time.Now().Add(writeWait))
            if err := c.conn.WriteMessage(websocket.PingMessage, nil); err != nil {
                return
            }
        }
    }
}

If no Pong response is received within the expected time, you can actively close the connection and release resources.
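
The timeout side of the heartbeat can be modelled without any network code. In gorilla/websocket this is typically wired up with Conn.SetReadDeadline and Conn.SetPongHandler, so that every Pong pushes the read deadline forward. The sketch below only models that rule; the liveness type is hypothetical, times are expressed as durations since the connection opened, and the values of pongWait and pingPeriod are assumptions chosen so that pings are sent more often than the timeout.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// pongWait is how long the peer may stay silent (assumed value).
	pongWait = 60 * time.Second
	// pingPeriod must be shorter than pongWait, so that at least one
	// ping/pong round trip fits inside every timeout window.
	pingPeriod = pongWait * 9 / 10
)

// liveness tracks the deadline by which the next Pong must arrive.
type liveness struct {
	deadline time.Duration
}

func newLiveness() *liveness { return &liveness{deadline: pongWait} }

// onPong is what a Pong handler would do: extend the deadline.
func (l *liveness) onPong(at time.Duration) { l.deadline = at + pongWait }

// expired reports whether the connection should be considered dead.
func (l *liveness) expired(at time.Duration) bool { return at > l.deadline }

func main() {
	l := newLiveness()
	// A Pong arrives shortly after the first Ping was sent.
	l.onPong(pingPeriod + 20*time.Millisecond)
	fmt.Println(l.expired(2 * pingPeriod)) // false: still inside the window
	fmt.Println(l.expired(3 * pingPeriod)) // true: no further Pong arrived
}
```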

Of course, when a Ping/Pong control frame is received in the func (c *Conn) advanceFrame() (int, error) method, the handler registered on the conn is called automatically

case PongMessage:
        if err := c.handlePong(string(payload)); err != nil {
            return noFrame, err
        }
case PingMessage:
        if err := c.handlePing(string(payload)); err != nil {
            return noFrame, err
        }

Write at the end

This article introduced the WebSocket protocol and showed WebSocket in practice through the chat example of the gorilla/websocket library.

You may have noticed that the chat example above cannot be used as-is in a production environment: with many real clients, connections will be spread across multiple servers. How would it need to be transformed?

How to maintain the mapping between clients and rooms? Replace the Hub in the example with Redis?
How to guarantee message order? Give each room a distributed queue and a dedicated management thread to process it?

Leave your thoughts and we'll discuss them together

Keywords: Go Back-end http websocket

Added by MrJW on Sun, 16 Jan 2022 22:34:58 +0200
