Preface
Before WebSocket, front ends usually talked to back ends by calling HTTP APIs with Ajax. For projects with real-time requirements, such as PvP battles in games or message push in chat rooms, the front end had to poll the back end at regular intervals. Polling too often puts heavy pressure on the back-end service, while polling too rarely hurts real-time performance. WebSocket provides two-way communication between a browser/client and a server, or between two servers: it keeps a long-lived connection open and lets either side push messages.
What is WebSocket
WebSocket, like HTTP, sits at layer 7 of the OSI model, supports two-way communication, and runs over TCP. WebSocket is not built entirely from scratch: a connection starts as an HTTP request that is then upgraded, so by default it uses port 80 (or 443 when upgraded from HTTPS). WebSocket Secure (wss) is the encrypted version of WebSocket (ws). The following figure shows how a WebSocket connection is established and used.
Quick start
(Note: this article uses Go, but the principles apply to any language.)
There are usually two ways to work with WebSocket in Go. One is to write the WebSocket server directly on top of the built-in net/http library; the other is to use the WebSocket library maintained by gorilla. GitHub address: gorilla/websocket (there are other WebSocket libraries as well; this is simply one in common use at present). The gorilla library ships a chat room demo, and this article starts from that example.
1. Start service
gorilla/websocket chat room README.md
    $ go get github.com/gorilla/websocket
    $ cd `go list -f '{{.Dir}}' github.com/gorilla/websocket/examples/chat`
    $ go run *.go
2. Open the browser page and enter http://localhost:8080/
As an example, I opened two pages, as shown in the following figure
Enter hello,myname is james in the left window. Both windows display the sentence at the same time. Press F12 to open the developer tools. Under WS in the Network tab, the left connection shows both an uplink and a downlink message, while the right connection shows only a downlink message. This confirms that the message was sent to the server over the left connection and then broadcast to all clients.
3. How to establish a connection
Similarly, open the request header in ws under the network tab in the F12 debugging window above
<img src="http://tva1.sinaimg.cn/large/8dfd1ceegy1gyecwl7xxrj211o0oigwp.jpg" alt="image.png" style="zoom: 25%;" /><img src="http://tva1.sinaimg.cn/large/8dfd1ceegy1gyeczxc6ukj20we0powna.jpg" alt="image.png" style="zoom:25%;" />
Request message:
    GET ws://localhost:8080/ws HTTP/1.1
    Host: localhost:8080
    Connection: Upgrade
    Pragma: no-cache
    Cache-Control: no-cache
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36
    Upgrade: websocket
    Origin: http://localhost:8080
    Sec-WebSocket-Version: 13
    Accept-Encoding: gzip, deflate, br
    Accept-Language: zh-CN,zh;q=0.9
    Cookie: Goland-cd273d2a=102d1f43-0418-4ea3-9959-2975794fdfe3
    Sec-WebSocket-Key: 2e1HXejEZhjvYEEVOEE79g==
    Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
- The GET request line shows a ws:// URL, unlike an ordinary HTTP request line, which shows /path
- Upgrade: websocket and Connection: Upgrade indicate that the connection should be upgraded from HTTP to WebSocket
- Sec-WebSocket-Key: 2e1HXejEZhjvYEEVOEE79g==, where 2e1HXejEZhjvYEEVOEE79g== is the base64 encoding of 16 random bytes. It identifies this handshake and is not used for encryption
- Sec-WebSocket-Version: 13 specifies the WebSocket protocol version
Reply message:
    HTTP/1.1 101 Switching Protocols
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Accept: WPaVPwi6nk4cFFxS8NJ3BIwAtNE=
101 indicates that the protocol of this connection is about to change. The new protocol is WebSocket, as specified by Upgrade: websocket.
For Sec-WebSocket-Accept: WPaVPwi6nk4cFFxS8NJ3BIwAtNE=, the server takes the Sec-WebSocket-Key value from the request header as-is (the base64 string, without decoding it), appends 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, computes the SHA-1 digest of the result, base64-encodes the digest, and puts it in the Sec-WebSocket-Accept field.
The main purpose of Sec-WebSocket-Key and Sec-WebSocket-Accept is therefore to prevent a connection from being upgraded to WebSocket by accident: the pair verifies the WebSocket handshake so that the server does not accept non-WebSocket clients (such as plain HTTP clients). For details, see RFC6455 or What is Sec-WebSocket-Key for?
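The Sec-WebSocket-Accept calculation is short enough to try out directly. The following is a minimal sketch using only the standard library; computeAcceptKey is a name made up for this illustration, and the sample key is the one used in RFC 6455 rather than the one captured above. Note that, per RFC 6455, the key string is concatenated as-is, without base64-decoding it.

```go
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
)

// websocketGUID is the fixed string that RFC 6455 defines for the handshake.
const websocketGUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

// computeAcceptKey (hypothetical helper) derives the Sec-WebSocket-Accept
// value from the client's Sec-WebSocket-Key. The key is used as the literal
// base64 string; it is not decoded first.
func computeAcceptKey(key string) string {
	sum := sha1.Sum([]byte(key + websocketGUID))
	return base64.StdEncoding.EncodeToString(sum[:])
}

func main() {
	// Sample key from RFC 6455, section 1.3.
	fmt.Println(computeAcceptKey("dGhlIHNhbXBsZSBub25jZQ=="))
	// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
}
```

A client can run the same calculation on the key it sent and compare the result with the header it gets back; a mismatch means the peer did not really perform a WebSocket handshake.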
4. gorilla/websocket code
The chat demo code above is good material for learning.
    var addr = flag.String("addr", ":8080", "http service address")

    func serveHome(w http.ResponseWriter, r *http.Request) {
        log.Println(r.URL)
        if r.URL.Path != "/" {
            http.Error(w, "Not found", http.StatusNotFound)
            return
        }
        if r.Method != "GET" {
            http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
            return
        }
        http.ServeFile(w, r, "home.html")
    }

    func main() {
        flag.Parse()
        hub := newHub()
        go hub.run()
        http.HandleFunc("/", serveHome)
        http.HandleFunc("/ws", func(w http.ResponseWriter, r *http.Request) {
            serveWs(hub, w, r)
        })
        err := http.ListenAndServe(*addr, nil)
        if err != nil {
            log.Fatal("ListenAndServe: ", err)
        }
    }
serveHome returns the HTML page of the chat room. serveWs accepts the client's request to upgrade from HTTP to WebSocket and handles the broadcast of chat messages to every ws connection in the room.
    <!DOCTYPE html>
    <html lang="en">
    <head>
    <title>Chat Example</title>
    <script type="text/javascript">
    window.onload = function () {
        var conn;
        var msg = document.getElementById("msg");
        var log = document.getElementById("log");

        function appendLog(item) {
            var doScroll = log.scrollTop > log.scrollHeight - log.clientHeight - 1;
            log.appendChild(item);
            if (doScroll) {
                log.scrollTop = log.scrollHeight - log.clientHeight;
            }
        }

        document.getElementById("form").onsubmit = function () {
            if (!conn) {
                return false;
            }
            if (!msg.value) {
                return false;
            }
            conn.send(msg.value);
            msg.value = "";
            return false;
        };

        if (window["WebSocket"]) {
            conn = new WebSocket("ws://" + document.location.host + "/ws");
            conn.onclose = function (evt) {
                var item = document.createElement("div");
                item.innerHTML = "<b>Connection closed.</b>";
                appendLog(item);
            };
            conn.onmessage = function (evt) {
                var messages = evt.data.split('\n');
                for (var i = 0; i < messages.length; i++) {
                    var item = document.createElement("div");
                    item.innerText = messages[i];
                    appendLog(item);
                }
            };
        } else {
            var item = document.createElement("div");
            item.innerHTML = "<b>Your browser does not support WebSockets.</b>";
            appendLog(item);
        }
    };
    </script>
    <style type="text/css">
    html {
        overflow: hidden;
    }
    ...
    </style>
    </head>
    <body>
    <div id="log"></div>
    <form id="form">
        <input type="submit" value="Send" />
        <input type="text" id="msg" size="64" autofocus />
    </form>
    </body>
    </html>
The above is home.html. The key parts are conn.onclose, which handles the connection closing; conn.onmessage, which handles received messages; and conn.send, which sends messages. There is also conn.onopen for handling connection establishment, which is not covered here.
    // serveWs handles websocket requests from the peer.
    func serveWs(hub *Hub, w http.ResponseWriter, r *http.Request) {
        conn, err := upgrader.Upgrade(w, r, nil)
        if err != nil {
            log.Println(err)
            return
        }
        client := &Client{hub: hub, conn: conn, send: make(chan []byte, 256)}
        client.hub.register <- client

        // Allow collection of memory referenced by the caller by doing all work in
        // new goroutines.
        go client.writePump()
        go client.readPump()
    }
conn, err := upgrader.Upgrade(w, r, nil) upgrades the HTTP connection to WebSocket. Under the hood it uses http.Hijacker to take over the underlying TCP connection, which can then be used for two-way communication.
    // Hub maintains the set of active clients and broadcasts messages to the
    // clients.
    type Hub struct {
        // Registered clients.
        clients map[*Client]bool

        // Inbound messages from the clients.
        broadcast chan []byte

        // Register requests from the clients.
        register chan *Client

        // Unregister requests from clients.
        unregister chan *Client
    }
Hub maintains all the connections in the room. When a client connects it is added to the clients map, and when its connection is closed it is removed from the map.
go client.readPump() reads the messages sent by this client and passes them to the hub's broadcast channel. The hub then queues each broadcast message on the send channel of every client recorded in clients, and go client.writePump() writes the messages queued on send to that client's connection. The details of the WebSocket protocol come next; after that we will continue with the source code.
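The hub's run loop is not shown in this article. As a rough sketch of how such a loop could work, using only the standard library: the lowercase hub and client types below are stand-ins written for this illustration (the example's Client also holds the websocket connection), and dropping a client whose queue is full is one possible policy, assumed here for simplicity.

```go
package main

import "fmt"

// client is a stand-in for the example's Client; only the outbound queue
// matters for this sketch.
type client struct {
	send chan []byte
}

type hub struct {
	clients    map[*client]bool
	broadcast  chan []byte
	register   chan *client
	unregister chan *client
}

func newHub() *hub {
	return &hub{
		clients:    make(map[*client]bool),
		broadcast:  make(chan []byte),
		register:   make(chan *client),
		unregister: make(chan *client),
	}
}

// run owns the clients map, so no mutex is needed: every change to the map
// and every broadcast goes through this single goroutine.
func (h *hub) run() {
	for {
		select {
		case c := <-h.register:
			h.clients[c] = true
		case c := <-h.unregister:
			if h.clients[c] {
				delete(h.clients, c)
				close(c.send)
			}
		case msg := <-h.broadcast:
			for c := range h.clients {
				select {
				case c.send <- msg:
				default:
					// The client's queue is full; drop the slow client.
					delete(h.clients, c)
					close(c.send)
				}
			}
		}
	}
}

func main() {
	h := newHub()
	go h.run()

	a := &client{send: make(chan []byte, 1)}
	b := &client{send: make(chan []byte, 1)}
	h.register <- a
	h.register <- b

	h.broadcast <- []byte("hello")
	fmt.Println(string(<-a.send), string(<-b.send))
}
```

Funnelling everything through channels into one goroutine is the design choice worth noticing: the map is never touched concurrently, so registration, removal and fan-out cannot race with each other.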
WebSocket protocol
The quick start above gave a first look at ws. How is the WebSocket communication format defined? This section introduces it.
Figure from rfc6455#section-5.2
FIN: 1 bit
1 indicates that this is the final fragment of a message; 0 means that more fragments follow
RSV1, RSV2, RSV3: 1 bit each
Normally all 0. If the client and server negotiate an extension, the meaning of these bits is defined by that extension
Opcode: 4 bits
The Opcode value determines how the payload data that follows should be interpreted
- %x0: a continuation frame. An opcode of 0 means that the message is fragmented and the frame just received is one of its fragments.
- %x1: a text frame
- %x2: a binary frame
- %x3-7: reserved for non-control frames to be defined later.
- %x8: a connection close frame.
- %x9: a ping.
- %xA: a pong.
- %xB-F: reserved for control frames to be defined later
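FIN, the three RSV bits and the opcode all live in the first byte of a frame. A small sketch of how that byte can be taken apart follows; frameHead, parseFirstByte and isControl are names chosen for this illustration, not part of any library.

```go
package main

import "fmt"

// Opcode values from RFC 6455, section 5.2.
const (
	opContinuation = 0x0
	opText         = 0x1
	opBinary       = 0x2
	opClose        = 0x8
	opPing         = 0x9
	opPong         = 0xA
)

// frameHead holds the fields packed into the first byte of a frame.
type frameHead struct {
	fin    bool
	rsv    byte // RSV1..RSV3 as a 3-bit value
	opcode byte
}

// parseFirstByte splits the first byte into FIN (top bit), RSV (next three
// bits) and opcode (low four bits).
func parseFirstByte(b byte) frameHead {
	return frameHead{
		fin:    b&0x80 != 0,
		rsv:    (b >> 4) & 0x07,
		opcode: b & 0x0F,
	}
}

// isControl reports whether the opcode belongs to the control range %x8-F.
func isControl(opcode byte) bool {
	return opcode&0x08 != 0
}

func main() {
	for _, b := range []byte{0x81, 0x01, 0x80, 0x89} {
		h := parseFirstByte(b)
		fmt.Printf("0x%02X fin=%v rsv=%d opcode=0x%X control=%v\n",
			b, h.fin, h.rsv, h.opcode, isControl(h.opcode))
	}
}
```

For instance 0x81 is a complete text message in one frame (FIN set, opcode 1), while 0x89 is a ping.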
Mask: 1 bit
Indicates whether the payload data is masked. Masking is applied only to data sent from the client to the server.
If Mask is 1, a masking key is carried in the Masking-key field and is used to unmask the payload data. Mask is 1 for every frame sent from the client to the server.
Masking mainly exists to prevent cache poisoning: without it, a malicious client could trick an intermediary proxy into caching attacker-controlled content as a resource of another website, so that users of that website would be served the attacker's code.
Payload length: the length of the payload data, in bytes. 7 bits, 7+16 bits, or 7+64 bits.
Let x be the value of the 7-bit Payload length field:
- x is 0 ~ 125: the length of the data is x bytes.
- x is 126: the next 2 bytes represent a 16-bit unsigned integer whose value is the length of the data.
- x is 127: the next 8 bytes represent a 64-bit unsigned integer (the most significant bit must be 0) whose value is the length of the data.
In addition, when the payload length occupies more than one byte, it is written in network byte order (big endian, most significant byte first).
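The three length encodings can be written down in a few lines. The sketch below is an illustration only (encodeLength and decodeLength are hypothetical helpers, and the Mask bit that shares the first length byte is left clear); it uses encoding/binary for the big-endian conversion.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errShort = errors.New("buffer too short for length field")

// encodeLength returns the length bytes of an unmasked frame for a payload
// of n bytes: 1 byte, 1+2 bytes or 1+8 bytes.
func encodeLength(n uint64) []byte {
	switch {
	case n <= 125:
		return []byte{byte(n)}
	case n <= 0xFFFF:
		out := []byte{126, 0, 0}
		binary.BigEndian.PutUint16(out[1:], uint16(n))
		return out
	default:
		out := make([]byte, 9)
		out[0] = 127
		binary.BigEndian.PutUint64(out[1:], n)
		return out
	}
}

// decodeLength reads the payload length from b, which must start at the
// second byte of a frame. It returns the length and the number of bytes
// the length field occupied.
func decodeLength(b []byte) (n uint64, used int, err error) {
	if len(b) == 0 {
		return 0, 0, errShort
	}
	switch x := b[0] & 0x7F; { // drop the Mask bit
	case x <= 125:
		return uint64(x), 1, nil
	case x == 126 && len(b) >= 3:
		return uint64(binary.BigEndian.Uint16(b[1:3])), 3, nil
	case x == 127 && len(b) >= 9:
		return binary.BigEndian.Uint64(b[1:9]), 9, nil
	}
	return 0, 0, errShort
}

func main() {
	for _, n := range []uint64{5, 126, 70000} {
		enc := encodeLength(n)
		dec, used, err := decodeLength(enc)
		fmt.Printf("%d -> % x (decoded %d from %d bytes, err=%v)\n", n, enc, dec, used, err)
	}
}
```

A payload of exactly 126 bytes already needs the 16-bit form, because the values 126 and 127 of the 7-bit field are reserved as markers.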
Masking key: 0 or 4 bytes (32 bit)
All data frames transmitted from the client to the server are masked. The Mask is 1 and carries a 4-byte masking key. If Mask is 0, there is no masking key.
Note: the payload length does not include the length of the masking key.
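Masking itself is a byte-wise XOR of the payload with the 4-byte key, repeated cyclically, as defined in RFC 6455. A minimal sketch (applyMask is a name made up here; the key and payload are the "Hello" sample from the RFC):

```go
package main

import "fmt"

// applyMask XORs data in place with the 4-byte masking key. Because XOR is
// its own inverse, the same call both masks and unmasks.
func applyMask(key [4]byte, data []byte) {
	for i := range data {
		data[i] ^= key[i%4]
	}
}

func main() {
	key := [4]byte{0x37, 0xfa, 0x21, 0x3d}
	payload := []byte("Hello")

	applyMask(key, payload)
	fmt.Printf("% x\n", payload) // prints "7f 9f 4d 51 58"

	applyMask(key, payload)
	fmt.Println(string(payload)) // prints "Hello"
}
```

Since the key travels in the same frame, masking offers no confidentiality; its job is only to make the bytes on the wire unpredictable to intermediaries.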
Payload data: (x+y) bytes
Payload data: consists of extension data followed by application data, where the extension data is x bytes and the application data is y bytes.
Extension data: 0 bytes if no extension was negotiated. Every extension must declare the length of its extension data, or how that length is calculated. How an extension is used must also be negotiated during the handshake. If extension data is present, the payload length includes its length.
Application data: occupies the rest of the frame after the extension data (if any). Its length is the payload length minus the length of the extension data.
Example
You may be wondering how FIN and the continuation opcode work together.
First message
FIN=1, indicating the last data frame of the current message. After receiving the current data frame, the server can process the message. opcode=0x1, indicating that the client sends text type.
Second message
- FIN=0, opcode=0x1, indicating that the message is sent in text type, and the message has not been sent yet, and there are subsequent data frames.
- FIN=0, opcode=0x0, indicating that the message has not been sent, and there are subsequent data frames. The current data frame needs to be connected after the previous data frame.
- FIN=1, opcode=0x0, indicating that the message has been sent and there is no subsequent data frame. The current data frame needs to be connected after the previous data frame. The server can assemble the associated data frames into a complete message.
    Client: FIN=1, opcode=0x1, msg="hello"
    Server: (process complete message immediately) Hi.
    Client: FIN=0, opcode=0x1, msg="and a"
    Server: (listening, new message containing text started)
    Client: FIN=0, opcode=0x0, msg="happy new"
    Server: (listening, payload concatenated to previous message)
    Client: FIN=1, opcode=0x0, msg="year!"
    Server: (process complete message) Happy new year to you too!
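The receiving side of this exchange can be sketched as a small state machine. The frame type and the assemble function below are invented for this illustration; control frames are simply skipped and payloads are treated as already unmasked, which a real implementation could not assume.

```go
package main

import (
	"errors"
	"fmt"
)

// frame is a simplified, already-decoded data frame.
type frame struct {
	fin     bool
	opcode  byte
	payload []byte
}

// assemble joins data frames into complete messages, following the FIN and
// opcode rules described above.
func assemble(frames []frame) ([]string, error) {
	var (
		messages []string
		buf      []byte
		open     bool // true while a fragmented message is in progress
	)
	for _, f := range frames {
		switch {
		case f.opcode == 0x1 || f.opcode == 0x2:
			if open {
				return nil, errors.New("new data frame before the previous message finished")
			}
			buf = append(buf[:0], f.payload...)
		case f.opcode == 0x0:
			if !open {
				return nil, errors.New("continuation frame without a first frame")
			}
			buf = append(buf, f.payload...)
		default:
			continue // control frame, ignored in this sketch
		}
		open = !f.fin
		if f.fin {
			messages = append(messages, string(buf))
		}
	}
	if open {
		return nil, errors.New("stream ended in the middle of a message")
	}
	return messages, nil
}

func main() {
	msgs, err := assemble([]frame{
		{fin: true, opcode: 0x1, payload: []byte("hello")},
		{fin: false, opcode: 0x1, payload: []byte("and a")},
		{fin: false, opcode: 0x0, payload: []byte("happy new")},
		{fin: true, opcode: 0x0, payload: []byte("year!")},
	})
	fmt.Printf("%q %v\n", msgs, err)
}
```

The payloads are concatenated byte for byte, so the second message comes out as "and ahappy newyear!"; any spaces would have to be part of the fragments themselves.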
Source code analysis of gorilla/websocket
Read message
gorilla/websocket provides the helper method (c *Conn) ReadMessage() (messageType int, p []byte, err error) for reading a message quickly. If a message consists of multiple data frames, they are joined into one complete message before it is returned to the business layer.
    // ReadMessage is a helper method for getting a reader using NextReader and
    // reading from that reader to a buffer.
    func (c *Conn) ReadMessage() (messageType int, p []byte, err error) {
        var r io.Reader
        messageType, r, err = c.NextReader()
        if err != nil {
            return messageType, nil, err
        }
        p, err = ioutil.ReadAll(r)
        return messageType, p, err
    }
This method mainly obtains a Reader, and then reads all the data in the Reader
    func (c *Conn) NextReader() (messageType int, r io.Reader, err error) {
        // Close previous reader, only relevant for decompression.
        if c.reader != nil {
            c.reader.Close()
            c.reader = nil
        }

        c.messageReader = nil
        c.readLength = 0

        for c.readErr == nil {
            frameType, err := c.advanceFrame()
            if err != nil {
                c.readErr = hideTempErr(err)
                break
            }

            if frameType == TextMessage || frameType == BinaryMessage {
                c.messageReader = &messageReader{c}
                c.reader = c.messageReader
                if c.readDecompress {
                    c.reader = c.newDecompressionReader(c.reader)
                }
                return frameType, c.reader, nil
            }
        }
        // (the excerpt ends here; the rest of the function is not shown)
    }
A connection supports only one concurrent reader, so all reads on a connection should be done from a single goroutine. c.advanceFrame() is the core code; it parses the frame and determines its type. If a message is split across multiple frames, the message type is given in the first frame. The frame format it parses matches the protocol description above; see the source code for details, which are not repeated here.
You may be curious why ioutil.ReadAll(r) can read the data of a whole message. What if the message is split across multiple data frames? Let's look at the source code together.
    func (r *messageReader) Read(b []byte) (int, error) {
        c := r.c
        if c.messageReader != r {
            return 0, io.EOF
        }

        for c.readErr == nil {

            if c.readRemaining > 0 {
                if int64(len(b)) > c.readRemaining {
                    b = b[:c.readRemaining]
                }
                n, err := c.br.Read(b)
                c.readErr = hideTempErr(err)
                if c.isServer {
                    c.readMaskPos = maskBytes(c.readMaskKey, c.readMaskPos, b[:n])
                }
                rem := c.readRemaining
                rem -= int64(n)
                c.setReadRemaining(rem)
                if c.readRemaining > 0 && c.readErr == io.EOF {
                    c.readErr = errUnexpectedEOF
                }
                return n, c.readErr
            }

            if c.readFinal {
                c.messageReader = nil
                return 0, io.EOF
            }

            frameType, err := c.advanceFrame()
            switch {
            case err != nil:
                c.readErr = hideTempErr(err)
            case frameType == TextMessage || frameType == BinaryMessage:
                c.readErr = errors.New("websocket: internal error, unexpected text or binary in Reader")
            }
        }

        err := c.readErr
        if err == io.EOF && c.messageReader == r {
            err = errUnexpectedEOF
        }
        return 0, err
    }
ioutil.ReadAll(r) returns once it encounters io.EOF. The reader it reads from is the messageReader created in NextReader. Its Read method keeps looping until the final frame has been read and then returns io.EOF. If the final frame is slow to reach the server because of network problems, the method blocks until the deadline set with func (c *Conn) SetReadDeadline(t time.Time) error expires, at which point a timeout error is returned to the caller.
    if c.readFinal {
        c.messageReader = nil
        return 0, io.EOF
    }
Write message
If the data is large, it has to be split into multiple frames. The principle mirrors reading messages, so it is not repeated here.
    w, err := c.conn.NextWriter(websocket.TextMessage)
    if err != nil {
        return
    }
    w.Write(message)
    // Closing the writer (not the connection) flushes the message.
    if err := w.Close(); err != nil {
        return
    }
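As a rough picture of what splitting a large message means on the wire, here is a stand-alone sketch. It is not how gorilla/websocket chooses its frame boundaries (that depends on its internal write buffer); outFrame and fragment are names invented for this illustration, and the fixed maxPayload is an assumed parameter.

```go
package main

import "fmt"

// outFrame is a simplified frame about to be written.
type outFrame struct {
	fin     bool
	opcode  byte
	payload []byte
}

// fragment splits message into frames carrying at most maxPayload bytes.
// The first frame carries the real opcode, later frames use the
// continuation opcode 0x0, and only the last frame has FIN set.
func fragment(opcode byte, message []byte, maxPayload int) []outFrame {
	if maxPayload <= 0 || len(message) <= maxPayload {
		return []outFrame{{fin: true, opcode: opcode, payload: message}}
	}
	var frames []outFrame
	for start := 0; start < len(message); start += maxPayload {
		end := start + maxPayload
		if end > len(message) {
			end = len(message)
		}
		op := byte(0x0)
		if start == 0 {
			op = opcode
		}
		frames = append(frames, outFrame{
			fin:     end == len(message),
			opcode:  op,
			payload: message[start:end],
		})
	}
	return frames
}

func main() {
	for _, f := range fragment(0x1, []byte("happy new year!"), 6) {
		fmt.Printf("FIN=%v opcode=0x%X payload=%q\n", f.fin, f.opcode, f.payload)
	}
}
```

This is the mirror image of the reading side: the receiver sees a text frame with FIN=0, then continuation frames, and knows the message is complete when FIN=1 arrives.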
Keep connected - heartbeat mechanism
To detect whether the TCP connection between client and server is still alive, WebSocket uses a heartbeat mechanism. If no response arrives within the timeout, the connection is considered broken, so it is closed and its resources are released. The process is as follows
- Sender -> receiver: ping
- Receiver -> sender: pong
Ping and pong correspond to two WebSocket control frames, with opcodes 0x9 and 0xA respectively.
Analysis of gorilla/websocket code
    func (c *Client) writePump() {
        ticker := time.NewTicker(pingPeriod)
        defer func() {
            ticker.Stop()
            c.conn.Close()
        }()
        for {
            select {
            case message, ok := <-c.send:
                // Broadcast chat information, omitted
            case <-ticker.C:
                // Send heartbeat regularly
                c.conn.SetWriteDeadline(time.Now().Add(writeWait))
                if err := c.conn.WriteMessage(websocket.PingMessage, nil); err != nil {
                    return
                }
            }
        }
    }
If no Pong response arrives within the expected time, you can close the connection yourself and release its resources.
When a Ping or Pong control frame is received in the func (c *Conn) advanceFrame() (int, error) method, the handler registered on conn is called automatically
    case PongMessage:
        if err := c.handlePong(string(payload)); err != nil {
            return noFrame, err
        }
    case PingMessage:
        if err := c.handlePing(string(payload)); err != nil {
            return noFrame, err
        }
Closing remarks
This article introduced the WebSocket protocol and showed it in practice through the chat example of the gorilla/websocket library.
You may have noticed that the chat example above cannot be used as-is in production: with many real clients, the connections for one room may be spread across multiple servers. How would you adapt it?
- How should the mapping between clients and rooms be maintained? Replace the Hub in the example with Redis?
- How can message order be guaranteed? Give each room a distributed queue and a dedicated worker to process it?
Leave your thoughts and we'll discuss them together