WebSocket Fundamentals and Applications Series: Capturing WebSocket Packets

1 Why do we need WebSocket?

WebSocket was created to meet the growing demand for real-time communication on the Web.

On the traditional Web, real-time communication is usually achieved by sending HTTP requests continuously, a technique known as polling.

However, polling not only wastes bandwidth (HTTP headers are relatively large) but also consumes server CPU (the server must handle requests that carry no new information).



WebSocket largely solves the problems mentioned above.



2 Introduction to WebSocket

The WebSocket protocol was born in 2008 and became an international standard in 2011. All modern browsers support it.

It is a network technology, introduced alongside HTML5, for full-duplex communication between browser and server. It is an application-layer protocol that runs over TCP and reuses the HTTP handshake channel.

Its defining feature is that the server can actively push messages to the client, and the client can actively send messages to the server. It is a truly two-way conversation between equal peers, and a form of server push technology.

Other features include:

  1. It is built on TCP, so server-side implementation is relatively easy.

  2. It is highly compatible with HTTP. Its default ports are also 80 and 443, and the handshake phase uses HTTP, so the handshake is hard to block and can pass through various HTTP proxy servers.

  3. Low control overhead. Once the connection is established, the protocol-level header on each frame exchanged between client and server is small. Without extensions, the header of a server-to-client frame is only 2 to 10 bytes (depending on the payload length); a client-to-server frame carries an additional 4-byte masking key. HTTP, by contrast, carries a complete header with every request.

  4. It can send text or binary data.

  5. There is no same-origin restriction; the client can communicate with any server.

  6. The protocol identifier is ws (wss if encrypted), and the server address is a URL, for example ws://example.com:80/some/path.

  7. It supports extensions. The protocol defines an extension mechanism, so users can extend the protocol or implement custom subprotocols (for example, to support a custom compression algorithm).
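To make the overhead in item 3 concrete, here is a small sketch that computes the size of a frame header from the payload length and whether the frame is masked. The function name frameHeaderSize is hypothetical, and it assumes no extensions; the size rules are the ones described in section 3.3 (a 7-bit length, optionally followed by a 16-bit or 64-bit extended length, plus a 4-byte masking key on masked frames).

```javascript
// Hypothetical helper: number of header bytes in a WebSocket frame
// (no extensions) for a given payload length.
function frameHeaderSize(payloadLength, masked) {
  // Always 2 bytes: FIN/RSV/opcode byte + MASK/7-bit length byte.
  let size = 2;
  if (payloadLength > 65535) {
    size += 8; // 64-bit extended payload length (7-bit field = 127)
  } else if (payloadLength > 125) {
    size += 2; // 16-bit extended payload length (7-bit field = 126)
  }
  // Client-to-server frames carry a 4-byte masking key.
  if (masked) size += 4;
  return size;
}

console.log(frameHeaderSize(9, false));     // 2: small server-to-client frame
console.log(frameHeaderSize(9, true));      // 6: same payload sent by a client
console.log(frameHeaderSize(70000, false)); // 10: largest server-to-client header
```

The 2-to-10-byte range quoted above corresponds to the unmasked cases; masked frames range from 6 to 14 bytes.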

2.1 Relationship among WebSocket, HTTP and TCP

From the figure below, all we need to know is that HTTP, WebSocket and similar protocols sit at the top of the OSI model, in the application layer (layer 7), while the IP protocol works at the network layer (layer 3) and the TCP protocol at the transport layer (layer 4).

HTTP, WebSocket and other application-layer protocols all transmit data over TCP, so we can think of these higher-level protocols as wrappers around TCP. Because they all use TCP, every connection is established with TCP's three-way handshake and closed with its four-way teardown (the "four waves"); what differs is the content sent once the connection is up, and when the connection is closed.



2.2 HTML5 and WebSocket

The WebSocket API is part of the HTML5 standard, but this does not mean WebSocket can only be used in HTML pages or browser-based applications.

In fact, many languages, frameworks and servers provide WebSocket support, such as:

  • libwebsockets (libwebsockets.org), based on C

  • Socket.IO, based on Node.js

  • ws4py, based on Python

  • WebSocket++, based on C++

  • Apache support for WebSocket: the Apache module mod_proxy_wstunnel

  • Nginx support for WebSocket: see "NGINX as a WebSocket Proxy", "NGINX Announces Support for WebSocket Protocol" and "WebSocket proxying"

  • lighttpd support for WebSocket: mod_websocket

3 Examples and packet capture analysis

3.1 Introductory example

Let's look at a simple example to get an intuitive feel. It consists of a WebSocket server (Node.js) and a WebSocket client.


Server code

// Import the WebSocket module:
const WebSocket = require('ws');

// Reference the Server class:
const WebSocketServer = WebSocket.Server;

// Instantiate a server listening on port 3000:
const wss = new WebSocketServer({
  port: 3000
});

wss.on('connection', function (ws) {
  console.log(`[SERVER] connection()`);
  ws.on('message', function (message) {
    console.log(`[SERVER] Received: ${message}`);
    // Echo the message back to the client.
    ws.send(`message from server: ${message}`, (err) => {
      if (err) {
        console.log(`[SERVER] error: ${err}`);
      }
    });
  });
});

Client code

const WebSocket = require('ws');
const ws = new WebSocket('ws://localhost:3000/');

ws.on('open', function open() {
  console.log('[CLIENT]: open');
  // Send the message that the server echoes back (see the output below).
  ws.send('something');
});

ws.on('close', function close() {
  console.log('[CLIENT]: close');
});

ws.on('message', function incoming(data) {
  // Newer versions of ws deliver a Buffer; toString() prints it as text.
  console.log('[CLIENT]: Received:', data.toString());
});

ws.on('ping', function () {
  console.log('[CLIENT]: ping');
});

Output

Server output

[SERVER] connection()
[SERVER] Received: something

Client output

[CLIENT]: open
[CLIENT]: Received: message from server: something

3.2 How the connection is established, seen through packet capture

Tool preparation

  1. Install the Wireshark packet capture software.

  2. Under Capture, select the local loopback interface.

  3. Enter the filter tcp.port == 3000 (3000 is the WS service port).

This captures exactly the packets we want:



To better compare WebSocket's connection setup and data transmission with TCP and HTTP, let's first look at TCP and HTTP packets.

TCP packet capture

Server code

const net = require('net');

const server = net.createServer();

server.on('connection', (socket) => {
  socket.on('data', (data) => {
    console.log('Receive from client:', data.toString('utf8'));
    socket.write('Hello, I am from server.');
  });
});

server.listen(3000, () => {
  console.log('Server is listening on 3000');
});

Client code

const net = require('net');

const client = new net.Socket();

client.connect(3000, () => {
  console.log('Connected to server.');
  client.write('Hello, I am from client.');
});

client.on('data', (data) => {
  console.log('Receive from server:', data.toString('utf8'));
});

Packet capture results



A brief look at TCP flags:

The TCP header contains a FLAGS field with the following flags: SYN (synchronize), ACK (acknowledgment), PSH (push), FIN (finish), RST (reset) and URG (urgent).

Of these, the first five are the ones that matter in everyday analysis.

Their meanings are:

  • SYN means to establish a connection;

  • FIN means to close the connection;

  • ACK acknowledges received data;

  • PSH indicates that data is being pushed to the receiving application;

  • RST indicates connection reset.
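In a raw capture these flags appear together as bits of a single byte in the TCP header. As a sketch (the helper name decodeTcpFlags is hypothetical), the byte can be decoded with the standard bit values FIN=0x01, SYN=0x02, RST=0x04, PSH=0x08, ACK=0x10 and URG=0x20:

```javascript
// Hypothetical helper: decode a TCP flags byte into flag names.
// Bit values follow the TCP header definition (RFC 793).
const TCP_FLAGS = [
  ['FIN', 0x01],
  ['SYN', 0x02],
  ['RST', 0x04],
  ['PSH', 0x08],
  ['ACK', 0x10],
  ['URG', 0x20],
];

function decodeTcpFlags(flagsByte) {
  return TCP_FLAGS
    .filter(([, bit]) => (flagsByte & bit) !== 0)
    .map(([name]) => name);
}

console.log(decodeTcpFlags(0x02)); // [ 'SYN' ]: first handshake packet
console.log(decodeTcpFlags(0x12)); // [ 'SYN', 'ACK' ]: second handshake packet
console.log(decodeTcpFlags(0x18)); // [ 'PSH', 'ACK' ]: segment carrying data
console.log(decodeTcpFlags(0x11)); // [ 'FIN', 'ACK' ]: part of the teardown
```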

The following figure shows TCP's three-way handshake and four-way teardown (the "four waves").



HTTP packet capture

Server code

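const http = require('http');

const server = http.createServer();

server.on('request', (req, res) => {
  console.log('request ...');
  req.on('data', (data) => {
    console.log('data from client ', data.toString('utf-8'));
  });
  // Respond once the request body (if any) has been fully received;
  // res.end() completes the response so the client's callback fires.
  req.on('end', () => {
    res.write('Hello, I am Server');
    res.end();
  });
});

server.listen(3000, () => {
  console.log('Server is listening on 3000');
});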
Client code

const request = require('request');

request('', (err, response, body) => {
  console.log('Response:', body);
});

Connection setup and teardown are the same as in the plain TCP case; only the data transferred in between is now HTTP protocol data:



Back to WebSocket packet capture: how the connection is established

WebSocket reuses HTTP's handshake channel. Specifically, the client negotiates a protocol upgrade with the WebSocket server through an HTTP request. Once the upgrade is complete, all subsequent data exchange follows the WebSocket protocol.



Client: requesting a protocol upgrade

First, the client initiates a protocol upgrade request. As the capture shows, it uses the standard HTTP message format, and only the GET method is supported.



Connection: Upgrade indicates that the client wants to upgrade the protocol.

Upgrade: websocket indicates that the protocol to upgrade to is WebSocket.

Sec-WebSocket-Version: 13 indicates the WebSocket protocol version. If the server does not support this version, it must reply with a Sec-WebSocket-Version header listing the versions it does support.

Sec-WebSocket-Key pairs with the Sec-WebSocket-Accept header in the server's response. It provides basic protection against, for example, malicious or accidental connections.
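The request headers described above can be assembled by hand. The sketch below builds the text of such an upgrade request; buildUpgradeRequest and its host/path parameters are hypothetical names. The key is 16 random bytes encoded as Base64, as RFC 6455 requires, so it is always 24 characters long.

```javascript
const crypto = require('crypto');

// Hypothetical helper: build the text of a client's WebSocket upgrade request.
function buildUpgradeRequest(host, path = '/') {
  // Sec-WebSocket-Key: 16 random bytes, Base64-encoded (24 characters).
  const key = crypto.randomBytes(16).toString('base64');
  const lines = [
    `GET ${path} HTTP/1.1`, // the handshake must use GET
    `Host: ${host}`,
    'Connection: Upgrade',
    'Upgrade: websocket',
    'Sec-WebSocket-Version: 13',
    `Sec-WebSocket-Key: ${key}`,
  ];
  // The header block ends with an empty line (CRLF CRLF).
  return { key, request: lines.join('\r\n') + '\r\n\r\n' };
}

const { key, request } = buildUpgradeRequest('localhost:3000');
console.log(request);
```

The client keeps the generated key so it can check the server's Sec-WebSocket-Accept value against it.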

Server: responding to the protocol upgrade

The server's response is shown below. Status code 101 means Switching Protocols:



This completes the protocol upgrade, and the subsequent data interaction will follow the new protocol.

Calculating Sec-WebSocket-Accept

Sec-WebSocket-Accept is computed from the Sec-WebSocket-Key header of the client's request.

The calculation is as follows:

Concatenate Sec-WebSocket-Key with the fixed string 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, compute the SHA-1 digest of the result, and encode the digest as a Base64 string. In pseudocode:

toBase64( sha1( Sec-WebSocket-Key + 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 )  )

Let's verify the value returned above:

const crypto = require('crypto');

const magic = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';
const secWebSocketKey = '8cyP/EvUjJHSMbkOIHFU/w==';

// SHA-1 of key + magic string, with the raw digest encoded as Base64.
const secWebSocketAccept = crypto.createHash('sha1')
  .update(secWebSocketKey + magic)
  .digest('base64');

console.log(secWebSocketAccept);
// EiaKGKO0E/pC8vnArob263aS3XY=
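Combining this calculation with the 101 status line shown earlier, a server could build its handshake response roughly as follows. This is a sketch under stated assumptions: buildHandshakeResponse and computeAccept are hypothetical names, the request headers are assumed to be already parsed into an object with lowercase names (as Node's req.headers provides), and a real server needs more validation. The key in the usage line is the sample key from RFC 6455, whose expected accept value is s3pPLMBiTxaQ9kYGzzhZRbK+xOo=.

```javascript
const crypto = require('crypto');

// Fixed GUID from RFC 6455, used to compute Sec-WebSocket-Accept.
const GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

function computeAccept(key) {
  return crypto.createHash('sha1').update(key + GUID).digest('base64');
}

// Hypothetical helper: validate upgrade headers and build the response text.
// `headers` uses lowercase names, like Node's req.headers.
function buildHandshakeResponse(headers) {
  const upgrade = (headers['upgrade'] || '').toLowerCase();
  const connection = (headers['connection'] || '').toLowerCase();
  const key = headers['sec-websocket-key'];
  if (upgrade !== 'websocket' || !connection.includes('upgrade') || !key) {
    return null; // not a WebSocket upgrade request
  }
  if (headers['sec-websocket-version'] !== '13') {
    // Unsupported version: tell the client which version we speak.
    return 'HTTP/1.1 426 Upgrade Required\r\nSec-WebSocket-Version: 13\r\n\r\n';
  }
  return [
    'HTTP/1.1 101 Switching Protocols',
    'Upgrade: websocket',
    'Connection: Upgrade',
    `Sec-WebSocket-Accept: ${computeAccept(key)}`,
  ].join('\r\n') + '\r\n\r\n';
}

console.log(buildHandshakeResponse({
  upgrade: 'websocket',
  connection: 'Upgrade',
  'sec-websocket-version': '13',
  'sec-websocket-key': 'dGhlIHNhbXBsZSBub25jZQ==',
}));
```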

3.3 Data frame format

Data exchange between client and server depends on the definition of the data frame format, so before explaining data exchange itself, let's look at WebSocket's data frame format.

The smallest unit of communication between a WebSocket client and server is the frame; one or more frames form a complete message.

Sender: splits a message into one or more frames and sends them to the other end. Receiver: receives the frames and reassembles the related frames into a complete message.

Overview of data frame format

The overall format of a WebSocket data frame is shown below. Fields run from left to right and sizes are given in bits; for example, FIN and RSV1 occupy 1 bit each and opcode occupies 4 bits. Besides the payload data, the frame carries fields such as the opcode and the mask.

Frame format:

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-------+-+-------------+-------------------------------+
  |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
  |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
  |N|V|V|V|       |S|             |   (if payload len==126/127)   |
  | |1|2|3|       |K|             |                               |
  +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
  |     Extended payload length continued, if payload len == 127  |
  + - - - - - - - - - - - - - - - +-------------------------------+
  |                               |Masking-key, if MASK set to 1  |
  +-------------------------------+-------------------------------+
  | Masking-key (continued)       |          Payload Data         |
  +-------------------------------- - - - - - - - - - - - - - - - +
  :                     Payload Data continued ...                :
  + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
  |                     Payload Data continued ...                |
  +---------------------------------------------------------------+

Packet capture example:



Detailed explanation of data frame format

FIN: 1 bit.

If it is 1, it means it is the last fragment of the message. If it is 0, it means it is not the last fragment of the message.

RSV1, RSV2, RSV3: one bit each.

Normally all three bits are 0. When the client and server have negotiated a WebSocket extension, these bits may be non-zero, with meanings defined by the extension. If a non-zero value appears and no extension has been negotiated, it is a protocol error and the connection must be failed.

Opcode: 4 bits.

The opcode determines how the subsequent payload data should be interpreted. If the opcode is unknown, the receiver must fail the connection. The defined opcodes are:

  • %x0: a continuation frame. An opcode of 0 means the data is being sent in fragments and this frame is one of the fragments.

  • %x1: a text frame.

  • %x2: a binary frame.

  • %x3-7: reserved for future non-control frames.

  • %x8: a connection close frame.

  • %x9: a ping frame.

  • %xA: a pong frame.

  • %xB-F: reserved for future control frames.
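One compact way to keep this table straight in code is sketched below. describeOpcode and isControlFrame are hypothetical helpers that expect a 4-bit opcode; the second relies on the fact that every control opcode (0x8 to 0xF) has the highest of the four bits set.

```javascript
// Opcode values from RFC 6455, section 5.2.
const OPCODES = {
  0x0: 'continuation',
  0x1: 'text',
  0x2: 'binary',
  0x8: 'close',
  0x9: 'ping',
  0xa: 'pong',
};

// Hypothetical helper: name of a 4-bit opcode (0x0-0xF).
function describeOpcode(opcode) {
  if (opcode in OPCODES) return OPCODES[opcode];
  // 0x3-0x7 and 0xB-0xF are reserved; receiving one must fail the connection.
  return opcode >= 0x3 && opcode <= 0x7 ? 'reserved non-control' : 'reserved control';
}

// Control frames (close, ping, pong, reserved 0xB-0xF) have bit 0x8 set.
function isControlFrame(opcode) {
  return (opcode & 0x8) !== 0;
}

console.log(describeOpcode(0x9), isControlFrame(0x9)); // ping true
console.log(describeOpcode(0x1), isControlFrame(0x1)); // text false
```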

Mask: 1 bit.

Indicates whether the payload data is masked. Data sent from the client to the server must be masked; data sent from the server to the client must not be masked.



If the server receives a frame that is not masked, it must close the connection.

If Mask is 1, the Masking-key field contains a masking key, which is used to unmask the payload data. For every data frame sent from the client to the server, Mask is 1.

Payload length: the length of the payload data, in bytes. It occupies 7 bits, 7 + 16 bits, or 7 + 64 bits.

Let x be the value of the 7-bit Payload length field:

  • x is 0 to 125: the payload length is x bytes.

  • x is 126: the next 2 bytes are a 16-bit unsigned integer whose value is the payload length.

  • x is 127: the next 8 bytes are a 64-bit unsigned integer (whose most significant bit must be 0) whose value is the payload length.

In addition, when the payload length occupies multiple bytes, it is written in network byte order (big-endian, most significant byte first).

// From the frame function in the ws library's sender.js (excerpt)
let payloadLength = data.length;

if (data.length >= 65536) {
  offset += 8;          // 64-bit extended length follows
  payloadLength = 127;
} else if (data.length > 125) {
  offset += 2;          // 16-bit extended length follows
  payloadLength = 126;
}

Masking key: 0 or 4 bytes. (32 bits)

All data frames transmitted from the client to the server are masked. The Mask is 1 and carries a 4-byte masking key. If Mask is 0, there is no masking key.

Note: the length of load data does not include the length of mask key.

const { randomFillSync } = require('crypto');

// Generating the masking key:
// a new one is generated for every data frame.
const mask = Buffer.alloc(4);
randomFillSync(mask, 0, 4);

Payload data: (x+y) bytes.

The payload data consists of extension data and application data: the extension data is x bytes and the application data is y bytes.

Extension data: if no extension has been negotiated, the extension data is 0 bytes. Every extension must specify the length of its extension data, or how that length is calculated, and the use of an extension must be negotiated during the handshake. If extension data is present, the payload length includes it.

Application data: application data fills the rest of the frame after the extension data (if any). Its length is the payload length minus the length of the extension data.
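Putting the header fields together, the following sketch parses the header of a single frame. parseFrameHeader is a hypothetical helper, not part of the ws library; it returns null when the buffer does not yet hold the whole header and does not handle extensions. The sample is the masked "something" frame that appears later in this section.

```javascript
// Hypothetical helper: parse the header of one WebSocket frame.
// Returns null if `buf` does not yet contain the whole header.
function parseFrameHeader(buf) {
  if (buf.length < 2) return null;
  const fin = (buf[0] & 0x80) !== 0;
  const rsv = (buf[0] >> 4) & 0x7;      // RSV1-RSV3, normally 0
  const opcode = buf[0] & 0x0f;
  const masked = (buf[1] & 0x80) !== 0;
  let payloadLength = buf[1] & 0x7f;    // 7-bit length field
  let offset = 2;

  if (payloadLength === 126) {
    if (buf.length < offset + 2) return null;
    payloadLength = buf.readUInt16BE(offset); // 16-bit big-endian length
    offset += 2;
  } else if (payloadLength === 127) {
    if (buf.length < offset + 8) return null;
    const big = buf.readBigUInt64BE(offset);  // 64-bit big-endian length
    if (big > BigInt(Number.MAX_SAFE_INTEGER)) throw new RangeError('payload too large');
    payloadLength = Number(big);
    offset += 8;
  }

  let maskingKey = null;
  if (masked) {
    if (buf.length < offset + 4) return null;
    maskingKey = buf.subarray(offset, offset + 4); // 4-byte masking key
    offset += 4;
  }

  return { fin, rsv, opcode, masked, payloadLength, maskingKey, headerLength: offset };
}

const sample = Buffer.from('818932fd435f41922e3a46952a3155', 'hex');
console.log(parseFrameHeader(sample));
```

For the sample, the header reports FIN=1, opcode 1 (text), a 9-byte masked payload, masking key 32 fd 43 5f, and a 6-byte header.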

Masking algorithm

The masking key is a 32-bit random value chosen by the client. Masking does not change the length of the payload data. Masking and unmasking both use the following algorithm.

First, define:

  • original-octet-i: the i-th byte of the original data.

  • transformed-octet-i: the i-th byte of the transformed data.

  • j: the result of i mod 4.

  • masking-key-octet-j: the j-th byte of the masking key.

transformed-octet-i is obtained by XORing original-octet-i with masking-key-octet-j:

j = i MOD 4
transformed-octet-i = original-octet-i XOR masking-key-octet-j

The mask and unmask functions from the ws library, together with its frame function (modified here to use a fixed mask so the result can be compared with the capture):

// Generating a mask:
// const mask = crypto.randomBytes(4);
// <Buffer 54 63 0c 77>

/**
 * Masks a buffer using the given mask.
 * @param {Buffer} source The buffer to mask
 * @param {Buffer} mask The mask to use
 * @param {Buffer} output The buffer where to store the result
 * @param {Number} offset The offset at which to start writing
 * @param {Number} length The number of bytes to mask.
 * @public
 */
function _mask(source, mask, output, offset, length) {
  for (let i = 0; i < length; i++) {
    // i & 3 equals i % 4, so the 4 mask bytes are applied cyclically.
    output[offset + i] = source[i] ^ mask[i & 3];
  }
}

/**
 * Unmasks a buffer using the given mask.
 * @param {Buffer} buffer The buffer to unmask
 * @param {Buffer} mask The mask to use
 * @public
 */
function _unmask(buffer, mask) {
  // Required until https://github.com/nodejs/node/issues/9006 is resolved.
  const length = buffer.length;
  for (let i = 0; i < length; i++) {
    // XOR is its own inverse, so unmasking is the same operation as masking.
    buffer[i] ^= mask[i & 3];
  }
}

/**
 * Frames a piece of data according to the HyBi WebSocket protocol.
 * @param {Buffer} data The data to frame
 * @param {Object} options Options object
 * @param {Number} options.opcode The opcode
 * @param {Boolean} options.readOnly Specifies whether `data` can be modified
 * @param {Boolean} options.fin Specifies whether or not to set the FIN bit
 * @param {Boolean} options.mask Specifies whether or not to mask `data`
 * @param {Boolean} options.rsv1 Specifies whether or not to set the RSV1 bit
 * @return {Buffer[]} The framed data as a list of `Buffer` instances
 * @public
 */
function frame(data, options) {
  const merge = data.length < 1024 || (options.mask && options.readOnly);
  // 2 header bytes, plus 4 for the masking key when masking.
  let offset = options.mask ? 6 : 2;
  let payloadLength = data.length;

  if (data.length >= 65536) {
    offset += 8;
    payloadLength = 127;
  } else if (data.length > 125) {
    offset += 2;
    payloadLength = 126;
  }

  const target = Buffer.allocUnsafe(merge ? data.length + offset : offset);

  // First byte: FIN bit (0x80), RSV1 bit (0x40) and the 4-bit opcode.
  target[0] = options.fin ? options.opcode | 0x80 : options.opcode;
  if (options.rsv1) target[0] |= 0x40;

  // Extended payload length, big-endian.
  if (payloadLength === 126) {
    target.writeUInt16BE(data.length, 2);
  } else if (payloadLength === 127) {
    target.writeUInt32BE(0, 2);
    target.writeUInt32BE(data.length, 6);
  }

  if (!options.mask) {
    target[1] = payloadLength;
    if (merge) {
      data.copy(target, offset);
      return [target];
    }

    return [target, data];
  }

  // Verify it:
  // const mask = crypto.randomBytes(4);
  const mask = Buffer.from('32fd435f', 'hex'); // fixed mask to reproduce the example

  // Second byte: MASK bit (0x80) plus the 7-bit payload length.
  target[1] = payloadLength | 0x80;
  target[offset - 4] = mask[0];
  target[offset - 3] = mask[1];
  target[offset - 2] = mask[2];
  target[offset - 1] = mask[3];

  if (merge) {
    _mask(data, mask, target, offset, data.length);
    return [target];
  }

  _mask(data, mask, data, 0, data.length);
  return [target, data];
}

const str = 'something';
const source = Buffer.from(str);
const target = frame(source, {
  fin: true, rsv1: false, opcode: 1, mask: true, readOnly: false,
});
console.log('Payload:', source);
console.log('Masked payload:', target);

Output:

Payload: <Buffer 73 6f 6d 65 74 68 69 6e 67>
Masked payload: [ <Buffer 81 89 32 fd 43 5f 41 92 2e 3a 46 95 2a 31 55> ]

The result matches the packet capture shown below.

Original payload:



masked payload:



3.4 Data transmission

Once the WebSocket client and server have established a connection, all subsequent operations are carried out by exchanging data frames.

WebSocket distinguishes the type of operation by opcode. For example, 0x8 closes the connection, while 0x0 to 0x2 carry data.




Data fragmentation

Each WebSocket message may be split into multiple data frames. When the WebSocket receiver gets a data frame, it uses the value of FIN to decide whether it has received the last frame of the message.

FIN=1 means that the current data frame is the last data frame of the message. At this time, the receiver has received the complete message and can process the message. If FIN=0, the receiver needs to continue listening and receiving other data frames.

In addition, during data exchange the opcode indicates the type of data: 0x01 is text and 0x02 is binary. 0x00 is special: it marks a continuation frame, meaning that this frame continues a message that has not yet been fully received.

Data fragmentation example

The following example from MDN demonstrates fragmentation well. The client sends the server two messages, and the server responds after receiving each one. Here we focus on the messages sent by the client to the server.

First message

FIN=1, indicating the last data frame of the current message. After receiving the current data frame, the server can process the message. opcode=0x1, indicating that the client sends text type.

Second message

  1. FIN=0, opcode=0x1: the message is of text type and is not complete yet; more frames follow.

  2. FIN=0, opcode=0x0: the message is still not complete and more frames follow; this frame's data is appended to that of the previous frame.

  3. FIN=1, opcode=0x0: the message is complete and no frames follow; this frame's data is appended to that of the previous frame, and the server can now assemble the related frames into a complete message.

    Client: FIN=1, opcode=0x1, msg="hello"
    Server: (process complete message immediately) Hi.
    Client: FIN=0, opcode=0x1, msg="and a"
    Server: (listening, new message containing text started)
    Client: FIN=0, opcode=0x0, msg="happy new"
    Server: (listening, payload concatenated to previous message)
    Client: FIN=1, opcode=0x0, msg="year!"
    Server: (process complete message) Happy new year to you too!
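The receiver's side of this exchange can be sketched as a small reassembler. MessageAssembler is a hypothetical class that takes already-parsed frames of the form { fin, opcode, payload }; control frames, which may be interleaved between fragments, are simply skipped here. The payloads are the MDN strings with spaces added so the concatenation reads naturally.

```javascript
// Hypothetical sketch: reassemble fragmented messages from parsed frames.
class MessageAssembler {
  constructor(onMessage) {
    this.onMessage = onMessage; // called with (type, Buffer)
    this.fragments = [];
    this.type = null;           // 'text' or 'binary' while a message is open
  }

  push({ fin, opcode, payload }) {
    if (opcode === 0x1 || opcode === 0x2) {
      // A text or binary frame starts a new message.
      if (this.type !== null) throw new Error('previous message not finished');
      this.type = opcode === 0x1 ? 'text' : 'binary';
    } else if (opcode === 0x0) {
      // A continuation frame must follow a started message.
      if (this.type === null) throw new Error('continuation without a started message');
    } else {
      return; // control frames (0x8-0xA) are not handled in this sketch
    }
    this.fragments.push(payload);
    if (fin) {
      // FIN=1: the message is complete, so deliver it and reset.
      const data = Buffer.concat(this.fragments);
      const type = this.type;
      this.fragments = [];
      this.type = null;
      this.onMessage(type, data);
    }
  }
}

const messages = [];
const assembler = new MessageAssembler((type, data) => messages.push(data.toString()));
assembler.push({ fin: true, opcode: 0x1, payload: Buffer.from('hello') });
assembler.push({ fin: false, opcode: 0x1, payload: Buffer.from('and a ') });
assembler.push({ fin: false, opcode: 0x0, payload: Buffer.from('happy new ') });
assembler.push({ fin: true, opcode: 0x0, payload: Buffer.from('year!') });
console.log(messages); // [ 'hello', 'and a happy new year!' ]
```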

3.5 Keeping the connection alive: heartbeats

To maintain real-time two-way communication, WebSocket needs the TCP channel between client and server to stay open. However, keeping a connection open when no data has been exchanged for a long time may waste the resources it holds.

Still, in some scenarios the connection must be kept alive even though the client and server exchange no data for a long time. In that case, heartbeats can be used:

Sender -> receiver: ping.

Receiver -> sender: pong.

Ping and pong correspond to two WebSocket control frames, with opcodes 0x9 and 0xA respectively.

For example, for a WebSocket server to send a ping to the client, the following code is all that is needed (using the ws module):

// Send a ping frame (opcode 0x9) with an empty payload.
ws.ping();
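On the wire, a ping or pong is just a tiny control frame. The sketch below encodes unmasked (server-to-client) control frames; buildControlFrame is a hypothetical helper, not part of the ws API. It follows the RFC 6455 rules that control frames are never fragmented (FIN is always 1), carry at most 125 bytes of payload, and that a pong echoes the payload of the ping it answers. A client would additionally have to mask its frames.

```javascript
// Hypothetical helper: encode an unmasked (server-to-client) control frame.
function buildControlFrame(opcode, payload = Buffer.alloc(0)) {
  if (payload.length > 125) {
    throw new RangeError('control frame payload must be at most 125 bytes');
  }
  const header = Buffer.from([
    0x80 | opcode,  // FIN = 1, opcode in the low 4 bits
    payload.length, // MASK = 0, 7-bit payload length
  ]);
  return Buffer.concat([header, payload]);
}

const ping = buildControlFrame(0x9);                    // <Buffer 89 00>
const pong = buildControlFrame(0xa, Buffer.from('hi')); // <Buffer 8a 02 68 69>
console.log(ping, pong);
```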







That concludes today's WebSocket packet capture analysis. Next, we will continue with packet capture analysis of engine.io and socket.io, which are built on WebSocket. Stay tuned.

Reposted from https://mp.weixin.qq.com/s/f96Da8kCluNwv7cxW39gzg

Keywords: http Network Communications

Added by dsds1121 on Thu, 10 Mar 2022 15:49:55 +0200