40 - Case: What should I do when network request latency increases?


Besides DDoS attacks, there are other reasons why network latency can increase, such as:

  1. Slow network transmission, causing latency
  2. Slow packet processing in the Linux kernel protocol stack, causing latency
  3. Slow data processing in the application, causing latency, etc.


Network latency

When we talk about network latency, its meaning seems obvious: the time taken to transmit data over the network
Note, however, that this time can be one-way, i.e., the time from the source address to the destination address
It can also be two-way, i.e., the time from sending a request at the source address until the response from the destination address arrives back

The two-way, round-trip latency is the more commonly used metric. For example, the result of a ping test is the round-trip time, RTT (round trip time)
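As a quick illustration (the target host here is arbitrary), each echo reply that ping prints carries its own RTT, and the closing summary reports min/avg/max:

# Send 3 ICMP echo requests; each reply line shows time= (that probe's RTT),
# and the summary line reports rtt min/avg/max/mdev
ping -c 3 baidu.com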

Besides network latency,
another commonly used metric is application latency, which refers to the time from when an application receives a request until it sends back its response
Application latency usually also refers to a round trip: the sum of network data transmission time and data processing time

As covered in the Linux networking basics, you can use ping to test network latency
ping is based on the ICMP protocol: it obtains the round-trip time by computing the time difference between the ICMP echo reply and the ICMP echo request
Because this process requires no authentication, it is often abused by network attacks, for example by the port scanner nmap or the packet-crafting tool hping3

Therefore, to avoid such problems, many network services disable ICMP,
which makes it impossible to test a service's availability and round-trip latency with ping
In that case, you can use the TCP or UDP modes of traceroute or hping3 to measure the network latency

Taking baidu.com as an example, you can run the following hping3 command to test the network latency from the current machine to Baidu's server

# -c 3 means send 3 requests
# -S means set the TCP SYN flag
# -p 80 means the target port is 80
root@alnk:~# hping3 -c 3 -S -p 80 baidu.com
HPING baidu.com (eth0 220.181.38.251): S set, 40 headers + 0 data bytes
len=46 ip=220.181.38.251 ttl=48 id=60840 sport=80 flags=SA seq=0 win=8192 rtt=47.9 ms
len=46 ip=220.181.38.251 ttl=49 id=62209 sport=80 flags=SA seq=1 win=8192 rtt=47.8 ms
len=46 ip=220.181.38.251 ttl=49 id=27368 sport=80 flags=SA seq=2 win=8192 rtt=47.8 ms

--- baidu.com hping statistic ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 47.8/47.8/47.9 ms
##
# From the hping3 output, the round-trip time RTT is about 47.8 ms

Of course, similar results can be obtained with traceroute

# --tcp means use the TCP protocol
# -p indicates the port number
# -n means that reverse domain name resolution is not performed on the IP address in the result
root@alnk:~# traceroute --tcp -p 80 -n baidu.com
traceroute to baidu.com (220.181.38.148), 30 hops max, 60 byte packets
 1  * * *
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  183.60.190.109  5.303 ms * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  10.166.96.36  46.231 ms 220.181.38.148  42.693 ms *
## 
traceroute sends three probe packets per hop and prints the round-trip time for each reply it receives
If there is no reply or the reply times out (5 s by default), an asterisk is printed instead
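As an aside, if too many hops print only asterisks, the probing can be tuned with standard traceroute options (the values here are illustrative):

# -q sets the number of probes per hop (default 3), -w the reply timeout in seconds (default 5)
traceroute --tcp -p 80 -n -q 3 -w 3 baidu.com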




Analysis ideas and a case study for increased network latency

  1. Case preparation

    Ubuntu 18.04
    Machine configuration: 2 CPUs, 4 GB memory
    Pre-installed tools: docker, hping3, tcpdump, curl, wrk, Wireshark, etc.
    
    

  2. In terminal 1, execute the following command to run the official Nginx, which will listen on port 80

    root@alnk:~# docker run --network=host --name=good -itd nginx
    
    
  3. Continue in terminal 1 and execute the following command to run the case application, which will listen on port 8080

    root@alnk:~# docker run --name nginx --network=host -itd feisky/nginx:latency
    
    
  4. In terminal 2, run curl to verify that both containers have started normally

    # 80 normal
    [root@local_deploy_192-168-1-5 ~]# curl  http://124.71.83.217
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
    html { color-scheme: light dark; }
    body { width: 35em; margin: 0 auto;
    font-family: Tahoma, Verdana, Arial, sans-serif; }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>
    
    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>
    
    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>
    
    # 8080 normal
    [root@local_deploy_192-168-1-5 ~]# curl  http://124.71.83.217:8080
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
        body {
            width: 35em;
            margin: 0 auto;
            font-family: Tahoma, Verdana, Arial, sans-serif;
        }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>
    
    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>
    
    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>
    
    
  5. In terminal 2, execute the following commands to test the latency of port 80 and port 8080 on the case machine

    # Test port 80 latency
    [root@local_deploy_192-168-1-5 ~]# hping3 -c 3 -S -p 80 124.71.83.217
    HPING 124.71.83.217 (eth0 124.71.83.217): S set, 40 headers + 0 data bytes
    len=46 ip=124.71.83.217 ttl=48 DF id=0 sport=80 flags=SA seq=0 win=64240 rtt=9.8 ms
    len=46 ip=124.71.83.217 ttl=48 DF id=0 sport=80 flags=SA seq=1 win=64240 rtt=8.5 ms
    len=46 ip=124.71.83.217 ttl=48 DF id=0 sport=80 flags=SA seq=2 win=64240 rtt=7.4 ms
    
    --- 124.71.83.217 hping statistic ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 7.4/8.6/9.8 ms
    
    # Test port 8080 latency
    [root@local_deploy_192-168-1-5 ~]# hping3 -c 3 -S -p 8080 124.71.83.217
    HPING 124.71.83.217 (eth0 124.71.83.217): S set, 40 headers + 0 data bytes
    len=46 ip=124.71.83.217 ttl=48 DF id=0 sport=8080 flags=SA seq=0 win=64240 rtt=8.5 ms
    len=46 ip=124.71.83.217 ttl=48 DF id=0 sport=8080 flags=SA seq=1 win=64240 rtt=7.7 ms
    len=46 ip=124.71.83.217 ttl=48 DF id=0 sport=8080 flags=SA seq=2 win=64240 rtt=7.6 ms
    
    --- 124.71.83.217 hping statistic ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 7.6/7.9/8.5 ms
    ##
    # From this output, the latency of both ports is roughly 8 ms
    # However, that is only for single requests. What happens with concurrent requests?
    
    
  6. In terminal 2, execute the following commands to test the performance of port 80 and port 8080 with 100 concurrent connections

    # Test port 80 performance
    [root@local_deploy_192-168-1-5 ~]# wrk --latency -c 100 -t 2 --timeout 2 http://124.71.83.217/
    Running 10s test @ http://124.71.83.217/
      2 threads and 100 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   193.99ms  346.67ms   1.86s    87.16%
        Req/Sec   331.23      1.05k    6.59k    94.97%
      Latency Distribution
         50%    8.35ms
         75%  279.98ms
         90%  654.18ms
         99%    1.61s
      6564 requests in 10.01s, 5.34MB read
      Socket errors: connect 0, read 0, write 0, timeout 83
    Requests/sec:    655.87
    Transfer/sec:    546.72KB
    
    # Test port 8080 performance
    [root@local_deploy_192-168-1-5 ~]# wrk --latency -c 100 -t 2 --timeout 2 http://124.71.83.217:8080/
    Running 10s test @ http://124.71.83.217:8080/
      2 threads and 100 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   311.29ms  398.66ms   1.94s    83.58%
        Req/Sec   179.02    147.02     1.00k    93.50%
      Latency Distribution
         50%   57.17ms
         75%  463.84ms
         90%  827.38ms
         99%    1.69s
      3569 requests in 10.01s, 2.91MB read
      Socket errors: connect 0, read 0, write 0, timeout 31
    Requests/sec:    356.63
    Transfer/sec:    297.52KB
    ##
    # From the two outputs above, the average latency of the official Nginx (listening on port 80) is 193.99 ms
    # while the average latency of the case Nginx (listening on port 8080) is 311.29 ms
    ##
    Combined with the earlier hping3 results, it is easy to see that the case Nginx's latency grows far more under concurrent requests. What is going on?
    
    
  7. Use tcpdump to capture the packets being sent and received, and analyze whether anything goes wrong in the send/receive path
    In terminal 1, run the following tcpdump command to capture the packets sent and received on port 8080 and save them to the file nginx.pcap

    root@alnk:~# tcpdump -nn tcp port 8080 -w nginx.pcap
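    If the capture would otherwise grow very large, a hedged variant bounds it using standard tcpdump flags:

    # -c stops after 10000 packets; -s 0 captures full packets; the filter expression goes last
    root@alnk:~# tcpdump -nn -s 0 -c 10000 -w nginx.pcap tcp port 8080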
    
    
  8. In terminal 2, re-execute the wrk command

    [root@local_deploy_192-168-1-5 ~]# wrk --latency -c 100 -t 2 --timeout 2 http://124.71.83.217:8080/
    
    
  9. When the wrk command finishes, switch back to terminal 1 and press Ctrl+C to stop tcpdump
    Then copy nginx.pcap to a machine with Wireshark installed and open it with Wireshark
    Since there are a lot of packets, you can filter them first
    For example, select a packet, then right-click and choose "Follow" -> "TCP Stream", as shown in the following figure

    Then close the pop-up dialog and return to the Wireshark main window
    You will find that Wireshark has automatically applied a display filter such as tcp.stream eq 24

    Starting from the three-way handshake, you can now see every request and response on this TCP connection. Of course, this may still not be intuitive enough,
    so you can click Statistics -> Flow Graph in the menu bar,
    select "Limit to display filter", and set the Flow type to "TCP Flows"

    Note that the client is on the left and the Nginx server is on the right
    As the graph shows, the three-way handshake and the first HTTP request and response are quite fast,
    but the second HTTP request is slow: the client only sends its ACK 40 ms after receiving the server's first packet
    Does the value 40 ms ring a bell? It is in fact the minimum timeout of TCP delayed ACK

    Delayed ACK is an optimization of TCP acknowledgements: instead of sending an ACK for every segment,
    the stack waits a short while (e.g., 40 ms) to see whether there is a "free ride" available;
    if other packets need to be sent during that window, the ACK is piggybacked on them,
    and if nothing else comes along, the ACK is sent on its own once the timeout expires
    Since the 40 ms in this case occurred on the client, it is reasonable to suspect that the client has delayed acknowledgement enabled,
    and the client here is the wrk we ran earlier

    Consulting the TCP man page (run man tcp) shows that quickack mode is enabled only when the TCP_QUICKACK option is explicitly set on a socket;
    otherwise, delayed acknowledgement is used by default
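    Before turning to strace, a hedged aside: you can also watch the delayed-ACK timer directly on the client with ss, whose -i output includes an ato field (the ACK timeout, in milliseconds):

    # On the client, while wrk is running, inspect connections to port 8080;
    # ato:40 in the output is the delayed-ACK timeout for that socket
    ss -ti '( dport = :8080 )'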

    To verify this conjecture about wrk's behavior, you can use strace to observe which TCP options wrk sets on its sockets

  10. In terminal 2, execute the following command

    [root@local_deploy_192-168-1-5 ~]# strace -f wrk --latency -c 100 -t 2 --timeout 2 http://124.71.83.217:8080/ 
    [pid 32138] setsockopt(5, SOL_TCP, TCP_NODELAY, [1], 4) = 0
    
    

    As you can see, wrk only sets the TCP_NODELAY option and never sets TCP_QUICKACK
    This shows that wrk uses delayed acknowledgement, which explains the 40 ms gap observed above
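    The full strace output is very noisy; as a convenience (standard strace syntax), you can restrict tracing to the setsockopt call so that only the socket options are shown:

    # Trace only setsockopt, following wrk's threads with -f
    strace -f -e trace=setsockopt wrk --latency -c 100 -t 2 --timeout 2 http://124.71.83.217:8080/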

    But remember, this is only the client's behavior; normally the Nginx server should not be affected by it
    Did we miss any clues when analyzing the packets? Go back to Wireshark and take another look


    Look carefully at the Wireshark window: packet 1173 is the delayed ACK just mentioned,
    and packet 1175 on the next line is the second packet sent by Nginx,
    which combines with packet 697 to form a complete HTTP response (the ACK number is 85)

    The second packet was not sent together with the earlier packet 697; instead, it was only sent after packet 1173, the client's ACK for the first packet, arrived
    This looks a lot like delayed acknowledgement, except that what is being held back here is data rather than an ACK

    This brings something to mind: the Nagle algorithm
    Before analyzing the case further, here is a brief introduction to the algorithm

    The Nagle algorithm is a TCP optimization that reduces the number of small packets sent, in order to improve the effective use of bandwidth
    For example, with a payload of just 1 byte plus 20 bytes each for the TCP header and the IP header, the whole packet is 41 bytes,
    so the effective bandwidth utilization is only about 2.4% (1/41)
    If the whole network were filled with such small packets, its effective utilization would be far too low;
    the Nagle algorithm was designed to solve exactly this problem
    by merging small TCP packets to improve bandwidth utilization
    Nagle stipulates that a TCP connection may have at most one outstanding small packet that has not yet been acknowledged;
    no further small packets may be sent until that packet's ACK arrives
    In the meantime, the small packets are merged and sent as a single packet once the ACK is received

    The idea behind Nagle is sound on its own, but once you know about Linux's default delayed acknowledgement,
    it no longer looks so good, because when the two are combined the latency becomes very noticeable, as shown in the figure below

    After the server sends the first packet, the client has delayed acknowledgement enabled, so it waits 40 ms before replying with an ACK
    Meanwhile, the server has Nagle enabled, and since the ACK for the first packet has not yet arrived, the server just sits waiting;
    only after the client's 40 ms timeout fires and the ACK is finally sent can the server transmit the second packet
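    You can also see this stall directly in the saved capture; a hedged aside: tcpdump's -ttt flag prints the time elapsed between consecutive lines, so the ~40 ms gaps before the delayed segments stand out:

    # Read the capture back and print inter-packet time deltas; look for ~40 ms gaps
    root@alnk:~# tcpdump -nn -ttt -r nginx.pcap | less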

    Since Nagle may be the culprit, how do we find out whether the case Nginx has Nagle enabled?
    Consulting the tcp man page again, you will find that Nagle is disabled only when TCP_NODELAY is set on the socket,
    so we just need to check Nginx's tcp_nodelay option

  11. In terminal 1, execute the following command to view the configuration of the case Nginx

    root@alnk:~# docker exec nginx cat /etc/nginx/nginx.conf | grep tcp_nodelay
        tcp_nodelay    off;
    ##
    # Sure enough, the case Nginx has tcp_nodelay set to off; setting it to on should fix the problem
    ##
    # Does the change actually solve it? Naturally, that needs to be verified
    # The fixed application has been packaged into a Docker image; run the following commands in terminal 1 to start it
    # Delete case application
    root@alnk:~# docker rm -f nginx
    # Start optimized application
    root@alnk:~# docker run --name nginx --network=host -itd feisky/nginx:nodelay
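    For reference, the fix packaged into the feisky/nginx:nodelay image presumably amounts to flipping a single directive in nginx.conf; a minimal sketch (surrounding directives elided):

    # /etc/nginx/nginx.conf (fragment)
    http {
        ...
        tcp_nodelay    on;    # sets TCP_NODELAY on Nginx's sockets, disabling Nagle
        ...
    }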
    
    
  12. In terminal 2, re-run wrk to test the latency

    [root@local_deploy_192-168-1-5 ~]# wrk --latency -c 100 -t 2 --timeout 2 http://124.71.83.217:8080/
    Running 10s test @ http://124.71.83.217:8080/
      2 threads and 100 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     9.58ms   14.98ms  350.08ms   97.91%
        Req/Sec     6.22k   282.13     6.93k    68.50%
      Latency Distribution
         50%    7.78ms
         75%    8.20ms
         90%    9.02ms
         99%   73.14ms
      123990 requests in 10.01s, 100.50MB read
    Requests/sec:  12384.04
    Transfer/sec:     10.04MB
    ##
    # Sure enough, the latency is down to about 9 ms, the same as what we measured for the official Nginx image
    # (Nginx enables tcp_nodelay by default)
    
    
  13. In terminal 1, stop the case containers

    root@alnk:~# docker rm -f nginx good
    
    



Summary

Today we learned how to analyze increased network latency
Network latency is a core network performance metric
Because of network transmission, packet processing, and other factors, some latency is unavoidable
But excessive latency directly hurts the user experience

So, once you find that network latency has increased, you can use tools such as traceroute, hping3, tcpdump, Wireshark, and strace
to locate the potential problem in the network

  1. Use hping3, wrk, and similar tools to confirm whether the latency of single and concurrent requests is normal
  2. Use traceroute to confirm that the route is correct and to check the latency of each hop along the way
  3. Use tcpdump and Wireshark to confirm that packets are being sent and received normally
  4. Use strace and similar tools to observe whether the application's socket calls are normal

In this way, you can drill down layer by layer, from routing, to packet send/receive, to the application, until you locate the root cause of the problem. The key commands from this case are recapped below
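For quick reference, here are the key commands used in this case (the IP and port are specific to this case; adjust them for your environment):

# Single-request latency, measured via TCP SYN/SYN-ACK round trips
hping3 -c 3 -S -p 8080 124.71.83.217
# Latency distribution under 100 concurrent connections
wrk --latency -c 100 -t 2 --timeout 2 http://124.71.83.217:8080/
# Capture server-side traffic for analysis in Wireshark
tcpdump -nn -w nginx.pcap tcp port 8080
# Observe which socket options the client sets
strace -f -e trace=setsockopt wrk --latency -c 100 -t 2 --timeout 2 http://124.71.83.217:8080/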
