Yesterday, I frequently received the alarm email about Aborted from MySQL instances. I saw the instance information of the alarm email and the test instance. The priority is not so high. I'll take time to see it later. Maybe it'll be better by then. I continue to paddle with a fluke mentality. However, I received more than 50 alarm emails within an hour. I really can't stand it, Prepare to put down the matter at hand and deal with the alarm problem first; The following is a screenshot of the alarm message:
data:image/s3,"s3://crabby-images/6c698/6c6989ac99a5ec83dfd9c28fb8003e731563dd07" alt=""
From the alarm, it is due to the MySQL instance state variable aborted_ It is caused by the continuous increase of connections. Under normal circumstances, the variable value should remain 0, but now it is accumulating. Under what circumstances will the state value continue to increase? This is the first thing that needs to be clarified to solve the problem;
Let's first look at the specific phenomena:
Aborted_ The value of connections is increasing, and there are a lot of time at the system level_ TCP connections in wait state are from the same IP address, as shown below:
# netstat -tuplan | grep 4306 | grep TIME_WAIT | wc -l 1159 The following is the part tcp connect ...... tcp 0 0 192.168.7.xxx:4306 10.21.1.28:7586 TIME_WAIT - tcp 0 0 192.168.7.xxx:4306 10.21.1.28:5198 TIME_WAIT - tcp 0 0 192.168.7.xxx:4306 10.21.1.28:8314 TIME_WAIT - tcp 0 0 192.168.7.xxx:4306 10.21.1.28:5076 TIME_WAIT - tcp 0 0 192.168.7.xxx:4306 10.21.1.28:7610 TIME_WAIT - tcp 0 0 192.168.7.xxx:4306 10.21.1.28:7962 TIME_WAIT - tcp 0 0 192.168.7.xxx:4306 10.21.1.28:6546 TIME_WAIT - tcp 0 0 192.168.7.xxx:4306 10.21.1.28:5354 TIME_WAIT - tcp 0 0 192.168.7.xxx:4306 10.21.1.28:6596 TIME_WAIT - tcp 0 0 192.168.7.xxx:4306 10.21.1.28:8342 TIME_WAIT - tcp 0 0 192.168.7.xxx:4306 10.21.1.28:7184 TIME_WAIT - tcp 0 0 192.168.7.xxx:4306 10.21.1.28:7476 TIME_WAIT - tcp 0 0 192.168.7.xxx:4306 10.21.1.28:5994 TIME_WAIT - tcp 0 0 192.168.7.xxx:4306 10.21.1.28:7322 TIME_WAIT - tcp 0 0 192.168.7.xxx:4306 10.21.1.28:5340 TIME_WAIT - tcp 0 0 192.168.7.xxx:4306 10.21.1.28:6798 TIME_WAIT -
At the instance level, you can see that aborted_ The value of connections keeps increasing
mysql> show status like '%aborted%'; +------------------+-------+ | Variable_name | Value | +------------------+-------+ | Aborted_clients | 4356 | | Aborted_connects | 112524| +------------------+-------+ 2 rows in set (0.00 sec)
Then, first look at the error log to see if there are relevant records in the error log, but no error information is found when looking at the error log; I feel very confused;
Next, let's take a look at the situation_ Will the values of connections continue to increase?
The following is the explanation of the state quantity in the official document and the possible reasons for the continuous increase of the variable:
Aborted_connects
The number of failed attempts to connect to the MySQL server.
If a client is unable even to connect, the server increments the Aborted_connects status variable. Unsuccessful connection attempts can occur for the following reasons:
- A client attempts to access a database but has no privileges for it.
- A client uses an incorrect password.
- A connection packet does not contain the right information.
- It takes more than connect_timeout seconds to obtain a connect packet.
As can be seen from the above, the above four situations will lead to the constant increase of the variable value:
- The client tried to connect to the database but did not have permission
- The client used the wrong password
- The connected packet contains incorrect information
- This parameter controls connect when the connection time limit is exceeded_ The default timeout for mysql is 10s. Generally, it will not timeout unless the network environment is extremely bad
However, no useful information is found in the error log. Next, open general_log to see the operation information related to the database. After opening, a new continent, general, is found_ The following logs are recorded in the log:
2021-11-18T09:16:19.234261Z 8328 Connect dev_test_rwh@10.21.1.28 on game_report_sg using SSL/TLS 2021-11-18T09:16:19.255753Z 8329 Connect dev_test_rwh@10.21.1.28 on game_report_sg using SSL/TLS 2021-11-18T09:16:19.352826Z 8330 Connect dev_test_rwh@10.21.1.28 on game_report_sg using SSL/TLS 2021-11-18T09:16:19.452269Z 8331 Connect dev_test_rwh@10.21.1.28 on game_report_sg using SSL/TLS
From the log information, it can be found that the connection is initiated from the client 10.21.1.28 through dev_test_rwh account connection database game_report_sg, but it can be found that there is no database game for this account_ report_ SG's access permission causes an error. Under normal circumstances, the client without access permission should have an error message, so the connection business personnel check the relevant connection and finally solve the problem;