Author: Yu Zhenxing
As a member of aikesheng DBA team, he is keen on technology sharing and writing technical documents.
Source: original contribution
*It is produced by aikesheng open source community. The original content cannot be used without authorization. For reprint, please contact Xiaobian and indicate the source.
background information
Due to security factors, customers need to upgrade Mysql to version 8.0.26. However, due to the incompatibility change of some terms in 8.0.26, exceptions will occur to the monitoring and collection tools / programs. For this situation, MySQL official also provides a solution, that is, a new parameter terminology is added_ use_ Previous, when this parameter is set to before_ 8_ 0_ At 26, the term form before version 8.0.26 can be maintained. For example, the term form of master and slave can still be maintained. The following is the description fragment summary of the official document 8.0.26 release note:
Incompatible Change: From MySQL 8.0.26, new aliases or replacement names are provided for most remaining identifiers that contain the terms "master", which is changed to "source"; "slave", which is changed to "replica"; and "mts" (for "multithreaded slave"), which is changed to "mta" (for "multithreaded applier"). Help text is also changed where applicable to use the new names. If the incompatible changes do have an impact for you, you can set the new system variable terminology_use_previous to BEFORE_8_0_26 to make MySQL Server use the old versions of the names for the objects specified in the previous list. This enables monitoring tools that rely on the old names to continue working until they can be updated to use the new names. The system variable can be set with session scope to support individual functions, or global scope to be a default for all new sessions. When global scope is used, the slow query log contains the old versions of the names.
After the upgrade to 8.0.26 is completed, the database starts normal monitoring and collection, and MySQL crash is frequently triggered. It is necessary to analyze the cause. The following analysis logs are simulated by the test environment.
fault analysis
Looking at the crash log, we can see that MySQL executes a select * from information message at the end of the crash_ schema. From the stack information of crash, the processlist command basically executes an ordinary sql, which triggers the exception of libstdc library, causing the database to send a signal to abort MySQL directly. I don't know the code very well. Basic from error message_ string::_ S_ According to the keyword search of construct NULL not valid, this is a C + + error, which roughly means that an instantiated logic error causes the program to stop. You cannot use NULL to construct basic in the what() function_ String object.
terminate called after throwing an instance of 'std::logic_error' what(): basic_string::_S_construct null not valid 10:11:30 UTC - mysqld got signal 6 ; Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware. Thread pointer: 0x7f9924000c20 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 7f9980596d80 thread_stack 0x46000 /data/mysql/3320/base/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x2e) [0x2251b8e] /data/mysql/3320/base/bin/mysqld(handle_fatal_signal+0x323) [0x10fc283] /lib64/libpthread.so.0(+0xf5f0) [0x7f99908d25f0] /lib64/libc.so.6(gsignal+0x37) [0x7f998eb1d337] /lib64/libc.so.6(abort+0x148) [0x7f998eb1ea28] /lib64/libstdc++.so.6(__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f998f42da95] /lib64/libstdc++.so.6(+0x5ea06) [0x7f998f42ba06] /lib64/libstdc++.so.6(+0x5ea33) [0x7f998f42ba33] /lib64/libstdc++.so.6(+0x5ec53) [0x7f998f42bc53] /lib64/libstdc++.so.6(std::__throw_logic_error(char const*)+0x77) [0x7f998f480857] /data/mysql/3320/base/bin/mysqld() [0xea8720] /lib64/libstdc++.so.6(std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)+0x38) [0x7f998f48c998] /data/mysql/3320/base/bin/mysqld(THD::proc_info(System_variables const&) const+0x43) [0xf35d33] /data/mysql/3320/base/bin/mysqld() [0x10216b0] /data/mysql/3320/base/bin/mysqld(Fill_process_list::operator()(THD*)+0x2c8) [0x1024b58] /data/mysql/3320/base/bin/mysqld(Global_THD_manager::do_for_all_thd_copy(Do_THD_Impl*)+0x220) [0xebc6d0] /data/mysql/3320/base/bin/mysqld() [0x1020a13] /data/mysql/3320/base/bin/mysqld(do_fill_information_schema_table(THD*, TABLE_LIST*, Item*)+0x83) [0x1025923] /data/mysql/3320/base/bin/mysqld(MaterializeInformationSchemaTableIterator::Init()+0x91) [0x14550a1] /data/mysql/3320/base/bin/mysqld(Query_expression::ExecuteIteratorQuery(THD*)+0x345) [0x1085465] /data/mysql/3320/base/bin/mysqld(Query_expression::execute(THD*)+0x2c) [0x10856cc] /data/mysql/3320/base/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x2c5) [0x101a885] /data/mysql/3320/base/bin/mysqld(mysql_execute_command(THD*, bool)+0xac8) [0xfbc2c8] /data/mysql/3320/base/bin/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x410) [0xfc0b00] /data/mysql/3320/base/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x1f10) [0xfc2e90] /data/mysql/3320/base/bin/mysqld(do_command(THD*)+0x174) [0xfc3bf4] /data/mysql/3320/base/bin/mysqld() [0x10ed858] /data/mysql/3320/base/bin/mysqld() [0x2778d4c] /lib64/libpthread.so.0(+0x7e65) [0x7f99908cae65] /lib64/libc.so.6(clone+0x6d) [0x7f998ebe588d] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (7f9924019488): select * from information_schema.processlist Connection ID (thread ID): 9 Status: NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
If it's a problem of program logic, we're a little powerless, but we still try to collect more relevant information and observe. We configure to automatically generate coredump when MySQL crash. Through sorting and viewing, we can see the following information (fragments have been deleted, and only some key fragments have been intercepted)_ S_ There is a problem when defining the structure of construct__ throw_logic_error, basically consistent with the information recorded in the crash log (limited capacity, unable to analyze the code logic in detail). For the generation method of coredump, please refer to the link: https://opensource.actionsky.com/20200424-mysql/
Thread 1 (Thread 0x7f1ee80c9700 (LWP 12545)): #0 0x00007f1efa7e3a01 in pthread_kill () from /lib64/libpthread.so.0 #1 0x00000000010fc2ad in handle_fatal_signal () #2 <signal handler called> #3 0x00007f1ef8a33277 in raise () from /lib64/libc.so.6 #4 0x00007f1ef8a34968 in abort () from /lib64/libc.so.6 #5 0x00007f1ef93427d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6 #6 0x00007f1ef9340746 in ?? () from /lib64/libstdc++.so.6 #7 0x00007f1ef9340773 in std::terminate() () from /lib64/libstdc++.so.6 #8 0x00007f1ef9340993 in __cxa_throw () from /lib64/libstdc++.so.6 #9 0x00007f1ef9395597 in std::__throw_logic_error(char const*) () from /lib64/libstdc++.so.6 #10 0x0000000000ea8720 in char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) () #11 0x00007f1ef93a1778 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) () from /lib64/libstdc++.so.6 #12 0x0000000000f35d33 in THD::proc_info(System_variables const&) const () #13 0x00000000010216b0 in thread_state_info(THD*, THD*) () #14 0x0000000001024b58 in Fill_process_list::operator()(THD*) () #15 0x0000000000ebc6d0 in Global_THD_manager::do_for_all_thd_copy(Do_THD_Impl*) () #16 0x0000000001020a13 in fill_schema_processlist(THD*, TABLE_LIST*, Item*) () #17 0x0000000001025923 in do_fill_information_schema_table(THD*, TABLE_LIST*, Item*) () #18 0x00000000014550a1 in MaterializeInformationSchemaTableIterator::Init() () #19 0x0000000001085465 in Query_expression::ExecuteIteratorQuery(THD*) () #20 0x00000000010856cc in Query_expression::execute(THD*) () #21 0x000000000101a885 in Sql_cmd_dml::execute(THD*) () #22 0x0000000000fbc2c8 in mysql_execute_command(THD*, bool) () #23 0x0000000000fc0b00 in dispatch_sql_command(THD*, Parser_state*) () #24 0x0000000000fc2e90 in dispatch_command(THD*, COM_DATA const*, enum_server_command) () #25 0x0000000000fc3bf4 in do_command(THD*) () #26 0x00000000010ed858 in handle_connection () #27 0x0000000002778d4c in pfs_spawn_thread () #28 0x00007f1efa7dee25 in start_thread () from /lib64/libpthread.so.0 #29 0x00007f1ef8afbbad in clone () from /lib64/libc.so.6
After a series of exclusion verification, it is found that if the correctness of compatible monitoring and acquisition is not considered, terminology will be deleted_ use_ When the previous parameter is set to the default value of NONE, crash will not be triggered, that is, it is an abnormal behavior caused by the configuration of the parameter. By eliminating it one by one, the following two key points are basically determined
- terminology_use_previous must be set to before_ 8_ 0_ twenty-six
- There must be an access request to the PROCESSLIST table
Next, pay attention to the query operation of the processlist table recorded during crash. We can see from the description of the processlist table in the official document that the query of the processlist table will hold a global mutex. Frequent queries will actually have a certain impact on the performance, but it will not cause crash in theory. Of course, the official also provides a parameter performance_schema_show_processlist to realize the process status collection of lockless version.
The default SHOW PROCESSLIST implementation iterates across active threads from within the thread manager while holding a global mutex. This has negative performance consequences, particularly on busy systems. The alternative SHOW PROCESSLIST implementation is based on the Performance Schema processlist table. This implementation queries active thread data from the Performance Schema rather than the thread manager and does not require a mutex. The SHOW PROCESSLIST statement provides process information by collecting thread data from all active threads. The performance_schema_show_processlist variable determines which SHOW PROCESSLIST implementation to use: The alternative SHOW PROCESSLIST implementation is based on the Performance Schema processlist table. This implementation queries active thread data from the Performance Schema rather than the thread manager and does not require a mutex.
Try to turn on performance_schema_show_processlist=on parameter and setting terminology_use_previous=BEFORE_8_0_26 to avoid global mutex, but it can still stably reproduce the crash phenomenon, and use performance_ schema. The threads table does not trigger crash when querying the session state, so in fact, although the official document mentions setting performance_schema_show_processlist=on and query performance_schema.threads are implemented without locking, but the implementation methods of the two should be different.
After several times of verification, the problem still cannot be located. Try to search the MySQL bug list to see if there are similar bugs, and browse the release note to see if the new version has a repair or description of the problem. The result is that there is no description of the relevant bug, but a relevant message is mentioned in the release note of 8.0.27. When accessing information concurrently_ SCHEMA. When using the processlist table, MySQL will fail (the Bug #32625376 after the description is the internal bug number of MySQL, and we can't see the specific information)
A SELECT query on the INFORMATION_SCHEMA.PROCESSLIST view with concurrent MySQL Server load caused a failure. (Bug #32625376)
Try to open multiple sessions in version 8.0.26 and execute select * from information at the same time_ SCHEMA. Processlist, and turn off the interference of monitoring and acquisition. As expected, the crash phenomenon is stably reproduced (it can also be reproduced with show processlist) It seems that this is indeed the problem, so we began to upgrade the MySQL version to 8.0.27 and then verify it. Unfortunately, this problem can still be reproduced after upgrading to 8.0.27, which does not seem to be completely solved.
So I tried to give MySQL a BUG description, and soon got an official reply. It was confirmed that the phenomenon had been internally confirmed as a BUG and would be repaired in the upcoming version 8.0.28.
Fault summary
- Due to limited ability, this personal analysis did not fully understand the deep-seated causes of the BUG, but from the recent versions, some terminology changes did have a certain impact on the use and compatibility of MySQL
- For some strange phenomena, you can try to search the MySQL BUG list or ask the official issue, and you can get the results quickly
- For the monitoring collection program, if you need to collect session information, performance is recommended_ schema. Threads table
- The frequency of collecting session information should not be too high, especially when there are many sessions, which will always have a certain impact on the performance
- The bug covered in this article will not trigger crash when accessing the processlist table in a single session or multiple sessions