In a master-slave environment, if the master library hangs, how to elect a slave library as the master library?
As shown in the figure:
If M hangs, how do I elect a slave library from S1 and S2 as the master library?
Solutions to Traditional Reproduction
(1) View the status of slave libraries:
S1: show slave status;
S2: show slave status;
root@localhost [(none)]>show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Reconnecting after a failed master event read
Master_Host: 192.168.91.22
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000006
Read_Master_Log_Pos: 6227
Relay_Log_File: relay-bin.000004
Relay_Log_Pos: 414
Relay_Master_Log_File: mysql-bin.000006
Slave_IO_Running: Connecting
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 6227
Relay_Log_Space: 875
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL --When the main library service stops, it changes from 0 to null,So this value cannot be used as a criterion for judging whether slave libraries are completed synchronously or not.
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 2003
Last_IO_Error: error reconnecting to master 'repl@192.168.91.22:3306' - retry-time: 60 retries: 12
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 330622
Master_UUID: 83373570-fe03-11e6-bb0a-000c29c1b8a9
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp: 170415 23:08:25
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set: 83373570-fe03-11e6-bb0a-000c29c1b8a9:1-33,
b30cdc47-216a-11e7-95a8-000c29565380:1-3
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version
(2) Determine whether each slave is completed synchronously:
io_thread reads the binlog log log and location of the main library:
Master_Log_File: mysql-bin.000006
Read_Master_Log_Pos: 6227
Which relay-log and location does sql_thread execute to:
Relay_Master_Log_File: mysql-bin.000006
Exec_Master_Log_Pos: 6227
When Master_Log_File = Relay_Master_Log_File & Read_Master_Log_Pos = Exec_Master_Log_Pos indicates that the slave library is completed synchronously with the main library.
If Master_Log_File = Relay_Master_Log_File, but Read_Master_Log_Pos > Exec_Master_Log_Pos, and the status of sql_thread is Connecting, it means that relay-log has not been replayed, probably waiting for 2-5s will be completed synchronously.
(3) Comparing the synchronization of two slave libraries:
When S1 and S2 are synchronized, who is ahead and who is in charge. In most cases, S1 and S2 are the same.
When S1.Relay_Master_Log_File=S2.Relay_Master_Log_File but S1. Exec_Master_Log_Pos > S2. Exec_Master_Log_Pos, it means that S1 is ahead of synchronization and S1 is selected as the new master.
Or compare:
When S1.Master_Log_File = S2.Master_Log_File but S1. Read_Master_Log_Pos > S2. Read_Master_Log_Pos, it means that S1 is ahead of synchronization and S1 is selected as the new master.
(4) What if the data of S1 and S2 are inconsistent?
If S1 is ahead of S1 and S2 data is less than S1 data, then after taking S1 as the new master, business reads and writes are first placed on S1, then the data of S2 is repaired by pt-table-checksum and pt-table-sync tools, and then the business is shared by S2.
Solutions to GTID replication
(1) Determine whether each slave is completed synchronously:
Retrieved_Gtid_Set: 83373570-fe03-11e6-bb0a-000c29c1b8a9:22-28
Executed_Gtid_Set: 83373570-fe03-11e6-bb0a-000c29c1b8a9:1-28,
When Retrieved_Gtid_Set = Executed_Gtid_Set (28 = 28), the slave library is synchronized with the main library.
(2) Electing a slave treasury as the main treasury:
If S1. Executed_Gtid_Set = S2. Executed_Gtid_Set, randomly select one as the master;
If S1. Executed_Gtid_Set > S2. Executed_Gtid_Set, then S1 is elected as the master and S2 can directly change master to S1 as the slave Library of S1.
What about the damaged main storehouse?
(1) Re-change master to new master, then check the consistency between master and slave, and repair the data.
(2) If the original master database data is damaged, it needs to be re-added as a slave library to the new master.
How to temporarily stop the main library write operation?
(1) Change the password, can not affect the existing connections, remember to kill all existing connections.
(2)flush table with read lock
(3) Open parameter super_read_only=on
(4) Seal up port 3306 through firewall
Summary:
In a master-slave environment, if the master library hangs, how to elect a slave library as the master library switching process (the whole process is fast, about 1-5 seconds):
(1) Modify the password of the main library and disconnect all connections
(2) Judging the Synchronization of S1 and S2
(3) New electoral database
(4) Write traffic on the new master