In a master-slave environment, if the master library hangs, how to elect a slave library as the master library?


In a master-slave environment, if the master library hangs, how to elect a slave library as the master library?

As shown in the figure:



If M hangs, how do I elect a slave library from S1 and S2 as the master library?


Solutions to Traditional Reproduction

(1) View the status of slave libraries:

S1: show slave status;

S2: show slave status;

root@localhost [(none)]>show slave status\G
*************************** 1. row ***************************
              Slave_IO_State: Reconnecting after a failed master event read
                 Master_Host: 192.168.91.22
                 Master_User: repl
                 Master_Port: 3306
               Connect_Retry: 60
             Master_Log_File: mysql-bin.000006
         Read_Master_Log_Pos: 6227
              Relay_Log_File: relay-bin.000004
               Relay_Log_Pos: 414
       Relay_Master_Log_File: mysql-bin.000006
            Slave_IO_Running: Connecting
           Slave_SQL_Running: Yes
             Replicate_Do_DB:
         Replicate_Ignore_DB:
          Replicate_Do_Table:
      Replicate_Ignore_Table:
     Replicate_Wild_Do_Table:
 Replicate_Wild_Ignore_Table:
                  Last_Errno: 0
                  Last_Error:
                Skip_Counter: 0
         Exec_Master_Log_Pos: 6227
             Relay_Log_Space: 875
             Until_Condition: None
              Until_Log_File:
               Until_Log_Pos: 0
          Master_SSL_Allowed: No
          Master_SSL_CA_File:
          Master_SSL_CA_Path:
             Master_SSL_Cert:
           Master_SSL_Cipher:
              Master_SSL_Key:
       Seconds_Behind_Master: NULL   --When the main library service stops, it changes from 0 to null,So this value cannot be used as a criterion for judging whether slave libraries are completed synchronously or not.
Master_SSL_Verify_Server_Cert: No
               Last_IO_Errno: 2003
               Last_IO_Error: error reconnecting to master 'repl@192.168.91.22:3306' - retry-time: 60  retries: 12
              Last_SQL_Errno: 0
              Last_SQL_Error:
 Replicate_Ignore_Server_Ids:
            Master_Server_Id: 330622
                 Master_UUID: 83373570-fe03-11e6-bb0a-000c29c1b8a9
            Master_Info_File: mysql.slave_master_info
                   SQL_Delay: 0
         SQL_Remaining_Delay: NULL
     Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
          Master_Retry_Count: 86400
                 Master_Bind:
     Last_IO_Error_Timestamp: 170415 23:08:25
    Last_SQL_Error_Timestamp:
              Master_SSL_Crl:
          Master_SSL_Crlpath:
          Retrieved_Gtid_Set:
           Executed_Gtid_Set: 83373570-fe03-11e6-bb0a-000c29c1b8a9:1-33,
b30cdc47-216a-11e7-95a8-000c29565380:1-3
               Auto_Position: 1
        Replicate_Rewrite_DB:
                Channel_Name:
          Master_TLS_Version


(2) Determine whether each slave is completed synchronously:

io_thread reads the binlog log log and location of the main library:

Master_Log_File: mysql-bin.000006

Read_Master_Log_Pos: 6227


Which relay-log and location does sql_thread execute to:

Relay_Master_Log_File: mysql-bin.000006

Exec_Master_Log_Pos: 6227


When Master_Log_File = Relay_Master_Log_File & Read_Master_Log_Pos = Exec_Master_Log_Pos indicates that the slave library is completed synchronously with the main library.


If Master_Log_File = Relay_Master_Log_File, but Read_Master_Log_Pos > Exec_Master_Log_Pos, and the status of sql_thread is Connecting, it means that relay-log has not been replayed, probably waiting for 2-5s will be completed synchronously.


(3) Comparing the synchronization of two slave libraries:

When S1 and S2 are synchronized, who is ahead and who is in charge. In most cases, S1 and S2 are the same.


When S1.Relay_Master_Log_File=S2.Relay_Master_Log_File but S1. Exec_Master_Log_Pos > S2. Exec_Master_Log_Pos, it means that S1 is ahead of synchronization and S1 is selected as the new master.


Or compare:


When S1.Master_Log_File = S2.Master_Log_File but S1. Read_Master_Log_Pos > S2. Read_Master_Log_Pos, it means that S1 is ahead of synchronization and S1 is selected as the new master.


(4) What if the data of S1 and S2 are inconsistent?

If S1 is ahead of S1 and S2 data is less than S1 data, then after taking S1 as the new master, business reads and writes are first placed on S1, then the data of S2 is repaired by pt-table-checksum and pt-table-sync tools, and then the business is shared by S2.


Solutions to GTID replication


(1) Determine whether each slave is completed synchronously:

Retrieved_Gtid_Set: 83373570-fe03-11e6-bb0a-000c29c1b8a9:22-28

Executed_Gtid_Set: 83373570-fe03-11e6-bb0a-000c29c1b8a9:1-28,

When Retrieved_Gtid_Set = Executed_Gtid_Set (28 = 28), the slave library is synchronized with the main library.


(2) Electing a slave treasury as the main treasury:

If S1. Executed_Gtid_Set = S2. Executed_Gtid_Set, randomly select one as the master;

If S1. Executed_Gtid_Set > S2. Executed_Gtid_Set, then S1 is elected as the master and S2 can directly change master to S1 as the slave Library of S1.


What about the damaged main storehouse?

(1) Re-change master to new master, then check the consistency between master and slave, and repair the data.

(2) If the original master database data is damaged, it needs to be re-added as a slave library to the new master.


How to temporarily stop the main library write operation?

(1) Change the password, can not affect the existing connections, remember to kill all existing connections.

(2)flush table with read lock

(3) Open parameter super_read_only=on

(4) Seal up port 3306 through firewall


Summary:

In a master-slave environment, if the master library hangs, how to elect a slave library as the master library switching process (the whole process is fast, about 1-5 seconds):

(1) Modify the password of the main library and disconnect all connections

(2) Judging the Synchronization of S1 and S2

(3) New electoral database

(4) Write traffic on the new master

Keywords: MySQL Database less

Added by shinagawa on Wed, 03 Jul 2019 22:07:38 +0300