In a master-slave environment, if the master library hangs, how to elect a slave library as the master library?

As shown in the figure:

If M hangs, how do I elect a slave library from S1 and S2 as the master library?

Solutions to Traditional Reproduction

(1) View the status of slave libraries:

S1: show slave status；

S2: show slave status；

root@localhost [(none)]>show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Reconnecting after a failed master event read
                  Master_Host: 192.168.91.22
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000006
          Read_Master_Log_Pos: 6227
               Relay_Log_File: relay-bin.000004
                Relay_Log_Pos: 414
        Relay_Master_Log_File: mysql-bin.000006
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 6227
              Relay_Log_Space: 875
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL   --When the main library service stops, it changes from 0 to null，So this value cannot be used as a criterion for judging whether slave libraries are completed synchronously or not.
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 2003
                Last_IO_Error: error reconnecting to master 'repl@192.168.91.22:3306' - retry-time: 60  retries: 12
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 330622
                  Master_UUID: 83373570-fe03-11e6-bb0a-000c29c1b8a9
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp: 170415 23:08:25
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set: 83373570-fe03-11e6-bb0a-000c29c1b8a9:1-33,
b30cdc47-216a-11e7-95a8-000c29565380:1-3
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version

(2) Determine whether each slave is completed synchronously:

io_thread reads the binlog log log and location of the main library:

Master_Log_File: mysql-bin.000006

Read_Master_Log_Pos: 6227

Which relay-log and location does sql_thread execute to:

Relay_Master_Log_File: mysql-bin.000006

Exec_Master_Log_Pos: 6227

When Master_Log_File = Relay_Master_Log_File & Read_Master_Log_Pos = Exec_Master_Log_Pos indicates that the slave library is completed synchronously with the main library.

If Master_Log_File = Relay_Master_Log_File, but Read_Master_Log_Pos > Exec_Master_Log_Pos, and the status of sql_thread is Connecting, it means that relay-log has not been replayed, probably waiting for 2-5s will be completed synchronously.

(3) Comparing the synchronization of two slave libraries:

When S1 and S2 are synchronized, who is ahead and who is in charge. In most cases, S1 and S2 are the same.

When S1.Relay_Master_Log_File=S2.Relay_Master_Log_File but S1. Exec_Master_Log_Pos > S2. Exec_Master_Log_Pos, it means that S1 is ahead of synchronization and S1 is selected as the new master.

Or compare:

When S1.Master_Log_File = S2.Master_Log_File but S1. Read_Master_Log_Pos > S2. Read_Master_Log_Pos, it means that S1 is ahead of synchronization and S1 is selected as the new master.

(4) What if the data of S1 and S2 are inconsistent?

If S1 is ahead of S1 and S2 data is less than S1 data, then after taking S1 as the new master, business reads and writes are first placed on S1, then the data of S2 is repaired by pt-table-checksum and pt-table-sync tools, and then the business is shared by S2.

Solutions to GTID replication

(1) Determine whether each slave is completed synchronously:

Retrieved_Gtid_Set: 83373570-fe03-11e6-bb0a-000c29c1b8a9:22-28

Executed_Gtid_Set: 83373570-fe03-11e6-bb0a-000c29c1b8a9:1-28,

When Retrieved_Gtid_Set = Executed_Gtid_Set (28 = 28), the slave library is synchronized with the main library.

(2) Electing a slave treasury as the main treasury:

If S1. Executed_Gtid_Set = S2. Executed_Gtid_Set, randomly select one as the master;

If S1. Executed_Gtid_Set > S2. Executed_Gtid_Set, then S1 is elected as the master and S2 can directly change master to S1 as the slave Library of S1.

What about the damaged main storehouse?

(1) Re-change master to new master, then check the consistency between master and slave, and repair the data.

(2) If the original master database data is damaged, it needs to be re-added as a slave library to the new master.

How to temporarily stop the main library write operation?

(1) Change the password, can not affect the existing connections, remember to kill all existing connections.

(2)flush table with read lock

(3) Open parameter super_read_only=on

(4) Seal up port 3306 through firewall

Summary:

In a master-slave environment, if the master library hangs, how to elect a slave library as the master library switching process (the whole process is fast, about 1-5 seconds):

(1) Modify the password of the main library and disconnect all connections

(2) Judging the Synchronization of S1 and S2

(3) New electoral database

(4) Write traffic on the new master

Keywords: MySQL Database less

Added by shinagawa on Wed, 03 Jul 2019 22:07:38 +0300

Programming VIP

In a master-slave environment, if the master library hangs, how to elect a slave library as the master library?

Popular Keywords