Environment:
192.168.205.37: MHA manager server
192.168.205.47: master server
192.168.205.57: slave1 server
192.168.205.67: slave2 server
Versions:
OS: CentOS 7 (1810), minimal install
mariadb-server 5.5.60
mha4mysql-manager-0.56-0.el6.noarch.rpm
mha4mysql-node-0.56-0.el6.noarch.rpm
Purpose:
MHA is implemented as a set of Perl scripts. When the master goes down and a slave has not yet received all of the master's events, MHA tries to copy the missing binary log events from the crashed master to the node being promoted, which preserves data integrity; this is why SSH key authentication between all nodes is required. In this setup one server acts as the management node and the other three are database nodes (one master, two slaves); we simulate a master failure and watch a slave get promoted to master.
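To make the idea concrete, here is a minimal manual sketch of what MHA automates during failover, assuming the crashed master (192.168.205.47) is still reachable over SSH and keeps its binary logs under /data/logs/ as configured below; the file name and start position are illustrative only:

scp root@192.168.205.47:/data/logs/bin.000003 /tmp/                 # fetch the newest binlog from the dead master
mysqlbinlog --start-position=245 /tmp/bin.000003 | mysql -uroot     # replay the events the promoted slave never received

MHA performs the equivalent steps itself, which is why every node needs passwordless SSH access to the others.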
Set up the databases
- Install MariaDB on the database nodes with the following script:
[root@master data]#cat maridb_yum.sh
#!/bin/bash
ID=`ip a show dev eth0 | sed -r '3!d;s@(.*inet)(.*)(/.*)@\2@' | cut -d. -f4`
rpm -q mariadb-server || yum install -y mariadb-server
[ -d /data/mysql ] || mkdir -p /data/mysql
[ -d /data/logs ] || mkdir -p /data/logs
chown mysql:mysql /data/{mysql,logs}
sed -i 's@datadir=/var/lib/mysql@datadir=/data/mysql@' /etc/my.cnf
grep "log-bin" /etc/my.cnf || sed -i '/\[mysqld\]/a log-bin=/data/logs/bin' /etc/my.cnf
grep "innodb_file_per_table" /etc/my.cnf || sed -i '/\[mysqld\]/a innodb_file_per_table = on' /etc/my.cnf
grep "skip_name_resolve" /etc/my.cnf || sed -i '/\[mysqld\]/a skip_name_resolve = on' /etc/my.cnf
grep "server-id" /etc/my.cnf || sed -i "/\[mysqld\]/a server-id=$ID" /etc/my.cnf
service mariadb restart
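A quick sanity check after the script runs (my addition, not part of the original script), to confirm the options landed in /etc/my.cnf and binary logging is active:

grep -E 'datadir|log-bin|server-id|innodb_file_per_table|skip_name_resolve' /etc/my.cnf
mysql -e 'SHOW MASTER STATUS;'   # should report bin.000001 once the server has restarted with log-bin enabled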
- Modify the configuration file on the two slave nodes:
[root@slave1 data]#vi /etc/my.cnf
[mysqld]
read_only
relay_log_purge=0    # do not delete relay logs
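After editing my.cnf, restart the service; if you want to double-check (not shown in the original), confirm both options took effect:

systemctl restart mariadb
mysql -e "SHOW VARIABLES WHERE Variable_name IN ('read_only','relay_log_purge');"
# expected: read_only = ON, relay_log_purge = OFF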
- Record the master's current binary log position and create the replication and MHA accounts
MariaDB [(none)]> show master logs;
+------------+-----------+
| Log_name   | File_size |
+------------+-----------+
| bin.000001 |     30373 |
| bin.000002 |   1038814 |
| bin.000003 |       245 |
+------------+-----------+
3 rows in set (0.00 sec)

MariaDB [(none)]> grant replication slave on *.* to repluser@'192.168.205.%' identified by 'centos';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> grant all on *.* to mhauser@'192.168.205.%' identified by 'centos';
Query OK, 0 rows affected (0.00 sec)
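If you prefer not to read the coordinates off the screen, they can be pulled directly on the master (a small helper of mine, not in the original steps):

mysql -uroot -e 'SHOW MASTER STATUS;'
# File: bin.000003, Position: 245 -> used as MASTER_LOG_FILE / MASTER_LOG_POS in the next step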
- On each slave node, run CHANGE MASTER TO and start the replication threads
MariaDB [(none)]> CHANGE MASTER TO
    -> MASTER_HOST='192.168.205.47',
    -> MASTER_USER='repluser',
    -> MASTER_PASSWORD='centos',
    -> MASTER_PORT=3306,
    -> MASTER_LOG_FILE='bin.000003',
    -> MASTER_LOG_POS=245;
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.205.47
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: bin.000003
          Read_Master_Log_Pos: 401
               Relay_Log_File: mariadb-relay-bin.000002
                Relay_Log_Pos: 679
        Relay_Master_Log_File: bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
MHA Implementation
- Install the manager and node packages on the MHA server. The manager depends on Perl modules from the EPEL repository, so EPEL must be enabled first.
[root@MHA ~]#yum install mha4mysql-manager-0.56-0.el6.noarch.rpm mha4mysql-node-0.56-0.el6.noarch.rpm
- Install the node package on all SQL nodes
[root@master ~]#yum install mha4mysql-node-0.56-0.el6.noarch.rpm
[root@slave1 data]#yum install mha4mysql-node-0.56-0.el6.noarch.rpm
[root@slave2 data]#yum install mha4mysql-node-0.56-0.el6.noarch.rpm
- Configure SSH key authentication
[root@MHA ~]#ssh-keygen
[root@MHA ~]#ssh-copy-id 192.168.205.37
[root@MHA ~]#cat .ssh/authorized_keys
[root@MHA ~]#scp -r .ssh 192.168.205.47:/root/
[root@MHA ~]#scp -r .ssh 192.168.205.57:/root/
[root@MHA ~]#scp -r .ssh 192.168.205.67:/root/
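Because the same key pair and authorized_keys are copied to every node, the database nodes can also SSH to each other, which MHA needs during failover. A short loop (my own check, not in the original steps) verifies passwordless login from the manager:

for ip in 192.168.205.37 192.168.205.47 192.168.205.57 192.168.205.67; do
  ssh -o BatchMode=yes root@$ip 'hostname -I'   # must print the address without asking for a password
done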
- Create the MHA configuration file
[root@MHA ~]#mkdir /etc/mha
[root@MHA ~]#vim /etc/mha/app1.cnf
[server default]
master_binlog_dir=/data/logs/
user=mhauser
password=centos
manager_workdir=/data/mastermha/app1/
manager_log=/data/mastermha/app1/manager.log
remote_workdir=/data/mastermha/app1/
ssh_user=root
repl_user=repluser
repl_password=centos
ping_interval=1
[server1]
hostname=192.168.205.47
candidate_master=1
[server2]
hostname=192.168.205.57
[server3]
hostname=192.168.205.67
candidate_master=1
- Before starting, verify that SSH connectivity and replication are healthy
[root@MHA ~]#masterha_check_ssh --conf=/etc/mha/app1.cnf
[root@MHA ~]#masterha_check_repl --conf=/etc/mha/app1.cnf
- Start the manager. It runs in the foreground; when the master fails, it exits after the failover completes, so it has to be restarted if you want it to keep monitoring.
[root@MHA ~]#masterha_manager --conf=/etc/mha/app1.cnf
Mon Aug 12 23:33:22 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Aug 12 23:33:22 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Mon Aug 12 23:33:22 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
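In practice you would not leave this in an interactive foreground shell. One common approach (not covered in this article) is to background the manager and poll it:

nohup masterha_manager --conf=/etc/mha/app1.cnf > /data/mastermha/app1/manager_run.log 2>&1 &
masterha_check_status --conf=/etc/mha/app1.cnf   # reports whether the manager is currently monitoring app1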
Test
- Simulate a write workload, then bring the master down to see whether MHA fails over correctly without losing data.
MariaDB [(none)]> use test;
Database changed
MariaDB [test]> create table testlog (id int auto_increment primary key,name char(10),age int default 20);
Query OK, 0 rows affected (0.01 sec)

MariaDB [test]> delimiter $$
MariaDB [test]> create procedure sp_testlog()
    -> begin
    -> declare i int;
    -> set i = 1;
    -> while i <= 100000
    -> do insert into testlog(name,age) values (concat('wang',i),i);
    -> set i = i +1;
    -> end while;
    -> end$$
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> delimiter ;
MariaDB [test]> call sp_testlog;
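While the procedure is inserting rows, you can watch a slave from another shell (my own addition, not in the original test) to confirm replication keeps up with the write load:

watch -n1 'mysql -e "SELECT COUNT(*) FROM test.testlog;"'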
- While the data is being written, we shut down the 192.168.205.47 server. MHA detects the failure, promotes another node to master, and then the manager process exits.
[root@MHA ~]#masterha_manager --conf=/etc/mha/app1.cnf
Mon Aug 12 23:33:22 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Aug 12 23:33:22 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Mon Aug 12 23:33:22 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
Mon Aug 12 23:45:16 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Aug 12 23:45:16 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Mon Aug 12 23:45:16 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
[root@MHA ~]#
- Check the manager log to see what happened: slave2 was promoted to master.
[root@MHA data]#cat mastermha/app1/manager.log

----- Failover Report -----

app1: MySQL Master failover 192.168.205.47(192.168.205.47:3306) to 192.168.205.67(192.168.205.67:3306) succeeded

Master 192.168.205.47(192.168.205.47:3306) is down!

Check MHA Manager logs at MHA:/data/mastermha/app1/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 192.168.205.57(192.168.205.57:3306) has all relay logs for recovery.
Selected 192.168.205.67(192.168.205.67:3306) as a new master.
192.168.205.67(192.168.205.67:3306): OK: Applying all logs succeeded.
192.168.205.57(192.168.205.57:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.205.57(192.168.205.57:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.205.67(192.168.205.67:3306)
192.168.205.67(192.168.205.67:3306): Resetting slave info succeeded.
Master failover to 192.168.205.67(192.168.205.67:3306) completed successfully.
- On slave2, show slave status now returns an empty set, because slave2 is the new master.
MariaDB [(none)]> show slave status\G
Empty set (0.00 sec)
- On slave1, replication now points to slave2 (192.168.205.67) as its master
MariaDB [test]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.205.67
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: bin.000005
          Read_Master_Log_Pos: 245
               Relay_Log_File: mariadb-relay-bin.000002
                Relay_Log_Pos: 523
        Relay_Master_Log_File: bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
- On the new master, check whether read_only has been turned off; it has, but don't forget to also remove read_only from its configuration file, otherwise the server will come back read-only after a restart.
MariaDB [(none)]> show variables like 'read_only';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only     | OFF   |
+---------------+-------+
1 row in set (0.00 sec)
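A minimal way to make that persistent on the new master, assuming read_only was added as a bare option in /etc/my.cnf as on the slaves above:

sed -i '/^read_only/d' /etc/my.cnf   # remove the option so a restart does not bring the server back read-only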