MHA high availability architecture and fault repair

1, MHA overview

● a set of excellent software for failover and master-slave replication in MySQL high availability environment
● during MySQL fault, MHA can automatically complete fault switching within 0-30 seconds

1. Composition of MHA

① MHA Manager (management node)
② MHA node
manager function: with the help of node components, health check and failover functions can be realized
node component (existing in all mysql servers) functions:
1) . regularly report the status of mysql service to the master manager node
2) . monitor local mysql service functions

2, MHA features

● during automatic failover, MHA tries to save binary logs from the down primary server to ensure that data is not lost to the greatest extent, but this is not always feasible. For example, if the primary server hardware fails or cannot be accessed through ssh, MHA cannot save binary logs and can only perform failover and lose the latest data
● using semi synchronous replication can greatly reduce the risk of data loss
● at present, MHA supports a master-slave architecture with at least three servers, i.e. one master and two slaves
● use MySQL 5 5, which can greatly reduce the risk of data loss. MHA can be combined with semi synchronous replication. If only one slave has received the latest binary log, MHA can apply the latest binary log to all other slave servers, so it can ensure the data consistency of all nodes. Sometimes, it can be set that the slave node is slower than the master node, In case of accidental database deletion and data loss, it can be recovered from the node binary log.

1. Automatic failover process

● failover (intrinsic vip drift)
● auto - trigger Auto with script

3, Build MHA

Based on the previous master-slave environment, continue
1. Experimental environment
MHA Manager: 192.168.153.225 manage the node and install the manager component and node component
Three mysql servers:
Main: 192.168.153.200 installing node components
Install node components from: 192.168.153.215
Install node components from: 192.168.153.220

Define node server name
hostnamectl set-hostname manager
hostnamectl set-hostname master
hostnamectl set-hostname slave1
hostnamectl set-hostname slave2

Installation and compilation dependent environment(Three sets MySQL Server)
yum -y install ncurses-devel gcc-c++ perl-Module-Install

master,slave1,slave2 Make two soft links respectively⭐⭐⭐
The first is mysql Command, second mysqlbinlog Node recovery
[root@master ~]# ln -s /usr/local/mysql/bin/mysql /usr/sbin
[root@master ~]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/sbin

Start in the background mysql Service( mysql_safe Safe start,Equivalent to one more daemon,mysqld If you hang up, you'll automatically mysqld Process pull up)⭐
/usr/local/mysql/bin/mysqld_safe --user=mysql &

to configure MySQL One master and two slaves⭐⭐
Two users are authorized on all database nodes. One is used synchronously from the library, and the other is manager use
mysql> grant replication slave on *.* to 'myslave'@'192.168.153.%' identified by '123';
mysql> grant all privileges on *.* to 'mha'@'192.168.153.%' identified by 'manager';

The following three authorizations need not be added in theory, but the experimental environment passes MHA inspect MySQL Master slave error reporting,
Report that two slave databases cannot connect to the master database through the host name, so all databases need to add the following authorization
 adopt mha When checking, it is monitored in the form of host name, which is easy to report errors⭐⭐
grant all privileges on *.* to 'mha'@'master' identified by 'manager';
grant all privileges on *.* to 'mha'@'slave1' identified by 'manager';
grant all privileges on *.* to 'mha'@'slave2' identified by 'manager';

Enable read-only function from all libraries (not for super administrators) super Effective, ordinary user)
mysql> set global read_only=1;
At all mysql Refresh permissions on Database
mysql> flush privileges;
stay master View binaries and synchronization points on
mysql> show master status;
Synchronize between two node servers
mysql> change master to master_host='192.168.153.200',master_user='myslave',master_password='123',master_log_file='master-bin.000001',master_log_pos=5147;
Turn on the master-slave function and check whether the status is successful
mysql> start slave;
mysql> show slave status\G;

You can do a read-only function verification on a slave server
mysql -umyslave -p123 -h 192.168.153.200 -P 3306
 At this time, an error will be directly reported for the written content, and the read-only function will be enabled successfully

Configure MHA

Install on all servers MHA Environment dependent, install first epel Source, only if the source is installed perl There will be
 install epel Source, and no gph inspect
yum install epel-release --nogpgcheck -y

yum install -y perl-DBD-MySQL \
perl-Config-Tiny \
perl-Log-Dispatch \
perl-Parallel-ForkManager \
perl-ExtUtils-CBuilder \
perl-ExtUtils-MakeMaker \
perl-CPAN
 notes:
yum install -y perl-DBD-MySQL \    #perl for mysql database
perl-Config-Tiny \                 #Fetch the value from the configuration file 
perl-Log-Dispatch \                #Log log
perl-Parallel-ForkManager \        #Multithreading management
perl-ExtUtils-CBuilder \           #Extension tool CBuilder
perl-ExtUtils-MakeMaker \          #Extension tool MakeMaker
perl-CPAN                          #Database in cpan perl

PS: the MHA software package is different for each operating system version. Here, centos7 4. You must select version 0.57. You must first install node components on all servers, and finally install manager components on the MHA manager node, because manager depends on node components.

All machines synchronize with alicloud time server
ntpdate ntp.aliyun.com

crontab -e
*/10 * * * * /usr/sbin/ntpdate ntp.aliyun.com

Upload on all machines node assembly
[root@master ~]# mount.cifs //192.168.153.1/LAMP-C7 /mnt
Password for root@//192.168.153.1/LAMP-C7:  

mha4mysql-manager-0.57.tar.gz Upload to/opt Unzip under
tar zxvf mha4mysql-node-0.57.tar.gz -C /root
cd /root
cd  mha4mysql-node-0.57
perl Makefile.PL
make && make install

Install the manager component on the MHA node

(node must be installed before installing the manager component)

stay manager Machine upload manger assembly
 take mha4mysql-manager-0.57.tar.gz Upload and decompress
tar zxvf mha4mysql-manager-0.57.tar.gz -C /root
cd /root/mha4mysql-manager-0.57
perl Makefile.PL
make && make install

After manager is installed, several tools will be generated in the / usr/local/bin directory, mainly including the following:

masterha-check_ssh  		inspect MHA of SSH Configuration status (no interaction)
masterha_check_repl  		inspect MySQL Replication status
masterha_manager 			start-up manager Script for
masterha_check_status 		Check current MHA running state 
masterha_master_monitor 	inspect master Is it down
masterha_master_switch 		Control failover((automatic or manual)
masterha_conf_host 			Add or remove configured server information
masterha_stop  				close manager

After node is installed, several scripts are generated in / usr/local/bin directory, which need to be checked, mainly including the following:

save_binary_logs			Save and copy master Binary log for
apply_diff_relay_logs 		Identify the differential relay log time and apply the differential events to other events slave
filter_mysqlbinlog			Remove unnecessary ROLLBACK Rollback event( MHA (this tool is no longer applicable)
purge_relay_logs			Clear relay log (no blocking) SQL Thread)

Configure password less authentication

All four servers are configured with password less authentication (one-way enter). Enter the following 5 commands for each server
ssh-keygen -t rsa
ssh-copy-id 192.168.153.200
ssh-copy-id 192.168.153.215
ssh-copy-id 192.168.153.220
ssh-copy-id 192.168.153.225

Configure mha

stay MHA Copy related scripts on node to/usr/local/bin catalogue
[root@manager ~]# cp -ra /root/mha4mysql-manager-0.57/samples/scripts/ /usr/local/bin
-a: This option is usually used when copying a directory. It retains links, file properties, and copies everything under the directory
-r: recursion

The four executive files after copying are as follows:
ll /usr/local/bin/scripts/

master_ip_failover			#Script for VIP management during automatic switching ⭐⭐⭐
master_ip_online_change		#VIP management during online switching
power_manager				#Script to shut down the host after a failure
send_report					#Script for sending alarms after failover

Copy the above master_ip_failover reach/usr/local/bin Directory, where script management is used VIP
cp /usr/local/bin/scripts/master_ip_failover /usr/local/bin

Modify master_ip_failover content (delete the original content, add new content and copy directly)
Operate on manager

vim master_ip_failover
vim /usr/local/bin/master_ip_failover

#!/usr/bin/env perl 
use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.153.100';		#Floating IP ⭐⭐⭐
my $brdc = '192.168.153.255';	#Broadcast address ⭐⭐⭐
my $ifdev = 'ens33';		#The network card used is ens33 ⭐⭐⭐
my $key = '1';			#International serial number
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";							#Use the ifoconfig command to start it and set the floating address
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";							#You can use the ifconfig command to down it
my $exit_code = 0;			#Normal exit (return status code)
#my $ssh_start_vip = "/usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;iptables -F;";
#my $ssh_stop_vip = "/usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);

exit &main();

sub main {

print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

if ( $command eq "stop" || $command eq "stopssh" ) {

my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {

my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

#remove#At the beginning, notice that the first line needs to be added#
% s/^#//⭐⭐⭐

Create MHA software directory and copy configuration files

Operate on manager to delete the original template and insert it

[root@manager ~]# mkdir /etc/masterha
[root@manager ~]# cp /root/mha4mysql-manager-0.57/samples/conf/app1.cnf /etc/masterha/
[root@manager ~]# vim /etc/masterha/app1.cnf

[server default]
#log file
manager_log=/var/log/masterha/app1/manager.log
#working directory
manager_workdir=/var/log/masterha/app1
#Binary file
master_binlog_dir=/home/mysql
#Tools for failover
master_ip_failover_script=/usr/local/bin/master_ip_failover
#Online switching VIP tool management
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
#The following is the password account management configuration
password=manager
ping_interval=1
remote_workdir=/tmp
repl_password=123
repl_user=myslave
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.153.215 -s 192.168.153.220

shutdown_script=""
ssh_user=root
user=mha

[server1]
hostname=192.168.153.200
port=3306

[server2]
candidate_master=1
hostname=192.168.153.215
check_repl_delay=0
port=3306

[server3]
hostname=192.168.153.220
port=3306

Document notes:
#manager working directory
manager_log=/var/log/masterha/app1/manager.log
#manager log
manager_workdir=/var/log/masterha/app1
#The location where the master saves binlog. The path here should be the binlog configured in the master
master_binlog_dir=/home/mysql
#Switch scripts when setting automatic failover, that is, the last configured script
master_ip_failover_script=/usr/local/bin/master_ip_failover
#Set the switching script for manual switching
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
#The password here is the password of the previously created monitoring user
password=manager
#Set monitoring user
user=mha
#Set the event interval for monitoring the main database and sending ping packets. The default is 3 seconds. When there is no response for three attempts, the automatic failover is performed
ping_interval=1
#Set the save location of binlog in case of remote MySQL switching
remote_workdir=/tmp
#Set the password of the replication user
repl_password=123 
#Set the account of the replication user
repl_user=myslave
#Set the script of the alarm sent after switching
report_script=/usr/local/send_report
#Set the script to check from the server
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.153.215 -s 192.168.153.220
#Set the script to close the failed host after the failure occurs
shutdown_script=""
#Set ssh login user name
ssh_user=root
[server2]
#Set the candidate master. If this parameter is set, this library will be promoted to the master library after sending the master-slave switch
candidate_master=1
#Turn off the check of replication. By default, if a slave lags behind the relay logs of the master by 100m, MHA will not select the slave as the new master. If it is set to 0, this rule will be ignored
check_repl_delay=0
hostname=192.168.153.215
port=3306

Test no password authentication

If normal, it will output successfully

[root@manager ~]# masterha_check_ssh -conf=/etc/masterha/app1.cnf
[info] All SSH connection tests passed successfully.

Test master-slave replication

masterha_check_repl -conf=/etc/masterha/app1.cnf
 Note: the first configuration needs to be in master Manually turn on virtual on node IP(ok Success is success)
/sbin/ifconfig ens33:1 192.168.153.100

test

Start mha in the background

[root@manager ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[2] 121361
[root@manager ~]# jobs
[1]-  In operation               nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

Parameter details:
Explanation:
ohup ##option
masterha_manager ##open
–conf=/etc/masterha/app1.cnf ##Specify profile
–remove_dead_master_conf ##When the master server fails, the old master ip will be deleted from the master configuration file after the master-slave switch
–ignore_last_failover ##Ignore failover, ignore servers that are always down and unreliable
 By default, if MHA If continuous downtime is detected and the interval between two outages is less than 8 hours, it will not be Failove The reason for this restriction is to avoid ping-pong effect. This parameter represents the number of times ignored MHA Start to switch the generated file. By default MHA After switching, it will be recorded in the log, that is, the log set above app1.failover.complete Documents,
If the file is found in the directory during the next switch, the switch is not allowed to be triggered unless the file is deleted after the first switch:For convenience, set here as–ianore_last_failover
< /dev/null> ##All generated information will be exported to nul1 or / var / log / masterha / app1 / manager Log file
2>&1& ##Change the output of 2 error from directional to standard output, and "&" starts the background operation

View the current master node

[root@manager ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:120311) is running(0:PING_OK), master:192.168.153.200

View current log information

[root@manager ~]# cat /var/log/masterha/app1/manager.log

Simulate the fault on the original master and check the changes of the master

[root@manager ~]# tail -f /var/log/masterha/app1/manager.log

Check whether the MySQL VIP address 192.168.226.100 exists. This VIP address will not disappear because the manager node stops MHA service

[root@master ~]# ifconfig
inet 192.168.153.200  netmask 255.255.255.0  broadcast 192.168.153.255
inet 192.168.153.100  netmask 255.255.255.0  broadcast 192.168.153.255

Fault repair

repair db，Add relay logs and notes to allow synchronization
[root@master ~]# vim  /etc/my.cnf
log_bin=master-bin
#log_slave=updates=true
server_id=11
relay-log=relay-log-bin
relay-log-index=slave-relay-bin.index
 Restart the server
[root@master ~]# systemctl restart mysqld

Repair master-slave: on the master-slave server
mysql> show master status;
+-------------------+----------+--------------+------------------+-------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-------------------+----------+--------------+------------------+-------------------+
| master-bin.000006 |     1006 |              |                  |                   |
+-------------------+----------+--------------+------------------+-------------------+

Repaired master: 
mysql> change master to master_host='192.168.153.215',master_user='myslave',master_password='123',master_log_file='master-bin.000006',master_log_pos=1006;
mysql> start slave;
mysql> set global read_only=1;
mysql> flush privileges;

Modify add profile
[root@slave1 ~]# vim /etc/my.cnf
[server1]
hostname=192.168.153.215
port=3306

start-up manager(stay manager (on that machine)
[root@manager ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
View current master Who is the node
masterha_check_status --conf=/etc/masterha/app1.cnf
 Solve the problem of incompatible Chinese and English characters and error reporting
dos2unix /usr/local/bin/master_ip_failover

Added by nalkari on Sat, 01 Jan 2022 13:25:47 +0200

Programming VIP