Orchestrator (orch) visual high-availability cluster

1. Introduction to Orch

Orchestrator (orch) is a MySQL high-availability and replication topology management tool written in Go. It supports adjusting the replication topology, automatic failover, and manual master-slave switchover. Its back-end database uses MySQL or SQLite to store metadata, and a web interface displays the topology and status of MySQL replication; replication relationships and some configuration of the MySQL instances can be changed through the web. It also provides command-line and API interfaces to simplify operations and maintenance. Compared with MHA, the most important difference is that it solves the single point of failure of the management node: its own high availability is guaranteed through the Raft protocol. GitHub manages part of its MySQL infrastructure with this tool.

Features

① Automatically discovers the MySQL replication topology and displays it on the web interface.

② Rebuilds replication relationships: you can drag nodes in the web graph to change the replication topology.

③ Detects master failures and recovers automatically or manually; custom scripts can be run through hooks.

④ Supports managing replication from both the command line and the web interface.

2. ORCH deployment planning

Node planning table (port 3306 hosts the orchestrator back-end database; port 8026 hosts the managed MySQL master-slave replication)

IP address      Host name    Installed software      Database ports
172.31.0.101    wl01         orchestrator, mysql     3306, 8026
172.31.0.102    wl02         orchestrator, mysql     3306, 8026
172.31.0.103    wl03         orchestrator, mysql     3306, 8026

Each software version

Software        Version    Download address
MySQL           8.0.26     https://downloads.mysql.com/archives/community/
Orchestrator    3.2.6      https://github.com/openark/orchestrator

Managed MySQL (8026) database directory planning

MySQL (8026) directory role    Path
basedir                        /usr/local/mysql-8026
datadir                        /mysql-8026/8026/data/
error log                      /mysql-8026/8026/log/error.log
binlog dir                     /mysql-8026/8026/binlog/
relaylog dir                   /mysql-8026/8026/relaylog/
tmpdir                         /mysql-8026/8026/tmp
pid                            /mysql-8026/8026/run/mysql-8026.pid
socket                         /mysql-8026/8026/run/mysql-8026.sock

Orchestrator back-end (3306) database directory planning

Orchestrator back-end directory role    Path
basedir                                 /usr/local/mysql-8026
datadir                                 /mysql-8026/3306/data/
error log                               /mysql-8026/3306/log/error.log
binlog dir                              /mysql-8026/3306/binlog/
relaylog dir                            /mysql-8026/3306/relaylog/
tmpdir                                  /mysql-8026/3306/tmp
pid                                     /mysql-8026/3306/run/mysql-8026.pid
socket                                  /mysql-8026/3306/run/mysql-8026.sock

3. Environmental preparation

3.1 environmental preparation (all nodes)

#Install dependent software
[root@wl01 ~]# yum install -y gcc gcc-c++ ncurses-devel.x86_64 libaio bison gcc-c++.x86_64 perl perl-devel libssl-dev autoconf openssl-devel openssl numactl wget *libncurses.so.5*

#Configure environment variables
#Upload the prepared mysql-8.0.26 binary tarball to the /opt directory
[root@wl01 ~]# cd /opt/
[root@wl01 opt]# tar xf mysql-8.0.26-linux-glibc2.12-x86_64.tar.xz
[root@wl01 opt]# ln -s /opt/mysql-8.0.26-linux-glibc2.12-x86_64 /usr/local/mysql-8026
[root@wl01 opt]# ll /usr/local/mysql-8026
lrwxrwxrwx 1 root root 40 Jan 14 16:59 /usr/local/mysql-8026 -> /opt/mysql-8.0.26-linux-glibc2.12-x86_64
[root@wl01 opt]# echo "export PATH=/usr/local/mysql-8026/bin:$PATH">> /etc/profile
[root@wl01 opt]# source /etc/profile
[root@wl01 opt]# mysql -V
mysql  Ver 8.0.26 for Linux on x86_64 (MySQL Community Server - GPL)

#Configure domain name resolution
[root@wl01 opt]# vim /etc/hosts
172.31.0.101 wl01
172.31.0.102 wl02
172.31.0.103 wl03

#Configure mutual trust
rm -rf /root/.ssh       #Clean up old ssh keys
ssh-keygen              #Generate a new key pair; press Enter at every prompt
ssh-copy-id root@wl01   #Enter the password for root@wl01
ssh-copy-id root@wl02   #Enter the password for root@wl02
ssh-copy-id root@wl03   #Enter the password for root@wl03

#Mutual trust verification of each node
ssh root@wl01 date
ssh root@wl02 date
ssh root@wl03 date

#Disable firewall
systemctl stop firewalld.service
systemctl disable firewalld.service

#Create user and directory
useradd mysql -M -s /sbin/nologin
mkdir -p /mysql-8026/8026/{binlog,relaylog,data,log,run,tmp}
mkdir -p /mysql-8026/3306/{binlog,relaylog,data,log,run,tmp}
chown -R mysql.mysql /mysql-8026

3.2 create all instances

Note: on each node, adjust server_id, report_host and report_port accordingly.
#Edit the configuration file (using the orch back-end (3306) instance as an example)
#When creating the managed MySQL (8026) instance config, batch-replace the port number: sed -i 's/3306/8026/g' /mysql-8026/8026/my.cnf
vim /mysql-8026/3306/my.cnf 
[mysql]
no-auto-rehash
max_allowed_packet=128M
prompt="\u@\h \R:\m:\s[\d]> "
default_character_set=utf8mb4
socket=/mysql-8026/3306/run/mysql.sock

[mysqldump]
quick
max_allowed_packet=128M
socket=/mysql-8026/3306/run/mysql.sock
[mysqladmin]
socket=/mysql-8026/3306/run/mysql.sock
[mysqld]
user=mysql
port=3306
report_host='172.31.0.101'
report_port=3306
server-id=1013306   #last octet of the IP + port number
default-time_zone='+8:00'
log_timestamps=SYSTEM
datadir=/mysql-8026/3306/data
basedir=/usr/local/mysql-8026
tmpdir=/mysql-8026/3306/tmp
socket=/mysql-8026/3306/run/mysql.sock
pid-file=/mysql-8026/3306/run/mysql.pid
character-set-server=utf8mb4

##redolog
innodb_log_file_size=2G
innodb_log_buffer_size=16M
innodb_log_files_in_group=2
innodb_log_group_home_dir=/mysql-8026/3306/data
##undolog
innodb_undo_directory=/mysql-8026/3306/data
innodb_max_undo_log_size=2G
innodb_undo_log_truncate=on
#innodb_undo_tablespaces=4  #deprecated since 8.0.14; additional undo tablespaces can be created with SQL instead

##binlog
binlog_format=row
log-bin=/mysql-8026/3306/binlog/mysql-bin
max_binlog_size=1G
binlog_cache_size=1M
sync_binlog=1

##relaylog
relay-log=/mysql-8026/3306/relaylog/mysql-relay
relay-log-purge=on
relay-log-recovery=on
##general log
#general_log=on
#general_log_file=/mysql-8026/3306/log/general.log
##error log
log-error=/mysql-8026/3306/log/error.log

##slow log
long_query_time=1
slow-query-log=on
slow-query-log-file=/mysql-8026/3306/log/slow.log

##connection
skip-external-locking
skip-name-resolve
max_connections=4000
max_user_connections=2500
max_connect_errors=10000
wait_timeout=7200
interactive_timeout=7200
connect_timeout=20
max_allowed_packet=512M

##gtid
gtid_mode=on
enforce_gtid_consistency=1
#log_slave_updates=1 
log_replica_updates=1 #8.0

##parallel replication mysql>5.7.22
# master
loose-binlog_transaction_dependency_tracking=WRITESET
#loose-transaction_write_set_extraction=XXHASH64 #Before 8.0
binlog_transaction_dependency_history_size=25000 #default
# slave
#slave-parallel-type=LOGICAL_CLOCK
replica_parallel_type=LOGICAL_CLOCK #8.0
#slave-parallel-workers=4 #Before 8.0
replica_parallel_workers=4 #8.0
#master_info_repository=TABLE #Before 8.0
#relay_log_info_repository=TABLE #Before 8.0

##memory size
key_buffer_size=2M
table_open_cache=2048
table_definition_cache=4096
sort_buffer_size=2M
read_buffer_size=2M
read_rnd_buffer_size=2M
join_buffer_size=2M
myisam_sort_buffer_size=2M
tmp_table_size=64M
max_heap_table_size=64M
##lock and transaction
transaction_isolation=READ-COMMITTED
innodb_lock_wait_timeout=30
lock_wait_timeout=3600
##InnoDB
innodb_data_home_dir=/mysql-8026/3306/data
innodb_data_file_path=ibdata1:1G:autoextend
innodb_buffer_pool_size=1G
innodb_buffer_pool_instances=2
innodb_flush_log_at_trx_commit=1
innodb_max_dirty_pages_pct=75
innodb_flush_method=O_DIRECT
innodb_file_per_table=1
innodb_read_io_threads=16
innodb_write_io_threads=16
innodb_io_capacity=2000
innodb_io_capacity_max=4000
innodb_purge_threads=2
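
#As the note above says, the 8026 instance config can be derived from the 3306 one; a minimal sketch (paths follow the planning tables, and the per-node server_id/report_host/report_port values should still be verified by hand):
cp /mysql-8026/3306/my.cnf /mysql-8026/8026/my.cnf
sed -i 's/3306/8026/g' /mysql-8026/8026/my.cnf
grep -E '^(port|server-id|report_|socket|datadir)' /mysql-8026/8026/my.cnf   #spot-check the replaced values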

#Initialize and start both instances
mysqld --defaults-file=/mysql-8026/3306/my.cnf --initialize-insecure
mysqld_safe --defaults-file=/mysql-8026/3306/my.cnf &
mysql -S /mysql-8026/3306/run/mysql.sock

mysqld --defaults-file=/mysql-8026/8026/my.cnf --initialize-insecure
mysqld_safe --defaults-file=/mysql-8026/8026/my.cnf &
mysql -S /mysql-8026/8026/run/mysql.sock
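
#A quick sanity check (a sketch, using only the sockets defined above) that both instances came up with the expected identity:
mysql -S /mysql-8026/3306/run/mysql.sock -e "SELECT @@port, @@server_id, @@datadir;"
mysql -S /mysql-8026/8026/run/mysql.sock -e "SELECT @@port, @@server_id, @@datadir;"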

3.3 install orchestrator and configure the instances

(1) Get the software (all nodes)
wget https://github.com/openark/orchestrator/releases/download/v3.2.6/orchestrator-3.2.6-1.x86_64.rpm
wget https://github.com/openark/orchestrator/releases/download/v3.2.6/orchestrator-cli-3.2.6-1.x86_64.rpm

(2) Install the software (all nodes)
ll /opt/orch*
-rw-r--r-- 1 root root 10970627 May 11 13:37 /opt/orchestrator-3.2.6-1.x86_64.rpm
-rw-r--r-- 1 root root 10543813 May 11 13:39 /opt/orchestrator-cli-3.2.6-1.x86_64.rpm
yum localinstall -y /opt/orchestrator-*
#If a dependency error appears (Error: Package orchestrator ... Requires: jq >= 1.5), resolve it as follows
wget http://www6.atomicorp.com/channels/atomic/centos/7/x86_64/RPMS/oniguruma-5.9.5-3.el7.art.x86_64.rpm
yum install -y oniguruma-5.9.5-3.el7.art.x86_64.rpm 
wget http://www6.atomicorp.com/channels/atomic/centos/7/x86_64/RPMS/jq-1.5-1.el7.art.x86_64.rpm
yum install -y jq-1.5-1.el7.art.x86_64.rpm

(3) Configure the orch back-end database and user (all 3306 instances)
mysql -S /mysql-8026/3306/run/mysql.sock
CREATE DATABASE IF NOT EXISTS orchdb; 
CREATE USER 'orchuser'@'127.0.0.1' IDENTIFIED BY '123456'; 
GRANT ALL ON orchdb.* TO 'orchuser'@'127.0.0.1';
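
#Optional check (a sketch using the credentials just created): confirm the back-end user can reach orchdb over TCP on 127.0.0.1, which is how orchestrator will connect
mysql -h127.0.0.1 -P3306 -uorchuser -p123456 -e "SHOW DATABASES LIKE 'orchdb';"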

(4) Configure the master-slave relationship on the managed nodes (all 8026 instances)
# Create a dedicated replication user on the master
mysql -S /mysql-8026/8026/run/mysql.sock
create user 'repl'@'172.31.0.%' IDENTIFIED WITH mysql_native_password BY '123456';
GRANT replication slave on *.* TO 'repl'@'172.31.0.%'; 
flush privileges;
# On the slaves
#Note: orch detects a master outage partly through the slaves' IO threads (when it cannot reach the master itself, it also checks via the slaves whether the master is really down). With the default settings it takes too long for the slaves to notice that the master is gone, so a few parameters are adjusted slightly:
mysql -S /mysql-8026/8026/run/mysql.sock
reset master;
change master to 
master_host='172.31.0.101',
master_port=8026,
master_user='repl',
master_password='123456',
master_auto_position=1,
MASTER_HEARTBEAT_PERIOD=2,
MASTER_CONNECT_RETRY=1,
MASTER_RETRY_COUNT=86400; 
start slave; 
set global slave_net_timeout=8; 
set global read_only=1; 
set global super_read_only=1; 
#Explanation:
slave_net_timeout (global variable): since MySQL 5.7.7 the default is 60 seconds. It defines how many seconds the slave waits for data from the master; once this time passes, the slave gives up reading, breaks the connection, and the IO thread tries to reconnect.
master_heartbeat_period: the replication heartbeat interval, by default half of slave_net_timeout. When the master has no data to send, it sends a heartbeat packet every master_heartbeat_period seconds so the slave knows the master is still alive. The unit is seconds; fractional values such as 10.5 are allowed, with a maximum precision of 1 millisecond.
Tuning these two settings together lets the slaves detect a dead master quickly while avoiding replication interruptions caused by brief network problems.
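
#To confirm the effective values after the changes above (a sketch; the heartbeat interval can be read from performance_schema), run on a slave:
mysql -S /mysql-8026/8026/run/mysql.sock -e "SHOW VARIABLES LIKE 'slave_net_timeout';"
mysql -S /mysql-8026/8026/run/mysql.sock -e "SELECT CHANNEL_NAME, HEARTBEAT_INTERVAL FROM performance_schema.replication_connection_configuration;"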

(5) Create the user that orchestrator uses to manage the MySQL instances (on the master, 172.31.0.101:8026)
CREATE USER 'orchctl'@'%' IDENTIFIED WITH mysql_native_password BY '123456'; 
GRANT SUPER, PROCESS, REPLICATION SLAVE,REPLICATION CLIENT, RELOAD ON *.* TO 'orchctl'@'%'; 
GRANT SELECT ON mysql.slave_master_info TO 'orchctl'@'%'; 
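
#Optional check (a sketch): from any orch node, verify that the topology user can log in to a managed instance over TCP; discovery via SHOW SLAVE HOSTS relies on this kind of access
mysql -h172.31.0.101 -P8026 -uorchctl -p123456 -e "SHOW SLAVE HOSTS;"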


4. Prepare configuration files and scripts

4.1 modify the ORCH configuration file

cp /usr/local/orchestrator/orchestrator-sample.conf.json /etc/orchestrator.conf.json
vim /etc/orchestrator.conf.json
{
  "Debug": true,
  "EnableSyslog": false,
  "ListenAddress": ":3000",
  "MySQLTopologyUser": "orchctl",
  "MySQLTopologyPassword": "123456",
  "MySQLTopologyCredentialsConfigFile": "",
  "MySQLTopologySSLPrivateKeyFile": "",
  "MySQLTopologySSLCertFile": "",
  "MySQLTopologySSLCAFile": "",
  "MySQLTopologySSLSkipVerify": true,
  "MySQLTopologyUseMutualTLS": false,
  "BackendDB": "mysql",
  "MySQLOrchestratorHost": "127.0.0.1",
  "MySQLOrchestratorPort": 3306,
  "MySQLOrchestratorDatabase": "orchdb",
  "MySQLOrchestratorUser": "orchuser",
  "MySQLOrchestratorPassword": "123456",
  "MySQLConnectTimeoutSeconds": 1,
  "DefaultInstancePort": 3306,
  "DiscoverByShowSlaveHosts": true,
  "InstancePollSeconds": 5,
  "DiscoveryIgnoreReplicaHostnameFilters": [
    "a_host_i_want_to_ignore[.]example[.]com",
    ".*[.]ignore_all_hosts_from_this_domain[.]example[.]com",
    "a_host_with_extra_port_i_want_to_ignore[.]example[.]com:3307"
  ],
  "UnseenInstanceForgetHours": 240,
  "SnapshotTopologiesIntervalHours": 0,
  "InstanceBulkOperationsWaitTimeoutSeconds": 10,
  "HostnameResolveMethod": "default",
  "MySQLHostnameResolveMethod": "@@hostname",
  "SkipBinlogServerUnresolveCheck": true,
  "ExpiryHostnameResolvesMinutes": 60,
  "RejectHostnameResolvePattern": "",
  "ReasonableReplicationLagSeconds": 10,
  "ProblemIgnoreHostnameFilters": [],
  "VerifyReplicationFilters": false,
  "ReasonableMaintenanceReplicationLagSeconds": 20,
  "CandidateInstanceExpireMinutes": 60,
  "AuditLogFile": "",
  "AuditToSyslog": false,
  "RemoveTextFromHostnameDisplay": ".mydomain.com:3306",
  "ReadOnly": false,
  "AuthenticationMethod": "",
  "HTTPAuthUser": "",
  "HTTPAuthPassword": "",
  "AuthUserHeader": "",
  "PowerAuthUsers": [
    "*"
  ],
  "ClusterNameToAlias": {
    "127.0.0.1": "test suite"
  },
  "ReplicationLagQuery": "",
  "DetectClusterAliasQuery": "SELECT SUBSTRING_INDEX(@@hostname, '.', 1)",
  "DetectClusterDomainQuery": "",
  "DetectInstanceAliasQuery": "",
  "DetectPromotionRuleQuery": "",
  "DataCenterPattern": "[.]([^.]+)[.][^.]+[.]mydomain[.]com",
  "PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]mydomain[.]com",
  "PromotionIgnoreHostnameFilters": [],
  "DetectSemiSyncEnforcedQuery": "",
  "ServeAgentsHttp": false,
  "AgentsServerPort": ":3001",
  "AgentsUseSSL": false,
  "AgentsUseMutualTLS": false,
  "AgentSSLSkipVerify": false,
  "AgentSSLPrivateKeyFile": "",
  "AgentSSLCertFile": "",
  "AgentSSLCAFile": "",
  "AgentSSLValidOUs": [],
  "UseSSL": false,
  "UseMutualTLS": false,
  "SSLSkipVerify": false,
  "SSLPrivateKeyFile": "",
  "SSLCertFile": "",
  "SSLCAFile": "",
  "SSLValidOUs": [],
  "URLPrefix": "",
  "StatusEndpoint": "/api/status",
  "StatusSimpleHealth": true,
  "StatusOUVerify": false,
  "AgentPollMinutes": 60,
  "UnseenAgentForgetHours": 6,
  "StaleSeedFailMinutes": 60,
  "SeedAcceptableBytesDiff": 8192,
  "PseudoGTIDPattern": "",
  "PseudoGTIDPatternIsFixedSubstring": false,
  "PseudoGTIDMonotonicHint": "asc:",
  "DetectPseudoGTIDQuery": "",
  "BinlogEventsChunkSize": 10000,
  "SkipBinlogEventsContaining": [],
  "ReduceReplicationAnalysisCount": true,
  "FailureDetectionPeriodBlockMinutes": 5,
  "RecoveryPeriodBlockSeconds": 30,
  "RecoveryIgnoreHostnameFilters": [],
  "RecoverMasterClusterFilters": [
    "*"
  ],
  "RecoverIntermediateMasterClusterFilters": [
    "*"
  ],
  "OnFailureDetectionProcesses": [
    "echo '`date +'%Y-%m-%d %T'` Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
  ],
  "PreGracefulTakeoverProcesses": [
    "echo '`date +'%Y-%m-%d %T'` Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log"
  ],
  "PreFailoverProcesses": [
    "echo '`date +'%Y-%m-%d %T'` Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
  ],
  "PostFailoverProcesses": [
    "echo '`date +'%Y-%m-%d %T'` (for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}; failureClusterAlias:{failureClusterAlias}' >> /tmp/recovery.log",
    "/usr/local/orchestrator/orch_hook.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> /tmp/orch.log"
  ],
  "PostUnsuccessfulFailoverProcesses": [ "echo '`date +'%Y-%m-%d %T'` Unsuccessful Failover ' >> /tmp/recovery.log"],
  "PostMasterFailoverProcesses": [
    "echo '`date +'%Y-%m-%d %T'` Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostIntermediateMasterFailoverProcesses": [
    "echo '`date +'%Y-%m-%d %T'` Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostGracefulTakeoverProcesses": [
    "echo '`date +'%Y-%m-%d %T'` Planned takeover complete' >> /tmp/recovery.log"
  ],
  "CoMasterRecoveryMustPromoteOtherCoMaster": true,
  "DetachLostSlavesAfterMasterFailover": true,
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "PreventCrossDataCenterMasterFailover": false,
  "PreventCrossRegionMasterFailover": false,
  "MasterFailoverDetachReplicaMasterHost": false,
  "MasterFailoverLostInstancesDowntimeMinutes": 0,
  "PostponeReplicaRecoveryOnLagMinutes": 0,
  "OSCIgnoreHostnameFilters": [],
  "GraphiteAddr": "",
  "GraphitePath": "",
  "GraphiteConvertHostnameDotsToUnderscores": true,
  "ConsulAddress": "",
  "ConsulAclToken": "",

  "RaftEnabled":true,
  "RaftDataDir":"/usr/local/orchestrator",
  "RaftBind":"172.31.0.101",
  "DefaultRaftPort":10008,
  "RaftNodes":[
    "172.31.0.101",
    "172.31.0.102",
    "172.31.0.103"
  ]
}
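
#Since jq was installed earlier, it can catch JSON syntax errors (for example a stray trailing comma) before orchestrator is started; a minimal check:
jq . /etc/orchestrator.conf.json > /dev/null && echo "orchestrator.conf.json is valid JSON"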

4.2 modify orch_hook.sh

vi /usr/local/orchestrator/orch_hook.sh
#!/bin/bash

isitdead=$1
cluster=$2
oldmaster=$3
newmaster=$4
mysqluser="orchctl"

logfile="/usr/local/orchestrator/orch_hook.log"

# list of clusternames
#clusternames=(rep blea lajos)

# clustername=( interface IP user Inter_IP)
#rep=( ens32 "192.168.56.121" root "192.168.56.125")

if [[ $isitdead == "DeadMaster" ]]; then

	array=( eth0 "172.31.0.188" root "172.31.0.101")
	interface=${array[0]}
	IP=${array[1]}
	user=${array[2]}

	if [ ! -z ${IP} ] ; then

		echo $(date)
		echo "Revocering from: $isitdead"
		echo "New master is: $newmaster"
		echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $newmaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
		/usr/local/orchestrator/orch_vip.sh -d 1 -n $newmaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster
		#mysql -h$newmaster -u$mysqluser < /usr/local/bin/orch_event.sql
	else

		echo "Cluster does not exist!" | tee $logfile

	fi
elif [[ $isitdead == "DeadIntermediateMasterWithSingleSlaveFailingToConnect" ]]; then

	array=( eth0 "172.31.0.188" root "172.31.0.101")
	interface=${array[0]}
	IP=${array[3]}
	user=${array[2]}
	slavehost=`echo $5 | cut -d":" -f1`

	echo $(date)
	echo "Revocering from: $isitdead"
	echo "New intermediate master is: $slavehost"
	echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $slavehost -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
	/usr/local/orchestrator/orch_vip.sh -d 1 -n $slavehost -i ${interface} -I ${IP} -u ${user} -o $oldmaster


elif [[ $isitdead == "DeadIntermediateMaster" ]]; then

        array=( eth0 "172.31.0.188" root "172.31.0.101")
        interface=${array[0]}
        IP=${array[3]}
        user=${array[2]}
	slavehost=`echo $5 | sed -E "s/:[0-9]+//g" | sed -E "s/,/ /g"`
	showslave=`mysql -h$newmaster -u$mysqluser -sN -e "SHOW SLAVE HOSTS;" | awk '{print $2}'`
	newintermediatemaster=`echo $slavehost $showslave | tr ' ' '\n' | sort | uniq -d`

	echo $(date)
	echo "Revocering from: $isitdead"
	echo "New intermediate master is: $newintermediatemaster"
	echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $newintermediatemaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
	/usr/local/orchestrator/orch_vip.sh -d 1 -n $newintermediatemaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster

fi
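
#The hook can be exercised by hand with the same positional arguments that orchestrator passes from PostFailoverProcesses ({failureType} {failureClusterAlias} {failedHost} {successorHost}); a hypothetical lab-only test (it really moves the VIP via orch_vip.sh):
/usr/local/orchestrator/orch_hook.sh DeadMaster wl01 wl01 wl02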

4.3 modify vip script

vi /usr/local/orchestrator/orch_vip.sh
#!/bin/bash

emailaddress="1103290832@qq.com"
sendmail=1

function usage {
  cat << EOF
 usage: $0 [-h] [-d master is dead] [-o old master ] [-s ssh options] [-n new master] [-i interface] [-I] [-u SSH user]
 
 OPTIONS:
    -h        Show this message
    -o string Old master hostname or IP address 
    -d int    If master is dead should be 1 otherwise it is 0
    -s string SSH options
    -n string New master hostname or IP address
    -i string Interface example eth0:1
    -I string Virtual IP
    -u string SSH user
EOF

}

while getopts ho:d:s:n:i:I:u: flag; do
  case $flag in
    o)
      orig_master="$OPTARG";
      ;;
    d)
      isitdead="${OPTARG}";
      ;;
    s)
      ssh_options="${OPTARG}";
      ;;
    n)
      new_master="$OPTARG";
      ;;
    i)
      interface="$OPTARG";
      ;;
    I)
      vip="$OPTARG";
      ;;
    u)
      ssh_user="$OPTARG";
      ;;
    h)
      usage;
      exit 0;
      ;;
    *)
      usage;
      exit 1;
      ;;
  esac
done


if [ $OPTIND -eq 1 ]; then 
    echo "No options were passed"; 
    usage;
fi

shift $(( OPTIND - 1 ));

# discover commands from our path
ssh=$(which ssh)
arping=$(which arping)
ip2util=$(which ip)

# command for adding our vip
cmd_vip_add="sudo -n $ip2util address add ${vip} dev ${interface}"
# command for deleting our vip
cmd_vip_del="sudo -n $ip2util address del ${vip}/32 dev ${interface}"
# command for discovering if our vip is enabled
cmd_vip_chk="sudo -n $ip2util address show dev ${interface} to ${vip%/*}/32"
# command for sending gratuitous arp to announce ip move
cmd_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*}   "
# command for sending gratuitous arp to announce ip move on current server
cmd_local_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*}   "

vip_stop() {
    rc=0

    # ensure the vip is removed
    $ssh ${ssh_options} -tt ${ssh_user}@${orig_master} \
    "[ -n \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_del} && sudo ${ip2util} route flush cache || [ -z \"\$(${cmd_vip_chk})\" ]"
    rc=$?
    return $rc
}

vip_start() {
    rc=0

    # ensure the vip is added
    # this command should exit with failure if we are unable to add the vip
    # if the vip already exists always exit 0 (whether or not we added it)
    $ssh ${ssh_options} -tt ${ssh_user}@${new_master} \
     "[ -z \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_add} && ${cmd_arp_fix} || [ -n \"\$(${cmd_vip_chk})\" ]"
    rc=$?
    $cmd_local_arp_fix
    return $rc
}

vip_status() {
    $arping -c 1 -I ${interface} ${vip%/*}   
    if ping -c 1 -W 1 "$vip"; then
        return 0
    else
        return 1
    fi
}

if [[ $isitdead == 0 ]]; then
    echo "Online failover"
    if vip_stop; then 
        if vip_start; then
            echo "$vip is moved to $new_master."
            if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null  ; fi
        else
            echo "Can't add $vip on $new_master!" 
            if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null  ; fi
            exit 1
        fi
    else
        echo $rc
        echo "Can't remove the $vip from orig_master!"
        if [ $sendmail -eq 1 ]; then mail -s "Can't remove the $vip from orig_master!" "$emailaddress" < /dev/null &> /dev/null  ; fi
        exit 1
    fi


elif [[ $isitdead == 1 ]]; then
    echo "Master is dead, failover"
    # make sure the vip is not available 
    if vip_status; then 
        if vip_stop; then
            if [ $sendmail -eq 1 ]; then mail -s "$vip is removed from orig_master." "$emailaddress" < /dev/null &> /dev/null  ; fi
        else
            if [ $sendmail -eq 1 ]; then mail -s "Couldn't remove $vip from orig_master." "$emailaddress" < /dev/null &> /dev/null  ; fi
            exit 1
        fi
    fi

    if vip_start; then
          echo "$vip is moved to $new_master."
          if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null  ; fi

    else
          echo "Can't add $vip on $new_master!" 
          if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null  ; fi
          exit 1
    fi
else
    echo "Wrong argument, the master is dead or live?"

fi
#Send the modified configuration file and scripts to wl02 and wl03
scp /etc/orchestrator.conf.json wl02:/etc/
#On wl02, change RaftBind in the configuration file to wl02's own IP address:
"RaftBind": "172.31.0.102",

scp /etc/orchestrator.conf.json wl03:/etc/
#On wl03, change RaftBind in the configuration file to wl03's own IP address:
"RaftBind": "172.31.0.103",

scp /usr/local/orchestrator/orch_hook.sh wl02:/usr/local/orchestrator/
scp /usr/local/orchestrator/orch_hook.sh wl03:/usr/local/orchestrator/

scp /usr/local/orchestrator/orch_vip.sh wl02:/usr/local/orchestrator/
scp /usr/local/orchestrator/orch_vip.sh wl03:/usr/local/orchestrator/

chmod 777 /usr/local/orchestrator/orch_hook.sh
chmod 777 /usr/local/orchestrator/orch_vip.sh

4.4 create the VIP on the master node

Create the VIP only on the master node:
# Add the VIP
ip addr add 172.31.0.188 dev eth0
# Delete the VIP (only when it needs to be removed)
ip addr del 172.31.0.188 dev eth0
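
#Verify the VIP (a sketch): it should be bound on the master only and reachable from the other nodes
ip addr show dev eth0 | grep "172.31.0.188"   #on the master
ping -c 1 172.31.0.188                        #from wl02 or wl03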

5. Startup and operation

5.1 start ORCH

#Start orchestrator on all nodes
cd /usr/local/orchestrator && nohup ./orchestrator --config=/etc/orchestrator.conf.json http &

5.2 command line control operation

#List all clusters
/usr/local/orchestrator/resources/bin/orchestrator-client -c clusters

#Prints the topology relationship of the specified cluster
/usr/local/orchestrator/resources/bin/orchestrator-client -c topology -i wl01:8026
wl01:8026 (wl01)   [0s,ok,8.0.26,rw,ROW,>>,GTID]
+ wl02:8026 (wl02) [0s,ok,8.0.26,ro,ROW,>>,GTID]
+ wl03:8026 (wl03) [0s,ok,8.0.26,ro,ROW,>>,GTID]

#Check which API endpoint is being used
#Because raft is enabled and there are multiple orchestrator nodes, all of their APIs must be listed in the ORCHESTRATOR_API environment variable; orchestrator-client then automatically picks the current leader
export ORCHESTRATOR_API="wl01:3000/api wl02:3000/api wl03:3000/api"
/usr/local/orchestrator/resources/bin/orchestrator-client -c which-api
wl02:3000/api

#Forget the specified instance
/usr/local/orchestrator/resources/bin/orchestrator-client -c forget -i wl01:8026

#Forget the specified cluster
/usr/local/orchestrator/resources/bin/orchestrator-client -c forget-cluster -i wl01:8026

#Print the master of the specified instance
/usr/local/orchestrator/resources/bin/orchestrator-client -c which-master -i wl01:8026

#Print the replicas of the specified instance
/usr/local/orchestrator/resources/bin/orchestrator-client -c which-replicas -i wl01:8026
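
#A planned master switchover can also be done from the command line; a hedged example (graceful-master-takeover is a standard orchestrator-client command, while the cluster alias and designated new master below are only illustrative for this setup):
/usr/local/orchestrator/resources/bin/orchestrator-client -c graceful-master-takeover -a wl01 -d wl02:8026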

6. Graphical interface management operation

Log in to the web management interface:

http://<IP address of any cluster node>:3000

(1) Under the Clusters menu, click Discover and enter the IP and port of each managed MySQL instance

(2) Under the Clusters menu, click Dashboard to view the discovered clusters

(3) Click a cluster name to view its topology

(4) Click a node's icon in the topology view (the red box in (3)) to see that node's details

7. Verify failover

#Shut down the master database and observe what happens
[root@wl01 orchestrator]# mysql -S /mysql-8026/8026/run/mysql.sock
mysql> shutdown;

#Check which node is the current leader, then view the failover log on that leader
[root@wl01 orchestrator]# /usr/local/orchestrator/resources/bin/orchestrator-client -c which-api
wl03:3000/api
[root@wl03 orchestrator]# tail -f orch_hook.log
/usr/local/orchestrator/orch_vip.sh -d 1 -n wl02 -i eth0 -I 172.31.0.188 -u root -o wl01  #vip drifts from wl01 to wl02


#vip successfully drifted to wl02 node
[root@wl01 orchestrator]# ip a | grep "172.31.0.188"
[root@wl02 orchestrator]# ip a | grep "172.31.0.188"
inet 172.31.0.188/32 scope global eth0

Viewing the topology in the web interface: the wl01 node has left the cluster, and wl02 and wl03 have re-established the master-slave relationship

8. Verify the high availability of raft

#(1) Restore the MySQL database cluster to a healthy master-slave state
#(2) View the leader node of the current raft
[root@wl02 orchestrator]# /usr/local/orchestrator/resources/bin/orchestrator-client -c which-api
wl01:3000/api

#(3) Close the orch back-end database of wl01 node, and the orch service will stop
[root@wl01 orchestrator]# mysql -S /mysql-8026/3306/run/mysql.sock
mysql> shutdown;
Query OK, 0 rows affected (0.00 sec)
mysql> exit
Bye
[1]   Done                    mysqld_safe --defaults-file=/mysql-8026/3306/my.cnf
[5]-  Exit 1                  cd /usr/local/orchestrator && nohup ./orchestrator --config=/etc/orchestrator.conf.json http
[root@wl01 orchestrator]# ps -ef | grep orch
root     23134 14495  0 10:28 pts/2    00:00:00 grep --color=auto orch


#(4) Check the orchestrator logs on the other nodes; you can see that wl02 is elected as the new leader
[root@wl03 orchestrator]# tail -f /usr/local/orchestrator/nohup.out
2022-01-18 10:27:38 DEBUG raft leader is 172.31.0.101:10008; state: Follower
2022/01/18 10:27:39 [WARN] raft: Rejecting vote request from 172.31.0.102:10008 since we have a leader: 172.31.0.101:10008
2022/01/18 10:27:39 [DEBUG] raft: Node 172.31.0.103:10008 updated peer set (2): [172.31.0.102:10008 172.31.0.101:10008 172.31.0.103:10008]
2022-01-18 10:27:39 DEBUG orchestrator/raft: applying command 6871: leader-uri
2022/01/18 10:27:39 [DEBUG] raft-net: 172.31.0.103:10008 accepted connection from: 172.31.0.102:38934
2022-01-18 10:27:43 DEBUG raft leader is 172.31.0.102:10008; state: Follower
2022-01-18 10:27:46 DEBUG orchestrator/raft: applying command 6872: request-health-report
2022-01-18 10:27:48 DEBUG raft leader is 172.31.0.102:10008; state: Follower
2022-01-18 10:27:53 INFO auditType:forget-clustr-aliases instance::0 cluster: message:Forgotten aliases: 0
2022-01-18 10:27:53 INFO auditType:review-unseen-instances instance::0 cluster: message:Operations: 0
2022-01-18 10:27:53 INFO auditType:forget-unseen instance::0 cluster: message:Forgotten instances: 0
2022-01-18 10:27:53 INFO auditType:resolve-unknown-masters instance::0 cluster: message:Num resolved hostnames: 0
2022-01-18 10:27:53 INFO auditType:inject-unseen-masters instance::0 cluster: message:Operations: 0
2022-01-18 10:27:53 INFO auditType:forget-unseen-differently-resolved instance::0 cluster: message:Forgotten instances: 0
2022-01-18 10:27:53 DEBUG raft leader is 172.31.0.102:10008; state: Follower

#(5) Shut down the managed MySQL service (8026) on wl02 and verify failover again
[root@wl02 orchestrator]# mysql -S /mysql-8026/8026/run/mysql.sock
mysql> shutdown;
Query OK, 0 rows affected (0.00 sec)
mysql> exit
Bye
[4]   Done                    mysqld_safe --defaults-file=/mysql-8026/8026/my.cnf

#(6) Observe the result: the VIP drifts to wl03
[root@wl02 orchestrator]# ip a | grep "172.31.0.188"
[root@wl03 orchestrator]# ip a | grep "172.31.0.188"
inet 172.31.0.188/32 scope global eth0

#(7) The web interface at wl01's IP address is no longer reachable

#(8) Log in to the web interface again using wl02's IP address
You can see that the wl02 node has left the cluster, and wl03 and wl01 have re-established the master-slave relationship
