1, High availability introduction
1.1 what is high availability
Generally, it means that two machines run the same business system. When one machine goes down, the other server quickly takes over, and the switchover is transparent to the users accessing the service.
1.2 common tools
- Commonly used hardware: F5
- Commonly used software: Keepalived
1.3 how does keepalived achieve high availability?
1.3.1 terms involved
The keepalived software is based on VRRP (Virtual Router Redundancy Protocol), which is mainly used to solve the problem of a single point of failure.
ARP: broadcast protocol that resolves an IP address to a MAC address
VRRP protocol: advertisements sent within a LAN to elect and monitor the master router
VIP: the virtual IP address, responsible for IP drift between nodes
VMAC: the virtual MAC address; ARP broadcasts are sent to notify hosts to update the MAC address in their ARP cache
1.3.2 examples
For example, a company's network reaches the Internet through a gateway. What happens if the router fails, the gateway can no longer forward packets, and nobody can access the Internet?
The usual approach is to add a standby router next to the primary one, but the problem remains: if the primary gateway (master) fails, users have to manually re-point their gateway to the backup. If there are many users to modify, this becomes very troublesome.
Question 1: suppose the users have re-pointed their gateway to the backup router; what happens when the master router is repaired?
Question 2: suppose the master gateway fails; can we simply reconfigure the backup gateway with the master gateway's IP?
In fact, this does not work. After the PC resolves the master gateway's IP address to its MAC address through the first ARP broadcast, it writes that mapping into its ARP cache table and forwards subsequent packets based on the cached entry. Even if we move the IP address, the MAC address is unique, so the PC's packets are still sent to the master. (Only when the PC's ARP cache entry expires and a new ARP broadcast is issued will it learn the MAC address corresponding to the backup.)
How can we achieve automatic failover? This is where VRRP comes in. VRRP, implemented in software or hardware, adds a virtual MAC address (VMAC) and a virtual IP address (VIP) in front of the Master and the Backup. When the PC requests the VIP, whether the request is handled by the Master or the Backup, the PC only ever records the VMAC and VIP in its ARP cache table.
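A quick way to see this from a Linux client is to inspect its neighbor (ARP) cache for the VIP. The commands below are only a small illustration and assume a client sitting in the 192.168.15.0/24 network used later in this article:

```
# Resolve the VIP once so an ARP entry is created, then inspect the cache
ping -c 1 192.168.15.3
ip neigh show 192.168.15.3
# The entry maps the VIP to a single MAC address; after a failover the
# mapping is refreshed by gratuitous ARP, so clients keep using the VIP
```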
1.4 core concept of high availability keepalived
- How to determine which node is the primary and which is the standby (election, voting, priority)
- If the Master fails and the Backup takes over automatically, will the Master take the role back after it recovers (preemptive vs. non-preemptive)
- What happens if both servers think they are the master (split-brain problem)
2, keepalived
2.1 environment preparation
host | IP | identity |
---|---|---|
lb01 | 192.168.15.5 | keepalived master |
lb02 | 192.168.15.6 | keepalived backup |
web01 | 172.16.1.7 | web server |
web02 | 172.16.1.8 | web server |
db01 | 172.16.1.61 | database |
- | 192.168.15.3 | VIP |
2.2 installation of Keepalived
```
# Install on both load balancers (lb01 and lb02)
[root@lb01 conf.d]# yum install keepalived -y
```
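To confirm the package landed on both nodes before moving on, a quick check (not part of the original steps) is:

```
[root@lb01 ~]# rpm -qa keepalived    # shows the installed package and version
[root@lb02 ~]# rpm -qa keepalived
```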
2.3 configuring keepalived
- Find the configuration file

```
[root@lb01 ~]# rpm -qc keepalived
/etc/keepalived/keepalived.conf
```
- Configure the configuration file of the master node (LoadBalance01)

```
[root@lb01 ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived

# Global configuration
global_defs {
    # Unique identifier of this keepalived node
    router_id LoadBalance01
}

# Configure a VRRP instance
vrrp_instance VI_1 {
    # Role: MASTER or BACKUP
    state MASTER
    # Network interface to bind to
    interface eth0
    # Virtual router ID; can be understood as a group ID (must match on both nodes)
    virtual_router_id 50
    # Priority; the higher value wins the election
    priority 100
    # Heartbeat (advertisement) interval in seconds
    advert_int 1
    # Authentication between nodes
    authentication {
        # Authentication type
        auth_type PASS
        # Authentication password
        auth_pass 1111
    }
    # Set up the VIP
    virtual_ipaddress {
        # Virtual IP address
        192.168.15.3
    }
}
```
- Configure the configuration file of the standby node (LoadBalance02)

```
[root@lb02 ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived

# Global configuration
global_defs {
    # Unique identifier of this keepalived node
    router_id LoadBalance02
}

# Configure a VRRP instance
vrrp_instance VI_1 {
    # Role: MASTER or BACKUP
    state BACKUP
    # Network interface to bind to
    interface eth0
    # Virtual router ID; can be understood as a group ID (must match on both nodes)
    virtual_router_id 50
    # Priority; lower than the master's
    priority 80
    # Heartbeat (advertisement) interval in seconds
    advert_int 1
    # Authentication between nodes
    authentication {
        # Authentication type
        auth_type PASS
        # Authentication password
        auth_pass 1111
    }
    # Set up the VIP
    virtual_ipaddress {
        # Virtual IP address
        192.168.15.3
    }
}
```
2.4 start the service (enable at boot)
```
[root@lb01 ~]# systemctl enable --now keepalived
[root@lb02 ~]# systemctl enable --now keepalived
```
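Once both services are running, the master should multicast a VRRP advertisement roughly every advert_int second. A quick way to confirm this, assuming tcpdump is installed on the node, is to watch the VRRP multicast group:

```
# VRRP advertisements are multicast to 224.0.0.18
[root@lb01 ~]# tcpdump -i eth0 -c 3 host 224.0.0.18
# Expect lines similar to:
# IP 192.168.15.5 > 224.0.0.18: VRRPv2, Advertisement, vrid 50, prio 100, ...
```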
2.5 enable keepalived logging
```
# Configure keepalived to log to the local0 syslog facility
[root@lb01 ~]# vim /etc/sysconfig/keepalived
KEEPALIVED_OPTIONS="-D -d -S 0"

# Configure rsyslog to catch the logs
[root@lb01 ~]# vim /etc/rsyslog.conf
local0.*                 /var/log/keepalived.log

# Restart the services
[root@lb01 ~]# systemctl restart keepalived rsyslog
```
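With logging in place, state transitions can be followed directly; a brief check (the exact log wording may vary between keepalived versions):

```
[root@lb01 ~]# tail -f /var/log/keepalived.log
# Expect VRRP state messages such as:
# Keepalived_vrrp[...]: VRRP_Instance(VI_1) Entering MASTER STATE
```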
3, Preemptive and non-preemptive modes of Keepalived
3.1 when both nodes are started
```
# When both nodes are started, only node 1 holds the VIP, because node 1 has a higher priority than node 2
[root@lb01 ~]# ip addr | grep 192.168.15.3
    inet 192.168.15.3/32 scope global eth0
[root@lb02 ~]# ip addr | grep 192.168.15.3
```
3.2 stop master node
```
[root@lb01 ~]# systemctl stop keepalived
[root@lb01 ~]# ip addr | grep 192.168.15.3
# Since keepalived on node 1 is down, node 2 automatically takes over node 1's work, i.e. the VIP
[root@lb02 ~]# ip addr | grep 192.168.15.3
    inet 192.168.15.3/32 scope global eth0
```
3.3 restart the master node
```
# Start the master node
[root@lb01 ~]# systemctl start keepalived
[root@lb01 ~]# ip addr | grep 192.168.15.3
    inet 192.168.15.3/32 scope global eth0
# Since node 1 has a higher priority than node 2, the VIP is preempted back when node 1 recovers
```
3.4 configure non-preemptive mode
- Master node configuration (LoadBalance01)

```
[root@lb01 ~]# vim /etc/keepalived/keepalived.conf
... ...
vrrp_instance VI_1 {
    # Role; both nodes are set to BACKUP
    state BACKUP
    # Enable non-preemptive mode
    nopreempt
    # Network interface to bind to
    interface eth0
    # Virtual router ID; can be understood as a group ID
    virtual_router_id 50
    # Priority
    priority 100
    ... ...
}
[root@lb01 ~]# systemctl restart keepalived
```
- Standby node configuration (LoadBalance02)

```
[root@lb02 ~]# vim /etc/keepalived/keepalived.conf
... ...
vrrp_instance VI_1 {
    # Role; both nodes are set to BACKUP
    state BACKUP
    # Enable non-preemptive mode
    nopreempt
    # Network interface to bind to
    interface eth0
    # Virtual router ID; can be understood as a group ID
    virtual_router_id 50
    # Priority
    priority 90
    ... ...
}
[root@lb02 ~]# systemctl restart keepalived.service
```
- Configuration considerations
  - The state of both nodes must be configured as BACKUP;
  - Both nodes must be configured with nopreempt;
  - The priority of one node must be higher than that of the other node.
After nopreempt is enabled, the role state of both servers must be set to BACKUP; the only difference between them is the priority. A quick verification sketch follows.
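The sequence below mirrors sections 3.2 and 3.3, but with the non-preemptive configuration the VIP should stay on lb02 after lb01 comes back (the comments describe expected behavior, not captured output):

```
[root@lb01 ~]# systemctl stop keepalived        # VIP drifts to lb02
[root@lb01 ~]# systemctl start keepalived       # lb01 returns with the higher priority...
[root@lb01 ~]# ip addr | grep 192.168.15.3      # ...but should show no VIP
[root@lb02 ~]# ip addr | grep 192.168.15.3      # the VIP should remain here
```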
4, Keepalived split-brain
Split-brain occurs when, for some reason, the two keepalived high-availability servers cannot detect each other's heartbeat within the specified time, and each one claims ownership of the resources and services while both servers are still alive.
4.1 causes of split-brain
- Loose network cables or other network failures
- Server hardware failure
- A firewall enabled between the servers blocking the heartbeat traffic
4.2 split-brain simulation
- Turn on the firewall

```
[root@lb01 ~]# systemctl start firewalld
[root@lb01 ~]# ip addr | grep 192.168.15.3
    inet 192.168.15.3/32 scope global eth0
[root@lb02 ~]# systemctl start firewalld
[root@lb02 ~]# ip addr | grep 192.168.15.3
    inet 192.168.15.3/32 scope global eth0
# Both nodes now hold the VIP: this is split-brain
```
- Visit the website

```
# Because firewalld rejects connections by default, the http and https services must be allowed
[root@lb01 ~]# firewall-cmd --add-service=http
success
[root@lb02 ~]# firewall-cmd --add-service=http
success
[root@lb01 ~]# firewall-cmd --add-service=https
success
[root@lb02 ~]# firewall-cmd --add-service=https
success
# The page can then be accessed without problems
```
- Turn off the firewall

```
[root@lb02 ~]# systemctl stop firewalld
[root@lb02 ~]# ip addr | grep 192.168.15.3
[root@lb01 ~]# systemctl stop firewalld
[root@lb01 ~]# ip addr | grep 192.168.15.3
    inet 192.168.15.3/32 scope global eth0
```
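If the firewall has to stay enabled, a cleaner fix than stopping firewalld is to let the VRRP heartbeat itself through. A sketch using a firewalld rich rule (run on both nodes; add --permanent and reload to persist it):

```
# Allow the VRRP protocol (IP protocol 112) so the nodes can see each other's heartbeats
[root@lb01 ~]# firewall-cmd --add-rich-rule='rule protocol value="vrrp" accept'
[root@lb02 ~]# firewall-cmd --add-rich-rule='rule protocol value="vrrp" accept'
```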
4.3 solutions to split-brain
```
# If split-brain occurs, kill one of the two nodes.
# Write a check script on the standby node: if the standby node can ping the master node
# and still holds the VIP itself, split-brain is assumed to have occurred.
[root@lb02 ~]# cat check_split_brain.sh
#!/bin/sh
vip=192.168.15.3
lb01_ip=192.168.15.5
while true;do
    ping -c 2 $lb01_ip &>/dev/null
    if [ $? -eq 0 -a `ip add|grep "$vip"|wc -l` -eq 1 ];then
        echo "ha is split brain.warning."
    else
        echo "ha is ok"
    fi
    sleep 5
done

# A variant that checks over SSH whether the master also holds the VIP
[root@lb02 ~]# vim check_keepalive.sh
#!/bin/sh
vip=192.168.15.3
lb01_ip=172.16.1.5
while true;do
    ssh $lb01_ip "ip addr | grep $vip" &>/dev/null
    if [ $? -eq 0 -a `ip add|grep "$vip"|wc -l` -eq 1 ];then
        echo "ha is split brain.warning."
    else
        echo "ha is ok"
    fi
    sleep 3
done
```
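Both scripts only print a warning. A small, hypothetical extension (the script name and the chosen reaction are assumptions, not part of the original) would log to syslog and stop the local keepalived so that only one node keeps the VIP:

```
#!/bin/sh
# check_split_brain_action.sh -- hypothetical follow-up action for the check above
vip=192.168.15.3
lb01_ip=192.168.15.5
if ping -c 2 $lb01_ip &>/dev/null && ip addr | grep -q "$vip"; then
    # Master is reachable but we still hold the VIP: assume split-brain
    logger -t keepalived-check "split brain detected, stopping local keepalived"
    systemctl stop keepalived
fi
```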
5, High availability with Keepalived and Nginx
Nginx listens on all IP addresses by default, and the VIP floats to one of the nodes; to nginx the VIP is just like an extra network address, so the machine running nginx can be reached through the VIP.
However, if nginx goes down, user requests fail, yet keepalived itself is still alive and no switchover happens. Therefore a script is needed to check whether nginx is alive; if it is not, the script stops keepalived so that the VIP drifts to the other node.
5.1 Nginx failover script
```
[root@lb01 ~]# vim check_web.sh
#!/bin/sh
nginxpid=$(ps -ef | grep '[n]ginx' | wc -l)

# 1. Check whether Nginx is alive; if not, try to start it
if [ $nginxpid -eq 0 ];then
    systemctl start nginx &>/dev/null
    sleep 3
    # 2. Wait 3 seconds and get the Nginx status again
    nginxpid=$(ps -ef | grep '[n]ginx' | wc -l)
    # 3. Check again; if Nginx is still not alive, stop Keepalived so the VIP drifts away, and exit
    if [ $nginxpid -eq 0 ];then
        systemctl stop keepalived
    fi
fi

# Add executable permission to the script
[root@lb01 ~]# chmod +x /root/check_web.sh
```
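A process count can report nginx as alive even when it no longer serves requests. As an alternative sketch (hypothetical script name; assumes curl is installed and nginx answers on 127.0.0.1:80), the check can probe HTTP directly:

```
#!/bin/sh
# check_web_http.sh -- hypothetical HTTP-based variant of check_web.sh
probe() { curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1/; }

if [ "$(probe)" != "200" ];then
    systemctl restart nginx &>/dev/null
    sleep 3
    # Still not answering with 200 after a restart: stop keepalived so the VIP drifts
    if [ "$(probe)" != "200" ];then
        systemctl stop keepalived
    fi
fi
```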
5.2 calling the nginx check script from the keepalived configuration file
- When configuring preemptive mode

```
# It only needs to be configured on the master node
[root@lb01 ~]# vim /etc/keepalived/keepalived.conf
global_defs {
    router_id LoadBalance01
}

# Run the script every 5 seconds; the script must finish within 5 seconds,
# otherwise it is interrupted and executed again
vrrp_script check_web {
    script "/root/check_web.sh"
    interval 5
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 50
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.15.3
    }
    # Invoke the check script defined above
    track_script {
        check_web
    }
}
```
- When configuring non-preemptive mode

```
# In non-preemptive mode, the script must be deployed on both nodes
[root@lb01 ~]# scp check_web.sh 172.16.1.6:/root

# The standby node must also be configured
[root@lb02 ~]# cat /etc/keepalived/keepalived.conf
global_defs {
    router_id LoadBalance02
}

vrrp_script check_web {
    script "/root/check_web.sh"
    interval 5
}

vrrp_instance VI_1 {
    state BACKUP
    nopreempt
    interface eth0
    virtual_router_id 50
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.15.3
    }
    track_script {
        check_web
    }
}
```
5.3 testing
- Deliberately break the nginx configuration file on the node that currently holds the VIP
- Stop nginx
- Check whether the VIP switches to the other node (a test sketch follows)
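A minimal sketch of the test, assuming lb01 currently holds the VIP and that the configuration is broken by appending an invalid directive (the comments describe expected behavior, not captured output):

```
[root@lb01 ~]# echo "broken_directive;" >> /etc/nginx/nginx.conf   # deliberately break nginx's config
[root@lb01 ~]# systemctl stop nginx
# check_web.sh now fails to restart nginx, stops keepalived, and the VIP should drift
[root@lb01 ~]# ip addr | grep 192.168.15.3                         # should return nothing
[root@lb02 ~]# ip addr | grep 192.168.15.3                         # should now show the VIP
```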