Keepalived high availability for Linux

1, High availability introduction

1.1 what is high availability

Generally, high availability means running the same service on two machines. When one machine goes down, the other takes over quickly enough that users do not notice the failure.

1.2 common tools

  • Common hardware solution: F5
  • Common software solution: Keepalived

1.3 how does keepalived achieve high availability?

1.3.1 nouns involved

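Keepalived is built on VRRP (Virtual Router Redundancy Protocol), which was designed to eliminate the gateway as a single point of failure.

ARP: resolves an IP address to a MAC address by broadcasting on the LAN
VRRP: the nodes of a group advertise their state to each other by multicast within a LAN
VIP: the virtual IP address that drifts from one node to the other
VMAC: the virtual MAC address associated with the VIP; ARP announcements tell the LAN which node currently answers for it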

1.3.2 examples

For example, a company's network reaches the Internet through a gateway. If that router fails, the gateway can no longer forward packets and nobody can reach the Internet.

The usual fix is to add a standby router. The problem is that when the primary gateway (master) fails, every user has to point their machine at the backup by hand, which is tedious when there are many users.

Question 1: if users repoint to the backup router, what happens once the master router is repaired?
Question 2: if the master gateway fails, can we simply give the backup gateway the master's IP address?

In practice, this does not work straight away. The first time a PC looks up the master gateway through an ARP broadcast, it writes the gateway's IP and MAC address into its ARP cache, and from then on it forwards packets using the cached entry. Even if we move the IP to the backup, the cached MAC address still belongs to the master, so the PC keeps sending packets to the master. Only when the cache entry expires and the PC broadcasts a new ARP request does it learn the backup's MAC address.

How can failover be made automatic? This is where VRRP comes in. VRRP adds a virtual IP address (VIP) and a virtual MAC address (VMAC), implemented in software or hardware, in front of the master and the backup. When a PC sends requests to the VIP, its ARP cache only ever records the VIP and the VMAC, regardless of whether the master or the backup is handling the traffic.
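
To make the VMAC less abstract, here is a minimal sketch that derives the IPv4 VRRP virtual MAC from a virtual router ID. The function name vrrp_vmac is made up for this example. The 00:00:5e:00:01 prefix comes from the VRRP specification; as far as I know, Keepalived only puts this address on the wire when use_vmac is configured, and otherwise announces the VIP with gratuitous ARP using the interface's real MAC.

```shell
#!/bin/sh
# vrrp_vmac VRID: print the IPv4 VRRP virtual MAC for a virtual router ID
# (1-255). The last byte is simply the VRID written in hexadecimal.
# vrrp_vmac is a hypothetical helper name, not part of keepalived.
vrrp_vmac() {
    printf '00:00:5e:00:01:%02x\n' "$1"
}

vrrp_vmac 50   # prints 00:00:5e:00:01:32
```

On a client, `ip neigh show` lists which MAC address is currently cached for the VIP, which is a quick way to see what a failover actually changed.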

1.4 core concepts of Keepalived high availability

  1. How is it decided which node is primary and which is standby? (election by priority)
  2. If the master fails and the backup takes over, does the master take the VIP back after it recovers? (preemptive vs. non-preemptive)
  3. What happens if both servers believe they are the master? (split-brain)
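
The election in point 1 can be sketched in a few lines of shell. This is only an illustration of "highest priority wins": the function name and the name:priority argument format are invented for the example, and real VRRP additionally breaks priority ties by comparing IP addresses, which is left out here.

```shell
#!/bin/sh
# elect_master NAME:PRIORITY...: print the name of the node with the
# highest priority. Hypothetical helper; on a tie the first node listed
# wins, which is a simplification of the real protocol.
elect_master() {
    best_name=""
    best_prio=-1
    for pair in "$@"; do
        name=${pair%%:*}
        prio=${pair##*:}
        if [ "$prio" -gt "$best_prio" ]; then
            best_name=$name
            best_prio=$prio
        fi
    done
    printf '%s\n' "$best_name"
}

elect_master lb01:100 lb02:80   # prints lb01
```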

2, keepalived

2.1 environment preparation

host     IP              role
lb01     192.168.15.5    keepalived master
lb02     192.168.15.6    keepalived backup
web01    172.16.1.7      web server
web02    172.16.1.8      web server
db01     172.16.1.61     database
-        192.168.15.3    VIP

2.2 installation of Keepalived

[root@lb01 conf.d]# yum install keepalived -y

2.3 configuring keepalived

  1. Locate the configuration file

    [root@lb01 ~]# rpm -qc keepalived
    /etc/keepalived/keepalived.conf
    
  2. Edit the configuration file on the master node (LoadBalance01)

    [root@lb01 ~]# vim /etc/keepalived/keepalived.conf
    ! Configuration File for keepalived
    
    # Global configuration
    global_defs {
       # Unique identifier of this keepalived node
       router_id LoadBalance01
    }
    
    # VRRP instance
    vrrp_instance VI_1 {
        # Initial state: MASTER or BACKUP
        state MASTER
        # Network interface the instance is bound to
        interface eth0
        # Virtual router ID; nodes with the same ID form one group
        virtual_router_id 50
        # Priority; the node with the highest value becomes master
        priority 100
        # Interval between heartbeat advertisements, in seconds
        advert_int 1
        # Authentication
        authentication {
            # Authentication type
            auth_type PASS
            # Authentication password (identical on both nodes)
            auth_pass 1111
        }
        # Virtual IP addresses
        virtual_ipaddress {
            # The VIP
            192.168.15.3
        }
    }
    
  3. Edit the configuration file on the standby node (LoadBalance02)

    [root@lb02 ~]# vim /etc/keepalived/keepalived.conf
    ! Configuration File for keepalived
    
    # Global configuration
    global_defs {
       # Unique identifier of this keepalived node
       router_id LoadBalance02
    }
    
    # VRRP instance
    vrrp_instance VI_1 {
        # Initial state: MASTER or BACKUP
        state BACKUP
        # Network interface the instance is bound to
        interface eth0
        # Virtual router ID; nodes with the same ID form one group
        virtual_router_id 50
        # Priority; lower than on the master node
        priority 80
        # Interval between heartbeat advertisements, in seconds
        advert_int 1
        # Authentication
        authentication {
            # Authentication type
            auth_type PASS
            # Authentication password (identical on both nodes)
            auth_pass 1111
        }
        # Virtual IP addresses
        virtual_ipaddress {
            # The VIP
            192.168.15.3
        }
    }
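
The two files differ only in router_id, state and priority, while virtual_router_id, advert_int and the authentication password have to be the same on both nodes. A rough sketch of a consistency check follows. The helper names vrrp_field and check_pair are hypothetical, and the parsing assumes one "key value" pair per line, as in the listings above.

```shell
#!/bin/sh
# vrrp_field FILE KEY: print the first value of KEY (for example
# virtual_router_id) found in a keepalived.conf-style file.
vrrp_field() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# check_pair FILE_A FILE_B: report settings that should match but differ.
# Returns 0 when the pair looks consistent, 1 otherwise.
check_pair() {
    rc=0
    for key in virtual_router_id advert_int auth_pass; do
        a=$(vrrp_field "$1" "$key")
        b=$(vrrp_field "$2" "$key")
        if [ "$a" != "$b" ]; then
            echo "mismatch: $key ($a vs $b)"
            rc=1
        fi
    done
    # Equal priorities leave the election without a clear winner.
    if [ "$(vrrp_field "$1" priority)" = "$(vrrp_field "$2" priority)" ]; then
        echo "warning: both nodes have the same priority"
        rc=1
    fi
    return $rc
}

# Example, after copying both files to one machine:
#   check_pair lb01-keepalived.conf lb02-keepalived.conf
```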
    

2.4 start the service (and enable it at boot)

[root@lb01 ~]# systemctl enable --now keepalived
[root@lb02 ~]# systemctl enable --now keepalived

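2.5 enable keepalived logging

#Configure keepalived (-D: detailed log messages, -d: dump the configuration, -S 0: log to syslog facility local0)
[root@lb01 ~]# vim /etc/sysconfig/keepalived
KEEPALIVED_OPTIONS="-D -d -S 0"
 
#Configure rsyslog to write the local0 facility to a separate file
[root@lb01 ~]# vim /etc/rsyslog.conf
local0.*        /var/log/keepalived.log
 
#Restart the services
[root@lb01 ~]# systemctl restart keepalived rsyslog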
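
Once the log file exists, the lines of most interest are the state changes. The filter below is a small sketch: the function name is made up, and the "Entering ... STATE" wording is an assumption about the keepalived version in use, so adjust the pattern if your log lines look different.

```shell
#!/bin/sh
# vrrp_transitions: read log lines on stdin and print only those that
# record a VRRP state change. Hypothetical helper; the message wording
# it matches may vary between keepalived versions.
vrrp_transitions() {
    grep -E 'Entering (MASTER|BACKUP|FAULT) STATE'
}

# Typical use on a node (path taken from the rsyslog rule above):
#   vrrp_transitions < /var/log/keepalived.log
```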

3, Preemptive and non-preemptive modes of Keepalived

3.1 when both nodes are started

#With both nodes running, only node 1 holds the VIP, because its priority is higher than node 2's
[root@lb01 ~]# ip addr | grep 192.168.15.3
    inet 192.168.15.3/32 scope global eth0
 
[root@lb02 ~]# ip addr | grep 192.168.15.3

3.2 stop master node

[root@lb01 ~]# systemctl stop keepalived
[root@lb01 ~]# ip addr | grep 192.168.15.3
 
#Because keepalived on node 1 has stopped, node 2 automatically takes over node 1's work, that is, the VIP
[root@lb02 ~]# ip addr | grep 192.168.15.3
    inet 192.168.15.3/32 scope global eth0

3.3 restart the master node

#Start master node
[root@lb01 ~]# systemctl start keepalived
[root@lb01 ~]# ip addr | grep 192.168.15.3
    inet 192.168.15.3/32 scope global eth0

#Because node 1 has a higher priority than node 2, it preempts the VIP again as soon as it recovers
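
A side note on the check itself: `grep 192.168.15.3` would also match an address such as 192.168.15.30, and each dot matches any character. A stricter test could look like the sketch below. The function name has_vip is invented for this example, and it assumes the one-line-per-address format printed by `ip -o -4 addr show`.

```shell
#!/bin/sh
# has_vip VIP: read `ip -o -4 addr show` output on stdin and succeed only
# if VIP is configured exactly. In that output the fourth field is the
# address with its prefix length, e.g. 192.168.15.3/32.
has_vip() {
    awk -v vip="$1" '
        { split($4, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }
    '
}

# Example on a node:
#   ip -o -4 addr show dev eth0 | has_vip 192.168.15.3 && echo "VIP is here"
```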

3.4 configure non-preemptive mode

  1. Master node configuration (LoadBalance01)

    [root@lb01 ~]# vim /etc/keepalived/keepalived.conf 
    ... ...
    vrrp_instance VI_1 {
        # Initial state; must be BACKUP when nopreempt is used
        state BACKUP
        # Enable non-preemptive mode
        nopreempt
        # Network interface the instance is bound to
        interface eth0
        # Virtual router ID; nodes with the same ID form one group
        virtual_router_id 50
        # Priority
        priority 100
    ... ...
    }
     
    [root@lb01 ~]# systemctl restart keepalived
    
  2. Standby node configuration (LoadBalance02)

    [root@lb02 ~]# vim /etc/keepalived/keepalived.conf 
    ... ...
    vrrp_instance VI_1 {
        # Initial state; must be BACKUP when nopreempt is used
        state BACKUP
        # Enable non-preemptive mode
        nopreempt
        # Network interface the instance is bound to
        interface eth0
        # Virtual router ID; nodes with the same ID form one group
        virtual_router_id 50
        # Priority
        priority 90
    ... ...
    }
     
    [root@lb02 ~]# systemctl restart keepalived.service
    
  3. Configuration considerations

    1. The state of both nodes must be set to BACKUP;
    2. Both nodes must be configured with nopreempt;
    3. The priority of one node must be higher than that of the other.

    In other words, once nopreempt is enabled, both servers are declared as BACKUP and differ only in priority.
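
The difference between the two modes comes down to one decision, made when a failed node comes back. The sketch below models only that decision. The function name and its arguments are invented for illustration, and timing details such as advertisement intervals are ignored.

```shell
#!/bin/sh
# vip_holder_after_recovery CURRENT_PRIO RECOVERED_PRIO NOPREEMPT
# Prints "recovered" if the node that just came back takes the VIP,
# otherwise "current". NOPREEMPT is "yes" or "no" for the recovered node.
# Hypothetical model of the behaviour shown in sections 3.3 and 3.4.
vip_holder_after_recovery() {
    current_prio=$1
    recovered_prio=$2
    nopreempt=$3
    if [ "$nopreempt" = "no" ] && [ "$recovered_prio" -gt "$current_prio" ]; then
        echo recovered
    else
        echo current
    fi
}

vip_holder_after_recovery 80 100 no    # preemptive: prints recovered
vip_holder_after_recovery 90 100 yes   # non-preemptive: prints current
```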

4, Keepalived split-brain

Split-brain occurs when, for some reason, the two keepalived servers cannot detect each other's heartbeat within the specified time, so each one takes ownership of the resources and services (here, the VIP) even though both servers are still alive.

4.1 causes of split-brain

  • Loose network cable or other network fault
  • Server hardware failure
  • A firewall enabled on or between the servers blocks the heartbeat

4.2 simulating split-brain

  1. Turn on the firewall

    [root@lb01 ~]# systemctl start firewalld
    [root@lb01 ~]# ip addr | grep 192.168.15.3
        inet 192.168.15.3/32 scope global eth0
     
    [root@lb02 ~]# systemctl start firewalld
    [root@lb02 ~]# ip addr | grep 192.168.15.3
        inet 192.168.15.3/32 scope global eth0
    
  2. Visit website

    #firewalld is now running and rejects incoming connections by default, so HTTP (port 80) and HTTPS (port 443) must be allowed
    [root@lb01 ~]# firewall-cmd --add-service=http
    success
    [root@lb02 ~]# firewall-cmd --add-service=http
    success
     
    [root@lb01 ~]# firewall-cmd --add-service=https
    success
    [root@lb02 ~]# firewall-cmd --add-service=https
    success
     
    #There are no problems accessing the page
    
  3. Turn off firewall

    [root@lb02 ~]# systemctl stop firewalld 
    [root@lb02 ~]# ip addr | grep 192.168.15.3
     
    [root@lb01 ~]# systemctl stop firewalld
    [root@lb01 ~]# ip addr | grep 192.168.15.3
        inet 192.168.15.3/32 scope global eth0
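
The simulation works because firewalld, as configured here, drops the VRRP advertisements between the two nodes. If the firewall has to stay on, one option is to allow the VRRP protocol explicitly. The sketch below uses a firewalld rich rule. The RUN variable is a dry-run switch invented for this example, so by default the commands are only printed.

```shell
#!/bin/sh
# allow_vrrp: let VRRP advertisements through firewalld on this node.
# RUN is a hypothetical dry-run switch for this sketch: when unset, the
# commands are only echoed; set RUN="" (as root) to really execute them.
RUN=${RUN-echo}

allow_vrrp() {
    # Runtime rule first, then the same rule stored permanently.
    $RUN firewall-cmd --add-rich-rule='rule protocol value="vrrp" accept'
    $RUN firewall-cmd --permanent --add-rich-rule='rule protocol value="vrrp" accept'
}

allow_vrrp
```

The rule would be needed on both lb01 and lb02, since each node must be able to receive the other's advertisements.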
    

4.3 solutions to split-brain

#If split-brain occurs, stop keepalived on one of the two nodes (either one will do)
#Write a check script on the standby node: if the primary node answers ping and the standby node also holds the VIP, assume that split-brain has occurred
[root@lb02 ~]# cat check_split_brain.sh
#!/bin/sh
vip=192.168.15.3
lb01_ip=192.168.15.5
while true;do
    # Primary reachable and VIP present on this node: report split-brain
    if ping -c 2 "$lb01_ip" >/dev/null 2>&1 && [ "$(ip addr | grep -c "$vip")" -eq 1 ];then
        echo "ha is split brain.warning."
    else
        echo "ha is ok"
    fi
    sleep 5
done
 
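#Alternative check: ask lb01 over ssh whether it holds the VIP while this node holds it too
[root@lb02 ~]# vim check_keepalive.sh 
#!/bin/sh
vip=192.168.15.3
lb01_ip=172.16.1.5
while true;do
    # Assumes passwordless ssh from lb02 to lb01
    if ssh "$lb01_ip" "ip addr | grep -q '$vip'" >/dev/null 2>&1 && [ "$(ip addr | grep -c "$vip")" -eq 1 ];then
        echo "ha is split brain.warning."
    else
        echo "ha is ok"
    fi
    sleep 3
done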
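
Both scripts boil down to the question of how many nodes hold the VIP. Note that the ping-based variant would also warn after an ordinary failover like the one in section 3.2, where lb01 still answers ping but no longer holds the VIP. A sketch of the decision on its own, with an invented function name and yes/no/unknown inputs chosen for this example:

```shell
#!/bin/sh
# classify_ha LOCAL PEER
# LOCAL and PEER say whether each node holds the VIP: "yes", "no", or
# "unknown" when the peer could not be queried. Hypothetical helper.
classify_ha() {
    case "$1/$2" in
        yes/yes)   echo "split-brain" ;;
        no/no)     echo "no-vip" ;;
        */unknown) echo "unknown" ;;
        *)         echo "ok" ;;
    esac
}

classify_ha yes yes   # prints split-brain
classify_ha yes no    # prints ok
```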

5, High availability with Keepalived and Nginx

Nginx listens on all IP addresses by default. When the VIP floats to a node, it is as if that node had gained an extra network interface carrying the VIP, so requests to the VIP reach the nginx running on that machine.

However, if nginx goes down while keepalived keeps running, no failover happens and user requests fail. A script is therefore needed to check whether nginx is alive and, if it is not, to stop keepalived so that the VIP moves to the other node.

5.1 Nginx failover script

[root@lb01 ~]# vim check_web.sh
#!/bin/sh
# Count running nginx processes; the [n] keeps grep from matching itself
nginxpid=$(ps -ef | grep '[n]ginx' | wc -l)
 
#1. Check whether Nginx is alive. If not, try to start it
if [ "$nginxpid" -eq 0 ];then
    systemctl start nginx >/dev/null 2>&1
    sleep 3
    #2. After waiting 3 seconds, check the Nginx status again
    nginxpid=$(ps -ef | grep '[n]ginx' | wc -l)
    #3. If Nginx is still not running, stop Keepalived so that the VIP drifts to the other node
    if [ "$nginxpid" -eq 0 ];then
        systemctl stop keepalived
    fi
fi
 
#Make the script executable
[root@lb01 ~]# chmod +x /root/check_web.sh
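
An alternative worth knowing, not the approach used in this article: keepalived treats a non-zero exit status of a tracked script as a failure, so the check can simply report the nginx status and leave the failover to keepalived instead of stopping the service. A minimal sketch, with made-up function names:

```shell
#!/bin/sh
# count_procs NAME: read one command name per line on stdin (for example
# from `ps -e -o comm=`) and print how many lines are exactly NAME.
count_procs() {
    grep -c -x -- "$1"
}

# nginx_healthy: succeed if at least one nginx process exists.
nginx_healthy() {
    [ "$(ps -e -o comm= | count_procs nginx)" -gt 0 ]
}

# Used as a check script, the file would end with this call so that the
# script's exit status is the health result:
#   nginx_healthy
```

With this variant nothing has to be restarted by hand after nginx is repaired, because keepalived itself keeps running.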

5.2 calling the nginx failover script from the keepalived configuration file

  • Preemptive configuration

    #Only needs to be configured on the master node
    [root@lb01 ~]# vim /etc/keepalived/keepalived.conf 
    global_defs {
        router_id LoadBalance01
    }
     
    #Run the script every 5 seconds. The script must finish within 5 seconds; otherwise it is interrupted and run again
    vrrp_script check_web {
        script "/root/check_web.sh"
        interval 5
    }
     
    vrrp_instance VI_1 {
        state MASTER
        interface eth0
        virtual_router_id 50
        priority 100
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            192.168.15.3
        }
        #Run the check script defined in vrrp_script
        track_script {
            check_web
        }
    }
    
  • Non-preemptive configuration

    #In non-preemptive mode, the script must be configured on both nodes
    [root@lb01 ~]# scp check_web.sh 172.16.1.6:/root
     
    #The standby node needs the same vrrp_script and track_script settings
    [root@lb02 ~]# cat /etc/keepalived/keepalived.conf 
    global_defs {
        router_id LoadBalance02
    }
     
    vrrp_script check_web {
        script "/root/check_web.sh"
        interval 5
    }
     
    vrrp_instance VI_1 {
        state BACKUP
        nopreempt
        interface eth0
        virtual_router_id 50
        priority 90
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            192.168.15.3
        }
        track_script {
            check_web
        }
    }
    

5.3 testing

  1. Introduce an error into the nginx configuration file on the node that holds the VIP, so that nginx cannot be started again
  2. Stop nginx
  3. Check whether the VIP moves to the other node
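
To put a number on the switch, one could poll the VIP once per second during the test and look at the longest run of failed requests. The sketch below assumes curl is installed. Both function names are invented for this example, and the URL is simply the VIP from this article.

```shell
#!/bin/sh
# probe_vip URL COUNT: request URL about once per second, COUNT times,
# and print "ok" or "fail" for each attempt.
probe_vip() {
    i=0
    while [ "$i" -lt "$2" ]; do
        if curl -s -o /dev/null --max-time 1 "$1"; then
            echo ok
        else
            echo fail
        fi
        i=$((i + 1))
        sleep 1
    done
}

# longest_outage: read ok/fail lines on stdin and print the length of
# the longest consecutive run of "fail".
longest_outage() {
    awk '$1 == "fail" { run++; if (run > max) max = run; next }
         { run = 0 }
         END { print max + 0 }'
}

# Example, started from a client just before stopping nginx:
#   probe_vip http://192.168.15.3/ 30 | longest_outage
```

The result is only a rough estimate in seconds, since each attempt also spends time waiting for curl.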

Keywords: Linux

Added by Petty_Crim on Mon, 10 Jan 2022 15:53:48 +0200