High availability load balancer on public cloud (Huawei)

Highly available load balancer option: VIP + Nginx

node   IP
node1  10.0.0.11
node2  10.0.0.12
node3  10.0.0.13
VIP    10.0.0.10

It is easy to make this work on ordinary virtual machines; for details, see the blog post "Nginx and Keepalived to achieve high availability of nginx" (by Beijing strivers).

But on the public cloud there are problems. I bought three servers and deployed them in the same VPC, and configured Keepalived and Nginx, yet the backend service could not be reached through the VIP.

The following is the initial configuration and testing on three nodes.
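The keepalived configuration itself is not shown in this post. For reference, here is a minimal sketch for node3 (which the tests below suggest starts out as MASTER); the instance name VI_1, the interface eth0, and the script name chk_nginx appear in the logs, while the router ID, priorities, and the check script path are my assumptions, not the author's exact config:

! /etc/keepalived/keepalived.conf -- minimal sketch, assumed values marked
vrrp_script chk_nginx {
    script "/etc/keepalived/check_nginx.sh"   ! assumed path; the log below truncates it
    interval 2
    weight -20
}

vrrp_instance VI_1 {
    state MASTER               ! BACKUP on node1 and node2
    interface eth0
    virtual_router_id 51       ! must match on all three nodes (assumed value)
    priority 100               ! e.g. 90 on node2, 80 on node1 (assumed values)
    advert_int 1
    virtual_ipaddress {
        10.0.0.10
    }
    track_script {
        chk_nginx
    }
}

The check script can be as simple as exiting non-zero when nginx is gone, e.g. pgrep -x nginx >/dev/null || exit 1.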

root@hw1:~# service keepalived status
● keepalived.service - Keepalive Daemon (LVS and VRRP)
   Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2020-02-18 17:48:57 CST; 2min 40s ago
  Process: 9058 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 9067 (keepalived)
    Tasks: 3 (limit: 4662)
   CGroup: /system.slice/keepalived.service
           ├─9067 /usr/sbin/keepalived
           ├─9072 /usr/sbin/keepalived
           └─9073 /usr/sbin/keepalived

Feb 18 17:48:57 hw1 Keepalived_vrrp[9073]: WARNING - default user 'keepalived_script' for script execution does not exist - please create
Feb 18 17:48:57 hw1 Keepalived_vrrp[9073]: Unsafe permissions found for script '/etc/kee…
Feb 18 17:48:57 hw1 Keepalived_vrrp[9073]: SECURITY VIOLATION - scripts are being executed but script_security not enabled
Feb 18 17:48:57 hw1 Keepalived_vrrp[9073]: Using LinkWatch kernel netlink reflector...
Feb 18 17:48:57 hw1 Keepalived_vrrp[9073]: VRRP_Instance(VI_1) Entering BACKUP STATE
Feb 18 17:48:57 hw1 Keepalived_vrrp[9073]: VRRP_Script(chk_nginx) succeeded
Feb 18 17:50:12 hw1 Keepalived_vrrp[9073]: VRRP_Instance(VI_1) Transition to MASTER STATE
Feb 18 17:50:13 hw1 Keepalived_vrrp[9073]: VRRP_Instance(VI_1) Entering MASTER STATE
Feb 18 17:50:18 hw1 Keepalived_vrrp[9073]: VRRP_Instance(VI_1) Received advert with higher priority
Feb 18 17:50:18 hw1 Keepalived_vrrp[9073]: VRRP_Instance(VI_1) Entering BACKUP STATE
root@hw1:~# curl 10.0.0.10
curl: (7) Failed to connect to 10.0.0.10 port 80: No route to host
root@hw1:~#

The same keepalived setup is running on the other nodes, with the same result:

root@hw2:~# curl 10.0.0.10
curl: (7) Failed to connect to 10.0.0.10 port 80: No route to host
root@hw3:~# curl 10.0.0.10
curl: (7) Failed to connect to 10.0.0.10 port 80: No route to host
root@hw3:~#

Checking the eth0 network interface shows that no virtual IP is bound.
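Note that the MASTER/BACKUP transitions in the log above show that VRRP advertisements do travel between the nodes, so the keepalived election itself works; what fails is the delivery of traffic addressed to the VIP. If in doubt, the advertisements can be observed directly (VRRP is IP protocol 112):

# run on any node; the current MASTER should emit one advert per second
tcpdump -i eth0 -n vrrp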

Looking into it, it turns out that for security reasons public clouds forbid directly configuring an IP to drift: the virtual network only delivers traffic for addresses it knows about, so the gratuitous ARP that keepalived sends when taking over the VIP is simply ignored.

Fortunately, Tencent Cloud provides a highly available virtual IP (HAVIP) for exactly this keepalived + VIP scenario.

The Huawei Cloud I use only has plain virtual IPs, and each virtual IP can be bound to only one server. If only one server can be bound, how can it drift?

First, apply for a virtual IP in the VPC.

The VIP was bound to node3 and tested from all three nodes. node3 can access the corresponding service through the VIP, but the other nodes still cannot. Checking eth0 on node3, the VIP does not appear to be bound there either.

root@hw1:~# curl 10.0.0.10
curl: (7) Failed to connect to 10.0.0.10 port 80: No route to host
root@hw1:~#
root@hw2:~# curl 10.0.0.10
curl: (7) Failed to connect to 10.0.0.10 port 80: No route to host
root@hw2:~#
root@hw3:~# curl 10.0.0.10
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx03!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx03!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
root@hw3:~# ifconfig |grep eth0 -n7
2-        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
3-        ether 02:42:6a:c3:06:a5  txqueuelen 0  (Ethernet)
4-        RX packets 0  bytes 0 (0.0 B)
5-        RX errors 0  dropped 0  overruns 0  frame 0
6-        TX packets 0  bytes 0 (0.0 B)
7-        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
8-
9:eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
10-        inet 10.0.0.13  netmask 255.255.255.0  broadcast 10.0.0.255
11-        inet6 fe80::f816:3eff:fe60:c6ec  prefixlen 64  scopeid 0x20<link>
12-        ether fa:16:3e:60:c6:ec  txqueuelen 1000  (Ethernet)
13-        RX packets 220545  bytes 312462163 (312.4 MB)
14-        RX errors 0  dropped 0  overruns 0  frame 0
15-        TX packets 66533  bytes 5748953 (5.7 MB)
16-        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
root@hw3:~#
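A caveat on this check: ifconfig does not list secondary addresses added with ip addr add, which is how keepalived assigns the VIP, so its absence from the output above is not conclusive on its own. iproute2 is the more reliable way to look:

# on the node currently holding the VIP, keepalived typically adds it as a /32
ip addr show eth0 | grep 10.0.0.10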

Then test: stop the keepalived service on node3. The VIP really does drift, and the service can be accessed through the VIP from node2, but the other nodes still cannot access it.

root@hw3:~# service keepalived stop
root@hw3:~# curl 10.0.0.10
curl: (7) Failed to connect to 10.0.0.10 port 80: No route to host
root@hw3:~#
root@hw2:~# curl 10.0.0.10
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx02!</title>
...
</head>
<body>
<h1>Welcome to nginx02!</h1>
...
</body>
</html>
root@hw2:~#

root@hw1:~# curl 10.0.0.10
curl: (7) Failed to connect to 10.0.0.10 port 80: No route to host
root@hw1:~#

You can only access the service through the VIP on the very node the VIP drifts to. What kind of high availability is that?

Went on searching and found a blog, "Building high-availability load balancing based on nginx + keepalived under CentOS 7.2 (1. Building the HA system with keepalived)". It also uses a virtual IP on Huawei Cloud to implement the IP drift. Annoyingly, though, the author only stops the keepalived service on node1 and then accesses the VIP from node2 to get the service, i.e. exactly the test process above. That doesn't prove anything at all, OK?

Continuing to dig, and envying Tencent Cloud's highly available virtual IP, I went through Huawei Cloud's virtual IP documentation ("VPC Virtual IP Operation Guide") and finally spotted the words "high availability". And here is what that page says:

Disable source/destination check (applicable to the high-availability load balancing cluster scenario)

So turn off the source and destination check on all hosts (on the NIC tab of each ECS instance) and test again: all nodes can now access the service through the VIP. This check makes a NIC drop packets whose source or destination IP does not match its own address, which is exactly what traffic for a floating VIP looks like, hence the need to disable it. Note that keepalived on node3 is still stopped from the earlier test, so the VIP sits on node2.

root@hw1:~# curl 10.0.0.10
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx02!</title>
...
</head>
...
</body>
</html>
root@hw1:~#
root@hw2:~# curl 10.0.0.10
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx02!</title>
...
</head>
<body>
<h1>Welcome to nginx02!</h1>
...
</body>
</html>
root@hw2:~#
root@hw3:~# service keepalived stop
root@hw3:~# curl 10.0.0.10
curl: (7) Failed to connect to 10.0.0.10 port 80: No route to host
root@hw3:~# curl 10.0.0.10
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx02!</title>
...
</head>
<body>
...
</body>
</html>
root@hw3:~#

Test the VIP drift again.
The results above show the VIP is now on node2. Stop the keepalived service on node2; with keepalived stopped on both node3 and node2, the VIP should drift to node1. The test results below confirm exactly this: after node2 stops keepalived, accessing the VIP reaches the service on node1, and the other nodes can also reach node1's service through the VIP. This is real high availability, and the IP drift is verified!
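A convenient way to watch the failover from any node is a small polling loop; the grep pattern below matches the per-node welcome pages (nginx01/nginx02/nginx03) used in these tests:

# poll the VIP once a second and print which backend answers
while true; do
    curl -s --max-time 1 http://10.0.0.10 | grep -o 'nginx0[0-9]' || echo unreachable
    sleep 1
done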

root@hw2:~# service keepalived stop
root@hw2:~# curl 10.0.0.10
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx01!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx01!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
root@hw2:~#
root@hw1:~# curl 10.0.0.10
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx01!</title>
...
</head>
<body>
...
</body>
</html>
root@hw1:~#
root@hw3:~# curl 10.0.0.10
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx01!</title>
...
</head>
<body>
...
</body>
</html>
root@hw3:~#

In summary, to implement IP drift with keepalived + VIP on Huawei Cloud: make sure all nodes are in the same VPC, configure keepalived on each node, and apply for a virtual IP under that VPC. Most importantly, disable the source/destination check on every node; otherwise you can only watch the VIP drift without being able to use it.

Looking back at the restriction that each Huawei Cloud virtual IP can be bound to only one server: the tests show that keepalived + VIP does not actually require binding the virtual IP to any server at all; it is enough to apply for one. That binding restriction turns out to be a non-issue. The real thing to watch out for is turning off the source and destination check.

References

Huawei Cloud: [Huawei Cloud network technology sharing] [part 6] Guidance on typical VIP application cases
Huawei Cloud: Disable source/destination check (applicable to the high-availability load balancing cluster scenario)
Tencent Cloud: Building a highly available active/standby cluster in a VPC with keepalived
Tencent Cloud: Highly available virtual IP
Alibaba Cloud: Does Alibaba Cloud support building software load balancing with keepalived?
Building high-availability load balancing based on nginx + keepalived under CentOS 7.2 (1. Building the HA system with keepalived)
