Building a web cluster with Haproxy + Nginx

Haproxy overview

Haproxy is free, open-source software written in C that provides high availability, load balancing, and proxying for TCP- and HTTP-based applications.
Haproxy is especially suitable for heavily loaded web sites, which usually need session persistence or Layer 7 processing. On current hardware, Haproxy can fully support tens of thousands of concurrent connections. Its operating mode also makes it easy and safe to integrate into an existing architecture, while keeping your web servers from being exposed directly to the network.
Haproxy implements an event-driven, single-process model that supports a very large number of concurrent connections. Multi-process or multi-threaded models are limited by memory, the system scheduler, and ubiquitous locking, and can rarely handle thousands of concurrent connections. The event-driven model avoids these problems because it performs all of this work in user space, with better resource and time management.

Introduction to Haproxy

Common web cluster schedulers

At present, common web cluster schedulers fall into two categories: software and hardware

Software:
The open-source LVS, Haproxy, and Nginx are commonly used
Hardware:
F5 is commonly used, and many people also use other vendors' products, such as Barracuda and NSFOCUS

Application of Haproxy

Haproxy is software that provides high availability, load balancing, and proxying for TCP- and HTTP-based applications

Suitable for heavily loaded web sites
On current hardware, it can support tens of thousands of concurrent connections

Main advantages of Haproxy

Haproxy outperforms Nginx in load-balancing speed and concurrent processing
Haproxy supports virtual hosts and can work at Layer 4 and Layer 7
It makes up for some of Nginx's shortcomings, such as session persistence and cookie-based routing
Supports URL-based health checks of backend servers
Haproxy can also load-balance MySQL, health-checking and balancing the backend database nodes
Many load-balancing algorithms are supported, such as round robin, weighted round robin, source, uri, and rdp-cookie

Haproxy scheduling algorithm

Haproxy supports a variety of scheduling algorithms; three of them are the most commonly used:

RR (Round Robin): the simplest and most commonly used algorithm; requests are handed to the nodes in turn (polling)
LC (Least Connections): dynamically assigns each front-end request to the back-end node that currently has the fewest connections
SH (Source Hashing): schedules by source address; used in scenarios where session state is recorded on the server, so that the same client keeps reaching the same node. Scheduling can be based on the source IP, cookies, etc.
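The three rules above can be sketched as small Bash functions to show how each one picks a node. This is a simulation for illustration, not Haproxy code: the node list reuses this experiment's addresses, the function names are hypothetical, and cksum is only a stand-in for Haproxy's own hash function in the source-hashing case.

```shell
#!/usr/bin/env bash
# Illustrative simulation of RR, LC and SH node selection (not Haproxy code).
# NODES mirrors the two nginx nodes in this experiment; names are hypothetical.
NODES=(192.168.220.40 192.168.220.50)

# rr_pick N: round robin, request number N (0-based) goes to node N mod count.
rr_pick() {
  local n=${#NODES[@]}
  echo "${NODES[$1 % n]}"
}

# lc_pick C0 C1 ...: least connections. Arguments are the current connection
# counts of each node, in NODES order; the first node with the lowest count wins.
lc_pick() {
  local -a conns=("$@")
  local best=0 i
  for (( i = 1; i < ${#conns[@]}; i++ )); do
    if (( conns[i] < conns[best] )); then
      best=$i
    fi
  done
  echo "${NODES[best]}"
}

# sh_pick IP: source hashing. The same client IP always maps to the same node.
# cksum's CRC is used here only as a convenient stand-in hash.
sh_pick() {
  local n=${#NODES[@]} hash
  hash=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  echo "${NODES[hash % n]}"
}
```

For example, rr_pick 0, rr_pick 1 and rr_pick 2 return .40, .50 and .40 in turn, while sh_pick 192.168.220.70 returns the same node every time it is called, which is why source hashing keeps a client's server-side session intact.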

Differences between nginx, LVS and Haproxy

nginx

Supports regular-expression matching
Only supports port-based health checks
Does not support session persistence directly, but ip_hash can be used to achieve it
Has low requirements on network stability
Strong reverse-proxy capability
The nginx community (the organization that maintains and updates the software) is active, and a paid commercial edition is also available

LVS

Forwards only at Layer 4 (IP address and port)
Distributes traffic at Layer 4 and has strong load-bearing capacity
Wide range of applications (almost any application can be load-balanced)

Haproxy

Supports eight load-balancing strategies
It is used purely as load-balancing software, and under high concurrency its performance is better than nginx
Supports URL-based health checks and session persistence

Experiment: building Web cluster with Haproxy

Experimental environment

Server          IP
Haproxy         192.168.220.10
nginx1          192.168.220.40
nginx2          192.168.220.50
CentOS client   192.168.220.70

Deploy haproxy server

systemctl stop firewalld
setenforce 0

cd /opt
#Upload haproxy-1.5.19.tar.gz to /opt

yum install -y pcre-devel bzip2-devel gcc gcc-c++ make

tar zxvf haproxy-1.5.19.tar.gz
cd haproxy-1.5.19/
make TARGET=linux2628 ARCH=x86_64
make install

#Haproxy server configuration
mkdir /etc/haproxy
cp examples/haproxy.cfg /etc/haproxy/

cd /etc/haproxy/
vim haproxy.cfg
global

    #Modify the logging settings (lines 4-5 of the example file). local0 is the syslog facility; logs go to the system log via /dev/log by default
    log /dev/log   local0 info
    log /dev/log   local0 notice
    #log loghost    local0 info
    maxconn 4096                    #Maximum number of connections; keep it within the ulimit -n limit

    #Comment out the chroot line (line 8 of the example file). chroot sets the service's root directory and is usually not needed here
    #chroot /usr/share/haproxy
    uid 99                          #Run as this user UID (nobody on CentOS)
    gid 99                          #Run as this group GID
    daemon                          #Run in daemon mode

defaults
    log     global                  #Use the log settings from the global section
    mode    http                    #Work in HTTP (Layer 7) mode
    option  httplog                 #Log in HTTP log format
    option  dontlognull             #Do not log null connections (connections that send no data, such as probes)
    retries 3                       #Retry a failed connection to a node up to 3 times before reporting an error
    option  redispatch              #If the node a session is bound to goes down, allow the request to be sent to another node
    maxconn 2000                    #Maximum number of concurrent connections per proxy
    timeout connect 5000            #Connection timeout (ms); replaces the deprecated contimeout
    timeout client  50000           #Client timeout (ms); replaces the deprecated clitimeout
    timeout server  50000           #Server timeout (ms); replaces the deprecated srvtimeout

listen  webcluster 0.0.0.0:80		
    option httpchk GET /test.html   #Health check: request /test.html from each node
    balance roundrobin              #Use the round robin scheduling algorithm
    server inst1 192.168.220.40:80 check inter 2000 fall 3      #Backend node: check every 2000 ms, mark it down after 3 consecutive failures
    server inst2 192.168.220.50:80 check inter 2000 fall 3

Parameter details:

balance roundrobin      #Load balancing scheduling algorithm
    Round robin: roundrobin
    Least connections: leastconn
    Source hashing: source, similar to nginx's ip_hash
check inter 2000        #Run a health check against the node every 2000 ms
fall 3                  #Mark the node as failed after three consecutive failed checks
    If a server line carries "backup", that node is only a backup: it receives traffic only when the primary nodes have failed. A node without "backup" is a primary node and serves requests together with the other primary nodes.
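To see when a node actually changes state under these settings, the health-check and backup rules can be simulated in Bash. This is an illustration under stated assumptions, not Haproxy's implementation: the helper names are hypothetical, RISE=2 is Haproxy's default rise value (the configuration above does not set it), and by default Haproxy sends traffic to only the first available backup node unless option allbackups is set.

```shell
#!/usr/bin/env bash
# Simulation of "check inter 2000 fall 3" plus the backup rule (not Haproxy code).
INTER_MS=2000   # interval between checks, from "inter 2000"
FALL=3          # consecutive failures before a node is marked DOWN
RISE=2          # assumed: Haproxy's default consecutive successes to come back UP

# node_state RESULT...: feed check results (ok|fail) in order, print final state.
node_state() {
  local state=UP fails=0 oks=0 r
  for r in "$@"; do
    if [[ $r == ok ]]; then
      fails=0
      oks=$(( oks + 1 ))
      if [[ $state == DOWN ]] && (( oks >= RISE )); then state=UP; fi
    else
      oks=0
      fails=$(( fails + 1 ))
      if [[ $state == UP ]] && (( fails >= FALL )); then state=DOWN; fi
    fi
  done
  echo "$state"
}

# detect_ms: approximate time for a dead node to be taken out of rotation.
detect_ms() {
  echo $(( INTER_MS * FALL ))
}

# active_pool NAME:ROLE:STATE...: print the nodes that would receive traffic.
# ROLE is primary or backup. UP primaries are used if any exist; otherwise
# only the first UP backup (Haproxy's default without option allbackups).
active_pool() {
  local e name role state
  local -a primaries=() backups=()
  for e in "$@"; do
    IFS=: read -r name role state <<< "$e"
    [[ $state == UP ]] || continue
    if [[ $role == backup ]]; then backups+=("$name"); else primaries+=("$name"); fi
  done
  if (( ${#primaries[@]} > 0 )); then
    echo "${primaries[*]}"
  else
    echo "${backups[0]}"
  fi
}
```

With inter 2000 and fall 3, detect_ms gives 6000, so a failed node leaves the rotation after roughly 6 seconds; a single successful check in between resets the failure count, which is why node_state fail fail ok fail fail still reports UP.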

Compile and install Nginx server

systemctl stop firewalld
setenforce 0

yum install -y pcre-devel zlib-devel gcc gcc-c++ make 

useradd -M -s /sbin/nologin nginx

cd /opt
tar zxvf nginx-1.12.0.tar.gz -C /opt/

cd nginx-1.12.0/
./configure --prefix=/usr/local/nginx --user=nginx --group=nginx

make && make install

ln -s /usr/local/nginx/sbin/nginx /usr/local/sbin/

nginx      #Start nginx service


Edit the test pages
#On nginx1 (192.168.220.40):
echo "dzw" > /usr/local/nginx/html/test.html
#On nginx2 (192.168.220.50):
echo "dzw1" > /usr/local/nginx/html/test.html

Start Haproxy service

cp /opt/haproxy-1.5.19/examples/haproxy.init /etc/init.d/haproxy
cd /etc/init.d/
ls
chmod +x haproxy
chkconfig --add haproxy
ln -s /usr/local/sbin/haproxy /usr/sbin/haproxy

service haproxy start
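Once the service is running, the client (192.168.220.70) can confirm round robin by fetching test.html through the Haproxy address several times: the responses should alternate between dzw and dzw1. A minimal sketch, assuming curl is installed on the client; the helper names and the VIP variable are hypothetical.

```shell
#!/usr/bin/env bash
# Client-side check of round robin through haproxy (hypothetical helpers).
VIP="${VIP:-192.168.220.10}"   # haproxy address from the experiment table

# fetch_pages N: request test.html through haproxy N times; each test page
# ends with a newline, so each response body lands on its own line.
fetch_pages() {
  local i
  for (( i = 0; i < $1; i++ )); do
    curl -s --max-time 3 "http://${VIP}/test.html"
  done
}

# is_alternating: succeed if no two consecutive input lines are identical,
# which is what round robin over two healthy nodes should produce.
is_alternating() {
  local prev="" line first=1
  while IFS= read -r line; do
    if (( ! first )) && [[ $line == "$prev" ]]; then
      return 1
    fi
    prev=$line
    first=0
  done
  return 0
}
```

Usage on the client: fetch_pages 6 | is_alternating && echo "round robin OK". If you then stop nginx on one node (for example with nginx -s stop), every response should come from the other node once the health check has marked the stopped node down.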

Haproxy parameter optimization

As the load on an enterprise website grows, tuning Haproxy's parameters becomes very important. The main items to optimize are:

maxconn: the maximum number of connections. Adjust it to the application's actual needs; 10240 is recommended
daemon: daemon mode. Haproxy can also run in non-daemon mode, but starting it in daemon mode is recommended
nbproc: the number of concurrent load-balancing processes. It is recommended to set it equal to, or twice, the number of CPU cores on the server
retries: the number of times to retry a failed connection to a cluster node. With many nodes and high concurrency, set it to 2 or 3
option http-server-close: actively close the server-side HTTP connection after each response. This option is recommended in production
timeout http-keep-alive: the keep-alive (long connection) timeout; it can be set to 10s
timeout http-request: the HTTP request timeout. Setting it to 5-10s is recommended, so that HTTP connections are released faster
timeout client: the client timeout. If traffic is heavy and the nodes respond slowly, this value can be shortened; about 1min is recommended
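The recommendations above can be turned into a starting configuration fragment. The sketch below is an assumption-laden helper, not an official tool: the function names are hypothetical, the 2 x maxconn file-descriptor figure is a rough estimate (each proxied connection holds one client-side and one server-side socket), and when started as root Haproxy normally computes its own descriptor limit from maxconn, so fd_limit_ok is only a sanity check. nbproc applies to the 1.5 branch used here; newer Haproxy releases deprecate it in favour of threads (nbthread).

```shell
#!/usr/bin/env bash
# Sketch: build a tuned fragment from the recommended values (hypothetical helpers).

# fds_needed MAXCONN: rough estimate, two sockets per proxied connection.
fds_needed() {
  echo $(( $1 * 2 ))
}

# fd_limit_ok MAXCONN: succeed if the current ulimit -n covers the estimate.
fd_limit_ok() {
  local limit
  limit=$(ulimit -n)
  [[ $limit == unlimited ]] && return 0
  (( limit >= $(fds_needed "$1") ))
}

# tuned_fragment MAXCONN: print global/defaults lines using the suggested values;
# nbproc follows the "equal to the number of CPU cores" rule via nproc.
tuned_fragment() {
  local maxconn=$1
  cat <<EOF
global
    maxconn $maxconn
    nbproc $(nproc)
    daemon

defaults
    retries 3
    option http-server-close
    timeout http-keep-alive 10s
    timeout http-request 10s
    timeout client 1m
EOF
}
```

For example, tuned_fragment 10240 prints a fragment to merge into /etc/haproxy/haproxy.cfg, and fd_limit_ok 10240 || echo "raise ulimit -n" warns when the shell's descriptor limit is below the estimated 20480.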

Added by Harley1979 on Wed, 29 Dec 2021 18:39:57 +0200

Programming VIP