Haproxy overview
- Haproxy is free, open source software written in C that provides high availability, load balancing, and proxying for TCP and HTTP based applications.
- Haproxy is especially suitable for heavily loaded web sites, which usually need session persistence or layer 7 processing. On current hardware, Haproxy can fully support tens of thousands of concurrent connections. Its operating mode makes it easy and safe to integrate into your existing architecture, and it keeps your web servers from being exposed directly to the network.
- Haproxy implements an event-driven, single-process model that supports a very large number of concurrent connections. Multiprocess and multithreaded models are limited by memory, the system scheduler and ubiquitous locks, so they can rarely handle thousands of concurrent connections. The event-driven model avoids these problems because it performs all of these tasks in user space, with better resource and time management.
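The event-driven, single-process idea can be illustrated with a small sketch. This is not how Haproxy itself is implemented (Haproxy is written in C); it is a minimal Python illustration, assuming the standard `selectors` module and in-process socket pairs, of one thread serving several connections by reacting only to sockets that are ready. The function names `run_event_loop` and `demo` are made up for this example.

```python
import selectors
import socket


def run_event_loop(connections, expected_messages):
    """Serve many sockets from a single thread by waiting for readiness events."""
    sel = selectors.DefaultSelector()
    for conn in connections:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)

    handled = 0
    while handled < expected_messages:
        events = sel.select(timeout=1.0)
        if not events:
            break  # nothing became ready: stop rather than spin forever
        for key, _mask in events:
            conn = key.fileobj
            data = conn.recv(1024)
            if data:
                conn.sendall(data.upper())  # trivial "response" for the demo
                handled += 1
            else:
                sel.unregister(conn)  # peer closed the connection
    sel.close()
    return handled


def demo(n_clients=5):
    """Create n_clients socket pairs, send one message each, collect replies."""
    pairs = [socket.socketpair() for _ in range(n_clients)]
    for i, (client, _server) in enumerate(pairs):
        client.settimeout(1.0)
        client.sendall(f"client-{i}".encode())
    handled = run_event_loop([server for _client, server in pairs], n_clients)
    replies = [client.recv(1024).decode() for client, _server in pairs]
    for client, server in pairs:
        client.close()
        server.close()
    return handled, replies
```

Calling `demo(3)` handles all three connections in one loop, with no extra threads or processes and therefore no locks.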
Introduction to Haproxy
Common Web cluster schedulers
At present, common Web cluster schedulers are divided into software and hardware:
- Software: open source LVS, Haproxy and Nginx are usually used
- Hardware: F5 is commonly used, and many people use domestic products such as Barracuda, NSFOCUS (bare metal), etc.
Application of Haproxy
Haproxy is software that provides high availability, load balancing, and proxying for TCP and HTTP based applications
- Suitable for Web sites with heavy load
- Running on current hardware, it can support tens of thousands of concurrent connections
Main advantages of Haproxy
- Haproxy is superior to Nginx in load balancing speed and concurrent processing
- Haproxy supports virtual hosts and can work in layers 4 and 7
- It can make up for some shortcomings of Nginx, such as session persistence and cookie-based routing
- Supports URL-based health checks of the backend servers
- Haproxy can load balance MySQL, health checking and balancing traffic across backend DB nodes
- Many load balancing algorithms are supported, such as round robin, weighted round robin, source, URI and RDP cookie
Haproxy scheduling algorithm
Haproxy supports a variety of scheduling algorithms, three of which are most commonly used
- RR (Round Robin): the simplest and most commonly used algorithm. Requests are handed to each back-end node in turn
- LC (Least Connections): dynamically allocates each front-end request to the back-end node that currently has the fewest connections
- SH (Source Hashing): source-based scheduling, used in scenarios where session data is stored on the server. Cluster scheduling can be based on source IP, cookies, etc.
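The three algorithms above can be sketched in a few lines of Python. This is only an illustration of the ideas, not Haproxy's actual implementation: the class names are invented for this example, the node names are taken from the experiment later in this article, and the CRC32 hash is an assumption (Haproxy uses its own hash function for `source`).

```python
import itertools
import zlib


class RoundRobin:
    """RR: hand requests to each node in turn."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(list(nodes))

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """LC: pick the node with the fewest active connections."""

    def __init__(self, nodes):
        self.active = {node: 0 for node in nodes}

    def pick(self, client_ip=None):
        # On a tie, min() returns the first node in insertion order.
        node = min(self.active, key=self.active.get)
        self.active[node] += 1
        return node

    def release(self, node):
        """Call when a connection to `node` finishes."""
        self.active[node] -= 1


class SourceHash:
    """SH: the same client IP always maps to the same node."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def pick(self, client_ip):
        # CRC32 is an illustrative choice of hash, not Haproxy's own.
        index = zlib.crc32(client_ip.encode()) % len(self.nodes)
        return self.nodes[index]
```

With `SourceHash(["nginx1", "nginx2"])`, repeated calls to `pick("192.168.220.70")` always return the same node, which is what keeps a server-side session usable. The trade-off is that the mapping changes whenever the node list changes.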
Differences between nginx, LVS and Haproxy
nginx
- Supports regular expressions
- Only port-based health checks are supported
- Session persistence is not supported directly, but ip_hash can be used as a workaround
- Low requirements for network stability
- Strong reverse proxy capability
- The nginx community is active (community: the organization that maintains and updates the service), and there is also a paid community
LVS
- Forwarding can only be based on layer 4 ports
- It only distributes traffic at layer 4 and holds up well under heavy load
- Wide range of applications (almost all applications can be load balanced)
Haproxy
- Eight load balancing strategies are supported
- It is only used as load balancing software, and its performance is better than nginx in the case of high concurrency
- Supports URL health checks and session persistence
Experiment: building Web cluster with Haproxy
Experimental environment
Server           IP
Haproxy          192.168.220.10
nginx1           192.168.220.40
nginx2           192.168.220.50
centos client    192.168.220.70
Deploy haproxy server
systemctl stop firewalld
setenforce 0

cd /opt
#Place haproxy-1.5.19.tar.gz in /opt before continuing
yum install -y pcre-devel bzip2-devel gcc gcc-c++ make
tar zxvf haproxy-1.5.19.tar.gz
cd haproxy-1.5.19/
make TARGET=linux2628 ARCH=x86_64
make install

#Haproxy server configuration
mkdir /etc/haproxy
cp examples/haproxy.cfg /etc/haproxy/
cd /etc/haproxy/
vim haproxy.cfg

global
    #Lines 4 ~ 5: modify to configure logging. local0 is the logging facility; entries go to the system log by default
    log /dev/log local0 info
    log /dev/log local0 notice
    #log loghost local0 info
    maxconn 4096                #Maximum number of connections; take the ulimit -n limit into account
    #Line 8: comment out. chroot sets the root directory the service runs in; this line generally needs to be commented out
    #chroot /usr/share/haproxy
    uid 99                      #User UID
    gid 99                      #User GID
    daemon                      #Daemon mode

defaults
    log global                  #Use the log definition from the global section
    mode http                   #The mode is http
    option httplog              #Log in http log format
    option dontlognull          #Do not log connections that carry no data (for example, bare probe connections)
    retries 3                   #Number of times to retry a failed connection to a node server before giving up
    redispatch                  #If the connection to the assigned server fails, allow the request to be sent to another server
    maxconn 2000                #Maximum number of connections
    contimeout 5000             #Connection timeout (ms)
    clitimeout 50000            #Client timeout (ms)
    srvtimeout 50000            #Server timeout (ms)

listen webcluster 0.0.0.0:80
    option httpchk GET /test.html                             #Health check: request the server's test.html file
    balance roundrobin                                        #Use the round robin (polling) scheduling algorithm
    server inst1 192.168.220.40:80 check inter 2000 fall 3    #Define online nodes
    server inst2 192.168.220.50:80 check inter 2000 fall 3

Parameter details:
- balance roundrobin: the load balancing scheduling algorithm
  - Round robin (polling) algorithm: roundrobin
  - Least connections algorithm: leastconn
  - Source hashing algorithm: source, similar to nginx's ip_hash
- check inter 2000: the interval, in milliseconds, between health checks (heartbeats) from the haproxy server to the node
- fall 3: the node is considered failed after three consecutive health checks go unanswered
- If a node is configured with "backup", it is only a backup node and only takes traffic when the primary nodes fail. A node without "backup" is a primary node and serves traffic together with the other primary nodes
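The effect of `fall 3` on a node's state can be sketched as a small counter. This is a simplified model, not Haproxy's code: the class name `NodeHealth` is invented for this example, and the `rise` value (how many consecutive successes bring a failed node back) is not set in the configuration above, so the value 2 used here is an assumption based on Haproxy's documented default.

```python
class NodeHealth:
    """Track whether a node is up, based on consecutive check results."""

    def __init__(self, fall=3, rise=2):
        self.fall = fall        # consecutive failures before marking the node down
        self.rise = rise        # consecutive successes before marking it up again (assumed default)
        self.failures = 0
        self.successes = 0
        self.up = True

    def record(self, check_passed):
        """Record one health check result and return the node's current state."""
        if check_passed:
            self.failures = 0
            self.successes += 1
            if not self.up and self.successes >= self.rise:
                self.up = True
        else:
            self.successes = 0
            self.failures += 1
            if self.up and self.failures >= self.fall:
                self.up = False
        return self.up


# One result per check; with "inter 2000" each entry is 2 seconds apart.
health = NodeHealth(fall=3, rise=2)
history = [health.record(ok) for ok in
           [False, False, True, False, False, False, True, True]]
```

In this run two failures followed by a success do not take the node down, because the failures must be consecutive. The node only goes down on the sixth check and comes back after two successful checks in a row.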
Compile and install Nginx server
systemctl stop firewalld
setenforce 0

yum install -y pcre-devel zlib-devel gcc gcc-c++ make
useradd -M -s /sbin/nologin nginx
cd /opt
tar zxvf nginx-1.12.0.tar.gz -C /opt/
cd nginx-1.12.0/
./configure --prefix=/usr/local/nginx --user=nginx --group=nginx
make && make install

ln -s /usr/local/nginx/sbin/nginx /usr/local/sbin/
nginx    #Start nginx service

Edit test page
On 192.168.220.40:
echo "dzw" > /usr/local/nginx/html/test.html
On 192.168.220.50:
echo "dzw1" > /usr/local/nginx/html/test.html
Start Haproxy service
cp /opt/haproxy-1.5.19/examples/haproxy.init /etc/init.d/haproxy
cd /etc/init.d/
ls
chmod +x haproxy
chkconfig --add /etc/init.d/haproxy
ln -s /usr/local/sbin/haproxy /usr/sbin/haproxy
service haproxy start
Haproxy parameter optimization
As the load on an enterprise website grows, tuning Haproxy parameters becomes very important. The specific optimization items are as follows:
- maxconn: the maximum number of connections. It can be adjusted according to the actual situation of the application. A value of 10240 is recommended
- daemon: daemon mode. Haproxy can be started in non-daemon mode, but starting in daemon mode is recommended
- nbproc: the number of concurrent processes used for load balancing. It is recommended to be equal to, or twice, the number of CPU cores of the current server
- retries: the number of retries, mainly used when checking cluster nodes. If there are many nodes and concurrency is high, set it to 2 or 3
- option http-server-close: actively close the server-side HTTP connection once the response has been received. It is recommended to use this option in the production environment
- timeout http-keep-alive: keep-alive (long connection) timeout, which can be set to 10s
- timeout http-request: HTTP request timeout. It is recommended to set this to 5~10s to speed up the release of HTTP connections
- timeout client: client timeout. If traffic is very heavy and nodes respond slowly, this time can be shortened. It is recommended to set it to about 1min
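To see how these items fit together in haproxy.cfg, the following sketch renders the recommended values as a configuration fragment. The helper `render_tuning` is hypothetical and exists only for this illustration. The values come from the list above, with 10s chosen from the suggested 5~10s range, and the fragment assumes the 1.5 series used in this article (`nbproc` in particular is not available in all later releases). Check the fragment against the documentation for your version before using it.

```python
def render_tuning(cpu_cores, maxconn=10240):
    """Render the tuning items above as a haproxy.cfg fragment (illustrative only)."""
    lines = [
        "global",
        f"    maxconn {maxconn}",
        "    daemon",
        f"    nbproc {cpu_cores}",       # equal to the number of CPU cores
        "",
        "defaults",
        "    mode http",
        "    retries 3",
        "    option http-server-close",
        "    timeout http-keep-alive 10s",
        "    timeout http-request 10s",  # chosen from the suggested 5~10s range
        "    timeout client 1m",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_tuning(cpu_cores=4))
```

Generating the fragment from the core count keeps `nbproc` in step with the hardware when the same settings are rolled out to servers of different sizes.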