This article mainly introduces deploying DPVS in FullNAT mode on CentOS 7.9 and installing the toa module on the RealServer so that it can obtain the client's real IP.
The previous articles "DPVS introduction and deployment" and "Application and principle analysis of DPDK in DPVS" already cover the background; readers who need it can catch up on that material first. Since the deployment steps in the previous article only covered deploying DPVS and did not touch the configuration of the various load-balancing modes, and since DPVS and the corresponding DPDK version have been updated over the past half year, a new, detailed deployment tutorial is written here.
The DPVS version installed in this article is 1.8-10 and the dpdk version is 18.11.2, which differ from the versions above, so the installation steps and operations also differ.
1. Preparatory work
Before the formal installation, we need to adjust the hardware parameters of the machine. DPVS has certain requirements for hardware (mainly because of the underlying dpdk), and dpdk officially publishes a support list. Although the platforms on the support list are broad, in practice Intel's hardware platform seems to offer the best compatibility and performance.
1.1 hardware
1.1.1 hardware parameters
- Machine model: PowerEdge R630
- CPU: two Intel(R) Xeon(R) E5-2630 v4 CPUs @ 2.20GHz
- Memory: 16G*8 DDR4-2400 MT/s (configured at 2133 MT/s), 64G per CPU, 128G in total
- Network card 1: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
- Network card 2: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
- System: CentOS Linux release 7.9.2009 (Core)
- Kernel: 3.10.0-1160.36.2.el7.x86_64
1.1.2 BIOS settings
Before starting, first enter the BIOS, turn off hyper-threading and enable NUMA. DPVS is a typical CPU-busy application (the CPU utilization of the process stays at 100%), so to guarantee performance it is recommended to disable hyper-threading, and for the sake of CPU affinity it is best to enable NUMA manually in the BIOS.
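As a sanity check (not part of the official procedure), whether hyper-threading is really off and how memory is split across NUMA nodes can be verified from the OS after rebooting, for example with lscpu and numactl, both of which are installed in the software section below:

# With hyper-threading disabled, "Thread(s) per core" should be 1,
# and a two-socket machine should report 2 NUMA nodes
$ lscpu | grep -E "Thread\(s\) per core|NUMA node\(s\)"
Thread(s) per core:    1
NUMA node(s):          2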
1.1.3 network card PCI ID
After the dpdk PMD driver takes over the network cards, they are easy to confuse if there are many of them, so it is best to record the corresponding NIC name, MAC address and PCI ID of each card in advance to avoid confusion in subsequent operations.
Use the lspci command to view the PCI IDs of the network cards. We can also look at the device symlink under the corresponding NIC name in /sys/class/net/ to find the PCI ID of a given card. Finally, the NIC name, MAC address and PCI ID can be tied together.
$ lspci | grep -i net
01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
$ file /sys/class/net/eth0/device
/sys/class/net/eth0/device: symbolic link to `../../../0000:01:00.0'
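To tie the NIC name, MAC address and PCI ID together for every interface in one go, a small loop over /sys/class/net also works (a convenience sketch; the eth* glob and the sample output are only illustrative):

$ for nic in /sys/class/net/eth*; do
      name=$(basename $nic)
      mac=$(cat $nic/address)
      pci=$(basename $(readlink $nic/device))
      echo "$name  $mac  $pci"
  done
eth0  aa:bb:cc:dd:ee:00  0000:01:00.0
eth1  aa:bb:cc:dd:ee:01  0000:01:00.1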
1.2 software
1.2.1 system software
# Tools for compiling and installing dpvs and for viewing CPU NUMA information
$ yum group install "Development Tools"
$ yum install patch libnuma* numactl numactl-devel kernel-devel openssl* popt* libpcap-devel -y
# If you need ipvsadm to support IPv6, you also need libnl3-devel
$ yum install libnl libnl-devel libnl3 libnl3-devel -y
# Note that the kernel-* packages must match the version of the running kernel
$ uname -r
3.10.0-1160.36.2.el7.x86_64
$ rpm -qa | grep kernel | grep "3.10.0-1160.36.2"
kernel-3.10.0-1160.36.2.el7.x86_64
kernel-devel-3.10.0-1160.36.2.el7.x86_64
kernel-tools-libs-3.10.0-1160.36.2.el7.x86_64
kernel-debug-devel-3.10.0-1160.36.2.el7.x86_64
kernel-tools-3.10.0-1160.36.2.el7.x86_64
kernel-headers-3.10.0-1160.36.2.el7.x86_64
1.2.2 dpvs and dpdk
# For dpvs we pull the latest version directly from GitHub
$ git clone https://github.com/iqiyi/dpvs.git
# For dpdk we download version 18.11.2 from the official website and put it in the dpvs directory for convenience
$ cd dpvs/
$ wget https://fast.dpdk.org/rel/dpdk-18.11.2.tar.xz
$ tar -Jxvf dpdk-18.11.2.tar.xz
After completing the above steps, you can start the following installation.
2. Installation steps
2.1 DPDK installation
2.1.1 installing dpdk patch
Under the patch directory of the dpvs source tree, there are patches for the corresponding supported dpdk versions. If you are not sure which patches you need, the official recommendation is to apply all of them.
$ ll dpvs/patch/dpdk-stable-18.11.2
total 44
-rw-r--r-- 1 root root  4185 Jul 22 12:47 0001-kni-use-netlink-event-for-multicast-driver-part.patch
-rw-r--r-- 1 root root  1771 Jul 22 12:47 0002-net-support-variable-IP-header-len-for-checksum-API.patch
-rw-r--r-- 1 root root  1130 Jul 22 12:47 0003-driver-kni-enable-flow_item-type-comparsion-in-flow_.patch
-rw-r--r-- 1 root root  1706 Jul 22 12:47 0004-rm-rte_experimental-attribute-of-rte_memseg_walk.patch
-rw-r--r-- 1 root root 16538 Jul 22 12:47 0005-enable-pdump-and-change-dpdk-pdump-tool-for-dpvs.patch
-rw-r--r-- 1 root root  2189 Jul 22 12:47 0006-enable-dpdk-eal-memory-debug.patch
Applying the patches is also very simple:
# First copy all the patches into the root directory of the dpdk source tree
$ cp dpvs/patch/dpdk-stable-18.11.2/*patch dpvs/dpdk-stable-18.11.2/
$ cd dpvs/dpdk-stable-18.11.2/
# Then apply them in the order of their file names
$ patch -p1 < 0001-kni-use-netlink-event-for-multicast-driver-part.patch
patching file kernel/linux/kni/kni_net.c
$ patch -p1 < 0002-net-support-variable-IP-header-len-for-checksum-API.patch
patching file lib/librte_net/rte_ip.h
$ patch -p1 < 0003-driver-kni-enable-flow_item-type-comparsion-in-flow_.patch
patching file drivers/net/mlx5/mlx5_flow.c
$ patch -p1 < 0004-rm-rte_experimental-attribute-of-rte_memseg_walk.patch
patching file lib/librte_eal/common/eal_common_memory.c
Hunk #1 succeeded at 606 (offset 5 lines).
patching file lib/librte_eal/common/include/rte_memory.h
$ patch -p1 < 0005-enable-pdump-and-change-dpdk-pdump-tool-for-dpvs.patch
patching file app/pdump/main.c
patching file config/common_base
patching file lib/librte_pdump/rte_pdump.c
patching file lib/librte_pdump/rte_pdump.h
$ patch -p1 < 0006-enable-dpdk-eal-memory-debug.patch
patching file config/common_base
patching file lib/librte_eal/common/include/rte_malloc.h
patching file lib/librte_eal/common/rte_malloc.c
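Since the patch file names sort in the order they should be applied, the whole set can also be applied with a single loop (a convenience sketch, equivalent to the commands above):

$ cd dpvs/dpdk-stable-18.11.2/
$ for p in 000*.patch; do patch -p1 < "$p"; done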
2.1.2 dpdk compilation and installation
$ cd dpvs/dpdk-stable-18.11.2
$ make config T=x86_64-native-linuxapp-gcc
$ make
# The line "Build complete [x86_64-native-linuxapp-gcc]" indicates that make succeeded
$ export RTE_SDK=$PWD
$ export RTE_TARGET=build

The ndo_change_mtu problem seen when compiling dpdk 17.11.2 does not appear during the compilation and installation here.
2.1.3 configure hugepage
Unlike ordinary programs, the dpdk used by dpvs does not request memory from the operating system on demand but uses hugepages directly, which greatly improves the efficiency of memory allocation. Configuring hugepages is relatively simple; the official procedure uses 2MB hugepages. The value 28672 below means 28672 hugepages of 2MB each are allocated per NUMA node, i.e. 56GB per node and 112GB in total. The amount can be adjusted according to the memory size of the machine, but allocating less than 1GB may cause dpvs to fail at startup.
For a single-CPU (single NUMA node) machine, refer to the dpdk official documentation.
# for NUMA machine
$ echo 28672 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
$ echo 28672 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
$ mkdir /mnt/huge
$ mount -t hugetlbfs nodev /mnt/huge
# If you need the mount to be restored automatically at boot:
$ echo "nodev /mnt/huge hugetlbfs defaults 0 0" >> /etc/fstab
# After the configuration is completed, memory usage increases immediately
$ free -g    # Before configuration
              total        used        free      shared  buff/cache   available
Mem:            125           1         122           0           1         123
$ free -g    # After configuration
              total        used        free      shared  buff/cache   available
Mem:            125         113          10           0           1          11
# Using numactl to check the memory status, you can also see that 56G is allocated on each node
$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18
node 0 size: 64184 MB
node 0 free: 4687 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19
node 1 size: 64494 MB
node 1 free: 5759 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
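Another quick way to confirm the allocation is /proc/meminfo; with 28672 pages per node the system-wide total should be 57344 pages of 2048 kB (output trimmed to the hugepage lines):

$ grep Huge /proc/meminfo
HugePages_Total:   57344
HugePages_Free:    57344
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB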
2.1.4 configuring ulimit
By default, the ulimit of the system limits the number of open file descriptors. If it is too small, it will affect the normal operation of dpvs, so we increase it:
$ ulimit -n 655350
$ echo "ulimit -n 655350" >> /etc/rc.local
$ chmod a+x /etc/rc.local
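If you prefer the standard PAM mechanism over rc.local, the same limit can also be made persistent through /etc/security/limits.conf (a sketch; it takes effect for new login sessions):

$ cat >> /etc/security/limits.conf << EOF
* soft nofile 655350
* hard nofile 655350
EOF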
2.2 loading the driver modules
First, we need to load the dpdk driver modules we compiled above, and then rebind the network cards from their default kernel driver to the igb_uio driver so that dpdk's PMD can take them over.
$ modprobe uio
$ insmod /path/to/dpdk-stable-18.11.2/build/kmod/igb_uio.ko
$ insmod /path/to/dpdk-stable-18.11.2/build/kmod/rte_kni.ko carrier=on
It should be noted that the carrier parameter was added in dpdk v18.11 and defaults to off; the KNI devices can only work normally if carrier=on is passed when loading the rte_kni.ko module.
In the dpdk-stable-18.11.2/usertools directory there are some scripts that help install and use dpdk, which reduce the complexity of the configuration. Here we use the dpdk-devbind.py script to change the driver of the network cards.
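A quick check that the modules were actually loaded (not required, just a convenience):

# uio, igb_uio and rte_kni should all appear in the list
$ lsmod | grep -E "uio|rte_kni"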
# First, shut down the network cards that need to load the PMD driver
$ ifdown eth{2,3,4,5}
# Check the status of the network cards; pay special attention to the PCI ID of each card. Only the useful part of the output is shown below
$ ./usertools/dpdk-devbind.py --status

Network devices using kernel driver
===================================
0000:04:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection 10fb' if=eth2 drv=ixgbe unused=igb_uio
0000:04:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection 10fb' if=eth3 drv=ixgbe unused=igb_uio
0000:82:00.0 'Ethernet 10G 2P X520 Adapter 154d' if=eth4 drv=ixgbe unused=igb_uio
0000:82:00.1 'Ethernet 10G 2P X520 Adapter 154d' if=eth5 drv=ixgbe unused=igb_uio
From the output above we can see that the network cards currently use the ixgbe driver, and our goal is to make them use the igb_uio driver. Note that if there are many network cards in the system, the NIC name, MAC address and PCI ID recorded earlier can be used to tell them apart.
# Bind the igb_uio driver to the network cards that dpvs will use
$ ./usertools/dpdk-devbind.py -b igb_uio 0000:04:00.0
$ ./usertools/dpdk-devbind.py -b igb_uio 0000:04:00.1
$ ./usertools/dpdk-devbind.py -b igb_uio 0000:82:00.0
$ ./usertools/dpdk-devbind.py -b igb_uio 0000:82:00.1
# Check again whether the binding succeeded. Only the useful part of the output is shown below
$ ./usertools/dpdk-devbind.py --status

Network devices using DPDK-compatible driver
============================================
0000:04:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection 10fb' drv=igb_uio unused=ixgbe
0000:04:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection 10fb' drv=igb_uio unused=ixgbe
0000:82:00.0 'Ethernet 10G 2P X520 Adapter 154d' drv=igb_uio unused=ixgbe
0000:82:00.1 'Ethernet 10G 2P X520 Adapter 154d' drv=igb_uio unused=ixgbe
2.3 DPVS installation
$ cd /path/to/dpdk-stable-18.11.2/
$ export RTE_SDK=$PWD
$ cd /path/to/dpvs
$ make
$ make install
# View the binaries in the bin directory
$ ls /path/to/dpvs/bin/
dpip  dpvs  ipvsadm  keepalived
# Pay attention to the output of the make process, especially the keepalived part. If the following appears, IPVS supports IPv6
Keepalived configuration
------------------------
Keepalived version       : 2.0.19
Compiler                 : gcc
Preprocessor flags       : -D_GNU_SOURCE -I/usr/include/libnl3
Compiler flags           : -g -g -O2 -fPIE -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -O2
Linker flags             : -pie -Wl,-z,relro -Wl,-z,now
Extra Lib                : -lm -lcrypto -lssl -lnl-genl-3 -lnl-3
Use IPVS Framework       : Yes
IPVS use libnl           : Yes
IPVS syncd attributes    : No
IPVS 64 bit stats        : No
# For convenience, the commands can be symlinked into /sbin so they can be run globally
$ ln -s /path/to/dpvs/bin/dpvs /sbin/dpvs
$ ln -s /path/to/dpvs/bin/dpip /sbin/dpip
$ ln -s /path/to/dpvs/bin/ipvsadm /sbin/ipvsadm
$ ln -s /path/to/dpvs/bin/keepalived /sbin/keepalived
# Check whether the dpvs-related commands work. Note that the other commands only work after the dpvs process has been started
$ dpvs -v
dpvs version: 1.8-10, build on 2021.07.26.15:34:26
2.4 configuring dpvs.conf
Under the dpvs/conf directory there are example dpvs configuration files for the various modes, and all parameters are documented in the dpvs.conf.items file. It is recommended to read through them and understand the basic syntax before configuring. The default configuration file read by dpvs at startup is /etc/dpvs.conf.
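A convenient starting point is to copy one of the samples shipped with the source tree to /etc/dpvs.conf and adjust it (the file names here are from the 1.8 source tree and may differ slightly between versions):

# Use the full sample as a base and edit it to match your NICs and CPU layout
$ cp /path/to/dpvs/conf/dpvs.conf.sample /etc/dpvs.conf
# All supported options are documented in
$ less /path/to/dpvs/conf/dpvs.conf.items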
Here is a brief summary of some parts (! is the comment symbol):
- The log level can be manually adjusted to DEBUG and the log output location changed, which makes it easier to locate problems:
global_defs {
    log_level   DEBUG
    log_file    /path/to/dpvs/logs/dpvs.log
}
- If you need to define multiple network cards, you can refer to this configuration:
netif_defs {
    <init> pktpool_size     1048575
    <init> pktpool_cache    256

    <init> device dpdk0 {
        rx {
            queue_number        16
            descriptor_number   1024
            rss                 all
        }
        tx {
            queue_number        16
            descriptor_number   1024
        }
        fdir {
            mode                perfect
            pballoc             64k
            status              matched
        }
        kni_name                dpdk0.kni
    }

    <init> device dpdk1 {
        rx {
            queue_number        16
            descriptor_number   1024
            rss                 all
        }
        tx {
            queue_number        16
            descriptor_number   1024
        }
        fdir {
            mode                perfect
            pballoc             64k
            status              matched
        }
        kni_name                dpdk1.kni
    }

    <init> device dpdk2 {
        rx {
            queue_number        16
            descriptor_number   1024
            rss                 all
        }
        tx {
            queue_number        16
            descriptor_number   1024
        }
        fdir {
            mode                perfect
            pballoc             64k
            status              matched
        }
        kni_name                dpdk2.kni
    }

    <init> device dpdk3 {
        rx {
            queue_number        16
            descriptor_number   1024
            rss                 all
        }
        tx {
            queue_number        16
            descriptor_number   1024
        }
        fdir {
            mode                perfect
            pballoc             64k
            status              matched
        }
        kni_name                dpdk3.kni
    }
}
- The queue with the same index on multiple network cards can be handled by the same CPU (a master worker must also be defined; see the note after this list):
<init> worker cpu1 {
    type    slave
    cpu_id  1
    port    dpdk0 {
        rx_queue_ids    0
        tx_queue_ids    0
    }
    port    dpdk1 {
        rx_queue_ids    0
        tx_queue_ids    0
    }
    port    dpdk2 {
        rx_queue_ids    0
        tx_queue_ids    0
    }
    port    dpdk3 {
        rx_queue_ids    0
        tx_queue_ids    0
    }
}
- If you need a dedicated CPU to handle ICMP packets, add the icmp_redirect_core parameter to that worker:
<init> worker cpu16 {
    type    slave
    cpu_id  16
    icmp_redirect_core
    port    dpdk0 {
        rx_queue_ids    15
        tx_queue_ids    15
    }
}
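For completeness, a master worker also has to be configured alongside the slave workers above; a minimal sketch following the syntax of the sample configuration (cpu0 here is just an example choice):

<init> worker cpu0 {
    type    master
    cpu_id  0
}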
After the dpvs process has started, the corresponding kni network card can be configured directly in the Linux system's network configuration files, exactly like an ordinary network card such as eth0.
After dpvs is running, the corresponding dpdk network card can be seen with the dpip command as well as the normal ip and ifconfig commands, and IPv4 and IPv6 networking works normally. The output below only shows part of the information; the IP and MAC information has been desensitized and the IPv6 information removed.
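For example, on CentOS 7 a plain ifcfg file for the kni device is enough; the sketch below uses placeholder addresses and assumes the dpvs process is already running so that dpdk0.kni exists:

$ cat /etc/sysconfig/network-scripts/ifcfg-dpdk0.kni
DEVICE=dpdk0.kni
TYPE=Ethernet
BOOTPROTO=static
ONBOOT=yes
IPADDR=1.1.1.1
NETMASK=255.255.255.0
$ ifup dpdk0.kni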
$ dpip link show
1: dpdk0: socket 0 mtu 1500 rx-queue 16 tx-queue 16
    UP 10000 Mbps full-duplex auto-nego
    addr AA:BB:CC:23:33:33 OF_RX_IP_CSUM OF_TX_IP_CSUM OF_TX_TCP_CSUM OF_TX_UDP_CSUM

$ ip a
67: dpdk0.kni: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether AA:BB:CC:23:33:33 brd ff:ff:ff:ff:ff:ff
    inet 1.1.1.1/24 brd 1.1.1.255 scope global dpdk0.kni
       valid_lft forever preferred_lft forever

$ ifconfig dpdk0.kni
dpdk0.kni: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 1.1.1.1  netmask 255.255.254.0  broadcast 1.1.1.255
        ether AA:BB:CC:23:33:33  txqueuelen 1000  (Ethernet)
        RX packets 1790  bytes 136602 (133.4 KiB)
        RX errors 0  dropped 52  overruns 0  frame 0
        TX packets 115  bytes 24290 (23.7 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
3. Configure FullNat
To verify that our DPVS works normally, here we refer to the official configuration document and first configure the simplest two-arm FullNAT setup. Taking the official architecture diagram and changing the IP address information, we get the simple architecture diagram below.
In this mode it is not necessary to configure the kni network cards virtualized by DPVS with system tools such as ip and ifconfig.
(Architecture diagram: https://resource.tinychen.com/20210728123202.svg )
Here, we use the dpdk2 network card as the wan port and the dpdk0 network card as the lan port.
# First, add the VIP 10.0.96.204 to the dpdk2 (wan) network card
$ dpip addr add 10.0.96.204/32 dev dpdk2
# Next, add two routes, one for the wan-side network segment and one for the RS network segment
$ dpip route add 10.0.96.0/24 dev dpdk2
$ dpip route add 192.168.229.0/24 dev dpdk0
# It is best to also add a default route to the gateway so that ICMP replies can get back out
$ dpip route add default via 10.0.96.254 dev dpdk2
# Establish a forwarding rule using the RR algorithm
# add service <VIP:vport> to forwarding, scheduling mode is RR.
# use ipvsadm --help for more info.
$ ipvsadm -A -t 10.0.96.204:80 -s rr
# For the convenience of testing, we only add one RS here
# add RS for service, forwarding mode is FNAT (-b)
$ ipvsadm -a -t 10.0.96.204:80 -r 192.168.229.1 -b
# Add a LocalIP to the service; FNAT mode requires this
# add at least one Local-IP (LIP) for FNAT on LAN interface
$ ipvsadm --add-laddr -z 192.168.229.204 -t 10.0.96.204:80 -F dpdk0
# Then check the result
$ dpip route show
inet 192.168.229.204/32 via 0.0.0.0 src 0.0.0.0 dev dpdk0 mtu 1500 tos 0 scope host metric 0 proto auto
inet 10.0.96.204/32 via 0.0.0.0 src 0.0.0.0 dev dpdk2 mtu 1500 tos 0 scope host metric 0 proto auto
inet 10.0.96.0/24 via 0.0.0.0 src 0.0.0.0 dev dpdk2 mtu 1500 tos 0 scope link metric 0 proto auto
inet 192.168.229.0/24 via 0.0.0.0 src 0.0.0.0 dev dpdk0 mtu 1500 tos 0 scope link metric 0 proto auto
inet 0.0.0.0/0 via 10.0.96.254 src 0.0.0.0 dev dpdk2 mtu 1500 tos 0 scope global metric 0 proto auto
$ dpip addr show
inet 10.0.96.204/32 scope global dpdk2
     valid_lft forever preferred_lft forever
inet 192.168.229.204/32 scope global dpdk0
     valid_lft forever preferred_lft forever
$ ipvsadm -ln
IP Virtual Server version 0.0.0 (size=0)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.96.204:80 rr
  -> 192.168.229.1:80             FullNat 1      0          0
$ ipvsadm -G
VIP:VPORT            TOTAL    SNAT_IP              CONFLICTS  CONNS
10.0.96.204:80       1        192.168.229.204      0          0
Then we start an nginx on the RS and have it return the client IP and port number it sees, to check the effect:
server {
    listen 80 default;
    location / {
        default_type text/plain;
        return 200 "Your IP and port is $remote_addr:$remote_port\n";
    }
}
Test VIP directly with ping and curl commands:
$ ping -c4 10.0.96.204
PING 10.0.96.204 (10.0.96.204) 56(84) bytes of data.
64 bytes from 10.0.96.204: icmp_seq=1 ttl=54 time=47.2 ms
64 bytes from 10.0.96.204: icmp_seq=2 ttl=54 time=48.10 ms
64 bytes from 10.0.96.204: icmp_seq=3 ttl=54 time=48.5 ms
64 bytes from 10.0.96.204: icmp_seq=4 ttl=54 time=48.5 ms

--- 10.0.96.204 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 8ms
rtt min/avg/max/mdev = 47.235/48.311/48.969/0.684 ms

$ curl 10.0.96.204
Your IP and port is 192.168.229.204:1033
It can be seen that no matter which client machine the request comes from, only the IP and port of the LIP are returned. To obtain the user's real IP, the TOA module needs to be installed on the RS.
4. Installing the TOA module on the RS
At present, the open source community provides many versions of the TOA module. Here, to ensure compatibility, we directly use the toa and uoa modules shipped with dpvs. According to the official description, their toa module is derived from the Alibaba TOA:
TOA source code is included into DPVS project(in directory kmod/toa) since v1.7 to support IPv6 and NAT64. It is derived from the Alibaba TOA. For IPv6 applications which need client's real IP address, we suggest to use this TOA version.
Since both the RS machines and the DPVS machines here run CentOS 7, we can compile the toa module directly on the DPVS machine and then copy it to each RS machine for use.
$ cd /path/to/dpvs/kmod/toa/
$ make

After a successful compilation, a toa.ko module file will be generated in the current directory, which is the file we need. Load it with the insmod command and then check it:
$ insmod toa.ko
$ lsmod | grep toa
toa                   279641  0
To make sure the module is also loaded at boot, add the following line to /etc/rc.local:
/usr/sbin/insmod /path/to/toa.ko
# for example:
# /usr/sbin/insmod /home/dpvs/kmod/toa/toa.ko
In addition to the toa module there is also a uoa module for the UDP protocol; its compilation and installation process is exactly the same as that of the toa module above, so it is not repeated here (a short sketch follows).

After loading the toa module, we run curl again from the same machine as before:
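For reference, a minimal sketch of the uoa steps, mirroring the toa ones (paths are examples):

$ cd /path/to/dpvs/kmod/uoa/
$ make
$ insmod uoa.ko
$ lsmod | grep uoa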
$ curl 10.0.96.204
Your IP and port is 172.16.0.1:62844

So far, the FullNAT mode of DPVS has been fully deployed and works normally. Since DPVS supports many configuration combinations, a dedicated article on IPv6, NAT64, keepalived, bonding and Master/Backup mode configuration will be written later.