Escape will only deploy cluster series - k8s cluster network model and cross host communication

        

catalogue

1, Requirements for network environment of k8s cluster

2, k8s cluster network communication flow chart

3, Analysis of communication flow of k8s cluster network  

1. How to satisfy cluster Pod IP uniqueness

2. The pause container creates a shared namespace

3. pod network plug-in bridge

4. The data packet is sent by our party through vxlan tunnel

5. The other party of the packet received the unpacking

6. Talk about the role of the whole process

7. Using host GW mode to improve cluster network performance

There are many principle articles. Here, we mainly use our own understanding to sort out the cluster cross host communication, which is a summary. It is recommended to read more principle introductions, and then look at the conclusion, which is suddenly enlightened.

First confirm

1, Requirements for network environment of k8s cluster

1. CNI requires that each Pod in the cluster must be assigned a unique Pod IP

2. k8s the communication in the cluster is not vxlan point-to-point communication, because all nodes in the cluster need to be interconnected

2, k8s cluster network communication flow chart

         Next, we will describe the implementation process of cluster network requirements in detail according to the picture:

3, Analysis of communication flow of k8s cluster network  

1. How to satisfy cluster Pod IP uniqueness

Since the POD address is required to be unique, how does flannel assign a unique address segment to each node:

        When deploying a flannel, you need to specify a network segment. This environment is 10.244.0.0, and then each node deploying a flannel will be assigned a sub network segment. As shown in the figure, the pod address of the master node is always 10.244.0.x, the pod address of the node1 node is always 10.244.1.x, and the pod address of the node2 node is always 10.244.2.x, so there will be no duplication;

        When communicating between nodes, use route -n to view the routing rules. The internal container communication of the master can be directly routed through cni0. When the master accesses node1, it can be routed through flannel.1

[root@k8s-master ~]# kubectl -n kube-system exec kube-flannel-ds-pbqj4 cat /etc/kube-flannel/net-conf.json
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "vxlan"
  }
}
[root@k8s-master ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.224.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.224.1.0      10.224.1.0      255.255.255.0   UG    0      0        0 flannel.1
10.224.2.0      10.224.2.0      255.255.255.0   UG    0      0        0 flannel.1
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
[root@k8s-master ~]# kubectl get po -owide -A
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
default       busybox-5d7b4b65d6-mk6c6             1/1     Running   0          16m   10.224.1.2      k8s-node1    <none>           <none>
kube-system   coredns-5644d7b6d9-7gw6t             1/1     Running   5          9d    10.224.0.11     k8s-master   <none>           <none>
kube-system   coredns-5644d7b6d9-vd5vk             1/1     Running   5          9d    10.224.0.10     k8s-master   <none>           <none>
[root@k8s-master ~]# cat /run/flannel/subnet.env 
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.224.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

2. The pause container creates a shared namespace

         Each pod in the k8s cluster corresponds to the shared namespace created by the pause container to realize container interworking. Therefore, it can be seen from the figure that when the so-called cluster cross host communication object is pod, the internal container has realized interworking, and now it only needs to communicate with the external interface uniformly. Find a picture to understand.  

3. pod network plug-in bridge

        What is a virtual bridge? It is used to collect container networks and distribute them uniformly. The purpose is to provide a unified scheduling for container communication. The bridge here refers to cni0. Cni0 is essentially the same as docker0. All container communications must pass through cni0, and cni0 can be used directly to judge internal communication. After all, pod is on the same bridge. Please stay here, Cross host, continue to the next step.

[root@k8s-master ~]# brctl show
bridge name	bridge id		STP enabled	interfaces
cni0		8000.86355d489a2a	no		veth523f7aca
							            veth8e5df844
                                        veth6e1df844
                                        veth9e4df844
docker0		8000.02426bf09add	no	

4. The data packet is sent by our party through vxlan tunnel

         The full name of VXLAN is Virtual eXtensible Local Area Network, which is mainly used to build a virtual two-layer network through a three-layer network. VXLAN is essentially a tunnel technology to establish a logical tunnel on the IP network between the source network equipment and the destination network equipment, The user side message is forwarded through this tunnel after specific encapsulation. From the user's point of view, the server accessing the network is like connecting to different ports of a virtual layer-2 switch.
          VTEP (VXLAN Tunnel Endpoints) is the edge device of VXLAN network and the starting and ending point of VXLAN tunnel. VXLAN encapsulates and de encapsulates the user's original data frame on VTEP. In K8s, the corresponding is flannel.1
         VNI (VXLAN Network Identifier): VNI is the identifier of each vxlan. Generally, each VNI corresponds to a tenant. The. 1 of flannel.1 is the so-called identifier
        What do you mean? If a family arrives at cni0, they can communicate with each other. During cross host communication, the cni0 bridge determines the target through routing information. At present, the two nodes only have three layers of interworking, that is, IP address interworking. vxlan tunnel sends network requests to vtep. Vtep records the addresses of a and B, mac and other information, and uses the ens33 network card after packets, A sends the packet to B's ens33 network card.

5. The other party of the packet received the unpacking

        After receiving the sent packet, B arrives at the B.flannel.1 site. Once your identifier is 1, I am also 1. Yes, start unpacking. When the packet is unpacked, I can see A's address and other information. I don't know it. Give it to the route, and then go to B's Bridge to finally realize communication. vice versa. As for the role of flannel in this process, vxlan defaults to point-to-point mode. The k8s cluster comes with an etcd cluster. Flannel is deployed on the node to collect address mac and other information to report, so as to realize the distributed registry. This is also the reason why flannel. 1 acts as vtep.

6. Talk about the role of the whole process

pause container: create a virtual cyberspace to find a home for the container in the same pod

cni0: the role of bridge, the entrance and exit of container traffic, and the commander of container communication

flannel.1: vtep role, solve vxlan single point problem, unpack and match with. 1 identifier

ens33: the actual interface of network communication, the two tier is not powerful, it can only be suck up and go from here.

Flannel: node agent, record the host information, report it to etcd, maintain flannel.1, and ensure that it is known when unpacking

vxlan: virtual tunnel technology, the commander who steals the day, realizes virtual layer-2 communication, that is, vtep direct communication between the two sides

7. Using host GW mode to improve cluster network performance

         vxlan mode is applicable to the three-tier reachable network environment and has loose requirements for the cluster network. However, at the same time, it will carry out additional packet and unpacking through VTEP equipment, which will bring additional overhead to the performance.

         The purpose of host GW is to send the traffic of the local cni0 bridge to the cni0 bridge of the destination host. In fact, many clusters are deployed in the same layer-2 network environment, and the layer-2 host can be directly used as the traffic forwarding gateway. In this way, the traffic can be forwarded directly through the routing table without unpacking.

        What do you mean, you don't need to unpack layer 2 - layer 3 - layer 3 of the other party - layer 2 of the other party. Once you see that you can communicate directly, you can directly communicate with layer 2 cni0. How can you judge whether I can reach layer 2 or layer 3?

1. The second layer means that the interfaces at both ends are in the bridge mode and interconnected through access or trunk.

2. The third layer is that the interface modes at both ends are in the route routing mode, and the IP address interconnection is directly configured on the interface.

Generally speaking, the second layer checks MAC forwarding on the same network segment, and the third layer checks routing forwarding across network segments. If the same network segment, use the host GW mode decisively!

Why does the three-tier reachable network not directly use the gateway to forward traffic?

According to the routing rules in the kernel, the gateway must be in the same network segment as at least one IP in the host.
Because k8s all nodes in the cluster need to realize Pod interworking, it means that the host GW mode requires the entire cluster nodes to be in the same layer 2 network.

Modify the network backend of flannel:

$ kubectl edit cm kube-flannel-cfg -n kube-system
...
net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "host-gw"
      }
    }
kind: ConfigMap
...
reconstruction Flannel of Pod

$ kubectl -n kube-system delete po kube-flannel-ds-amd64-5dgb8 kube-flannel-ds-amd64-c2gdc kube-flannel-ds-amd64-t2jdd
# After the Pod is newly started, check the log and the word backend type: host GW appears
$  kubectl -n kube-system logs -f kube-flannel-ds-amd64-4hjdw
I0704 01:18:11.916374       1 kube.go:126] Waiting 10m0s for node controller to sync
I0704 01:18:11.916579       1 kube.go:309] Starting kube subnet manager
I0704 01:18:12.917339       1 kube.go:133] Node controller sync successful
I0704 01:18:12.917848       1 main.go:247] Installing signal handlers
I0704 01:18:12.918569       1 main.go:386] Found network config - Backend type: host-gw
I0704 01:18:13.017841       1 main.go:317] Wrote subnet file to /run/flannel/subnet.env
​
View node routing table:

$ route -n 
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

Keywords: Kubernetes vxlan

Added by art15 on Thu, 18 Nov 2021 18:45:12 +0200