Containerization Technology: Pod scheduling strategy in Kubernetes - label and taint practice

1, Overview of Pod scheduling strategy

The biggest problem that the container orchestration function in Kubernetes has to solve is scheduling the created Pods onto Nodes. So how is it decided which Node a Pod is scheduled to? This involves the kube-scheduler component introduced earlier when we installed the Kubernetes cluster.

kube-scheduler selects a node for a Pod in two steps:

(1) Filtering: according to the scheduling requirements of the Pod, unsuitable nodes are filtered out, for example based on whether resources match, whether labels match, and whether taints are tolerated.

(2) Scoring: the nodes that pass filtering are scored, and the node with the highest score is chosen as the node where the Pod will finally run.

The Pod's resource requirements were introduced in the previous content. A Pod can be scheduled to a Node only when that Node's resources satisfy the Pod's resource requests.
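As a brief recap of that earlier content, a minimal sketch of resource requests and limits in a Pod spec might look like the following (the Pod name and values are illustrative); the scheduler's filtering step only considers nodes with enough unreserved capacity to satisfy the requests:

apiVersion: v1
kind: Pod
metadata:
  name: pod-resource-demo        # illustrative name
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:                  # used by the scheduler when filtering nodes
        cpu: "250m"
        memory: "128Mi"
      limits:                    # upper bound enforced at runtime, not used for filtering
        cpu: "500m"
        memory: "256Mi"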

There are two other mechanisms that directly affect Pod scheduling, namely:

(1) Label

(2) Taint

2, Scheduling practice of Label

Labels can be defined on various resources in a Kubernetes cluster, and higher-level resources can select the corresponding resources according to their labels. There are three common ways to use labels:

(1) A Deployment or Service selects its corresponding Pods according to labels. This will be introduced later.

(2) A PVC can use a label selector to select a PV that carries the label and meets its resource requirements. This will be introduced later.

(3) When a Pod is defined, it can explicitly declare that it should be deployed on nodes whose labels meet its requirements.

Next, we label a node through an example, specify in the Pod definition which node label it requires, and verify the effect.

Adding labels to resources in a Kubernetes cluster can be done in the metadata of the yaml file, as shown below:

metadata: 
  labels: 
    key1: value1
    key2: value2

For a Worker node in a Kubernetes cluster, we can label the node from the command line. The commands to add and remove a label are as follows:

(1)kubectl label nodes node1 key1=value1

(2)kubectl label nodes node1 key1-
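Once nodes are labeled, they can also be listed by label. Both of the following are standard kubectl options and are handy for verification:

# Show all labels on every node
kubectl get nodes --show-labels

# List only the nodes that carry a specific label
kubectl get nodes -l key1=value1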

First, label the node:

kubectl label nodes kubernetes-worker01 scheduler-node=node1

Verify whether the node has been labeled. The query results are shown below; the label has been applied successfully.

[root@kubernetes-master01 ~]# kubectl describe node kubernetes-worker01
Name:               kubernetes-worker01
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=kubernetes-worker01
                    kubernetes.io/os=linux
                    scheduler-node=node1

In the second step, we use yaml to define a Pod. The contents of the file are as follows:

apiVersion: v1
kind: Pod
metadata:
  name: pod-nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
      hostPort: 80
  nodeSelector:
    scheduler-node: node1

Create this Pod:

kubectl apply -f pod-nginx-select.yaml

Let's see which node the Pod is scheduled to. As shown below, this Pod has been scheduled to the corresponding node.

[root@kubernetes-master01 ~]# kubectl get pod -o wide
NAME        READY   STATUS    RESTARTS   AGE    IP            NODE                  
pod-nginx   1/1     Running   0          102s   10.244.1.28   kubernetes-worker01

Step 3: modify the definition of the Pod, changing the value of the required scheduler-node label to node2. The change in the yaml file is as follows:

  nodeSelector:
    scheduler-node: node2

Step 4: delete the previous Pod and re-create it by executing the corresponding commands, as shown below.

[root@kubernetes-master01 ~]# kubectl delete pod pod-nginx
pod "pod-nginx" deleted
[root@kubernetes-master01 ~]# kubectl apply -f pod-nginx-select.yaml
pod/pod-nginx created

Step 5: check the status of the Pod. As shown below, the Pod is in the Pending state.

[root@kubernetes-master01 ~]# kubectl get pod -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP       NODE     
pod-nginx   0/1     Pending   0          7s    <none>   <none>   

Step 6: check the details of the Pod to see why scheduling failed. As shown below, scheduling failed because no node matching the nodeSelector was found.

Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  78s (x4 over 4m11s)  default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.

Next, let's summarize labels. A label marks the corresponding resource; in this example, the label is applied to a node, and the Pod then uses the label selection parameter nodeSelector to decide to be deployed on a specific type of node.

For higher-level resources such as Deployment and Service, the same principle applies to their Pod selection strategy. You can apply labels to the Pods and set the corresponding label selector on the Deployment or Service, so that these higher-level resources can find the corresponding Pods.

In the yaml file, the format of the label selector specified by the higher-level resource is as follows:

selector:
  matchLabels:
    component: redis
  matchExpressions:
    - {key: tier, operator: In, values: [cache]}
    - {key: environment, operator: NotIn, values: [dev]}

Where:

(1) matchLabels is a mapping consisting of {key,value} pairs.

(2) A single {key,value} pair in the matchLabels map is equivalent to an element of matchExpressions whose key field is "key", whose operator is "In", and whose values array contains only "value".

(3) matchExpressions is a list of Pod selector requirements.

(4) Valid operators include In, NotIn, Exists, and DoesNotExist.

(5) In the case of In and NotIn, the values set must be non-empty.

(6) All requirements from matchLabels and matchExpressions are combined with a logical AND: they must all be met to match.
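As an illustrative sketch of how a higher-level resource uses such a selector (the names below are hypothetical), a Deployment ties its selector to the labels declared in its Pod template; in apps/v1 the selector must match the template labels, otherwise the API server rejects the object:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache              # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      component: redis           # must be matched by the Pod template labels below
  template:
    metadata:
      labels:
        component: redis
        tier: cache
    spec:
      containers:
      - name: redis
        image: redis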

3, Scheduling practice of Taint

In addition to using labels to match resources, we also have a mechanism called taints. A taint generally marks something undesirable: when we do not want Pods to be scheduled onto a node because of some condition, we can taint the node. Pods will not be scheduled to nodes with taints unless the Pod explicitly declares that it can tolerate the corresponding taint.

The command to taint a node is:

kubectl taint nodes node1 key1=value1:NoSchedule

Where:

(1) node1: is the name of the node.

(2) key1: is the key of the taint.

(3) value1: is the value of the taint.

(4) NoSchedule: is the effect of the taint.

The effect of a taint can take one of the following values:

(1) NoSchedule: Pods will not be scheduled to this node.

(2) PreferNoSchedule: the scheduler tries to avoid placing Pods on this node, but may still schedule them there if no better node is available.

(3) NoExecute: new Pods will not be scheduled to this node, and Pods already running on this node that do not tolerate the taint will be evicted.
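For reference, the same kubectl taint command is used for all three effects, and appending a trailing "-" removes a taint again (the node name and keys below are illustrative):

# Add taints with different effects
kubectl taint nodes node1 key1=value1:NoSchedule
kubectl taint nodes node1 key2=value2:PreferNoSchedule
kubectl taint nodes node1 key3=value3:NoExecute

# Remove a taint by appending "-" to the effect
kubectl taint nodes node1 key1=value1:NoSchedule-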

We use an example to walk readers through the concept of taints. In the example, we taint a node, create a Pod normally, and then check the scheduling of the Pod. Then we modify the Pod definition to tolerate this taint and check the scheduling of the Pod again.

Step 1: taint both Worker nodes.

kubectl taint node kubernetes-worker01 key1=value1:NoSchedule
kubectl taint node kubernetes-worker02 key1=value1:NoSchedule

Check the taints on the nodes. As shown below, the nodes have been successfully tainted.

[root@kubernetes-master01 ~]# kubectl describe node kubernetes-worker01
Name:               kubernetes-worker01
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=kubernetes-worker01
                    kubernetes.io/os=linux
                    scheduler-node=node1
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"82:5b:4d:5a:49:f8"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.8.22
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 14 Mar 2021 10:47:25 +0800
Taints:             key1=value1:NoSchedule
[root@kubernetes-master01 ~]# kubectl describe node kubernetes-worker02
Name:               kubernetes-worker02
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=kubernetes-worker02
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"96:a9:e7:6c:c6:32"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.8.33
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 14 Mar 2021 10:47:25 +0800
Taints:             key1=value1:NoSchedule

Step 2: define a Pod normally, and its yaml file contents are as follows:

apiVersion: v1
kind: Pod
metadata:
  name: pod-nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
      hostPort: 80

Create this Pod:

kubectl apply -f pod-nginx-toleration.yaml

Step 3: check the status of the Pod. As shown below, the Pod has not been scheduled successfully.

[root@kubernetes-master01 k8s-yaml]# kubectl get pod
NAME        READY   STATUS    RESTARTS   AGE
pod-nginx   0/1     Pending   0          67s

Step 4: check the reason why the Pod failed to be scheduled. As shown below, scheduling failed because the master and both worker nodes carry taints.

Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  3s (x3 over 93s)  default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had taint {key1: value1}, that the pod didn't tolerate.

Step 5: we delete the previous Pod.

kubectl delete pod pod-nginx

Step 6: modify the definition of the Pod so that it can tolerate the corresponding taint.

apiVersion: v1
kind: Pod
metadata:
  name: pod-nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
      hostPort: 80
  tolerations:
  - key: "key1"
    operator: "Exists"
    effect: "NoSchedule"

Create this Pod:

kubectl apply -f pod-nginx-toleration.yaml

Step 7: check the scheduling status of the Pod. As shown below, the Pod has been successfully scheduled.

[root@kubernetes-master01 k8s-yaml]# kubectl get pod
NAME        READY   STATUS    RESTARTS   AGE
pod-nginx   1/1     Running   0          56s

Check the detailed events of the Pod, as shown below. It has been successfully scheduled to node kubernetes-worker01.

Events:
  Type    Reason     Age   From                          Message
  ----    ------     ----  ----                          -------
  Normal  Scheduled  21s   default-scheduler             Successfully assigned default/pod-nginx to kubernetes-worker01
  Normal  Pulling    20s   kubelet, kubernetes-worker01  Pulling image "nginx"
  Normal  Pulled     4s    kubelet, kubernetes-worker01  Successfully pulled image "nginx"
  Normal  Created    4s    kubelet, kubernetes-worker01  Created container nginx
  Normal  Started    4s    kubelet, kubernetes-worker01  Started container nginx

Next, let's summarize taints. A taint generally marks some undesirable condition of a node. Under normal circumstances, Pods will not be scheduled to nodes with taints unless the Pod defines a toleration for the corresponding taint. Kubernetes automatically adds the following taints to nodes according to their condition:

(1) node.kubernetes.io/not-ready: the node is not ready. This corresponds to the node condition Ready being "False".

(2) node.kubernetes.io/unreachable: the node controller cannot reach the node. This corresponds to the node condition Ready being "Unknown".

(3) node.kubernetes.io/out-of-disk: the node has run out of disk space.

(4) node.kubernetes.io/memory-pressure: the node is under memory pressure.

(5) node.kubernetes.io/disk-pressure: the node is under disk pressure.

(6) node.kubernetes.io/network-unavailable: the node's network is unavailable.

(7) node.kubernetes.io/unschedulable: the node is not schedulable.

(8) node.cloudprovider.kubernetes.io/uninitialized: if an "external" cloud provider is specified when the kubelet starts, this taint is added to mark the node as unusable. After a controller of the cloud-controller-manager initializes the node, the kubelet removes this taint.
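For NoExecute taints such as node.kubernetes.io/not-ready and node.kubernetes.io/unreachable, a toleration can also declare tolerationSeconds, which limits how long an already running Pod stays bound to the node after the taint appears (Kubernetes typically injects tolerations of this form with a 300-second limit into Pods automatically). A minimal sketch, with an illustrative Pod name:

apiVersion: v1
kind: Pod
metadata:
  name: pod-nginx-noexecute      # illustrative name
spec:
  containers:
  - name: nginx
    image: nginx
  tolerations:
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300       # evicted 300 seconds after the taint is added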

So under what circumstances do we need to actively add taints to nodes in production? There are generally two situations:

(1) Dedicated nodes: when we need certain Pods to run only on specific nodes, we can taint those nodes and make the corresponding Pods tolerate the taints.

(2) Nodes with special hardware: some Pods have special resource requirements, and the taint mechanism can be used to ensure that only those Pods run on the nodes that provide the special resources; a combined sketch of this pattern follows below.
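A hedged sketch of the dedicated-node pattern (all names and the image are hypothetical): the node is both tainted and labeled, and the Pod both tolerates the taint and selects the label, so ordinary Pods stay off the node and the dedicated Pods land only on it:

# Reserve the node: the taint keeps ordinary Pods off, the label lets dedicated Pods find it
kubectl taint nodes gpu-node01 dedicated=gpu:NoSchedule
kubectl label nodes gpu-node01 dedicated=gpu

And the matching Pod definition:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload             # hypothetical name
spec:
  containers:
  - name: cuda-app
    image: cuda-app:latest       # hypothetical image
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  nodeSelector:
    dedicated: gpu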

This section explained the Pod scheduling strategy. First, a Node must meet the Pod's resource requirements. Then, the label and taint mechanisms can be used to influence Pod scheduling, so that a Pod runs, does not run, or preferentially runs on certain nodes. Finally, among the Nodes that meet the requirements, the scoring mechanism selects the optimal Node to run the Pod.
