Resource scheduling (nodeSelector, nodeAffinity, Taints, Tolerations)
1. nodeSelector
nodeSelector is the simplest way to constrain where a pod may run. It is a field of the pod spec that matches against node labels.
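For orientation, here is a minimal sketch of where the field sits in a Pod manifest (the pod name is illustrative; the full demo below uses a Deployment):

apiVersion: v1
kind: Pod
metadata:
  name: nginx                # illustrative name
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:              # the pod may only run on nodes carrying this label
    disktype: ssd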
Use --show-labels to view the labels of a specified node:
[root@master haproxy]# kubectl get node node1 --show-labels
NAME    STATUS   ROLES    AGE     VERSION   LABELS
node1   Ready    <none>   4d12h   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux
[root@master haproxy]#
If no extra labels have been added, you will only see the default labels shown above. We can add a label to a specific node with the kubectl label node command:
[root@master haproxy]# kubectl label node node1 disktype=ssd
node/node1 labeled
[root@master haproxy]# kubectl get node node1 --show-labels    //the new label is now visible
NAME    STATUS   ROLES    AGE     VERSION   LABELS
node1   Ready    <none>   4d12h   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux
[root@master haproxy]#
You can also delete the specified labels through kubectl label node
[root@master haproxy]# kubectl label node node1 disktype-
node/node1 labeled
[root@master haproxy]# kubectl get node node1 --show-labels
NAME    STATUS   ROLES    AGE     VERSION   LABELS
node1   Ready    <none>   4d12h   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux
[root@master haproxy]#
Re-add the label, then create a pod that uses the nodeSelector field to bind it to that node:
[root@master haproxy]# kubectl label node node1 disktype=ssd
node/node1 labeled
[root@master haproxy]# kubectl get node node1 --show-labels
NAME    STATUS   ROLES    AGE     VERSION   LABELS
node1   Ready    <none>   4d12h   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux
[root@master haproxy]#
[root@master haproxy]# cat test.yml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: httpd2
  name: httpd2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd2
  template:
    metadata:
      labels:
        app: httpd2
    spec:
      nodeSelector:          # schedule only onto nodes labeled disktype=ssd
        disktype: ssd
      containers:
      - image: 3199560936/httpd:v0.4
        name: httpd2
---
apiVersion: v1
kind: Service
metadata:
  name: httpd2
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: httpd2
[root@master haproxy]# kubectl create -f test.yml
deployment.apps/httpd2 created
service/httpd2 created
[root@master haproxy]#
View which node the pod was scheduled to:
[root@master haproxy]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
httpd2-fd86fb676-l7gk4   1/1     Running   0          61s   10.244.1.45   node1   <none>           <none>
[root@master haproxy]#
As you can see, the pod was forcibly scheduled to the node carrying the disktype=ssd label.
2. nodeAffinity
nodeAffinity is the node affinity scheduling policy, introduced as a more expressive replacement for nodeSelector. Two kinds of node affinity expression are currently supported:
- requiredDuringSchedulingIgnoredDuringExecution: the rules must be met before the pod can be scheduled onto a node; this is a hard limit.
- preferredDuringSchedulingIgnoredDuringExecution: the scheduler tries to place the pod on a node that satisfies the rules, but does not insist on it; this is a soft limit. Multiple preference rules can be given weight values to define their relative priority.
IgnoredDuringExecution means: if the labels of the node a pod is running on change while the pod is running, so that the pod's node affinity rules are no longer satisfied, the system ignores the label change and the pod keeps running on that node.
The operators supported by nodeAffinity matchExpressions include the following (a short sketch using several of them follows the list):
- In: the label's value is in a given list
- NotIn: the label's value is not in a given list
- Exists: the label exists
- DoesNotExist: the label does not exist
- Gt: the label's value is greater than a given value
- Lt: the label's value is less than a given value
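A minimal sketch of the less common operators (the env and gpu-count label keys are made up for this example):

# Sketch only: "env" and "gpu-count" are hypothetical label keys.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: env
          operator: NotIn      # the node's env label must not be "test"
          values: ["test"]
        - key: disktype
          operator: Exists     # the disktype label only has to be present
        - key: gpu-count
          operator: Gt         # the label value, parsed as an integer, must be > 1
          values: ["1"]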
Precautions for nodeAffinity rule setting are as follows:
- If both nodeSelector and nodeAffinity are defined, a node must satisfy both before the pod can be scheduled onto it (see the sketch after this list).
- If nodeAffinity specifies multiple nodeSelectorTerms, matching any one of them is enough (the terms are ORed).
- If a nodeSelectorTerm contains multiple matchExpressions, a node must satisfy all of them (the expressions are ANDed) before the pod can run on it.
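For the first point, a hedged sketch of a Pod spec that combines both mechanisms (the pod name is illustrative; kubernetes.io/os=linux is one of the default node labels shown earlier): only a node that carries disktype=ssd and matches the kubernetes.io/os expression is eligible.

apiVersion: v1
kind: Pod
metadata:
  name: both-constraints       # illustrative name
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:                # condition 1: must be satisfied
    disktype: ssd
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:     # condition 2: must also be satisfied
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values: ["linux"]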
[root@master haproxy]# cat test.yml
apiVersion: v1
kind: Pod
metadata:
  name: test1
  labels:
    app: nginx
spec:
  containers:
  - name: test1
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            values:
            - ssd
            operator: In
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 10
        preference:
          matchExpressions:
          - key: name
            values:
            - test
            operator: In
[root@master haproxy]#
The node2 host is also labeled disktype=ssd
[root@master haproxy]# kubectl label node node2 disktype=ssd
node/node2 labeled
[root@master haproxy]# kubectl get node node2 --show-labels
NAME    STATUS   ROLES    AGE     VERSION   LABELS
node2   Ready    <none>   4d12h   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux
[root@master haproxy]#
Test
Label node1 with name=sb
[root@master ~]# kubectl label node node1 name=sb
node/node1 labeled
[root@master ~]# kubectl get node node1 --show-labels
NAME    STATUS   ROLES    AGE     VERSION   LABELS
node1   Ready    <none>   4d12h   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux,name=sb
[root@master ~]#
Create a pod and check the result:
[root@master haproxy]# cat httpd.yml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: httpd2
  name: httpd2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd2
  template:
    metadata:
      labels:
        app: httpd2
    spec:
      containers:
      - image: 3199560936/httpd:v0.4
        name: httpd2
---
apiVersion: v1
kind: Service
metadata:
  name: httpd2
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: httpd2
[root@master haproxy]#
[root@master haproxy]# kubectl apply -f httpd.yml
deployment.apps/httpd2 created
service/httpd2 created
[root@master haproxy]#
[root@master haproxy]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
httpd2-fd86fb676-b8pqx   1/1     Running   0          13s   10.244.1.46   node1   <none>           <none>
[root@master haproxy]#
Delete name=sb and view the test results
[root@master haproxy]# kubectl label node node1 name-
node/node1 labeled
[root@master haproxy]# kubectl get node node1 --show-labels
NAME    STATUS   ROLES    AGE     VERSION   LABELS
node1   Ready    <none>   4d12h   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux
[root@master haproxy]#
[root@master haproxy]# kubectl apply -f haproxy.yml
deployment.apps/haproxy created
service/haproxy created
[root@master haproxy]# kubectl get pod -o wide
NAME                       READY   STATUS              RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
haproxy-74f8f5c6cf-6pf9w   0/1     ContainerCreating   0          8s    <none>        node2   <none>           <none>
httpd2-fd86fb676-xggxk     1/1     Running             0          65s   10.244.1.47   node1   <none>           <none>
[root@master haproxy]#
In summary, the pod above is first required to run on a node labeled disktype=ssd; when several nodes carry that label, a node labeled name=sb is preferred.
3. Taints and Tolerations
Taints: prevent Pods from being scheduled onto a particular Node
Tolerations: allow a Pod to be scheduled onto a Node that carries Taints
Application scenarios:
- Dedicated nodes: nodes are grouped and managed per business line; such a Node should not be schedulable by default, and Pods are placed there only when they are configured with a matching taint toleration (see the sketch after this list)
- Nodes with special hardware: some nodes carry SSDs or GPUs; such a Node should not be schedulable by default, and Pods are placed there only when they tolerate the taint
- Taint-based eviction
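A hedged sketch of the dedicated-node case (the key dedicated and value web are invented for the example): taint the node, then give the privileged Pod a matching toleration.

# Hypothetical example: the key "dedicated" and value "web" are made up.
# Taint the node so that ordinary Pods are kept off it:
#   kubectl taint node node1 dedicated=web:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: web-pod               # illustrative name
spec:
  containers:
  - name: web
    image: nginx
  tolerations:                # lets this Pod be scheduled despite the taint
  - key: "dedicated"
    operator: "Equal"
    value: "web"
    effect: "NoSchedule"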
Effect description
In the example above, the value of effect is NoSchedule. Briefly, effect can take the following values:
- NoSchedule: if a pod does not declare a toleration for this Taint, the system will not schedule the pod onto the node carrying it
- PreferNoSchedule: the soft version of NoSchedule; if a pod does not tolerate this Taint, the system tries to avoid scheduling it onto the node, but this is not mandatory
- NoExecute: defines the eviction behaviour used to handle node failures. A Taint with effect NoExecute affects pods already running on the node as follows:
  - pods without a matching toleration are evicted immediately
  - pods with a matching toleration but no tolerationSeconds value stay on the node indefinitely
  - pods with a matching toleration and a tolerationSeconds value are evicted after that many seconds
- Since Kubernetes 1.6, an alpha feature has been available that represents node problems as Taints (currently only for the node being unreachable or not ready, i.e. the corresponding NodeCondition "Ready" being Unknown or False). When the TaintBasedEvictions feature is enabled (by adding TaintBasedEvictions=true to the --feature-gates parameter), the NodeController automatically adds such Taints to the node, and the previous eviction logic based on the node's "Ready" status is disabled. Note that, to preserve the existing rate limits on pod eviction during node failures, the system adds these Taints gradually in a rate-limited way, which prevents large numbers of pods from being evicted in certain situations (for example when the master temporarily loses contact with nodes). The mechanism is compatible with tolerationSeconds, which lets a pod define how long it stays on a failed node before being evicted (a small sketch follows).
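For instance, a toleration with tolerationSeconds in a Pod spec might be written as below; the 300-second value mirrors the default tolerations visible in the kubectl describe pod output later in this section:

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300     # stay on an unreachable node for 300s, then be evicted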
Taint
[root@master ~]# kubectl describe node master
......(output omitted)......
CreationTimestamp:  Sat, 18 Dec 2021 22:07:52 -0500
Taints:             node-role.kubernetes.io/master:NoSchedule   //prevents pods from being scheduled onto this node
Unschedulable:      false
Tolerations (taint toleration)
[root@master ~]# kubectl describe pod httpd2-fd86fb676-xnrcc
......(output omitted)......
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                 //allows the pod to be scheduled to, and remain on, a node carrying these Taints
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  57s   default-scheduler  Successfully assigned default/httpd2-fd86fb676-xnrcc to node1
  Normal  Pulled     56s   kubelet            Container image "3199560936/httpd:v0.4" already present on machine
  Normal  Created    56s   kubelet            Created container httpd2
  Normal  Started    56s   kubelet            Started container httpd2
[root@master ~]#
Add a taint to a node
Format: kubectl taint node [node] key=value:[effect]
Where [effect] can be:
- NoSchedule: the node will definitely not be scheduled
- PreferNoSchedule: try to avoid scheduling onto the node; not mandatory
- NoExecute: not only will new Pods not be scheduled, existing Pods on the Node are also evicted
A taint toleration is added to the Pod configuration through the tolerations field, for example as sketched below.
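A minimal sketch of that field, written to tolerate the value-less disktype:NoSchedule taint used in the example below (note that the demo manifests in this section do not declare it, which is why the haproxy pod ends up on node2):

tolerations:
- key: "disktype"
  operator: "Exists"     # matches a taint with key "disktype" regardless of its value
  effect: "NoSchedule"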
Add a taint with key disktype
[root@master ~]# kubectl taint node node1 disktype:NoSchedule
node/node1 tainted
[root@master ~]#
View
[root@master ~]# kubectl describe node node1
......(output omitted)......
CreationTimestamp:  Sat, 18 Dec 2021 22:10:36 -0500
Taints:             disktype:NoSchedule      //the taint has been added successfully
Unschedulable:      false
......(output omitted)......
[root@master ~]#
Create a container to test
[root@master haproxy]# cat haproxy.yml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: haproxy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: haproxy
  template:
    metadata:
      labels:
        app: haproxy
    spec:
      containers:
      - image: 93quan/haproxy:v1-alpine
        imagePullPolicy: Always
        env:
        - name: RSIP
          value: "10.106.56.19 10.96.149.182"
        name: haproxy
        ports:
        - containerPort: 80
          hostPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: haproxy
  namespace: default
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: haproxy
  type: NodePort
[root@master haproxy]#
[root@master haproxy]# kubectl create -f haproxy.yml
deployment.apps/haproxy created
service/haproxy created
View
[root@master haproxy]# kubectl get pods -o wide    #node1 is tainted, so the newly created container lands on node2
NAME                       READY   STATUS              RESTARTS   AGE     IP            NODE    NOMINATED NODE   READINESS GATES
haproxy-74f8f5c6cf-k8867   0/1     ContainerCreating   0          2m47s   <none>        node2   <none>           <none>
httpd2-fd86fb676-xnrcc     1/1     Running             0          17m     10.244.1.44   node1   <none>           <none>
[root@master haproxy]#
Remove a taint
Syntax: kubectl taint node [node] key:[effect]-
[root@master haproxy]# kubectl taint node node1 disktype-
node/node1 untainted
[root@master haproxy]#
View
[root@master haproxy]# kubectl describe node node1
......(output omitted)......
CreationTimestamp:  Sat, 18 Dec 2021 22:10:36 -0500
Taints:             <none>                   //the taint has been removed successfully
Unschedulable:      false