catalogue
1. Introduction
2. Factors affecting Kubernetes scheduling
2.1 nodeName
2.2 nodeSelector
2.3 affinity and anti-affinity
2.4 taints
2.5 instructions affecting pod scheduling
1. Introduction
- The scheduler uses Kubernetes' watch mechanism to discover newly created pods in the cluster that have not yet been scheduled to a node, and it schedules each unscheduled Pod it finds onto a suitable Node.
- kube-scheduler is the default scheduler for Kubernetes clusters and is part of the cluster control plane. It is designed so that, if you really want or need to, you can write your own scheduling component and use it in place of the default kube-scheduler (a sketch of this follows below).
- Factors considered when making scheduling decisions include: individual and overall resource requests, hardware/software/policy constraints, affinity and anti-affinity requirements (the most commonly used), data locality, interference between workloads, and so on. For the default policy, refer to the scheduling framework documentation.
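Picking up the point about replacing the default scheduler: a Pod opts into a custom scheduler through the schedulerName field of its spec. A minimal sketch, assuming a custom scheduler registered under the hypothetical name my-scheduler; when the field is omitted, the default kube-scheduler is used.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-custom
spec:
  schedulerName: my-scheduler    ## hypothetical custom scheduler; omit to use the default scheduler
  containers:
  - name: nginx
    image: myapp:v1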
2. Factors affecting Kubernetes scheduling
2.1 nodeName
- nodeName is the simplest method of node selection constraint, but it is generally not recommended. If nodeName is specified in the PodSpec, it takes precedence over other node selection methods.
- Restrictions when selecting nodes with nodeName (each of the following results in an error):
If the specified node does not exist, the pod will not run.
If the specified node does not have enough resources to accommodate the pod, the pod scheduling fails.
Node names in a cloud environment are not always predictable or stable.
[root@server2 ~]# vim pod1.yml
[root@server2 ~]# cat pod1.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: myapp:v1
  nodeName: server3                            ## Specify server3
[root@server2 ~]# kubectl apply -f pod1.yml
pod/nginx created
[root@server2 ~]# kubectl get pod -o wide      ## View details
2.2 nodeSelector
- nodeSelector is the simplest recommended form of node selection constraint. If two hosts both carry the required label but one of them has insufficient resources, the pod is scheduled to the other host.
- Label the selected node:
kubectl label nodes server2 disktype=ssd
[root@server2 ~]# kubectl label nodes server4 disktype=ssd    ## Add the label
[root@server2 ~]# kubectl get node --show-labels
[root@server2 ~]# vim pod1.yml
[root@server2 ~]# cat pod1.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: myapp:v1
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
  nodeSelector:
    disktype: ssd
2.3 affinity and anti-affinity
If node affinity and pod affinity are both specified and their requirements conflict, the pod cannot be scheduled and an error is reported.
2.3.1 node affinity
- Affinity and anti-affinity: nodeSelector provides a very simple way to constrain pods to nodes with specific labels. The affinity/anti-affinity feature greatly expands the types of constraints you can express:
You can indicate that a rule is "soft"/"preferred" rather than a hard requirement, so that if the scheduler cannot satisfy it, the pod is still scheduled.
You can constrain against the labels of pods already running on a node, rather than the labels of the node itself, to control which pods may or may not be placed together.
Node affinity (evaluated only at scheduling time):
requiredDuringSchedulingIgnoredDuringExecution: must be satisfied
preferredDuringSchedulingIgnoredDuringExecution: preferred, satisfied when possible
- IgnoredDuringExecution means that if a node's labels change while the Pod is running, so that the affinity rule is no longer satisfied, the Pod already running on that node continues to run.
- nodeAffinity also supports a variety of operators for matching rules (Exists and Gt are sketched after the example below), such as:
In: the label value is in the given list
NotIn: the label value is not in the given list
Gt: the label value is greater than the given value (not supported for pod affinity)
Lt: the label value is less than the given value (not supported for pod affinity)
Exists: the specified label exists
DoesNotExist: the specified label does not exist
[root@server2 ~]# kubectl label nodes server3 disktype=ssd    ## Add the same label as server4
[root@server2 ~]# kubectl label nodes server4 role=db         ## Add a role label to server4
[root@server2 ~]# cat pod1.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: myapp:v1
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: role
            operator: In
            values:
            - db
[root@server2 ~]# kubectl apply -f pod1.yml
[root@server2 ~]# kubectl get pod -o wide
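For the operators the example does not exercise, here is a minimal sketch of a nodeAffinity fragment (placed under the Pod's spec) combining Exists and Gt; the numeric label cpu-cores is an assumption used only for illustration:

  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype        ## the node must carry the disktype label, whatever its value
            operator: Exists
          - key: cpu-cores       ## hypothetical label whose value must be an integer greater than 4
            operator: Gt
            values:
            - "4"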
2.3.2 pod affinity
- pod affinity and anti-affinity
podAffinity mainly determines which pods a pod may be deployed alongside in the same topology domain (a topology domain is defined by a node label and can be a single host, or a group of hosts such as a cluster or a zone).
podAntiAffinity mainly determines which pods a pod must not be deployed alongside in the same topology domain. Both deal with relationships between pods within the Kubernetes cluster.
Inter-pod affinity and anti-affinity may be more useful when used with higher-level controllers such as ReplicaSets, StatefulSets and Deployments: you can easily configure that a set of workloads should share the same defined topology, for example the same node (a sketch of this usage follows the examples below).
Inter-pod affinity and anti-affinity require substantial processing, which may significantly slow down scheduling in large clusters.
Example: affinity
[root@server2 ~]# kubectl run demo --image=busyboxplus -it    ## Run a pod
[root@server2 ~]# kubectl get pod -o wide                     ## Check which host it landed on
[root@server2 ~]# vim pod2.yaml
[root@server2 ~]# cat pod2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: myapp:v1
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: run
            operator: In
            values:
            - demo
        topologyKey: kubernetes.io/hostname
[root@server2 ~]# kubectl apply -f pod2.yaml
[root@server2 ~]# kubectl get pod -o wide    ## Both pods end up on the same node
Example: anti-affinity
[root@server2 ~]# vim pod2.yaml
[root@server2 ~]# cat pod2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: myapp:v1
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
  affinity:
    podAntiAffinity:        ## For anti-affinity only this field changes
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: run
            operator: In
            values:
            - demo
        topologyKey: kubernetes.io/hostname
[root@server2 ~]# kubectl apply -f pod2.yaml
[root@server2 ~]# kubectl get pod -o wide    ## The two pods are no longer on the same node
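As noted above, pod anti-affinity is often combined with higher-level controllers. A minimal sketch, not part of the lab above, of a Deployment whose replicas repel each other via their own app label so that no two replicas land on the same node (the name web-spread is an assumption for illustration):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-spread
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-spread
  template:
    metadata:
      labels:
        app: web-spread
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-spread
            topologyKey: kubernetes.io/hostname    ## at most one replica per node
      containers:
      - name: nginx
        image: myapp:v1
        imagePullPolicy: IfNotPresent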
2.4 taints
- Node affinity is an attribute defined on the Pod that makes the Pod be scheduled onto the Nodes we want. Taints, on the contrary, allow a Node to refuse to run a Pod, or even to evict Pods that are already running on it.
- Taints are an attribute of a Node. After a taint is set, Kubernetes will not schedule Pods to that Node. Correspondingly, Kubernetes provides a tolerations attribute for the Pod: as long as the Pod tolerates the taint on a Node, Kubernetes ignores that taint and may (but does not necessarily) schedule the Pod there.
- You can add a taint to a node with the kubectl taint command:
$ kubectl taint nodes node1 key=value:NoSchedule    ## create
$ kubectl describe nodes server1 | grep Taints      ## query
$ kubectl taint nodes node1 key:NoSchedule-         ## delete
Where [effect] can be one of: [NoSchedule | PreferNoSchedule | NoExecute]
NoSchedule: Pods will not be scheduled to nodes marked with this taint.
PreferNoSchedule: the soft-policy version of NoSchedule.
NoExecute: once this taint takes effect, Pods already running on the node that have no matching toleration are evicted immediately.
- The key, value and effect defined in a toleration must be consistent with the taint set on the node:
If the operator is Exists, value can be omitted.
If the operator is Equal, the key and value must equal those of the taint.
If the operator attribute is not specified, the default is Equal.
There are also two special cases:
If no key is specified and the operator is Exists, every key and value is matched, i.e. every taint is tolerated.
If no effect is specified, every effect is matched (a sketch of this case follows the examples below).
Example: add a taint and a matching toleration
[root@server2 ~]# kubectl describe nodes server2 | grep Taints    ## The master already carries a taint
Taints:             node-role.kubernetes.io/master:NoSchedule
[root@server2 ~]# kubectl describe nodes server3 | grep Taints
Taints:             <none>
[root@server2 ~]# kubectl describe nodes server4 | grep Taints
[root@server2 ~]# kubectl taint node server3 key1=v1:NoExecute    ## Add a taint to server3
[root@server2 ~]# vim pod.yml
[root@server2 ~]# cat pod.yml    ## Add a tolerations section
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      run: nginx
  template:
    metadata:
      labels:
        run: nginx
    spec:
      hostNetwork: true
      containers:
      - name: nginx
        image: myapp:v1
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 0.5
            memory: 512Mi
      tolerations:
      - key: "key1"
        operator: "Equal"
        value: "v1"
        effect: "NoExecute"
[root@server2 ~]# kubectl apply -f pod.yml
[root@server2 ~]# kubectl get pod -o wide    ## All on server3 because of the calico network plug-in
Example: the two special values (both server3 and server4 are now tainted)
[root@server2 ~]# kubectl describe nodes server3 | grep Taints
Taints:             key1=v1:NoExecute
[root@server2 ~]# kubectl taint node server4 key2=v2:NoSchedule
node/server4 tainted
[root@server2 ~]# kubectl describe nodes server4 | grep Taints
Taints:             key2=v2:NoSchedule
[root@server2 ~]# vim pod.yml
[root@server2 ~]# cat pod.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      run: nginx
  template:
    metadata:
      labels:
        run: nginx
    spec:
      hostNetwork: true
      containers:
      - name: nginx
        image: myapp:v1
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 0.5
            memory: 200Mi
      tolerations:
      - operator: "Exists"
[root@server2 ~]# kubectl apply -f pod.yml
[root@server2 ~]# kubectl get pod
[root@server2 ~]# kubectl get pod -o wide    ## Check that the pods can run on both hosts
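The second special value described above, a key with no effect, is not shown in the lab. A minimal sketch of such a toleration (reusing the key1 taint key from the earlier example) that would sit under the Pod template's spec:

      tolerations:
      - key: "key1"
        operator: "Exists"    ## no value needed with Exists
                              ## no effect given, so NoSchedule, PreferNoSchedule and NoExecute all match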
2.5 instructions affecting pod scheduling
- The commands that affect pod scheduling in this way are cordon, drain and delete. With all of them, newly created pods will no longer be scheduled to the node, but they differ in how drastic they are.
2.5.1 cordon
- cordon: stop scheduling.
The least disruptive option: the node is only marked SchedulingDisabled. Newly created pods are not scheduled to it, while the pods already on the node are unaffected and keep serving traffic normally.
$ kubectl cordon server3
$ kubectl get node
NAME STATUS ROLES AGE VERSION
server1 Ready 29m v1.17.2
server2 Ready 12d v1.17.2
server3 Ready,SchedulingDisabled 9d v1.17.2
$ kubectl uncordon server3    ## recover
[root@server2 ~]# kubectl taint node server3 key1=v1:NoExecute-     ## Remove the taint
[root@server2 ~]# kubectl taint node server4 key2=v2:NoSchedule-    ## Remove the taint
[root@server2 ~]# kubectl cordon server3                            ## Stop scheduling to server3
[root@server2 ~]# kubectl get node                                  ## Check the nodes
[root@server2 ~]# cat pod.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      run: nginx
  template:
    metadata:
      labels:
        run: nginx
    spec:
      hostNetwork: true
      containers:
      - name: nginx
        image: myapp:v1
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 0.5
            memory: 200Mi
[root@server2 ~]# kubectl apply -f pod.yml
[root@server2 ~]# kubectl get pod -o wide    ## Both replicas run on server4
NAME                   READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
nginx-b49457b9-7h2q5   1/1     Running   0          3s    172.25.13.4   server4   <none>           <none>
nginx-b49457b9-b6bbl   1/1     Running   0          3s    172.25.13.4   server4   <none>           <none>
[root@server2 ~]# kubectl uncordon server3    ## Resume scheduling
[root@server2 ~]# kubectl uncordon server4
2.5.2 drain
- drain: drain the node.
The pods on the node are first evicted and recreated on other nodes, then the node is marked SchedulingDisabled.
$ kubectl drain server3    ## evict the pods on the node
node/server3 cordoned
evicting pod "web-1"
evicting pod "coredns-9d85f5447-mgg2k"
pod/coredns-9d85f5447-mgg2k evicted
pod/web-1 evicted
node/server3 evicted
$ kubectl uncordon server3    ## release the cordon
[root@server2 ~]# kubectl drain server4 --ignore-daemonsets    ## Drain server4, ignoring DaemonSet pods
[root@server2 ~]# kubectl get node
server4   Ready,SchedulingDisabled   <none>   10d   v1.20.2
[root@server2 ~]# kubectl apply -f pod.yml
[root@server2 ~]# kubectl get pod -o wide
[root@server2 ~]# kubectl uncordon server4    ## Resume scheduling
2.5.3 delete
- delete: delete the node.
The most drastic option: the pods on the node are evicted and recreated on other nodes, and the node is then removed from the cluster, so the master loses control of it. To restore scheduling, log in to the node and restart the kubelet service.
$ kubectl delete node server3
$ systemctl restart kubelet    ## the node re-registers itself via kubelet's self-registration and is restored
[root@server2 ~]# kubectl delete nodes server3    ## Delete node server3
[root@server2 ~]# kubectl get nodes
NAME      STATUS   ROLES                  AGE   VERSION
server2   Ready    control-plane,master   10d   v1.20.2
server4   Ready    <none>                 10d   v1.20.2
[root@server3 ~]# systemctl restart kubelet.service    ## Restart the kubelet service on the deleted node
[root@server2 ~]# kubectl get node    ## server3 rejoins the cluster
NAME      STATUS   ROLES                  AGE   VERSION
server2   Ready    control-plane,master   10d   v1.20.2
server3   Ready    <none>                 11s   v1.20.2
server4   Ready    <none>                 10d   v1.20.2