Kubernetes scheduling

Contents

1. Introduction

2. Factors affecting Kubernetes scheduling

2.1 nodeName

2.2 nodeSelector

2.3 affinity and anti-affinity

2.3.1 node affinity

2.3.2 pod affinity

2.4 taints

2.5 commands affecting pod scheduling

2.5.1 cordon

2.5.2 drain

2.5.3 delete

1. Introduction

  • The scheduler uses Kubernetes' watch mechanism to discover newly created pods in the cluster that have not yet been assigned to a node, and schedules each such Pod onto a suitable Node.

  • kube-scheduler is the default scheduler for Kubernetes clusters and is part of the cluster control plane. If you really want or need to, you can write your own scheduling component and use it in place of kube-scheduler (see the sketch after this list).

  • Factors considered in scheduling decisions include individual and overall resource requests, hardware/software/policy constraints, affinity and anti-affinity requirements (the most commonly used), data locality, interference between workloads, and so on.
    For the default policy, refer to the scheduling framework documentation.
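
A Pod can be pointed at a non-default scheduler through the spec.schedulerName field. Below is a minimal sketch, assuming a custom scheduler has already been deployed under the hypothetical name my-scheduler; the image matches the one used in the rest of this article:

apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled
spec:
  schedulerName: my-scheduler   ## hypothetical custom scheduler; the default is "default-scheduler"
  containers:
  - name: nginx
    image: myapp:v1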

2. Factors affecting Kubernetes scheduling

2.1 nodeName

  • nodeName is the simplest form of node selection constraint, but it is generally not recommended. If nodeName is specified in the PodSpec, it takes precedence over other node selection methods.

  • Limitations of using nodeName to select nodes (each of these results in an error):
    If the specified node does not exist, the pod cannot run.
    If the specified node does not have enough resources to accommodate the pod, scheduling fails.
    Node names in a cloud environment are not always predictable or stable.

[root@server2 ~]# vim pod1.yml 
[root@server2 ~]# cat pod1.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: myapp:v1
  nodeName: server3   ##Specify server3


[root@server2 ~]# kubectl apply -f pod1.yml 
pod/nginx created
[root@server2 ~]# kubectl get pod -o wide       ##View details 

2.2 nodeSelector

  • nodeSelector is the simplest recommended form of node selection constraint. If two hosts both carry the required label but one has insufficient resources, the pod is scheduled onto the other host.

  • Label the selected node:
    kubectl label nodes server2 disktype=ssd

[root@server2 ~]# kubectl label nodes server4 disktype=ssd   ##Add label
[root@server2 ~]# kubectl get node --show-labels 

[root@server2 ~]# vim pod1.yml 
[root@server2 ~]# cat pod1.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: myapp:v1
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
  nodeSelector:
    disktype: ssd
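
As with the nodeName example, applying the manifest and checking which node the pod landed on confirms that the selector took effect (assuming the manifest is still saved as pod1.yml):

[root@server2 ~]# kubectl apply -f pod1.yml 
[root@server2 ~]# kubectl get pod -o wide    ## The pod should land on a node labelled disktype=ssd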

2.3 affinity and anti-affinity

If both node affinity and pod affinity are specified and they conflict, the pod cannot be scheduled and an error is reported!

2.3.1 node affinity

  • Affinity and anti-affinity. nodeSelector provides a very simple way to constrain a pod to nodes with a specific label. The affinity/anti-affinity feature greatly expands the types of constraints you can express:
    Rules can be "soft"/"preferred" rather than hard requirements, so if the scheduler cannot satisfy them, the pod is still scheduled.
    You can constrain against the labels of pods already running on a node rather than the node's own labels, which lets you control which pods can or cannot be placed together.

Node affinity (evaluated only during scheduling):
requiredDuringSchedulingIgnoredDuringExecution: must be satisfied
preferredDuringSchedulingIgnoredDuringExecution: preferred, but not required

  • IgnoredDuringExecution means that if a Node's labels change while the Pod is running so that the affinity rule is no longer met, the Pod continues to run.

    See the official documentation for details.

  • nodeAffinity supports several operators for matching rules, for example (a fragment sketch follows the example below):
    In: the label's value is in the given list
    NotIn: the label's value is not in the given list
    Gt: the label's value is greater than the given value (not supported for pod affinity)
    Lt: the label's value is less than the given value (not supported for pod affinity)
    Exists: the label exists (its value is not checked)
    DoesNotExist: the label does not exist

[root@server2 ~]# kubectl label nodes server3 disktype=ssd  ##Add the same label as server4
[root@server2 ~]# kubectl label nodes server4 role=db   ## Add the label role=db to server4

 [root@server2 ~]# cat pod1.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: myapp:v1
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: role
            operator: In
            values:
            - db 

[root@server2 ~]# kubectl apply -f pod1.yml 
[root@server2 ~]# kubectl get pod -o wide 
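
The affinity block above uses the In operator. The other operators follow the same shape; the fragment below is a sketch only, using the hypothetical node label cpu-count (Gt/Lt values are integers written as strings) and an illustrative disktype value of hdd:

  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: NotIn      ## avoid nodes labelled disktype=hdd
            values:
            - hdd
          - key: cpu-count       ## hypothetical label; Gt compares integer values
            operator: Gt
            values:
            - "4"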

2.3.2 pod affinity

  • pod affinity and anti-affinity
    podAffinity decides which pods may be deployed into the same topology domain (a topology domain is defined by a node label and can be a single host, or a zone/cluster made up of multiple hosts).
    podAntiAffinity decides which pods may not be deployed into the same topology domain. Both describe relationships between pods within the Kubernetes cluster.
    Inter-pod affinity and anti-affinity are often most useful together with higher-level collections such as ReplicaSets, StatefulSets and Deployments: you can easily configure a set of workloads that should share the same topology domain (for example, the same node).
    Inter-pod affinity and anti-affinity require significant processing, which may noticeably slow down scheduling in large clusters.
Example: affinity
[root@server2 ~]# kubectl run demo --image=busyboxplus -it   ##Run a pod
[root@server2 ~]# kubectl  get pod -o wide    ## See which node it is running on

[root@server2 ~]# vim pod2.yaml 
[root@server2 ~]# cat pod2.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: myapp:v1
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: run
            operator: In
            values:
            - demo
        topologyKey: kubernetes.io/hostname


[root@server2 ~]# kubectl apply -f pod2.yaml 
[root@server2 ~]# kubectl get pod -o wide    ## nginx is scheduled onto the same node as demo
Example: anti affinity
[root@server2 ~]# vim pod2.yaml 
[root@server2 ~]# cat pod2.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: myapp:v1
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
  affinity:
    podAntiAffinity:    ## Only this field changes for anti-affinity
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: run
            operator: In
            values:
            - demo
        topologyKey: kubernetes.io/hostname

[root@server2 ~]# kubectl apply  -f pod2.yaml  
[root@server2 ~]# kubectl get pod -o wide   ## Verify nginx is no longer on the same node as demo
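
The examples above use the required (hard) form with topologyKey: kubernetes.io/hostname. A softer variant is sketched below, assuming the demo pod still carries the run=demo label; with preferredDuringSchedulingIgnoredDuringExecution the pod is merely steered away from demo's node instead of being blocked (a zone-wide spread would instead use a zone label such as topology.kubernetes.io/zone as the topologyKey):

  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: run
              operator: In
              values:
              - demo
          topologyKey: kubernetes.io/hostname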

2.4 taints

  • Node affinity is an attribute defined on the Pod that steers the Pod toward the Nodes we want. Taints work the other way round: they let a Node refuse to run a Pod, or even evict Pods that are already running.

  • A taint is an attribute of a Node. Once a Node is tainted, Kubernetes will not schedule Pods onto it. To allow exceptions, a Pod can declare tolerations: if the Pod tolerates the Node's taint, Kubernetes ignores that taint and may (but does not have to) schedule the Pod there.

  • You can use the kubectl taint command to add a taint to a node:
    $ kubectl taint nodes node1 key=value:NoSchedule    ## create
    $ kubectl describe nodes server1 | grep Taints      ## query
    $ kubectl taint nodes node1 key:NoSchedule-         ## delete
    The [effect] can be one of: NoSchedule | PreferNoSchedule | NoExecute
    NoSchedule: Pods will not be scheduled onto the tainted node.
    PreferNoSchedule: the soft-policy version of NoSchedule.
    NoExecute: once the taint takes effect, Pods already running on the node that have no matching toleration are evicted immediately (a tolerationSeconds sketch follows the examples below).

  • The key, value, and effect defined in a toleration must match the taint set on the node:
    If operator is Exists, value can be omitted.
    If operator is Equal, key and value must match exactly.
    If operator is not specified, it defaults to Equal.
    There are also two special cases:
    An empty key with operator Exists matches all keys and values, i.e. every taint is tolerated.
    An empty effect matches all effects.

Example: taint a node and add a toleration
[root@server2 ~]# kubectl describe nodes server2 | grep Taints   ## The master already carries a taint
Taints:             node-role.kubernetes.io/master:NoSchedule
[root@server2 ~]# kubectl describe nodes server3 | grep Taints
Taints:             <none>
[root@server2 ~]# kubectl describe nodes server4 | grep Taints


[root@server2 ~]# kubectl taint node server3 key1=v1:NoExecute  ## Add a taint to server3
[root@server2 ~]# vim pod.yml 
[root@server2 ~]# cat pod.yml    ## Add tolerations in the pod template
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      run: nginx
  template:
    metadata:
      labels:
        run: nginx
    spec:
      hostNetwork: true
      containers:
      - name: nginx
        image: myapp:v1
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 0.5
            memory: 512Mi
      tolerations:
      - key: "key1"
        operator: "Equal"
        value: "v1"
        effect: "NoExecute"
[root@server2 ~]# kubectl apply -f pod.yml 
[root@server2 ~]# kubectl get pod -o wide    ##All on server3 because of the calico network plug-in
Example: the two special cases
##Both server3 and server4 are tainted
[root@server2 ~]# kubectl describe nodes server3 | grep Taints
Taints:             key1=v1:NoExecute
[root@server2 ~]# kubectl taint node server4 key2=v2:NoSchedule
node/server4 tainted
[root@server2 ~]# kubectl describe nodes server4 | grep Taints
Taints:             key2=v2:NoSchedule

[root@server2 ~]# vim pod.yml 
[root@server2 ~]# cat pod.yml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      run: nginx
  template:
    metadata:
      labels:
        run: nginx
    spec:
      hostNetwork: true
      containers:
      - name: nginx
        image: myapp:v1
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 0.5
            memory: 200Mi
      tolerations:
      - operator: "Exists"

[root@server2 ~]# kubectl apply -f pod.yml 
[root@server2 ~]# kubectl get pod
[root@server2 ~]# kubectl get pod -o wide   ## Check that pods can run on both tainted nodes
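
For NoExecute taints specifically, a toleration can also limit how long an already-running pod stays on the node before being evicted, via the tolerationSeconds field. A minimal sketch based on the first toleration above (the 60-second value is an arbitrary example):

      tolerations:
      - key: "key1"
        operator: "Equal"
        value: "v1"
        effect: "NoExecute"
        tolerationSeconds: 60   ## the pod may keep running on the tainted node for 60s, then it is evicted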

2.5 commands affecting pod scheduling

  • The commands that affect pod scheduling are cordon, drain and delete. With all three, newly created pods are no longer scheduled onto the node; they differ in how disruptive they are.

2.5.1 cordon

  • cordon stops scheduling:
    The impact is minimal: the node is only marked SchedulingDisabled. Newly created pods will not be scheduled onto it, while pods already on the node are unaffected and continue to serve traffic normally.
    $ kubectl cordon server3
    $ kubectl get node
    NAME STATUS ROLES AGE VERSION
    server1 Ready 29m v1.17.2
    server2 Ready 12d v1.17.2
    server3 Ready,SchedulingDisabled 9d v1.17.2
    $ kubectl uncordon server3    ## recover
[root@server2 ~]#  kubectl taint node server3 key1=v1:NoExecute-   ## Remove the taint
[root@server2 ~]# kubectl taint node server4 key2=v2:NoSchedule-   ## Remove the taint
[root@server2 ~]# kubectl cordon server3    ##Stop scheduling server3
[root@server2 ~]# kubectl get node      ##Check the node

[root@server2 ~]# cat pod.yml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      run: nginx
  template:
    metadata:
      labels:
        run: nginx
    spec:
      hostNetwork: true
      containers:
      - name: nginx
        image: myapp:v1
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 0.5
            memory: 200Mi

[root@server2 ~]# kubectl apply -f pod.yml 
[root@server2 ~]# kubectl get pod -o wide    ##All run on server4
NAME                   READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
nginx-b49457b9-7h2q5   1/1     Running   0          3s    172.25.13.4   server4   <none>           <none>
nginx-b49457b9-b6bbl   1/1     Running   0          3s    172.25.13.4   server4   <none>           <none>

[root@server2 ~]# kubectl uncordon server3   ## Re-enable scheduling on server3
[root@server2 ~]# kubectl uncordon server4

2.5.2 drain

  • drain evicts the node:
    First the pods on the node are evicted and recreated on other nodes, then the node is marked SchedulingDisabled.
    $ kubectl drain server3    ## evict the node
    node/server3 cordoned
    evicting pod "web-1"
    evicting pod "coredns-9d85f5447-mgg2k"
    pod/coredns-9d85f5447-mgg2k evicted
    pod/web-1 evicted
    node/server3 evicted
    $ kubectl uncordon server3    ## re-enable scheduling
[root@server2 ~]# kubectl drain server4 --ignore-daemonsets   ## Evict server4; DaemonSet-managed pods are skipped
[root@server2 ~]# kubectl get node 
server4   Ready,SchedulingDisabled   <none>                 10d   v1.20.2
[root@server2 ~]# kubectl apply -f pod.yml 
[root@server2 ~]# kubectl get pod -o wide 


[root@server2 ~]# kubectl uncordon server4     ## Re-enable scheduling on server4

2.5.3 delete

  • delete removes the node:
    This is the most disruptive option: the pods on the node are evicted and recreated on other nodes, then the node is deleted from the cluster, so the control plane loses control of it. To restore scheduling you must log in to the node and restart the kubelet service.
    $ kubectl delete node server3
    $ systemctl restart kubelet    ## the node re-registers itself with the cluster
[root@server2 ~]# kubectl delete nodes server3     ##Delete node server3
[root@server2 ~]# kubectl get nodes 
NAME      STATUS   ROLES                  AGE   VERSION
server2   Ready    control-plane,master   10d   v1.20.2
server4   Ready    <none>                 10d   v1.20.2

[root@server3 ~]# systemctl restart kubelet.service   ##Restart the kubelet service on the deleted node

[root@server2 ~]# kubectl get node    ##server3 node recovery
NAME      STATUS   ROLES                  AGE   VERSION
server2   Ready    control-plane,master   10d   v1.20.2
server3   Ready    <none>                 11s   v1.20.2
server4   Ready    <none>                 10d   v1.20.2
