Multi-tenancy in Kubernetes


Multi-tenant clusters are shared by multiple users and/or workloads, which are called "tenants". The operator of a multi-tenant cluster must isolate tenants from each other to minimize the damage that a compromised or malicious tenant can cause to the cluster and to other tenants. In addition, cluster resources must be allocated fairly among tenants.

When planning a multi-tenant architecture, you should consider the resource isolation layers in Kubernetes: cluster, namespace, node, Pod, and container. You should also consider the security risks of sharing different types of resources among tenants. For example, scheduling Pods from different tenants on the same node can reduce the number of machines required in the cluster. On the other hand, you may need to prevent certain workloads from being co-located; for example, untrusted code from outside the organization may not be allowed to run on the same node as containers that process sensitive information.

Although Kubernetes cannot guarantee perfectly secure isolation between tenants, it provides features that are sufficient for specific use cases. Each tenant and its Kubernetes resources can be separated into its own namespace. You can then use restriction policies to enforce tenant isolation. Policies are usually scoped by namespace and can be used to restrict API access, resource usage, and the operations containers are allowed to perform.

Tenants in a multi-tenant cluster share the following resources:

  • Extensions, controllers, plug-ins, and custom resource definitions (CRDs).
  • Cluster control plane. This means that cluster operations, security, and auditing are centrally managed.

Compared with operating multiple single-tenant clusters, operating a multi-tenant cluster has several advantages:

  • Reduce management overhead
  • Reduce resource fragmentation
  • New tenants do not have to wait for the cluster to be created

Multi-tenancy use cases

Enterprise multi-tenancy

In an enterprise environment, the tenants of a cluster are different teams within the organization. Generally, each tenant corresponds to a namespace. Network traffic within the same namespace is not restricted, but traffic between different namespaces must be explicitly allow-listed. Kubernetes network policies can be used to implement this isolation, as shown in the sketch below.
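As a minimal sketch of this allow-list model (the namespace name team-a is an assumption), the following NetworkPolicy denies ingress from other namespaces while still allowing traffic between Pods within the same namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-from-other-namespaces
  namespace: team-a          # hypothetical tenant namespace
spec:
  podSelector: {}            # applies to every Pod in this namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}        # only Pods from this same namespace may connect

Cross-namespace traffic then has to be allow-listed with additional, more specific policies.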

Cluster users are divided into three different roles according to their permissions:

  • Cluster Administrator: this role is applicable to the administrator who manages all tenants in the whole cluster. Cluster administrators can create, read, update, and delete any policy object. They can create namespaces and assign them to namespace administrators.
  • Namespace administrator: this role is applicable to the administrator of a specific single tenant. The namespace administrator can manage the users in its namespace.
  • Developer: members of this role can create, read, update, and delete non-policy objects in a namespace, such as Pods, Jobs, and Ingresses. Developers have these permissions only in namespaces they have access to.

SaaS provider multi-tenancy

The tenants of a SaaS provider's cluster are the customer-specific instances of the application, plus the SaaS control plane. To take full advantage of namespace-scoped policies, each application instance should be placed in its own namespace, as should the components of the SaaS control plane. Instead of interacting with the Kubernetes control plane directly, end users use the SaaS interface, which in turn interacts with the Kubernetes control plane.

For example, a blogging platform could run on a multi-tenant cluster. In this case, the tenants are each customer's blog instance and the platform's own control plane. The platform's control plane and each hosted blog run in separate namespaces. Customers create and delete blogs and update the blog software version through the platform's interface, but they have no visibility into how the cluster operates.

Multi-tenancy policies

Namespace

Multi-tenancy in Kubernetes is partitioned by namespace. A namespace is a logical partition of the cluster that roughly corresponds to the concept of a tenant and provides a degree of resource isolation and quota enforcement.

As shown in the following two commands:

$ kubectl run nginx --image=nginx
$ kubectl run nginx --image=nginx --namespace=dev

Although both commands run nginx, their scope is different: the first command runs in the default namespace, while the second runs nginx in a namespace called dev.

Access control

Access control is very important in a multi-tenant cluster. We can use Kubernetes' built-in RBAC for access control, which grants fine-grained permissions to specific resources and operations in the cluster.
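For example, here is a minimal sketch (the namespace dev, the Role name developer, and the user alice are assumptions) of a Role and RoleBinding that give a tenant's developers access to common workload objects only within their own namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: dev
rules:
- apiGroups: ["", "apps", "batch"]        # core, apps, and batch API groups
  resources: ["pods", "services", "deployments", "jobs"]
  verbs: ["create", "get", "list", "watch", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: dev
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: User
  name: alice                             # hypothetical tenant user
  apiGroup: rbac.authorization.k8s.io

Because the binding is namespaced, alice holds these permissions only in the dev namespace.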

Network policy

With cluster network policies, we can control communication between the Pods in the cluster. A policy can specify which namespaces, labels, and IP address ranges a Pod may communicate with.
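As a hedged sketch (all label values and the CIDR below are assumptions), a single policy can combine the three selector types, allowing ingress to every Pod in dev from Pods in namespaces labeled team=frontend, from Pods labeled role=monitoring in the same namespace, and from a trusted IP range:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-selected-sources
  namespace: dev
spec:
  podSelector: {}            # applies to all Pods in the dev namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:     # Pods in namespaces labeled team=frontend
        matchLabels:
          team: frontend
    - podSelector:           # Pods labeled role=monitoring in this namespace
        matchLabels:
          role: monitoring
    - ipBlock:               # a trusted external address range
        cidr: 10.0.0.0/24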

Resource quota

Resource quota is used to manage the amount of resources used by objects in the namespace. We can set the quota according to the amount of CPU and memory or the number of objects. Resource quotas ensure that tenants do not use cluster resources that exceed their allocated share.

Resource quotas are defined through the ResourceQuota resource object, which limits the total resource consumption of each namespace. It can limit the number of objects of each type that can be created in the namespace, as well as the total amount of compute resources that may be consumed in that namespace.

Resource quotas work as follows:

1. The administrator creates one or more ResourceQuota objects for each namespace.

2. Users create resources (Pods, Services, etc.) in the namespace, and the quota system tracks usage to ensure it does not exceed the hard resource limits defined in the ResourceQuota.

3. If creating or updating a resource would violate a quota constraint, the request fails with HTTP status code 403 FORBIDDEN and a message explaining the violated constraint.

4. If quotas for compute resources (such as CPU and memory) are enabled in a namespace, users must set request and limit values for those resources, otherwise the quota system rejects Pod creation.

There are three levels of resource quota control in Kubernetes:

1. Container: CPU and memory can be limited.

2. Pod: the resources of all containers within a Pod can be limited.

3. Namespace: the resources under a namespace can be limited.

At the container level, limits rely mainly on the container runtime's own support, such as Docker's support for limiting CPU, memory, and so on.

At the Pod level, you can constrain the resource range for Pods created in the system, such as the maximum or minimum CPU and memory requirements.

The namespace level is the per-tenant resource quota, covering CPU and memory, and it can also limit the number of Pods, ReplicationControllers, and Services.

To use resource quotas, make sure the --enable-admission-plugins= flag of the kube-apiserver includes ResourceQuota. Once a ResourceQuota object exists in a namespace, that namespace starts enforcing resource quota management. Note that a namespace typically defines a single ResourceQuota object.
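On a kubeadm-style cluster, a quick way to check this is to inspect the kube-apiserver static Pod manifest. This is only a sketch: the manifest path and the exact plugin list shown below are assumptions that may differ in your environment.

$ grep enable-admission-plugins /etc/kubernetes/manifests/kube-apiserver.yaml
    - --enable-admission-plugins=NodeRestriction,ResourceQuota,LimitRanger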

The resource quota controller supports the following kinds of quota: compute resource quotas, storage resource quotas, object count quotas, and quota scopes.

Compute resource quota

Users can limit the total amount of computing resources in a given namespace. The supported resource types are as follows:

  • cpu: across all non-terminated Pods, the sum of CPU requests cannot exceed this value
  • limits.cpu: across all non-terminated Pods, the sum of CPU limits cannot exceed this value
  • limits.memory: across all non-terminated Pods, the sum of memory limits cannot exceed this value
  • memory: across all non-terminated Pods, the sum of memory requests cannot exceed this value
  • requests.cpu: across all non-terminated Pods, the sum of CPU requests cannot exceed this value
  • requests.memory: across all non-terminated Pods, the sum of memory requests cannot exceed this value

For example, let's create memory and CPU quotas for a namespace. First, create a namespace for testing:

$ kubectl create namespace dev

Then define a ResourceQuota resource object as follows (quota-mem-cpu.yaml):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-demo
  namespace: dev
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi

After the creation, we view the ResourceQuota object:

$ kubectl describe quota mem-cpu-demo -n dev
Name:            mem-cpu-demo
Namespace:       dev
Resource         Used  Hard
--------         ----  ----
limits.cpu       0     2
limits.memory    0     2Gi
requests.cpu     0     1
requests.memory  0     1Gi

Now let's create a Pod as follows:

apiVersion: v1
kind: Pod
metadata:
  name: quota-mem-cpu-demo
  namespace: dev
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        memory: "800Mi"
        cpu: "800m" 
      requests:
        memory: "600Mi"
        cpu: "400m"

Create the above Pod and check the running status:

$ kubectl get pod -n dev
NAME                 READY   STATUS    RESTARTS   AGE
quota-mem-cpu-demo   1/1     Running   0          78s

You can see that the Pod has been running. At this time, let's check the resource quota object we defined again:

$ kubectl describe quota mem-cpu-demo -n dev
Name:            mem-cpu-demo
Namespace:       dev
Resource         Used   Hard
--------         ----   ----
limits.cpu       800m   2
limits.memory    800Mi  2Gi
requests.cpu     400m   1
requests.memory  600Mi  1Gi

We can clearly see how much of each compute resource has been used. For example, only about 424Mi (1Gi - 600Mi) of the memory request quota remains. Now let's create a Pod that requests more memory than what is left:

apiVersion: v1
kind: Pod
metadata:
  name: quota-mem-cpu-demo-2
  namespace: dev
spec:
  containers:
  - name: quota-mem-cpu-demo-2-ctr
    image: redis
    resources:
      limits:
        memory: "1Gi"
        cpu: "800m"      
      requests:
        memory: "700Mi"
        cpu: "400m"

When we create this Pod, it fails:

$ kubectl apply -f quota_nginx2.yaml 
Error from server (Forbidden): error when creating "quota_nginx2.yaml": pods "quota-mem-cpu-demo-2" is forbidden: exceeded quota: mem-cpu-demo, requested: requests.memory=700Mi, used: requests.memory=600Mi, limited: requests.memory=1Gi

You can see that the request has been rejected because requests.memory would exceed our resource quota.

From the case above, we can use ResourceQuota to limit the total CPU and memory consumed by all containers running in a namespace. If we want to constrain individual containers rather than the namespace total, we need to use the LimitRange resource object. In addition, if quotas for compute resources (such as CPU and memory) are enabled in a namespace, users must set a request value and a limit value for those resources, otherwise the quota system rejects Pod creation, unless we configure a LimitRange resource object that supplies defaults.

To use LimitRange, you also need to include LimitRanger in the --enable-admission-plugins= flag. For example, let's configure the minimum and maximum memory limits for containers in a namespace. First we create a namespace for this configuration:

$ kubectl create namespace mem-example

Then create a LimitRange resource object:

apiVersion: v1
kind: LimitRange
metadata:
  name: mem-min-max-demo-lr
  namespace: mem-example
spec:
  limits:
  - max:
      memory: 1Gi
    min:
      memory: 500Mi
    type: Container

After creation, we can view the details:

$ kubectl get limitrange mem-min-max-demo-lr --namespace=mem-example -o yaml
......
spec:
  limits:
  - default:
      memory: 1Gi
    defaultRequest:
      memory: 1Gi
    max:
      memory: 1Gi
    min:
      memory: 500Mi
    type: Container

The output above shows the minimum and maximum memory constraints, and note that default values were created automatically even though we did not specify them. Now, whenever a container is created in the mem-example namespace, Kubernetes performs the following steps:

  • If the Container does not specify its own memory request and limit, the default memory request and limit are assigned to it
  • Verify that the memory request for the Container is greater than or equal to 500 MiB
  • Verify that the memory limit of the Container is less than or equal to 1 GiB

Let's create a Pod whose container declares a memory request of 600 MiB and a memory limit of 800 MiB, which satisfy the LimitRange's minimum and maximum memory constraints:

apiVersion: v1
kind: Pod
metadata:
  name: mem-demo
  namespace: mem-example
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        memory: "800Mi"
      requests:
        memory: "600Mi"

Then directly create and view the status:

$ kubectl get pod  -n mem-example
NAME       READY   STATUS    RESTARTS   AGE
mem-demo   1/1     Running   0          38s

We can see that it runs normally.

Next, we create a Pod that exceeds the maximum memory limit:

apiVersion: v1
kind: Pod
metadata:
  name: mem-demo-2
  namespace: mem-example
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        memory: "1.5Gi"
      requests:
        memory: "800Mi"

When we create it, it fails:

Error from server (Forbidden): error when creating "memdemo2.yaml": pods "mem-demo-2" is forbidden: maximum memory usage per Container is 1Gi, but limit is 1536Mi

The output shows that the Pod could not be created because the memory limit declared by its container is too large. We could also try creating a Pod whose request is below the minimum memory constraint, or a Pod that declares no memory requests and limits at all.

Storage resource quota

Users can limit the total amount of storage resources in a given namespace. In addition, storage resource consumption can be limited per StorageClass.

  • requests.storage: across all PersistentVolumeClaims, the sum of storage requests cannot exceed this value
  • persistentvolumeclaims: the total number of PersistentVolumeClaims allowed in the namespace
  • <storage-class-name>.storageclass.storage.k8s.io/requests.storage: across all PersistentVolumeClaims associated with this storage class, the sum of storage requests cannot exceed this value
  • <storage-class-name>.storageclass.storage.k8s.io/persistentvolumeclaims: the total number of PersistentVolumeClaims associated with this storage class allowed in the namespace
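For example, here is a minimal sketch of a ResourceQuota combining these storage limits; the namespace dev and the storage class name fast-ssd are assumptions:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: dev
spec:
  hard:
    requests.storage: 10Gi        # total storage requested by all PVCs in the namespace
    persistentvolumeclaims: "5"   # total number of PVCs in the namespace
    fast-ssd.storageclass.storage.k8s.io/requests.storage: 5Gi   # storage requested via the fast-ssd class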

Object count quota

  • configmaps: the number of ConfigMaps allowed in the namespace
  • persistentvolumeclaims: the number of PersistentVolumeClaims allowed in the namespace
  • pods: the number of non-terminated Pods allowed in the namespace; a Pod is terminated if its status.phase is Failed or Succeeded
  • replicationcontrollers: the number of ReplicationControllers allowed in the namespace
  • resourcequotas: the number of ResourceQuotas allowed in the namespace
  • services: the number of Services allowed in the namespace
  • services.loadbalancers: the number of Services of type LoadBalancer allowed in the namespace
  • services.nodeports: the number of Services of type NodePort allowed in the namespace
  • secrets: the number of Secrets allowed in the namespace
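As a sketch, object count limits are set the same way as compute limits; the namespace dev and the numbers below are arbitrary assumptions:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-counts
  namespace: dev
spec:
  hard:
    configmaps: "10"
    persistentvolumeclaims: "4"
    pods: "20"
    secrets: "10"
    services: "10"
    services.loadbalancers: "2"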

QoS (Quality of Service)

When Kubernetes creates a Pod, it assigns one of the following QoS classes to the Pod:

1. Guaranteed

2. Burstable

3. BestEffort

Guaranteed

To be assigned the Guaranteed QoS class, a Pod must meet the following requirements:

1. Every Container in the Pod must have a memory limit and a memory request.

2. For every Container in the Pod, the memory limit must equal the memory request.

3. Every Container in the Pod must have a CPU limit and a CPU request.

4. For every Container in the Pod, the CPU limit must equal the CPU request.

These requirements apply to init containers as well as app containers.

Let's create a Pod whose Container has a memory limit and a memory request, both equal to 200 MiB, and a CPU limit and a CPU request, both equal to 700 millicpu:

apiVersion: v1
kind: Namespace
metadata:
  name: qos-example

---
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
  namespace: qos-example
spec:
  containers:
  - name: qos-demo-ctr
    image: nginx
    resources:
      limits:
        memory: "200Mi"
        cpu: "700m"
      requests:
        memory: "200Mi"
        cpu: "700m"

After creating a pod, view the details of the Pod:

$ kubectl get pod qos-demo -n qos-example -oyaml
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: qos-demo-ctr
    resources:
      limits:
        cpu: 700m
        memory: 200Mi
      requests:
        cpu: 700m
        memory: 200Mi
    ...
status:
  qosClass: Guaranteed

The output shows that Kubernetes assigned the Guaranteed QoS class to the Pod. It also confirms that the Pod's container has a memory request equal to its memory limit and a CPU request equal to its CPU limit.

Note:

If a Container specifies its own memory limit but does not specify a memory request, Kubernetes automatically assigns a memory request equal to the limit. Similarly, if a Container specifies its own CPU limit but not a CPU request, Kubernetes automatically assigns a CPU request equal to the limit.
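For example, a sketch of a Pod (the name below is hypothetical) that declares only limits would still be classed as Guaranteed, because Kubernetes copies the limits into the requests:

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo-limits-only    # hypothetical name
  namespace: qos-example
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:                   # no requests are specified; Kubernetes sets
        memory: "200Mi"         # requests equal to these limits, so the Pod
        cpu: "700m"             # still gets the Guaranteed QoS class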

Burstable

A Pod is assigned the Burstable QoS class when:

1. The Pod does not meet the criteria for the Guaranteed QoS class.

2. At least one Container in the Pod has a memory or CPU request.

This is the configuration file for a Pod with one Container. The memory limit of the Container is 200 MiB and the memory request is 100 MiB.

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo-2
  namespace: qos-example
spec:
  containers:
  - name: qos-demo-2-ctr
    image: nginx
    resources:
      limits:
        memory: "200Mi"
      requests:
        memory: "100Mi"

Similarly, after creating a pod, view the details of the Pod:

$  kubectl get pod -n qos-example qos-demo-2 -oyaml
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: qos-demo-2-ctr
    resources:
      limits:
        memory: 200Mi
      requests:
        memory: 100Mi
	...
status:
  qosClass: Burstable

The output shows that Kubernetes assigned the Burstable QoS class to the Pod.

Create a Pod with two containers

This is the configuration file for a Pod with two containers. One Container specifies a memory request of 200 MiB; the other Container specifies no requests or limits.

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo-4
  namespace: qos-example
spec:
  containers:

  - name: qos-demo-4-ctr-1
    image: nginx
    resources:
      requests:
        memory: "200Mi"

  - name: qos-demo-4-ctr-2
    image: redis

This Pod meets the criteria for the Burstable QoS class: it does not meet the criteria for Guaranteed, and one of its containers has a memory request.

View the details of Pod after creation:

$ kubectl get pod qos-demo-4 -n qos-example -oyaml
spec:
  containers:
    ...
    name: qos-demo-4-ctr-1
    resources:
      requests:
        memory: 200Mi
    ...
    name: qos-demo-4-ctr-2
    resources: {}
    ...
status:
  qosClass: Burstable

BestEffort

For a Pod to be assigned the BestEffort QoS class, the containers in the Pod must not have any memory or CPU limits or requests.

This is the configuration file for a Pod with one container. Container has no memory or CPU limit or request:

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo-3
  namespace: qos-example
spec:
  containers:
  - name: qos-demo-3-ctr
    image: nginx

Similarly, after creating a Pod, view the details:

$ kubectl get pod qos-demo-3 -n qos-example -oyaml
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: qos-demo-3-ctr
    resources: {}
	...
status:
  qosClass: BestEffort

The output shows that Kubernetes assigned the BestEffort QoS class to the Pod.

Pod security policy

Pod security policy is a cluster-level resource that controls security-sensitive aspects of the Pod specification. A PodSecurityPolicy object defines a set of conditions that a Pod must meet before it can be accepted by the system, as well as default values for related fields. It allows administrators to control the following:

  • Running of privileged containers: privileged
  • Usage of host namespaces: hostPID, hostIPC
  • Usage of host networking and ports: hostNetwork, hostPorts
  • Usage of volume types: volumes
  • Usage of the host filesystem: allowedHostPaths
  • Allowing specific FlexVolume drivers: allowedFlexVolumes
  • Allocating an FSGroup that owns the Pod's volumes: fsGroup
  • Requiring a read-only root filesystem: readOnlyRootFilesystem
  • The user and group IDs of the container: runAsUser, runAsGroup, supplementalGroups
  • Restricting escalation to root privileges: allowPrivilegeEscalation, defaultAllowPrivilegeEscalation
  • Linux capabilities: defaultAddCapabilities, requiredDropCapabilities, allowedCapabilities
  • The SELinux context of the container: seLinux
  • The allowed Proc Mount types for the container: allowedProcMountTypes
  • The AppArmor profile used by containers: annotations
  • The sysctl profile used by containers: forbiddenSysctls, allowedUnsafeSysctls
  • The seccomp profile used by containers: annotations

Note: PodSecurityPolicy was deprecated in Kubernetes v1.21 and will be removed in v1.25.

Enable Pod security policy

Pod security policy is implemented as an optional admission controller, and is enforced by enabling that admission controller. However, if the admission controller is enabled before any policies have been authorized, no Pods can be created in the cluster.

Since the Pod security policy API (policy/v1beta1/podsecuritypolicy) is enabled independently of the admission controller, for existing clusters it is recommended to add and authorize policies before enabling the admission controller.

Authorization policy

When a PodSecurityPolicy resource is created, it does nothing by itself. In order to use it, the requesting user or the target Pod's service account must be authorized to use the policy, by being allowed to perform the use verb on the policy.

Most Kubernetes Pods are not created directly by users. Instead, they are typically created indirectly by a Deployment, ReplicaSet, or other templated controller via the controller manager. Granting a controller access to the policy grants access to all Pods created by that controller, so the preferred scheme for authorizing policies is to grant access to the Pod's service account.

Authorized by RBAC

First, a Role or ClusterRole needs permission to use the target policies. The rules granting access look like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: <Role name>
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs:     ['use']
  resourceNames:
  - <List of policies to authorize>

Next, bind the Role (or ClusterRole) to the authorized user:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: <binding name>
roleRef:
  kind: ClusterRole
  name: <role name>
  apiGroup: rbac.authorization.k8s.io
subjects:
# All service accounts under the authorization namespace (recommended):
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:serviceaccounts:<authorized namespace>
# Authorize a specific service account (this is not recommended):
- kind: ServiceAccount
  name: <authorized service account name>
  namespace: <authorized pod namespace>
# Authorize specific users (not recommended):
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: <authorized user name>

If you use a RoleBinding (not a ClusterRoleBinding), authorization is limited to Pods in the same namespace as the RoleBinding. Consider combining this with a system group to grant access to all Pods in the namespace:

# Authorize all service accounts in a namespace
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:serviceaccounts
# Or equivalent, authorize all authenticated users in a namespace
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:authenticated

Policy order

In addition to restricting Pod creation and update, Pod security policies can also be used to set default values for many of the fields they control. When multiple policy objects exist, the Pod security policy controller selects a policy according to the following rules:

  1. PodSecurityPolicies which allow the Pod as-is, without changing defaults or otherwise mutating the Pod, are preferred. The order among such non-mutating PodSecurityPolicy objects does not matter.
  2. If the Pod must be defaulted or mutated, the first PodSecurityPolicy (ordered by name) that allows the Pod is selected.

Note: during update operations (when mutations to the Pod spec are not allowed), only non-mutating PodSecurityPolicies are used to validate the Pod.

For details, please refer to the official documentation.

Pod anti-affinity

Note: malicious tenants can circumvent Pod anti-affinity rules. The following example should only be used on clusters with trusted tenants, or on clusters where tenants cannot directly access the Kubernetes control plane.

We can use Pod anti-affinity to prevent Pods of different tenants from being scheduled onto the same node. For example, the Pod specification below describes a Pod labeled team: billing, with an anti-affinity rule that prevents the Pod from being scheduled alongside Pods that do not carry that label.

apiVersion: v1
kind: Pod
metadata:
  name: bar
  labels:
    team: "billing"
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:  # hard requirement
      - topologyKey: "kubernetes.io/hostname"
        labelSelector:
          matchExpressions:
          - key: "team"
            operator: NotIn
            values: ["billing"]

The manifest above means that the Pod bar cannot be scheduled onto a node that hosts a Pod without the team: billing label. The drawback of this approach is that a malicious user could circumvent the rule by adding the team: billing label to an arbitrary Pod, so the Pod anti-affinity mechanism alone is not sufficient to securely enforce policy on clusters with untrusted tenants.

Taints and tolerations

Note: malicious tenants can circumvent policies enforced by node taints and tolerations. The following example should only be used on clusters with trusted tenants, or on clusters where tenants cannot directly access the Kubernetes control plane.

Node taints are another way to control workload scheduling. Node taints can be used to reserve dedicated nodes for certain tenants. For example, GPU-equipped nodes can be reserved for the specific tenants whose workloads require GPUs. To dedicate a node pool to a tenant, apply a taint with effect: "NoSchedule" to the node pool. Then only Pods with a corresponding toleration can be scheduled onto nodes in that pool.

The disadvantage of this approach is that malicious users could gain access to the dedicated node pool by adding the corresponding toleration to their Pods, so taints and tolerations alone are not enough to safely enforce policy on clusters with untrusted tenants.

If a node is tainted, Pods will not be scheduled onto it unless they declare a toleration matching the taint.

Taints are very useful, for example, when you want to reserve the master node for Kubernetes system components, or reserve a group of nodes with special resources for certain Pods: Pods are no longer scheduled onto tainted nodes. A cluster built with kubeadm adds a taint to the master node by default, so the ordinary Pods we create are not scheduled onto the master:

$ kubectl describe node master
Name:               master
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=master
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
......
Taints:             node-role.kubernetes.io/master:NoSchedule
Unschedulable:      false
.....

We can use the command above to view the master node's information, which includes the taint node-role.kubernetes.io/master:NoSchedule. This means the master node is tainted with the effect NoSchedule, so Pods will not be scheduled onto it. Besides NoSchedule, there are two other effect options:

  • PreferNoSchedule: the soft-policy version of NoSchedule, meaning the scheduler tries to avoid placing Pods on the tainted node where possible
  • NoExecute: once this taint takes effect, any Pods already running on the node that do not have a matching toleration are evicted immediately (a sketch of such a toleration follows this list)
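Here is a sketch of such a toleration (the taint key dedicated and the value tenant-a are hypothetical); with a NoExecute taint, tolerationSeconds can additionally bound how long the Pod may keep running on the node after the taint is applied:

tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "tenant-a"
  effect: "NoExecute"
  tolerationSeconds: 3600      # the Pod is evicted 3600s after the taint appears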

The command to taint a node is as follows:

$ kubectl taint nodes node1 test=node1:NoSchedule
node/node1 tainted

The command above taints the node1 node with the effect NoSchedule, which only affects the scheduling of new Pods. If you still want to schedule a Pod onto a tainted node, you must add a toleration to its spec. For example, suppose we now want to schedule a Pod onto the master node (taint-demo.yaml):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: taint
  labels:
    app: taint
spec:
  selector:
    matchLabels:
      app: taint
  replicas: 2
  template:
    metadata:
      labels:
        app: taint
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"

Since the master node is tainted, we need to add a toleration declaration here so that the Pod can be scheduled onto the master node:

tolerations:
- key: "node-role.kubernetes.io/master"
  operator: "Exists"
  effect: "NoSchedule"

Then create the above resources and view the results:

kubectl get pod -o wide
NAME                     READY   STATUS              RESTARTS   AGE   IP           NODE     NOMINATED NODE   READINESS GATES
busybox                  1/1     Running             0          2d    10.244.2.3   node2    <none>           <none>
taint-5779c44f78-lq9gg   0/1     ContainerCreating   0          27s   <none>       master   <none>           <none>
taint-5779c44f78-m46lm   0/1     ContainerCreating   0          27s   <none>       master   <none>           <none>
test-np                  1/1     Running             0          2d    10.244.1.6   node1    <none>           <none>

We can see that the two Pod replicas were scheduled onto the master node, which demonstrates how tolerations are used.

When writing the tolerations attribute, the key, value, and effect must match the node's taint settings. In addition, note the following:

  1. If the value of operator is Exists, the value attribute can be omitted
  2. If the value of operator is Equal, the toleration's value must be equal to the taint's value
  3. If the operator attribute is not specified, it defaults to Equal

In addition, there are two special cases:

  1. An empty key combined with the Exists operator matches all keys and values, i.e. it tolerates every taint on every node (see the sketch after this list)
  2. An empty effect matches all effects
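As a sketch, such a catch-all toleration looks like this; use it with care, since it effectively disables taint-based isolation for the Pod:

tolerations:
- operator: "Exists"           # an empty key with Exists tolerates every taint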

Finally, if we want to remove the taint from the node, we can use the following command:

$ kubectl taint nodes node1 test-
node/node1 untainted

reference material

https://www.qikqiak.com/k8strain/tenant/#pod_1

https://kubernetes.io/zh/docs/home/
