Metrics Server and HPA in Kubernetes

1, Kubernetes monitoring component: metrics-server

Installation steps

1. Add the Helm chart repository
[root@k8s-master ~]# helm repo add kaiyuanshe

2. Search for metrics-server
[root@k8s-master ~]# helm search repo metrics-server

3. Pull and unpack the metrics-server package
[root@k8s-master ~]# helm pull kaiyuanshe/metrics-server
[root@k8s-master ~]# tar -xf metrics-server-2.11.4.tgz

4. Modify the values.yaml file
[root@k8s-master ~]# cd metrics-server
[root@k8s-master metrics-server]# vim values.yaml

# Replace the image source and version number with the values below
  tag: v0.4.1

# Replace the default "args: []" with the list below. In testing, the list items worked both indented under "args:" and flush with it. If the port conflicts with another service, change it here.
args:
  - --cert-dir=/tmp
  - --secure-port=6443
  - --metric-resolution=30s
  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS
  - --requestheader-username-headers=X-Remote-User
  - --requestheader-group-headers=X-Remote-Group
  - --requestheader-extra-headers-prefix=X-Remote-Extra-
  # If the port is already in use and you change it, it must be changed in three places across two files (all three must be changed together)
[root@k8s-m-01 metrics-server]# grep -R '6443' ./
./templates/metrics-server-deployment.yaml:            - --secure-port=6443
./templates/metrics-server-deployment.yaml:          - containerPort: 6443
./values.yaml:  - --secure-port=6443

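5. Bind the cluster-admin role to the anonymous user (this gives unauthenticated requests full control of the cluster, so use it only on a test cluster)
[root@k8s-master metrics-server]# kubectl create clusterrolebinding system:anonymous --clusterrole=cluster-admin --user=system:anonymous

6. Install metrics-server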
[root@k8s-master metrics-server]# helm install metrics-server ./

7. Check whether the metrics-server pod is running
[root@k8s-master metrics-server]# kubectl get pod
metrics-server-675ccccb46-84pbm               1/1     Running   0          19m

8. Test the commands once the service is running
[root@k8s-m-01 metrics-server]# kubectl top nodes
NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
k8s-master   110m         5%     948Mi           32%       
k8s-node1    41m          2%     1013Mi          35%       
k8s-node2    60m          3%     1228Mi          42%    

[root@k8s-m-01 metrics-server]# kubectl top pod
NAME                                          CPU(cores)   MEMORY(bytes)   
nfs-nfs-client-provisioner-8557b8c764-lc4nx   3m           6Mi 

2, HPA automatic scaling

In production, unexpected things happen. For example, traffic to the company's website may spike so that the existing pods can no longer serve every request, and operations staff cannot watch the service around the clock. HPA can be configured to add pod replicas automatically when the load is too high, spreading the traffic across them, and to reduce the number of pods again when traffic returns to normal. HPA scales on CPU and memory utilization, which are measured relative to the containers' resource requests, so the requests parameter must be defined for HPA to work.

HPA stands for Horizontal Pod Autoscaler. It automatically scales the number of pods in a ReplicationController, Deployment, or ReplicaSet based on CPU utilization; it can also scale on custom metrics provided by other applications. Pod autoscaling does not apply to objects that cannot be scaled, such as DaemonSets. HPA is implemented as a Kubernetes API resource plus a controller. The resource determines the behavior of the controller. The controller periodically obtains the average CPU utilization, compares it with the target value, and adjusts the number of replicas in the ReplicationController or Deployment accordingly.
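
The compare-and-adjust step described above can be sketched in a few lines of Python. This is a simplified illustration, not the controller's actual code: it assumes the scaling rule given in the Kubernetes documentation (desired replicas = ceil(current replicas * current utilization / target utilization)) and a tolerance of 0.1 around the target, and it ignores stabilization windows and pods that are not ready. The function name and parameters are made up for this example.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA rule: scale the replica count by current/target utilization."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# 24% measured against a 40% target with 2 replicas: stays at the minimum of 2.
print(desired_replicas(2, 24, 40, min_replicas=2, max_replicas=10))
# 400% measured against a 40% target with 4 replicas: capped at the maximum of 10.
print(desired_replicas(4, 400, 40, min_replicas=2, max_replicas=10))
```

Because the result is clamped, a sustained overload ends at maxReplicas no matter how far utilization exceeds the target.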

# Write the Deployment manifest
kind: Deployment
apiVersion: apps/v1
metadata:
  name: hpa
spec:
  selector:
    matchLabels:
      app: hpa
  template:
    metadata:
      labels:
        app: hpa
    spec:
      containers:
        - name: hpa
          image: alvinos/django:v1
          resources:
            requests:           # Minimum resources reserved for the container
              cpu: 100m         # At least 100m of CPU
              memory: 100Mi     # At least 100Mi of memory
            limits:             # Maximum resources the container may use
              cpu: 200m         # At most 200m of CPU
              memory: 200Mi     # At most 200Mi of memory
---
# Write the HPA manifest
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: hpa
  namespace: default
spec:
  # Minimum and maximum number of pods the HPA may run
  maxReplicas: 10
  minReplicas: 2
  # The object to scale. The HPA dynamically changes the number of pods of this object
  scaleTargetRef:
    kind: Deployment
    name: hpa
    apiVersion: apps/v1
  # Array of monitored metrics; several metric types can be listed together
  metrics:
    - type: Resource
      # Core metrics, cpu and memory (the resources declared in the requests and limits of the containers in the scaled pods)
      resource:
        name: cpu
        # CPU threshold
        # Formula: the average utilization (as a percentage of the CPU request) across all target pods
        # For example, with a CPU request of 1000m, utilization is 50% when 500m is actually used
        # For example, with 3 replicas and a CPU request of 1000m, where pod1 uses 500m, pod2 uses 300m and pod3 uses 600m:
        # averageUtilization = (500/1000 + 300/1000 + 600/1000) / 3 = (500 + 300 + 600) / (3 * 1000) = 0.466667
        # For example, with a CPU request of 200m and three pods using 40m, 41m and 42m: (40 + 41 + 42) / 600 = 0.205
        # The value is a percentage: write 40 for 40%. When average CPU utilization exceeds 40%, the HPA scales out, between 2 and 10 replicas.
        # With the 100m request above, 100m * 0.4 = 40m, so average usage below 40m per pod does not trigger scale-out.
        targetAverageUtilization: 40

---
# Write the Service manifest
kind: Service
apiVersion: v1
metadata:
  name: hpa
spec:
  ports:
    - port: 80
      targetPort: 80
  selector:
    app: hpa
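
The utilization arithmetic in the manifest comments can be checked with a small Python sketch. It assumes that Kubernetes measures CPU utilization against each container's CPU request, and that every pod has the same request. The function name is invented for this illustration.

```python
def average_utilization(usage_millicores, request_millicores):
    """Average CPU utilization in percent across pods, relative to the CPU request."""
    total_usage = sum(usage_millicores)
    total_request = request_millicores * len(usage_millicores)
    return 100.0 * total_usage / total_request


# Three pods using 500m, 300m and 600m against a 1000m request each.
print(round(average_utilization([500, 300, 600], 1000), 2))  # 46.67
# Three pods using 40m, 41m and 42m against a 200m request each.
print(average_utilization([40, 41, 42], 200))  # 20.5
```

Both results are percentages, so they compare directly with the integer written in targetAverageUtilization.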

Unit interpretation

requests: the amount of resources the container asks for at startup. The pod is only scheduled onto a node that can provide this amount
limits: the maximum amount of resources the container may use
Unit m: CPU is measured in millicores (m). Multiply the number of CPU cores of a node by 1000 to get the node's total CPU capacity in millicores. For example, a node with two cores has a total of 2000m.

Take a dual-core node as an example:

    requests:
      cpu: 50m     # Equal to 0.05 cores
      memory: 512Mi
    limits:
      cpu: 100m    # Equal to 0.1 cores
      memory: 1Gi

Meaning: when the container starts, it requests 50 / 2000 of the node's CPU (2.5%) and is allowed to use up to 100 / 2000 (5%)
0.05 cores divided by the 2 cores in total is 2.5%, and 0.1 cores divided by 2 is 5%

          requests:
            cpu: 100m    # Equal to 0.1 cores
            memory: 512Mi
          limits:
            cpu: 200m    # Equal to 0.2 cores
            memory: 1Gi

Meaning: when the container starts, it requests 100 / 2000 of the node's CPU (5%) and is allowed to use up to 200 / 2000 (10%)
0.1 cores divided by the 2 cores in total is 5%, and 0.2 cores divided by 2 is 10%
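
The unit conversions above can be reproduced with a short Python sketch. It is only an illustration: the helper names are invented here, and it handles just the two CPU quantity forms used in this article (a millicore value such as "100m" and a plain core count such as "2").

```python
def to_millicores(quantity):
    """Convert a CPU quantity such as '100m', '0.1' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # A plain number is a count of cores; 1 core = 1000 millicores.
    return round(float(quantity) * 1000)


def node_share_percent(quantity, node_cores):
    """Percentage of a node's total CPU that the quantity represents."""
    return 100 * to_millicores(quantity) / (node_cores * 1000)


# On a dual-core node (2000m in total):
print(node_share_percent("50m", 2))   # 2.5
print(node_share_percent("100m", 2))  # 5.0
print(node_share_percent("200m", 2))  # 10.0
```

Working in integer millicores avoids rounding surprises from fractional core values.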


# Apply the resources
[root@k8s-master ~]# kubectl apply -f hpa.yaml 
deployment.apps/hpa created

# View the CPU usage monitored by the HPA
[root@k8s-master ~]# kubectl get horizontalpodautoscalers.autoscaling 
hpa    Deployment/hpa   24%/40%   2         10        2          53s

# View pod resource usage
[root@k8s-master ~]# kubectl top pods
NAME                                          CPU(cores)   MEMORY(bytes)   
hpa-5cb8bcdc4f-xvkkf                          11m          54Mi  

# View pod status
[root@k8s-master ~]# kubectl get pods
NAME                                          READY   STATUS    RESTARTS   AGE
hpa-5cb8bcdc4f-xvkkf                          1/1     Running   0          7m4s

# View Service status
[root@k8s-master ~]# kubectl get svc
NAME                           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
hpa                            ClusterIP   <none>        80/TCP           6m50s

# Run the following loop on each node in the cluster to stress-test the Service
[root@k8s-master ~]# while true; do curl; echo ''; done

# Check the pod's CPU usage again (it rises under load, as shown below)
[root@k8s-master ~]# kubectl top pods
NAME                                          CPU(cores)   MEMORY(bytes)   
hpa-5cb8bcdc4f-xvkkf                          163m         56Mi 

# Check the pods again: their number has increased (the HPA monitors CPU utilization and automatically adds pods as utilization rises)
[root@k8s-master ~]# kubectl get pods
NAME                                          READY   STATUS    RESTARTS   AGE
hpa-7f5d745bf9-45fkj                          1/1     Running   0          4m39s
hpa-7f5d745bf9-5qb4d                          1/1     Running   0          4m55s
hpa-7f5d745bf9-5vnfl                          1/1     Running   0          4m40s
hpa-7f5d745bf9-fh66r                          1/1     Running   0          4m55s
hpa-7f5d745bf9-fnlx4                          1/1     Running   0          15m
hpa-7f5d745bf9-g7r5c                          1/1     Running   0          4m55s
hpa-7f5d745bf9-qdrc7                          1/1     Running   0          4m39s
hpa-7f5d745bf9-s2sbx                          1/1     Running   0          4m39s
hpa-7f5d745bf9-sz9zz                          1/1     Running   0          6m56s
hpa-7f5d745bf9-zw778                          1/1     Running   0          16m

You can see that the Deployment automatically scales out to ten pods, the maxReplicas value.

