Correct use of Kubernetes probes

WeChat Official Account: Operations and Maintenance Development Story. Author: Teacher Xia

How to configure a Pod's liveness, readiness, and startup probes

When using Kubernetes, have you ever run into the vicious cycle of a Pod crashing and restarting right after it starts? Have you ever wondered how Kubernetes detects that a Pod is still alive, or how, even after a container has started, Kubernetes knows that the container's processes are ready to serve traffic? Let's explore these questions through the Kubernetes documentation page Configure Liveness and Readiness Probes.

This article shows you how to configure liveness, readiness, and startup probes for containers.

The kubelet uses the liveness probe to determine when to restart a container. For example, when an application is running but unable to make progress, the liveness probe catches the deadlock, and restarting the container in that state lets the application keep running despite the bug (after all, whose code has no bugs?).

The kubelet uses the readiness probe to determine whether a container is ready to accept traffic. The kubelet considers a Pod ready only when all of its containers are ready. This signal is used to control which Pods serve as backends for a Service: Pods that are not ready are removed from the Service's load balancer.
Sometimes you have to deal with existing applications that need extra initialization time at startup. In such cases it is tricky to set liveness probe parameters without compromising the fast response to deadlocks that motivated the probe in the first place.

The kubelet uses the startup probe to determine whether the container's application has started. This is exactly the situation in which setting liveness probe parameters is tricky. The trick is to configure a startup probe with the same command, HTTP, or TCP check, setting failureThreshold * periodSeconds long enough to cover the worst-case startup time.

Define the liveness command

Many long-running applications eventually transition to a broken state and cannot recover except by being restarted. Kubernetes provides liveness probes to detect and remedy this situation.
In this exercise, you will create a Pod that runs a container based on the gcr.io/google_containers/busybox image. Here is the Pod's configuration file exec-liveness.yaml:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    image: gcr.io/google_containers/busybox
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5

This configuration file defines a single container for the Pod. periodSeconds specifies that the kubelet should perform a liveness probe every 5 seconds, and initialDelaySeconds tells the kubelet to wait 5 seconds before the first probe. The probe runs the command cat /tmp/healthy in the container. If the command succeeds, it returns 0, and the kubelet considers the container alive and healthy. If it returns a non-zero value, the kubelet kills the container and restarts it.
When the container starts, it executes the command:

/bin/sh -c "touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600"

The cat /tmp/healthy command returns a success code for the first 30 seconds of the container's life. After 30 seconds, it returns a failure code.
Create Pod:

kubectl create -f https://k8s.io/docs/tasks/configure-pod-container/exec-liveness.yaml

Within 30 seconds, view the Pod's events:

kubectl describe pod liveness-exec

The output shows that no liveness probe has failed yet:

FirstSeen    LastSeen    Count   From            SubobjectPath           Type        Reason      Message
--------- --------    -----   ----            -------------           --------    ------      -------
24s       24s     1   {default-scheduler }                    Normal      Scheduled   Successfully assigned liveness-exec to worker0
23s       23s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Pulling     pulling image "gcr.io/google_containers/busybox"
23s       23s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Pulled      Successfully pulled image "gcr.io/google_containers/busybox"
23s       23s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Created     Created container with docker id 86849c15382e; Security:[seccomp=unconfined]
23s       23s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Started     Started container with docker id 86849c15382e

After 35 seconds, view the Pod's events again:

kubectl describe pod liveness-exec

At the bottom is a message that the liveness probe failed and the container was killed and recreated.

FirstSeen LastSeen    Count   From            SubobjectPath           Type        Reason      Message
--------- --------    -----   ----            -------------           --------    ------      -------
37s       37s     1   {default-scheduler }                    Normal      Scheduled   Successfully assigned liveness-exec to worker0
36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Pulling     pulling image "gcr.io/google_containers/busybox"
36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Pulled      Successfully pulled image "gcr.io/google_containers/busybox"
36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Created     Created container with docker id 86849c15382e; Security:[seccomp=unconfined]
36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Started     Started container with docker id 86849c15382e
2s        2s      1   {kubelet worker0}   spec.containers{liveness}   Warning     Unhealthy   Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory

Wait another 30 seconds to confirm that the container has been restarted:

kubectl get pod liveness-exec

The output shows that the RESTARTS value has increased by 1:

NAME            READY     STATUS    RESTARTS   AGE
liveness-exec   1/1       Running   1          1m

Define a liveness HTTP request

We can also use an HTTP GET request as the liveness probe. Here is an example of a Pod running a container based on the gcr.io/google_containers/liveness image, http-liveness.yaml.
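
A reconstruction of that manifest, based on the description below and the upstream Kubernetes example:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: gcr.io/google_containers/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 3
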
The configuration file defines a single container. livenessProbe specifies that the kubelet should perform a liveness probe every 3 seconds, and initialDelaySeconds tells the kubelet to wait 3 seconds before the first probe. The probe sends an HTTP GET request to port 8080 of the server running in the container. If the handler for the server's /healthz path returns a success code, the kubelet considers the container alive and healthy. If it returns a failure code, the kubelet kills the container and restarts it.
Any return code greater than or equal to 200 and less than 400 counts as success; any other code counts as failure.
For the first 10 seconds of the container's life, the /healthz handler returns a status code of 200; after that, it returns 500.

http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
    // Report healthy (200) for the first 10 seconds after startup, then fail (500).
    duration := time.Now().Sub(started)
    if duration.Seconds() > 10 {
        w.WriteHeader(500)
        w.Write([]byte(fmt.Sprintf("error: %v", duration.Seconds())))
    } else {
        w.WriteHeader(200)
        w.Write([]byte("ok"))
    }
})

Three seconds after the container starts, the kubelet begins performing health checks. The first checks succeed, but after 10 seconds the checks start failing, and the kubelet kills and restarts the container.
Create a Pod to test HTTP liveness detection:

kubectl create -f https://k8s.io/docs/tasks/configure-pod-container/http-liveness.yaml

After 10 seconds, view the Pod's events to verify that the liveness probe failed and the container was restarted:

kubectl describe pod liveness-http

Define TCP liveness probes

The third kind of liveness probe uses a TCP socket. With this configuration, the kubelet attempts to open a socket to the container on the specified port. If the connection can be established, the container is considered healthy; if it cannot, the probe counts as a failure.

apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: gcr.io/google_containers/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

The configuration of TCP checks is very similar to that of HTTP checks. This example uses both readiness and liveness probes. Five seconds after the container starts, the kubelet sends the first readiness probe, which attempts to connect to the goproxy container on port 8080. If the probe succeeds, the Pod is marked as ready; the kubelet then repeats the check every 10 seconds.
In addition to the readiness probe, this configuration includes a liveness probe. Fifteen seconds after the container starts, the kubelet runs the first liveness probe. Like the readiness probe, it attempts to connect to port 8080 on the goproxy container. If the liveness probe fails, the container is restarted.
You can also reference a named containerPort in an HTTP or TCP liveness check:

ports:
- name: liveness-port
  containerPort: 8080
  hostPort: 8080

livenessProbe:
  httpGet:
    path: /healthz
    port: liveness-port

Define readiness probe

Sometimes an application is temporarily unable to serve traffic, for example because it needs to load a large amount of data or configuration files during startup. In such cases you don't want to kill the application, but you don't want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations: a container in a Pod can report that it is not yet ready to handle traffic from a Kubernetes Service.

Readiness probes are configured much like liveness probes. The only difference is that you use the readinessProbe field instead of the livenessProbe field.

readinessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5

HTTP and TCP readiness probes are configured in the same way as the corresponding liveness probes.
Readiness and liveness probes can be used in parallel on the same container. Using both ensures that traffic does not reach a container that is not ready, and that the container is restarted when it fails.

Define a startup probe

This is a new feature in Kubernetes 1.16.
Probes let Kubernetes monitor the state of an application. A liveness probe periodically checks whether the application is still alive. For example, a container might define this probe:

livenessProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 3
  periodSeconds: 10

If the check fails three times in a row (within 30 seconds), the container is restarted. But because this container is slow and takes more than 30 seconds to start, the probe keeps failing and the container is restarted again and again.
The new startupProbe lets you block all other probes until the Pod has finished starting up:

startupProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 30
  periodSeconds: 10

Now our slow container has up to 5 minutes (30 checks * 10 seconds = 300 seconds) to finish starting; once the startup probe succeeds for the first time, the liveness probe takes over, as shown below.
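
A sketch combining the two probes on one container (liveness-port is the named containerPort from the earlier snippet):

startupProbe:             # runs first and gates all other probes
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 30    # allow up to 30 * 10s = 300s for startup
  periodSeconds: 10
livenessProbe:            # takes over once the startup probe has succeeded
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 3     # after startup, a deadlock is caught within ~30s
  periodSeconds: 10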

Configure Probes

Probes have a number of fields that let you precisely control liveness and readiness checks (a combined example follows this list):

  • initialDelaySeconds: Number of seconds to wait after the container starts before the first probe is executed.
  • periodSeconds: How often the probe is performed. Default is 10 seconds; minimum is 1 second.
  • timeoutSeconds: Number of seconds after which the probe times out. Default is 1 second; minimum is 1 second.
  • successThreshold: Minimum number of consecutive successes for the probe to be considered successful after having failed. Default is 1; must be 1 for liveness probes; minimum is 1.
  • failureThreshold: Number of consecutive failures after which the probe is considered failed. Default is 3; minimum is 1.
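
For illustration, here is how these fields combine on a single probe; the /health path and port 8080 are placeholders, not values from this article:

livenessProbe:
  httpGet:
    path: /health          # placeholder health endpoint
    port: 8080             # placeholder port
  initialDelaySeconds: 10  # wait 10s after container start before the first probe
  periodSeconds: 5         # then probe every 5s
  timeoutSeconds: 2        # each probe attempt times out after 2s
  successThreshold: 1      # must be 1 for a liveness probe
  failureThreshold: 3      # restart the container after 3 consecutive failures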

Additional fields can be set on httpGet for HTTP probes:

  • host: Host name to connect to; defaults to the Pod's IP. You probably want to set the "Host" header in httpHeaders instead of using this field.
  • scheme: Scheme to use for the connection (HTTP or HTTPS). Defaults to HTTP.
  • path: Path to access on the HTTP server.
  • httpHeaders: Custom headers to set in the request. HTTP allows repeated headers.
  • port: Name or number of the port to access on the container. The number must be in the range 1 to 65535.

For HTTP probes, the kubelet sends an HTTP request to the specified path and port to perform the check. The kubelet sends the probe to the Pod's IP address, unless the address is overridden by the optional host field in httpGet. In most cases you do not want to set host. One case where you would: suppose the container listens on 127.0.0.1 and the Pod's hostNetwork field is true; then host under httpGet should be set to 127.0.0.1. The more common case, where your Pod relies on virtual hosts, is handled differently: do not use host, but instead set the Host header in httpHeaders, as sketched below.
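
A minimal sketch of the virtual-host case (myapp.example.com, the path, and the port are all hypothetical):

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
    - name: Host                # set the virtual host via a request header...
      value: myapp.example.com  # ...instead of the httpGet host field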

What should we do?

  • For microservices that speak HTTP (REST services and the like), always define a readinessProbe that checks whether the application (Pod) is ready to receive traffic.
  • For slow-starting applications, use a startupProbe to keep the container from being killed by the livenessProbe before it finishes starting.
  • If the service listens on multiple ports, make sure the readinessProbe reflects the primary one. For example, when the readinessProbe uses an "admin" or "management" port (say, 9090), make sure the endpoint only returns success once the primary HTTP port (say, 8080) is ready to accept traffic.
  • Use an httpGet readiness check against the serving port with a health path (e.g. /health), as sketched below.
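
A minimal sketch combining the last two recommendations; the port names, numbers, and the /health path are assumptions:

ports:
- name: http            # primary serving port that receives user traffic
  containerPort: 8080
- name: admin           # management/metrics port
  containerPort: 9090
readinessProbe:
  httpGet:
    path: /health       # endpoint should only succeed once the http port is ready
    port: http          # probe the primary port, not the admin port
  periodSeconds: 5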

What should we not do?

  • Do not make readiness or liveness checks depend on external dependencies such as data stores, as this can cause cascading failures:
    1. Suppose a service of 10 Pods that uses Postgres as its database and Redis as a cache. If your probe path depends on a working Redis connection, then a Redis or network failure "restarts" all 10 Pods, which usually hurts far more than it should, since the service could still serve data from Postgres.
    2. More generally, a service's probes are best kept from relying heavily on its database.
    3. Probe only ports inside your own Pod; a probe should never depend on other Pods in the same cluster. This prevents cascading failures.
  • Know exactly why you are using a livenessProbe; otherwise, do not define one for your Pods.
    • A livenessProbe can help recover "stuck" containers, but when you control the application and hit an unexpected stuck process or deadlock, a better option is to deliberately crash inside the application to return to a known-good state.
    • A failed livenessProbe causes a container restart, which can amplify load-related errors: the restart causes downtime (at least the application's startup time, e.g. 30s+), producing more errors and shifting more traffic onto the remaining containers, which then fail in turn, and so on.
    • Combining livenessProbes with external dependencies is the worst case for cascading failures: a minor problem in a single dependency restarts every container.
  • If you do use a livenessProbe, do not give the livenessProbe and readinessProbe identical specifications
    • You can use a livenessProbe with the same health check but a higher failureThreshold (for example, mark the Pod not ready after three failed attempts, but only fail the livenessProbe after ten); see the sketch after this list.
  • Do not use "exec" probes; they have known problems with leaving zombie processes, since many of the applications we write never reap children forked from the main process.
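
A minimal sketch of the failureThreshold recommendation above, assuming the application serves /health on port 8080 (both placeholders):

readinessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 5
  failureThreshold: 3    # marked not-ready (removed from the Service) after 3 failures
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 5
  failureThreshold: 10   # only restarted after 10 consecutive failures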

Summary

  • Use a readinessProbe for web applications to decide when a Pod should receive traffic
  • Incorrect use of readiness and liveness probes can reduce availability and cause cascading failures
  • For slow-starting applications, use a startupProbe
