Linux Hung Task analysis

Transferred from: https://www.cnblogs.com/arnoldlu/p/10529621.html

The kernel often prints a message such as "INFO: task xxx:xxx blocked for more than 120 seconds." This message comes from the kernel's hung task mechanism.

The hung task mechanism is implemented by the kernel thread khungtaskd, which monitors processes in the TASK_UNINTERRUPTIBLE state. If such a process has not gone through a context switch within the 120s period, khungtaskd prints its details.

1. Background of hung task

A process in D state, i.e. TASK_UNINTERRUPTIBLE, does not respond to signals, not even a kill signal.

If a process is in D state for a long time, users are often powerless.

It is not normal for a process to be in D state for a long time. The kernel uses D state to let a process wait for I/O to complete. Under normal circumstances the I/O completes quickly, and the waiting process is then woken up.

Even under abnormal conditions, IO processing also has a timeout mechanism. In principle, the process should not be in D state for a long time.

If a process does stay in D state for a long time, either the I/O device is faulty, or a kernel bug or a poorly designed mechanism keeps the process from being woken up.

To handle this situation, the kernel provides the hung task mechanism. It detects processes in D state that have not been switched for more than 120s and, if it finds any, prints a warning and the stack.
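From user space, D state is visible as the state letter in /proc/<pid>/stat. The following is a minimal sketch of extracting that letter from one stat line. The function name stat_state is made up for illustration, and the caller is assumed to have read the line already.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the state letter from one /proc/<pid>/stat line, or '?' if the
 * line is malformed. The command name is wrapped in parentheses and may
 * itself contain ')' or spaces, so search for the LAST ')'.
 */
char stat_state(const char *stat_line)
{
    const char *close;

    assert(stat_line != NULL);
    close = strrchr(stat_line, ')');
    if (!close || close[1] != ' ' || close[2] == '\0')
        return '?';
    return close[2];
}
```

A process that keeps returning 'D' across many samples is a candidate for the kind of hang described above.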

2. Basic principle of hung task

The hung task mechanism creates a kernel thread, khungtaskd, that wakes up once every 120s.

It then traverses all processes in the kernel and looks for those that meet two conditions: the process is in TASK_UNINTERRUPTIBLE, and nvcsw+nivcsw==last_switch_count.

Finally, it prints the process information and stack.

3. hung task code analysis

3.1 hung task related members of struct task_struct

Before analyzing hung task, you need to understand the meaning of several members of struct task_struct: state, nvcsw, nivcsw, and last_switch_count.

struct task_struct {
...
    /* Current process state. TASK_UNINTERRUPTIBLE means the process cannot be interrupted by signals. */
    volatile long state;    /* -1 unrunnable, 0 runnable, >0 stopped */
...
    /* nvcsw counts voluntary context switches, nivcsw counts involuntary ones. Their sum is the total number of context switches. */
    unsigned long nvcsw, nivcsw; /* context switch counts */
...
#ifdef CONFIG_DETECT_HUNG_TASK
/* hung task detection */
    /*
     * Modified in only two places: initialized to nvcsw+nivcsw when a new
     * process is created, and updated by khungtaskd.
     */
    unsigned long last_switch_count;
#endif
...
};
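The two counters are also exported to user space in /proc/<pid>/status as voluntary_ctxt_switches and nonvoluntary_ctxt_switches. The sketch below parses them from status-style text that the caller has already read into memory. The helper names status_field and status_switch_total are invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find "key:" at the start of a line in status-style text and parse the
 * unsigned number after it. Returns 0 on success, -1 if the key is absent
 * or is not followed by a number.
 */
int status_field(const char *text, const char *key, unsigned long *out)
{
    size_t klen;
    const char *line = text;

    assert(text && key && out);
    klen = strlen(key);
    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%lu", out) == 1 ? 0 : -1;
        line = strchr(line, '\n');   /* move to the next line, if any */
        if (line)
            line++;
    }
    return -1;
}

/* Sum of both counters, the same total that is compared with last_switch_count. */
int status_switch_total(const char *text, unsigned long *total)
{
    unsigned long v, nv;

    if (status_field(text, "voluntary_ctxt_switches", &v) ||
        status_field(text, "nonvoluntary_ctxt_switches", &nv))
        return -1;
    *total = v + nv;
    return 0;
}
```

Sampling this total twice for a D-state process, some time apart, reproduces by hand the comparison that khungtaskd performs.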

3.2 khungtaskd thread creation

watchdog() is the main function of the khungtaskd thread. It wakes up once every sysctl_hung_task_timeout_secs seconds and calls check_hung_uninterruptible_tasks() to check all processes.

static int watchdog(void *dummy)
{
    unsigned long hung_last_checked = jiffies;

    set_user_nice(current, 0);    /* Set the nice value of the current thread to 0, i.e. normal priority. */

    for ( ; ; ) {
        unsigned long timeout = sysctl_hung_task_timeout_secs;    /* Get the hung task time limit. */
        long t = hung_timeout_jiffies(hung_last_checked, timeout);

        if (t <= 0) {
            if (!atomic_xchg(&reset_hung_task, 0))
                check_hung_uninterruptible_tasks(timeout);
            hung_last_checked = jiffies;
            continue;
        }
        schedule_timeout_interruptible(t);    /* Sleep for up to sysctl_hung_task_timeout_secs seconds. */
    }

    return 0;
}

static int __init hung_task_init(void)
{
    atomic_notifier_chain_register(&panic_notifier_list, &panic_block);    /* Register on the panic notifier chain, to act when a panic occurs. */
    watchdog_task = kthread_run(watchdog, NULL, "khungtaskd");    /* Create the kernel thread khungtaskd. */

    return 0;
}
subsys_initcall(hung_task_init);
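The excerpt does not show hung_timeout_jiffies(). The sketch below is a user-space model of what it is assumed to compute: the number of ticks left until the next scan, with a timeout of 0 meaning "sleep indefinitely", which matches the "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" hint in the kernel message. MODEL_HZ, MODEL_MAX_TIMEOUT and model_timeout_ticks are illustrative names, and the current time is passed in explicitly instead of being read from jiffies.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_HZ 100L               /* assumed tick rate */
#define MODEL_MAX_TIMEOUT LONG_MAX  /* stands in for MAX_SCHEDULE_TIMEOUT */

/*
 * Ticks left until the next scan is due. now and last_checked are tick
 * counters. A timeout of 0 seconds disables scanning.
 */
long model_timeout_ticks(unsigned long now, unsigned long last_checked,
                         unsigned long timeout_secs)
{
    /* Keep the multiplication below from overflowing. */
    assert(timeout_secs <= (unsigned long)(LONG_MAX / MODEL_HZ));
    if (timeout_secs == 0)
        return MODEL_MAX_TIMEOUT;
    /* Unsigned wrap-around followed by the cast yields a signed difference. */
    return (long)(last_checked + timeout_secs * MODEL_HZ - now);
}
```

A result of zero or less corresponds to the t <= 0 branch in watchdog(), where the scan runs and hung_last_checked is reset.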

panic_block is registered on the panic_notifier_list notifier chain. If the system panics, did_panic is set to 1.

static int
hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
{
    did_panic = 1;

    return NOTIFY_DONE;
}

static struct notifier_block panic_block = {
    .notifier_call = hung_task_panic,
};
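The notifier pattern can be illustrated with a simplified user-space callback list. This is a sketch under stated assumptions: the model_* names are hypothetical, and the real kernel notifier chains additionally handle priorities, locking, and return codes that can stop the walk.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct notifier_block. */
struct model_notifier {
    int (*call)(unsigned long event, void *data);
    struct model_notifier *next;
};

struct model_notifier *model_chain;  /* head of the callback list */
int model_did_panic;                 /* models did_panic */

/* Add a callback at the head of the list. */
void model_register(struct model_notifier *nb)
{
    assert(nb && nb->call);
    nb->next = model_chain;
    model_chain = nb;
}

/* Invoke every registered callback, as a panic would. */
void model_notify(unsigned long event, void *data)
{
    struct model_notifier *nb;

    for (nb = model_chain; nb; nb = nb->next)
        nb->call(event, data);
}

/* Counterpart of hung_task_panic(): just remember that a panic happened. */
int model_hung_task_panic(unsigned long event, void *data)
{
    (void)event;
    (void)data;
    model_did_panic = 1;
    return 0;
}
```

The flag is what lets the scan return early once the system has already crashed.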

3.3 Checking whether a process is hung

check_hung_uninterruptible_tasks() traverses all processes and threads in the kernel, and first determines whether the state is TASK_UNINTERRUPTIBLE.

static void check_hung_uninterruptible_tasks(unsigned long timeout)
{
    int max_count = sysctl_hung_task_check_count;    /* Maximum number of tasks to check. The default is the maximum PID limit. */
    int batch_count = HUNG_TASK_BATCHING;    /* Number of tasks traversed per batch: 1024. */
    struct task_struct *g, *t;

    /*
     * If the system crashed already then all bets are off,
     * do not report extra hung tasks:
     */
    if (test_taint(TAINT_DIE) || did_panic)
        return;

    rcu_read_lock();
    for_each_process_thread(g, t) {
        if (!max_count--)
            goto unlock;
        if (!--batch_count) {
            batch_count = HUNG_TASK_BATCHING;
            /*
             * Avoid holding rcu_read_lock() for too long: release the RCU
             * lock and reschedule voluntarily. After resuming, check whether
             * the tasks being traversed still exist. If not, stop the
             * traversal, otherwise continue.
             */
            if (!rcu_lock_break(g, t))
                goto unlock;
        }
        /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
        if (t->state == TASK_UNINTERRUPTIBLE)    /* khungtaskd only monitors tasks in the TASK_UNINTERRUPTIBLE state. */
            check_hung_task(t, timeout);
    }
 unlock:
    rcu_read_unlock();
}
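The interplay of max_count and batch_count is easy to misread, so here is a small user-space model of the loop counters only. The names scan_stats and model_scan_all are invented, and the lock break is merely counted instead of being performed.

```c
#include <assert.h>

struct scan_stats {
    int visited;      /* tasks that reached the state check */
    int lock_breaks;  /* times the loop paused to drop the RCU lock */
};

/*
 * Walk n_tasks items the way the loop above does: stop once max_count
 * tasks have been seen, and pause once every `batch` iterations.
 */
struct scan_stats model_scan_all(int n_tasks, int max_count, int batch)
{
    struct scan_stats s = { 0, 0 };
    int batch_count = batch;
    int i;

    assert(n_tasks >= 0 && max_count >= 0 && batch > 0);
    for (i = 0; i < n_tasks; i++) {
        if (!max_count--)
            break;
        if (!--batch_count) {
            batch_count = batch;
            s.lock_breaks++;
        }
        s.visited++;
    }
    return s;
}
```

With a batch size of 1024, a system with 3000 tasks would see two lock breaks per scan in this model.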


static void check_hung_task(struct task_struct *t, unsigned long timeout)
{
    unsigned long switch_count = t->nvcsw + t->nivcsw;    /* Total number of context switches of this thread, voluntary and involuntary. */

    /*
     * Ensure the task is not frozen.
     * Also, skip vfork and any other user process that freezer should skip.
     */
    if (unlikely(t->flags & (PF_FROZEN | PF_FREEZER_SKIP)))
        return;

    /*
     * When a freshly created task is scheduled once, changes its state to
     * TASK_UNINTERRUPTIBLE without having ever been switched out once, it
     * musn't be checked.
     */
    if (unlikely(!switch_count))
        return;

    /*
     * If the total differs from last_switch_count, the task has been
     * switched since khungtaskd last updated last_switch_count. If they
     * are equal, no switch happened during the last 120s.
     */
    if (switch_count != t->last_switch_count) {
        t->last_switch_count = switch_count;    /* Update last_switch_count. */
        return;
    }

    trace_sched_process_hang(t);

    if (!sysctl_hung_task_warnings && !sysctl_hung_task_panic)    /* Return if neither warnings nor panic are enabled. */
        return;

    /*
     * Ok, the task did not get scheduled for more than 2 minutes,
     * complain:
     */
    if (sysctl_hung_task_warnings) {    /* The number of reports is limited. The default is 10, so at most 10 reports are printed while the system is running. */
        sysctl_hung_task_warnings--;
        pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
            t->comm, t->pid, timeout);
        pr_err("      %s %s %.*s\n",
            print_tainted(), init_utsname()->release,
            (int)strcspn(init_utsname()->version, " "),
            init_utsname()->version);
        pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
            " disables this message.\n");
        sched_show_task(t);    /* Print the task's name, state, PID, stack and other information. */
        debug_show_all_locks();    /* If debug_locks is enabled, print the locks currently held. */
    }

    touch_nmi_watchdog();

    if (sysctl_hung_task_panic) {
        trigger_all_cpu_backtrace();
        panic("hung_task: blocked tasks");
    }
}

Let's look at how sched_show_task() prints the process details:

void sched_show_task(struct task_struct *p)
{
    unsigned long free = 0;
    int ppid;
    unsigned long state = p->state;

    if (!try_get_task_stack(p))
        return;
    if (state)
        state = __ffs(state) + 1;
    printk(KERN_INFO "%-15.15s %c", p->comm,
        state < sizeof(stat_nam) - 1 ? stat_nam[state] : '?');    /* Process name and state. For a hung task the state should be D. */
    if (state == TASK_RUNNING)
        printk(KERN_CONT "  running task    ");
#ifdef CONFIG_DEBUG_STACK_USAGE
    free = stack_not_used(p);
#endif
    ppid = 0;
    rcu_read_lock();
    if (pid_alive(p))
        ppid = task_pid_nr(rcu_dereference(p->real_parent));
    rcu_read_unlock();
    printk(KERN_CONT "%5lu %5d %6d 0x%08lx\n", free,
        task_pid_nr(p), ppid,
        (unsigned long)task_thread_info(p)->flags);
    /*
     * free is the amount of unused stack; the second field is the
     * thread/process PID; the third is the parent process PID; the last
     * is the thread_info flags of the process.
     */

    print_worker_info(KERN_INFO, p);
    show_stack(p, NULL);
    put_task_stack(p);
}
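The state letter comes from indexing a table with the position of the lowest set bit plus one. The sketch below reproduces that mapping in user space. The table contents are an assumption about kernels of this generation (the excerpt does not show stat_nam), and __builtin_ctzl is a GCC/Clang builtin used here as a stand-in for the kernel's __ffs().

```c
#include <assert.h>

/* Assumed copy of the kernel's state-letter table for this kernel generation. */
const char model_stat_nam[] = "RSDTtXZxKWPNn";

/*
 * Map a task state bitmask to its letter: state 0 maps to index 0 ('R'),
 * otherwise the index is (position of the lowest set bit) + 1.
 */
char model_state_char(unsigned long state)
{
    unsigned long idx = 0;

    if (state)
        idx = (unsigned long)__builtin_ctzl(state) + 1;
    return idx < sizeof(model_stat_nam) - 1 ? model_stat_nam[idx] : '?';
}
```

With TASK_UNINTERRUPTIBLE equal to 2, the lowest set bit is bit 1, the index is 2, and the letter is D.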

The example log shows the recvComm process, with PID 175, parent PID 148, and current state D. The stack of the hung task shows a read call that is stuck in the usb_sourceslink_read() function.

4. Configuration of khungtaskd

Configure through sysctl or under /proc/sys/kernel/:

hung_task_panic ---------- whether to panic after a hung task is detected. The default value is 0.

hung_task_check_count ---- the maximum number of tasks to check. The default value is 32768 here (the kernel initializes it from PID_MAX_LIMIT, so the value depends on the platform).

hung_task_timeout_secs --- the timeout in seconds. The default value is 120; writing 0 disables the check.

hung_task_warnings ------- the number of hung task warnings to print. The default value is 10.
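These settings are plain text files, so a program can read them directly. The helper below is a minimal sketch: read_long_file is a made-up name, and the files are only expected to exist on kernels built with hung task detection.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one integer from a sysctl-style file such as
 * /proc/sys/kernel/hung_task_timeout_secs. Returns 0 on success, -1 if
 * the file cannot be opened or does not start with a number.
 */
int read_long_file(const char *path, long *out)
{
    FILE *f;
    int ok;

    assert(path && out);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = fscanf(f, "%ld", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

For example, read_long_file("/proc/sys/kernel/hung_task_timeout_secs", &secs) would fill secs with the current timeout, and a return value of -1 suggests the feature is not available.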

You can also set whether to panic on a hung task through the kernel boot arguments (bootargs), using the hung_task_panic= parameter.

/*
 * Should we panic (and reboot, if panic_timeout= is set) when a
 * hung task is detected:
 */
unsigned int __read_mostly sysctl_hung_task_panic =
                CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE;

static int __init hung_task_panic_setup(char *str)
{
    int rc = kstrtouint(str, 0, &sysctl_hung_task_panic);

    if (rc)
        return rc;
    return 1;
}
__setup("hung_task_panic=", hung_task_panic_setup);
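To make the parsing step concrete, here is a user-space approximation of handling one hung_task_panic=<n> token. It uses strtoul() with base 0 in place of kstrtouint(), so it is only a rough model: the function name model_parse_panic_arg is hypothetical, and the input checks are somewhat stricter than the kernel helper's.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "hung_task_panic=<n>" from one command-line token. <n> may be
 * decimal, octal (leading 0) or hex (leading 0x), as with base 0.
 * Returns 0 on success, -1 otherwise.
 */
int model_parse_panic_arg(const char *token, unsigned int *out)
{
    static const char key[] = "hung_task_panic=";
    const char *val;
    char *end;
    unsigned long n;

    assert(token && out);
    if (strncmp(token, key, sizeof(key) - 1) != 0)
        return -1;
    val = token + sizeof(key) - 1;
    if (*val < '0' || *val > '9')
        return -1;               /* reject empty, signed, or padded input */
    errno = 0;
    n = strtoul(val, &end, 0);
    if (errno || *end != '\0' || n > 0xffffffffUL)
        return -1;
    *out = (unsigned int)n;
    return 0;
}
```

On a real system the equivalent setting is made by appending hung_task_panic=1 to the kernel command line.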

contact information: arnoldlu@qq.com

Added by dannyone on Sat, 08 Jan 2022 07:15:18 +0200
