Scheduler 28 - miscellaneous summary

1, Getting CPU affinity (binding) information

1. Via /proc/<pid>/status

# cat /proc/<pid>/status | grep Cpus_allowed
Cpus_allowed:    ff
Cpus_allowed_list:    0-7

Call path and function:

struct pid_entry tgid_base_stuff[] //fs/proc/base.c
    ONE("status", S_IRUGO, proc_pid_status),
struct pid_entry tid_base_stuff[] //fs/proc/base.c
    ONE("status", S_IRUGO, proc_pid_status),
        proc_pid_status //fs/proc/array.c
            task_cpus_allowed //fs/proc/array.c

static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
{
    //Reads task->cpus_ptr rather than task->cpus_mask; %*pb prints the mask in hex, %*pbl as a CPU list
    seq_printf(m, "Cpus_allowed:\t%*pb\n", cpumask_pr_args(task->cpus_ptr));
    seq_printf(m, "Cpus_allowed_list:\t%*pbl\n", cpumask_pr_args(task->cpus_ptr));
}
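
As a user-space counterpart to method 1, a minimal sketch that simply filters the two lines out of /proc/self/status (the parsing here is illustrative only):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256];
    FILE *fp = fopen("/proc/self/status", "r");

    if (!fp)
        return 1;

    while (fgets(line, sizeof(line), fp)) {
        /* matches both "Cpus_allowed:" and "Cpus_allowed_list:" */
        if (!strncmp(line, "Cpus_allowed", 12))
            fputs(line, stdout);
    }
    fclose(fp);
    return 0;
}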

 

2. Via the sched_getaffinity() system call

//system call
int sched_getaffinity(pid_t pid, size_t cpusetsize, cpu_set_t *mask);

//Kernel implementation; mask is the result copied back to user space
long sched_getaffinity(pid_t pid, struct cpumask *mask)
{
    struct task_struct *p = find_process_by_pid(pid);

    //Uses p->cpus_mask here, ANDed with cpu_active_mask
    cpumask_and(mask, &p->cpus_mask, cpu_active_mask);

}
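
For comparison, a minimal user-space sketch of method 2 using the glibc sched_getaffinity() wrapper (pid 0 queries the calling thread):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    int cpu;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set)) /* pid 0: current thread */
        return 1;

    printf("allowed cpus:");
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set))
            printf(" %d", cpu);
    }
    printf("\n");
    return 0;
}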

 

3. Differences between the two acquisition methods

The sched_getaffinity() system call returns p->cpus_mask & cpu_active_mask, so the value it reports can differ from what was originally set if CPUs have been taken offline or isolated in the meantime. What sched_setaffinity() ultimately sets is p->cpus_mask. cat /proc/<pid>/status reads task->cpus_ptr directly, so its output is not affected by that masking.

 

4. Differences between the p->cpus_ptr pointer and the p->cpus_mask variable

The sched_setaffinity() system call eventually reaches the scheduling class's set_cpus_allowed callback. In all five scheduling classes this callback points to set_cpus_allowed_common(), which sets p->cpus_mask. Searching under kernel/sched shows cpus_ptr being used in fair.c, deadline.c, rt.c and core.c, while cpus_mask appears only in core.c. From core.c you can see that p->cpus_mask is only used on the execution paths of the sched_setaffinity()/sched_getaffinity() system calls.
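
For reference, set_cpus_allowed_common() around v5.10 is roughly the following (simplified); note that it writes only cpus_mask and the allowed-CPU count:

//kernel/sched/core.c (simplified, ~v5.10)
void set_cpus_allowed_common(struct task_struct *p, const struct cpumask *new_mask)
{
    cpumask_copy(&p->cpus_mask, new_mask);
    p->nr_cpus_allowed = cpumask_weight(new_mask);
}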

In fact, they refer to the same storage, as the fork path shows:

//fork --> copy_process --> dup_task_struct
static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
{
    ...
    if (orig->cpus_ptr == &orig->cpus_mask) //this always holds true
        tsk->cpus_ptr = &tsk->cpus_mask;
    ...
}

//Tracing back to the very first task, init_task:

//init/init_task.c
struct task_struct init_task = {
    ...
    .cpus_ptr    = &init_task.cpus_mask,
    .cpus_mask    = CPU_MASK_ALL,
    .nr_cpus_allowed= NR_CPUS, //NR_CPUS is a compile-time constant (CONFIG_NR_CPUS), e.g. 32
    ...
}
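
A user-space analogy of this aliasing (the toy_* names are purely illustrative, not kernel code): a write through the embedded mask is visible through the pointer, because they name the same storage.

#include <stdio.h>

struct toy_mask { unsigned long bits[1]; };

struct toy_task {
    const struct toy_mask *cpus_ptr; /* plays the role of p->cpus_ptr */
    struct toy_mask cpus_mask;       /* plays the role of p->cpus_mask */
};

int main(void)
{
    struct toy_task t = { .cpus_mask = { { 0xff } } };

    t.cpus_ptr = &t.cpus_mask;              /* alias, as in dup_task_struct() */
    t.cpus_mask.bits[0] = 0x0f;             /* "setaffinity" updates the mask */
    printf("via ptr: 0x%lx\n", t.cpus_ptr->bits[0]); /* prints 0xf */
    return 0;
}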

 

2, Locks in scheduling

1. Translation of the locking comment above __task_rq_lock()

(1) Serialization rules:

Locking sequence:

    p->pi_lock
        rq->lock
            hrtimer_cpu_base->lock (hrtimer_start() used by bandwidth control)

    rq1->lock
        rq2->lock  condition: rq1 < rq2
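
This address-ordering rule is what double_rq_lock() enforces; a simplified sketch (around v5.10) of taking two rq->locks in a fixed order to avoid AB-BA deadlock:

//kernel/sched/sched.h (simplified)
static inline void double_rq_lock(struct rq *rq1, struct rq *rq2)
{
    if (rq1 == rq2) {
        raw_spin_lock(&rq1->lock);
    } else if (rq1 < rq2) {
        /* always take the lower-addressed rq first */
        raw_spin_lock(&rq1->lock);
        raw_spin_lock_nested(&rq2->lock, SINGLE_DEPTH_NESTING);
    } else {
        raw_spin_lock(&rq2->lock);
        raw_spin_lock_nested(&rq1->lock, SINGLE_DEPTH_NESTING);
    }
}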

(2) General status:

Normally, scheduling state is serialized by rq->lock. __schedule() takes the local CPU's rq->lock; it may optionally remove the task from the runqueue, and it always looks at the local rq data structure to find the most eligible task to run next.

Task enqueueing is also protected by rq->lock, and may be performed from another CPU. A wakeup from another LLC domain may use an IPI to hand the enqueue over to the local CPU, to avoid bouncing the runqueue state around [see ttwu_queue_wakelist()].

Task wakeups, especially those that involve migration, are horribly complicated in order to avoid having to take two rq->locks.

(3) Special status:

System calls and anything external will use task_rq_lock(), which acquires both p->pi_lock and rq->lock. Therefore the state they change is stable while either lock is held (see the sketch after this list):

- sched_setaffinity()/set_cpus_allowed_ptr():    p->cpus_ptr, p->nr_cpus_allowed
- set_user_nice():    p->se.load, p->*prio
- __sched_setscheduler():
    p->sched_class, p->policy, p->*prio,
    p->se.load, p->rt_priority,
    p->dl.dl_{runtime, deadline, period, flags, bw, density}
- sched_setnuma():    p->numa_preferred_nid
- sched_move_task()/cpu_cgroup_fork():    p->sched_task_group
- uclamp_update_active():    p->uclamp*
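
A simplified sketch of task_rq_lock() (around v5.10), showing why holding both locks stabilizes the state listed above, and how it copes with a concurrent migration:

//kernel/sched/core.c (simplified)
struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
{
    struct rq *rq;

    for (;;) {
        raw_spin_lock_irqsave(&p->pi_lock, rf->flags);
        rq = task_rq(p);
        raw_spin_lock(&rq->lock);
        /* with both locks held, the task can no longer change rq under us */
        if (likely(rq == task_rq(p) && !task_on_rq_migrating(p)))
            return rq;
        raw_spin_unlock(&rq->lock);
        raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);

        /* the task is mid-migration: wait for it to settle, then retry */
        while (unlikely(task_on_rq_migrating(p)))
            cpu_relax();
    }
}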

(4) p->state <- TASK_*:

p->state is changed locklessly using set_current_state(), __set_current_state() or set_special_state() (see their respective comments), or by try_to_wake_up(). The latter uses p->pi_lock to serialize concurrent wakeups against each other.
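
The classic sleep/wakeup pattern illustrates this; my_condition and my_task below are placeholders, not kernel symbols:

/* sleeper side */
static bool my_condition;

static void sleeper(void)
{
    for (;;) {
        set_current_state(TASK_UNINTERRUPTIBLE); /* publish the new p->state */
        if (my_condition)
            break;
        schedule();                              /* actually go to sleep */
    }
    __set_current_state(TASK_RUNNING);
}

/* waker side */
static void waker(struct task_struct *my_task)
{
    my_condition = true;
    wake_up_process(my_task);                    /* ends up in try_to_wake_up() */
}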

(5) p->on_rq <- { 0, 1 = TASK_ON_RQ_QUEUED, 2 = TASK_ON_RQ_MIGRATING }:

p->on_rq is set by activate_task() and cleared by deactivate_task(), under rq->lock. Non-zero indicates that the task is runnable; the special ON_RQ_MIGRATING state is used during migration without holding both rq->locks, and indicates that task_cpu() is not stable. See task_rq_lock().

(6) p->on_cpu <- { 0, 1 }:

p->on_cpu is set by prepare_task() and cleared by finish_task(), so it is set before p is scheduled in and cleared after p is scheduled out, both under rq->lock. Non-zero indicates that the task is running on its CPU. [The astute reader will observe that it is possible for two tasks on one CPU to have ->on_cpu = 1 at the same time.]

(7) task_cpu(p): changed by set_task_cpu(); the rules are:

- Do not call set_task_cpu() on blocked tasks:
  We don't care which CPU we are not running on; this simplifies hotplug, and the CPU assignment of a blocked task does not need to be valid.

- For try_to_wake_up(), called under p->pi_lock:
  This allows try_to_wake_up() to take only one rq->lock; see its comment.

- For migration, called under rq->lock (see the sketch after this list):
  [see task_on_rq_migrating() in task_rq_lock()]
    move_queued_task()
    detach_task()

- For migration, called under double_rq_lock():
  __migrate_swap_task()
  push_rt_task() / pull_rt_task()
  push_dl_task() / pull_dl_task()
  dl_task_offline_migration()
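
A simplified sketch of move_queued_task() (around v5.10): deactivate_task() marks the task TASK_ON_RQ_MIGRATING, set_task_cpu() is called while rq->lock is held, and the task is re-queued on the destination rq:

//kernel/sched/core.c (simplified)
static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
                                   struct task_struct *p, int new_cpu)
{
    deactivate_task(rq, p, DEQUEUE_NOCLOCK); /* sets p->on_rq = TASK_ON_RQ_MIGRATING */
    set_task_cpu(p, new_cpu);                /* still under the source rq->lock */
    rq_unlock(rq, rf);

    rq = cpu_rq(new_cpu);
    rq_lock(rq, rf);
    activate_task(rq, p, 0);                 /* p->on_rq = TASK_ON_RQ_QUEUED again */
    check_preempt_curr(rq, p, 0);

    return rq;
}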

 

3, The scheduling classes dropped the next pointer in 5.10

Instead, the linker is used to place the scheduling classes next to each other in memory, with the lowest-priority class first and the highest-priority class last.

//fair.c: the next pointer is gone
const struct sched_class fair_sched_class __section("__fair_sched_class") = {
    ...
}

#define __section(section)    __attribute__((__section__(section)))

//include/asm-generic/vmlinux.lds.h
#define SCHED_DATA                \
    STRUCT_ALIGN();                \
    __begin_sched_classes = .;        \
    *(__idle_sched_class)            \
    *(__fair_sched_class)            \
    *(__rt_sched_class)            \
    *(__dl_sched_class)            \
    *(__stop_sched_class)            \
    __end_sched_classes = .;


//Use the following macros to access:

//kernel/sched/sched.h

/* Defined in include/asm-generic/vmlinux.lds.h */
extern struct sched_class __begin_sched_classes[];
extern struct sched_class __end_sched_classes[];

#define sched_class_highest (__end_sched_classes - 1)
#define sched_class_lowest  (__begin_sched_classes - 1)

#define for_class_range(class, _from, _to) \
    for (class = (_from); class != (_to); class--) //half-open range: includes _from, excludes _to

#define for_each_class(class) \
    for_class_range(class, sched_class_highest, sched_class_lowest)
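
The iteration order matters because the pick loop walks from the highest-priority class downwards; a simplified sketch of how pick_next_task() uses it (around v5.10, fast paths and unused parameters omitted):

//kernel/sched/core.c (simplified)
static struct task_struct *pick_next_task(struct rq *rq)
{
    const struct sched_class *class;
    struct task_struct *p;

    for_each_class(class) {
        p = class->pick_next_task(rq);
        if (p)
            return p;
    }

    BUG(); /* the idle class should always have a runnable task */
}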

 

4, cpumask_var_t and struct cpumask

typedef struct cpumask { unsigned long bits[1]; } cpumask_t;
typedef struct cpumask cpumask_var_t[1];

The cpumask_var_t type is struct cpumask[1] (this is the !CONFIG_CPUMASK_OFFSTACK definition), i.e. an array of length one. When used in an expression the array name decays to a pointer to struct cpumask, so apart from not being an assignable lvalue it behaves like struct cpumask *.

For example:

#include <stdio.h>

typedef struct cpumask { unsigned long bits[1]; } cpumask_t;
typedef struct cpumask cpumask_var_t[1];

cpumask_var_t mt = { { { 0xef } } }; //a one-element array of struct cpumask; bits[0] = 0xef

void print_mask_bit(struct cpumask *mask)
{
    printf("mask: 0x%lx\n", mask->bits[0]);
    
}

int main(void)
{
    print_mask_bit(mt);    //the array name decays to a struct cpumask * here
    return 0;
}

/*
$ gcc main.c -o pp
$ ./pp
mask: 0xef
*/

 
