Dispatcher 34-RT Load Balancing

Based on Linux-4.19.153

1. Description of related structure members

1. struct root_domain

Real-time schedulers require several global or system-wide resources to make scheduling decisions, as well as scalability bottlenecks due to increased number of CPU s (due to competition among lock-protected resources), and Root Domain was introduced to reduce such competition to improve scalability.
Cpuset provides a mechanism for dividing CPUs into subsets that are used by a process or a group of processes. Several cpusets can overlap. If no other cpuset contains overlapping CPUs, this cpuset is called "exclusive". Each mutually exclusive cpuset defines an isolated domain, also known as the root domain, that is separate from other cpusets or CPUs. There is struct root_for each root domian-related information In the domain structure (object):

 * We added the concept of root-domain to define variables for per-domain.
 * Each mutually exclusive cpuset essentially ends up with a member CPU and any other cpuset
 * Fully divided to define an island domain. Whenever we create a new exclusive cpuset, we also
 * A new root-domain object is created and attached.

struct root_domain {
    //root domain Count of references when rd Add 1 when referenced by the running queue and subtract 1 when referenced
    atomic_t        refcount;
    //Real-time task overloaded(rt overload)Of CPU Number of
    atomic_t        rto_count;
    struct rcu_head        rcu;
    //Belongs to this rd Of CPU Mask
    cpumask_var_t        span;
    cpumask_var_t        online;

     * Indicate pullable load on at least one CPU, e.g:
     * - More than one runnable task
     * - Running task is misfit
    //Indicate the rd Any CPU More than one runnable task
    int            overload;

    /* Indicate one or more cpus over-utilized (tipping point) */
    int            overutilized;

     * The bit corresponding to a CPU gets set here if such CPU has more
     * than one runnable -deadline task (as it is below for RT tasks).
    cpumask_var_t        dlo_mask;
    atomic_t        dlo_count;
    struct dl_bw        dl_bw;
    struct cpudl        cpudl;

     * Indicate whether a root_domain's dl_bw has been checked or
     * updated. It's monotonously increasing value.
     * Also, some corner cases, like 'wrap around' is dangerous, but given
     * that u64 is 'big enough'. So that shouldn't be a concern.
    u64 visit_gen;

     * For IPI pull requests, loop across the rto_mask.
    struct irq_work        rto_push_work;
    raw_spinlock_t        rto_lock;
    /* These are only updated and read within rto_lock */
    int            rto_loop;
    int            rto_cpu;
    /* These atomics are updated outside of a lock */
    atomic_t        rto_loop_next;
    atomic_t        rto_loop_start;
     * The "RT overload" flag: it gets set if a CPU has more than
     * one runnable RT task.
    //some CPU More than one runnable real-time task with corresponding bits set
    cpumask_var_t        rto_mask;
    //Included in rd In CPU Priority management structure members
    struct cpupri        cpupri;

    unsigned long        max_cpu_capacity;

     * NULL-terminated list of performance domains intersecting with the
     * CPUs of the rd. Protected by RCU.
    struct perf_domain __rcu *pd;

These RDS are used to reduce the global range of per-domain variables. Whenever a mutually exclusive cpuset is created, a new root domain object is also created, information from the CPU members. By default, a single high-level rd is created with all CPUs as members. All real-time scheduling decisions are made within the scope of only one rd.

2. struct task_struct

struct task_struct {
    struct sched_rt_entity        rt;
    #ifdef CONFIG_SMP
        /*Qualified RT considers that rq->rt.pushable_is hung through this member Tasks list, indicating a pushable task*/
        struct plist_node        pushable_tasks;
        struct rb_node            pushable_dl_tasks;


2. CPU Priority Management

1. CPU Priority Management tracks the priority of each CPU in the system to make process migration decisions more efficient. There are 102 CPU priorities. Here is the relationship between cpupri and prio:

cpupri                    prio
CPUPRI_INVALID (-1)        -1
CPUPRI_IDLE(0)            MAX_PRIO(140)
CPUPRI_NORMAL(1)        MAX_RT_PRIO ~ MAX_PRIO-1 (100~139)
2~101                    99~0

Note that CPU running idle task has cpupri=0 and CPU running CFS task has cpupri=1.

static int convert_prio(int prio)
    int cpupri;

    if (prio == CPUPRI_INVALID) /* -1 */
        cpupri = CPUPRI_INVALID; /* -1 */
    else if (prio == MAX_PRIO) /* 140 */
        cpupri = CPUPRI_IDLE; /* 0 */
    else if (prio >= MAX_RT_PRIO) /* 100 */
        cpupri = CPUPRI_NORMAL; /* 1 */
        cpupri = MAX_RT_PRIO - prio + 1; /* 100 - prio + 1 */

    return cpupri;

Pass prio=99 returns 0, and pass prio=100 returns 100.

The higher the cpupri value, the higher the priority (using subtraction). In CPUPRI_ CPUs in INVALID state are not eligible to participate in task routing. Cpupri belongs to the root domain, and each mutually exclusive cpuset consists of a root momain containing cpupri data. The system maintains these CPU states from a bitmap of two dimensions:
(1) The priority of the CPU, mapped from the task priority.
(2) CPU at a certain priority.

2. Related data structures

struct cpupri_vec {
    //At this priority CPU Quantity
    atomic_t        count;
    //At this priority CPU Bit code
    cpumask_var_t        mask;

//Reflect two dimensions
struct cpupri {
    //Hold about one cpuset All at a particular priority CPU Information
    struct cpupri_vec    pri_to_cpu[CPUPRI_NR_PRIORITIES];
    //Indicate a CPU Priority, pointing to an array, each CPU A member used primarily to record the current cpu Of cpupri Value, easy to update and modify.
    int            *cpu_to_pri;

Via cpupri_find()/cpupri_find_fitness() and cpupri_set() to find and set CPU priorities is the key to finding tasks to migrate quickly in real-time load balancing.

3. cpupri_set() function

(1) Functional analysis

 * cpupri_set - update the CPU priority setting
 * @cp: The cpupri context
 * @cpu: The target CPU
 * @newpri: The priority (INVALID,NORMAL,RT1-RT99,HIGHER) to assign to this CPU
 * Note: Assumes cpu_rq(cpu)->lock is locked
 * Returns: (void)
void cpupri_set(struct cpupri *cp, int cpu, int newpri)
    //Get Current cpu Of cpupri
    int *currpri = &cp->cpu_to_pri[cpu];
    int oldpri = *currpri;
    int do_mb = 0;

    //take p->prio Convert to cpuprio
    newpri = convert_prio(newpri);


    if (newpri == oldpri)

    //If so cpupri Changed, update this cpupri Corresponding information
    if (likely(newpri != CPUPRI_INVALID)) {
        struct cpupri_vec *vec = &cp->pri_to_cpu[newpri];

        cpumask_set_cpu(cpu, vec->mask);

        do_mb = 1;
    //Then delete the old information
    if (likely(oldpri != CPUPRI_INVALID)) {
        struct cpupri_vec *vec  = &cp->pri_to_cpu[oldpri];

        if (do_mb)
        cpumask_clear_cpu(cpu, vec->mask);

    *currpri = newpri;

For example, if a task on CPU2 switches from a CFS task at prio=120 to a RT task at prio=97, read cp->cpu_first To_ The current cpupri value of pri[2] since the CFS task was previously run on CPU2, the read value is 1. Then calculate that the new-cpupri is 101-97=4 under the new task run, and set the CPU2 mask into cp->pri_ To_ In the vec->mask of cpu[4], add the vec->count count count by one, which means that the CPU at cpupri=4 priority has an additional CPU2. Then CPU2 from cp->pri_ To_ The vec->mask of cpu[1] is deleted and the vec->count count count is reduced by 1, which means that the CPU with cpupri=1 priority is reduced by another CPU2.

(2) cpupri_ Call path of set ():

                rt_sched_class.rq_offline Callback
                    rq_offline_rt //Pass-through(cpupri, rq->cpu, CPUPRI_INVALID)
                rt_sched_class.rq_online Callback

This is mainly called in the path of the enqueue/dequeue RT task. It should be called when the task on a CPU is switched. Since the cpupri of the CFS task are all 1, only task switching involving RT will be called, and the calling functions are all in rt.c.


3. PUSH Task Migration

1. Basic idea of PUSH task

Based on cpupri, a set of CPUs with the lowest cpu priority are searched for as candidates, then a cpu is selected from the candidate CPUs as the target cpu, and then the queue on the push rq has the highest priority and the push RT task is in the past. Continue to loop until there are no pushable tasks.

Source CPU is push_ Rt_ The CPU to which RQ in the task (struct RQ *rq) parameter belongs, pushing RT tasks outward from the RQ of that cpu.


2. Timing of PUSH tasks

Push_ Rt_ The task() function is called at the following point in time:

(1) rt_mutex lock priority change, u sched_setscheduler() causes scheduling class changes, u schedule() task switching

rt_mutex_setprio //core.c
__sched_setscheduler //core.c
    check_class_changed //core.c Called when the dispatch class changes, the last dispatch class's switched_from,Call the next dispatch class again switched_to
        rt_sched_class.switched_to //rt.c Callback
            switched_to_rt //rt.c if p stay rq Above and not rq And p Run in multiple cpu Run on and rq->rt.overload Called
__schedule //core.c
    pick_next_task //core.c Select Next Task
        rt_sched_class.pick_next_task //rt.c Callback
            pick_next_task_rt //rt.c Unconditional call
                rt_queue_push_tasks //rt.c Judgement parameters rq Can be found on push Tasks, that is rq->rt.pushable_tasks Chain list is not for air conditioning
                    queue_balance_callback(rq, &per_cpu(rt_push_head, rq->cpu), push_rt_tasks); //rt.c Head insert hook rq->balance_callback linked list

Callback time:

schedule_tail //core.c No call place found
__schedule //core.c Task Switching Function Last Called
    balance_callback //core.c
        __balance_callback //core.c Callback in turn rq->balance_callback All functions on the list, hold rq->lock Turn off interrupt call

(2) When there are CPUs performing pull RT tasks, tell other CPUs to launch some tasks

pull_rt_task(rq) //rt.c Enabling RT_PUSH_IPI Will execute, triggered when the task is pulled push. rq For the current queue, tell others cpu Previously Current cpu upper push Some tasks
    tell_cpu_to_push //rt.c Yes rto Of cpu just queue
        irq_work_queue_on(&rq->rd->rto_push_work, cpu);
            rto_push_irq_work_func //Find something to do push Task, hold rq->lock spin Lock Call
                irq_work_queue_on(&rd->rto_push_work, cpu); //Own queue You, if you have rto Of cpu On and on queue Self, make up one"Kernel Threads"Run until none rto Of cpu.

    init_irq_work(&rd->rto_push_work, rto_push_irq_work_func);

(3) If the wake-up task is an RT task and the execution is not scheduled in time, push it away from the wake-up rq

ttwu_do_wakeup //core.c
wake_up_new_task //core.c
    rt_sched_class.task_woken //Schedule class callback
        task_woken_rt //rt.c

Task_ Woken_ Call push_in rt() Rt_ The conditions for tasks () are harsh, as follows. Task p represented as wake-up is not a running task on rq, and the current RQ does not have a resched flag bit set (it will not be resched immediately), and p allows it to run on other CPUs, and the task currently running by RQ is a DL or RT task, and the current task of RQ can only run on the current CPU or has a higher priority than p. Will call push_rt_tasks() will push the awakened RT task away.

static void task_woken_rt(struct rq *rq, struct task_struct *p)
    if (!task_running(rq, p) &&
        !test_tsk_need_resched(rq->curr) &&
        p->nr_cpus_allowed > 1 &&
        (dl_task(rq->curr) || rt_task(rq->curr)) &&
        (rq->curr->nr_cpus_allowed < 2 || rq->curr->prio <= p->prio))

3. Ending conditions for PUSH tasks

See push_rt_tasks(rq), push tasks outward from RQ until there are no tasks to push.


4. PUSH Task Logic Implementation - push_rt_tasks()

(1) push_rt_tasks()

//rt.c Role: from parameters rq Push up some tasks to others rq upper
static void push_rt_tasks(struct rq *rq)
    /* push_rt_task will return true if it moved an RT */
    while (push_rt_task(rq)) //If there are tasks available PUSH Will go on

(2) push_rt_task()

 * If the current CPU has more than one RT task, see if the non
 * running task can migrate over to a CPU that is running a task
 * of lesser priority.
//push Return 1 for out-of-office tasks or 0 for out-of-office tasks
static int push_rt_task(struct rq *rq)
    struct task_struct *next_task;
    struct rq *lowest_rq;
    int ret = 0;

    /* update_rt_migration()Set in, 1 extra RT task and 1 migrable RT task */
    if (!rq->rt.overloaded)
        return 0;

    //from rq->rt.pushable_tasks Chain head can be removed push Of task,The first one to take out is the one with the highest priority RT task.
    next_task = pick_next_pushable_task(rq);
    if (!next_task)
        return 0;

    //Remove should be Runnable Of, not in running Tasks
    if (unlikely(next_task == rq->curr)) {
        return 0;

     * It's possible that the next_task slipped in of
     * higher priority than current. If that's the case
     * just reschedule current.
     * Translation:
     * next_task Is probably higher than the current priority, if so,
     * Only one reschedule is triggered.
    if (unlikely(next_task->prio < rq->curr->prio)) {
        return 0;

    /* We might release rq lock */

    /* find_lock_lowest_rq locks the rq if found */
    //according to cpupri find cpu Lowest priority cpu As a Task push Goal to cpu
    lowest_rq = find_lock_lowest_rq(next_task, rq);
    //1.If lowest_rq Can't find
    if (!lowest_rq) {
        struct task_struct *task;
         * find_lock_lowest_rq releases rq->lock
         * so it is possible that next_task has migrated.
         * We need to make sure that the task is still on the same
         * run-queue and is also still the next task eligible for pushing.
         * Translation:
         * find_lock_lowest_rq Release rq->lock, so next_task may have been moved.
         * You need to make sure that the task is still in this rq and is still the next one eligible for pushing. therefore
         * This function needs to be executed again.
        task = pick_next_pushable_task(rq);
        //(1)Reselected pending push task Or the original task
        if (task == next_task) {
             * The task hasn't migrated, and is still the next
             * eligible task, but we failed to find a run-queue
             * to push it to.  Do not retry in this case, since
             * other CPUs will pull from us when ready.
             * Translation:
             * The task hasn't been migrated yet and is still the next eligible task, but we couldn't find the destination to push it to
             * Standard run queue. Do not try again in this case, as the other CPU s will pull from us when they are ready.
            goto out;

        //(2)Reselected pending push task Not original task
        if (!task)
            /* No more tasks, just exit */
            goto out;

        /* Something has shifted, try again. */
        //Re-selected is different task Yes, try again
        next_task = task;
        goto retry;

    //2.If lowest_rq If not found, move the task from rq Pick up and down to lowest_rq upper
    deactivate_task(rq, next_task, 0);
    set_task_cpu(next_task, lowest_rq->cpu);
    activate_task(lowest_rq, next_task, 0);
    ret = 1;

    //To target lowest_rq Trigger a reschedule

    //CONFIG_LOCKDEP Relevant, if not enabled, just release lowest_rq->lock
    double_unlock_balance(rq, lowest_rq);


    return ret;

(3) pick_next_pushable_task()

Select the highest priority RT task with queue status on rq and the highest priority RT task with push.

static struct task_struct *pick_next_pushable_task(struct rq *rq) //rt.c
    struct task_struct *p;

    //rq->rt.pushable_tasks A non-empty list means there is something to do push Tasks
    if (!has_pushable_tasks(rq))
        return NULL;

    //first This also makes the one with the highest priority on the list RT Tasks, that is push rq upper queue Of the highest priority RT task
    p = plist_first_entry(&rq->rt.pushable_tasks, struct task_struct, pushable_tasks);

    BUG_ON(rq->cpu != task_cpu(p)); /*Check p is mounted on this cpu rq*/
    BUG_ON(task_current(rq, p)); /*return rq->curr == p;Check p is not a running task*/
    BUG_ON(p->nr_cpus_allowed <= 1);/*Check p is allowed to run on more than one cpu, otherwise it cannot push*/

    BUG_ON(!task_on_rq_queued(p)); /*return p->on_rq == TASK_ON_RQ_QUEUED; Check p is queue on rq*/
    BUG_ON(!rt_task(p)); /*return prio < 100 Check p must be an RT task*/

    return p;

(4) find_lock_lowest_rq()

Find the cpu with the lowest cpu priority as the target cpu for the push task based on cpupri.

/* Will lock the rq it finds */
static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
    struct rq *lowest_rq = NULL;
    int tries;
    int cpu;

    //Maximum try 3 second
    for (tries = 0; tries < RT_MAX_TRIES; tries++) {
        //Select one cpu Lowest priority cpu(than task Running cpu Low priority, otherwise return-1)
        cpu = find_lowest_rq(task);
        //If not found cpu==-1 Perhaps true, or later permanently not true
        if ((cpu == -1) || (cpu == rq->cpu))

        //Eureka task To be push Goal to cpu Of rq
        lowest_rq = cpu_rq(cpu);

        //this if The judgement is possible because it is not held lowest_rq Lock, it may have another queue High priority tasks
        if (lowest_rq->rt.highest_prio.curr <= task->prio) {
             * Target rq has tasks of equal or higher priority,
             * retrying does not release any lock and is unlikely
             * to yield a different result.
             * Translation:
             * Target rq has the same or higher priority tasks, retrying will not release any locks
             * And it is unlikely to produce different results. So give up retry and go back and find no lowest_rq.
            lowest_rq = NULL;

        /* if the prio of this runqueue changed, try again */
        //? ###############
        //Here are some checks to see if the environment has changed to see if you should lowest_rq Set as NULL
        if (double_lock_balance(rq, lowest_rq)) {
             * We had to unlock the run queue. In the mean time, task could have
             * migrated already or had its affinity changed.
             * Also make sure that it wasn't scheduled on its rq.
            if (unlikely(task_rq(task) != rq ||
                     !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_allowed) ||
                     task_running(rq, task) ||
                     !rt_task(task) ||
                     !task_on_rq_queued(task))) {

                double_unlock_balance(rq, lowest_rq);
                lowest_rq = NULL;

        /* If this rq is still suitable use it. */
        //Probably successful,Satisfaction is not retry Yes, just go back to the one you found lowest_rq
        if (lowest_rq->rt.highest_prio.curr > task->prio)

        /* try again */
        double_unlock_balance(rq, lowest_rq);
        lowest_rq = NULL;

    return lowest_rq;


5. When to go to rq->rt.pushable_ Add pushable tasks to tasks list

In enqueue_ Task_ In rt, p->pushable_is only hung if P is not an executing task and can be run on multiple CPU s Tasks list, the smaller the p->prio, the higher the priority hangs in the front position.

static void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
    plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
    plist_node_init(&p->pushable_tasks, p->prio);
    //p->prio The smaller the value, the higher the insertion position
    plist_add(&p->pushable_tasks, &rq->rt.pushable_tasks);

    /* Update the highest prio pushable task */
    if (p->prio < rq->rt.highest_prio.next)
        rq->rt.highest_prio.next = p->prio;

static void dequeue_pushable_task(struct rq *rq, struct task_struct *p)
    plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);

    /* Update the new highest prio pushable task */
    if (has_pushable_tasks(rq)) {
        p = plist_first_entry(&rq->rt.pushable_tasks, struct task_struct, pushable_tasks);
        rq->rt.highest_prio.next = p->prio;
    } else
        rq->rt.highest_prio.next = MAX_RT_PRIO;

Call path:

    enqueue_task_rt //rt.c At the end of the function, only if p Satisfaction is not an ongoing task and can be more than one CPU Run on to invoke
        enqueue_pushable_task(rq, p);

    dequeue_task_rt //Unconditional execution
        dequeue_pushable_task(rq, p);


6. rt_ Setup of rq->overloaded flag

Judge rt_when enqueue/dequeue RT task Real-time updates for migrable real-time tasks are available on rq.

static void update_rt_migration(struct rt_rq *rt_rq)
    if (rt_rq->rt_nr_migratory && rt_rq->rt_nr_total > 1) {
        if (!rt_rq->overloaded) {
            //rd->rto_count++ And setting updates rd->rto_mask cpu Mask
            rt_rq->overloaded = 1;
    } else if (rt_rq->overloaded) {
        //rd->rto_count-- And clear updates rd->rto_mask cpu Mask
        rt_rq->overloaded = 0;

Call path:

        inc_rt_migration //Unconditional call, unconditional rt_rq->rt_nr_total++,if p Allow more than one cpu Run Up to Execute rt_rq->rt_nr_migratory++;
        dec_rt_migration //Unconditional call, unconditional execution rt_rq->rt_nr_total--,if p Allow more than one cpu Run Up to Execute rt_rq->rt_nr_migratory--;


IV. PULL Task Migration

1. Basic idea of PULL task

When the next RT task is selected, if the highest priority RT task on rq is found to have a lower priority than prev, the pull rt task is considered necessary. There are two cases:

(1) Do not enable RT_PUSH_IPI

Pull up the rt task from the CPU with the highest priority of runnable RT than yourself, do this for each cpu, and then trigger this CPU preemptive scheduling.

(2) Enable RT_PUSH_IPI

Use queue irq_on each rto cpu one by one Work triggers rto cpu to push task, then uses push task processing logic to replace pull task with push task.

2. Timing of PULL tasks

rt_mutex_setprio //core.c
__sched_setscheduler //core.c
    check_class_changed //core.c
            switched_from_rt //if p yes runnable Of rt Task and rq No more on rt Task in Run Call
            prio_changed_rt //if p Is the task currently being executed and has a lower priority call
                    queue_balance_callback(rq, &per_cpu(rt_pull_head, rq->cpu), pull_rt_task);
                    pick_next_task_rt //Judgment Needs pull When pull

Execute when you want to select the next RT task. Need_ Pull_ Rt_ The task is used to determine whether a pull task is required, as long as the highest priority of the queue's RT thread on the current RQ is lower than that of the prev task, the pull task is considered necessary in the rq.

static inline bool need_pull_rt_task(struct rq *rq, struct task_struct *prev)
    /* Try to pull RT tasks here if we lower this rq's prio */
    return rq->rt.highest_prio.curr > prev->prio;

Call time two, queue_balance_callback, same as push_rt_task


3. Logical implementation of PULL tasks--pull_rt_task()

3.1 First look at not enabled RT_ PUSH_ Situation with IPI sched feat

(1) pull_rt_task function

static void pull_rt_task(struct rq *this_rq)
    int this_cpu = this_rq->cpu, cpu;
    bool resched = false;
    struct task_struct *p;
    struct rq *src_rq;
    //return rq->rd->rto_count, As long as one cpu Add 1 if there are migrable tasks
    int rt_overload_count = rt_overloaded(this_rq);

    if (likely(!rt_overload_count))

     * Match the barrier from rt_set_overloaded; this guarantees that if we
     * see overloaded we must also see the rto_mask bit.

    /* If we are the only overloaded CPU do nothing */
    //Currently only Ben cpu One is rt_overload,Then there is no need to pull rt Task is coming
    if (rt_overload_count == 1 && cpumask_test_cpu(this_rq->cpu, this_rq->rd->rto_mask))

    //If this is enabled feature,Notify others CPU Push Task to this rq,Instead of pulling
    if (sched_feat(RT_PUSH_IPI)) {

    //For each rt Overloaded cpu All Executed
    for_each_cpu(cpu, this_rq->rd->rto_mask) {
        //Skip this cpu,Definitely not subordinate to this cpu Upper Book cpu Drop-up task
        if (this_cpu == cpu)

        src_rq = cpu_rq(cpu);

         * Don't bother taking the src_rq->lock if the next highest
         * task is known to be lower-priority than our current task.
         * This may look racy, but if this value is about to go
         * logically higher, the src_rq will push this task away.
         * And if its going logically lower, we do not care
         * Translation:
         * If the next highest priority task is known to have a lower priority than the current one, no
         * Hold src_ Rq->lock. This may seem like competition, but if this value is logically imminent
         * Getting taller, src_rq will push this task away. If it falls logically, we don't care
        //Select only those with the highest priority higher than yourself as an alternative src_rq (enqueue Will be updated)
        if (src_rq->rt.highest_prio.next >= this_rq->rt.highest_prio.curr)

         * We can potentially drop this_rq's lock in
         * double_lock_balance, and another CPU could alter this_rq
         * Translation:
         * In double_ Lock_ This_may be released in balance RQ locks, and the other
         * CPU This_may be changed RQ
        double_lock_balance(this_rq, src_rq);

         * We can pull only a task, which is pushable on its rq, and no others.
        //from src_rq Select the one with the highest priority runnable Of RT task
        p = pick_highest_pushable_task(src_rq, this_cpu);

         * Do we have an RT task that preempts the to-be-scheduled task?
        if (p && (p->prio < this_rq->rt.highest_prio.curr)) {
            WARN_ON(p == src_rq->curr); //Selected RT Task cannot be src_rq Tasks in progress on
            WARN_ON(!task_on_rq_queued(p)); //Selected RT Task cannot be non-right runnable Tasks

             * There's a chance that p is higher in priority than what's currently
             * running on its CPU. This is just that p is wakeing up and hasn't had
             * a chance to schedule. We only pull p if it is lower in priority than
             * the current task on the run queue.
            /* If selected from src_ p ratio src_selected on RQ The task being executed on RQ has a higher priority, so don't skip it.
             * Because it can preempt low-priority tasks and be scheduled for execution quickly.
            if (p->prio < src_rq->curr->prio)
                goto skip;

            resched = true;

            //From Source src_rq Pick it up and put it down this_rq
            deactivate_task(src_rq, p, 0);
            set_task_cpu(p, this_cpu);
            activate_task(this_rq, p, 0);
             * We continue with the search, just in
             * case there's an even higher prio task
             * in another runqueue. (low likelihood
             * but possible)
        double_unlock_balance(this_rq, src_rq);

    //if pull When a task arrives, a preemption schedule is triggered
    if (resched)

(2) pick_highest_pushable_task function

 * Return the highest pushable rq's task, which is suitable to be executed
 * on the CPU, NULL otherwise
//Pass-through: rq: source rq, cpu: Destination cpu
static struct task_struct *pick_highest_pushable_task(struct rq *rq, int cpu)
    struct plist_head *head = &rq->rt.pushable_tasks;
    struct task_struct *p;

    //judge rq->rt.pushable_tasks Empty representation rq Nothing on top push Tasks
    if (!has_pushable_tasks(rq))
        return NULL;

     * Traverse src_by priority from high to low Every push able task on rq, if not
     * running And affinity allows the first qualified task to be returned when running on the target cpu p
    plist_for_each_entry(p, head, pushable_tasks) {
        if (pick_rt_task(rq, p, cpu))
            return p;

    return NULL;

static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
    if (!task_running(rq, p) && cpumask_test_cpu(cpu, &p->cpus_allowed))
        return 1;

    return 0;


3.2 enable RT_ PUSH_ Situation with IPI sched feat

pull_rt_task logic delegated to tell_cpu_to_push(this_rq), let other cpu go to this_ push tasks on RQ replace pull tasks to reduce lock competition caused by pull tasks.

(1) tell_cpu_to_push() function:

static void tell_cpu_to_push(struct rq *rq)
    int cpu = -1;

    /* Keep the loop going if the IPI is currently active */
    //The only place to increase its value, no place to lower it

    /* Only one CPU can initiate a loop at a time */
    if (!rto_start_trylock(&rq->rd->rto_loop_start))


     * The rto_cpu is updated under the lock, if it has a valid CPU
     * then the IPI is still running and will continue due to the
     * update to loop_next, and nothing needs to be done here.
     * Otherwise it is finishing up and an ipi needs to be sent.
    //Initialized as-1,Only in rto_next_cpu Medium assignment cpu id or-1
    if (rq->rd->rto_cpu < 0)
        //Return a rt overload Of cpu
        cpu = rto_next_cpu(rq->rd);


    //take rd->rto_loop_start Set to 0

    if (cpu >= 0) {
        /* Make sure the rd does not get freed while pushing. rd->refcount++;*/
        //Directional parameter cpu Designated CPU upper queue One irq_work
        irq_work_queue_on(&rq->rd->rto_push_work, cpu); rto_push_irq_work_func

(2) rto_next_cpu function:

static int rto_next_cpu(struct root_domain *rd)
    int next;
    int cpu;

     * When starting the IPI RT pushing, the rto_cpu is set to -1,
     * rt_next_cpu() will simply return the first CPU found in
     * the rto_mask.
     * If rto_next_cpu() is called with rto_cpu is a valid CPU, it
     * will return the next CPU found in the rto_mask.
     * If there are no more CPUs left in the rto_mask, then a check is made
     * against rto_loop and rto_loop_next. rto_loop is only updated with
     * the rto_lock held, but any CPU may increment the rto_loop_next
     * without any locking.
    for (;;) {

        /* When rto_cpu is -1 this acts like cpumask_first() */
        cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);

        rd->rto_cpu = cpu;

        //Normally it's back here
        if (cpu < nr_cpu_ids)
            return cpu;

        //rto_mask Not in cpu Mask, assign value-1
        rd->rto_cpu = -1;

         * ACQUIRE ensures we see the @rto_mask changes
         * made prior to the @next value observed.
         * Matches WMB in rt_set_overload().
        next = atomic_read_acquire(&rd->rto_loop_next);

        if (rd->rto_loop == next)

        rd->rto_loop = next;

    return -1;

(3) rto_push_irq_work_func function.

Note that this is called in the context of a hard interrupt.

/* Called from hardirq context */
void rto_push_irq_work_func(struct irq_work *work)
    struct root_domain *rd =container_of(work, struct root_domain, rto_push_work);
    struct rq *rq;
    int cpu;

    rq = this_rq();

     * We do not need to grab the lock to check for has_pushable_tasks.
     * When it gets updated, a check is made if a push is possible.
    if (has_pushable_tasks(rq)) {
        //trigger push Process of the task


    /* Pass the IPI to the next rt overloaded queue */
    //Take Next rto cpu
    cpu = rto_next_cpu(rd);


    if (cpu < 0) {

    /* Try the next RT overloaded CPU */
     * Own queue itself, but queue's cpu is the next rto cpu until all
     * The rto cpu of the push task executed before stopping.
    irq_work_queue_on(&rd->rto_push_work, cpu); //rto_push_irq_work_func


Added by typo on Sun, 06 Mar 2022 20:04:39 +0200