Mandatory scheduling of processes in the Linux kernel, that is, involuntary, passive, preemptive scheduling, is mostly triggered by time. As mentioned earlier, such scheduling takes place on the eve of a process's return from system space to user space. Of course, it does not occur on every return from system space to user space. From the code fragment at ret_with_reschedule in the previous blog, it can be seen that whether schedule() is actually called at this point depends on whether the need_resched field in the current process's task_struct is 1 (rather than 0). The question therefore comes down to: when is need_resched of the current process set to 1? In the current version of the kernel, on a single CPU, this happens mainly in the following situations:
- In the clock interrupt service routine, the current process is found to have run (continuously) for too long.
- When a sleeping process is woken up, the woken process is found to be more entitled to run than the current process.
- A process changes its scheduling policy, or yields the CPU, through a system call. This case should really be regarded as active, voluntary scheduling, so such system calls cause scheduling to happen immediately.
Look at the first case first. In the previous section the reader has seen that, when scheduling, a current weight is computed for each process in the runnable (ready) process queue. For ordinary interactive applications, this weight depends mainly on the remaining time quantum of the process, that is, on the current value of the counter field in its task_struct structure. For processes with real-time requirements, that is, processes whose scheduling policy is SCHED_RR or SCHED_FIFO, the remaining quantum is irrelevant to their eligibility to run, and they all get very high weights. When all processes in the queue are interactive, that is, their scheduling policy is SCHED_OTHER, and all of them have used up their time quanta, the time quantum of each process is recalculated and reset; its new value depends mainly on the priority set for the process. While a process is running, its time quantum is decremented at every clock interrupt, so the running qualification of the current process gradually decreases. When the counter drops to 0, a scheduling is forced, depriving the current process of the CPU.
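To make the weight calculation concrete, here is a minimal user-space sketch that approximates the goodness() logic described above. It is only an illustration, not the kernel's code: the struct, the field values and the function name weight() are made up for the example, and the SMP processor-affinity bonus and the mm bonus of the real goodness() are omitted.

```c
#include <stdio.h>

/* Illustrative stand-ins for the kernel's scheduling policies. */
enum { SCHED_OTHER = 0, SCHED_FIFO = 1, SCHED_RR = 2 };

struct fake_task {
	int policy;       /* SCHED_OTHER, SCHED_FIFO or SCHED_RR        */
	int counter;      /* remaining time quantum, in clock ticks     */
	int nice;         /* -20 (highest) .. 19 (lowest), SCHED_OTHER  */
	int rt_priority;  /* 1..99, only meaningful for real-time tasks */
};

/* Rough model of goodness(): an interactive task is weighted by its
 * remaining quantum plus (20 - nice); a real-time task always gets a
 * weight above 1000, so it beats every SCHED_OTHER task. */
static int weight(const struct fake_task *p)
{
	if (p->policy == SCHED_OTHER) {
		if (p->counter == 0)
			return 0;              /* quantum used up: not eligible */
		return p->counter + 20 - p->nice;
	}
	return 1000 + p->rt_priority;          /* SCHED_FIFO / SCHED_RR */
}

int main(void)
{
	struct fake_task editor = { SCHED_OTHER, 6,  0, 0 };  /* fresh quantum  */
	struct fake_task batch  = { SCHED_OTHER, 1, 10, 0 };  /* almost spent   */
	struct fake_task rtproc = { SCHED_RR,    0,  0, 50 }; /* real-time task */

	printf("editor=%d batch=%d rt=%d\n",
	       weight(&editor), weight(&batch), weight(&rtproc));
	return 0;
}
```

When every SCHED_OTHER process has used up its quantum, schedule() recomputes each counter as roughly half its old value plus an amount derived from the nice value; that is the recalculation of the time quota mentioned above.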
In the blog on the clock interrupt, readers have seen that the clock interrupt service routine do_timer_interrupt calls a function do_timer, and have already browsed through the code of that function. In this function, for the single-CPU configuration (in the SMP configuration each CPU uses a local timer, called the APIC timer), another function, update_process_times, is called to adjust some time-related running parameters of the current process. The code is as follows:
do_timer_interrupt=>do_timer=>update_process_times
```c
/*
 * Called from the timer interrupt handler to charge one tick to the current
 * process.  user_tick is 1 if the tick is user time, 0 for system.
 */
void update_process_times(int user_tick)
{
	struct task_struct *p = current;
	int cpu = smp_processor_id(), system = user_tick ^ 1;

	update_one_process(p, user_tick, system, cpu);
	if (p->pid) {
		if (--p->counter <= 0) {
			p->counter = 0;
			p->need_resched = 1;
		}
		if (p->nice > 0)
			kstat.per_cpu_nice[cpu] += user_tick;
		else
			kstat.per_cpu_user[cpu] += user_tick;
		kstat.per_cpu_system[cpu] += system;
	} else if (local_bh_count(cpu) || local_irq_count(cpu) > 1)
		kstat.per_cpu_system[cpu] += system;
}
```
As long as the current process is not process 0 (the idle process), its counter is decremented by 1. When the counter drops to 0, need_resched in the task_struct is set to 1. The other operations in this function, including update_one_process, are only concerned with statistics; we do not care about them here, and readers can go through them by themselves.
Now look at the second case. In the kernel, when a sleeping process is to be woken up, the function wake_up_process is called. Its code is as follows:
```c
/*
 * Wake up a process. Put it on the run-queue if it's not
 * already there.  The "current" process is always on the
 * run-queue (except when the actual re-schedule is in
 * progress), and as such you're allowed to do the simpler
 * "current->state = TASK_RUNNING" to mark yourself runnable
 * without the overhead of this.
 */
inline void wake_up_process(struct task_struct * p)
{
	unsigned long flags;

	/*
	 * We want the common case fall through straight, thus the goto.
	 */
	spin_lock_irqsave(&runqueue_lock, flags);
	p->state = TASK_RUNNING;
	if (task_on_runqueue(p))
		goto out;
	add_to_runqueue(p);
	reschedule_idle(p);
out:
	spin_unlock_irqrestore(&runqueue_lock, flags);
}
```
As can be seen, the so-called wake-up consists of setting the state of the process to TASK_RUNNING, hanging the process on the runqueue (the queue of runnable processes), and then calling the function reschedule_idle. For a single-CPU configuration this function is very simple.
wake_up_process=>reschedule_idle
```c
static void reschedule_idle(struct task_struct * p)
{
#ifdef CONFIG_SMP
#else /* UP */
	int this_cpu = smp_processor_id();
	struct task_struct *tsk;

	tsk = cpu_curr(this_cpu);
	if (preemption_goodness(tsk, p, this_cpu) > 1)
		tsk->need_resched = 1;
#endif
}
```
Its purpose is to compare the woken process with the current process; if the woken process is more qualified to run, the need_resched flag of the current process is set to 1. The function preemption_goodness calculates the difference between the comprehensive weights of the two processes; its code is also defined in kernel/sched.c:
wake_up_process=>reschedule_idle=>preemption_goodness
```c
/*
 * the 'goodness value' of replacing a process on a given CPU.
 * positive value means 'replace', zero or negative means 'dont'.
 */
static inline int preemption_goodness(struct task_struct * prev,
				      struct task_struct * p, int cpu)
{
	return goodness(p, cpu, prev->active_mm) - goodness(prev, cpu, prev->active_mm);
}
```
Readers may have noticed that in reschedule_idle the pointer to the current process's task_struct is obtained not through the macro current but through another macro, cpu_curr. What is the difference between the two? Let's look first at the definition of cpu_curr, which is also in this file:
```c
#define cpu_curr(cpu) aligned_data[(cpu)].schedule_data.curr
```
I wonder whether the reader still remembers that this is the value selected in schedule() but set before the actual switch takes place (see line 586 of kernel/sched.c). Most of the time it is therefore the same as current; only during the short interval before a switch completes does it not point to the real current process. It is, however, obviously more accurate to compare the woken process with this process, because that process will already be in place by the time the CPU returns from system space to user space.
The third situation should really be regarded as voluntary yielding. However, as far as the form of the kernel code is concerned, it too sets the need_resched flag of the current process to 1, so that the process is scheduled before it returns to user space, which is why it is also covered in this blog. There are two such system calls: sched_setscheduler and sched_yield. The function of the system call sched_setscheduler is to change the scheduling policy of a process. After the user logs into the system, the scheduling policy of the first process defaults to SCHED_OTHER, that is, an interactive application without real-time requirements. When a new process is created through fork, the scheduling policy is inherited by the child process. The user can, however, change the scheduling policy through the system call sched_setscheduler. The kernel implementation of this system call, sys_sched_setscheduler, is in kernel/sched.c:
```c
asmlinkage long sys_sched_setscheduler(pid_t pid, int policy,
				       struct sched_param *param)
{
	return setscheduler(pid, policy, param);
}
```
sys_sched_setscheduler=>setscheduler
```c
static int setscheduler(pid_t pid, int policy, struct sched_param *param)
{
	struct sched_param lp;
	struct task_struct *p;
	int retval;

	retval = -EINVAL;
	if (!param || pid < 0)
		goto out_nounlock;

	retval = -EFAULT;
	if (copy_from_user(&lp, param, sizeof(struct sched_param)))
		goto out_nounlock;

	/*
	 * We play safe to avoid deadlocks.
	 */
	read_lock_irq(&tasklist_lock);
	spin_lock(&runqueue_lock);

	p = find_process_by_pid(pid);

	retval = -ESRCH;
	if (!p)
		goto out_unlock;

	if (policy < 0)
		policy = p->policy;
	else {
		retval = -EINVAL;
		if (policy != SCHED_FIFO && policy != SCHED_RR &&
				policy != SCHED_OTHER)
			goto out_unlock;
	}

	/*
	 * Valid priorities for SCHED_FIFO and SCHED_RR are 1..99, valid
	 * priority for SCHED_OTHER is 0.
	 */
	retval = -EINVAL;
	if (lp.sched_priority < 0 || lp.sched_priority > 99)
		goto out_unlock;
	if ((policy == SCHED_OTHER) != (lp.sched_priority == 0))
		goto out_unlock;

	retval = -EPERM;
	if ((policy == SCHED_FIFO || policy == SCHED_RR) &&
	    !capable(CAP_SYS_NICE))
		goto out_unlock;
	if ((current->euid != p->euid) && (current->euid != p->uid) &&
	    !capable(CAP_SYS_NICE))
		goto out_unlock;

	retval = 0;
	p->policy = policy;
	p->rt_priority = lp.sched_priority;
	if (task_on_runqueue(p))
		move_first_runqueue(p);
	current->need_resched = 1;

out_unlock:
	spin_unlock(&runqueue_lock);
	read_unlock_irq(&tasklist_lock);

out_nounlock:
	return retval;
}
```
From the code we can see that the Linux kernel has three different scheduling policies, namely SCHED_FIFO, SCHED_RR and SCHED_OTHER, and that each process must use one of them (see line 918). Besides the scheduling policy there are also scheduling parameters; the combination of a process's scheduling policy and scheduling parameters determines how it is treated by kernel scheduling.
Here capable is an inline function that checks current->cap_effective to see whether a particular flag bit is 1, that is, whether the process is allowed to perform a specific operation. The function move_first_runqueue moves the process from its current position in the runnable queue to the front of the queue (if the process is in that queue), so that it is in a more favourable position at scheduling time compared with processes of equal running qualification. Finally, need_resched of the current process is set to 1 to force a scheduling.
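Seen from user space, all of this is reached through the sched_setscheduler() library call. Below is a minimal usage sketch; the priority value 50 is arbitrary, and the program must run with CAP_SYS_NICE (typically as root), otherwise the capable() check above makes the call fail with EPERM.

```c
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(void)
{
	struct sched_param sp;

	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = 50;            /* valid range for RT policies: 1..99 */

	/* pid 0 means "the calling process"; switching to SCHED_FIFO or
	 * SCHED_RR requires CAP_SYS_NICE, as the kernel check above shows. */
	if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
		fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));
		return 1;
	}
	printf("now running with SCHED_FIFO, rt_priority %d\n", sp.sched_priority);
	return 0;
}
```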
The other system call, sched_yield, lets a running process make way for other processes without going to sleep. Its kernel implementation, sys_sched_yield, is also in kernel/sched.c:
```c
asmlinkage long sys_sched_yield(void)
{
	/*
	 * Trick. sched_yield() first counts the number of truly
	 * 'pending' runnable processes, then returns if it's
	 * only the current processes. (This test does not have
	 * to be atomic.) In threaded applications this optimization
	 * gets triggered quite often.
	 */
	int nr_pending = nr_running;

#if CONFIG_SMP
	int i;

	// Substract non-idle processes running on other CPUs.
	for (i = 0; i < smp_num_cpus; i++)
		if (aligned_data[i].schedule_data.curr != idle_task(i))
			nr_pending--;
#else
	// on UP this process is on the runqueue as well
	nr_pending--;
#endif
	if (nr_pending) {
		/*
		 * This process can only be rescheduled by us,
		 * so this is safe without any locking.
		 */
		if (current->policy == SCHED_OTHER)
			current->policy |= SCHED_YIELD;
		current->need_resched = 1;
	}
	return 0;
}
```
Unlike changing the scheduling policy or parameters, the position of the current process in the runnable queue is not changed here. It goes without saying that yielding only makes sense when there are other ready processes in the system, so we first check nr_pending, the number of processes waiting to run. The code sets the SCHED_YIELD bit in current->policy; this flag bit is cleared again during the subsequent scheduling. The relevant code is in __schedule_tail, which schedule calls after switch_to has completed the switch.
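From user space the call is simply sched_yield(). The sketch below is an illustrative example of the typical pattern: a process polls for an event but yields the CPU on each iteration so that other runnable processes are not starved while it waits. The use of SIGALRM as the "event" is just for the demonstration.

```c
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t ready = 0;

static void on_alarm(int sig) { (void)sig; ready = 1; }

int main(void)
{
	long yields = 0;

	signal(SIGALRM, on_alarm);
	alarm(1);                  /* the "event" arrives after about one second */

	/* Poll for the event, but give way to other runnable processes on
	 * every iteration instead of spinning through the whole quantum. */
	while (!ready) {
		sched_yield();
		yields++;
	}
	printf("event seen after %ld yields\n", yields);
	return 0;
}
```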
Compared with voluntary scheduling, the forced scheduling triggered by setting the current process's need_resched flag to 1 has one important difference: there is a delay between the moment the need for scheduling is discovered and the moment scheduling really takes place. This delay is called dispatch latency. Of the three situations, the third (changing the scheduling policy or yielding) is not time-sensitive. The first is triggered by time, but in fact carries no real-time requirement. The second, in which a process is woken up and found to have a higher weight than the current process, is the more urgent one.
There are two sources of wake-ups of a sleeping process. One is inter-process communication: for example, one process sends a signal to another, which readers have seen in the blog on the system call exit. Of course, inter-process communication is not limited to signals; readers will later see communication through pipes, message queues, sockets and other means in the blogs on inter-process communication. The typical scenario appears in client/server applications: a process sleeps while waiting for service requests from other processes, and when another process sends it a request by some means, it is woken from its sleep. The second source is usually more urgent: an interrupt is caused by some event, and in the interrupt service routine or in a bh function a process (or several processes) waiting for that event is woken up so that it can process the event further in user space. This situation often has stricter timing requirements. Two questions arise here. The first is whether the woken process is guaranteed to be chosen when scheduling occurs; this is ensured by the SCHED_FIFO and SCHED_RR scheduling policies together with the use of priorities. The second is when, that is, within how many microseconds, the scheduling will occur; this is not guaranteed in the current Linux kernel other than in a statistical, average sense.
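As a concrete illustration of the first wake-up source, the sketch below uses a pipe: the child blocks in read(), that is, it sleeps inside the kernel, and it is woken up and made runnable again when the parent writes a request into the pipe. This is an ordinary user-space example, not kernel code; the "client" and "server" roles and the one-second sleep are only there to make the blocking visible.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
	int fd[2];
	char buf[32];

	if (pipe(fd) == -1) { perror("pipe"); exit(1); }

	switch (fork()) {
	case -1:
		perror("fork"); exit(1);
	case 0:                                   /* child: the "server" */
		close(fd[1]);
		/* read() blocks: the child sleeps until data arrives, at which
		 * point the kernel wakes it up and it becomes runnable again. */
		if (read(fd[0], buf, sizeof(buf)) > 0)
			printf("child woken by request: %s\n", buf);
		_exit(0);
	default:                                  /* parent: the "client" */
		close(fd[0]);
		sleep(1);                         /* let the child go to sleep  */
		write(fd[1], "hello", 6);         /* this write wakes the child */
		close(fd[1]);
		wait(NULL);
	}
	return 0;
}
```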