The Main Scheduler

The main scheduler function (schedule) is invoked directly at many points in the kernel to allocate the CPU to a process other than the currently active one. After returning from system calls, the kernel also checks whether the reschedule flag TIF_NEED_RESCHED of the current process is set (for example, the flag is set by scheduler_tick, as mentioned above). If it is, the kernel invokes schedule. The function then assumes that the currently active task is definitely to be replaced with another task.

Before I discuss schedule in detail, I need to make one remark that concerns the __sched prefix. This is used for functions that can potentially call schedule, including the schedule function itself. The declaration looks as follows:

    void __sched some_function(...) {
        ...
    }

The purpose of the prefix is to put the compiled code of the function into a special section of the object file, namely, .sched.text (see Appendix C for more information on ELF sections). This information enables the kernel to ignore all scheduling-related calls when a stack dump or similar information needs to be shown. Since the scheduler function calls are not part of the regular code flow, they are of no interest in such cases.
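To make the section mechanism concrete, here is a minimal userspace sketch. It assumes GCC or Clang on an ELF target such as Linux. The names demo_sched, demo_wait_for_event, demo_text_range, and demo_in_sched_functions are hypothetical and exist only for illustration; the range check is modeled loosely on how a stack dumper could skip addresses that fall inside the scheduler section, and it takes the section boundaries as plain parameters rather than from a linker script.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's __sched marker (assumption: GCC or
 * Clang on an ELF target). noinline keeps the code in the named section. */
#define demo_sched __attribute__((__section__(".sched.text"), __noinline__))

/* This function is emitted into .sched.text instead of the default .text. */
int demo_sched demo_wait_for_event(int ticks)
{
    return ticks > 0 ? ticks - 1 : 0;
}

/* Hypothetical description of the address range covered by the section. */
struct demo_text_range {
    uintptr_t start;   /* first address inside the section */
    uintptr_t end;     /* first address past the section */
};

/* Returns nonzero if addr lies inside the scheduler text range, so that a
 * stack dumper could skip the corresponding frame. */
static int demo_in_sched_functions(const struct demo_text_range *r,
                                   uintptr_t addr)
{
    return addr >= r->start && addr < r->end;
}
```

Compiling this to an object file and listing its sections (for example with readelf -S) should show a separate .sched.text entry next to .text.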

Let's come back to the implementation of the main scheduler schedule. The function first determines the current run queue and saves a pointer to the task structure of the (still) active process in prev.

kernel/sched.c
asmlinkage void __sched schedule(void)
{
    struct task_struct *prev, *next;
    struct rq *rq;
    int cpu;

need_resched:
    cpu = smp_processor_id();
    rq = cpu_rq(cpu);
    prev = rq->curr;
    ...

As in the periodic scheduler, the kernel takes the opportunity to update the run queue clock. It also clears the reschedule flag TIF_NEED_RESCHED in the task structure of the currently running task.
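The reschedule flag is a single bit in a per-thread flag word. The following sketch shows the set, test, and clear operations on such a word. It is a simplified model: demo_thread_info and the demo_* helpers are hypothetical names, the bit index is arbitrary, and plain (non-atomic) bit operations are used where the kernel relies on atomic ones.

```c
#include <stdbool.h>

/* Bit index chosen for illustration only; the kernel's TIF_NEED_RESCHED
 * value is defined per architecture. */
#define DEMO_TIF_NEED_RESCHED 3

/* Hypothetical stand-in for the per-thread flag word. */
struct demo_thread_info {
    unsigned long flags;
};

/* Request a reschedule (what the periodic tick would do). */
static void demo_set_flag(struct demo_thread_info *ti, int flag)
{
    ti->flags |= 1UL << flag;
}

/* Withdraw the request (what schedule does at the start of a pass). */
static void demo_clear_flag(struct demo_thread_info *ti, int flag)
{
    ti->flags &= ~(1UL << flag);
}

/* Check whether a request is pending. */
static bool demo_test_flag(const struct demo_thread_info *ti, int flag)
{
    return (ti->flags >> flag) & 1UL;
}
```

Clearing one bit leaves all other flags in the word untouched, which is why the flag can share a word with unrelated per-thread state.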




Again thanks to the modular structure of the scheduler, most work can be delegated to the scheduling classes. If the current task was in an interruptible sleep but has now received a signal, it must be promoted to a running task again. Otherwise, the task is deactivated with the scheduler-class-specific methods (deactivate_task essentially ends up calling sched_class->dequeue_task):

kernel/sched.c
    if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
                 unlikely(signal_pending(prev)))) {
        prev->state = TASK_RUNNING;
    } else {
        deactivate_task(rq, prev, 1);
    }
    ...
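The decision described above can be modeled in a few lines of plain C. This is only a sketch: demo_task, its fields, and the DEMO_* state values are hypothetical simplifications. It also assumes that a task which is still in the running state stays on the run queue, a guard that the excerpt above does not show.

```c
#include <stdbool.h>

/* Simplified task states (hypothetical values, used as bit flags). */
enum demo_state {
    DEMO_RUNNING         = 0,
    DEMO_INTERRUPTIBLE   = 1,
    DEMO_UNINTERRUPTIBLE = 2
};

/* Hypothetical, heavily reduced task descriptor. */
struct demo_task {
    int  state;
    bool signal_pending;
    bool on_rq;            /* true while the task sits on the run queue */
};

/* Decide what happens to the outgoing task.
 * Returns true if the task was taken off the run queue. */
static bool demo_prepare_prev(struct demo_task *prev)
{
    if (prev->state == DEMO_RUNNING)
        return false;                 /* still runnable: leave it queued */

    if ((prev->state & DEMO_INTERRUPTIBLE) && prev->signal_pending) {
        prev->state = DEMO_RUNNING;   /* a signal ends the sleep early */
        return false;
    }

    prev->on_rq = false;              /* models deactivate_task */
    return true;
}
```

Note that a pending signal only rescues a task from an interruptible sleep; in this model a task in an uninterruptible sleep is dequeued regardless.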

put_prev_task first announces to the scheduler class that the currently running task is going to be replaced by another one. Note that this is not equivalent to taking the task off the run queue, but provides the opportunity to perform some accounting and bring statistics up to date. The next task that is supposed to be executed must also be selected by the scheduling class, and pick_next_task is responsible for doing so:

    prev->sched_class->put_prev_task(rq, prev);
    next = pick_next_task(rq, prev);
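The delegation to a scheduling class boils down to calls through a table of function pointers. The sketch below models this with a single toy class. All toy_* names, the fixed-size task array, and the "least accumulated runtime wins" policy are assumptions made for illustration, not the kernel's data structures or its actual selection policy.

```c
#include <stddef.h>

struct toy_rq;
struct toy_task;

/* Hypothetical, reduced scheduling class: two of the hooks named above. */
struct toy_sched_class {
    void (*put_prev_task)(struct toy_rq *rq, struct toy_task *prev);
    struct toy_task *(*pick_next_task)(struct toy_rq *rq);
};

struct toy_task {
    const char *name;
    int runnable;                         /* nonzero if the task may run */
    unsigned long runtime;                /* accumulated CPU time */
    const struct toy_sched_class *sched_class;
};

struct toy_rq {
    struct toy_task *tasks[4];
    size_t nr;                            /* number of used slots */
    unsigned long slice;                  /* time charged per pass */
};

/* Accounting only: the outgoing task stays on the run queue. */
static void toy_put_prev(struct toy_rq *rq, struct toy_task *prev)
{
    prev->runtime += rq->slice;
}

/* Pick the runnable task with the least accumulated runtime. */
static struct toy_task *toy_pick_next(struct toy_rq *rq)
{
    struct toy_task *best = NULL;

    for (size_t i = 0; i < rq->nr; i++) {
        struct toy_task *t = rq->tasks[i];
        if (t->runnable && (!best || t->runtime < best->runtime))
            best = t;
    }
    return best;
}

static const struct toy_sched_class toy_fair_class = {
    .put_prev_task  = toy_put_prev,
    .pick_next_task = toy_pick_next,
};

/* The two steps from the text: announce the replacement, then select. */
static struct toy_task *toy_select(struct toy_rq *rq, struct toy_task *prev)
{
    prev->sched_class->put_prev_task(rq, prev);
    return toy_fair_class.pick_next_task(rq);
}
```

With two runnable tasks the sketch alternates between them as their runtimes leapfrog. With only one runnable task, toy_select returns the previous task again, which corresponds to the case where no new task is selected.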

It need not necessarily be the case that a new task has been selected. If only one task is currently able to run because all others are sleeping, it will naturally be left on the CPU. If, however, a new task has been selected, then task switching at the hardware level must be prepared and executed.

kernel/sched.c
    if (likely(prev != next)) {
        rq->curr = next;
        context_switch(rq, prev, next);
    }
    ...

context_switch is the interface to the architecture-specific methods that perform a low-level context switch.

The following code checks whether the reschedule bit of the current task is set. If it is, the scheduler jumps back to the need_resched label near the start of the function, and the search for a new process recommences:

kernel/sched.c
    if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
        goto need_resched;

Notice that the above piece of code is executed in two different contexts: When no context switch has been performed, it is run directly at the end of the schedule function. If, however, a context switch has been performed, the current process will stop running right before this point — the new task has taken over the CPU. However, when the previous task is reselected to run later on, it will resume its execution directly at this point. Since prev will not point to the proper process in this case, the current thread needs to be found via current by test_thread_flag.
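The retry behavior can be illustrated with a small control-flow model. It is a sketch under simplifying assumptions: demo_cpu and its fields are hypothetical, no real context switch takes place, and the pending_wakeups counter merely simulates events (such as an interrupt) that set the reschedule flag again while a pass is in progress.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state for the model. */
struct demo_cpu {
    bool need_resched;     /* models TIF_NEED_RESCHED of the current task */
    int  pending_wakeups;  /* simulated events that set the flag again */
    int  passes;           /* how often the selection logic ran */
};

static void demo_schedule(struct demo_cpu *cpu)
{
need_resched:
    cpu->need_resched = false;   /* the flag is cleared early in each pass */
    cpu->passes++;

    /* Task selection and the context switch would happen here. While
     * they run, something may request another reschedule. */
    if (cpu->pending_wakeups > 0) {
        cpu->pending_wakeups--;
        cpu->need_resched = true;
    }

    /* Final check, made on behalf of whichever task is current now. */
    if (cpu->need_resched)
        goto need_resched;
}
```

In this model, two simulated wakeups lead to three passes through the selection logic, and the function returns only once a pass completes with the flag still clear.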
