Figure 2-25: Time flow for initiation of load balancing on SMP systems.

Run queues are CPU-specific, so cpu denotes the processor to which the run queue belongs. The kernel provides one migration thread per run queue to which migration requests can be posted; they are kept on the list migration_queue. Such requests usually originate from the scheduler itself, but can also become necessary when a process is restricted to a certain set of CPUs and may no longer run on the one on which it is currently executing. The kernel tries to balance run queues periodically, but if this proves insufficient for a given run queue, active balancing must be used: active_balance is set to a nonzero value if this is required, and push_cpu notes the processor from which the request for active balancing originates.
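To make these fields concrete, the following is a heavily reduced sketch of the migration-related run-queue members just described. It is modeled loosely on struct rq in kernel/sched.c, but the types are simplified for illustration and it is not the real kernel definition:

```c
#include <assert.h>

struct task_sketch;                       /* stand-in for struct task_struct */

/* Hypothetical, reduced sketch of the per-CPU run queue's
 * migration-related members (not the real struct rq). */
struct rq_sketch {
    int cpu;                              /* processor this run queue belongs to */
    struct task_sketch *migration_thread; /* per-queue migration kernel thread */
    struct task_sketch *migration_queue;  /* head of the pending-request list */
    int active_balance;                   /* nonzero if active balancing is required */
    int push_cpu;                         /* CPU from which the request originates */
};
```

The scheduler posts a request by setting active_balance and push_cpu on the busiest queue and waking its migration thread, as described below.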

Furthermore, all run queues are organized into scheduling domains. This makes it possible to group CPUs that are physically adjacent to each other or share a common cache so that processes are preferably moved between them. On "normal" SMP systems, however, all processors will be contained in a single scheduling domain. I will therefore not discuss this structure in detail, but only mention that it contains numerous parameters that can be set via /proc/sys/kernel/cpuX/domainY. These include the minimal and maximal time interval after which load balancing is initiated, the minimal imbalance for a queue to be re-balanced, and so on. In addition, the structure manages fields that are set at run time and allow the kernel to keep track of when the last balancing operation was performed and when the next one will take place.
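The tunable and bookkeeping fields can be pictured as follows. This is a hypothetical, reduced sketch loosely modeled on struct sched_domain, not the actual kernel definition:

```c
#include <assert.h>

/* Hypothetical, reduced sketch of a scheduling domain
 * (loosely modeled on struct sched_domain). */
struct sched_domain_sketch {
    /* Tunables, exported via /proc/sys/kernel/cpuX/domainY: */
    unsigned long min_interval;     /* minimal balancing interval */
    unsigned long max_interval;     /* maximal balancing interval */
    unsigned int  imbalance_pct;    /* minimal imbalance (in percent)
                                       before a queue is re-balanced */
    /* Run-time bookkeeping: */
    unsigned long last_balance;     /* time of the last balancing run */
    unsigned long balance_interval; /* current interval until the next run */
};
```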

What does the periodic balancing code do? It checks whether enough time has elapsed since the last re-balancing operation and, if necessary, initiates a new re-balancing cycle by invoking load_balance. The code flow diagram for this function is shown in Figure 2-26. Note that I describe a simplified version because the SMP scheduler has to deal with a very large number of corner cases that would otherwise obstruct the view of the essential actions.
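The "enough time elapsed" check boils down to a comparison of time stamps. The following is a simplified illustration, not the kernel's actual code; the kernel works with the global jiffies counter and helpers such as time_after_eq:

```c
#include <assert.h>

/* Simplified sketch: decide whether a domain is due for re-balancing.
 * The current time is passed in explicitly here; the unsigned
 * subtraction keeps the comparison correct even when the counter
 * wraps around. */
static int balancing_due(unsigned long now,
                         unsigned long last_balance,
                         unsigned long balance_interval)
{
    /* Due when at least balance_interval ticks have passed
     * since the last balancing run. */
    return now - last_balance >= balance_interval;
}
```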

Figure 2-26: Code flow diagram for load_balance.

First of all, the function has to identify which queue has the most work to do. This task is delegated to find_busiest_queue, which is called for a specific run queue. The function iterates over the queues of all processors (or, to be precise, of all processors in the current scheduling group) and compares their load weights; the queue with the largest load weight found in the end is the busiest.
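The search is a straightforward maximum scan. The following sketch illustrates the idea on a plain array of load weights; the real find_busiest_queue operates on struct rq instances and scheduling groups:

```c
#include <assert.h>

/* Simplified sketch of the busiest-queue search: scan the load
 * weight of every CPU and remember the maximum. Returns -1 if no
 * loaded remote queue exists. */
static int find_busiest_cpu(const unsigned long *load, int ncpus,
                            int this_cpu)
{
    unsigned long max_load = 0;
    int busiest = -1;
    int cpu;

    for (cpu = 0; cpu < ncpus; cpu++) {
        if (cpu == this_cpu)          /* pulling from ourselves is pointless */
            continue;
        if (load[cpu] > max_load) {
            max_load = load[cpu];
            busiest = cpu;
        }
    }
    return busiest;
}
```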

Once find_busiest_queue has identified a very busy queue, and if at least one task is running on it (load balancing would otherwise make little sense), a suitable number of its tasks are migrated to the current queue using move_tasks. This function, in turn, invokes the scheduler-class-specific load_balance method.
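The delegation to the scheduler class can be pictured as a function-pointer dispatch. This is a schematic sketch that mirrors the kernel's pattern; the real struct sched_class interface has more operations and parameters:

```c
#include <assert.h>

/* Schematic sketch of class-specific dispatch: each scheduler class
 * provides its own balancing callback, and the generic code simply
 * invokes it (not the real struct sched_class). */
struct sched_class_sketch {
    /* returns the number of tasks actually moved */
    int (*load_balance)(int max_nr_move);
};

static int fair_load_balance(int max_nr_move)
{
    /* A real implementation would iterate the class's runnable
     * tasks and migrate the suitable ones; for illustration we
     * pretend to move everything we were asked to move. */
    return max_nr_move;
}

static const struct sched_class_sketch fair_class = {
    .load_balance = fair_load_balance,
};

static int move_tasks_sketch(const struct sched_class_sketch *class,
                             int max_nr_move)
{
    return class->load_balance(max_nr_move);
}
```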

When selecting potential migration candidates, the kernel must ensure that the process in question

□ is not running at the moment and has not just finished running, because migrating such a task would forfeit the benefit of the CPU caches currently filled with its data.

□ is allowed, by its CPU affinity, to execute on the processor associated with the current queue.
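Both criteria together can be sketched as a predicate in the spirit of the kernel's candidate check. The field names here are illustrative, not the real task_struct members:

```c
#include <assert.h>

/* Simplified migration-candidate check: a task may be pulled to
 * dst_cpu only if it is not currently running (otherwise its cache
 * contents would be wasted) and its affinity mask permits execution
 * on dst_cpu. Illustrative field names, not the real task_struct. */
struct task_sketch {
    int           running;      /* nonzero while the task is on a CPU */
    unsigned long cpus_allowed; /* affinity bitmask, one bit per CPU */
};

static int can_migrate(const struct task_sketch *p, int dst_cpu)
{
    if (p->running)
        return 0;
    if (!(p->cpus_allowed & (1UL << dst_cpu)))
        return 0;
    return 1;
}
```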

If balancing fails (e.g., because all tasks on the remote queue have a higher kernel-internal priority value, i.e., a lower nice priority), the migration thread responsible for the busiest run queue is woken up. To ensure that active load balancing is performed, which is somewhat more aggressive than the method just tried, load_balance sets the active_balance flag of the busiest run queue and also notes the CPU from which the request originates in rq->push_cpu.
