A Deeper Look at Wait Queues

The previous discussion is all that most driver writers will need to know to get their job done. Some, however, will want to dig deeper. This section attempts to get the curious started; everybody else can skip to the next section without missing much that is important.

The wait_queue_head_t type is a fairly simple structure, defined in <linux/wait.h>. It contains only a lock variable and a linked list of sleeping processes. The individual data items in the list are of type wait_queue_t, and the list is the generic list defined in <linux/list.h> and described in "Linked Lists" in Chapter 10. Normally the wait_queue_t structures are allocated on the stack by functions like interruptible_sleep_on; the structures end up in the stack because they are simply declared as automatic variables in the relevant functions. In general, the programmer need not deal with them.
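
As a rough illustration, the layout just described looks like the following simplified sketch. The real <linux/wait.h> definitions also include optional debugging fields, and the lock type depends on the kernel configuration, so treat this as a schematic rather than the actual declarations:

typedef struct __wait_queue_head {
    spinlock_t lock;              /* protects the list of sleepers */
    struct list_head task_list;   /* the sleeping processes */
} wait_queue_head_t;

typedef struct __wait_queue {
    struct task_struct *task;     /* the process sleeping on this entry */
    struct list_head task_list;   /* links this entry into the head's list */
} wait_queue_t;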

Some advanced applications, however, can require dealing with wait_queue_t variables directly. For these, it's worth a quick look at what actually goes on inside a function like interruptible_sleep_on. The following is a simplified version of the implementation of interruptible_sleep_on to put a process to sleep:

void simplified_sleep_on(wait_queue_head_t *queue)
{
    wait_queue_t wait;

    init_waitqueue_entry(&wait, current);
    current->state = TASK_INTERRUPTIBLE;
    add_wait_queue(queue, &wait);
    schedule();
    remove_wait_queue(queue, &wait);
}

The code here creates a new wait_queue_t variable (wait, which gets allocated on the stack) and initializes it. The state of the task is set to TASK_INTERRUPTIBLE, meaning that it is in an interruptible sleep. The wait queue entry is then added to the queue (the wait_queue_head_t * argument). Then schedule is called, which relinquishes the processor to somebody else. schedule returns only when somebody else has woken up the process and set its state to TASK_RUNNING. At that point, the wait queue entry is removed from the queue, and the sleep is done.
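
The other side of the exchange is a call to wake_up or wake_up_interruptible on the same queue, typically from an interrupt handler or from another process's write method. A minimal sketch, using hypothetical names (dev->queue and dev->data_ready are not defined anywhere in this section):

/* Wake-up side: the sleeper above resumes inside schedule() */
dev->data_ready = 1;
wake_up_interruptible(&dev->queue);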

Figure 5-1 shows the internals of the data structures involved in wait queues and how they are used by processes.

[Figure 5-1 (not reproduced here) diagrams the wait_queue_head_t structure (spinlock_t lock; struct list_head task_list) attached to the device structure, and the wait_queue_t entries (struct task_struct *task; struct list_head task_list) that live on each sleeping process's stack page. Its three panels show: no process sleeping on the queue, the current process sleeping on the device's queue, and several processes sleeping on the same queue.]

Figure 5-1. Wait queues in Linux 2.4

A quick look through the kernel shows that a great many procedures do their sleeping "manually" with code that looks like the previous example. Most of those implementations date back to kernels prior to 2.2.3, before wait_event was introduced. As suggested, wait_event is now the preferred way to sleep on an event, because interruptible_sleep_on is subject to unpleasant race conditions. A full description of how that can happen will have to wait until "Going to Sleep Without Races" in Chapter 9; the short version is that things can change in the time between when your driver decides to sleep and when it actually gets around to calling interruptible_sleep_on.
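
For comparison, the race-free alternative is simple to use. A minimal sketch, again with hypothetical names for the queue and the driver-maintained condition flag:

/* wait_event_interruptible re-tests the condition after queuing the
 * process, so a wake-up arriving in the window described above is not
 * lost; it returns nonzero if the sleep was interrupted by a signal. */
if (wait_event_interruptible(dev->queue, dev->data_ready))
    return -ERESTARTSYS;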

One other reason for calling the scheduler explicitly, however, is to do exclusive waits. There can be situations in which several processes are waiting on an event; when wake_up is called, all of those processes will try to execute. Suppose that the event signifies the arrival of an atomic piece of data. Only one process will be able to read that data; all the rest will simply wake up, see that no data is available, and go back to sleep.

This situation is sometimes referred to as the "thundering herd problem." In high-performance situations, thundering herds can waste resources in a big way. The creation of a large number of runnable processes that can do no useful work generates a large number of context switches and processor overhead, all for nothing. Things would work better if those processes simply remained asleep.

For this reason, the 2.3 development series added the concept of an exclusive sleep. If processes sleep in an exclusive mode, they are telling the kernel to wake only one of them. The result is improved performance in some situations.

The code to perform an exclusive sleep looks very similar to that for a regular sleep:

void simplified_sleep_exclusive(wait_queue_head_t *queue)
{
    wait_queue_t wait;

    init_waitqueue_entry(&wait, current);
    current->state = TASK_INTERRUPTIBLE | TASK_EXCLUSIVE;
    add_wait_queue_exclusive(queue, &wait);
    schedule();
    remove_wait_queue(queue, &wait);
}

Adding the TASK_EXCLUSIVE flag to the task state indicates that the process is in an exclusive wait. The call to add_wait_queue_exclusive is also necessary, however. That function adds the process to the end of the wait queue, behind all others. The purpose is to leave any processes in nonexclusive sleeps at the beginning, where they will always be awakened. As soon as wake_up hits the first exclusive sleeper, it knows it can stop.
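
The wake-up walk the paragraph describes can be pictured with a sketch like the following. This is a schematic of the behavior only, not the actual kernel implementation; it omits the locking and bookkeeping the real wake-up code performs:

static void sketch_wake_up(wait_queue_head_t *queue)
{
    struct list_head *entry;

    list_for_each(entry, &queue->task_list) {
        wait_queue_t *wait = list_entry(entry, wait_queue_t, task_list);
        struct task_struct *p = wait->task;
        int exclusive = p->state & TASK_EXCLUSIVE;

        wake_up_process(p);   /* set the task back to TASK_RUNNING */
        if (exclusive)
            break;            /* first exclusive sleeper: stop here */
    }
}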

The attentive reader may have noticed another reason to manipulate wait queues and the scheduler explicitly. Whereas functions like sleep_on will block a process on exactly one wait queue, working with the queues directly allows sleeping on multiple queues simultaneously. Most drivers need not sleep on more than one queue; if yours is the exception, you will need to use code like what we've shown.
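
For that exceptional case, a minimal sketch following the same pattern as simplified_sleep_on above (the function name is ours, for illustration only):

void simplified_sleep_on_two(wait_queue_head_t *q1, wait_queue_head_t *q2)
{
    wait_queue_t wait1, wait2;

    init_waitqueue_entry(&wait1, current);
    init_waitqueue_entry(&wait2, current);
    current->state = TASK_INTERRUPTIBLE;
    add_wait_queue(q1, &wait1);
    add_wait_queue(q2, &wait2);
    schedule();     /* a wake_up on either queue sets us running again */
    remove_wait_queue(q1, &wait1);
    remove_wait_queue(q2, &wait2);
}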

Those wanting to dig even deeper into the wait queue code can look at <linux/sched.h> and kernel/sched.c.
