A Sample Implementation: scullpipe
The /dev/scullpipe devices (there are four of them by default) are part of the scull module and are used to show how blocking I/O is implemented.
Within a driver, a process blocked in a read call is awakened when data arrives; usually the hardware issues an interrupt to signal such an event, and the driver awakens waiting processes as part of handling the interrupt. The scull driver works differently, so that it can be run without requiring any particular hardware or an interrupt handler. We chose to use another process to generate the data and wake the reading process; similarly, reading processes are used to wake sleeping writer processes. The resulting implementation is similar to that of a FIFO (or named pipe) filesystem node, whence the name.
The device driver uses a device structure that embeds two wait queues and a buffer. The size of the buffer is configurable in the usual ways (at compile time, load time, or runtime).
typedef struct Scull_Pipe {
    wait_queue_head_t inq, outq;       /* read and write queues */
    char *buffer, *end;                /* begin of buf, end of buf */
    int buffersize;                    /* used in pointer arithmetic */
    char *rp, *wp;                     /* where to read, where to write */
    int nreaders, nwriters;            /* number of openings for r/w */
    struct fasync_struct *async_queue; /* asynchronous readers */
    struct semaphore sem;              /* mutual exclusion semaphore */
    devfs_handle_t handle;             /* only used if devfs is there */
} Scull_Pipe;
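How the pointer fields relate to one another can be sketched in user space. The following is only a model under stated assumptions: struct pipe_model, pipe_model_init, and pipe_model_free are hypothetical names, malloc stands in for the kernel allocator, and the wait queues, semaphore, and devfs handle are left out.

```c
#include <stdlib.h>

/* Hypothetical user-space stand-in for the pointer fields of Scull_Pipe. */
struct pipe_model {
    char *buffer, *end;   /* begin of buf, one past the end of buf */
    int buffersize;       /* used in pointer arithmetic */
    char *rp, *wp;        /* where to read, where to write */
};

/* Allocate the buffer and mark the pipe empty (rp == wp). Returns 0 or -1. */
static int pipe_model_init(struct pipe_model *p, int size)
{
    p->buffer = malloc(size);
    if (!p->buffer)
        return -1;
    p->buffersize = size;
    p->end = p->buffer + size;
    p->rp = p->wp = p->buffer;
    return 0;
}

/* Release the buffer and clear every pointer that referred to it. */
static void pipe_model_free(struct pipe_model *p)
{
    free(p->buffer);
    p->buffer = p->end = p->rp = p->wp = NULL;
}
```

In this model, end always equals buffer + buffersize, and both rp and wp stay within the range from buffer up to, but not including, end.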
The read implementation manages both blocking and nonblocking input and looks like this (the puzzling first line of the function is explained later, in "Seeking a Device"):
ssize_t scull_p_read (struct file *filp, char *buf, size_t count,
                loff_t *f_pos)
{
    Scull_Pipe *dev = filp->private_data;

    if (f_pos != &filp->f_pos) return -ESPIPE;

    if (down_interruptible(&dev->sem))
        return -ERESTARTSYS;
    while (dev->rp == dev->wp) { /* nothing to read */
        up(&dev->sem); /* release the lock */
        if (filp->f_flags & O_NONBLOCK)
            return -EAGAIN;
        PDEBUG("\"%s\" reading: going to sleep\n", current->comm);
        if (wait_event_interruptible(dev->inq, (dev->rp != dev->wp)))
            return -ERESTARTSYS; /* signal: tell the fs layer to handle it */
        /* otherwise loop, but first reacquire the lock */
        if (down_interruptible(&dev->sem))
            return -ERESTARTSYS;
    }
    /* ok, data is there, return something */
    if (dev->wp > dev->rp)
        count = min(count, dev->wp - dev->rp);
    else /* the write pointer has wrapped, return data up to dev->end */
        count = min(count, dev->end - dev->rp);
    if (copy_to_user(buf, dev->rp, count)) {
        up (&dev->sem);
        return -EFAULT;
    }
    dev->rp += count;
    if (dev->rp == dev->end)
        dev->rp = dev->buffer; /* wrapped */
    up (&dev->sem);

    /* finally, awaken any writers and return */
    wake_up_interruptible(&dev->outq);
    PDEBUG("\"%s\" did read %li bytes\n", current->comm, (long)count);
    return count;
}
As you can see, we left some PDEBUG statements in the code. When you compile the driver, you can enable messaging to make it easier to follow the interaction of different processes.
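One consequence of the count computation in the read method is that a single call never crosses the end of the buffer, so a reader may get fewer bytes than are actually stored and must call read again for the rest. The sketch below models that behavior in user space. It is an illustration only: struct rd_model and model_read are hypothetical names, offsets replace the driver's char pointers, the buffer is fixed at 8 bytes, and no locking or sleeping is modeled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; offsets replace the driver's pointers. */
struct rd_model {
    char buf[8];
    size_t rp, wp;   /* rp == wp means the buffer is empty */
};

/* Copy out at most one contiguous run of data, as scull_p_read does. */
static size_t model_read(struct rd_model *m, char *out, size_t count)
{
    size_t avail;

    if (m->rp == m->wp)
        return 0;                        /* the driver would sleep or return -EAGAIN */
    if (m->wp > m->rp)
        avail = m->wp - m->rp;           /* data does not wrap */
    else
        avail = sizeof(m->buf) - m->rp;  /* stop at the end of the buffer */
    if (count > avail)
        count = avail;
    memcpy(out, m->buf + m->rp, count);
    m->rp += count;
    if (m->rp == sizeof(m->buf))
        m->rp = 0;                       /* wrapped */
    return count;
}
```

With rp at offset 6 and wp at offset 2, four bytes are stored, but the first call returns only the two bytes before the end of the buffer; a second call returns the two bytes at the start.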
Note also, once again, the use of semaphores to protect critical regions of the code. The scull code has to be careful to avoid going to sleep while it holds a semaphore; otherwise, writers would never be able to add data, and the whole thing would deadlock. This code uses wait_event_interruptible to wait for data if need be; it has to check for available data again after the wait, though, because another reader could grab the data between the time we wake up and the time we get the semaphore back.
It's worth repeating that a process can go to sleep both when it calls schedule, either directly or indirectly, and when it copies data to or from user space. In the latter case the process may sleep if the user array is not currently present in main memory. If scull sleeps while copying data between kernel and user space, it will sleep with the device semaphore held. Holding the semaphore in this case is justified since it will not deadlock the system, and since it is important that the device memory array not change while the driver sleeps.
The if statement that wraps the call to wait_event_interruptible takes care of signal handling. This statement ensures the proper and expected reaction to signals, which could have been responsible for waking up the process (since we were in an interruptible sleep). If a signal has arrived and it has not been blocked by the process, the proper behavior is to let upper layers of the kernel handle the event. To this end, the driver returns -ERESTARTSYS to the caller; this value is used internally by the virtual filesystem (VFS) layer, which either restarts the system call or returns -EINTR to user space. We'll use the same statement to deal with signal handling for every read and write implementation. Because signal_pending was introduced only in version 2.1.57 of the kernel, sysdep.h defines it for earlier kernels to preserve portability of source code.
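From the application's side, which of the two outcomes appears depends on how the signal handler was installed: with the SA_RESTART flag the call is restarted transparently, and without it read fails with errno set to EINTR. A program that wants to keep reading in the second case can retry by hand. The helper below is a minimal user-space sketch of that idea; read_retry_eintr is a hypothetical name and is not part of the scull sources.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: call read() again if a signal interrupted it. */
static ssize_t read_retry_eintr(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;   /* bytes read, 0 at end of file, or -1 with errno set */
}
```

Any other error, including EAGAIN on a nonblocking descriptor, is passed back to the caller unchanged.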
The implementation for write is quite similar to that for read (and, again, its first line will be explained later). Its only "peculiar" feature is that it never completely fills the buffer, always leaving a hole of at least one byte. Thus, when the buffer is empty, wp and rp are equal; when there is data there, they are always different.
static inline int spacefree(Scull_Pipe *dev)
{
    if (dev->rp == dev->wp)
        return dev->buffersize - 1;
    return ((dev->rp + dev->buffersize - dev->wp) % dev->buffersize) - 1;
}
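The effect of the one-byte hole can be checked with a small user-space model of the same arithmetic. This is a sketch under assumptions, not driver code: struct ring_model and its helper functions are hypothetical names, and offsets are used in place of the driver's char pointers.

```c
#include <stddef.h>

/* Hypothetical user-space model of the scullpipe ring buffer. */
struct ring_model {
    size_t size;     /* plays the role of dev->buffersize */
    size_t rp, wp;   /* offsets standing in for dev->rp and dev->wp */
};

/* Same arithmetic as spacefree(): one slot is always kept empty. */
static size_t ring_space_free(const struct ring_model *r)
{
    if (r->rp == r->wp)
        return r->size - 1;
    return ((r->rp + r->size - r->wp) % r->size) - 1;
}

/* Number of bytes a reader could consume. */
static size_t ring_data_avail(const struct ring_model *r)
{
    return (r->wp + r->size - r->rp) % r->size;
}

/* Advance the write offset by n bytes, wrapping at the end of the buffer. */
static void ring_advance_wp(struct ring_model *r, size_t n)
{
    r->wp = (r->wp + n) % r->size;
}
```

For an 8-byte buffer, an empty ring reports 7 free bytes; after 7 bytes are written it reports 0 free bytes, and rp and wp still differ, so a full buffer can never be mistaken for an empty one.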
ssize_t scull_p_write(struct file *filp, const char *buf, size_t count,
                loff_t *f_pos)
{
    Scull_Pipe *dev = filp->private_data;

    if (f_pos != &filp->f_pos) return -ESPIPE;

    if (down_interruptible(&dev->sem))
        return -ERESTARTSYS;

    /* Make sure there's space to write */
    while (spacefree(dev) == 0) { /* full */
        up(&dev->sem);
        if (filp->f_flags & O_NONBLOCK)
            return -EAGAIN;
        PDEBUG("\"%s\" writing: going to sleep\n", current->comm);
        if (wait_event_interruptible(dev->outq, spacefree(dev) > 0))
            return -ERESTARTSYS; /* signal: tell the fs layer to handle it */
        if (down_interruptible(&dev->sem))
            return -ERESTARTSYS;
    }
    /* ok, space is there, accept something */
    count = min(count, spacefree(dev));
    if (dev->wp >= dev->rp)
        count = min(count, dev->end - dev->wp); /* up to end-of-buffer */
    else /* the write pointer has wrapped, fill up to rp-1 */
        count = min(count, dev->rp - dev->wp - 1);
    PDEBUG("Going to accept %li bytes to %p from %p\n",
           (long)count, dev->wp, buf);
    if (copy_from_user(dev->wp, buf, count)) {
        up (&dev->sem);
        return -EFAULT;
    }
    dev->wp += count;
    if (dev->wp == dev->end)
        dev->wp = dev->buffer; /* wrapped */
    up(&dev->sem);

    /* finally, awaken any readers */
    wake_up_interruptible(&dev->inq); /* blocked in read() and select() */

    /* and signal asynchronous readers, explained later in Chapter 5 */
    if (dev->async_queue)
        kill_fasync(&dev->async_queue, SIGIO, POLL_IN);
    PDEBUG("\"%s\" did write %li bytes\n", current->comm, (long)count);
    return count;
}
The device, as we conceived it, doesn't implement blocking open and is simpler than a real FIFO. If you want to look at the real thing, you can find it in fs/pipe.c, in the kernel sources.
To test the blocking operation of the scullpipe device, you can run some programs on it, using input/output redirection as usual. Testing nonblocking activity is trickier, because the conventional programs don't perform nonblocking operations. The misc-progs source directory contains the following simple program, called nbtest, for testing nonblocking operations. All it does is copy its input to its output, using nonblocking I/O and delaying between retries. The delay time is passed on the command line and is one second by default.
fcntl(0, F_SETFL, fcntl(0,F_GETFL) | O_NONBLOCK); /* stdin */
fcntl(1, F_SETFL, fcntl(1,F_GETFL) | O_NONBLOCK); /* stdout */

while (1) {
    n = read(0, buffer, 4096);
    if (n >= 0)
        m = write(1, buffer, n);
    if ((n < 0 || m < 0) && (errno != EAGAIN))
        break;
    sleep(delay);
}
perror(n < 0 ? "stdin" : "stdout");
exit(1);
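The nbtest excerpt ignores the return values of fcntl. A slightly more defensive way to switch a descriptor to nonblocking mode might look like the sketch below; set_nonblocking is a hypothetical helper name and is not part of the misc-progs sources.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to fd's status flags. Returns 0 or -1. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;   /* errno was set by fcntl */
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}
```

Once the flag is set, a read on a descriptor with no data available returns -1 with errno set to EAGAIN instead of blocking, which is the case the nbtest loop treats as "try again after the delay".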
