Representation of Block Devices

Block devices have a set of characteristic properties managed by the kernel. The kernel uses what is known as request queue management to make communication with devices of this kind as efficient as possible. This facility allows requests to read and write data blocks to be buffered and rearranged. The results of requests are likewise kept in caches, so that the same data can be reread without accessing the device again. This pays off when a process repeatedly accesses the same part of a file or when several processes access the same data in parallel.
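The caching idea can be illustrated with a small user-space sketch in C. It is not kernel code: the direct-mapped cache, the block size of 512 bytes, and the helpers dev_read_block and cached_read are all hypothetical. The point is only that repeated reads of the same block are served from memory and reach the (simulated) device just once.

```c
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE   512   /* assumed block size in bytes */
#define CACHE_SLOTS  8     /* tiny direct-mapped cache, illustration only */

/* Hypothetical stand-in for a real device: counts how often it is touched. */
static unsigned long device_reads;

static void dev_read_block(unsigned long blocknr, unsigned char *buf)
{
    device_reads++;
    /* Fill the buffer with a pattern derived from the block number. */
    memset(buf, (int)(blocknr & 0xff), BLOCK_SIZE);
}

struct cache_slot {
    bool valid;
    unsigned long blocknr;
    unsigned char data[BLOCK_SIZE];
};

/* Static storage is zero-initialized, so every slot starts out invalid. */
static struct cache_slot cache[CACHE_SLOTS];

/* Read a block, going to the device only on a cache miss. */
static const unsigned char *cached_read(unsigned long blocknr)
{
    struct cache_slot *slot = &cache[blocknr % CACHE_SLOTS];

    if (!slot->valid || slot->blocknr != blocknr) {
        dev_read_block(blocknr, slot->data);   /* miss: fetch from the device */
        slot->blocknr = blocknr;
        slot->valid = true;
    }
    return slot->data;                         /* hit or freshly filled slot */
}
```

Reading block 3 twice increments device_reads only once; reading block 3 + CACHE_SLOTS evicts it, so the next read of block 3 goes to the device again. The kernel's real caches are far more elaborate, but the principle of avoiding device access is the same.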

These tasks require a comprehensive network of data structures, described below. Figure 6-9 shows an overview of the various elements of the block layer.

Figure 6-9: Overview of the block device layer.

Raw block devices are represented by struct block_device, which I discuss in detail further below. Because the kernel manages this structure in an unusual way, the management scheme deserves a closer look first.

By convention, the kernel stores the block_device instance associated with a block device immediately in front of the block device's inode. This behavior is implemented by the following data structure:

fs/block_dev.c
struct bdev_inode {
        struct block_device bdev;
        struct inode vfs_inode;
};
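Placing both structures in one allocation means that the block_device can be reached from the inode with simple pointer arithmetic. The following user-space sketch shows the technique, using a minimal version of the kernel's container_of macro. The helper names BDEV_I and I_BDEV follow those in fs/block_dev.c, but the structures here are reduced to a single illustrative field each and are not the kernel definitions.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; the real ones have many more fields. */
struct block_device { unsigned int bd_dev; };
struct inode { unsigned long i_ino; };

struct bdev_inode {
    struct block_device bdev;
    struct inode vfs_inode;
};

/* User-space version of container_of: step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* From the embedded inode to the enclosing bdev_inode. */
static inline struct bdev_inode *BDEV_I(struct inode *inode)
{
    return container_of(inode, struct bdev_inode, vfs_inode);
}

/* From the embedded inode directly to the associated block_device. */
static inline struct block_device *I_BDEV(struct inode *inode)
{
    return &BDEV_I(inode)->bdev;
}
```

No lookup table is needed: the offset of vfs_inode inside struct bdev_inode is a compile-time constant, so the conversion costs a single subtraction.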

All inodes that represent block devices are kept on the bdev pseudo-filesystem (see Section 8.4.1), which is not visible to userland. This allows the standard VFS functions to be used to manage the collection of block device inodes.

In particular, this is exploited by the auxiliary function bdget. Given a device number represented by a dev_t, the function searches the pseudo-filesystem for a corresponding inode. Thanks to struct bdev_inode, the block_device instance for the device can be found immediately once the inode is known, and bdget returns a pointer to it. If the inode does not yet exist because the device has not been opened before, bdget and the pseudo-filesystem automatically ensure that a new bdev_inode is allocated and set up properly.
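The lookup-or-allocate behavior of bdget can be sketched as follows. This is a simplified model under stated assumptions: a hypothetical singly linked list (the next field and the all_bdevs head) stands in for the pseudo-filesystem's inode cache, and bdget_sketch is an illustrative name, not the kernel function. The kernel itself relies on the VFS inode cache of the bdev filesystem rather than on a list.

```c
#include <stdlib.h>
#include <sys/types.h>

struct block_device { dev_t bd_dev; };
struct inode { unsigned long i_ino; };

struct bdev_inode {
    struct block_device bdev;
    struct inode vfs_inode;
    struct bdev_inode *next;   /* hypothetical: chains all known bdev_inodes */
};

/* Hypothetical list head standing in for the bdev pseudo-filesystem's inode cache. */
static struct bdev_inode *all_bdevs;

/* Return the block_device for dev; allocate and initialize a bdev_inode on first use. */
static struct block_device *bdget_sketch(dev_t dev)
{
    struct bdev_inode *bi;

    for (bi = all_bdevs; bi != NULL; bi = bi->next)
        if (bi->bdev.bd_dev == dev)
            return &bi->bdev;                  /* already known: reuse it */

    bi = calloc(1, sizeof(*bi));               /* one allocation holds both parts */
    if (bi == NULL)
        return NULL;
    bi->bdev.bd_dev = dev;
    bi->vfs_inode.i_ino = (unsigned long)dev;  /* illustrative inode numbering */
    bi->next = all_bdevs;
    all_bdevs = bi;
    return &bi->bdev;
}
```

Calling bdget_sketch twice with the same device number returns the same pointer, while a different number yields a freshly allocated instance.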

In contrast to the character device layer, the block device layer provides comprehensive queuing functions, embodied in the request queue associated with each device. These queues account for most of the complexity of the block device layer. As Figure 6-9 shows, the individual array entries (in simplified form) contain pointers to various structures and procedures; the most important elements are as follows:

□ A wait queue to hold both read and write requests to the device.

□ Function pointers to the I/O scheduler implementation used to rearrange requests.

□ Characteristic data such as sector and block size or device capacity.

□ The generic hard disk abstraction gendisk, which is available for each device and stores both partitioning data and pointers to low-level operations.
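To make the role of the scheduler function pointers concrete, here is a hypothetical, heavily simplified queue in C. struct request_queue_sketch, struct sched_ops, and both scheduler functions are invented for illustration; the kernel's own request queue and I/O scheduler interface are considerably richer. The queue carries characteristic data (sector size, capacity) and a pointer to whichever scheduler decides where a new request is inserted.

```c
#include <stddef.h>

#define MAX_REQS 16

struct request {
    unsigned long sector;   /* start sector */
    int write;              /* 0 = read, 1 = write */
};

struct request_queue_sketch;

/* Hypothetical scheduler interface: decides where a new request goes. */
struct sched_ops {
    void (*add)(struct request_queue_sketch *q, struct request rq);
};

struct request_queue_sketch {
    struct request pending[MAX_REQS];  /* requests waiting for the driver */
    size_t count;
    const struct sched_ops *sched;     /* pluggable I/O scheduler */
    unsigned int sector_size;          /* characteristic device data */
    unsigned long capacity;            /* device size in sectors */
};

/* FIFO policy: keep arrival order. */
static void fifo_add(struct request_queue_sketch *q, struct request rq)
{
    q->pending[q->count++] = rq;
}

/* Sorting policy: insert by ascending sector to reduce head movement. */
static void sorted_add(struct request_queue_sketch *q, struct request rq)
{
    size_t i = q->count;
    while (i > 0 && q->pending[i - 1].sector > rq.sector) {
        q->pending[i] = q->pending[i - 1];   /* shift larger sectors up */
        i--;
    }
    q->pending[i] = rq;
    q->count++;
}

static const struct sched_ops fifo_sched   = { .add = fifo_add };
static const struct sched_ops sorted_sched = { .add = sorted_add };

/* Submit a request; reject it if the queue is full or the sector is out of range. */
static int submit(struct request_queue_sketch *q, unsigned long sector, int write)
{
    if (q->count >= MAX_REQS || sector >= q->capacity)
        return -1;
    struct request rq = { .sector = sector, .write = write };
    q->sched->add(q, rq);                    /* policy chosen via function pointer */
    return 0;
}
```

Submitting sectors 500, 20, and 300 leaves the FIFO queue in arrival order, while the sorting scheduler orders them 20, 300, 500. Swapping the sched pointer changes the policy without touching the code that submits requests.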

Each block device must provide a probe function that is registered with the kernel either directly by means of blk_register_region or indirectly via the gendisk object (discussed below) using add_disk. The filesystem code invokes this function to find the matching gendisk object.
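The probe mechanism amounts to mapping ranges of device numbers to callbacks. The sketch below models it in user space; register_region_sketch, lookup_gendisk_sketch, exact_probe, and the two-field struct gendisk are hypothetical simplifications of the idea, not the kernel's region-registration implementation. Device numbers are plain unsigned long values here instead of dev_t.

```c
#include <stddef.h>

#define MAX_REGIONS 8

/* Heavily simplified stand-in for the kernel's gendisk. */
struct gendisk { const char *disk_name; int first_minor; };

/* Probe callback: given a device number, return the responsible gendisk. */
typedef struct gendisk *(*probe_fn)(unsigned long dev, int *partno, void *data);

struct region {
    unsigned long base;   /* first device number covered */
    unsigned long range;  /* number of consecutive device numbers */
    probe_fn probe;
    void *data;           /* opaque cookie handed back to the probe */
};

static struct region regions[MAX_REGIONS];
static size_t nr_regions;

/* Associate a range of device numbers with a probe function. */
static int register_region_sketch(unsigned long base, unsigned long range,
                                  probe_fn probe, void *data)
{
    if (nr_regions == MAX_REGIONS)
        return -1;
    regions[nr_regions++] = (struct region){ base, range, probe, data };
    return 0;
}

/* Find the region containing dev and let its probe function produce the gendisk. */
static struct gendisk *lookup_gendisk_sketch(unsigned long dev, int *partno)
{
    for (size_t i = 0; i < nr_regions; i++) {
        struct region *r = &regions[i];
        if (dev >= r->base && dev - r->base < r->range) {
            *partno = (int)(dev - r->base);   /* offset within the range */
            return r->probe(dev, partno, r->data);
        }
    }
    return NULL;                              /* no driver claims this number */
}

/* Trivial probe for the case where the registered cookie already is the gendisk. */
static struct gendisk *exact_probe(unsigned long dev, int *partno, void *data)
{
    (void)dev;
    (void)partno;
    return data;
}
```

After registering 16 device numbers starting at 100, a lookup of 103 yields the registered gendisk with offset 3, while 116 lies outside the range and yields NULL.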

Read and write requests to block devices do not immediately cause the corresponding operations to be executed. Instead, they are collected and later transferred to the device in batches. For this reason, the file_operations structures of the corresponding device files do not hold device-specific functions to perform read and write operations. Instead, they contain generic versions such as generic_file_read and generic_file_write, which are discussed in Chapter 8.
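The deferred, collective dispatch can be sketched as a batch that is filled first and flushed later. Everything here is hypothetical: struct batch, batch_add, batch_flush, and the driver callback are illustrative names, not kernel interfaces. On flush, the requests are sorted by sector and physically contiguous ones are merged, so fewer, larger transfers reach the driver.

```c
#include <stdlib.h>

#define BATCH_MAX 32

struct io_req { unsigned long sector; unsigned long nr_sectors; };

/* Hypothetical batch of requests collected before anything reaches the driver. */
struct batch {
    struct io_req reqs[BATCH_MAX];
    size_t count;
};

/* qsort comparator: ascending start sector, overflow-safe. */
static int cmp_sector(const void *a, const void *b)
{
    const struct io_req *x = a, *y = b;
    return (x->sector > y->sector) - (x->sector < y->sector);
}

/* Record a request without touching the device. */
static int batch_add(struct batch *b, unsigned long sector, unsigned long nr)
{
    if (b->count == BATCH_MAX)
        return -1;
    b->reqs[b->count++] = (struct io_req){ sector, nr };
    return 0;
}

/* Sort, merge contiguous requests, and pass each merged request to the driver.
 * Returns the number of driver calls made. */
static size_t batch_flush(struct batch *b, void (*driver)(const struct io_req *))
{
    size_t calls = 0;

    if (b->count == 0)
        return 0;
    qsort(b->reqs, b->count, sizeof(b->reqs[0]), cmp_sector);

    struct io_req cur = b->reqs[0];
    for (size_t i = 1; i < b->count; i++) {
        if (b->reqs[i].sector == cur.sector + cur.nr_sectors) {
            cur.nr_sectors += b->reqs[i].nr_sectors;   /* contiguous: merge */
        } else {
            driver(&cur);                              /* gap: dispatch and restart */
            calls++;
            cur = b->reqs[i];
        }
    }
    driver(&cur);                                      /* dispatch the last run */
    calls++;
    b->count = 0;
    return calls;
}
```

Adding requests for 8 sectors at 16, 8 sectors at 0, 8 sectors at 8, and 4 sectors at 100 results in only two driver calls: one covering sectors 0 to 23 and one covering sectors 100 to 103.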

What is remarkable is that only generic functions are used, a distinctive feature of block devices. Character devices, by contrast, use driver-specific versions of these functions. All hardware-specific details are handled when requests are executed; all other functions work with an abstracted queue and obtain their results from buffers and caches that do not interact with the underlying device until this becomes unavoidable. The path from the read or write system call to actual communication with a peripheral device is therefore long and complex.
