Buffer Pages

Although the page cache and the buffer cache are different disk caches, in Version 2.4 of Linux, they are somewhat intertwined.

In fact, for reasons of efficiency, buffers are not allocated as single memory objects; instead, buffers are stored in dedicated pages called buffer pages. All the buffers within a single buffer page must have the same size; hence, on the 80 x 86 architecture, a buffer page can include from one to eight buffers, depending on the block size.

A stronger constraint, however, is that all the buffers in a buffer page must refer to adjacent blocks of the underlying block device. For instance, suppose that the kernel wants to read a 1,024-byte inode block of a regular file. Instead of allocating a single 1,024-byte buffer for the inode, the kernel must reserve a whole page storing four buffers; these buffers will contain the data of a group of four adjacent blocks on the block device, including the requested inode block.

It is easy to understand that a buffer page can be regarded in two different ways. On one hand, it is the "container" for some buffers, which can be individually addressed by means of the buffer cache. On the other hand, each buffer page contains a 4,096-byte portion of a block device file, hence, it can be included in the page cache. In other words, the portion of RAM cached by the buffer cache is always a subset of the portion of RAM cached by the page cache. The benefit of this mechanism consists of dramatically reducing the synchronization problems between the buffer cache and the page cache.

In the 2.2 version of the kernel, the two disk caches were not intertwined. A given physical block could have two images in RAM: one in the page cache and the other in the buffer cache. To avoid data loss, whenever one of the two block's memory images is modified, the 2.2 kernel must also find and update the other memory image. As you might imagine, this is a costly operation.

By way of contrast, in Linux 2.4, modifying a buffer implies modifying the page that contains it, and vice versa. The kernel must only pay attention to the "dirty" flags of both the buffer heads and the page descriptors. For instance, whenever a buffer head is marked as "dirty," the kernel must also set the PG_dirty flag of the page that contains the corresponding buffer.

Buffer heads and page descriptors include a few fields that define the link between a buffer page and the corresponding buffers. If a page acts as a buffer page, the buffers field of its page descriptor points to the buffer head of the first buffer included in the page; otherwise the buffers field is null. In turn, the b_this_page field of each buffer head implements a simply linked circular list that includes all buffer heads of the buffers stored in the buffer page. Moreover, the b_page field of each buffer head points to the page descriptor of the corresponding buffer page. Figure 14-1 shows a buffer page containing four buffers and the corresponding buffer heads.

Figure 14-1. A buffer page including four buffers and their buffer heads

Figure 14-1. A buffer page including four buffers and their buffer heads

There is a special case: if a page has been involved in a page I/O operation (see Section 13.4.8.2), the kernel might have allocated some asynchronous buffer heads and linked them to the page by means of the buffers and b_this_page fields. Thus, a page could act as a buffer page under some circumstances, even though the corresponding buffer heads are not in the buffer cache.

14.2.2.1 Allocating buffer pages

The kernel allocates a new buffer page when it discovers that the buffer cache does not include data for a given block. In this case, the kernel invokes the grow_buffers( ) function, passing to it three parameters that identify the block:

• The block device number — the major and minor numbers of the device

• The logical block number — the position of the block inside the block device

The function essentially performs the following actions:

1. Computes the offset index of the page of data within the block device that includes the requested block.

2. Gets the address bdev of the block device descriptor (see Section 13.4.1).

3. Invokes grow_dev_page( ) to create a new buffer page, if necessary. In turn, this function performs the following substeps:

a. Invokes find_or_create_page( ) , passing to it the address_space object of the block device (bdev->bd_inode->i_mapping) and the page offset index. As described in the earlier section Section 14.1.3, find_or_create_page( ) looks for the page in the page cache and, if necessary, inserts a new page in the cache.

b. Now the page cache is known to include a descriptor for our page. The function checks its buffers field; if it is null, the page has not yet been filled with buffers and the function jumps to Step 3e.

c. Checks whether the size of the buffers on the page is equal to the size of the requested block; if so, returns the address of the page descriptor (the page found in the page cache is a valid buffer page).

d. Otherwise, checks whether the buffers found in the page can be released by invoking try_to_free_buffers( If the function fails, presumably because some process is using the buffers, grow_dev_page( ) returns null (it was not able to allocate the buffer page for the requested block).

[3] This can happen when the page was previously involved in a page I/O operation using a different block size, and the corresponding asynchronous buffer heads are still allocated.

4. Invokes the create_buffers( ) function to allocate the buffer heads for the blocks of the requested size within the page. The address of the buffer head for the first buffer in the page is stored in the buffers field of the page descriptor, and all buffer heads are inserted into the simply linked circular list implemented by the b_this_page fields of the buffer heads. Moreover, the b_page fields of the buffer heads are initialized with the address of the page descriptor.

5. Returns the page descriptor address.

• If grow_dev_page( ) returned null, returns 0 (failure).

• Invokes the hash_page_buffers( ) function to initialize the fields of all buffer heads in the simply linked circular list of the buffer page and insert them into the buffer cache.

• Unlocks the page (the page was locked by find_or_create_page( ))

• Decrements the page's usage counter (again, the counter was incremented by find_or_create_page( ))

• Increments the buffermem_pages variable, which stores the total number of buffer pages — that is, the memory currently cached by the buffer cache in page-size units.

Was this article helpful?

0 0

Post a comment