Buffer Head Data Structures

Last Updated on Wed, 06 Jan 2021 | Linux Kernel Reference

As mentioned in Section 13.4.4, each buffer head is stored in a data structure of type buffer_head. These data structures have their own slab allocator cache called bh_cachep, which should not be confused with the buffer cache itself. The slab allocator cache is a memory cache (see Section 3.2.2) for the buffer head objects, meaning that it has no interaction with disks and is simply a way of managing memory efficiently. In contrast, the buffer cache is a disk cache for the data in the buffers.

Each buffer used by a block device driver must have a corresponding buffer head that describes the buffer's current status. The converse is not true: a buffer head may be unused, which means it is not bound to any buffer. The kernel keeps a certain number of unused buffer heads to avoid the overhead of constantly allocating and deallocating memory.

In general, a buffer head may be in any one of the following states:

Unused buffer head

The object is available; the values of its fields are meaningless, except for the b_dev field that stores the value b_free (0xffff).

Buffer head for a cached buffer

Its b_data field points to a buffer stored in the buffer cache. The b_dev field identifies a block device, and the BH_Mapped flag is set. Moreover, the buffer could be one of the following:

Not up-to-date (BH_Uptodate flag is clear)

The data in the buffer is not valid (for instance, the data has yet to be read from disk).

Dirty (BH_Dirty flag set)

The data in the buffer has been modified, and the corresponding block on disk needs to be updated. Locked (BH_Lock flag set)

An I/O data transfer on the buffer contents is in progress.

Asynchronous buffer head

Its b_data field points to a buffer inside a page that is involved in a page I/O operation (see Section 13.4.8.2); in this case, the BH_Async flag is set. As soon as the page I/O operation terminates, the BH_Async flag is cleared, but the buffer head is not freed; rather, it remains allocated and inserted into a simply linked circular list of the page (see the later section Section 14.2.2). Thus, it can be reused without the overhead of always allocating new ones. Asynchronous buffer heads are released whenever the kernel tries to reclaim some memory (see Chapter 16).

Strictly speaking, the buffer cache data structures include only pointers to buffer heads for a cached buffer. For the sake of completeness, we shall examine the data structures and the methods used by the kernel to handle all kinds of buffer heads, not just those in the buffer cache.

14.2.1.1 The list of unused buffer heads

All unused buffer heads are collected in a simply linked list, whose first element is addressed by the unused_list variable. Each buffer head stores the address of the next list element in the b_next_free field. The current number of elements in the list is stored in the nr_unused_buffer_heads variable. The unused_list_lock spin lock protects the list against concurrent accesses in multiprocessor systems.

The list of unused buffer heads acts as a primary memory cache for the buffer head objects, while the bh_cachep slab allocator cache is a secondary memory cache. When a buffer head is no longer needed, it is inserted into the list of unused buffer heads. Buffer heads are released to the slab allocator (a preliminary step to letting the kernel free the memory associated with them altogether) only when the number of list elements exceeds max_unused_buffers (usually 100 elements). In other words, a buffer head in this list is considered an allocated object by the slab allocator and an unused data structure by the buffer cache.

A subset of nr_reserved (usually 80) elements in the list is reserved for page I/O operations. This is done to prevent nasty deadlocks caused by the lack of free buffer heads. As we shall see in Chapter 16, if free memory is scarce, the kernel can try to free a page frame by swapping out some page to disk. To do this, it requires at least one additional buffer head to perform the page I/O file operation. If the swapping algorithm fails to get a buffer head, it simply keeps waiting and lets writes to files proceed to free up buffers, since at least nr_reserved buffer heads are going to be released as soon as the ongoing file operations terminate.

The get_unused_buffer_head( ) function is invoked to get a new buffer head. It essentially performs the following operations:

1. Acquires the unused_list_lock spin lock.

2. If the list of unused buffer heads has more than nr_reserved elements, removes one of them from the list, releases the spin lock, and returns the address of the buffer head.

3. Otherwise, releases the spin lock and invokes kmem_cache_alloc( ) to allocate a new buffer head from the bh_cachep slab allocator cache with priority gfp_nofs (see Section 7.1.5); if the operation succeeds, returns its address.

4. No free memory is available. If the buffer head has been requested for a block I/O operation, returns null (failure).

5. If this point is reached, the buffer head has been requested for a page I/O operation. If the list of unused buffer heads is not empty, acquires the unused_list_lock spin lock, removes one element from the list, releases the spin lock, and returns the address of the buffer head.

6. Otherwise (if the list is empty), returns null (failure).

The put_unused_buffer_head( ) function performs the reverse operation, releasing a buffer head. It inserts the object in the list of unused buffer heads if that list has fewer than max_unused_buffers elements; otherwise, it releases the object to the slab allocator by invoking kmem_cache_free( ) on the buffer head.

14.2.1.2 Lists of buffer heads for cached buffers

When a buffer belongs to the buffer cache, the flags of the corresponding buffer head describe its current status (see Section 13.4.4). For instance, when a block not present in the cache must be read from disk, a new buffer is allocated and the BH_Uptodate flag of the buffer head is cleared because the buffer's contents are meaningless. While filling the buffer by reading from disk, the BH_Lock flag is set to protect the buffer from being reclaimed. If the read operation terminates successfully, the BH_Uptodate flag is set and the BH_Lock flag is cleared. If the block must be written to disk, the buffer content is modified and the BH_Dirty flag is set; the flag is cleared only after the buffer is successfully written to disk.

Any buffer head associated with a used buffer is contained in a doubly linked list, implemented by means of the b_next_free and b_prev_free fields. There are three different lists, identified by an index defined as a macro (buf_clean, buf_locked, and buf_dirty). We'll define these lists in a moment.

The three lists are introduced to speed up the functions that flush dirty buffers to disk (see Section 14.2.4 later in this chapter). For reasons of efficiency, a buffer head is not moved right away from one list to another when it changes status; this makes the following description a bit murky.

BUF_CLEAN

This list collects buffer heads of nondirty buffers (BH_Dirty flag is off). Notice that buffers in this list are not necessarily up to date — that is, they don't necessarily contain valid data. If the buffer is not up to date, it could even be locked (BH_Lock is on) and selected to be read from the physical device while being on this list. The buffer heads in this list are guaranteed only to be not dirty; in other words, the corresponding buffers are ignored by the functions that flush dirty buffers to disk.

BUF DIRTY

This list mainly collects buffer heads of dirty buffers that have not been selected to be written into the physical device — that is, dirty buffers that have not yet been included in a block request for a block device driver (BH_Dirty is on and BH_Lock is off). However, this list could also include nondirty buffers, since in a few cases, the BH_Dirty flag of a dirty buffer is cleared without flushing it to disk and without removing the buffer head from the list (for instance, whenever a floppy disk is removed from its drive without unmounting—an event that most likely leads to data loss, of course).

BUF_LOCKED

This list mainly collects buffer heads of buffers that have been selected to be read from or written to the block device (BH_Lock is on; BH_Dirty is clear because the add_request( ) function resets it before including the buffer head in a block request). However, when a block I/O operation for a locked buffer is completed, the low-level block device handler clears the BH_Lock flag without removing the buffer head from the list (see Section 13.4.7). The buffer heads in this list are guaranteed only to be not dirty, or dirty but selected to be written.

For any buffer head associated with a used buffer, the b_list field of the buffer head stores the index of the list containing the buffer. The lru_list array^2! stores the address of the first element in each list, the nr_buffers_type array stores the number of elements in each list, and the size_buffers_type array stores the total capacity of the buffers in each list (in byte). The lru_list_lock spin lock protects these arrays from concurrent accesses in multiprocessor systems.

Continue reading here: [2 The name of the array derives from the abbreviation for Least Recently Used In earlier versions of Linux these lists were ordered according to the time each buffer was last accessed

Was this article helpful?