/* A control structure which tells the writeback code what to do. */
struct writeback_control {
        struct backing_dev_info *bdi;      /* If !NULL, only write back this queue */
        enum writeback_sync_modes sync_mode;
        unsigned long *older_than_this;    /* If !NULL, only write back inodes older than this */
        long nr_to_write;                  /* Write this many pages, and decrement this for each page written */
        long pages_skipped;                /* Pages which were not written */

        loff_t range_start;
        loff_t range_end;

        unsigned nonblocking:1;            /* Don't get stuck on request queues */
        unsigned encountered_congestion:1; /* An output: a queue is full */
        unsigned for_kupdate:1;            /* A kupdate writeback */
        unsigned for_reclaim:1;            /* Invoked from the page allocator */
        unsigned for_writepages:1;         /* This is a writepages() call */
        unsigned range_cyclic:1;           /* range_start is cyclic */
};

The meanings of the structure elements are as follows:

□ bdi points to a structure of type backing_dev_info, which summarizes information on the underlying storage medium. This structure is discussed briefly in Chapter 16. Two things interest us here. First, the structure provides a variable to hold the status of the writeback queue (this means, e.g., that congestion can be signaled if there are too many write requests), and second, it allows RAM-based filesystems that do not have a (block device) backing store to be labeled — writeback operations to systems of this kind make no sense.

□ sync_mode distinguishes between three different synchronization modes:

<writeback.h>
enum writeback_sync_modes {
        WB_SYNC_NONE,   /* Don't wait on anything */
        WB_SYNC_ALL,    /* Wait on every mapping */
        WB_SYNC_HOLD,   /* Hold the inode on sb_dirty for sys_sync() */
};

To synchronize data, the kernel needs to pass a corresponding write request to the underlying block device. Requests to block devices are asynchronous by nature. If the kernel wants to ensure that the data have safely reached the device, it needs to wait for completion after the request has been issued. This behavior is mandated with WB_SYNC_ALL. Waiting for writeback to complete is performed in __sync_single_inode, discussed below; recall from Figure 17-1 that it sits at the bottom of the mechanism, where it is responsible for delegating synchronization of a single inode to the filesystem-specific methods. All functions that wait on inodes because WB_SYNC_ALL is set are marked in Figure 17-1.

Notice that writeback with WB_SYNC_ALL set is referred to as data integrity writeback. If a system crash happens immediately after writeback in this mode has finished, no data are lost because everything is synchronized with the underlying block devices.

If WB_SYNC_NONE is used, the kernel will send the request, but continue with the remaining synchronization work immediately afterward. This mode is also referred to as flushing writeback.

WB_SYNC_HOLD is a special form used for the sync system call that works similarly to WB_SYNC_NONE. The exact differences are subtle and are discussed in Section 17.15.

□ When the kernel performs writeback, it must decide which dirty cache data need to be synchronized with the backing store. It uses the older_than_this and nr_to_write elements for this purpose. Data are written back if they have been dirty for longer than specified by older_than_this.

older_than_this is defined as a pointer type, which is unusual for a single long value. Its numeric value, which can be obtained by appropriate de-referencing, is of interest. If the pointer is null, then age checking is not performed, and all objects are synchronized irrespective of when they became dirty. Setting nr_to_write to 0 likewise disables any upper limit on the number of pages that are supposed to be written back.

□ nr_to_write can restrict the maximum number of pages that should be written back. The upper bound for this is given by MAX_WRITEBACK_PAGES, which is usually set to 1,024.

□ If pages were selected to be written back, functions from lower layers perform the required operations. However, they can fail for various reasons, for instance, because the page is locked from some other part of the kernel. The number of skipped pages can be reported to higher layers via the counter pages_skipped.

□ The nonblocking flag specifies whether writeback queues block or not in the event of congestion (more pending write operations than can be effectively satisfied). If blocking is allowed, the kernel waits until the queue is free. If nonblocking is set, it relinquishes control, and the write operation is resumed later.

□ encountered_congestion is also a flag to signal to higher layers that congestion has occurred during data writeback. It is a Boolean variable and accepts the values 1 or 0.

□ for_kupdate is set to 1 if the write request was issued by the periodic mechanism. Otherwise, its value is 0. for_reclaim and for_writepages are used in a similar manner: They are set if the writeback operation was initiated from memory reclaim or from the do_writepages function, respectively.

□ If range_cyclic is set to 0, the writeback mechanism is restricted to operate on the range given by range_start and range_end. The limits refer to the mapping for which the writeback was initiated.

If range_cyclic is set to 1, the kernel may iterate many times over the pages associated with a mapping, thus the name of the element.
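
The interplay of older_than_this, nr_to_write, and pages_skipped described above can be illustrated with a small user-space sketch. It is not kernel code: toy_wbc, toy_inode, and toy_writeback are hypothetical names invented for this illustration, only three of the structure's elements are modeled, and the sketch assumes that a positive page budget is passed in nr_to_write. The age check is a plain comparison, so wraparound of the time stamp is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; NOT the kernel's definitions. */
struct toy_wbc {
        unsigned long *older_than_this; /* If non-NULL, skip inodes dirtied later */
        long nr_to_write;               /* Remaining page budget (assumed > 0 on entry) */
        long pages_skipped;             /* Output: pages that could not be written */
};

struct toy_inode {
        unsigned long dirtied_when;     /* Time stamp at which the inode became dirty */
        long dirty_pages;               /* Pages waiting for writeback */
        long locked_pages;              /* Subset of dirty_pages locked elsewhere */
};

/* Walk the inodes in order and "write" pages until the budget is used up.
 * Returns the number of pages written. */
static long toy_writeback(struct toy_inode *inodes, size_t n, struct toy_wbc *wbc)
{
        long written = 0;

        assert(inodes != NULL && wbc != NULL);

        for (size_t i = 0; i < n && wbc->nr_to_write > 0; i++) {
                struct toy_inode *ino = &inodes[i];
                long writable, chunk;

                /* Age check, performed only if a limit was supplied.
                 * Plain comparison: timer wraparound is ignored here. */
                if (wbc->older_than_this &&
                    ino->dirtied_when > *wbc->older_than_this)
                        continue;

                /* Locked pages cannot be written; report them upward. */
                wbc->pages_skipped += ino->locked_pages;

                writable = ino->dirty_pages - ino->locked_pages;
                chunk = writable < wbc->nr_to_write ? writable : wbc->nr_to_write;

                ino->dirty_pages -= chunk;
                wbc->nr_to_write -= chunk;      /* Decremented for each page written */
                written += chunk;
        }
        return written;
}
```

If older_than_this is left as a null pointer, every inode in the array is considered regardless of its time stamp. In both cases the caller can inspect nr_to_write and pages_skipped afterward to learn how much of the budget remains and how many pages had to be left out.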
