Flushing, swapping, and releasing pages are closely related. The kernel must regularly check not only the state of memory pages but also the amount of free memory. When it does, unused or seldom-used pages are swapped out automatically, but only after the data they hold have been synchronized with the backing store to prevent data loss. For dynamically generated pages, the system swap areas act as the backing store. For pages mapped from files, the backing store is the corresponding section of the underlying filesystem. If memory becomes acutely scarce, flushing of dirty data must be enforced in order to obtain clean pages.

Synchronization between memory/cache and backing store is split into two conceptually different parts:

□ Policy routines control when data are exchanged. System administrators can set various parameters to help the kernel decide when to exchange data as a function of system load.

□ The technical implementation deals with the hardware-related details of synchronization between cache and backing store and ensures that the instructions issued by the policy routines are carried out.
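
As a rough illustration of the policy side, the following sketch reads a few writeback tunables from user space. It assumes a Linux system that exposes them under /proc/sys/vm; the four file names are not taken from the text above but from the kernel's sysctl interface, and the helpers read_tunable and print_writeback_tunables are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one integer from a sysctl-style file.
 * Returns 0 and stores the value in *out on success, -1 on failure. */
int read_tunable(const char *path, long *out)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Print the writeback-related tunables that are present on this system.
 * Assumption: the files live under /proc/sys/vm, as on current Linux. */
void print_writeback_tunables(void)
{
    static const char *const names[] = {
        "dirty_background_ratio",    /* threshold for background writeback */
        "dirty_ratio",               /* threshold at which writers are throttled */
        "dirty_expire_centisecs",    /* age after which dirty data is written */
        "dirty_writeback_centisecs", /* interval between periodic wakeups */
    };
    char path[128];

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        long value;
        snprintf(path, sizeof path, "/proc/sys/vm/%s", names[i]);
        if (read_tunable(path, &value) == 0)
            printf("%-28s %ld\n", names[i], value);
        else
            printf("%-28s (unavailable)\n", names[i]);
    }
}
```

Calling print_writeback_tunables() from main lists whichever of these values the running kernel provides. The point is only that policy is expressed as a handful of numbers, separate from the code that performs the write.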

Synchronization and swapping must not be confused with each other. Synchronization simply aligns the data held in RAM with the data in the backing store, whereas swapping removes data from RAM to free space for higher-priority items. Before data are cleared from RAM, they are synchronized with the data in the associated backing store.
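
The difference can be observed from user space. The sketch below, which assumes a POSIX system, dirties a page of a shared file mapping and then synchronizes it with msync. The data reach the file, but the page stays mapped and readable, so nothing has been swapped out or released. The function name write_and_sync is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write msg into a shared file mapping and synchronize it with the file.
 * Returns 0 on success, -1 on error. */
int write_and_sync(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);
    size_t len = strlen(msg);
    if (len == 0)
        return -1;                      /* mmap rejects a zero length */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping remains valid */
    if (map == MAP_FAILED)
        return -1;

    memcpy(map, msg, len);              /* the page is now dirty in RAM */
    int rc = msync(map, len, MS_SYNC);  /* align RAM and backing store */

    /* Synchronization does not evict: the data are still in the mapping. */
    if (rc == 0 && memcmp(map, msg, len) != 0)
        rc = -1;

    munmap(map, len);
    return rc;
}
```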

The mechanisms for flushing data are triggered for different reasons and at different times:

□ Periodic kernel threads scan the lists of dirty pages and pick some to be written back based on the time at which they became dirty. If the system is not too busy with write operations, there is an acceptable ratio between the number of dirty pages and the load imposed on the system by the hard disk access operations needed to flush the pages.

□ If there are too many dirty pages in the system (for example, as the result of a massive write operation), the kernel triggers further mechanisms to synchronize pages with the backing store until the number of dirty pages returns to an acceptable level. What is meant by ''too many dirty pages'' and ''acceptable level'' is open to debate and is discussed below.

□ Various components of the kernel require data to be synchronized when a special event occurs, for instance, when a filesystem is remounted.

The first two mechanisms are implemented by means of the kernel thread pdflush, which executes the synchronization code, while the third can be triggered from many points in the kernel.
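
The first two triggers can be pictured with a toy model. This is not kernel code: struct toy_page and all toy_ functions are hypothetical, a "write" is just clearing a flag, and the choice to flush the oldest pages first in the forced case is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: a page is a dirty flag plus the time it was dirtied. */
struct toy_page {
    int  dirty;
    long dirtied_at;
};

/* Count the pages that still differ from their backing store. */
size_t toy_count_dirty(const struct toy_page *pages, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (pages[i].dirty)
            count++;
    return count;
}

/* Periodic writeback: flush every page that has been dirty for at least
 * max_age time units. Returns the number of pages written. */
size_t toy_periodic_flush(struct toy_page *pages, size_t n,
                          long now, long max_age)
{
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        if (pages[i].dirty && now - pages[i].dirtied_at >= max_age) {
            pages[i].dirty = 0;     /* stands in for the disk write */
            written++;
        }
    }
    return written;
}

/* Forced writeback: flush the oldest dirty pages until at most `limit`
 * dirty pages remain. Returns the number of pages written. */
size_t toy_forced_flush(struct toy_page *pages, size_t n, size_t limit)
{
    size_t written = 0;
    while (toy_count_dirty(pages, n) > limit) {
        struct toy_page *oldest = NULL;
        for (size_t i = 0; i < n; i++)
            if (pages[i].dirty &&
                (oldest == NULL || pages[i].dirtied_at < oldest->dirtied_at))
                oldest = &pages[i];
        assert(oldest != NULL);     /* holds because dirty count > limit */
        oldest->dirty = 0;
        written++;
    }
    return written;
}
```

The periodic path selects pages by age alone, whereas the forced path keeps going until a target level is reached, regardless of how recently the pages were dirtied.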

Since the implementation of data synchronization consists of an unusually large number of interconnected functions, an overview of what lies ahead precedes the detailed discussion. Figure 17-1 shows the dependencies among the functions that constitute the implementation. The figure is not a proper code flow diagram; it only shows how the functions are related to each other and which code paths are possible. The diagram concentrates on synchronization operations originating from the pdflush thread, system calls, and explicit requests from filesystem-related kernel components.

The kernel can start to synchronize data from various places, but all paths save one end up in sync_sb_inodes. This function is responsible for synchronizing all dirty inodes belonging to a given superblock, and it uses writeback_single_inode for each inode. Both the sync system call and numerous generic kernel layers (like the partition code or the block layer) make use of this possibility.

On the other hand, the need to synchronize the dirty inodes of all superblocks in the system can also arise. This is especially required for periodic and forced writeback. When dirtying data in filesystem code, the kernel additionally ensures that the number of dirty pages does not get out of hand by starting synchronization before this happens.

[Figure 17-1 (image not available). Labels recoverable from the diagram: pdflush, data integrity synchronization, flushing synchronization.]

Synchronizing all dirty inodes of a superblock is often much too coarse-grained for filesystems. They often need to synchronize a single dirty inode and thus use writeback_single_inode directly.
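
The relationship between the three levels (a single inode, all inodes of a superblock, all superblocks) can be sketched as a toy model. Only the names sync_sb_inodes and writeback_single_inode come from the text; the toy_ structures, their fields, and the simple linked list are assumptions made for illustration and do not reflect the kernel's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for inode and superblock. */
struct toy_inode {
    int dirty;                  /* needs writing back */
    int writes;                 /* how often this inode was written */
    struct toy_inode *next;     /* next inode of the same superblock */
};

struct toy_super_block {
    struct toy_inode *inodes;   /* all inodes of this superblock */
};

/* Finest granularity: write back one inode if it is dirty.
 * Returns 1 if a write happened, 0 otherwise. */
int toy_writeback_single_inode(struct toy_inode *inode)
{
    assert(inode != NULL);
    if (!inode->dirty)
        return 0;
    inode->dirty = 0;           /* stands in for the real I/O */
    inode->writes++;
    return 1;
}

/* Per-superblock pass: hand every inode to the single-inode routine. */
int toy_sync_sb_inodes(struct toy_super_block *sb)
{
    assert(sb != NULL);
    int written = 0;
    for (struct toy_inode *i = sb->inodes; i != NULL; i = i->next)
        written += toy_writeback_single_inode(i);
    return written;
}

/* System-wide pass: repeat the per-superblock pass for each superblock. */
int toy_sync_all(struct toy_super_block *sbs, size_t n)
{
    int written = 0;
    for (size_t k = 0; k < n; k++)
        written += toy_sync_sb_inodes(&sbs[k]);
    return written;
}
```

A filesystem that only cares about one inode calls the innermost routine directly, while the coarser passes are thin loops around it.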

Even though the synchronization implementation is centered around inodes, this does not imply that the mechanisms work only for data contained in mounted filesystems. Recall that raw block devices are represented by inodes via the bdev pseudo-filesystem, as discussed in Section 10.2.4. The synchronization methods therefore affect raw block devices in the same way as regular filesystem objects, which is good news for everyone who wants to access data directly.

One remark on terminology: When I talk about inode synchronization in the following, I always mean synchronization of both the inode metadata and the raw data managed by the inode. For regular files, this means that the synchronization code aims to transfer both the metadata (time stamps, attributes, and the like) and the contents of the file to the underlying block device.
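
User space sees the same distinction between metadata and data in the POSIX calls fsync and fdatasync. The sketch below assumes a POSIX system: fsync requests that file contents and inode metadata be written, while fdatasync may skip metadata that is not needed to read the data back. The function write_durably is a hypothetical name, and the example says nothing about how the kernel implements either call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write buf to path and flush it to the backing store.
 * with_metadata != 0: fsync (contents plus inode metadata).
 * with_metadata == 0: fdatasync (contents plus only the metadata
 *                     required to retrieve them).
 * Returns 0 on success, -1 on error. */
int write_durably(const char *path, const char *buf, int with_metadata)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(buf);
    ssize_t n = write(fd, buf, len);
    int rc = (n == (ssize_t)len) ? 0 : -1;   /* treat short writes as errors */

    if (rc == 0)
        rc = with_metadata ? fsync(fd) : fdatasync(fd);

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```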
