Characterization of Swap Areas

struct swap_info_struct describes a swap area and is defined as follows in <swap.h>:

struct swap_info_struct {
	unsigned int flags;
	int prio;			/* swap priority */
	struct file *swap_file;
	struct block_device *bdev;
	struct list_head extent_list;
	struct swap_extent *curr_swap_extent;
	unsigned short * swap_map;
	unsigned int lowest_bit;
	unsigned int highest_bit;
	unsigned int cluster_next;
	unsigned int cluster_nr;
	unsigned int pages;
	unsigned int max;
	unsigned int inuse_pages;
	int next;			/* next entry on swap list */
};

The main data on the swap state can be quickly queried with the help of the proc filesystem:

$ cat /proc/swaps

Note: During the development of kernel 2.6.18, the ability to migrate pages physically between NUMA nodes while keeping their virtual addresses was added. This requires two swap_info entries to handle pages that are currently under migration, so the number of possible swap files is reduced accordingly. The configuration option MIGRATION must be enabled to include the page migration code. Migration is helpful, for instance, on NUMA systems, where pages can be moved nearer to the processors that use them, or for memory hot-remove. Page migration, however, is not considered in detail in this book.

A dedicated partition and two files accommodate the swap areas in this example. The swap partition has the highest priority and is therefore used preferentially by the kernel. Both files have priority 0 and are used in round-robin fashion once no more space is available on the higher-priority partition. (How data can nevertheless end up in the swap files although the swap partition is not completely full, as the proc output indicates, is explained below.)

What is the meaning of the various elements in the swap_info_struct structure? The first entries are used to hold the classical management data required for swap areas:

□ The state of the swap area is described by various flags stored in the flags element. SWP_USED specifies that the entry in the swap array is in use. Since the array is otherwise filled with zeros, used and unused elements can easily be distinguished. SWP_WRITEOK specifies that the swap area may be written to. Both flags are set once a swap area has been inserted into the kernel; the combination of the two is abbreviated as SWP_ACTIVE.

□ swap_file points to the file structure associated with the swap area (the layout and contents of the structure are discussed in Chapter 8). With swap partitions, there is a pointer to the device file of the partition on the block device (in our example, /dev/hda5). With swap files, this pointer is to the file instance of the relevant file, that is, /mnt/swap1 or /tmp/swap2 in our example.

□ bdev points to the block_device structure of the underlying block device.

Even if all swap areas in our example are located on the same block device (/dev/hda), all three entries point to different instances of the data structure. This is because the two files are on different partitions of the hard disk and the swap partition is a separate partition anyway. Since, in structural terms, the kernel manages partitions essentially as if they were autonomous block devices, this results in three different pointers to the three swap areas, although all are located on the same disk.

□ The relative priority of a swap area is held in the prio element. Since this is a signed data type, both positive and negative priorities are possible. As already noted, the higher the priority of a swap area, the more preferentially the kernel uses it.

□ The total number of usable page slots, each of which can store a complete memory page, is held in pages. The swap partition in our example, for instance, has space for 34,128 pages, which, given a page size of 4 KiB on the IA-32 system used here, corresponds to a memory volume of approximately 133 MiB (34,128 × 4 KiB = 136,512 KiB).

□ max yields the total number of page slots that the swap area contains. In contrast to pages, not just usable slots but all slots are counted here, including those that are defective (owing to block device faults, for example) or are used for management purposes. Because defective blocks are extremely rare on state-of-the-art hard disks (and swap partitions need not necessarily be created in such an area), max is typically only 1 greater than pages, as is the case with all three swap areas in the example above. There are two reasons for this one-page difference. First, the very first page of a swap area is used by the kernel for identification purposes (after all, totally random parts of the disk should not be overwritten with swap data). Second, the kernel also uses the first slot to store state information, such as the size of the area and a list of defective sections, and this information must be permanently retained.

□ swap_map is a pointer to an array of short integers (which is unsurprisingly referred to as swap map in the following) that contains as many elements as there are page slots in the swap area. It is used as an access counter for the individual slots to indicate the number of processes that share the swapped-out pages.

□ The kernel uses a somewhat unusual method to link the various elements in the swap list according to priority. Since the data of the various areas are arranged in the elements of a linear array, the next variable is defined to create a relative order between the areas despite the fixed array positions. next is used as an index for swap_info[]. This enables the kernel to track the individual entries according to their priority.
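The flag and size fields from the list above can be made concrete with a small user-space sketch. It is not kernel code: mini_swap_info is a hypothetical cut-down stand-in for swap_info_struct, and the helper functions are invented purely for illustration. The flag values are assumed to match the definitions in <swap.h> for the kernel generation discussed here.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values, assumed to mirror <swap.h>. */
enum {
	SWP_USED    = (1 << 0),	/* entry in the swap array is in use */
	SWP_WRITEOK = (1 << 1),	/* swap area may be written to */
	SWP_ACTIVE  = (SWP_USED | SWP_WRITEOK),
};

/* Hypothetical stand-in holding only the fields used below. */
struct mini_swap_info {
	unsigned int flags;
	unsigned int pages;	/* usable page slots */
	unsigned int max;	/* all page slots, usable or not */
};

/* An area is active only if both SWP_USED and SWP_WRITEOK are set. */
int swap_area_is_active(const struct mini_swap_info *si)
{
	assert(si != NULL);
	return (si->flags & SWP_ACTIVE) == SWP_ACTIVE;
}

/* Usable capacity in KiB for a given page size (also in KiB). */
unsigned long swap_area_kib(const struct mini_swap_info *si,
			    unsigned int page_size_kib)
{
	assert(si != NULL);
	return (unsigned long)si->pages * page_size_kib;
}

/* Slots that are counted in max but cannot hold swapped-out pages. */
unsigned int swap_area_reserved_slots(const struct mini_swap_info *si)
{
	assert(si != NULL);
	return si->max - si->pages;
}
```

For the swap partition of the example (34,128 usable slots, with max one greater), swap_area_kib(&area, 4) yields 136,512 and swap_area_reserved_slots(&area) yields 1, the single slot at the start of the area.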

But how is it possible to determine which swap area is to be used first? Since this area is not necessarily located at the first array position, the kernel also defines the global variable swap_list in mm/swapfile.c. It is an instance of the swap_list_t data type, which is defined specifically for the purpose of finding the first swap area.
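The idea of chaining array elements through index values can be sketched in user space as follows. This is only an illustration under stated assumptions: listed_swap_area is a hypothetical stand-in for swap_info_struct, the index of the first area is passed in directly instead of being read from swap_list, and a negative index is assumed to mark the end of the chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: only the fields needed for the ordering. */
struct listed_swap_area {
	int prio;	/* priority of the area */
	int next;	/* array index of the next area, negative at the end */
};

/*
 * Follow the chain head -> next -> ... and store the visited array
 * indices in out[]. At most cap indices are written; the number of
 * indices actually written is returned.
 */
size_t swap_priority_order(const struct listed_swap_area *areas,
			   int head, int *out, size_t cap)
{
	size_t n = 0;
	int i;

	assert(areas != NULL && out != NULL);
	for (i = head; i >= 0 && n < cap; i = areas[i].next)
		out[n++] = i;
	return n;
}
```

With three areas in which array position 1 has the highest priority and chains to positions 0 and 2, starting at head 1 visits the positions in the order 1, 0, 2, although the array layout itself never changes.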
