Calculation of Zone Watermarks

Before calculating the various watermarks, the kernel first determines the minimum amount of memory that must remain free for critical allocations. This value scales nonlinearly with the size of the available RAM and is stored in the global variable min_free_kbytes. Figure 3-4 provides an overview of the scaling behavior; the inset, which in contrast to the main graph does not use a logarithmic scale for the main memory size, magnifies the region up to 4 GiB. Table 3-1 collects some example values for systems with the modest amounts of memory that are common in desktop environments. The value is always kept between 128 KiB and 64 MiB. Note, however, that the upper bound only comes into play on machines equipped with a very large amount of main memory.3 The file /proc/sys/vm/min_free_kbytes allows reading and adapting the value from userland.
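The nonlinear scaling can be illustrated with a small userspace sketch. It assumes the square-root rule min_free_kbytes = sqrt(16 * memory in KiB), which is consistent with the values in Table 3-1 but is not spelled out in the text above; the function names min_free_kbytes_for and isqrt_u64 are hypothetical helpers and not kernel interfaces.

```c
/* Integer square root (rounded down), computed bit by bit. */
static unsigned long long isqrt_u64(unsigned long long x)
{
    unsigned long long result = 0;
    unsigned long long bit = 1ULL << 62;   /* highest power of four in 64 bits */

    while (bit > x)
        bit >>= 2;
    while (bit != 0) {
        if (x >= result + bit) {
            x -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return result;
}

/*
 * Sketch of the reserve calculation (assumption: square-root rule).
 * The result is clamped to the bounds named in the text:
 * at least 128 KiB, at most 64 MiB (65536 KiB).
 */
static unsigned long long min_free_kbytes_for(unsigned long long mem_kbytes)
{
    unsigned long long kbytes = isqrt_u64(mem_kbytes * 16);

    if (kbytes < 128)
        kbytes = 128;
    if (kbytes > 65536)
        kbytes = 65536;
    return kbytes;
}
```

Under this assumption, 16 MiB of memory (16384 KiB) gives 512 KiB and 16384 MiB gives 16384 KiB, as in Table 3-1. The upper bound of 64 MiB would only be reached at 256 GiB of memory.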

Filling the watermarks in the data structure is handled by init_per_zone_pages_min, which is invoked during kernel boot and need not be started explicitly.4

setup_per_zone_pages_min sets the pages_min, pages_low, and pages_high elements of struct zone. After the total number of pages outside the highmem zone has been calculated (and stored in lowmem_pages), the kernel iterates over all zones in the system and performs the following calculation:

mm/page_alloc.c
void setup_per_zone_pages_min(void)
{
        unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
        unsigned long lowmem_pages = 0;
        struct zone *zone;
        unsigned long flags;
...
        for_each_zone(zone) {
                u64 tmp;

                tmp = (u64)pages_min * zone->present_pages;
                do_div(tmp,lowmem_pages);
                if (is_highmem(zone)) {
                        int min_pages;

                        min_pages = zone->present_pages / 1024;
                        if (min_pages < SWAP_CLUSTER_MAX)
                                min_pages = SWAP_CLUSTER_MAX;
                        if (min_pages > 128)
                                min_pages = 128;
                        zone->pages_min = min_pages;
                } else {
                        zone->pages_min = tmp;
                }

                zone->pages_low = zone->pages_min + (tmp >> 2);
                zone->pages_high = zone->pages_min + (tmp >> 1);
        }
}

3 In practice, it will be unlikely that such an amount of memory is installed on a machine with a single NUMA node, so it will be hard to actually reach the point where the cutoff is required.

4 The functions are not only called from here, but are also invoked each time one of the control parameters is modified via the proc filesystem.

Figure 3-4: Minimum memory size for critical allocations and zone watermarks depending on the main memory size of a machine, plotted against zone memory in GiB (pages_min is nothing other than min_free_kbytes in units of pages).

Table 3-1: Correlation between Main Memory Size and Minimum Memory Available for Critical Allocations.

Main memory     Minimum memory for critical allocations
16 MiB          512 KiB
32 MiB          724 KiB
64 MiB          1024 KiB
128 MiB         1448 KiB
256 MiB         2048 KiB
512 MiB         2896 KiB
1024 MiB        4096 KiB
2048 MiB        5792 KiB
4096 MiB        8192 KiB
8192 MiB
16384 MiB       16384 KiB

Figure 3-5: Code flow diagram for init_per_zone_pages_min.


The lower bound for highmem zones, SWAP_CLUSTER_MAX, is an important quantity for the whole page reclaim subsystem as discussed in Chapter 17. The code there often operates batchwise on page clusters, and SWAP_CLUSTER_MAX defines the size of such clusters. Figure 3-4 shows the outcome of the calculations for various main memory sizes. Since high memory is not very relevant anymore these days (most machines with large amounts of RAM use 64-bit CPUs), I have restricted the graph to show the outcomes for regular zones.
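To make the per-zone arithmetic concrete, the following userspace sketch mirrors the watermark calculation for a single zone. It is not kernel code: struct zone_sketch and set_zone_watermarks are hypothetical stand-ins, a page size of 4 KiB and a SWAP_CLUSTER_MAX of 32 are assumed, and the assignment of pages_min in both branches is filled in as an assumption about how the clamped value is used.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT        12   /* assumption: 4 KiB pages */
#define SKETCH_SWAP_CLUSTER_MAX  32   /* assumption: cluster size in pages */

/* Hypothetical stand-in for the watermark-related fields of struct zone. */
struct zone_sketch {
    unsigned long present_pages;
    int is_highmem;
    unsigned long pages_min;
    unsigned long pages_low;
    unsigned long pages_high;
};

static void set_zone_watermarks(struct zone_sketch *zone,
                                unsigned long min_free_kbytes,
                                unsigned long lowmem_pages)
{
    /* Convert the reserve from KiB to pages. */
    unsigned long pages_min = min_free_kbytes >> (SKETCH_PAGE_SHIFT - 10);
    uint64_t tmp;

    assert(lowmem_pages > 0);

    /* The zone's share of the reserve, proportional to its size. */
    tmp = (uint64_t)pages_min * zone->present_pages / lowmem_pages;

    if (zone->is_highmem) {
        /* Highmem zones get a small reserve clamped to [32, 128] pages. */
        unsigned long min_pages = zone->present_pages / 1024;

        if (min_pages < SKETCH_SWAP_CLUSTER_MAX)
            min_pages = SKETCH_SWAP_CLUSTER_MAX;
        if (min_pages > 128)
            min_pages = 128;
        zone->pages_min = min_pages;
    } else {
        zone->pages_min = (unsigned long)tmp;
    }

    /* pages_low and pages_high lie 25% and 50% of tmp above pages_min. */
    zone->pages_low = zone->pages_min + (unsigned long)(tmp >> 2);
    zone->pages_high = zone->pages_min + (unsigned long)(tmp >> 1);
}
```

For a single regular zone of 1 GiB (262144 pages) with min_free_kbytes set to 4096, the sketch yields pages_min = 1024, pages_low = 1280, and pages_high = 1536.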

Computing lowmem_reserve is done in setup_per_zone_lowmem_reserve. The kernel iterates over all nodes of the system and calculates the minimum reserve for each zone of the node by dividing the total number of page frames in the zone by sysctl_lowmem_reserve_ratio[zone]. The default settings for the divisor are 256 for low memory and 32 for high memory.
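The division described above can be sketched in a few lines. This is a simplified illustration of the rule as stated in the text, not the kernel function itself; lowmem_reserve_pages is a hypothetical helper, and treating a zero divisor as "no reserve" is an assumption made only to keep the sketch safe.

```c
/*
 * Simplified sketch: the reserve is the number of page frames in the
 * zone divided by the ratio configured for it (256 for low memory and
 * 32 for high memory by default, according to the text).
 */
static unsigned long lowmem_reserve_pages(unsigned long zone_pages,
                                          unsigned long ratio)
{
    if (ratio == 0)          /* assumption: avoid division by zero */
        return 0;
    return zone_pages / ratio;
}
```

With the default divisors, a zone of 262144 page frames would be assigned a reserve of 1024 pages for a divisor of 256, and 8192 pages for a divisor of 32.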
