referenced = page_referenced(page, 1);
/* In active use or really unfreeable? Activate it. */
if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
                referenced && page_mapping_inuse(page))
        /* Set PG_active flag and keep page */

page_referenced checks (as discussed above) whether the page was recently referenced by any of its users. This alone, however, is not sufficient to prevent the page from being reclaimed. Additionally, the allocation order for which the current reclaim pass works must be less than or equal to PAGE_ALLOC_COSTLY_ORDER, that is, the request must cover at most eight pages. Besides, the page must fulfill one of the following conditions:

□ The page is mapped into a page table as checked by page_mapped — see Section 4.8.3 — or is used in a user-mode virtual address space.

□ The page is contained in the swap cache.

□ The page is contained in an anonymous mapping.

□ The page is mapped into userland via a file mapping. This case is not checked with the help of page tables, but by mapping->i_mmap and mapping->i_mmap_nonlinear, which contain the mapping information for regular and nonlinear mappings.

page_mapping_inuse checks for these conditions. Fulfilling any of them does not mean that the page cannot be reclaimed at all: the pressure from high allocation orders that wait to be fulfilled just needs to be large enough.
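The activation decision described above can be sketched as a small user-space model. This is only an illustration, not the kernel code: struct model_page, its fields, and the model_ functions are hypothetical stand-ins, and the threshold of 3 (2^3 = 8 pages) is taken from the "eight pages" statement in the text.

```c
#include <stdbool.h>

/* Assumed threshold: orders up to 3 (2^3 = 8 pages) are not "costly". */
#define MODEL_COSTLY_ORDER 3

/* Hypothetical user-space stand-in for the state the kernel inspects. */
struct model_page {
    bool referenced;     /* outcome of the page_referenced check */
    bool mapped;         /* mapped into some page table */
    bool in_swap_cache;  /* contained in the swap cache */
    bool anonymous;      /* belongs to an anonymous mapping */
    bool file_mapped;    /* mapped into userland via a file mapping */
};

/* Mirrors the four conditions listed above; any one of them suffices. */
static bool model_mapping_inuse(const struct model_page *p)
{
    return p->mapped || p->in_swap_cache || p->anonymous || p->file_mapped;
}

/* True if the page would be kept and marked active instead of reclaimed. */
static bool model_should_activate(const struct model_page *p, int order)
{
    return order <= MODEL_COSTLY_ORDER &&
           p->referenced && model_mapping_inuse(p);
}
```

In this model, a referenced and mapped page is activated for an order-0 request but not for an order-4 request, which reflects the point that sufficiently high allocation orders override the protection.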

Recall that shrink_inactive_list can call shrink_page_list twice: first in asynchronous and then in synchronous writeback mode. Therefore, it can happen that the considered page is currently under writeback as indicated by the page flag PG_writeback. If the current pass requests synchronous writeback, then wait_on_page_writeback is used to wait until all pending I/O operations on the page have been finished.
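The handling of a page that is under writeback can be modeled as follows. The enum names and the function are hypothetical; the synchronous branch follows the text, while the asynchronous branch (leaving the page alone until a later pass) is an assumption that the text does not state explicitly.

```c
#include <stdbool.h>

/* Hypothetical names for the two writeback modes of a reclaim pass. */
enum model_wb_mode { MODEL_IO_ASYNC, MODEL_IO_SYNC };

/* What the reclaim pass does with the page next. */
enum model_wb_action {
    MODEL_PROCEED,            /* page is not under writeback */
    MODEL_WAIT_THEN_PROCEED,  /* wait for pending I/O, then continue */
    MODEL_KEEP                /* leave the page for a later pass */
};

static enum model_wb_action
model_handle_writeback(bool under_writeback, enum model_wb_mode mode)
{
    if (!under_writeback)
        return MODEL_PROCEED;
    if (mode == MODEL_IO_SYNC)
        return MODEL_WAIT_THEN_PROCEED;  /* role of wait_on_page_writeback */
    return MODEL_KEEP;  /* assumption: the asynchronous pass does not block */
}
```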

If the page currently being considered by shrink_page_list is not associated with a backing store, then the page has been generated anonymously by a process. When pages of this type must be reclaimed, their data are written into the swap area. When a page of this type is encountered and no swap slot has been reserved yet, add_to_swap is invoked to reserve a slot and add the page to the swap cache. At the same time, the relevant page instance is provided with swapper_space (see Section 18.4.2) as a mapping so that it can be handled in the same way as all other pages that already have a mapping.

If the page is mapped into the page tables of one or more processes (as before, checked using page_mapped), the page table entries that point to the page must be removed from the page tables of all processes that reference it. The rmap subsystem provides the try_to_unmap function for this purpose; it unmaps the page from all processes that use it (we do not examine this function in detail because its implementation is not particularly interesting). In addition, the architecture-specific page table entries are replaced with a reference indicating where the data can now be found. This is done in try_to_unmap_one. The necessary information is obtained from the page's address space structure, which contains all backing store data. It is important that two bits are not set in the new page table entry:

□ A missing _PAGE_PRESENT bit indicates that the page has been swapped out. This is important when a process accesses the page: A page fault is generated, and the kernel needs to detect that the page has been swapped out.

□ A missing _PAGE_FILE bit indicates that the page is in the swap cache. Recall from Section 4.7.3 that page table entries used for nonlinear mappings also lack _PAGE_PRESENT, but can be distinguished from swap pages by a set _PAGE_FILE bit.
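How the two bits separate the three kinds of entries can be illustrated with a simplified model. The bit positions, the 32-bit entry width, and all model_ names are assumptions made for this sketch; real layouts are architecture-specific, and the model does not treat completely empty entries separately.

```c
#include <stdint.h>

/* Hypothetical bit positions; real values depend on the architecture. */
#define MODEL_PAGE_PRESENT 0x1u
#define MODEL_PAGE_FILE    0x2u

enum model_pte_kind {
    MODEL_PTE_PRESENT,         /* page is in RAM */
    MODEL_PTE_NONLINEAR_FILE,  /* not present, file bit set */
    MODEL_PTE_SWAP             /* not present, file bit clear */
};

/* Classify an entry by the two flag bits discussed above. */
static enum model_pte_kind model_classify_pte(uint32_t pte)
{
    if (pte & MODEL_PAGE_PRESENT)
        return MODEL_PTE_PRESENT;
    if (pte & MODEL_PAGE_FILE)
        return MODEL_PTE_NONLINEAR_FILE;
    return MODEL_PTE_SWAP;
}

/* Encode a swap location above the flag bits, leaving both flags clear. */
static uint32_t model_make_swap_pte(uint32_t slot)
{
    return slot << 2;
}
```

An entry built with model_make_swap_pte is always classified as a swap entry, because shifting the slot number past the two flag bits leaves both of them cleared.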

Clearing the page table entry with ptep_clear_flush delivers a copy of the previous page table entry (PTE). If it contains the dirty bit, then the page was modified by some user during the reverse mapping process. It needs to be synchronized with the backing store (in this case the swap space) in shrink_page_list afterward. Therefore the dirty bit is transferred from the PTE to the page bit PG_dirty.

Let's turn our attention back to shrink_page_list. What now follows is a series of queries that, depending on page state, trigger all the operations needed to reclaim the page.

PageDirty checks whether the page is dirty and must therefore be synchronized with the underlying storage medium. This also includes pages that live in the swap address space. If the page is dirty, this requires a few actions that are represented by Part 2 in Figure 18-17. They are better discussed by looking at the code itself.

□ The kernel ensures that the data are written back by invoking the writepage address space routine (which is called by the pageout helper function that supplies all the required arguments). If the data were mapped from a file in the filesystem, a filesystem-specific routine handles the appropriate synchronization, and swap pages are inserted in their assigned page slot using swap_writepage.

□ Depending on the result of pageout, different actions are required:

mm/vmscan.c

/* Page is dirty, try to write it out here */
switch (pageout(page, mapping, sync_writeback)) {
case PAGE_KEEP:
        goto keep_locked;
case PAGE_ACTIVATE:
        goto activate_locked;
case PAGE_SUCCESS:
        if (PageWriteback(page) || PageDirty(page))
                goto keep;
...


The sync_writeback parameter to pageout denotes the writeback mode in which shrink_page_list is operating.

The most desirable return code is PAGE_CLEAN: The data are synchronized with the backing store and the memory can be reclaimed. This happens in Part 3 of the code flow diagram.

If a write request was successfully issued to the block layer, PAGE_SUCCESS is returned. In asynchronous writeback mode, the page will usually still be under writeback when pageout returns. Jumping to the label keep then leaves the page on the local page list, which is handed back to the caller of shrink_page_list, and the pages on it are returned to the LRU lists there. Once the write operation has completed, the page contents are synchronized with the backing store, so the page is no longer dirty the next time shrink_page_list is invoked and can therefore be reclaimed.

If the write operation was already finished when pageout returned, the data have been written back, and the kernel can continue with Part 3.

If an error has occurred during writeback, the result is either PAGE_KEEP or PAGE_ACTIVATE. Both make the function keep the page on the aforementioned return list, but PAGE_ACTIVATE additionally sets the page state to PG_active (this can happen, for example, if the page's address space does not provide a writeback method, which makes trying to synchronize the page useless).
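The reaction to the four pageout results can be summarized in a user-space model. The enums and the function below are hypothetical and only mirror the behavior described in the text; in particular, the model collapses the kernel's keep and keep_locked labels into a single outcome.

```c
#include <stdbool.h>

/* Hypothetical mirror of the four result codes discussed above. */
enum model_pageout {
    MODEL_PAGE_KEEP,
    MODEL_PAGE_ACTIVATE,
    MODEL_PAGE_SUCCESS,
    MODEL_PAGE_CLEAN
};

/* What happens to the page after pageout has returned. */
enum model_next {
    MODEL_NEXT_KEEP,      /* stay on the list handed back to the caller */
    MODEL_NEXT_ACTIVATE,  /* keep the page and mark it active */
    MODEL_NEXT_FREE       /* continue with the freeing path (Part 3) */
};

static enum model_next
model_after_pageout(enum model_pageout result,
                    bool still_under_writeback, bool still_dirty)
{
    switch (result) {
    case MODEL_PAGE_KEEP:
        return MODEL_NEXT_KEEP;
    case MODEL_PAGE_ACTIVATE:
        return MODEL_NEXT_ACTIVATE;
    case MODEL_PAGE_SUCCESS:
        /* Write was issued; the page is freeable only if I/O already ended. */
        if (still_under_writeback || still_dirty)
            return MODEL_NEXT_KEEP;
        return MODEL_NEXT_FREE;
    case MODEL_PAGE_CLEAN:
        return MODEL_NEXT_FREE;
    }
    return MODEL_NEXT_KEEP;  /* defensive default for unknown values */
}
```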

Figure 18-18 shows the code flow diagram for the case that the page is not dirty. Keep in mind that the kernel can also reach this path coming from Part 2.

Figure 18-18: Code flow diagram for shrink_list (Part 3).

□ try_to_release_page is invoked if the page has private data and buffers are therefore associated with the page (this is typically the case with pages that contain filesystem metadata). This function attempts either to release the page using the releasepage operation in the address space structure or, if there is no mapping, to free the data using try_to_free_buffers.

□ The kernel then detaches the page from its address space. The auxiliary function remove_mapping is provided for this purpose.

If the page is held in the swap cache, it is certain that the data are by now present both in the swap space and in the swap cache. Since the page has been swapped out, the swap cache has fulfilled its duty, and the page can be removed from there with __delete_from_swap_cache.

The kernel additionally uses swap_free to decrement the usage counter of the page in the swap area. This is necessary to reflect the fact that there is no longer a reference to the page in the swap cache.

□ If the page is not in the swap cache, it is removed from the general page cache using __remove_from_page_cache.

It is now guaranteed that the processed page is not present in the kernel's data structures. Nevertheless, the main issue has not been resolved: the RAM occupied by the page has not yet been freed. The kernel does this in chunks using page vectors. The page to be freed is inserted in the local freed_pvec page vector using pagevec_add. When this vector is full, all its elements are released collectively by means of __pagevec_release_nonlru. As discussed in Section 18.6.2, the function returns the memory space occupied by the pages to the buddy system. The memory reclaimed in this way can be used for more important tasks, and this is precisely the purpose of swapping and page reclaim.
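The batching idea behind page vectors can be sketched as follows. This is a user-space illustration with hypothetical names; the capacity of 14 is an assumption for the sketch, pages are represented by plain integers, and "releasing" only counts the pages instead of returning memory to an allocator.

```c
#include <stddef.h>

#define MODEL_PVEC_SIZE 14  /* assumed capacity of one vector */

/* Hypothetical stand-in for a page vector holding pages to be freed. */
struct model_pagevec {
    size_t nr;                   /* number of slots currently in use */
    int pages[MODEL_PVEC_SIZE];  /* page identifiers */
    size_t released;             /* pages handed back so far (model only) */
};

/* Release all collected pages in one go and empty the vector. */
static void model_pvec_release(struct model_pagevec *pv)
{
    pv->released += pv->nr;
    pv->nr = 0;
}

/* Add one page; returns the number of free slots that remain. */
static size_t model_pvec_add(struct model_pagevec *pv, int page_id)
{
    pv->pages[pv->nr++] = page_id;
    return MODEL_PVEC_SIZE - pv->nr;
}

/* Queue a page for freeing and flush the vector as soon as it is full. */
static void model_free_page(struct model_pagevec *pv, int page_id)
{
    if (model_pvec_add(pv, page_id) == 0)
        model_pvec_release(pv);
}
```

A caller would start with a zero-initialized struct model_pagevec, call model_free_page for every reclaimed page, and finish with one more model_pvec_release to flush a partially filled vector.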

A few trivial points need to be cleared up once shrink_page_list has iterated over all the pages passed:

□ The kernel's swapping statistics are updated.

□ The number of freed pages is returned as an integer result.
