Demand Allocation/Paging

Allocation of pages on demand is delegated to do_linear_fault, which is defined in mm/memory.c. After some parameter conversion, the work is passed on to __do_fault. The code flow diagram of this function is shown in Figure 4-19.

First of all, the kernel has to make sure that the required data are read into the faulting page. How this is handled depends on the file that is mapped into the faulting address space, and therefore a file-specific method is invoked to obtain the data. Usually, this method is stored in vma->vm_ops->fault. Since earlier kernel versions used a method with a different calling convention, the kernel must account for code that has not yet been updated to the new convention. Therefore, the old variant vma->vm_ops->nopage is invoked if no fault method is registered.
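The fallback from the new method to the old one can be illustrated with a minimal userspace sketch. The structures below are simplified stand-ins, not the kernel's definitions: the method signatures are reduced for illustration, and read_faulting_page, demo_fault, and demo_nopage are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct page { int id; };

struct vm_fault {
    unsigned long pgoff;  /* page offset within the mapping */
    struct page *page;    /* filled in by the new-style handler */
};

struct vm_area_struct;

struct vm_operations_struct {
    /* new convention: returns 0 on success, stores the page in vmf */
    int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);
    /* old convention: returns the page directly, NULL on failure */
    struct page *(*nopage)(struct vm_area_struct *vma, unsigned long address);
};

struct vm_area_struct {
    const struct vm_operations_struct *vm_ops;
};

static struct page page_from_fault = { 1 };
static struct page page_from_nopage = { 2 };

/* Example handlers, assumed for the demonstration only. */
int demo_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
    (void)vma;
    vmf->page = &page_from_fault;
    return 0;
}

struct page *demo_nopage(struct vm_area_struct *vma, unsigned long address)
{
    (void)vma;
    (void)address;
    return &page_from_nopage;
}

/* Prefer the fault method; fall back to nopage if none is registered.
 * Returns 0 on success and -1 if no page could be obtained. */
int read_faulting_page(struct vm_area_struct *vma, unsigned long address,
                       struct vm_fault *vmf)
{
    assert(vma && vma->vm_ops && vmf);
    vmf->page = NULL;

    if (vma->vm_ops->fault)
        return vma->vm_ops->fault(vma, vmf);

    if (vma->vm_ops->nopage) {
        vmf->page = vma->vm_ops->nopage(vma, address);
        return vmf->page ? 0 : -1;
    }
    return -1;
}
```

Whichever convention the handler follows, the caller ends up with the page in vmf->page, so the rest of the fault path does not need to know which method supplied it.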

Most files use filemap_fault to read in the required data. The function not only reads in the required data, but also implements readahead, which reads in pages ahead of time that will most likely be required in the future. The mechanisms needed to do this are introduced in Chapter 16, which discusses the function at greater length. At the moment, all we need to know is that the kernel reads the data from the backing store into a physical memory page using the information in the address_space object.

Figure 4-19: Code flow diagram for __do_fault.

Given the vm_area_struct region involved, how can the kernel choose which method to use to read the page?

1. The mapped file object is found using vm_area_struct->vm_file.

2. A pointer to the mapping itself can be found in file->f_mapping.

3. Each address space has special address space operations from which the readpage method can be selected. The data are transferred from the file into RAM using mapping->a_ops->readpage(file, page).
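The three steps above can be sketched in userspace C. The structures are heavily reduced stand-ins that keep only the members named in the text. The backing_store member, demo_readpage, and read_mapped_page are hypothetical additions for illustration.

```c
#include <assert.h>
#include <string.h>

/* Reduced, hypothetical stand-ins for the kernel structures. */
struct page { char data[32]; };
struct file;

struct address_space_operations {
    int (*readpage)(struct file *file, struct page *page);
};

struct address_space {
    const struct address_space_operations *a_ops;
};

struct file {
    struct address_space *f_mapping;
    const char *backing_store;  /* assumed: fakes the data on disk */
};

struct vm_area_struct {
    struct file *vm_file;
};

/* Example readpage method: copies the fake file contents into the page. */
int demo_readpage(struct file *file, struct page *page)
{
    strncpy(page->data, file->backing_store, sizeof(page->data) - 1);
    page->data[sizeof(page->data) - 1] = '\0';
    return 0;
}

/* Follow the chain region -> file -> mapping -> readpage. */
int read_mapped_page(struct vm_area_struct *vma, struct page *page)
{
    assert(vma && vma->vm_file && page);

    struct file *file = vma->vm_file;                 /* step 1 */
    struct address_space *mapping = file->f_mapping;  /* step 2 */
    return mapping->a_ops->readpage(file, page);      /* step 3 */
}
```

Because the read method is reached through the mapping rather than hard-coded, each filesystem can supply its own readpage without the fault path changing.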

If write access is required, the kernel has to distinguish between shared and private mappings. For private mappings, a copy of the page has to be prepared.

mm/memory.c
static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
                unsigned long address, pmd_t *pmd,
                pgoff_t pgoff, unsigned int flags, pte_t orig_pte)
{
...
        if (!(vma->vm_flags & VM_SHARED)) {
                anon = 1;
                if (unlikely(anon_vma_prepare(vma))) {
                        ret = VM_FAULT_OOM;
                        goto out;
                }
                page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
...
                copy_user_highpage(page, vmf.page, address, vma);
        }
...
}

A new page must be allocated once a new anon_vma instance has been created for the region with anon_vma_prepare (the pointer to the old region is redirected to the new region in anon_vma_prepare). The high memory area is preferably used for this purpose as it presents no problems for userspace pages. copy_user_highpage then creates a copy of the data (routines for copying data between kernel and userspace are discussed in Section 4.13).
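The distinction between shared and private mappings on a write fault can be modeled in a few lines of userspace C. This is only a sketch of the decision: malloc and memcpy stand in for alloc_page_vma and copy_user_highpage, and DEMO_VM_SHARED, DEMO_PAGE_SIZE, struct demo_page, and resolve_write_fault are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 64     /* assumed size, far smaller than a real page */
#define DEMO_VM_SHARED 0x1u   /* assumed flag value standing in for VM_SHARED */

struct demo_page { unsigned char data[DEMO_PAGE_SIZE]; };

/* Decide which page a write fault should map.
 * Shared mapping: the file's own page is used directly.
 * Private mapping: a fresh page is allocated and filled with a copy.
 * Sets *anon to 1 if a private copy was made. Returns NULL if out of memory. */
struct demo_page *resolve_write_fault(unsigned int vm_flags,
                                      struct demo_page *file_page, int *anon)
{
    assert(file_page && anon);
    *anon = 0;

    if (vm_flags & DEMO_VM_SHARED)
        return file_page;

    struct demo_page *copy = malloc(sizeof(*copy));  /* ~ alloc_page_vma */
    if (!copy)
        return NULL;                                 /* ~ VM_FAULT_OOM */

    memcpy(copy->data, file_page->data, DEMO_PAGE_SIZE); /* ~ copy_user_highpage */
    *anon = 1;
    return copy;
}
```

After the copy, writes through a private mapping land in the new page only, so the data seen by other users of the file stay unchanged.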

Now that the position of the page is known, it must be added to the page table of the process and incorporated in the reverse mapping data structures. Before this is done, a check is made to ensure that the page contents are visible in userspace by updating the caches with flush_icache_page. (Most processors don't need to do this and define an empty operation.)

A page table entry that normally points to a read-only page is generated using the mk_pte function discussed in Section 3.3.2. If a page with write access is created, the kernel must explicitly set write permission with pte_mkwrite.
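As a rough illustration of this two-step construction, the sketch below builds a page table entry as a plain integer. The bit layout (present in bit 0, writable in bit 1, frame number from bit 12 upward) is an assumption chosen for the example, and all demo_ names and build_entry are hypothetical. The real mk_pte and pte_mkwrite are architecture-specific.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t demo_pte_t;

#define DEMO_PAGE_SHIFT  12      /* assumed: 4 KiB pages */
#define DEMO_PTE_PRESENT 0x1u    /* assumed bit positions */
#define DEMO_PTE_RW      0x2u

/* Combine a page frame number with protection bits (read-only by default). */
demo_pte_t demo_mk_pte(uint64_t pfn, uint64_t prot)
{
    assert((prot >> DEMO_PAGE_SHIFT) == 0);  /* prot must fit in the low bits */
    return (pfn << DEMO_PAGE_SHIFT) | prot;
}

/* Return a copy of the entry with write permission set. */
demo_pte_t demo_pte_mkwrite(demo_pte_t pte)
{
    return pte | DEMO_PTE_RW;
}

/* Build the entry for a fault; write permission is added only on request. */
demo_pte_t build_entry(uint64_t pfn, int write_access)
{
    demo_pte_t entry = demo_mk_pte(pfn, DEMO_PTE_PRESENT);

    if (write_access)
        entry = demo_pte_mkwrite(entry);
    return entry;
}
```

Under these assumptions, build_entry(5, 0) yields 0x5001 and build_entry(5, 1) yields 0x5003. The two values differ only in the write bit.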

How pages are integrated into the reverse mapping depends on their type. If the page generated when handling the write access is anonymous, it is added to the active area of the LRU cache using lru_cache_add_active (Chapter 16 examines the caching mechanisms used in more detail) and then integrated into the reverse mapping with page_add_new_anon_rmap. For all other pages, which are associated with a file-based mapping, page_add_file_rmap is invoked. Both functions are discussed in Section 4.8. Finally, the MMU cache of the processor has to be updated if required because the page tables have been modified.
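The branch described above can be sketched as follows. The code does not perform any reverse mapping work. It only records, in order, the names of the kernel functions that would be called for each page type. struct demo_trace, record, and integrate_page are hypothetical helpers.

```c
#include <assert.h>
#include <string.h>

/* Collects the names of the functions that would be called, in order. */
struct demo_trace { char calls[96]; };

/* Append a name to the trace, comma-separated, without overflowing. */
static void record(struct demo_trace *t, const char *name)
{
    if (t->calls[0] != '\0')
        strncat(t->calls, ",", sizeof(t->calls) - strlen(t->calls) - 1);
    strncat(t->calls, name, sizeof(t->calls) - strlen(t->calls) - 1);
}

/* Model of the decision: anonymous pages go onto the active LRU list and
 * into the anonymous reverse mapping; all others use the file variant. */
void integrate_page(struct demo_trace *t, int anon)
{
    assert(t);
    if (anon) {
        record(t, "lru_cache_add_active");
        record(t, "page_add_new_anon_rmap");
    } else {
        record(t, "page_add_file_rmap");
    }
}
```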
