Correction of Userspace Page Faults

Once the architecture-specific analysis of the page fault has concluded and established that the fault was triggered at a permitted address, the kernel must decide on the appropriate method to read the required data into RAM. This task is delegated to handle_mm_fault, which does not depend on the underlying architecture but is implemented system-independently within the memory management framework. The function ensures that page table entries for all directory levels that lead to the faulty PTE are present, and then calls handle_pte_fault, which analyzes the reason for the page fault. pte is a pointer to the relevant page table element (pte_t); entry holds a local copy of the value it points to.

mm/memory.c

    static inline int handle_pte_fault(struct mm_struct *mm,
                    struct vm_area_struct *vma, unsigned long address,
                    pte_t *pte, pmd_t *pmd, int write_access)
    {
            pte_t entry;
            spinlock_t *ptl;

            entry = *pte;   /* work on a local copy of the page table entry */
            if (!pte_present(entry)) {
                    if (pte_none(entry)) {
                            if (vma->vm_ops)
                                    return do_linear_fault(mm, vma, address,
                                                    pte, pmd, write_access, entry);
                            return do_anonymous_page(mm, vma, address,
                                                    pte, pmd, write_access);
                    }
                    if (pte_file(entry))
                            return do_nonlinear_fault(mm, vma, address,
                                                    pte, pmd, write_access, entry);
                    return do_swap_page(mm, vma, address,
                                    pte, pmd, write_access, entry);
            }
    ...

Three cases must be distinguished if the page is not present in physical memory [ !pte_present(entry) ].

1. If no page table entry is present (pte_none), the kernel must load the page from scratch: this is known as demand allocation for anonymous mappings and demand paging for file-based mappings. If a vm_operations_struct is registered in vm_ops, do_linear_fault is responsible. Otherwise, the kernel must return an anonymous page using do_anonymous_page.

2. If the page is marked as not present but information on the page is held in the page table, this means that the page has been swapped out and must therefore be swapped back in from one of the system swap areas (swap-in or demand paging).

3. Parts of nonlinear mappings that have been swapped out cannot be swapped in like regular pages because the nonlinear association must be restored correctly. The function pte_file allows for checking if the PTE belongs to a nonlinear mapping, and do_nonlinear_fault handles the fault.
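The decision tree formed by these three cases can be tried out in userspace. The following is only a minimal sketch under stated assumptions: the type toy_pte_t, the flag bits TOY_PTE_PRESENT and TOY_PTE_FILE, and all toy_ functions are hypothetical names invented for illustration. Real page table entry layouts are architecture-specific, and the real handlers do far more than return a label.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE encoding, for illustration only. */
typedef uint32_t toy_pte_t;

#define TOY_PTE_PRESENT 0x1u /* page is resident in RAM */
#define TOY_PTE_FILE    0x2u /* non-present entry belongs to a nonlinear mapping */

/* Which kernel routine would be chosen for a given entry. */
enum toy_fault_kind {
    TOY_FAULT_LINEAR,    /* empty entry, vm_ops set: do_linear_fault */
    TOY_FAULT_ANONYMOUS, /* empty entry, no vm_ops: do_anonymous_page */
    TOY_FAULT_NONLINEAR, /* nonlinear mapping: do_nonlinear_fault */
    TOY_FAULT_SWAP,      /* swapped-out page: do_swap_page */
    TOY_FAULT_PRESENT    /* page present: left to the write-access path */
};

static bool toy_pte_present(toy_pte_t e) { return (e & TOY_PTE_PRESENT) != 0; }
static bool toy_pte_none(toy_pte_t e)    { return e == 0; }
static bool toy_pte_file(toy_pte_t e)    { return (e & TOY_PTE_FILE) != 0; }

/* Mirrors the order of the checks described in the text. */
enum toy_fault_kind toy_classify_fault(toy_pte_t entry, bool has_vm_ops)
{
    if (!toy_pte_present(entry)) {
        if (toy_pte_none(entry))
            return has_vm_ops ? TOY_FAULT_LINEAR : TOY_FAULT_ANONYMOUS;
        if (toy_pte_file(entry))
            return TOY_FAULT_NONLINEAR;
        return TOY_FAULT_SWAP;
    }
    return TOY_FAULT_PRESENT;
}

/* Usage example: one call per case. 0x100u stands in for swap information. */
void toy_self_check(void)
{
    assert(toy_classify_fault(0, true) == TOY_FAULT_LINEAR);
    assert(toy_classify_fault(0, false) == TOY_FAULT_ANONYMOUS);
    assert(toy_classify_fault(0x100u | TOY_PTE_FILE, true) == TOY_FAULT_NONLINEAR);
    assert(toy_classify_fault(0x100u, true) == TOY_FAULT_SWAP);
}
```

The order of the tests matters in this sketch just as it does in the kernel code: an entry that is completely empty must be recognized before the file and swap checks, because those interpret the remaining bits of a non-present entry.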

A further potential case arises if the region grants write permission for the page but the access mechanisms of the hardware do not (thus triggering the fault). Notice that since the page is present in this case, the body of the above if statement is skipped and the kernel drops right through to the following code:

mm/memory.c

            if (write_access) {
                    if (!pte_write(entry))
                            return do_wp_page(mm, vma, address,
                                            pte, pmd, ptl, entry);
                    entry = pte_mkdirty(entry);
            }
    ...
    }

do_wp_page is responsible for creating a copy of the page and inserting it in the page tables of the process — with write access permission for the hardware. This mechanism is referred to as copy on write (COW, for short) and is discussed briefly in Chapter 1. When a process forks, the pages are not copied immediately but are mapped into the address space of the process as "read-only" copies so as not to spend too much time in the (wasteful) copying of information. A separate copy of the page is not created for the process until write access actually takes place.
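The effect of copy on write can be observed from userspace, although the mechanism itself stays hidden. The following sketch assumes a POSIX system; the function name parent_unaffected_by_child_write is hypothetical. It shows only the visible semantics (a write in the child does not change the parent's data), not whether or when the kernel actually duplicated the underlying page.

```c
#define _POSIX_C_SOURCE 200809L /* expose fork() and waitpid() in strict C modes */

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the parent's buffer is unchanged after the child wrote to
 * its own view of the buffer, 0 if not, and -1 if a system call failed.
 */
int parent_unaffected_by_child_write(void)
{
    const size_t len = 4096;
    char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    memset(buf, 'A', len);

    pid_t pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }

    if (pid == 0) {
        /* Child: the write goes to the child's private copy of the data. */
        buf[0] = 'B';
        _exit(buf[0] == 'B' ? 0 : 1);
    }

    /* Parent: wait for the child, then inspect the parent's own data. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        free(buf);
        return -1;
    }

    int ok = WIFEXITED(status) && WEXITSTATUS(status) == 0 && buf[0] == 'A';
    assert(buf[len - 1] == 'A'); /* bytes the child never touched are intact */
    free(buf);
    return ok;
}
```

The child leaves via _exit rather than exit so that it does not run the parent's exit handlers or flush inherited stdio buffers a second time.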

The sections below take a closer look at the implementation of the fault handler routines invoked during page fault correction. They do not cover how pages are swapped in from a swap area by means of do_swap_page, as this topic is discussed separately in Chapter 18 and requires additional knowledge of the structure and organization of the swap layer.

