[6] However, this case should never happen, since the kernel does not assign privileged page frames to processes.

If the memory region access rights match the access type that caused the exception, the handle_mm_fault( ) function is invoked to allocate a new page frame:

    survive:
        ret = handle_mm_fault(tsk->mm, vma, address, write);
        if (ret == 1 || ret == 2) {
            if (ret == 1)
                tsk->min_flt++;
            else
                tsk->maj_flt++;
            up_read(&tsk->mm->mmap_sem);
            return;
        }

The handle_mm_fault( ) function returns 1 or 2 if it succeeded in allocating a new page frame for the process. The value 1 indicates that the Page Fault has been handled without blocking the current process; this kind of Page Fault is called a minor fault. The value 2 indicates that the Page Fault forced the current process to sleep (most likely because time was spent filling the page frame assigned to the process with data read from disk); a Page Fault that blocks the current process is called a major fault. The function can also return -1 (not enough memory) or 0 (any other error).

If handle_mm_fault( ) returns the value 0, a SIGBUS signal is sent to the process:

    up_read(&tsk->mm->mmap_sem);
    tsk->thread.cr2 = address;
    tsk->thread.error_code = error_code;
    tsk->thread.trap_no = 14;
    info.si_signo = SIGBUS;
    info.si_errno = 0;
    info.si_code = BUS_ADRERR;
    info.si_addr = (void *) address;
    force_sig_info(SIGBUS, &info, tsk);
    if (!(error_code & 4)) /* Kernel Mode */
        goto no_context;

If handle_mm_fault( ) cannot allocate the new page frame, the kernel usually kills the current process. However, if current is the init process, it is just put at the end of the run queue and the scheduler is invoked; once init resumes its execution, handle_mm_fault( ) is executed again:

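    up_read(&tsk->mm->mmap_sem);
    if (tsk->pid != 1) {
        if (error_code & 4) /* User Mode */
            do_exit(SIGKILL);
        goto no_context;
    }
    /* current is init: move it to the end of the run queue and invoke
       the scheduler (statements not shown here) */
    down_read(&tsk->mm->mmap_sem);
    goto survive;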
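The four possible return values of handle_mm_fault( ), and the reaction to each, can be condensed into a small user-space sketch. The enum, struct, and function names are hypothetical and are not kernel APIs. The sketch ignores the Kernel Mode path (goto no_context) and the actual signal delivery. It only shows which action each return value selects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of how the Page Fault handler reacts to the value
 * returned by handle_mm_fault(). None of these names exist in the kernel. */
enum fault_action {
    FAULT_DONE,    /* page frame obtained; fault counter updated */
    FAULT_SIGBUS,  /* return value 0: send SIGBUS to the process */
    FAULT_KILL,    /* return value -1: out of memory, kill the process */
    FAULT_RETRY    /* return value -1, but current is init: yield, then retry */
};

struct fault_stats {
    unsigned long min_flt;  /* faults handled without blocking */
    unsigned long maj_flt;  /* faults that put the process to sleep */
};

static enum fault_action handle_fault_result(int ret, int pid,
                                             struct fault_stats *st)
{
    assert(st != NULL);
    switch (ret) {
    case 1:                           /* minor fault */
        st->min_flt++;
        return FAULT_DONE;
    case 2:                           /* major fault */
        st->maj_flt++;
        return FAULT_DONE;
    case 0:                           /* any other error */
        return FAULT_SIGBUS;
    default:                          /* -1: not enough memory */
        return pid == 1 ? FAULT_RETRY : FAULT_KILL;
    }
}
```

Keeping the counters in a caller-supplied struct mirrors the way the kernel stores min_flt and maj_flt in the process descriptor.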

The handle_mm_fault( ) function acts on four parameters:

• mm: a pointer to the memory descriptor of the process that was running on the CPU when the exception occurred

• vma: a pointer to the descriptor of the memory region that includes the linear address that caused the exception

• address: the linear address that caused the exception

• write_access: set to 1 if tsk attempted to write in address and to 0 if tsk attempted to read or execute it

The function starts by checking whether the Page Middle Directory and the Page Table used to map address exist. Even if address belongs to the process address space, the corresponding Page Tables might not have been allocated, so the task of allocating them precedes everything else:

    spin_lock(&mm->page_table_lock);
    pgd = pgd_offset(mm, address);
    pmd = pmd_alloc(mm, pgd, address);
    if (pmd) {
        pte = pte_alloc(mm, pmd, address);
        if (pte)
            return handle_pte_fault(mm, vma, address, write_access, pte);
    }
    spin_unlock(&mm->page_table_lock);
    return -1;

The pgd local variable contains the Page Global Directory entry that refers to address; pmd_alloc( ) is invoked to allocate, if needed, a new Page Middle Directory.[2] pte_alloc( ) is then invoked to allocate, if needed, a new Page Table. If both operations are successful, the pte local variable points to the Page Table entry that refers to address.

[2] On 80x86 microprocessors, this kind of allocation never occurs, since the Page Middle Directories are either included in the Page Global Directory (PAE not enabled) or allocated together with the Page Global Directory (PAE enabled).
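As a rough illustration of allocating Page Tables on demand, here is a toy model of 80x86 paging with PAE disabled. In that mode the Page Middle Directory is folded into the Page Global Directory, so only the Page Table may be missing. The model assumes the usual 10/10/12 split of a 32-bit linear address. The types and function names are invented for this example, and calloc( ) stands in for the kernel's page allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model of 80x86 two-level paging without PAE: 10 directory bits,
 * 10 table bits, 12 offset bits. All names are hypothetical. */
#define PTRS_PER_TABLE 1024u
#define PGDIR_SHIFT    22
#define PAGE_SHIFT     12

typedef uint32_t toy_pte_t;

struct toy_mm {
    toy_pte_t *pgd[PTRS_PER_TABLE];  /* NULL = Page Table not yet allocated */
};

/* Return a pointer to the Page Table entry for addr, allocating the Page
 * Table on demand. NULL means out of memory (the kernel returns -1). */
static toy_pte_t *toy_pte_alloc(struct toy_mm *mm, uint32_t addr)
{
    uint32_t dir = addr >> PGDIR_SHIFT;
    uint32_t idx = (addr >> PAGE_SHIFT) & (PTRS_PER_TABLE - 1);

    assert(mm != NULL);
    if (mm->pgd[dir] == NULL) {
        /* calloc zero-fills, so every new entry is "none" (never accessed) */
        mm->pgd[dir] = calloc(PTRS_PER_TABLE, sizeof(toy_pte_t));
        if (mm->pgd[dir] == NULL)
            return NULL;
    }
    return &mm->pgd[dir][idx];
}
```

Two addresses in the same 4 KB page yield the same entry. The next page maps to the adjacent entry of the same Page Table.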

The handle_pte_fault( ) function is then invoked to inspect the Page Table entry corresponding to address and to determine how to allocate a new page frame for the process:

• If the accessed page is not present—that is, if it is not already stored in any page frame—the kernel allocates a new page frame and initializes it properly; this technique is called demand paging.

• If the accessed page is present (that is, already stored in a page frame) but is marked read-only and the process is trying to write it, the kernel allocates a new page frame and initializes its contents by copying the old page frame data; this technique is called Copy On Write.
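The two bullets amount to a simple decision based on two Page Table entry flags and the type of access. The sketch below is a hypothetical classifier, not kernel code. The third outcome is included only so that every input has an answer: if the entry is present and allows the requested access, no new page frame is needed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classifier for a Page Fault on an address inside a valid
 * memory region. "present" and "writable" stand for the Present and
 * Read/Write flags of the Page Table entry. */
enum pte_fault_kind {
    DEMAND_PAGING,  /* page not in any page frame: allocate and initialize */
    COPY_ON_WRITE,  /* page present but read-only, and this is a write */
    NO_ALLOCATION   /* page present and access allowed: no new frame needed */
};

static enum pte_fault_kind classify_pte_fault(bool present, bool writable,
                                              bool write_access)
{
    if (!present)
        return DEMAND_PAGING;
    if (write_access && !writable)
        return COPY_ON_WRITE;
    return NO_ALLOCATION;
}
```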

8.4.3 Demand Paging

The term demand paging denotes a dynamic memory allocation technique that consists of deferring page frame allocation until the last possible moment—until the process attempts to address a page that is not present in RAM, thus causing a Page Fault exception.

The motivation behind demand paging is that processes do not address all the addresses included in their address space right from the start; in fact, some of these addresses may never be used by the process. Moreover, the program locality principle (see Section 2.4.2) ensures that, at each stage of program execution, only a small subset of the process pages are really referenced, and therefore the page frames containing the temporarily useless pages can be used by other processes. Demand paging is thus preferable to global allocation (assigning all page frames to the process right from the start and leaving them in memory until program termination) since it increases the average number of free page frames in the system and therefore allows better use of the available free memory. From another viewpoint, it allows the system as a whole to get a better throughput with the same amount of RAM.

The price to pay for all these good things is system overhead: each Page Fault exception induced by demand paging must be handled by the kernel, thus wasting CPU cycles. Fortunately, the locality principle ensures that once a process starts working with a group of pages, it sticks with them without addressing other pages for quite a while. Thus, Page Fault exceptions may be considered rare events.

An addressed page may not be present in main memory for the following reasons:

• The page was never accessed by the process. The kernel can recognize this case since the Page Table entry is filled with zeros—i.e., the pte_none macro returns the value 1.

• The page was already accessed by the process, but its content is temporarily saved on disk. The kernel can recognize this case since the Page Table entry is not filled with zeros (however, the Present flag is cleared since the page is not present in RAM).

The handle_pte_fault( ) function distinguishes the two cases by inspecting the Page Table entry that refers to address:

    entry = *pte;
    if (!pte_present(entry)) {
        if (pte_none(entry))
            return do_no_page(mm, vma, address, write_access, pte);
        return do_swap_page(mm, vma, address, pte, entry, write_access);
    }
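The two tests in this fragment can be modeled on raw 32-bit entries. The sketch assumes the 80x86 layout, where bit 0 is the Present flag, and treats every nonzero, non-present entry as a swapped-out page. It ignores special cases such as PROT_NONE pages, which the real pte_present( ) also treats as present. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy versions of the tests applied by handle_pte_fault() to a 32-bit
 * 80x86 Page Table entry. Bit 0 is the hardware Present flag. */
#define TOY_PAGE_PRESENT 0x001u

enum missing_page_kind {
    PAGE_PRESENT,    /* not a "page not present" fault */
    NEVER_ACCESSED,  /* entry is all zeros: handled by do_no_page( ) */
    SWAPPED_OUT      /* nonzero entry, Present clear: do_swap_page( ) */
};

static enum missing_page_kind classify_pte(uint32_t entry)
{
    if (entry & TOY_PAGE_PRESENT)
        return PAGE_PRESENT;
    if (entry == 0)          /* the pte_none( ) case */
        return NEVER_ACCESSED;
    return SWAPPED_OUT;      /* remaining bits identify the page on disk */
}
```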

We'll examine the case in which the page is saved on disk (using the do_swap_page( ) function) in Section 16.6.

In the other situation, when the page was never accessed, the do_no_page( ) function is invoked. There are two ways to load the missing page, depending on whether the page is mapped to a disk file. The function determines this by checking the nopage method of the vma memory region object, which points to the function that loads the missing page from disk into RAM if the page is mapped to a file. Therefore, the possibilities are:

• The vma->vm_ops->nopage field is not null. In this case, the memory region maps a disk file and the field points to the function that loads the page. This case is covered in Section 15.2.4 and in Section 19.3.5.

• Either the vm_ops field or the vma->vm_ops->nopage field is null. In this case, the memory region does not map a file on disk, i.e., it is an anonymous mapping. Thus, do_no_page( ) invokes the do_anonymous_page( ) function to get a new page frame:

    if (!vma->vm_ops || !vma->vm_ops->nopage)
        return do_anonymous_page(mm, vma, page_table, write_access, address);
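The choice between a file-backed page and an anonymous page is a plain function-pointer dispatch. The following toy model uses invented types (toy_vma, toy_vm_ops) that only mimic the vm_ops->nopage field. The string results stand in for real page frames.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the do_no_page( ) dispatch. A memory
 * region that maps a file supplies a nopage method. An anonymous region has
 * no vm_ops, or has a vm_ops whose nopage is NULL. */
struct toy_vma;

struct toy_vm_ops {
    const char *(*nopage)(struct toy_vma *vma, unsigned long address);
};

struct toy_vma {
    const struct toy_vm_ops *vm_ops;
};

static const char *toy_anonymous_page(void)
{
    return "zero-filled anonymous page";
}

static const char *toy_no_page(struct toy_vma *vma, unsigned long address)
{
    assert(vma != NULL);
    if (vma->vm_ops == NULL || vma->vm_ops->nopage == NULL)
        return toy_anonymous_page();
    return vma->vm_ops->nopage(vma, address);  /* file-backed: load from disk */
}

/* Example nopage method for a file mapping (stands in for real disk I/O). */
static const char *toy_file_nopage(struct toy_vma *vma, unsigned long address)
{
    (void)vma;
    (void)address;
    return "page read from file";
}
```

Because the test is on the method pointer, a file system can plug in its own loader without do_no_page( ) knowing anything about it.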

The do_anonymous_page( ) function handles write and read requests separately:

    if (write_access) {
        spin_unlock(&mm->page_table_lock);
        page = alloc_page(GFP_HIGHUSER);
        addr = kmap_atomic(page, KM_USER0);
        memset((void *)(addr), 0, PAGE_SIZE);
        kunmap_atomic(addr, KM_USER0);
        spin_lock(&mm->page_table_lock);
        mm->rss++;
        entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
        lru_cache_add(page);
        mark_page_accessed(page);
        set_pte(page_table, entry);
        spin_unlock(&mm->page_table_lock);
        return 1;
    }

When handling a write access, the function invokes alloc_page( ). It then temporarily maps the new page frame with kmap_atomic( ), because a GFP_HIGHUSER frame may lie in high memory, and fills it with zeros by using memset( ). Because do_anonymous_page( ) returns 1, the Page Fault handler then increments the min_flt field of tsk to keep track of the number of minor Page Faults caused by the process.[8] The function also increments the rss field of the memory descriptor to keep track of the number of page frames allocated to the process. The Page Table entry is then set to the physical address of the page frame, which is marked as writable and dirty. The lru_cache_add( ) and mark_page_accessed( ) functions insert the new page frame in the swap-related data structures; we discuss them in Chapter 16.
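Here is a user-space sketch of the write path just described. It allocates a page-sized buffer (standing in for alloc_page( )), zeroes it, bumps the rss counter, and builds an entry with the Present, Read/Write and Dirty flags set. The flag values follow the 80x86 Page Table entry layout. The page frame number is supplied by the caller as an assumption of the model. The protection bits that the real code takes from vma->vm_page_prot are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the write-access path of do_anonymous_page( ).
 * Flag values match the 80x86 Page Table entry layout. */
#define TOY_PAGE_SIZE    4096u
#define TOY_PAGE_SHIFT   12
#define TOY_PAGE_PRESENT 0x001u
#define TOY_PAGE_RW      0x002u
#define TOY_PAGE_DIRTY   0x040u

struct toy_mm {
    unsigned long rss;  /* page frames currently allocated to the process */
};

/* pfn is the page frame number chosen by a (hypothetical) allocator.
 * Returns the new Page Table entry, or 0 if no memory could be obtained. */
static uint32_t toy_anonymous_write(struct toy_mm *mm, uint32_t pfn,
                                    unsigned char **frame_out)
{
    unsigned char *frame;

    assert(mm != NULL && frame_out != NULL);
    frame = malloc(TOY_PAGE_SIZE);      /* stands in for alloc_page( ) */
    if (frame == NULL)
        return 0;
    memset(frame, 0, TOY_PAGE_SIZE);    /* a new anonymous page reads as zeros */
    *frame_out = frame;
    mm->rss++;                          /* one more page frame for the process */
    return (pfn << TOY_PAGE_SHIFT) | TOY_PAGE_PRESENT | TOY_PAGE_RW |
           TOY_PAGE_DIRTY;              /* present, writable and dirty */
}
```

Zeroing the frame matters for security as well as correctness: without it, the process could read data left behind by whatever previously used that page frame.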

[8] Linux records the number of minor and major Page Faults for each process. This information, together with several other statistics, may be used to tune the system.
