Figure 85 The flow diagram of the Page Fault handler

Dues the ¿deists belongToEhe prate» addresi space'

Kernel Flow Images
The identifiers vmalloc_fault, good_area, bad_area, and no_context are labels appearing in do_page_fault( ) that should help you to relate the blocks of the flow diagram to specific lines of code.

The do_ page_fault( ) function accepts the following input parameters:

• The regs address of a pt_regs structure containing the values of the microprocessor registers when the exception occurred.

• A 3-bit error_code, which is pushed on the stack by the control unit when the exception occurred (see Section 4.2.4). The bits have the following meanings.

o If bit 0 is clear, the exception was caused by an access to a page that is not present (the Present flag in the Page Table entry is clear); otherwise, if bit 0 is set, the exception was caused by an invalid access right. o If bit 1 is clear, the exception was caused by a read or execute access; if set, the exception was caused by a write access. o If bit 2 is clear, the exception occurred while the processor was in Kernel Mode; otherwise, it occurred in User Mode.

The first operation of do_ page_fault( ) consists of reading the linear address that caused the Page Fault. When the exception occurs, the CPU control unit stores that value in the cr2 control register:

asm("movl %%cr2,%0":"=r" (address)); if (regs->eflags & 0x00000200)

local irq enable(); tsk = current;

The linear address is saved in the address local variable. The function also ensures that local interrupts are enabled if they were enabled before the fault and saves the pointers to the process descriptor of current in the tsk local variable.

As shown at the top of Figure 8-5, do_ page_fault( ) checks whether the faulty linear address belongs to the fourth gigabyte and the exception was caused by the kernel trying to access a nonexisting page frame:

if (address >= TASK_SIZE && !(error_code & 0x101)) goto vmalloc fault;

The code at label vmalloc_fault takes care of faults that were likely caused by accessing a noncontiguous memory area in Kernel Mode; we describe this case in the later section Section 8.4.5.

Next, the handler checks whether the exception occurred while handling an interrupt or executing a kernel thread (remember that the mm field of the process descriptor is always NULL for kernel threads):

info.i_code = SEGV_MAPERR; if (in interrupt( ) || !tsk->mm) goto no context;

In both cases, do_ page_fault( ) does not try to compare the linear address with the memory regions of current, since it would not make any sense: interrupt handlers and kernel threads never use linear addresses below task_size, and thus never rely on memory regions. (See the next section for information on the info local variable and a description of the code at the no_context label.)

Let's suppose that the Page Fault did not occur in an interrupt handler or in a kernel thread. Then the function must inspect the memory regions owned by the process to determine whether the faulty linear address is included in the process address space:

down read(&tsk->mm->mmap sem); vma = find vma(tsk->mm, address); if (!vma)

goto bad area; if (vma->vm start <= address) goto good_area;

If vma is NULL, there is no memory region ending after address, and thus the faulty address is certainly bad. On the other hand, the first memory region ending after address might not include address; if it does, the function jumps to the code at label good_area.

If none of the two "if" conditions are satisfied, the function has determined that address is not included in any memory region; however, it must perform an additional check, since the faulty address may have been caused by a push or pusha instruction on the User Mode stack of the process.

Let's make a short digression to explain how stacks are mapped into memory regions. Each region that contains a stack expands toward lower addresses; its vm_growsdown flag is set, so the value of its vm_end field remains fixed while the value of its vm_start field may be decreased. The region boundaries include, but do not delimit precisely, the current size of the User Mode stack. The reasons for the fuzz factor are:

• The region size is a multiple of 4 KB (it must include complete pages) while the stack size is arbitrary.

• Page frames assigned to a region are never released until the region is deleted; in particular, the value of the vm_start field of a region that includes a stack can only decrease; it can never increase. Even if the process executes a series of pop instructions, the region size remains unchanged.

It should now be clear how a process that has filled up the last page frame allocated to its stack may cause a Page Fault exception: the push refers to an address outside of the region (and to a nonexistent page frame). Notice that this kind of exception is not caused by a programming error; thus it must be handled separately by the Page Fault handler.

We now return to the description of do_ page_fault( ), which checks for the case described previously:

goto bad area; if (error code & 4

&& address + 32 < goto bad area; if (expand stack(vma, goto bad area; goto good area;



If the vm_growsdown flag of the region is set and the exception occurred in User Mode, the function checks whether address is smaller than the regs->esp stack pointer (it should be only a little smaller). Since a few stack-related assembly language instructions (like pusha) perform a decrement of the esp register only after the memory access, a 32-byte tolerance interval is granted to the process. If the address is high enough (within the tolerance granted), the code invokes the expand_stack( ) function to check whether the process is allowed to extend both its stack and its address space; if everything is OK, it sets the vm_start field of vma to address and returns 0; otherwise, it returns 1.

Note that the preceding code skips the tolerance check whenever the vm_growsdown flag of the region is set and the exception did not occur in User Mode. These conditions mean that the kernel is addressing the User Mode stack and that the code should always run expand_stack( ).

0 0

Post a comment