Demand Paging for Memory Mapping

For reasons of efficiency, page frames are not assigned to a memory mapping right after it has been created, but at the last possible moment: that is, when the process attempts to address one of its pages, thus causing a Page Fault exception.

We saw in Section 8.4 how the kernel verifies whether the faulty address is included in some memory region of the process; if so, the kernel checks the Page Table entry corresponding to the faulty address and invokes the do_no_page( ) function if the entry is null (see Section 8.4.3).

The do_no_page( ) function performs all the operations that are common to all types of demand paging, such as allocating a page frame and updating the Page Tables. It also checks whether the nopage method of the memory region involved is defined. In Section 8.4.3, we described the case in which the method is undefined (anonymous memory region); now we complete the description by discussing the actions performed by the function when the method is defined:

1. Invokes the nopage method, which returns the address of a page frame that contains the requested page.

2. If the process is trying to write into the page and the memory mapping is private, avoids a future Copy On Write fault by making a copy of the page just read and inserting it into the inactive list of pages (see Chapter 16). In the following steps, the function uses the new page instead of the page returned by the nopage method so that the latter is not modified by the User Mode process.

3. Increments the rss field of the process memory descriptor to indicate that a new page frame has been assigned to the process.

4. Sets up the Page Table entry corresponding to the faulty address with the address of the page frame and the page access rights included in the memory region's vm_page_prot field.

5. If the process is trying to write into the page, forces the Read/Write and Dirty bits of the Page Table entry to 1. In this case, either the page frame is exclusively assigned to the process, or the page is shared; in both cases, writing to it should be allowed.
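The five steps above can be sketched as a small user-space model. This is only an illustration, not the kernel's code: every `sketch_` name, the simplified structures, and the single boolean standing in for the vm_page_prot access rights are assumptions made for the example, and the insertion of the copied page into the inactive list is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical stand-ins for kernel objects; none of these are real kernel types. */
struct sketch_page { unsigned char data[SKETCH_PAGE_SIZE]; };

struct sketch_region {
    bool shared;          /* shared mapping (true) or private mapping (false) */
    bool prot_writable;   /* crude model of the rights kept in vm_page_prot */
    struct sketch_page *(*nopage)(struct sketch_region *area,
                                  unsigned long address);
};

struct sketch_pte { struct sketch_page *frame; bool writable; bool dirty; };
struct sketch_mm  { unsigned long rss; };

/* One page standing in for a page already held in the page cache. */
struct sketch_page sketch_cache_page = { .data = { 0x5a } };

/* A trivial nopage-style callback: always hands back the cached page. */
struct sketch_page *sketch_cached_nopage(struct sketch_region *area,
                                         unsigned long address)
{
    (void)area;
    (void)address;
    return &sketch_cache_page;
}

/* Models the steps taken when the region defines a nopage method. */
bool sketch_do_no_page(struct sketch_mm *mm, struct sketch_region *area,
                       unsigned long address, bool write_access,
                       struct sketch_pte *pte)
{
    assert(area->nopage != NULL);

    /* Step 1: ask the region for the page frame holding the data. */
    struct sketch_page *page = area->nopage(area, address);
    if (page == NULL)
        return false;

    /* Step 2: a write to a private mapping gets its own copy right away,
     * so the page returned by nopage is never modified. */
    if (write_access && !area->shared) {
        struct sketch_page *copy = malloc(sizeof *copy);
        if (copy == NULL)
            return false;
        memcpy(copy, page, sizeof *copy);
        page = copy;
    }

    /* Step 3: account for the new page frame. */
    mm->rss++;

    /* Step 4: fill the Page Table entry from the region's access rights. */
    pte->frame = page;
    pte->writable = area->prot_writable;
    pte->dirty = false;

    /* Step 5: a write access forces Read/Write and Dirty to 1. */
    if (write_access) {
        pte->writable = true;
        pte->dirty = true;
    }
    return true;
}
```

In this model, a write fault on a private region ends with an entry that points to a fresh copy, while a read fault on a shared region ends with an entry that points to the cached page itself.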

The core of the demand paging algorithm consists of the memory region's nopage method.

Generally speaking, it must return the address of a page frame that contains the page accessed by the process. Its implementation depends on the kind of memory region in which the page is included.

When handling memory regions that map files on disk, the nopage method must first search for the requested page in the page cache. If the page is not found, the method must read it from disk. Most filesystems implement the nopage method by means of the filemap_nopage( ) function, which receives three parameters:


Descriptor address of the memory region, including the required page.


Linear address of the required page.


Parameter of the nopage method that is not used by filemap_nopage( ). The filemap_nopage( ) function executes the following steps:

1. Gets the file object address file from the area->vm_file field. Derives the address_space object address from file->f_dentry->d_inode->i_mapping. Derives the inode object address from the host field of the address_space object.

2. Uses the vm_start and vm_pgoff fields of area to determine the offset within the file of the data corresponding to the page starting from address.

3. Checks whether the file offset exceeds the file size. When this happens, returns null, which means failure in allocating the new page, unless the Page Fault was caused by a debugger tracing another process through the ptrace( ) system call. We are not going to discuss this special case.

4. Invokes find_get_page( ) to look in the page cache for the page identified by the address_space object and the file offset.

5. If the page is not in the page cache, checks the value of the VM_RAND_READ flag of the memory region. The value of this flag can be changed by means of the madvise( ) system call; when the flag is set, it indicates that the user application is not going to read more pages of the file than those just accessed.

o If the VM_RAND_READ flag is set, invokes page_cache_read( ) to read just the requested page from disk.

o If the VM_RAND_READ flag is cleared, invokes page_cache_read( ) several times to read a cluster of adjacent pages inside the memory region, including the requested page. The length of the cluster is stored in the page_request variable; its default value is three pages, but the system administrator may tune its value by writing into the /proc/sys/vm/page-cluster special file.

Then the function jumps back to Step 4 and repeats the page cache lookup operation (the process might have been blocked while executing the page_cache_read( ) function).

6. The page is inside the page cache. Checks its PG_uptodate flag. If the flag is not set (page not up to date), the function performs the following substeps:

a. Locks up the page by setting the PG_locked flag, sleeping if necessary.

b. Invokes the readpage method of the address_space object to trigger the I/O data transfer.

c. Invokes wait_on_page( ) to sleep until the I/O transfer completes.

7. The page is up to date. The function checks the VM_SEQ_READ flag of the memory region. The value of this flag can be changed by means of the madvise( ) system call; when the flag is set, it indicates that the user application is going to reference the pages of the mapped file sequentially, thus the pages should be aggressively read in advance and freed after they are accessed. If the flag is set, the function invokes nopage_sequential_readahead( ). This function uses a large, fixed-size read-ahead window, whose length is approximately the maximum read-ahead window size of the underlying block device (see Section 15.1.2). The vm_raend field of the memory region descriptor stores the ending position of the current read-ahead window. The function shifts the read-ahead window forward (by reading in advance the corresponding pages) whenever the requested page falls exactly at the midpoint of the current read-ahead window. Moreover, the function should release the pages in the memory region that are far behind the requested page; if the function reads the nth read-ahead window of the memory region, it flushes to disk the pages belonging to the (n-3)th window (however, kernel Version 2.4.18 doesn't release them; see the next section).

8. Invokes mark_page_accessed( ) to mark the requested page as accessed (see Chapter 16).

9. Returns the address of the requested page.
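The offset computation, the end-of-file check, and the lookup-and-read loop of filemap_nopage( ) can be sketched as a user-space model. This is a simplified illustration under stated assumptions, not the kernel source: all `sketch_` names are hypothetical, the page cache is reduced to an array of flags, the file length is a fixed constant, and the way a cluster is aligned and clamped to the region is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_FILE_PAGES 16UL   /* hypothetical file length, in pages */

/* Hypothetical, reduced view of a memory region descriptor. */
struct sketch_area {
    unsigned long vm_start;   /* first linear address of the region */
    unsigned long vm_end;     /* first linear address after the region */
    unsigned long vm_pgoff;   /* offset of the region in the file, in pages */
    bool rand_read;           /* models the VM_RAND_READ flag */
};

/* Toy page cache: one flag per file page, plus a count of disk reads. */
struct sketch_cache {
    bool present[SKETCH_FILE_PAGES];
    unsigned long reads;
};

/* File page index of the page that contains the given linear address. */
unsigned long sketch_file_index(const struct sketch_area *area,
                                unsigned long address)
{
    assert(address >= area->vm_start && address < area->vm_end);
    return ((address - area->vm_start) >> SKETCH_PAGE_SHIFT) + area->vm_pgoff;
}

/* Stands in for reading one page from disk into the page cache. */
void sketch_read_page(struct sketch_cache *cache, unsigned long index)
{
    if (!cache->present[index]) {
        cache->present[index] = true;
        cache->reads++;
    }
}

/* Reads an aligned cluster around index, clamped to the region and the file.
 * The alignment policy is an assumption of this sketch. */
void sketch_read_cluster(struct sketch_cache *cache,
                         const struct sketch_area *area,
                         unsigned long index, unsigned long cluster)
{
    assert(cluster > 0);
    unsigned long region_first = area->vm_pgoff;
    unsigned long region_end = area->vm_pgoff +
        ((area->vm_end - area->vm_start) >> SKETCH_PAGE_SHIFT);
    unsigned long first = index - (index % cluster);
    unsigned long end = first + cluster;

    if (first < region_first)
        first = region_first;
    if (end > region_end)
        end = region_end;
    if (end > SKETCH_FILE_PAGES)
        end = SKETCH_FILE_PAGES;
    for (unsigned long i = first; i < end; i++)
        sketch_read_page(cache, i);
}

/* Returns the file page index that was made available, or -1 when the
 * offset lies beyond the end of the file. */
long sketch_filemap_nopage(struct sketch_cache *cache,
                           const struct sketch_area *area,
                           unsigned long address, unsigned long cluster)
{
    unsigned long index = sketch_file_index(area, address);

    if (index >= SKETCH_FILE_PAGES)
        return -1;

    /* Look the page up; on a miss, read it (alone or as part of a
     * cluster) and repeat the lookup. */
    while (!cache->present[index]) {
        if (area->rand_read)
            sketch_read_page(cache, index);
        else
            sketch_read_cluster(cache, area, index, cluster);
    }
    return (long)index;
}
```

With the random-read flag cleared, a single fault in this model fills several neighboring cache slots; with the flag set, only the faulting page is read.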
