PTE-Specific Entries

Each final entry in the page table not only yields a pointer to the memory location of the page, but also holds additional information on the page in the superfluous bits mentioned above. Although these data are CPU-specific, they usually provide at least some information on page access control. The following elements are found in most CPUs supported by the Linux kernel:

□ _PAGE_PRESENT specifies whether the virtual page is present in RAM. This need not be the case because pages may be swapped out into a swap area, as noted briefly in Chapter 1.

The structure of the page table entry is usually different if the page is not present in memory because there is no need to describe the position of the page in memory. Instead, information is needed to identify and find the swapped-out page.

[9] The definitions for IA-32 are similar. However, only pte_t and pgd_t, which are defined as unsigned long, make an effective contribution. I use the code example for AMD64 because it is more regular.

[10] When IA-32 processors use PAE mode, they define pte_t as, for example, typedef struct { unsigned long pte_low, pte_high; }. 32 bits are then no longer sufficient to address the complete memory because more than 4 GiB can be managed in this mode. In other words, the available amount of memory can be larger than the processor's address space.

Since pointers are, however, still only 32 bits wide, an appropriate subset of the enlarged memory space must be chosen for userspace applications that do still only see 4 GiB each.

□ _PAGE_ACCESSED is set automatically by the CPU each time the page is accessed. The kernel regularly checks the field to establish how actively the page is used (infrequently used pages are good swapping candidates). The bit is set after either read or write access.

□ _PAGE_DIRTY indicates whether the page is "dirty," that is, whether the page contents have been modified.

□ _PAGE_FILE has the same numerical value as _PAGE_DIRTY, but is used in a different context, namely, when a page is not present in memory. Obviously, a page that is not present cannot be dirty, so the bit can be reinterpreted: If it is not set, then the entry points to the location of a swapped-out page (see Chapter 18). A set _PAGE_FILE is required for entries that belong to nonlinear file mappings, which are discussed in Section 4.7.3.

□ If _PAGE_USER is set, userspace code is allowed to access the page. Otherwise, the page can be accessed only by the kernel, that is, when the CPU is in system mode.

□ _PAGE_READ, _PAGE_WRITE, and _PAGE_EXECUTE specify whether normal user processes are allowed to read the page, write to the page, or execute the machine code in the page.

Pages from kernel memory must be protected against writing by user processes.

There is, however, no assurance that even pages belonging to user processes can be written to, for example, if the page contains executable code that may not be modified — either intentionally or unintentionally.

Architectures that feature less finely grained access rights define the _PAGE_RW constant to allow or disallow read and write access in combination if no further criterion is available to distinguish between the two.

□ IA-32 and AMD64 provide _PAGE_BIT_NX to label the contents of a page as not executable (this protection bit is only available on IA-32 systems if the physical address extensions (PAE) for addressing 64 GiB of memory are enabled). It can prevent, for example, execution of code on stack pages, which can otherwise open security gaps in programs when a buffer overflow is provoked intentionally to introduce malicious code. The NX bit cannot prevent buffer overflows but can suppress their effects because the processor refuses to run the malicious code. Of course, the same result can also be achieved if the architectures themselves provide a good set of access authorization bits for memory pages, as is the case with some (unfortunately not very common) processors.
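The flag semantics listed above can be sketched in user-space C. This is a minimal illustration, not kernel code: the demo_ and DEMO_ names are hypothetical, and the bit positions are assumptions loosely modeled on x86. The sketch shows how the same bit is read as "dirty" for a present page and as "file" for a page that is not present, and how the user and NX bits combine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for pte_t; real definitions are architecture-specific. */
typedef struct { uint64_t pte; } demo_pte_t;

/* Assumed bit positions, chosen for illustration only. */
#define DEMO_PAGE_PRESENT  (UINT64_C(1) << 0)
#define DEMO_PAGE_RW       (UINT64_C(1) << 1)
#define DEMO_PAGE_USER     (UINT64_C(1) << 2)
#define DEMO_PAGE_ACCESSED (UINT64_C(1) << 5)
#define DEMO_PAGE_DIRTY    (UINT64_C(1) << 6)
#define DEMO_PAGE_FILE     DEMO_PAGE_DIRTY      /* reused when not present */
#define DEMO_PAGE_NX       (UINT64_C(1) << 63)

/* Document the aliasing described in the text at compile time. */
static_assert(DEMO_PAGE_FILE == DEMO_PAGE_DIRTY, "file bit reuses the dirty bit");

enum demo_pte_kind {
    DEMO_PTE_IN_RAM,          /* present: dirty bit means "modified" */
    DEMO_PTE_SWAPPED,         /* not present, file bit clear */
    DEMO_PTE_NONLINEAR_FILE   /* not present, file bit set */
};

/* Classify a non-empty entry; completely empty entries are not handled here. */
static enum demo_pte_kind demo_classify(demo_pte_t e)
{
    if (e.pte & DEMO_PAGE_PRESENT)
        return DEMO_PTE_IN_RAM;
    return (e.pte & DEMO_PAGE_FILE) ? DEMO_PTE_NONLINEAR_FILE
                                    : DEMO_PTE_SWAPPED;
}

/* User code may execute from a page only if it is present,
 * marked as user-accessible, and not marked as non-executable. */
static bool demo_user_may_execute(demo_pte_t e)
{
    return (e.pte & DEMO_PAGE_PRESENT) != 0 &&
           (e.pte & DEMO_PAGE_USER) != 0 &&
           (e.pte & DEMO_PAGE_NX) == 0;
}
```

With this layout, a stack page would typically carry DEMO_PAGE_NX, so demo_user_may_execute returns false for it even though the page is present and user-accessible.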

Each architecture must provide two things to allow memory management to modify the additional bits in pte_t entries: the data type pgprot_t, in which the additional bits are held, and the pte_modify function to modify the bits. The above pre-processor symbols are used to select the appropriate entry.
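The interplay of a protection type and a modify function can be sketched as follows. This is a simplified user-space model under stated assumptions: the demo_ types, the mask values, and the choice to preserve the frame address together with the accessed and dirty bits are illustrative and do not reproduce any particular architecture's definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for pte_t and pgprot_t. */
typedef struct { uint64_t pte; } demo_pte_t;
typedef struct { uint64_t pgprot; } demo_pgprot_t;

/* Assumed bit positions, for illustration only. */
#define DEMO_PAGE_PRESENT  (UINT64_C(1) << 0)
#define DEMO_PAGE_RW       (UINT64_C(1) << 1)
#define DEMO_PAGE_USER     (UINT64_C(1) << 2)
#define DEMO_PAGE_ACCESSED (UINT64_C(1) << 5)
#define DEMO_PAGE_DIRTY    (UINT64_C(1) << 6)

/* Assumption: the page frame address occupies bits 12..51. */
#define DEMO_PFN_MASK  UINT64_C(0x000ffffffffff000)

/* Bits that survive a protection change in this sketch. */
#define DEMO_KEEP_MASK (DEMO_PFN_MASK | DEMO_PAGE_ACCESSED | DEMO_PAGE_DIRTY)

/* Replace the protection bits of an entry, keeping address and usage state. */
static demo_pte_t demo_pte_modify(demo_pte_t e, demo_pgprot_t newprot)
{
    /* The new protection value must not carry address or state bits. */
    assert((newprot.pgprot & DEMO_KEEP_MASK) == 0);

    e.pte = (e.pte & DEMO_KEEP_MASK) | newprot.pgprot;
    return e;
}
```

Calling demo_pte_modify with a protection value that lacks DEMO_PAGE_RW turns a writable mapping into a read-only one, while the entry still points to the same page frame and keeps its dirty state.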

The kernel also defines various functions to query and set the architecture-dependent state of memory pages. Not all functions can be defined on all processors because some lack hardware support for a given feature.

□ pte_present checks if the page to which the page table entry points is present in memory. This function can, for instance, be used to detect if a page has been swapped out.

□ pte_dirty checks if the page associated with the page table entry is dirty, that is, its contents have been modified since the kernel checked last time. Note that this function may only be called if pte_present has ensured that the page is available.

□ pte_write checks if the kernel may write to the page.

□ pte_file is employed for nonlinear mappings that provide a different view on file contents by manipulating the page table (this mechanism is discussed in more detail in Section 4.7.3). The function checks if a page table entry belongs to such a mapping.

pte_file may only be invoked if pte_present returns false, that is, the page associated with the page table entry is not present in memory.
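The calling rules for these query functions can be made explicit in a small user-space sketch. The demo_ names and bit positions are hypothetical, and the assert calls are an addition for illustration; they only encode the preconditions stated in the text (dirty is meaningful for present pages, file for pages that are not present).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for pte_t. */
typedef struct { uint64_t pte; } demo_pte_t;

/* Assumed bit positions, for illustration only. */
#define DEMO_PAGE_PRESENT (UINT64_C(1) << 0)
#define DEMO_PAGE_RW      (UINT64_C(1) << 1)
#define DEMO_PAGE_DIRTY   (UINT64_C(1) << 6)
#define DEMO_PAGE_FILE    DEMO_PAGE_DIRTY   /* same bit, different context */

/* Is the page the entry refers to currently in memory? */
static bool demo_pte_present(demo_pte_t e)
{
    return (e.pte & DEMO_PAGE_PRESENT) != 0;
}

/* Has the page been modified? Only valid for present pages. */
static bool demo_pte_dirty(demo_pte_t e)
{
    assert(demo_pte_present(e));
    return (e.pte & DEMO_PAGE_DIRTY) != 0;
}

/* Is the page writable? */
static bool demo_pte_write(demo_pte_t e)
{
    return (e.pte & DEMO_PAGE_RW) != 0;
}

/* Does the entry belong to a nonlinear mapping? Only valid if not present. */
static bool demo_pte_file(demo_pte_t e)
{
    assert(!demo_pte_present(e));
    return (e.pte & DEMO_PAGE_FILE) != 0;
}
```

A caller would therefore test demo_pte_present first and then branch to either demo_pte_dirty or demo_pte_file, never both for the same entry.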
