Kernel Mappings of High Memory Page Frames

Page frames above the 896 MB boundary are not mapped in the fourth gigabyte of the kernel linear address spaces, so they cannot be directly accessed by the kernel. This implies that any page allocator function that returns the linear address of the assigned page frame doesn't work for the high memory.

For instance, suppose that the kernel invoked__get_free_pages(GFP_HiGHMEM,0) to allocate a page frame in high memory. If the allocator assigned a page frame in high memory,__get_free_pages( )

cannot return its linear address because it doesn't exist; thus, the function returns NuLL. In turn, the kernel cannot use the page frame; even worse, the page frame cannot be released because the kernel has lost track of it.

In short, allocation of high-memory page frames must be done only through the alloc_pages( ) function and its alloc_page( ) shortcut, which both return the address of the page descriptor of the first allocated page frame. Once allocated, a high-memory page frame has to be mapped into the fourth gigabyte of the linear address space, even though the physical address of the page frame may well exceed 4 GB.

To do this, the kernel may use three different mechanisms, which are called permanent kernel mappings, temporary kernel mappings, and noncontiguous memory allocation. In this section, we focus on the first two techniques; the third one is discussed in Section 7.3 later in this chapter.

Establishing a permanent kernel mapping may block the current process; this happens when no free Page Table entries exist that can be used as "windows" on the page frames in high memory (see the next section). Thus, a permanent kernel mapping cannot be used in interrupt handlers and deferrable functions. Conversely, establishing a temporary kernel mapping never requires blocking the current process; its drawback, however, is that very few temporary kernel mappings can be established at the same time.

Of course, none of these techniques allow addressing the whole RAM simultaneously. After all, only 128 MB of linear address space are left for mapping the high memory, while PAE supports systems having up to 64 GB of RAM. Permanent kernel mappings

Permanent kernel mappings allow the kernel to establish long-lasting mappings of high-memory page frames into the kernel address space. They use a dedicated Page Table whose address is stored in the pkmap_page_table variable. The number of entries in the Page Table is yielded by the last_pkmap macro. As usual, the Page Table includes either 512 or 1,024 entries, according to whether PAE is enabled or disabled (see Section 2.4.6); thus, the kernel can access at most 2 or 4 MB of high memory at once.

The Page Table maps the linear addresses starting from pkmap_base (usually 0xfe000000). The address of the descriptor corresponding to the first page frame in high memory is stored in the highmem_start_page variable.

The pkmap_count array includes last_pkmap counters, one for each entry of the pkmap_page_table Page Table. We distinguish three cases:

The counter is 0

The corresponding Page Table entry does not map any high-memory page frame and is usable. The counter is 1

The corresponding Page Table entry does not map any high-memory page frame, but it cannot be used because the corresponding TLB entry has not been flushed since its last usage.

The counter is n (greater than 1)

The corresponding Page Table entry maps a high-memory page frame, which is used by exactly n-1 kernel components.

The kmap( ) function establishes a permanent kernel mapping. It is essentially equivalent to the following code:

if (page < highmem_page_start)

return page->virtual; return kmap_high(page);

The virtual field of the page descriptor stores the linear address in the fourth gigabyte mapping the page frame, if any. Thus, for any page frame below the 896 MB boundary, the field always includes the physical address of the page frame plus page_offset. Conversely, if the page frame is in high memory, the virtual field has a non-null value only if the page frame is currently mapped, either by the permanent or the temporary kernel mapping.

The kmap_high( ) function is invoked if the page frame really belongs to the high memory. The function is essentially equivalent to the following code:

unsigned long vaddr; spin_lock(&kmap_lock);

vaddr = (unsigned long) page->virtual; if (!vaddr)

vaddr = map_new_virtual(page); pkmap_count[(vaddr-PKMAP_BASE) >> PAGE_SHIFT]++; spin_unlock(&kmap_lock); return (void *) vaddr;

The function gets the kmap_lock spin lock to protect the Page Table against concurrent accesses in multiprocessor systems. Notice that there is no need to disable the interrupts because kmap( ) cannot be invoked by interrupt handlers and deferrable functions. Next, the kmap_high( ) function checks whether the virtual field of the page descriptor already stores a non-null linear address. If not, the function invokes the map_new_virtual( ) function to insert the page frame physical address in an entry of pkmap_page_table. Then kmap_high( ) increments the counter corresponding to the linear address of the page frame by 1 because another kernel component is going to access the page frame. Finally, kmap_high( ) releases the kmap_lock spin lock and returns the linear address that maps the page.

The map_new_virtual( ) function essentially executes two nested loops:

int count;

DECLARE_WAITQUEUE(wait, current);

for (count = LAST_PKMAP; count >= 0; —count) {

last_pkmap_nr = (last_pkmap_nr + 1) & (LAST_PKMAP - 1); if (!last_pkmap_nr) {

flush_all_zero_pkmaps( ); count = LAST_PKMAP;

unsigned long vaddr = PKMAP_BASE + (last_pkmap_nr << PAGE_SHIFT) set_pte(&(pkmap_page_table[last_pkmap_nr]), mk_pte(page, 0x63)); pkmap_count[last_pkmap_nr] = 1; page->virtual = (void *) vaddr; return vaddr;

current->state = TASK_UNITERRUPTIBLE; add_wait_queue(&pkmap_map_wait, &wait); spin_unlock(&kmap_lock); schedule( );

remove_wait_queue(&pkmap_map_wait, &wait); spin_lock(&kmap_lock); if (page->virtual)

return (unsigned long) page->virtual;

In the inner loop, the function scans all counters in pkmap_count that are looking for a null value. The last_pkmap_nr variable stores the index of the last used entry in the pkmap_page_table Page Table. Thus, the search starts from where it was left in the last invocation of the map_new_virtual( ) function.

When the last counter in pkmap_count is reached, the search restarts from the counter at index 0. Before continuing, however, map_new_virtual( ) invokes the flush_all_zero_pkmaps( ) function, which starts another scanning of the counters looking for the value 1. Each counter that has value 1 denotes an entry in pkmap_page_table that is free but cannot be used because the corresponding TLB entry has not yet been flushed. flush_all_zero_pkmaps( ) issues the TLB flushes on such entries and resets their counters to zero.

If the inner loop cannot find a null counter in pkmap_count, the map_new_virtual( ) function blocks the current process until some other process releases an entry of the pkmap_page_table Page Table. This is achieved by inserting current in the pkmap_map_wait wait queue, setting the current state to task_uninterruptible and invoking schedule( ) to relinquish the CPU. Once the process is awoken, the function checks whether another process has mapped the page by looking at the virtual field of the page descriptor; if some other process has mapped the page, the inner loop is restarted.

When a null counter is found by the inner loop, the map_new_virtual( ) function:

1. Computes the linear address that corresponds to the counter.

2. Writes the page's physical address into the entry in pkmap_page_table. The function also sets the bits Accessed, Dirty, Read/Write, and Present (value 0x63) in the same entry.

3. Sets to 1 the pkmap_count counter.

4. Writes the linear address into the virtual field of the page descriptor.

5. Returns the linear address.

The kunmap( ) function destroys a permanent kernel mapping. If the page is really in the high memory zone, it invokes the kunmap_high( ) function, which is essentially equivalent to the following code:

void kunmap_high(struct page * page) {


if ((—pkmap_count[((unsigned long) page->virtual-PKMAP_BASE)>>PAGE_SHIFT])==1)

wake_up(&pkmap_map_wait); spin_unlock(&kmap_lock);

Notice that if the counter of the Page Table entry becomes equal to 1 (free), kunmap_high( ) wakes up the processes waiting in the pkmap_map_wait wait queue. Temporary kernel mappings

Temporary kernel mappings are simpler to implement than permanent kernel mappings; moreover, they can be used inside interrupt handlers and deferrable functions because they never block the current process.

Any page frame in high memory can be mapped through a window in the kernel address space—namely, a Page Table entry that is reserved for this purpose. The number of windows reserved for temporary kernel mappings is quite small.

Each CPU has its own set of five windows whose linear addresses are identified by the enum km_type data structure:

enum km_type {







The kernel must ensure that the same window is never used by two kernel control paths at the same time. Thus, each symbol is named after the kernel component that is allowed to use the corresponding window. The last symbol, km_type_nr, does not represent a linear address by itself, but yields the number of different windows usable by every CPU.

Each symbol in km_type, except the last one, is an index of a fix-mapped linear address (see Section 2.5.6). The enum fixed_addresses data structure includes the symbols fix_kmap_begin and fix_kmap_end; the latter is assigned to the index fix_kmap_begin+(km_type_nr*nr_cpus)-1. In this manner, there are km_type_nr fix-mapped linear addresses for each CPU in the system. Furthermore, the kernel initializes the kmap_pte variable with the address of the Page Table entry corresponding to the

To establish a temporary kernel mapping, the kernel invokes the kmap_atomic( ) function, which is essentially equivalent to the following code:

void * kmap_atomic(struct page * page, enum km_type type) {

enum fixed_addresses idx; if (page < highmem_start_page)

return page->virtual; idx = type + KM_TYPE_NR * smp_processor_id( ); set_pte(kmap_pte-idx, mk_pte(page, 0x063));


The type argument and the CPU identifier specify what fix-mapped linear address has to be used to map the request page. The function returns the linear address of the page frame if it doesn't belong to high memory; otherwise, it sets up the Page Table entry corresponding to the fix-mapped linear address with the page's physical address and the bits Present, Accessed, Read/Write, and Dirty. Finally, the TLB entry corresponding to the linear address is flushed.

To destroy a temporary kernel mapping, the kernel uses the kunmap_atomic( ) function. In the 80 x 86 architecture, however, this function does nothing.

Temporary kernel mappings should be used carefully. A kernel control path using a temporary kernel mapping must never block, because another kernel control path might use the same window to map some other high memory page.

Continue reading here: The Buddy System Algorithm

Was this article helpful?

+1 0