Representation of Regions

Each region is represented by an instance of vm_area_struct, which is defined (in simplified form) as follows:

struct vm_area_struct {
    struct mm_struct * vm_mm;       /* The address space we belong to. */
    unsigned long vm_start;         /* Our start address within vm_mm. */
    unsigned long vm_end;           /* The first byte after our end address
                                       within vm_mm. */

    /* linked list of VM areas per task, sorted by address */
    struct vm_area_struct *vm_next;

    pgprot_t vm_page_prot;          /* Access permissions of this VMA. */
    unsigned long vm_flags;         /* Flags, listed below. */

    struct rb_node vm_rb;

    /*
     * For areas with an address space and backing store,
     * linkage into the address_space->i_mmap prio tree, or
     * linkage to the list of like vmas hanging off its node, or
     * linkage of vma in the address_space->i_mmap_nonlinear list.
     */
    union {
        struct {
            struct list_head list;
            void *parent;           /* aligns with prio_tree_node parent */
            struct vm_area_struct *head;
        } vm_set;

        struct raw_prio_tree_node prio_tree_node;
    } shared;

    /*
     * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
     * list, after a COW of one of the file pages. A MAP_SHARED vma
     * can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
     * or brk vma (with NULL file) can only be in an anon_vma list.
     */
    struct list_head anon_vma_node; /* Serialized by anon_vma->lock */
    struct anon_vma *anon_vma;      /* Serialized by page_table_lock */

    /* Function pointers to deal with this struct. */
    struct vm_operations_struct * vm_ops;

    /* Information about our backing store: */
    unsigned long vm_pgoff;         /* Offset (within vm_file) in PAGE_SIZE
                                       units, *not* PAGE_CACHE_SIZE */
    struct file * vm_file;          /* File we map to (can be NULL). */
    void * vm_private_data;         /* was vm_pte (shared mem) */
};

The individual elements have the following meanings:

□ vm_mm is a back-pointer to the mm_struct instance to which the region belongs.

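□ vm_start and vm_end specify the virtual start and end addresses of the region in userspace. vm_end is the address of the first byte after the region, so the region covers the interval from vm_start up to, but not including, vm_end.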

□ The linear linking of all vm_area_struct instances of a process is achieved using vm_next, whereas incorporation in the red-black tree is the responsibility of vm_rb.

□ vm_page_prot stores the access permissions for the region in the constants discussed in Section 3.3.1, which are also used for pages in memory.

□ vm_flags is a set of flags describing the region. I discuss the flags that can be set below.

□ A mapping of a file into the virtual address space of a process is uniquely determined by the interval in the file and the corresponding interval in memory. To keep track of all intervals associated with a process, the kernel uses a linked list and a red-black tree as described above.

However, it is also necessary to go the other way round: Given an interval in a file, the kernel sometimes needs to know all processes into which the interval is mapped. Such mappings are called shared mappings, and the C standard library, which is used by nearly every process in the system, is a prime example of why such mappings are necessary.

To provide the required information, all vm_area_struct instances are additionally managed in a priority tree, and the elements required for this are contained in the shared union. As you can easily imagine from the rather complicated definition of this structure member, this is a tricky business, which is discussed in detail in Section 4.4.3 below.

□ anon_vma_node and anon_vma are used to manage shared pages originating from anonymous mappings. Mappings that point to the same pages are held on a doubly linked list, where anon_vma_node acts as the list element.

There are several of these lists, depending on how many sets of mappings there are that share different physical pages. The anon_vma element serves as a pointer to the management structure that is associated with each list and comprises a list head and an associated lock.

□ vm_ops is a pointer to a collection of methods used to perform various standard operations on the region.

struct vm_operations_struct {
    void (*open)(struct vm_area_struct * area);
    void (*close)(struct vm_area_struct * area);
    int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);
    struct page * (*nopage)(struct vm_area_struct * area,
                            unsigned long address, int *type);
};

□ open and close are invoked when a region is created and deleted, respectively. They are not normally used and are set to null pointers.

□ However, fault is very important. If a virtual page is not present in an address space, the automatically triggered page fault handler invokes this function to read the corresponding data into a physical page that is mapped into the user address space.

□ nopage is the kernel's old method of responding to page faults and is less flexible than fault. The element is still provided for compatibility reasons, but should not be used in new code.

□ vm_pgoff specifies an offset for a file mapping when not all file contents are to be mapped (the offset is 0 if the whole file is mapped).

The offset is not expressed in bytes but in multiples of PAGE_SIZE. On a system with pages of 4 KiB, an offset value of 10 equates to an actual byte offset of 40,960. This is reasonable because the kernel only supports mappings in whole-page units, and smaller values would make no sense.

□ vm_file points to the file instance that describes a mapped file (it holds a null pointer if the object mapped is not a file). Chapter 8 discusses the contents of the file structure at length.

□ Depending on mapping type, vm_private_data can be used to store private data that are not manipulated by the generic memory management routines. (The kernel ensures only that the element is initialized with a null pointer when a new region is created.) Currently, only a few sound and video drivers make use of this option.

vm_flags stores flags to define the properties of a region. They are all declared as pre-processor constants in <mm.h>.

□ VM_READ, VM_WRITE, VM_EXEC, and VM_SHARED specify whether page contents can be read, written, executed, or shared by several processes.

□ VM_MAYREAD, VM_MAYWRITE, VM_MAYEXEC, and VM_MAYSHARE determine whether the corresponding VM_* flags may be set. This is required for the mprotect system call.

□ VM_GROWSDOWN and VM_GROWSUP indicate whether a region can be extended downward or upward (to lower/higher virtual addresses). Because the heap grows from bottom to top, VM_GROWSUP is set in its region; VM_GROWSDOWN is set for the stack, which grows from top to bottom.

□ VM_SEQ_READ is set if it is likely that the region will be read sequentially from start to end; VM_RAND_READ specifies that read access may be random. Both flags are intended as "prompts" for memory management and the block device layer to improve their optimizations (e.g., page readahead if access is primarily sequential; Chapter 8 takes a closer look at this technique).

□ If VM_DONTCOPY is set, the relevant region is not copied when the fork system call is executed.

□ VM_DONTEXPAND prohibits expansion of a region by the mremap system call.

□ VM_HUGETLB is set if the region is based on huge pages as featured in some architectures.

□ VM_ACCOUNT specifies whether the region is to be included in the calculations for the overcommit features. These features restrict memory allocations in various ways (refer to Section 4.5.3 for more details).
