The Priority Search Tree

Priority search trees are required to establish a connection between a region in a file and all virtual address spaces into which the region is mapped. To understand how this connection is established, we need to introduce some data structures of the kernel, which will be discussed in more detail and within a more general context in the following chapters.

Additional Data Structures

Every open file (and every block device, because these can also be memory-mapped via device special files) is represented by an instance of struct file. This structure, in turn, contains a pointer to an address space object as represented by struct address_space. This object is the basis of the priority search tree (prio tree) by which the connection between mapped intervals and the address spaces into which these are mapped is established. The definition of both structures is as follows (I only show the elements required for our purposes here):

struct address_space {

struct inode *host; /* owner: inode, block_device */

struct prio_tree_root i_mmap; /* tree of private and shared mappings */ struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR mappings */

struct file {

struct address_space *f_mapping;

Additionally, each file and each block device are represented by an instance of struct inode. In contrast to struct file, which is the abstraction for a file opened by the open system call, the inode represents the object in the filesystem itself.

struct inode {

struct address_space *i_mapping;

Notice that only mapped file intervals are discussed below although, it is also possible to map different things, for instance, direct intervals in raw block devices, without a detour over filesystems. When a file is opened, the kernel sets file->f_mapping to inode->i_mapping. This allows multiple processes to access the same file without directly interfering with the other processes: inode is a file-specific data structure, while file is local to a given process.

These data structures are connected with each other, and Figure 4-7 provides an overview about the situation in memory. Notice that the representation of the tree is only symbolic and does not reflect the actual, complicated tree layout.

Given an instance of struct address_space, the kernel can infer the associated inode, which, in turn, allows for access to the backing store on which the file data are stored. Usually, the backing store will be a block device; the details are discussed in Chapter 9. Section 4.6 and Chapter 16 are devoted to discussing more about address spaces. Here it suffices to know that the address space is the base element of a priority tree that contains all vm_area_struct instances describing the mapping of an interval of the file associated with the inode into some virtual address space. Since each instance of struct vm_area contains a pointer to the mm_struct of the process to which it belongs, the desired connection is set up! Note that vm_area_structs can also be associated with an address space via a doubly linked list headed by i_mmap_nonlinear. This is required for nonlinear mappings, which I neglect for now. I will come back to them in Section 4.7.3, though.

struct file i_mapping struct file i_mapping struct vm_area_struct struct inode i_mapping struct file i_mapping struct address_space ¡^ a host i_mmap struct inode


struct address_space ¡^ a host i_mmap i_mmap_nonlinear struct vm_area_struct i_mmap_nonlinear

Backing device mm_struct mm_struct mm_struct

Figure 4-7: Tracking the virtual address spaces into which a given interval of a file is mapped with the help of a priority tree.

Backing device mm_struct mm_struct mm_struct

Manage vm_area_structs associated with a file

Figure 4-7: Tracking the virtual address spaces into which a given interval of a file is mapped with the help of a priority tree.

Recall that Figure 4-6 shows how vm_area_struct instances are organized in a linked list and a red-black tree. It is important to realize that these are the same vm_area_struct instances that are managed in the prio tree. While keeping vm_area_structs in two or more data structures at the same time is no problem for the kernel at all, it is nearly impossible to visualize. Therefore, keep in mind that a given instance of struct vm_area can be contained in two data structures: One establishes a connection between a region in the virtual address space of a process to the data in the underlying file, and one allows for finding all address spaces that map a given file interval.

Representing Priority Trees

Priority trees allow for management of the vm_area_struct instances that represent a particular interval of the given file. This requires that the data structure cannot only deal with overlapping, but also with identical file intervals. The situation is illustrated in Figure 4-8: Two processes map the region [7,12] of a file into their virtual address space, while a third process maps the interval [10,30].

Managing overlapping intervals is not much of a problem: The boundaries of the interval provide a unique index that allows for storing each interval in a unique tree node. I will not discuss in detail how this is implemented by the kernel because it rather similar to radix trees (see Appendix C for more details). It suffices to know that if intervals B, C, and D are completely contained in another interval A, then A will be the parent element of B, C, and D.

However, what happens if multiple identical intervals must be included in the prio tree? Each prio tree node is represented by the raw_prio_tree_node instance, which is directly included in each vm_ area_struct instance. Recall, however, that it is in a union with a vm_set. This allows for associating a list of vm_sets (and thus vm_area_structs) with a prio tree node. Figure 4-9 illustrates the situation in memory.


Process 1

Process 2


Process 3

Figure 4-8: Multiple processes can map identical or overlapping regions of a file into their virtual address space.

prio_tree_root prio_tree_root raw_prio_ vm_set tree_node raw_prio_tree_node raw_prio_ vm_set tree_node raw_prio_tree_node vm set vm set vm set

Figure 4-9: Interrelation of data structures in the management of shared identical mappings.

vm set vm set vm set

Figure 4-9: Interrelation of data structures in the management of shared identical mappings.

When an interval is inserted into the prio tree, the kernel proceeds as follows:

□ When the vm_area_struct instance is linked into the prio tree as a node, prio_tree_node is used to establish the necessary associations. To check whether there is a vm_area_struct in the tree, the kernel exploits the fact that the parent element of vm_set coincides with the last structure element of prio_tree_node — the data structures are coordinated accordingly. Since parent is not used within vm_set, the kernel can use parent != NULL to check whether the current vm_area_struct member is in a tree.

The definition of prio_tree_node also ensures that the head element of vmset does not overlap with prio_tree_node so that both can be used together, although they are actually combined in a union.

The kernel therefore uses vm_set.head to point to the first element on the list of vm_area_struct instances that belong to a shared mapping.

□ If the above list of shared mappings contains a vm_area_struct, vm_set.list is used as the list head to list all regions affected.

Section 4.5.3 discusses the technical details of how the kernel goes about inserting new regions.

Continue reading here: Operations on Regions

Was this article helpful?

0 0