The nodes of a radix tree are essentially represented by the following data structure:





struct radix_tree_node {

unsigned int height; /* Height from the bottom */

unsigned int count; struct rcu_head rcu_head;

void *slots[RADIX_TREE_MAP_SIZE];


The layout of this data structure is also very simple. slots is an array of void pointers that — depending on the level in which the node is located — point to either data elements or further nodes. count holds the number of used array entries in the node. The array is filled with entries starting at the top, and unused entries have null pointers.

Each tree node can point to 64 further nodes (or leaves) as indicated in the definition of the slot array in radix_tree_node. The direct consequence of this definition is that each node may have only an array size that is a power of two. Also, the size of all radix elements may only be defined at compilation time (of course, the maximum number of elements in a tree can change at run time). This behavior is rewarded by speed gains.


The information discussed so far — the address space and the page tree — does not, however, allow the kernel to make a direct distinction between the clean and dirty pages of a mapping. This distinction is essential when, for example, pages are to be written back to store changes permanently on the underlying block device. Earlier kernel versions provided additional lists of dirty and clean pages in address_space. In principle, the kernel could, of course, scan the entire tree and filter out the pages with the appropriate state, but this is obviously very time-consuming. For this reason, each node of the radix tree includes additional tagging information that specifies whether each page in the node has the property specified in the tag. For example, the kernel uses a tag to label nodes with dirty pages. Nodes without this tag can therefore be skipped during a scan for dirty pages. This approach is a compromise between simple, unified data structures (no explicit lists are needed to hold pages with different states) and the option of performing a quick search for pages with specific properties. Currently, two tags are supported:

1. pagecache_tag_dirty specifies whether a page is dirty.

2. pagecache_tag_writeback indicates that the page is being written back at the moment.

The tagging information is stored in a two-dimensional array (tags) that is a part of radix_tree_node. The first array dimension distinguishes between the possible tags, and the second contains a sufficient number of elements of unsigned longs so that there is a bit for each page that can be organized in the node.

radix_tree_tag_set is used to set a flag for a specific page: <radix-tree.h>

void *radix_tree_tag_set(struct radix_tree_root *root, unsigned long index, unsigned int tag);

The kernel searches for the corresponding positions in the bit list and sets the bit to 1. When this is done, the tree is scanned from top to bottom to update the information in all nodes.

In order to find all pages with a certain tag, the kernel still has to scan the entire tree, but this operation can be accelerated by first filtering out all subtrees that contain at least one page for which the flag is set. Again, this can be speeded up because the kernel does not check each bit one after the other but simply checks whether at least one of the unsigned long variables in which the bits are stored is greater than 1:

lib/radix-tree.c int radix_tree_tagged(struct radix_tree_root *root, int tag) {

int idx;

for (idx = 0; idx < RADIX_TREE_TAG_LONGS; idx++) { if (root->rnode->tags[tag][idx]) return 1;

return 0;

Accessing Radix Tree Elements

The kernel also provides the following functions to process radix trees (they are all implemented in lib/radix_tree.c):


int radix_tree_insert(struct radix_tree_root *, unsigned long, void *);

void *radix_tree_lookup(struct radix_tree_root *, unsigned long);

void *radix_tree_delete(struct radix_tree_root *, unsigned long);

int radix_tree_tag_get(struct radix_tree_root *root, unsigned long index, unsigned int tag); void *radix_tree_tag_clear(struct radix_tree_root *root, unsigned long index, unsigned int tag);

□ radix_tree_insert adds a new element to a radix tree by means of a void pointer. The tree is automatically expanded if too little capacity is available.

□ radix_tree_lookup finds a radix tree element whose key — an integer — was passed to the function as argument. The value returned is a void pointer that must be converted to the appropriate target data type.

□ radix_tree_delete removes a tree element selected by means of its integer key. A pointer to the deleted object is returned if deletion was successful.

□ radix_tree_tag_get checks if a tag is present on a radix tree node. If the tag is set, the function returns 1; otherwise, 0.

□ radix_tree_tag_clear deletes a tag in a radix tree node. The change is propagated upward in the tree; that is, if all elements on one level have no tags, then the bit is also removed in the next higher level, and so on. The address of the tagged item is returned upon success.

These functions are implemented largely by shifting numbers as described in Appendix C.

To ensure that radix trees are manipulated very quickly, the kernel uses a separate slab cache that holds instances of radix_tree_node for rapid allocation.

Continue reading here: Caution The slab cache stores only the data structures needed to create the tree This has nothing to do with the memory used for the cached pages which is allocated and managed independently

Was this article helpful?

0 0