C210 Radix Trees

The second tree implementation provided in library form in the kernel makes use of radix trees to organize data in memory. Radix trees differ from other trees because it is not necessary to compare the entire key at every branch, but only part of the key with the stored value of the node when performing search operations. This results in slightly different worst-case and average-case behavior than in other implementations, which are described in detail in the corresponding textbooks on algorithms. Also, radix trees are not particularly difficult to implement, which adds to their attraction.

The node data structure is defined as follows in the kernel sources:




struct radix_tree_node {

unsigned int height; /* Height from the bottom */ unsigned int count; struct rcu_head rcu_head;

void *slots[RADIX_TREE_MAP_SIZE];


slots is an array of pointers that, according to their position in the tree (i.e., the level on which the node is located), point either to other nodes or to data elements. count indicates the number of occupied array positions. The macros defined in the code segment specify statically how many array positions there are in each node. By default, the kernel uses 26 = 64. Empty slots are given a null pointer.

Every tree node can be associated with tags that correspond to a set or an unset bit. Per node, a maximum of radix_tree_max_tags different tags are possible, the default setting is a meager 2. This is, however sufficient for the page cache.

The RCU mechanism (described in Chapter 5) is used to allow lock-free radix tree lookups.

An array of unsigned longs is used to represent the tags, and radix_tree_tag_longs is computed by the kernel such that sufficient storage space is available to hold the tags. A long array with RADIX_TREE_MAX_TAGS*RADIX_TREE_TAG_LONGS contains enough bits to attach RADIX_TREE_MAX_TAGS tags to each slot. The functions radix_tree_tag_set and radix_tree_tag_clear are provided to set and clear tag bits, respectively. Notice that a tag is not only set in the leaf entry, but in every entry from root to bottom.

The tree root is defined by the following data structure (notice that this definition is in a public visible header file, in contrast to the definition of tree nodes):


struct radix_tree_root {

unsigned int height;

gfp_t gfp_mask;

struct radix_tree_node *rnode;

height specifies the current height of the tree, and rnode points to the first node. gfp_mask specifies the memory area from which the required data structure instances of the tree are to be taken.

The maximum number of elements that can be stored in a tree can be derived directly from the tree height — that is, from the number of node levels. The kernel provides the following function to compute the height:

lib/radix-tree.c static inline unsigned long radix_tree_maxindex(unsigned int height) {

return height_to_maxindex[height];

height_to_maxindex is an array that stores the maximum number of elements for different tree heights. The number is computed when the system is initialized as shown here:


#define RADIX_TREE_INDEX_BITS (8 /* CHAR_BIT */ * sizeof(unsigned long)) #define RADIX_TREE_MAX_PATH (DIV_ROUND_UP(RADIX_TREE_INDEX_BITS, \


lib/radix-tree.c static _init unsigned long _maxindex(unsigned int height)

unsigned int width = height * RADIX_TREE_MAP_SHIFT; int shift = RADIX_TREE_INDEX_BITS - width;

static _init void radix_tree_init_maxindex(void)

unsigned int i;

for (i = 0; i < ARRAY_SIZE(height_to_maxindex); i++) height_to_maxindex[i] = _maxindex(i);

At runtime, only simple array lookup is needed, and this can be done very quickly. This is important because the maximum number of elements for a given tree height needs to be computed frequently.

The elements contained in the tree are characterized by a descriptor that accepts continuous values from 0 up to the maximum number of elements that can currently be stored in the tree as follows:

radix_tree_insert is used to insert a new element in a radix tree as follows:

lib/radix-tree.c static inline void *radix_tree_indirect_to_ptr(void *ptr) {

return (void *)((unsigned long)ptr & ~RADIX_TREE_INDIRECT_PTR);

int radix_tree_insert(struct radix_tree_root *root, unsigned long index, void *item)

struct radix_tree_node *node = NULL, *slot; unsigned int height, shift; int offset; int error;

/* Make sure the tree is high enough. */ if (index > radix_tree_maxindex(root->height)) { error = radix_tree_extend(root, index); if (error)

return error;

slot = radix_tree_indirect_to_ptr(root->rnode) height = root->height;

offset = 0; /* uninitialised var warning */

/* Have to add a child node. */ if (!(slot = radix_tree_node_alloc(root)))

rcu_assign_pointer(node->slots[offset], slot); node->count++;

} else rcu_assign_pointer(root->rnode, radix_tree_ptr_to_indirect(slot));

offset = (index >> shift) & RADIX_TREE_MAP_MASK; node = slot;

slot = node->slots[offset]; shift -= RADIX_TREE_MAP_SHIFT; height-- ;

return -EEXIST;

rcu_assign_pointer(node->slots[offset], item)

rcu_assign_pointer(root->rnode, item);

return 0;

If the descriptor of the element is larger than the current number of elements that can be processed, the tree must be enlarged; this is described later in this section.

The code traverses the tree from top to bottom starting at the root, and the path is defined solely by the key being searched. Depending on the position in the tree, certain parts of the key are selected to find the matching entry in the slot array that leads to the next lower tree level. This corresponds exactly to the characteristics of radix trees. The tree is traversed in order to allocate tree branches not yet present. When this is done, the tree height does not change, because the tree can grow only in its width. The new entry is inserted in the matching slot once the code has reached level 0. Since the tree is protected by the RCU mechanism, the data pointers must not be assigned directly, but only via rcu_assign_pointer as discussed in Chapter 5.

The height of the tree is modified by radix_tree_extend — which is called, if needed, at the start of the function. It is defined as follows in the kernel sources:

lib/radix-tree.c static int radix_tree_extend(struct radix_tree_root *root, unsigned long index) {

struct radix_tree_node *node; unsigned int height; int tag;

/* Figure out what the height should be. */ height = root->height + 1;

while (index > radix_tree_maxindex(height)) height++;

root->height = height; goto out;

if (!(node = radix_tree_node_alloc(root))) return -ENOMEM;

node->slots[0] = radix_tree_indirect_to_ptr(root->rnode)

/* Propagate the aggregated tag info into the new root */ for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) { if (root_tag_get(root, tag))

newheight = root->height+1; node->height = newheight; node->count = 1;

node = radix_tree_ptr_to_indirect(node); rcu_assign_pointer(root->rnode, node); root->height = newheight; } while (height > root->height);


return 0;

Depending on the new maximum index, it may be necessary to add more than one level to the tree.

The tree is expanded from the top because there is then no need to copy elements. An additional node is inserted between the root and the previous top node for each new level. Because node branches are allocated automatically when new elements are inserted, the kernel need not concern itself with this task.

The kernel provides the radix_tree_lookup function to find an element in a radix tree by reference to its key as shown here:

lib/radix-tree.c void *radix_tree_lookup(struct radix_tree_root *root, unsigned long index) {

unsigned int height, shift;

struct radix_tree_node *node, **slot;

node = rcu_dereference(root->rnode); if (node == NULL)

return NULL;

if (!radix_tree_is_indirect_ptr(node)) { if (index > 0)

return NULL; return node;

node = radix_tree_indirect_to_ptr(node); height = node->height;

if (index > radix_tree_maxindex(height)) return NULL;

(node->slots + ((index>>shift) & RADIX_TREE_MAP_MASK)); node = rcu_dereference(*slot); if (node == NULL)

return NULL;

shift -= RADIX_TREE_MAP_SHIFT; height--; } while (height >0);

return node;

Logically, the algorithm for traversing the tree is identical to the one described previously for inserting new elements. However, searching is a simple operation because the kernel need not concern itself with allocating new branches. If a slot at any height in the tree has a null pointer and is therefore not present, the element being searched is not in the tree. Consequently, work can be terminated immediately and a null pointer can be returned.

Continue reading here: C3 Summary

Was this article helpful?

0 0