Figure 920 Code flow diagram for findgrouporlov

Let's first take a look at the standard situation in which a new subdirectory is to be created at some point in the directory tree (and not in the root directory). This corresponds to the right-hand branch in Figure 9-20. The kernel computes several variables used as criteria to establish the suitability of a block group to accommodate the desired directory node (I took the liberty of rearranging the code a little to make it easier to understand):

fs/ext2/ialloc.c int ngroups = sbi->s_groups_count;

int inodes_per_group = EXT2_INODES_PER_GROUP(sb);

freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter); avefreei = freei / ngroups;

free_blocks = percpu_counter_read_positive(&sbi->s_freeblocks_counter); avefreeb = free_blocks / ngroups;

ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);

blocks_per_dir = (le32_to_cpu(es->s_blocks_count)-free_blocks) / ndirs;

max_dirs = ndirs / ngroups + inodes_per_group / 16; min_inodes = avefreei - inodes_per_group / 4; min_blocks = avefreeb - EXT2_BLOCKS_PER_GROUP(sb) / 4;

max_debt = EXT2_BLOCKS_PER_GROUP(sb) / max(blocks_per_dir, BLOCK_COST); if (max_debt * INODE_COST > inodes_per_group)

max_debt = inodes_per_group / INODE_COST; if (max_debt > 255)

avefreei and avefreeb denote the number of free inodes and blocks (which can be read from the approximative per-CPU counters associated with the superblock) divided by the number of groups. The values thus specify the average number of free inodes and blocks per group. This explains the prefix ave.

max_dirs specifies the absolute upper limit for the number of directory inodes in a block group. min_inodes and min_blocks define the minimum number of free inodes or blocks in a group before a new directory may be created.

debt is a numeric value between 0 and 255. It is saved for each block group in the ext2_sb_info filesystem instance that makes the s_debts array available (ext2_sb_info is defined in Section 9.2.2). The value is incremented by 1 (in ext2_new_inode) each time a new directory inode is created, and is decremented by 1 when the inode is required for a different purpose — usually for a regular file. The value of debt is therefore an indication of the ratio between the number of directories and inodes in a block group.

Starting at the block group of the parent entry, the kernel iterates over all block groups until the following criteria are met:

□ There are no more than max_ndir directories.

□ No less than min_inodes inodes and min_blocks data blocks are free.

□ The debt value does not exceed max_debt; that is, the number of directories does not get out of hand.

If just one of these criteria is not satisfied, the kernel skips the current block group and checks the next: fs/ext2/ialloc.c for (i = 0; i < ngroups; i++) {

group = (parent_group + i) % ngroups; desc = ext2_get_group_desc (sb, group, NULL); if (!desc || !desc->bg_free_inodes_count) continue;

if (sbi->s_debts[group] >= max_debt) continue;

if (le16_to_cpu(desc->bg_used_dirs_count) >= max_dirs) continue;

if (le16_to_cpu(desc->bg_free_inodes_count) < min_inodes) continue;

if (le16_to_cpu(desc->bg_free_blocks_count) < min_blocks)

continue; goto found;

Division without a remainder (%) at the beginning of the loop ensures that the search is resumed at the first block group once the last block group of the partition is reached.

Once a suitable group is found (which is automatically as close as possible to the parent group unless the inode there has been removed), the kernel need only update the corresponding statistics counters and return the group number. If no group matches the requirements, the search is repeated with the help of a less demanding "fallback" algorithm:

fs/ext2/ialloc.c fallback:

group = (parent_group + i) % ngroups; desc = ext2_get_group_desc (sb, group, &bh); if (!desc || !desc->bg_free_inodes_count) continue;

if (le16_to_cpu(desc->bg_free_inodes_count) >= avefreei) goto found;

return -1;

Again, the kernel starts at the parent group. The directories are scanned one after the other. However, this time the kernel accepts the first group that contains more than the average number of inodes (specified by avefreei).

This method is modified slightly when a new subdirectory is created in the root directory of the system, as illustrated by the left-hand branch of the code flow diagram in Figure 9-20 above.

To spread the directory inodes across the filesystem as uniformly as possible, the immediate subdirectories of the root directory are distributed statistically over the block groups. The kernel uses get_random_bytes to select a random number that is trimmed to the maximum number of existing block groups by dividing (without remainder) by ngroups. The kernel then iterates as follows over the randomly selected groups and subsequent groups:

fs/ext2/ialloc.c get_random_bytes(&group, sizeof(group)); parent_group = (unsigned)group % ngroups; for (i = 0; i < ngroups; i++) {

group = (parent_group + i) % ngroups; desc = ext2_get_group_desc (sb, group, &bh); if (!desc || !desc->bg_free_inodes_count) continue;

if (le16_to_cpu(desc->bg_used_dirs_count) >= best_ndir) continue;

if (le16_to_cpu(desc->bg_free_inodes_count) < avefreei) continue;

if (le16_to_cpu(desc->bg_free_blocks_count) < avefreeb)

continue; best_group = group;

best_ndir = le16_to_cpu(desc->bg_used_dirs_count); best_desc = desc; best_bh = bh;

While, again, the minimum number of free inodes or blocks must not be below the limit set by avefreei and avefreeb, the kernel also ensures that the number of free directories is not greater than or equal to best_ndir. The value is initially set to the value of inodes_per_group but is always updated to the lowest value encountered by the kernel during its search. The winner is the block group that has the fewest entries and that also satisfies the other two conditions.

If a suitable group is found, the kernel updates the statistics and returns the group number selected. Otherwise, the fallback mechanism comes into effect to find a less qualified block group.

Classic Directory Allocation

Kernel versions up to and including 2.4 did not use Orlov allocation, but the technique described below, called classic allocation. Ext2 filesystems can be mounted using the oldalloc option, which sets the ext2_mount_oldalloc bit in the s_mount_opt field of the superblock. The kernel then no longer uses the Orlov allocator but resorts to the classic scheme of allocation.18

How does the classic scheme work? The block groups of the system are scanned in a forward search, and particular attention is paid to two conditions:

1. Free space should still be available in the block group.

2. The number of directory inodes should be as small as possible compared to other inodes in the block group.

In this scheme, directory inodes are typically spread as uniformly as possible across the entire filesystem.

If none of the block groups satisfies requirements, the kernel restricts selection to groups with above average amounts of free space and from these chooses those with the fewest directory inodes.

Inode Allocation for Other Files

A simpler scheme known as quadratic hashing is applied when searching for an inode for regular files, links, and all file types other than directories. It is based on a forward search starting at the block group of the directory inode of the directory in which the new file is to be created. The first block group found with a free inode is reserved.

The block group in which the directory inode is located is searched first. Let's assume its group ID is start. If it does not have a free inode, the kernel scans the block group with the number start + 20, then start + 20 + 21, start + 20 + 21 + 22, and so on. A higher power of 2 is added to the group number in each step, which results in the sequence 1,1 + 2,1 + 2 + 4,1 + 2 + 4 + 8, ••• = 1,3,7,15,

Usually, this method quickly finds a free inode. However, if no free inode is returned on an (almost hopelessly) overfilled filesystem, the kernel scans all block groups in succession to ensure that every effort is made to find a free inode. Again, the first block group with a free inode is selected. If absolutely no inodes are free, the action is aborted with a corresponding error code.19

Continue reading here: Deleting Inodes

Was this article helpful?

0 0