Deleting Inodes

The ext2_free_inode( ) function deletes a disk inode, which is identified by an inode object whose address is passed as the parameter. The kernel should invoke the function after a series of cleanup operations involving internal data structures and the data in the file itself. It should come after the inode object has been removed from the inode hash table, after the last hard link referring to that inode has been deleted from the proper directory and after the file is truncated to 0 length to reclaim all its data blocks (see Section 17.6.6 later in this chapter). It performs the following actions:

1. Invokes down( ) on the s_lock semaphore included in the parent superblock to get exclusive access to the superblock object.

2. Invokes clear_inode( ) to perform the following operations:

a. Invokes invalidate_inode_buffers( ) to remove the dirty buffers that belong to the inode from its i_dirty_buffers and i_dirty_data_buffers lists (see Section 14.2.1).

b. If the i_lock flag of the inode is set, some of the inode's buffers are involved in I/O data transfers; the function suspends the current process until these I/O data transfers terminate.

c. Invokes the clear_inode method of the superblock object, if defined; the Ext2 filesystem does not define it.

d. Sets the state of the inode to i_clear (the inode object contents are no longer meaningful).

3. Computes the index of the block group containing the disk inode from the inode number and the number of inodes in each block group.

4. Invokes load_inode_bitmap( ) to get the inode bitmap.

5. Increments the bg_free_inodes_count field of the group descriptor. If the deleted inode is a directory, decrements the bg_used_dirs_count field. Marks the buffer that contains the group descriptor as dirty.

6. Increments the s_free_inodes_count field of the disk superblock and marks the buffer that contains it as dirty. Also sets the s_dirt field of the superblock object to 1.

7. Clears the bit corresponding to the disk inode in the inode bitmap and marks the buffer that contains the bitmap as dirty. Moreover, if the filesystem has been mounted with the ms_synchronize flag, invokes ll_rw_block( ) and waits until the write operation on the bitmap's buffer terminates.

8. Invokes up( ) on the s_lock semaphore included in the parent superblock object. 17.6.3 Data Blocks Addressing

Each nonempty regular file consists of a group of data blocks. Such blocks may be referred to either by their relative position inside the file (their file block number) or by their position inside the disk partition (their logical block number, explained in Section 13.4.4).

Deriving the logical block number of the corresponding data block from an offset f inside a file is a two-step process:

1. Derive from the offset f the file block number — the index of the block that contains the character at offset f.

2. Translate the file block number to the corresponding logical block number.

Since Unix files do not include any control characters, it is quite easy to derive the file block number containing the f th character of a file: simply take the quotient of f and the filesystem's block size and round down to the nearest integer.

For instance, let's assume a block size of 4 KB. If f is smaller than 4,096, the character is contained in the first data block of the file, which has file block number 0. If f is equal to or greater than 4,096 and less than 8,192, the character is contained in the data block that has file block number 1, and so on.

This is fine as far as file block numbers are concerned. However, translating a file block number into the corresponding logical block number is not nearly as straightforward, since the data blocks of an Ext2 file are not necessarily adjacent on disk.

The Ext2 filesystem must therefore provide a method to store the connection between each file block number and the corresponding logical block number on disk. This mapping, which goes back to early versions of Unix from AT&T, is implemented partly inside the inode. It also involves some specialized blocks that contain extra pointers, which are an inode extension used to handle large files.

The i_block field in the disk inode is an array of ext2_n_blocks components that contain logical block numbers. In the following discussion, we assume that ext2_n_blocks has the default value, namely 15. The array represents the initial part of a larger data structure, which is illustrated in Figure 17-5. As can be seen in the figure, the 15 components of the array are of 4 different types:

• The first 12 components yield the logical block numbers corresponding to the first 12 blocks of the file—to the blocks that have file block numbers from 0 to 11.

• The component at index 12 contains the logical block number of a block that represents a second-order array of logical block numbers. They correspond to the file block numbers ranging from 12 to b/4+11, where b is the filesystem's block size (each logical block number is stored in 4 bytes, so we divide by 4 in the formula). Therefore, the kernel must look in this component for a pointer to a block, and then look in that block for another pointer to the ultimate block that contains the file contents.

• The component at index 13 contains the logical block number of a block containing a second-order array of logical block numbers; in turn, the entries of this second-order array point to third-order arrays, which store the logical block numbers that correspond to the file block numbers ranging from b/4+12 to (b/4)2+(b/4) + 11.

• Finally, the component at index 14 uses triple indirection: the fourth-order arrays store the logical block numbers corresponding to the file block numbers ranging from (b/4)2+(b/4)+12 to (b/4)3+(b/4)2+(b/4)+11 upward.

Figure 17-5. Data structures used to address the file's data blocks

Figure 17-5. Data structures used to address the file's data blocks

In Figure 17-5, the number inside a block represents the corresponding file block number. The arrows, which represent logical block numbers stored in array components, show how the kernel finds its way to reach the block that contains the actual contents of the file.

Notice how this mechanism favors small files. If the file does not require more than 12 data blocks, any data can be retrieved in two disk accesses: one to read a component in the i_block array of the disk inode and the other to read the requested data block. For larger files, however, three or even four consecutive disk accesses may be needed to access the required block. In practice, this is a worst-case estimate, since dentry, buffer, and page caches contribute significantly to reduce the number of real disk accesses.

Notice also how the block size of the filesystem affects the addressing mechanism, since a larger block size allows the Ext2 to store more logical block numbers inside a single block. Table 17-10 shows the upper limit placed on a file's size for each block size and each addressing mode. For instance, if the block size is 1,024 bytes and the file contains up to 268 kilobytes of data, the first 12 KB of a file can be accessed through direct mapping and the remaining 13-268 KB can be addressed through simple indirection. Files larger than 2 GB must be opened on 32-bit architectures by specifying the o_largefile opening flag. In any case, the Ext2 filesystem puts an upper limit on the file size equal to 2 TB minus 4,096 bytes.

Table 17-10. File size upper limits for data block addressing

Block Size






12 KB

268 KB

64.26 MB

16.06 GB


24 KB

1.02 MB

513.02 MB

256.5 GB


48 KB

4.04 MB

4 GB

~ 2 TB

Continue reading here: File Holes

Was this article helpful?

0 0