12.1.1 The Common File Model
The key idea behind the VFS consists of introducing a common file model capable of representing all supported filesystems. This model strictly mirrors the file model provided by the traditional Unix filesystem. This is not surprising, since Linux wants to run its native filesystem with minimum overhead. However, each specific filesystem implementation must translate its physical organization into the VFS's common file model.
For instance, in the common file model, each directory is regarded as a file, which contains a list of files and other directories. However, several non-Unix disk-based filesystems use a File Allocation Table (FAT), which stores the position of each file in the directory tree. In these filesystems, directories are not files. To stick to the VFS's common file model, the Linux implementations of such FAT-based filesystems must be able to construct on the fly, when needed, the files corresponding to the directories. Such files exist only as objects in kernel memory.
More fundamentally, the Linux kernel cannot hardcode a particular function to handle an operation such as read( ) or ioctl( ). Instead, it must use a pointer for each operation; the pointer is made to point to the proper function for the particular filesystem being accessed.
Let's illustrate this concept by showing how the read( ) shown in Figure 12-1 would be translated by the kernel into a call specific to the MS-DOS filesystem. The application's call to read( ) makes the kernel invoke the corresponding sys_read( ) service routine, just like any other system call. The file is represented by a file data structure in kernel memory, as we shall see later in this chapter. This data structure contains a field called f_op that points to a table of functions specific to MS-DOS files, including a function that reads a file. sys_read( ) finds the pointer to this function and invokes it. Thus, the application's read( ) is turned into the rather indirect call:
file->f_op->read(...);
Similarly, the write( ) operation triggers the execution of a proper Ext2 write function associated with the output file. In short, the kernel is responsible for assigning the right set of pointers to the file variable associated with each open file, and then for invoking the call specific to each filesystem that the f_op field points to.
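The indirection described above can be sketched in a few lines of C. This is only an illustration under simplifying assumptions, not the kernel's code: the structures are cut down to the fields needed here, the read method takes fewer arguments than the real one, and msdos_file_read( ), msdos_file_operations, the data and size fields, and sys_read_sketch( ) are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct file;  /* forward declaration for the method table */

/* Table of per-filesystem methods; only read is modeled here. */
struct file_operations {
    long (*read)(struct file *f, char *buf, size_t count);
};

/* Simplified stand-in for the kernel's file object. */
struct file {
    const struct file_operations *f_op;  /* assigned when the file is opened */
    const char *data;                    /* hypothetical in-memory contents */
    size_t size;                         /* length of the fake contents */
    size_t f_pos;                        /* current file pointer */
};

/* Hypothetical MS-DOS read method: copies bytes from the fake contents. */
long msdos_file_read(struct file *f, char *buf, size_t count)
{
    size_t left = f->size - f->f_pos;

    if (count > left)
        count = left;
    memcpy(buf, f->data + f->f_pos, count);
    f->f_pos += count;
    return (long)count;
}

/* The set of pointers the kernel would attach to every open MS-DOS file. */
const struct file_operations msdos_file_operations = {
    .read = msdos_file_read,
};

/* Generic entry point: it knows nothing about MS-DOS and only follows f_op. */
long sys_read_sketch(struct file *f, char *buf, size_t count)
{
    assert(f != NULL && f->f_op != NULL);
    if (f->f_op->read == NULL)
        return -1;  /* this filesystem offers no read method */
    return f->f_op->read(f, buf, count);
}
```

Supporting another filesystem in this sketch would mean defining a second method table and storing its address in f_op at open time; sys_read_sketch( ) itself would not change.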
One can think of the common file model as object-oriented, where an object is a software construct that defines both a data structure and the methods that operate on it. For reasons of efficiency, Linux is not coded in an object-oriented language like C++. Objects are therefore implemented as data structures with some fields pointing to functions that correspond to the object's methods.
The common file model consists of the following object types:
The superblock object
Stores information concerning a mounted filesystem. For disk-based filesystems, this object usually corresponds to a filesystem control block stored on disk.
The inode object
Stores general information about a specific file. For disk-based filesystems, this object usually corresponds to a file control block stored on disk. Each inode object is associated with an inode number, which uniquely identifies the file within the filesystem.
The file object
Stores information about the interaction between an open file and a process. This information exists only in kernel memory during the period when each process accesses a file.
The dentry object
Stores information about the linking of a directory entry with the corresponding file. Each disk-based filesystem stores this information in its own particular way on disk.
Figure 12-2 illustrates with a simple example how processes interact with files. Three different processes have opened the same file, two of them using the same hard link. In this case, each of the three processes uses its own file object, while only two dentry objects are required—one for each hard link. Both dentry objects refer to the same inode object, which identifies the superblock object and, together with the latter, the common disk file.
Figure 12-2. Interaction between processes and VFS objects
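The relationships in Figure 12-2 can be approximated with four small C structures. The sketch below is a deliberately reduced model: the real kernel structures hold many more fields, d_name is shown here as a plain string for simplicity, and the helper functions file_inode_sketch( ) and same_disk_file( ) are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* One per mounted filesystem. */
struct super_block {
    const char *fs_type;
};

/* One per file; identified by its inode number within the filesystem. */
struct inode {
    unsigned long i_ino;
    struct super_block *i_sb;
};

/* One per directory entry (hard link) that names the file. */
struct dentry {
    const char *d_name;
    struct inode *d_inode;
};

/* One per open of the file by a process. */
struct file {
    struct dentry *f_dentry;
    long f_pos;
};

/* Follow the chain file -> dentry -> inode. */
struct inode *file_inode_sketch(const struct file *f)
{
    assert(f != NULL && f->f_dentry != NULL);
    return f->f_dentry->d_inode;
}

/* Two file objects refer to the same disk file if they reach the same inode. */
int same_disk_file(const struct file *a, const struct file *b)
{
    return file_inode_sketch(a) == file_inode_sketch(b);
}
```

Modeling the figure with these types takes three file objects, two dentry objects that share one inode, and one superblock. Each file object keeps its own f_pos, which is why the three processes can read the same file at different positions.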
Besides providing a common interface to all filesystem implementations, the VFS has another important role related to system performance. The most recently used dentry objects are contained in a disk cache named the dentry cache, which speeds up the translation from a file pathname to the inode of the last pathname component.
Generally speaking, a disk cache is a software mechanism that allows the kernel to keep in RAM some information that is normally stored on a disk, so that further accesses to that data can be quickly satisfied without a slow access to the disk itself.[3] Besides the dentry cache, Linux uses other disk caches, like the buffer cache and the page cache, which are described in forthcoming chapters.
[3] Notice how a disk cache differs from a hardware cache or a memory cache, neither of which has anything to do with disks or other devices. A hardware cache is a fast static RAM that speeds up requests directed to the slower dynamic RAM (see Section 2.4.7). A memory cache is a software mechanism introduced to bypass the Kernel Memory Allocator (see Section 7.2.1).
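To make the idea of a disk cache for pathname components concrete, here is a minimal sketch in C. It is not how the kernel's dentry cache is organized: it uses a small direct-mapped table in which a new entry simply overwrites the old one in its slot, and disk_lookup( ), lookup_component( ), slot_for( ), and the slow_lookups counter are hypothetical names. The "disk" lookup just fabricates an inode number so that the example stays self-contained.

```c
#include <assert.h>
#include <string.h>

#define DCACHE_SLOTS 64
#define NAME_LEN_SKETCH 32

/* One cached translation: (parent directory, name) -> inode number. */
struct dcache_entry {
    int valid;
    unsigned long parent_ino;
    char name[NAME_LEN_SKETCH];
    unsigned long ino;
};

static struct dcache_entry dcache[DCACHE_SLOTS];
int slow_lookups;  /* counts how often the slow path was taken */

/* Pick a slot from the parent inode number and the component name. */
static unsigned slot_for(unsigned long parent_ino, const char *name)
{
    unsigned long h = parent_ino;

    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return (unsigned)(h % DCACHE_SLOTS);
}

/* Hypothetical slow path standing in for reading a directory from disk. */
static unsigned long disk_lookup(unsigned long parent_ino, const char *name)
{
    slow_lookups++;
    return parent_ino * 100 + strlen(name);  /* fabricated inode number */
}

/* Translate one pathname component, consulting the cache first. */
unsigned long lookup_component(unsigned long parent_ino, const char *name)
{
    struct dcache_entry *e;

    assert(name != NULL && strlen(name) < NAME_LEN_SKETCH);
    e = &dcache[slot_for(parent_ino, name)];
    if (e->valid && e->parent_ino == parent_ino && strcmp(e->name, name) == 0)
        return e->ino;  /* cache hit: no disk access needed */

    /* Cache miss: take the slow path and remember the answer. */
    e->ino = disk_lookup(parent_ino, name);
    e->parent_ino = parent_ino;
    strcpy(e->name, name);
    e->valid = 1;
    return e->ino;
}
```

Looking up the same component twice in a row increments slow_lookups only once; the second call is answered from RAM, which is the whole point of the cache.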
12.1.2 System Calls Handled by the VFS
Table 12-1 illustrates the VFS system calls that refer to filesystems, regular files, directories, and symbolic links. A few other system calls handled by the VFS, such as ioperm( ), ioctl( ), pipe( ), and mknod( ), refer to device files and pipes. These are discussed in later chapters. A last group of system calls handled by the VFS, such as socket( ), connect( ), bind( ), and protocols( ), refer to sockets and are used to implement networking; some of them are discussed in Chapter 18. Some of the kernel service routines that correspond to the system calls listed in Table 12-1 are discussed either in this chapter or in Chapter 17.
Table 12-1. Some system calls handled by the VFS
| System call name | Description |
| mount( ) umount( ) | Mount/unmount filesystems |
| sysfs( ) | Get filesystem information |
| statfs( ) fstatfs( ) ustat( ) | Get filesystem statistics |
| chroot( ) pivot_root( ) | Change root directory |
| chdir( ) fchdir( ) getcwd( ) | Manipulate current directory |
| mkdir( ) rmdir( ) | Create and destroy directories |
| getdents( ) readdir( ) link( ) unlink( ) rename( ) | Manipulate directory entries |
| readlink( ) symlink( ) | Manipulate soft links |
| chown( ) fchown( ) lchown( ) | Modify file owner |
| chmod( ) fchmod( ) utime( ) | Modify file attributes |
| stat( ) fstat( ) lstat( ) access( ) | Read file status |
| open( ) close( ) creat( ) umask( ) | Open and close files |
| dup( ) dup2( ) fcntl( ) | Manipulate file descriptors |
| select( ) poll( ) | Asynchronous I/O notification |
| truncate( ) ftruncate( ) | Change file size |
| lseek( ) _llseek( ) | Change file pointer |
| read( ) write( ) readv( ) writev( ) sendfile( ) readahead( ) | Carry out file I/O operations |
| pread( ) pwrite( ) | Seek file and access it |
| mmap( ) munmap( ) madvise( ) mincore( ) | Handle file memory mapping |
| fdatasync( ) fsync( ) sync( ) msync( ) | Synchronize file data |
| flock( ) | Manipulate file lock |
We said earlier that the VFS is a layer between application programs and specific filesystems. However, in some cases, a file operation can be performed by the VFS itself, without invoking a lower-level procedure. For instance, when a process closes an open file, the file on disk doesn't usually need to be touched, and hence the VFS simply releases the corresponding file object. Similarly, when the lseek( ) system call modifies a file pointer, which is an attribute related to the interaction between an opened file and a process, the VFS needs to modify only the corresponding file object without accessing the file on disk and therefore does not have to invoke a specific filesystem procedure. In some sense, the VFS could be considered a "generic" filesystem that relies, when necessary, on specific ones.
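The lseek( ) case lends itself to a short illustration. The sketch below shows, under simplifying assumptions, how a seek can be satisfied by updating only the in-memory file object; it is not the kernel's implementation, the size field is a stand-in for the file length that the kernel would obtain from the inode, and lseek_sketch( ) is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Reduced file object: just the file pointer and the file length. */
struct file {
    long f_pos;
    long size;  /* stand-in for the size recorded in the inode */
};

/*
 * Move the file pointer. Only the file object in memory is modified;
 * no filesystem-specific routine is called and the disk is not touched.
 * Returns the new position, or -1 on an invalid request.
 */
long lseek_sketch(struct file *f, long offset, int whence)
{
    long pos;

    assert(f != NULL);
    switch (whence) {
    case SEEK_SET:
        pos = offset;
        break;
    case SEEK_CUR:
        pos = f->f_pos + offset;
        break;
    case SEEK_END:
        pos = f->size + offset;
        break;
    default:
        return -1;
    }
    if (pos < 0)
        return -1;  /* refuse to seek before the start of the file */
    f->f_pos = pos;
    return pos;
}
```

Because f_pos lives in the file object rather than in the inode, two processes that opened the same file separately can seek independently of each other.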