The Common File Model

The key idea behind the VFS consists of introducing a common file model capable of representing all supported filesystems. This model strictly mirrors the file model provided by the traditional Unix filesystem. This is not surprising, since Linux wants to run its native filesystem with minimum overhead. However, each specific filesystem implementation must translate its physical organization into the VFS's common file model.

For instance, in the common file model, each directory is regarded as a file, which contains a list of files and other directories. However, several non-Unix disk-based filesystems use a File Allocation Table (FAT), which stores the position of each file in the directory tree. In these filesystems, directories are not files. To stick to the VFS's common file model, the Linux implementations of such FAT-based filesystems must be able to construct on the fly, when needed, the files corresponding to the directories. Such files exist only as objects in kernel memory.

More fundamentally, the Linux kernel cannot hardcode a particular function to handle an operation such as read( ) or ioctl( ). Instead, it must use a pointer for each operation; the pointer is made to point to the proper function for the particular filesystem being accessed.

Let's illustrate this concept by showing how the read( ) shown in Figure 12-1 would be translated by the kernel into a call specific to the MS-DOS filesystem. The application's call to read( ) makes the kernel invoke the corresponding sys_read( ) service routine, just like any other system call. The file is represented by a file data structure in kernel memory, as we shall see later in this chapter. This data structure contains a field called f_op that contains pointers to functions specific to MS-DOS files, including a function that reads a file. sys_read( ) finds the pointer to this function and invokes it. Thus, the application's read( ) is turned into the rather indirect call:

file->f_op->read(...);

Similarly, the write( ) operation triggers the execution of a proper Ext2 write function associated with the output file. In short, the kernel is responsible for assigning the right set of pointers to the file variable associated with each open file, and then for invoking the call specific to each filesystem that the f_op field points to.
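
The indirection described above can be sketched in plain C. This is a minimal user-space illustration, not kernel code: the toy_ structures, the simplified method signatures, and the last_handler variable are assumptions made for the example. Only the f_op field name and the idea of a per-filesystem table of function pointers come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Method table: one pointer per operation (simplified, hypothetical signatures). */
struct toy_file_operations {
    long (*read)(struct toy_file *f, char *buf, size_t count);
    long (*write)(struct toy_file *f, const char *buf, size_t count);
};

/* Toy file object: f_op selects the filesystem-specific methods. */
struct toy_file {
    const struct toy_file_operations *f_op;
    char data[64];   /* stands in for the file's contents on disk */
    size_t size;     /* bytes of data[] currently in use */
    size_t f_pos;    /* current file pointer */
};

/* Records which filesystem routine ran last; for demonstration only. */
static const char *last_handler = "none";

/* Hypothetical MS-DOS read method: copies bytes starting at f_pos. */
static long toy_msdos_read(struct toy_file *f, char *buf, size_t count)
{
    size_t left;

    last_handler = "toy_msdos_read";
    if (f->f_pos >= f->size)
        return 0;                       /* end of file */
    left = f->size - f->f_pos;
    if (count > left)
        count = left;
    memcpy(buf, f->data + f->f_pos, count);
    f->f_pos += count;
    return (long)count;
}

/* Hypothetical Ext2 write method: stores bytes starting at f_pos. */
static long toy_ext2_write(struct toy_file *f, const char *buf, size_t count)
{
    size_t room;

    last_handler = "toy_ext2_write";
    if (f->f_pos >= sizeof(f->data))
        return 0;                       /* toy file is full */
    room = sizeof(f->data) - f->f_pos;
    if (count > room)
        count = room;
    memcpy(f->data + f->f_pos, buf, count);
    f->f_pos += count;
    if (f->f_pos > f->size)
        f->size = f->f_pos;
    return (long)count;
}

static const struct toy_file_operations toy_msdos_fops = { toy_msdos_read, NULL };
static const struct toy_file_operations toy_ext2_fops  = { NULL, toy_ext2_write };

/* Generic entry points: they know nothing about any particular filesystem. */
long toy_sys_read(struct toy_file *f, char *buf, size_t count)
{
    assert(f != NULL);
    if (f->f_op == NULL || f->f_op->read == NULL)
        return -1;                      /* operation not supported */
    return f->f_op->read(f, buf, count);
}

long toy_sys_write(struct toy_file *f, const char *buf, size_t count)
{
    assert(f != NULL);
    if (f->f_op == NULL || f->f_op->write == NULL)
        return -1;                      /* operation not supported */
    return f->f_op->write(f, buf, count);
}
```

A caller would set f_op to &toy_msdos_fops for the input file and to &toy_ext2_fops for the output file, then use only toy_sys_read( ) and toy_sys_write( ). The generic routines never name a filesystem; the choice is made once, when the pointer table is assigned.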

One can think of the common file model as object-oriented, where an object is a software construct that defines both a data structure and the methods that operate on it. For reasons of efficiency, Linux is not coded in an object-oriented language like C++. Objects are therefore implemented as data structures with some fields pointing to functions that correspond to the object's methods.

The common file model consists of the following object types:

The superblock object

Stores information concerning a mounted filesystem. For disk-based filesystems, this object usually corresponds to a filesystem control block stored on disk.

The inode object

Stores general information about a specific file. For disk-based filesystems, this object usually corresponds to a file control block stored on disk. Each inode object is associated with an inode number, which uniquely identifies the file within the filesystem.

The file object

Stores information about the interaction between an open file and a process. This information exists only in kernel memory during the period when each process accesses a file.

The dentry object

Stores information about the linking of a directory entry with the corresponding file. Each disk-based filesystem stores this information in its own particular way on disk.

Figure 12-2 illustrates with a simple example how processes interact with files. Three different processes have opened the same file, two of them using the same hard link. In this case, each of the three processes uses its own file object, while only two dentry objects are required—one for each hard link. Both dentry objects refer to the same inode object, which identifies the superblock object and, together with the latter, the common disk file.

Figure 12-2. Interaction between processes and VFS objects
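
The relationships in this example can be modeled with a few linked structures. The sketch below is a simplified user-space illustration: the fig_ type names, the reduced field sets, the link names, and the inode number 42 are assumptions made for the example and do not reproduce the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the four VFS object types. */
struct fig_super_block { const char *s_type; };
struct fig_inode  { unsigned long i_ino; struct fig_super_block *i_sb; };
struct fig_dentry { const char *d_name; struct fig_inode *d_inode; };
struct fig_file   { struct fig_dentry *f_dentry; long f_pos; };

/* One mounted filesystem, one disk file, two hard links, three opens. */
struct fig_scene {
    struct fig_super_block sb;
    struct fig_inode inode;
    struct fig_dentry links[2];
    struct fig_file files[3];
};

void fig_build(struct fig_scene *s)
{
    assert(s != NULL);

    s->sb.s_type = "ext2";              /* hypothetical filesystem type */
    s->inode.i_ino = 42;                /* hypothetical inode number */
    s->inode.i_sb = &s->sb;

    /* Two hard links: two dentry objects naming the same inode. */
    s->links[0].d_name = "report";
    s->links[0].d_inode = &s->inode;
    s->links[1].d_name = "report-link";
    s->links[1].d_inode = &s->inode;

    /* Three processes: each gets its own file object and file pointer. */
    s->files[0].f_dentry = &s->links[0];    /* process 1, first link */
    s->files[1].f_dentry = &s->links[0];    /* process 2, same link */
    s->files[2].f_dentry = &s->links[1];    /* process 3, other link */
    s->files[0].f_pos = 0;
    s->files[1].f_pos = 0;
    s->files[2].f_pos = 0;
}

/* Follow file -> dentry -> inode, as the figure's arrows do. */
struct fig_inode *fig_file_inode(const struct fig_file *f)
{
    return f->f_dentry->d_inode;
}
```

After fig_build( ), all three file objects resolve to the same inode through fig_file_inode( ), yet each keeps a private f_pos, so one process moving its file pointer does not affect the other two.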

Besides providing a common interface to all filesystem implementations, the VFS has another important role related to system performance. The most recently used dentry objects are contained in a disk cache named the dentry cache, which speeds up the translation from a file pathname to the inode of the last pathname component.

Generally speaking, a disk cache is a software mechanism that allows the kernel to keep in RAM some information that is normally stored on a disk, so that further accesses to that data can be quickly satisfied without a slow access to the disk itself. [3] Besides the dentry cache, Linux uses other disk caches, like the buffer cache and the page cache, which are described in forthcoming chapters.

[3] Notice how a disk cache differs from a hardware cache or a memory cache, neither of which has anything to do with disks or other devices. A hardware cache is a fast static RAM that speeds up requests directed to the slower dynamic RAM (see Section 2.4.7). A memory cache is a software mechanism introduced to bypass the Kernel Memory Allocator (see Section 7.2.1).
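
The benefit of a disk cache for pathname lookup can be shown with a toy model. The sketch below is an assumption-laden illustration, not the kernel's dentry cache: it uses a small direct-mapped table in which a newer entry simply overwrites the older one in the same slot (rather than tracking the most recently used entries), and dc_slow_lookup( ) is a hypothetical stand-in for reading directory data from disk.

```c
#include <assert.h>
#include <string.h>

#define DC_SLOTS    8
#define DC_PATH_MAX 32

struct dc_entry {
    char path[DC_PATH_MAX];
    unsigned long ino;
    int valid;
};

static struct dc_entry dc_table[DC_SLOTS];
static unsigned long dc_disk_lookups;   /* counts simulated disk accesses */

static unsigned dc_hash(const char *path)
{
    unsigned h = 0;

    while (*path)
        h = h * 31u + (unsigned char)*path++;
    return h;
}

/* Stand-in for the slow path: pretend to read the directory from disk. */
static unsigned long dc_slow_lookup(const char *path)
{
    dc_disk_lookups++;
    return 1000ul + dc_hash(path) % 1000u;   /* made-up inode number */
}

/* Translate a pathname to an inode number, consulting the cache first. */
unsigned long dc_lookup(const char *path)
{
    struct dc_entry *e;
    unsigned long ino;

    assert(path != NULL);
    e = &dc_table[dc_hash(path) % DC_SLOTS];
    if (e->valid && strcmp(e->path, path) == 0)
        return e->ino;                  /* hit: no disk access needed */

    ino = dc_slow_lookup(path);         /* miss: go to the "disk" */
    if (strlen(path) < DC_PATH_MAX) {   /* cache only paths that fit */
        strcpy(e->path, path);
        e->ino = ino;
        e->valid = 1;
    }
    return ino;
}
```

Calling dc_lookup( ) twice with the same pathname increments dc_disk_lookups only once; the second call is answered from RAM, which is the effect the text attributes to the dentry cache.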

12.1.2 System Calls Handled by the VFS

Table 12-1 illustrates the VFS system calls that refer to filesystems, regular files, directories, and symbolic links. A few other system calls handled by the VFS, such as ioperm( ), ioctl( ), pipe( ), and mknod( ), refer to device files and pipes. These are discussed in later chapters. A last group of system calls handled by the VFS, such as socket( ), connect( ), bind( ), and protocols( ), refer to sockets and are used to implement networking; some of them are discussed in Chapter 18. Some of the kernel service routines that correspond to the system calls listed in Table 12-1 are discussed either in this chapter or in Chapter 17.

Table 12-1. Some system calls handled by the VFS

System call name                                              Description
------------------------------------------------------------  ------------------------------
mount( ) umount( )                                            Mount/unmount filesystems
sysfs( )                                                      Get filesystem information
statfs( ) fstatfs( ) ustat( )                                 Get filesystem statistics
chroot( ) pivot_root( )                                       Change root directory
chdir( ) fchdir( ) getcwd( )                                  Manipulate current directory
mkdir( ) rmdir( )                                             Create and destroy directories
getdents( ) readdir( ) link( ) unlink( ) rename( )            Manipulate directory entries
readlink( ) symlink( )                                        Manipulate soft links
chown( ) fchown( ) lchown( )                                  Modify file owner
chmod( ) fchmod( ) utime( )                                   Modify file attributes
stat( ) fstat( ) lstat( ) access( )                           Read file status
open( ) close( ) creat( ) umask( )                            Open and close files
dup( ) dup2( ) fcntl( )                                       Manipulate file descriptors
select( ) poll( )                                             Asynchronous I/O notification
truncate( ) ftruncate( )                                      Change file size
lseek( ) _llseek( )                                           Change file pointer
read( ) write( ) readv( ) writev( ) sendfile( ) readahead( )  Carry out file I/O operations
pread( ) pwrite( )                                            Seek file and access it
mmap( ) munmap( ) madvise( ) mincore( )                       Handle file memory mapping
fdatasync( ) fsync( ) sync( ) msync( )                        Synchronize file data
flock( )                                                      Manipulate file lock

We said earlier that the VFS is a layer between application programs and specific filesystems. However, in some cases, a file operation can be performed by the VFS itself, without invoking a lower-level procedure. For instance, when a process closes an open file, the file on disk doesn't usually need to be touched, and hence the VFS simply releases the corresponding file object. Similarly, when the lseek( ) system call modifies a file pointer, which is an attribute related to the interaction between an opened file and a process, the VFS needs to modify only the corresponding file object without accessing the file on disk and therefore does not have to invoke a specific filesystem procedure. In some sense, the VFS could be considered a "generic" filesystem that relies, when necessary, on specific ones.
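
The lseek( ) case above can be sketched as follows. This is a simplified model that follows the text's description, not the kernel's actual code path: the pos_ names, the reduced file object, and the pos_fs_calls counter are assumptions made for the example. The counter stands in for "a specific filesystem procedure was invoked".

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>    /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Reduced file object: just the fields this example needs. */
struct pos_file {
    long f_pos;       /* file pointer, private to this open file */
    long size;        /* file length, as cached in memory */
};

static unsigned long pos_fs_calls;   /* filesystem-specific calls made */

/* Hypothetical filesystem read: the only routine that "touches the disk". */
long pos_fs_read(struct pos_file *f, long count)
{
    long left;

    assert(f != NULL);
    pos_fs_calls++;
    left = (f->f_pos < f->size) ? f->size - f->f_pos : 0;
    if (count < 0)
        return -1;
    if (count > left)
        count = left;
    f->f_pos += count;
    return count;
}

/* Generic seek: updates the file object and nothing else. */
long pos_lseek(struct pos_file *f, long offset, int whence)
{
    long base;

    assert(f != NULL);
    switch (whence) {
    case SEEK_SET: base = 0;        break;
    case SEEK_CUR: base = f->f_pos; break;
    case SEEK_END: base = f->size;  break;
    default:
        return -1;                  /* unknown whence value */
    }
    if (offset > 0 && base > LONG_MAX - offset)
        return -1;                  /* would overflow */
    if (base + offset < 0)
        return -1;                  /* negative positions are rejected */
    f->f_pos = base + offset;
    return f->f_pos;
}
```

In this model any number of pos_lseek( ) calls leaves pos_fs_calls unchanged, while each pos_fs_read( ) increments it, mirroring the distinction between operations the generic layer completes on its own and those it must delegate.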
