Fields of the direct access buffer descriptor (struct kiobuf):

| Type | Field | Description |
|---|---|---|
| int | nr_pages | Number of pages in the direct access buffer |
| int | array_len | Number of free elements in the map_array field |
| int | offset | Offset to valid data inside the first page of the direct access buffer |
| int | length | Length of valid data inside the direct access buffer |
| struct page ** | maplist | List of page descriptor pointers referring to pages in the direct access buffer (usually points to the map_array field) |
| unsigned int | locked | Lock flag for all pages in the direct access buffer |
| struct page * [ ] | map_array | Array of 129 page descriptor pointers |
| struct buffer_head * [ ] | bh | Array of 1,024 preallocated buffer head pointers |
| unsigned long [ ] | blocks | Array of 1,024 logical block numbers |
| atomic_t | io_count | Atomic flag that indicates whether I/O is in progress |
| int | errno | Error number of last I/O operation |
| void (*)(struct kiobuf *) | end_io | Completion method |
| wait_queue_head_t | wait_queue | Queue of processes waiting for I/O to complete |
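The array sizes in the table follow from the 512 KB limit on a single direct I/O chunk. The following is a minimal user-space sketch of the descriptor layout, assuming 4 KB pages and a 512-byte minimum block size. The type kiobuf_model and the macro names are hypothetical; struct page and struct buffer_head are left opaque, atomic_t is replaced by a C11 atomic_int, and the wait queue is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical constants: 4 KB pages, 512-byte minimum block size. */
#define PAGE_SIZE_BYTES   4096u
#define SECTOR_SIZE_BYTES 512u
#define MAX_CHUNK_BYTES   (512u * 1024u)   /* 512 KB per direct I/O chunk */

/* One extra page covers a chunk that does not start on a page boundary. */
#define STATIC_PAGES (MAX_CHUNK_BYTES / PAGE_SIZE_BYTES + 1)   /* 129 */
/* With 512-byte blocks, a 512 KB chunk spans at most 1,024 blocks. */
#define MAX_BLOCKS   (MAX_CHUNK_BYTES / SECTOR_SIZE_BYTES)     /* 1024 */

struct page;          /* opaque stand-in for the page descriptor */
struct buffer_head;   /* opaque stand-in for the buffer head */

/* User-space model of the direct access buffer descriptor (not kernel code). */
struct kiobuf_model {
    int nr_pages;                         /* pages in the buffer */
    int array_len;                        /* free elements in map_array */
    int offset;                           /* offset of valid data in the first page */
    int length;                           /* length of valid data */
    struct page **maplist;                /* usually points to map_array */
    unsigned int locked;                  /* lock flag for all pages */
    struct page *map_array[STATIC_PAGES];
    struct buffer_head *bh[MAX_BLOCKS];   /* preallocated buffer heads */
    unsigned long blocks[MAX_BLOCKS];     /* logical block numbers */
    atomic_int io_count;                  /* stands in for atomic_t */
    int err;                              /* "errno" in the table; renamed here */
    void (*end_io)(struct kiobuf_model *);/* completion method */
    /* wait_queue omitted: no user-space equivalent in this sketch */
};
```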

Suppose a self-caching application wishes to access a file directly. As a first step, the application opens the file, specifying the O_DIRECT flag (see Section 12.6.1). While servicing the open( ) system call, the dentry_open( ) function checks the value of this flag; if it is set, the function invokes alloc_kiovec( ), which allocates a new direct access buffer descriptor and stores its address in the f_iobuf field of the file object. Initially the buffer includes no page frames, so the nr_pages field of the descriptor stores the value 0. The alloc_kiovec( ) function, however, preallocates 1,024 buffer heads, whose addresses are stored in the bh array of the descriptor. These buffer heads ensure that the self-caching application is not blocked while directly accessing the file (recall that ordinary data transfers block if no free buffer heads are available). A drawback of this approach, however, is that data transfers must be done in chunks of at most 512 KB.
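A minimal user-space sketch of what the descriptor allocation sets up: no page frames yet, but all 1,024 buffer heads allocated in advance so that later transfers never wait for one. The names alloc_descriptor_model( ), free_descriptor_model( ), and the two struct types are hypothetical models, not the kernel's alloc_kiovec( ).

```c
#include <assert.h>
#include <stdlib.h>

#define NR_PREALLOC_BH 1024   /* 1,024 buffer heads, as in the text */

/* Hypothetical stand-in for a buffer head; only illustrative fields. */
struct bh_model {
    unsigned long blocknr;
    int busy;
};

/* Minimal descriptor model: just the fields set up at open time. */
struct kiobuf_alloc_model {
    int nr_pages;                          /* 0: no page frames yet */
    struct bh_model *bh[NR_PREALLOC_BH];   /* preallocated buffer heads */
};

/* Frees the descriptor and every buffer head it owns (NULL-safe). */
void free_descriptor_model(struct kiobuf_alloc_model *iobuf)
{
    if (iobuf == NULL)
        return;
    for (int i = 0; i < NR_PREALLOC_BH; i++)
        free(iobuf->bh[i]);               /* free(NULL) is a no-op */
    free(iobuf);
}

/* Allocates a descriptor with no pages and all buffer heads up front.
 * Returns NULL on failure after releasing anything already allocated. */
struct kiobuf_alloc_model *alloc_descriptor_model(void)
{
    struct kiobuf_alloc_model *iobuf = calloc(1, sizeof *iobuf);
    if (iobuf == NULL)
        return NULL;
    iobuf->nr_pages = 0;
    for (int i = 0; i < NR_PREALLOC_BH; i++) {
        iobuf->bh[i] = calloc(1, sizeof *iobuf->bh[i]);
        if (iobuf->bh[i] == NULL) {
            free_descriptor_model(iobuf);
            return NULL;
        }
    }
    return iobuf;
}
```

The numbers fit together: 1,024 buffer heads times 512 bytes (the smallest block size) is 512 KB, the size of one chunk.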

Next, suppose the self-caching application issues a read( ) or write( ) system call on the file opened with O_DIRECT. As mentioned earlier in this chapter, the generic_file_read( ) and generic_file_write( ) functions check the value of the flag and handle this case in a special way. For instance, the generic_file_read( ) function executes a code fragment essentially equivalent to the following:

```c
inode = filp->f_dentry->d_inode->i_mapping->host;
if (count == 0 || *ppos >= inode->i_size)   /* nothing to read */
    return 0;
if (*ppos + count > inode->i_size)          /* stop at end of file */
    count = inode->i_size - *ppos;
retval = generic_file_direct_IO(READ, filp, buf, count, *ppos);
if (retval > 0)
    *ppos += retval;                        /* advance the file pointer */
UPDATE_ATIME(filp->f_dentry->d_inode);
return retval;
```

The function checks the current values of the file pointer, the file size, and the number of requested characters, and then invokes the generic_file_direct_IO( ) function, passing to it the READ operation type, the file object pointer, the address of the User Mode buffer, the number of requested bytes, and the file pointer. The generic_file_write( ) function is similar, but it passes the WRITE operation type to generic_file_direct_IO( ).
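The bounds handling in the read path can be exercised outside the kernel. This is a sketch under the assumption that the transfer itself is replaced by a callback; direct_read_model( ), direct_io_fn, and fake_full_transfer( ) are hypothetical names, and the access-time update is omitted.

```c
#include <assert.h>

/* Hypothetical stand-in for generic_file_direct_IO(READ, ...):
 * returns bytes transferred (> 0) or a negative error code. */
typedef long (*direct_io_fn)(long long pos, long long count);

/* Empty requests and reads at or beyond EOF return 0, requests are
 * truncated at EOF, and the file pointer advances only on success. */
long direct_read_model(long long *ppos, long long count, long long i_size,
                       direct_io_fn do_io)
{
    if (count == 0 || *ppos >= i_size)
        return 0;
    if (*ppos + count > i_size)
        count = i_size - *ppos;          /* do not read past EOF */
    long retval = do_io(*ppos, count);
    if (retval > 0)
        *ppos += retval;                 /* advance the file pointer */
    return retval;
}

/* Example transfer function: pretends every requested byte was read. */
long fake_full_transfer(long long pos, long long count)
{
    (void)pos;
    return (long)count;
}
```

For a 10,240-byte file and a file pointer at 4,096, a request for 8,192 bytes is cut to 6,144 bytes and leaves the file pointer at end of file; a second request then returns 0.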

The generic_file_direct_IO( ) function performs the following steps:

1. Tests and sets the f_iobuf_lock lock in the file object. If it was already set, the direct access buffer descriptor stored in f_iobuf is already in use by a concurrent direct I/O transfer, so the function allocates a new direct access buffer descriptor and uses it in the following steps.

2. Checks that the file pointer offset and the number of requested characters are multiples of the block size of the file; returns -EINVAL if they are not.

3. Checks that the direct_IO method of the address_space object of the file (filp->f_dentry->d_inode->i_mapping) is defined; returns -EINVAL if it isn't.

4. Even if the self-caching application is accessing the file directly, there could be other applications in the system that access the file through the page cache. To avoid data loss, the disk image is synchronized with the page cache before starting the direct I/O transfer. The function flushes the dirty pages belonging to memory mappings of the file to disk by invoking the filemap_fdatasync( ) function (see the previous section).

5. Flushes to disk the dirty pages updated by write( ) system calls by invoking the fsync_inode_data_buffers( ) function, and waits until the I/O transfer terminates.

6. Invokes the filemap_fdatawait( ) function to wait until the I/O operations started in Step 4 complete (see the previous section).

7. Starts a loop that divides the data to be transferred into chunks of 512 KB. For every chunk, the function performs the following substeps:

a. Invokes map_user_kiobuf( ) to establish a mapping between the direct access buffer and the portion of the user-level buffer corresponding to the chunk. To achieve this, the function:

1. Invokes expand_kiobuf( ) to allocate a new array of page descriptor addresses in case the array embedded in the direct access buffer descriptor is too small. This is not the case here, however, because the 129 entries in the map_array field suffice to map the chunk of 512 KB (notice that the additional page is required when the buffer is not page-aligned).

2. Accesses all user pages in the chunk (allocating them when necessary by simulating Page Faults) and stores their addresses in the array pointed to by the maplist field of the direct access buffer descriptor.

3. Properly initializes the nr_pages, offset, and length fields, and resets the locked field to 0.

b. Invokes the direct_IO method of the address_space object of the file (explained next).

c. If the operation type is READ, invokes mark_dirty_kiobuf( ) to mark the pages mapped by the direct access buffer as dirty.

d. Invokes unmap_kiobuf( ) to release the mapping between the chunk and the direct access buffer, and then continues with the next chunk.

8. If the function allocated a temporary direct access buffer descriptor in Step 1, it releases it. Otherwise, it releases the f_iobuf_lock lock in the file object.
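Steps 1 and 8 form a try-lock with a fallback. The sketch below models them in user space with a C11 atomic_flag in place of the kernel's own atomic bit operations; file_model, iobuf_model, get_iobuf( ), and put_iobuf( ) are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical descriptor; its contents do not matter for the locking. */
struct iobuf_model { int nr_pages; };

/* Hypothetical model of the two file object fields used here. */
struct file_model {
    atomic_flag f_iobuf_lock;
    struct iobuf_model *f_iobuf;
};

/* Step 1: if the lock was free, use the file's own descriptor;
 * otherwise allocate a temporary one (may return NULL if out of memory).
 * *is_temp records which case applies, for Step 8. */
struct iobuf_model *get_iobuf(struct file_model *filp, int *is_temp)
{
    if (!atomic_flag_test_and_set(&filp->f_iobuf_lock)) {
        *is_temp = 0;
        return filp->f_iobuf;
    }
    *is_temp = 1;
    return calloc(1, sizeof(struct iobuf_model));
}

/* Step 8: free a temporary descriptor, or release the lock. */
void put_iobuf(struct file_model *filp, struct iobuf_model *iobuf, int is_temp)
{
    if (is_temp)
        free(iobuf);
    else
        atomic_flag_clear(&filp->f_iobuf_lock);
}
```

A second concurrent transfer on the same file therefore never waits for the first; it pays for a fresh allocation instead.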
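The alignment check of Step 2 and the chunking of Step 7 reduce to simple arithmetic. This sketch assumes 4 KB pages; check_alignment( ), pages_spanned( ), and count_chunks( ) are hypothetical helpers, and the per-chunk mapping, transfer, and unmapping are only marked by a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define CHUNK_BYTES (512UL * 1024UL)  /* 512 KB per chunk */
#define PAGE_BYTES  4096UL            /* assumed page size */

/* Step 2: offset and count must both be multiples of the block size. */
int check_alignment(unsigned long long pos, size_t count, size_t blocksize)
{
    if (pos % blocksize != 0 || count % blocksize != 0)
        return -EINVAL;
    return 0;
}

/* Pages touched by [uaddr, uaddr + len); an unaligned 512 KB chunk
 * touches one page more than an aligned one. */
size_t pages_spanned(unsigned long uaddr, size_t len)
{
    if (len == 0)
        return 0;
    unsigned long first = uaddr / PAGE_BYTES;
    unsigned long last = (uaddr + len - 1) / PAGE_BYTES;
    return (size_t)(last - first + 1);
}

/* Step 7: walk the request in chunks of at most 512 KB; returns the
 * number of chunks and reports the largest page count in *max_pages. */
size_t count_chunks(unsigned long uaddr, size_t count, size_t *max_pages)
{
    size_t chunks = 0;
    *max_pages = 0;
    while (count > 0) {
        size_t len = count < CHUNK_BYTES ? count : CHUNK_BYTES;
        size_t pages = pages_spanned(uaddr, len);
        if (pages > *max_pages)
            *max_pages = pages;
        /* here the kernel would map, transfer, and unmap this chunk */
        uaddr += len;
        count -= len;
        chunks++;
    }
    return chunks;
}
```

A 1,280 KB request from a user buffer that starts 512 bytes into a page needs three chunks, and the worst one spans 129 pages, exactly the size of map_array.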

In almost all cases, the direct_IO method is a wrapper for the generic_direct_IO( ) function, passing it the address of the usual filesystem-dependent function that computes the position of the physical blocks on the block device (see Section 15.1.1). The generic_direct_IO( ) function executes the following steps:

1. For each block of the file portion corresponding to the current chunk, invokes the filesystem-dependent function to determine its logical block number, and stores this number in an entry of the blocks array in the direct access buffer descriptor. The 1,024 entries of the array suffice because the minimum block size in Linux is 512 bytes, so a 512 KB chunk spans at most 1,024 blocks.

2. Invokes the brw_kiovec( ) function, which essentially calls the submit_bh( ) function on each block in the blocks array, using the buffer heads stored in the bh array of the direct access buffer descriptor. The direct I/O operation is similar to a buffer or page I/O operation, but the b_end_io method of the buffer heads is set to the special function end_buffer_io_kiobuf( ) rather than to end_buffer_io_sync( ) or end_buffer_io_async( ) (see Section 13.4.8). This method updates the completion-related fields of the kiobuf data structure, such as io_count and errno. brw_kiovec( ) does not return until the I/O data transfers are complete.
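The two steps can be sketched as "map every block through a callback, then submit each one". This is a synchronous user-space model, not the kernel code: map_block_fn and submit_fn are hypothetical stand-ins for the filesystem-dependent mapping function and for per-block submission, and completion handling is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 1024   /* 512 KB / 512-byte minimum block size */

/* Hypothetical mapping callback: file block index -> device block (< 0 = error). */
typedef long (*map_block_fn)(unsigned long file_block);

/* Hypothetical submission callback for one block. */
typedef void (*submit_fn)(unsigned long blocknr);

/* Fills blocks[] through map_block, then submits each entry.
 * Returns the number of blocks, or -1 on error. */
int map_and_submit(unsigned long first_block, size_t nblocks,
                   map_block_fn map_block, submit_fn submit,
                   unsigned long blocks[MAX_BLOCKS])
{
    if (nblocks > MAX_BLOCKS)
        return -1;                        /* a single chunk never needs more */
    for (size_t i = 0; i < nblocks; i++) {
        long nr = map_block(first_block + i);
        if (nr < 0)
            return -1;
        blocks[i] = (unsigned long)nr;
    }
    for (size_t i = 0; i < nblocks; i++)
        submit(blocks[i]);
    return (int)nblocks;
}

/* Example mapping: the file is stored contiguously from device block 1000. */
long contiguous_map(unsigned long file_block) { return 1000L + (long)file_block; }

/* Example submission: only counts the blocks it receives. */
static size_t submitted;
void count_submit(unsigned long blocknr) { (void)blocknr; submitted++; }
```

Passing the mapping function as a pointer is what lets one generic routine serve every filesystem: only the callback changes.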
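One plausible way to picture the per-buffer completion handling is as a countdown on io_count, with the last finishing buffer invoking the descriptor's completion method. This is a minimal single-threaded sketch under that assumption; kio_completion_model, buffer_done( ), and mark_done( ) are hypothetical names, and the wait queue wake-up is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical model of the completion-related descriptor fields. */
struct kio_completion_model {
    atomic_int io_count;      /* buffers still in flight */
    int err;                  /* first error seen, 0 if none */
    int done;                 /* set by the completion method */
    void (*end_io)(struct kio_completion_model *);
};

/* Assumed per-buffer completion: record the first error, and let the
 * last buffer to finish call end_io. The err update is not thread-safe
 * in this sketch. */
void buffer_done(struct kio_completion_model *kio, int uptodate)
{
    if (!uptodate && kio->err == 0)
        kio->err = -EIO;
    if (atomic_fetch_sub(&kio->io_count, 1) == 1 && kio->end_io)
        kio->end_io(kio);
}

/* Example completion method. */
void mark_done(struct kio_completion_model *kio) { kio->done = 1; }
```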
