How Journaling Works

Let's try to explain how journaling works with an example: the Ext3 filesystem layer receives a request to write some data blocks of a regular file.

As you might easily guess, we are not going to describe in detail every single operation of the Ext3 filesystem layer and of the JBD layer. There would be far too many issues to be covered! However, we describe the essential actions:

1. The service routine of the write( ) system call triggers the write method of the file object associated with the Ext3 regular file. For Ext3, this method is implemented by the generic_file_write( ) function, already described in Section 15.1.3.

2. The generic_file_write( ) function invokes the prepare_write method of the address_space object several times, once for every page of data involved in the write operation. For Ext3, this method is implemented by the ext3_prepare_write( ) function.

3. The ext3_prepare_write( ) function starts a new atomic operation by invoking the journal_start( ) JBD function. The handle is added to the active transaction. Actually, the atomic operation handle is created only when executing the first invocation of the journal_start( ) function. Following invocations verify that the journal_info field of the process descriptor is already set and use the referenced handle.

4. The ext3_prepare_write( ) function invokes the block_prepare_write( ) function already described in Chapter 15, passing to it the address of the ext3_get_block( ) function. Remember that block_prepare_write( ) takes care of preparing the buffers and the buffer heads of the file's page.

5. When the kernel must determine the logical number of a block of the Ext3 filesystem, it executes the ext3_get_block( ) function. This function is similar to ext2_get_block( ), which is described in the earlier Section 17.6.5. A crucial difference, however, is that the Ext3 filesystem invokes functions of the JBD layer to ensure that the low-level operations are logged:

o Before issuing a low-level write operation on a metadata block of the filesystem, the function invokes journal_get_write_access( ). Basically, this function adds the metadata buffer to a list of the active transaction. However, it must also check whether the metadata is included in an older incomplete transaction of the journal; in this case, it duplicates the buffer to make sure that the older transactions are committed with the old content.

o After updating the buffer containing the metadata block, the Ext3 filesystem invokes journal_dirty_metadata( ) to move the metadata buffer to the proper dirty list of the active transaction and to log the operation in the journal.

Notice that metadata buffers handled by the JBD layer are not usually included in the dirty lists of buffers of the inode, so they are not written to disk by the normal disk cache flushing mechanisms described in Chapter 14.

6. If the Ext3 filesystem has been mounted in "journal" mode, the ext3_prepare_write( ) function also invokes journal_get_write_access( ) on every buffer touched by the write operation.

7. Control returns to the generic_file_write( ) function, which updates the page with the data stored in the User Mode address space and then invokes the commit_write method of the address_space object. For Ext3, this method is implemented by the ext3_commit_write( ) function.

8. If the Ext3 filesystem has been mounted in "journal" mode, the ext3_commit_write( ) function invokes journal_dirty_metadata( ) on every buffer of data (not metadata) in the page. This way, the buffer is included in the proper dirty list of the active transaction and not in the dirty list of the owner inode; moreover, the corresponding log records are written to the journal.

9. If the Ext3 filesystem has been mounted in "ordered" mode, the ext3_commit_write( ) function invokes the journal_dirty_data( ) function on every buffer of data in the page to insert the buffer in a proper list of the active transaction. The JBD layer ensures that all buffers in this list are written to disk before the metadata buffers of the transaction. No log record is written to the journal.

10. If the Ext3 filesystem has been mounted in "ordered" or "writeback" mode, the ext3_commit_write( ) function executes the normal generic_commit_write( ) function described in Chapter 15, which inserts the data buffers in the list of the dirty buffers of the owner inode.

11. Finally, ext3_commit_write( ) invokes journal_stop( ) to notify the JBD layer that the atomic operation handle is closed.

12. The service routine of the write( ) system call terminates here. However, the JBD layer has not finished its work. Eventually, our transaction becomes complete when all its log records have been physically written to the journal. Then journal_commit_transaction( ) is executed.

13. If the Ext3 filesystem has been mounted in "ordered" mode, the journal_commit_transaction( ) function activates the I/O data transfers for all data buffers included in the list of the transaction and waits until all data transfers terminate.

14. The journal_commit_transaction( ) function activates the I/O data transfers for all metadata buffers included in the transaction (and also for all data buffers, if Ext3 was mounted in "journal" mode).

15. Periodically, the kernel activates a checkpoint activity for every complete transaction in the journal. The checkpoint basically involves verifying whether the I/O data transfers triggered by journal_commit_transaction( ) have successfully terminated. If so, the transaction can be deleted from the journal.
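The way steps 8 through 14 route the data buffers of a page depends only on the mount mode, so it can be summarized in a small user-space toy model. The sketch below is not kernel code: the Transaction and Inode classes, their list names, and the two functions are hypothetical stand-ins that merely mirror the lists mentioned in the steps above.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical stand-in for the active JBD transaction."""
    logged: list = field(default_factory=list)        # buffers recorded in the journal
    ordered_data: list = field(default_factory=list)  # data that must reach disk first


@dataclass
class Inode:
    """Hypothetical stand-in for the owner inode."""
    dirty_data: list = field(default_factory=list)    # ordinary dirty-buffer list


def commit_write(mode, txn, inode, data_buffers):
    """Route the data buffers of one page as in steps 8-10."""
    for buf in data_buffers:
        if mode == "journal":
            txn.logged.append(buf)         # step 8: data is logged like metadata
        elif mode == "ordered":
            txn.ordered_data.append(buf)   # step 9: tracked by the transaction, not logged
            inode.dirty_data.append(buf)   # step 10: also on the inode's dirty list
        elif mode == "writeback":
            inode.dirty_data.append(buf)   # step 10 only: the journal ignores the data
        else:
            raise ValueError(f"unknown mount mode: {mode!r}")


def commit_transaction(mode, txn):
    """Return the order in which buffers are submitted, as in steps 13-14."""
    io_order = []
    if mode == "ordered":
        # Step 13: data buffers are written (and waited for) first.
        io_order += [("data", buf) for buf in txn.ordered_data]
    # Step 14: then every logged buffer (metadata, plus data in "journal" mode).
    io_order += [("logged", buf) for buf in txn.logged]
    return io_order


# Usage: one metadata buffer and two data buffers in "ordered" mode.
txn, inode = Transaction(), Inode()
txn.logged.append("inode block")
commit_write("ordered", txn, inode, ["data 1", "data 2"])
print(commit_transaction("ordered", txn))
```

In this model the "ordered" run lists both data buffers ahead of the metadata buffer, which is the ordering guarantee of step 9. In "writeback" mode the transaction never sees the data buffers, so no ordering between data and metadata is implied.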

Of course, the log records in the journal never play an active role until a system failure occurs. Only in this case, in fact, does the e2fsck utility program scan the journal stored in the filesystem and reschedule all write operations described by the log records of the complete transactions.
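The recovery rule just described (reapply the log records of complete transactions and ignore the rest) can be illustrated with a minimal sketch. It assumes a deliberately simplified journal: a list of dictionaries with a hypothetical "complete" flag and (block number, content) records. The real on-disk journal format is not modeled here.

```python
def replay(journal, disk):
    """Reapply the logged writes of complete transactions, in journal order."""
    for txn in journal:
        if not txn["complete"]:
            continue  # an incomplete transaction is ignored entirely
        for block_nr, content in txn["records"]:
            disk[block_nr] = content
    return disk


# Usage: the second transaction was cut short by the failure.
journal = [
    {"complete": True, "records": [(7, "new bitmap"), (12, "new inode")]},
    {"complete": False, "records": [(12, "half-written inode")]},
]
disk = {7: "old bitmap", 12: "old inode"}
print(replay(journal, disk))
```

Because the incomplete transaction is skipped as a whole, block 12 ends up with the content logged by the complete transaction rather than with a partial update.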
