The basic idea of Ext3 is to regard each operation on the filesystem metadata as a transaction that is saved in a journal before it is performed. Once the transaction has terminated (i.e., when the desired modifications to the metadata have been made), the associated information is removed from the journal. If a system error occurs after transaction data have been written to the journal — but before (or during) performance of the actual operations — the pending operations are carried out in their entirety the next time the filesystem is mounted. The filesystem is then automatically in a consistent state. If the interruption occurs before the transaction is written to the journal, the operation itself is not performed because the information on it is lost when the system is restarted, but at least filesystem consistency is retained.

However, Ext3 cannot perform miracles. It is still possible to lose data because of a system crash. Nevertheless, the filesystem can always be restored to a consistent state very quickly afterward.
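
The behavior described above can be illustrated with a small toy model. It is only a sketch under simplifying assumptions, not how Ext3 or the kernel implements journaling: the class JournaledStore, its methods, and the metadata keys are hypothetical names chosen for illustration, and dictionaries stand in for on-disk structures.

```python
class JournaledStore:
    """Toy model of metadata protected by a journal (all names hypothetical)."""

    def __init__(self):
        self.metadata = {}   # stands in for the on-disk metadata
        self.journal = []    # stands in for the on-disk journal

    def log(self, changes):
        """Save the complete transaction in the journal before performing it."""
        self.journal.append(dict(changes))

    def apply_partially(self, count):
        """Simulate a crash: only `count` changes of the oldest transaction
        reach the metadata, and the transaction stays in the journal."""
        for key, value in list(self.journal[0].items())[:count]:
            self.metadata[key] = value

    def mount(self):
        """Carry out every pending transaction in its entirety, oldest first,
        and remove each one from the journal once it has been applied."""
        while self.journal:
            self.metadata.update(self.journal[0])
            del self.journal[0]


store = JournaledStore()
store.log({"inode_size": 4096, "bitmap_bit": 1})
store.apply_partially(1)      # crash after one of the two changes
print(store.metadata)         # inconsistent: only inode_size is present
store.mount()                 # next mount replays the journal
print(store.metadata, store.journal)
```

Replaying the whole transaction is safe in this model because applying the same change twice leaves the same result. An operation that never reached the journal would simply be absent after the restart: the metadata stays consistent, but the operation is lost.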

The additional overhead needed to log transactions is, of course, reflected in the performance of Ext3, which does not quite match that of Ext2. To strike a suitable balance between performance and data integrity, the kernel can operate an Ext3 filesystem in one of three modes:

1. In writeback mode, only changes to the metadata are logged to the journal. Operations on file data bypass the journal. This mode offers the highest performance but the lowest level of data protection.

2. In ordered mode, only changes to the metadata are logged to the journal. However, changes to file data are grouped and are always written before the operations on the associated metadata are performed. This mode is therefore slightly slower than writeback mode.

3. In journal mode, changes to both metadata and file data are written to the journal. This guarantees the highest level of data protection and minimizes the chance of losing data, but it is by far the slowest mode (except in a few pathological situations).
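
The difference between the three modes can be summarized as the sequence of writes each one performs for a single file update. The following is a deliberately simplified sketch: the Step tuple and the write_plan function are hypothetical names, and the real ordering inside the kernel involves more stages than are shown here.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    what: str       # "data" or "metadata"
    where: str      # "journal" or "final location"
    ordered: bool   # False: may happen at any time relative to the other steps


def write_plan(mode: str) -> List[Step]:
    """Return a simplified write sequence for one file update (illustrative only)."""
    if mode == "writeback":
        # Only metadata is journaled; file data bypasses the journal entirely.
        return [
            Step("metadata", "journal", True),
            Step("metadata", "final location", True),
            Step("data", "final location", False),
        ]
    if mode == "ordered":
        # Only metadata is journaled, but file data is always written first.
        return [
            Step("data", "final location", True),
            Step("metadata", "journal", True),
            Step("metadata", "final location", True),
        ]
    if mode == "journal":
        # Both file data and metadata pass through the journal.
        return [
            Step("data", "journal", True),
            Step("metadata", "journal", True),
            Step("data", "final location", True),
            Step("metadata", "final location", True),
        ]
    raise ValueError(f"unknown mode: {mode}")


for mode in ("writeback", "ordered", "journal"):
    journaled = sorted({s.what for s in write_plan(mode) if s.where == "journal"})
    print(mode, "journals:", journaled)
```

The plans make the trade-off visible: journal mode writes file data twice, which explains its cost, while writeback mode gives no guarantee about when file data reaches the disk relative to the metadata that describes it.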

The desired mode is specified in the data parameter when the filesystem is mounted (data=writeback, data=ordered, or data=journal). The default is ordered.
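
On the command line, the mode is typically passed as a mount option, for example mount -t ext3 -o data=journal /dev/sda2 /home (the device and mount point are placeholders). To check which mode a mounted filesystem uses, the option list in a mounts table such as /proc/mounts can be inspected. The helper below is a minimal sketch: ext3_data_mode is a hypothetical function name, the sample text is made up for illustration, and the helper returns None when the option list does not mention the mode.

```python
from typing import Optional


def ext3_data_mode(mounts_text: str, mount_point: str) -> Optional[str]:
    """Return the value of the data= option for an ext3 mount, or None.

    `mounts_text` is expected in the whitespace-separated format used by
    /proc/mounts: device, mount point, filesystem type, options, ...
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, where, fstype, options = fields[:4]
        if where == mount_point and fstype == "ext3":
            for option in options.split(","):
                if option.startswith("data="):
                    return option[len("data="):]
            return None  # ext3 mount found, but no data= option listed
    return None  # no ext3 filesystem mounted at mount_point


# Made-up sample in /proc/mounts format, for illustration only.
sample = (
    "/dev/sda1 / ext3 rw,relatime,data=ordered 0 0\n"
    "/dev/sda2 /home ext3 rw,relatime,data=journal 0 0\n"
)
print(ext3_data_mode(sample, "/home"))
```

On a running system, the same function could be applied to the contents of /proc/mounts read with open("/proc/mounts").read().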

As already stated, the Ext3 filesystem is designed to be fully compatible with Ext2 — not only downward but also (as far as possible) upward. The journal therefore resides in a special file with (as usual) its own inode. This enables Ext3 filesystems to be mounted on systems that support only Ext2. Even existing Ext2 partitions can be converted to Ext3 quickly and, above all, without the need for complicated data copying operations — a major consideration on server systems.

The journal can be held not only in a special file but also on a separate partition, but the details are not discussed here.

The kernel includes a layer called the journaling block device (JBD) to handle journals and the associated operations. Although this layer can be used by different filesystems, currently it is used only by Ext3. All other journaling filesystems, such as ReiserFS, XFS, and JFS, have their own mechanisms. In the sections below, therefore, JBD and Ext3 are regarded as a single unit.
