Level: Introductory
Daniel Robbins (drobbins@gentoo.org), President/CEO, Gentoo Technologies, Inc.
01 Jun 2002
In the "Advanced filesystem implementor's guide" series, Daniel Robbins shows you how to use the latest filesystem technologies in Linux 2.4. Along the way, he shares valuable practical implementation advice, performance information, and important technical notes so that your new filesystem experience is as pleasant as possible. In this article, Daniel gives an update on the status of the XFS, ReiserFS, and ext3 filesystems, sharing his experiences as Chief Architect of Gentoo Linux. In addition, he outlines how these various filesystems will continue to improve over the next six months to a year.
While browsing past articles, I suddenly realized that the
"Advanced filesystem implementor's guide" series has been around for nearly
a year! Don't worry, this series will be wrapping up soon as
I cover IBM's JFS and EVMS (enterprise volume management) technologies for
Linux. But since this is an IBM site, I thought it would be best to cover
technologies developed by IBM only after I had first covered all
other new filesystem technologies for Linux.
Before we get to JFS and EVMS, let me share an
official update about the current state of affairs in the Linux filesystem world.
We've been through a lot of 2.4 kernels;
some of them have been decent, and others have been not-so-decent. And
along with the kernel, XFS, ext3, and ReiserFS have been under very active
development. During this time, lots of Gentoo Linux users have used various
combinations of XFS, ext3, and ReiserFS filesystems with varying results.
And in general, when a Gentoo Linux user has a problem with one of the new
journaled filesystems, I usually hear about it. So, what filesystems
have been most popular? Which have been the most reliable? In this article, I'll share my experiences along with feedback and status updates from the
ReiserFS, ext3, and XFS development teams.
What's up with XFS?
Over the past few months, XFS has turned out to be a popular Linux
filesystem choice. Based on feedback from Gentoo Linux users, people tend
to like XFS because of its generally good overall performance and its
robust feature set. However, the 1.0.x release of XFS has suffered from
one serious problem. You'll recall that "metadata only" journaling
filesystems like XFS and ReiserFS can cause data corruption if a file's
metadata is updated but some unforeseen circumstance -- such as a crash --
prevents the new data from hitting the disk. In the case of ReiserFS, an
affected file will contain stale or garbage data blocks, and in the case
of XFS, the file will contain blocks consisting entirely of binary zeros.
It turns out that XFS 1.0.x had the unfortunate tendency of frequently
mangling recently modified files if your server happened to crash or
unexpectedly lose power. Those who happened to be using XFS on a rugged
server were generally fine, but those who were running XFS on a system
that was suffering from some kind of software or hardware stability
problem faced the risk of losing a good deal of data.
Fortunately, the SGI XFS guys dramatically reduced the incidence of this
problem in XFS 1.1. The problem manifested itself much more often with
XFS 1.0 because certain kinds of metadata updates were required to be
recorded to the filesystem in the order that they occurred. These in-order
metadata updates, called "synchronous" metadata updates, also had the
effect of flushing all previous pending metadata updates to disk. Here's
where the problem arose. If some of these early flushes of metadata also
had corresponding data blocks that needed to get flushed, then it was
possible that the new data blocks wouldn't hit the disk for up to 30
seconds after the metadata was recorded. This created a relatively large
window for data loss to occur.
Technical note
With XFS 1.1, a filesystem's metadata is only updated synchronously (in-order) in
two cases:
- If the filesystem needs to allocate new space and there's a
pending transaction to free that same space
- When XFS processes transactions for files opened with the
O_SYNC (synchronous) option; in this case, writes to this file will cause any
of the filesystem's other pending metadata changes to be flushed to disk.
Fortunately, the vast majority of a typical server's I/O operations are
asynchronous in nature.
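As a rough illustration of the second case, a program asks for synchronous writes by passing O_SYNC when it opens a file. The sketch below uses Python's standard os module; the helper name `write_synchronously` is hypothetical, and nothing in it is specific to XFS.

```python
import os

def write_synchronously(path, payload):
    """Write payload (bytes) to path using O_SYNC.

    With O_SYNC, each write() call returns only after the kernel reports
    that the data has reached the storage device.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        view = memoryview(payload)
        written = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while written < len(view):
            written += os.write(fd, view[written:])
    finally:
        os.close(fd)
    return written
```

On a filesystem that behaves as the note describes, a file opened this way is the kind of workload that would trigger the synchronous metadata path.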
If the system rebooted or died during this window (after the
metadata was flushed but before the corresponding data was written to
disk), then both the old and new data could be lost. Here's why this
could happen: the metadata update would erase any reference to the
original data block(s), but would point to data block(s) on disk that were
never filled with data. When the server started up again after the crash,
the XFS code would look at the journal, realize the situation, and fill
those incomplete data blocks with binary zeros as a security precaution.
Unfortunately, the data would be lost for good.
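The sequence above can be sketched as a toy model. This is not XFS code: the dictionary standing in for the disk, the block names, and the function name are all illustrative assumptions, and the model only captures the ordering of events described in the text.

```python
def overwrite_with_crash(old_data, new_data, crash_before_data_flush):
    """Toy model of one file overwrite under metadata-only journaling.

    Returns what a reader would see in the file afterwards.
    """
    # block_a holds the original contents; block_b is newly allocated
    # but has not been written yet.
    disk = {"block_a": old_data, "block_b": None}

    # Step 1: the metadata update reaches the disk early. The file now
    # points at block_b, and the reference to block_a is gone.
    file_points_to = "block_b"

    if crash_before_data_flush:
        # Step 2a: the crash happens inside the window. On recovery, the
        # never-written block is filled with zeros as a precaution.
        disk["block_b"] = b"\x00" * len(new_data)
    else:
        # Step 2b: no crash, so the delayed data flush completes normally.
        disk["block_b"] = new_data

    return disk[file_points_to]
```

Calling `overwrite_with_crash(b"old", b"new!", True)` returns four zero bytes: neither the old nor the new contents survive, which is the outcome described above.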
This problem could be particularly troublesome in situations where files
were regularly overwritten with completely new data. In these situations,
the early flushing of metadata could cause the entire contents of the file
to be lost if the system happened to die at the wrong time. This
particular scenario bit the gentoo.org server a couple of times, resulting
in data loss. Since our mailman mailing list software would overwrite its
own configuration file with new data every few minutes, it was a prime
candidate to fall prey to the scenario I describe above.
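One common defensive pattern for programs that regularly rewrite a whole file is to write the new contents to a temporary file, force them to disk, and only then rename the temporary file over the original. The sketch below is an assumption about how an application could protect itself, not a description of what mailman or the gentoo.org server actually did; the function name is hypothetical.

```python
import os
import tempfile

def replace_file_atomically(path, payload):
    """Replace the contents of path with payload (bytes).

    The new data is flushed to disk before the rename, so a crash should
    leave either the complete old file or the complete new one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the final
    # rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # data reaches disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The design choice here is ordering: because the data is synced first, the metadata change (the rename) never refers to blocks that have not been written.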
The moral of the story is this: the SGI guys have dramatically improved
the situation in XFS 1.1, and if you're running XFS 1.0, then you should
definitely plan to upgrade to XFS 1.1 in the near future. XFS 1.1 also
includes many additional fixes. Oh, and when SGI reduced XFS's dependence
on synchronous metadata updates, it also had the effect of speeding up one
of XFS 1.0.x's weak spots -- file deletion. Yay!
In the near future, we can also expect to see a new release of XFS that is
better suited for Intel's Itanium platform. Right now, XFS for Linux
requires that the XFS filesystem block size is the same size as the
platform's memory page size. This often makes it impossible to move disks
from x86 systems to Itanium systems, since the Itanium can use a page size
up to 64 K, while the x86 is stuck at 4 K. In addition, a filesystem block
size of 64 K is a suboptimal choice for most tasks, and the current code
would force some Itanium systems to use this filesystem block size. When
this block size issue is fixed, it will not only make it easy to migrate XFS
filesystems from x86 to ia64, but it will also provide the added benefit of
allowing system administrators to choose an XFS filesystem block size that
corresponds to their needs.
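To see how a given machine stands with respect to this constraint, the memory page size can be compared with the block size a mounted filesystem reports. This is a minimal sketch using Python's os module; the function name is hypothetical, and `f_bsize` from statvfs is used here as an approximation of the filesystem block size.

```python
import os

def page_and_block_size(mount_point="/"):
    """Return (page_size, block_size, sizes_match) for a mount point."""
    page_size = os.sysconf("SC_PAGE_SIZE")          # memory page size in bytes
    block_size = os.statvfs(mount_point).f_bsize    # block size reported by the filesystem
    return page_size, block_size, page_size == block_size

if __name__ == "__main__":
    page, block, match = page_and_block_size("/")
    print(f"page size: {page} bytes, block size: {block} bytes, match: {match}")
```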
ReiserFS
The ReiserFS filesystem is arguably the most ambitious journaled
filesystem development project because it's not just a port of an existing
filesystem to the Linux kernel (as XFS and JFS are), nor is its design based upon
that of an earlier filesystem, as ext3's is. In contrast, ReiserFS has been
designed completely from scratch and boasts some very impressive
performance numbers when it comes to the handling of small files. So, how
has ReiserFS fared in terms of stability and general filesystem robustness
since its introduction to the 2.4 kernel?
Since its introduction, ReiserFS has had an unusually high number of
stability and corruption problems. There are a number of kernels that
have been total nightmares for ReiserFS users, including 2.4.3, 2.4.9, and
even the relatively recent 2.4.16. However, while some of these issues
have been caused by bugs in the ReiserFS filesystem code itself, a
surprising number of them have been unwanted side-effects caused by
changes made to other parts of the kernel. One unfortunate thing about
the Linux kernel development process is that no matter how carefully you
test your own code, it's possible for some other kernel developer to sneak
in a relatively untested change that causes your code to break. All too
often, inter-developer communication only happens after these
unwanted side-effects have been introduced and released to the
unsuspecting Linux computing public. I think it's fair to say that there
are a good number of disheartened ReiserFS users out there who have found
themselves in this unfortunate no-win situation.
But there is good news, my friends. In the last few months, things have
started looking a lot better for ReiserFS. For one, the kernel sources
have started to stabilize around the 2.4.17 release. In addition, the
guys at Namesys (the developers of ReiserFS) have been able to fix quite a
few obscure bugs in their code over the past few months. And the news
gets even better -- it appears that kernel 2.4.18 has a very solid
ReiserFS implementation. And 2.4.18 isn't exactly a spring chicken -- at
the time this article was written, it was nearly 3 months old and there
still haven't been any major problems found in the code. In fact, due to a lack of incoming bug reports, Namesys
has reassigned the Release Manager to a new job of improving
ReiserFS performance.
So, it appears that ReiserFS and the 2.4 kernel have finally resolved
their differences. For me personally, this is heartening news; I'm very
eager to start using ReiserFS again and I plan to use it as my root
filesystem when I next reload my development workstation. I'm sure there
are many other ex-ReiserFS users who will be moving back to ReiserFS now
that things have calmed down in kernel-land. Frankly, it's quite hard to
live without ReiserFS once you've seen how its small file performance can
boost the performance of certain applications.
So, what can we expect to see from ReiserFS in the near future? According
to Hans Reiser and his team of developers, there are some very nice
improvements that are scheduled to appear in the 2.4.20_pre1 kernel,
including Chris Mason's data journaling (like ext3's "data=journal" mode!)
support, new block allocation code that scales much better, and several
improvements in large file performance, yielding a speedup of up to 15%
when reading large files from IDE drives. Beyond
these immediate and significant improvements, we are likely to soon see
ReiserFS support the equivalent of ext3's "data=ordered" mode. At that
point, ReiserFS will offer equivalent data integrity features to those
found in the ext3 filesystem. I'm very happy to see that the ReiserFS
development team is making data integrity (not just metadata integrity)
such a high priority.
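The journaling mode is chosen with the "data=" mount option, so one way to check which mode a filesystem was mounted with is to look for that option in the mount table. The sketch below parses text in the format of /proc/mounts; the function name and the sample entries are illustrative assumptions. A default mode is not necessarily listed, so a missing entry only means that no "data=" option was stated.

```python
def journaling_data_modes(mount_table_text):
    """Map each mount point to the value of its data= option, if it has one.

    Expects lines in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    modes = {}
    for line in mount_table_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        for option in options:
            if option.startswith("data="):
                modes[mount_point] = option[len("data="):]
    return modes

# Hypothetical sample entries, not taken from a real system.
sample = (
    "/dev/hda1 / ext3 rw,data=journal 0 0\n"
    "/dev/hda2 /home reiserfs rw 0 0\n"
)
print(journaling_data_modes(sample))
```

On a live Linux system, the same function could be fed the contents of /proc/mounts.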
Ext3
So, what about ext3? In general, ext3 has been quite stable and hasn't
suffered from any major issues. For this reason, ext3 has gained a
reputation as being a very reliable and robust journaled filesystem
choice. While some may consider the filesystem to be "boring" because it
doesn't sport any major improvements over ext2 besides a very good
journaling implementation, "boring" is a good thing in the world of
filesystems. It means that the filesystem is very good at simply doing
its job without fuss or incident. In addition, despite ext3's scalability
limitations when compared to ReiserFS, XFS, and JFS, ext3 has shown itself
to be very fast and well-tuned for the typical kinds of filesystem
operations performed by most servers and workstations. It's clear that
the ext3 developers have met their goal of creating a high-quality
journaling filesystem that Linux users can upgrade to easily and
confidently.
With kernel 2.4.19_pre5, synchronous mounts of ext3 filesystems and
"chattr +S"'d files now perform about ten times faster than they did
previously. In the near future, expect to see the addition of an option
for synchronous updates of specific directory trees, which is a feature
that will be of use primarily to mailer programs. Besides that, we can
expect to see regular small bug fixes and performance improvements to the
code, but nothing major; ext3 is already quite refined, and the code now
appears to be in maintenance mode.
Thanks for joining me in this article, and please join me next time as we
take a look at JFS!
About the author
Residing in Albuquerque, New Mexico, Daniel Robbins is the Chief Architect
of Gentoo Linux, an advanced
ports-based Linux for x86, PowerPC, Sparc, and Sparc64 systems. He has
also served as a contributing author for several books, including
Samba Unleashed and SuSE Linux
Unleashed. Daniel enjoys spending time with his wife, Mary,
and his daughter, Hadassah. You can contact Daniel at drobbins@gentoo.org.