ReiserFS

ReiserFS
Developer Namesys
Full name ReiserFS
Introduced 2001 (Linux 2.4.1)
Partition identifier Apple_UNIX_SVR2 (Apple Partition Map)
0x83 (MBR)
EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 (GPT)
Structures
Directory contents B+ tree
File allocation Bitmap[1]
Limits
Max file size 1 EiB (8 TiB on 32-bit systems)[2]
Max number of files 2^32 − 3 (~4 billion)[2]
Max filename length 4032 bytes, limited to 255 by Linux VFS
Max volume size 16 TiB[2]
Allowed characters in filenames All bytes except NUL and '/'
Features
Dates recorded modification (mtime), metadata change (ctime), access (atime)
Date range December 14, 1901 – January 18, 2038
Date resolution 1s
Forks Extended attributes
File system permissions Unix permissions, ACLs and arbitrary security attributes
Transparent compression No
Transparent encryption No
Supported operating systems Linux

ReiserFS is a general-purpose, journaled computer file system designed and implemented by a team at Namesys led by Hans Reiser. ReiserFS is currently supported on Linux (without quota support). Introduced in version 2.4.1 of the Linux kernel, it was the first journaling file system to be included in the standard kernel. ReiserFS is the default file system on the Elive, Xandros, Linspire, GoboLinux, and Yoper Linux distributions. ReiserFS was the default file system in Novell's SUSE Linux Enterprise until Novell announced on October 12, 2006 that future releases would move to ext3.[3]

Namesys considered ReiserFS (now occasionally referred to as Reiser3) stable and feature-complete and, with the exception of security updates and critical bug fixes, ceased development on it to concentrate on its successor, Reiser4. Namesys went out of business in 2008 after Reiser was charged with the murder of his wife. However, volunteers continue to work on the open source project.[4]

Features

At the time of its introduction, ReiserFS offered features that had not been available in existing Linux file systems:

Performance

Compared with ext2 and ext3 in version 2.4 of the Linux kernel, ReiserFS may be faster when dealing with files under 4 KiB and with tail packing enabled. This was said to be of great benefit in Usenet news spools, HTTP caches, mail delivery systems and other applications where performance with small files is critical. In practice, however, news spools use a feature called cycbuf, which holds articles in one large file; fast HTTP caches and several revision control systems use a similar approach, nullifying these performance advantages. For email servers, ReiserFS was problematic because of the semantic problems explained below. ReiserFS also suffered from very fast file system aging compared with other file systems: in several usage scenarios, file system performance decreased dramatically over time.
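The space side of tail packing can be illustrated with a rough calculation. The following is only a sketch under stated assumptions: a 4 KiB block size, no per-item header overhead, and hypothetical function names. It compares how many blocks a set of files needs when every file gets whole blocks with how many it needs when the final partial blocks ("tails") are packed end-to-end.

```python
import math

BLOCK_SIZE = 4096  # assumed block size in bytes


def blocks_without_packing(file_sizes):
    """Each non-empty file occupies whole blocks; its last block may be mostly empty."""
    return sum(math.ceil(size / BLOCK_SIZE) for size in file_sizes if size > 0)


def blocks_with_packing(file_sizes):
    """Full blocks are kept as-is; tails of all files share blocks end-to-end.

    Simplification: item headers and the rule that an item must fit inside
    one leaf node are ignored, so this is an optimistic lower bound.
    """
    full_blocks = sum(size // BLOCK_SIZE for size in file_sizes)
    tail_bytes = sum(size % BLOCK_SIZE for size in file_sizes)
    return full_blocks + math.ceil(tail_bytes / BLOCK_SIZE)


# Hypothetical mix of mostly small files, sizes in bytes.
sizes = [300, 1200, 5000, 80, 4096, 900]
print(blocks_without_packing(sizes), blocks_with_packing(sizes))
```

The gap between the two numbers grows with the share of files much smaller than a block, which is the workload the paragraph above describes.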

Before Linux 2.6.33,[6] ReiserFS heavily used the big kernel lock (BKL), a global kernel-wide lock, which does not scale very well[7][8] on systems with multiple cores, because the critical code sections are only ever executed by one core at a time.

Criticism

Some directory operations (including unlink(2)) are not synchronous on ReiserFS, which can result in data corruption with applications relying heavily on file-based locks (such as mail transfer agents qmail[9] and Postfix[10]) if the machine halts before it has synchronized the disk.[11]
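A common defensive pattern on POSIX systems, when an application needs a directory change to be on disk before it proceeds, is to fsync the containing directory after the operation. The sketch below assumes a Linux or other POSIX host; the helper name durable_unlink is hypothetical, and nothing here is taken from the qmail or Postfix sources.

```python
import os
import tempfile


def durable_unlink(path):
    """Remove path, then fsync its parent directory (POSIX-only sketch)."""
    parent = os.path.dirname(os.path.abspath(path))
    os.unlink(path)
    # Open the directory itself so the removal of the entry can be flushed.
    dir_fd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def demo():
    """Create a lock file in a temporary directory and remove it durably."""
    with tempfile.TemporaryDirectory() as tmp:
        lock = os.path.join(tmp, "queue.lock")
        with open(lock, "w") as f:
            f.write("locked")
        durable_unlink(lock)
        return os.path.exists(lock)
```

Whether the extra fsync is sufficient depends on the file system and mount options; the point is only that the application, not the file system, takes responsibility for the ordering.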

There are no programs to specifically defragment a ReiserFS file system, although tools have been written to automatically copy the contents of fragmented files in the hope that more contiguous blocks of free space can be found. However, a "repacker" tool was planned for the successor file system, Reiser4, to deal with file fragmentation.[12]

fsck

The tree rebuild process of ReiserFS's fsck has attracted much criticism: if the file system becomes so badly corrupted that its internal tree is unusable, performing a tree rebuild operation may further corrupt existing files or introduce new entries with unexpected contents.[13] This action is not part of normal operation or a normal file system check, however, and has to be explicitly initiated and confirmed by the administrator.

ReiserFS v3 images should not be stored on a ReiserFS v3 partition (e.g. backups or disk images for emulators) without transforming them (e.g., by compressing or encrypting) in order to avoid confusing the rebuild. Reformatting an existing ReiserFS v3 partition can also leave behind data that could confuse the rebuild operation and make files from the old system reappear. This also allows malicious users to intentionally store files that will confuse the rebuilder. As the metadata is always in a consistent state after a file system check, corruption here means that contents of files are merged in unexpected ways with the contained file system's metadata. The ReiserFS successor, Reiser4, fixes this problem.
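As an illustration of the "transform before storing" advice, the sketch below compresses an image file with gzip so that its raw blocks are not stored verbatim on the host file system. The file names and helper functions are hypothetical, and gzip is just one possible transformation; encryption would serve the same purpose.

```python
import gzip
import os
import shutil
import tempfile


def compress_image(src_path, dst_path):
    """Stream the image at src_path into a gzip file at dst_path."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def restore_image(src_path, dst_path):
    """Decompress a gzip file back into a raw image."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def roundtrip_demo():
    """Compress and restore a stand-in image; True if the bytes survive."""
    payload = b"stand-in for raw file system blocks\n" * 4096
    with tempfile.TemporaryDirectory() as tmp:
        raw = os.path.join(tmp, "disk.img")
        packed = os.path.join(tmp, "disk.img.gz")
        restored = os.path.join(tmp, "restored.img")
        with open(raw, "wb") as f:
            f.write(payload)
        compress_image(raw, packed)
        restore_image(packed, restored)
        with open(packed, "rb") as f:
            stored = f.read()
        with open(restored, "rb") as f:
            # The stored form differs from the raw image, yet restores exactly.
            return f.read() == payload and stored != payload
```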

Earlier issues

ReiserFS in versions of the Linux kernel before 2.4.16 was considered unstable by Namesys and was not recommended for production use, especially in conjunction with NFS.[14]

Early implementations of ReiserFS (prior to that in Linux 2.6.2) were also susceptible to out-of-order write hazards. The current journaling implementation in ReiserFS, however, is on par with ext3's "ordered" journaling level.

Novell / SuSE move away from ReiserFS to ext3

Jeff Mahoney of SuSE wrote a post on September 14, 2006 proposing to move from ReiserFS to ext3 as the default installation file system.[7] The reasons he mentioned included scalability, "performance problems with extended attributes and ACLs", "a small and shrinking development community", and that "Reiser4 is not an incremental update and requires a reformat, which is unreasonable for most people."[7] On October 4 he wrote a response comment on a blog in order to clear up some issues.[15] He wrote that his proposal for the switch was unrelated to Reiser's "legal troubles" (i.e., Hans Reiser's prosecution for murdering his wife).[16] Mahoney wrote that he "was concerned that people would make a connection where none existed" and that "the timing is entirely coincidental and the motivation is unrelated."[15]

On October 12, 2006, Novell similarly announced that SuSE Linux Enterprise would switch from ReiserFS to ext3.[3]

Design

ReiserFS stores file metadata ("stat items"), directory entries ("directory items"), inode block lists ("indirect items"), and tails of files ("direct items") in a single, combined B+ tree keyed by a universal object ID. Disk blocks allocated to nodes of the tree are "formatted internal blocks". Blocks for leaf nodes (in which items are packed end-to-end) are "formatted leaf blocks". All other blocks are "unformatted blocks" containing file contents. Directory items with too many entries or indirect items which are too long to fit into a node spill over into the right leaf neighbour. Block allocation is tracked by free space bitmaps in fixed locations.
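The idea of keeping every kind of item in one ordered structure can be sketched with a toy model. This is a deliberate simplification, not the on-disk format: the key layout (object ID, offset, item type), the numeric type codes, and the class names are all assumptions made for illustration, and a sorted Python list stands in for the leaf level of the B+ tree.

```python
import bisect
from dataclasses import dataclass
from enum import IntEnum


class ItemType(IntEnum):
    # Illustrative codes only; not the real on-disk values.
    STAT = 0
    INDIRECT = 1
    DIRECT = 2
    DIRECTORY = 3


@dataclass(order=True, frozen=True)
class Key:
    object_id: int
    offset: int
    item_type: ItemType


class ToyTree:
    """A sorted list standing in for the leaf level of a single B+ tree."""

    def __init__(self):
        self._keys = []
        self._values = []

    def insert(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        self._keys.insert(i, key)
        self._values.insert(i, value)

    def items_for(self, object_id):
        """All items of one object are adjacent in key order."""
        lo = bisect.bisect_left(self._keys, Key(object_id, 0, ItemType.STAT))
        hi = bisect.bisect_left(self._keys, Key(object_id + 1, 0, ItemType.STAT))
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


tree = ToyTree()
tree.insert(Key(7, 4097, ItemType.DIRECT), b"tail bytes")         # tail of file 7
tree.insert(Key(3, 1, ItemType.DIRECTORY), {"notes.txt": 7})      # directory entries
tree.insert(Key(7, 0, ItemType.STAT), "mode=0644 size=5000")      # file metadata
tree.insert(Key(7, 1, ItemType.INDIRECT), [1042])                 # unformatted block list
tree.insert(Key(3, 0, ItemType.STAT), "mode=0755")

for key, value in tree.items_for(7):
    print(key.item_type.name, value)
```

Because metadata, block pointers, and tails for one object sort next to each other, a single range scan retrieves all of them, regardless of the order in which they were inserted.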

By contrast, ext2 and other Berkeley FFS-like file systems of that time simply used a fixed formula for computing inode locations, hence limiting the number of files they may contain.[17] Most such file systems also store directories as simple lists of entries, which makes directory lookups and updates linear time operations and degrades performance on very large directories. The single B+ tree design in ReiserFS avoids both of these problems due to better scalability properties.
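The difference between a linear directory list and an ordered index can be made concrete by counting comparisons. In this sketch, binary search over a sorted list of names is used as a stand-in for descending a balanced tree; the function names, the directory size, and the entry names are hypothetical.

```python
def linear_lookup(entries, name):
    """Scan a list of (name, inode) pairs; returns (inode, comparisons)."""
    comparisons = 0
    for entry_name, inode in entries:
        comparisons += 1
        if entry_name == name:
            return inode, comparisons
    return None, comparisons


def sorted_lookup(names, inodes, name):
    """Binary search over sorted names; returns (inode, probes)."""
    lo, hi, probes = 0, len(names), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if names[mid] < name:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(names) and names[lo] == name:
        return inodes[lo], probes
    return None, probes


# A hypothetical directory with 100,000 entries, already in name order.
entries = [(f"file{i:06d}", i) for i in range(100_000)]
names = [n for n, _ in entries]
inodes = [i for _, i in entries]

target = "file099999"
print(linear_lookup(entries, target)[1], sorted_lookup(names, inodes, target)[1])
```

For the last entry, the linear scan touches every entry, while the ordered lookup needs a number of probes that grows only with the logarithm of the directory size.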

References

  1. ^ Reiser FS node layout, Namesys, http://namesys.com/X0reiserfs.html#nodelayout .
  2. ^ a b c "Reiser FS Specifications", FAQ, Namesys, http://namesys.com/faq.html#reiserfsspecs .
  3. ^ a b Shankland, Stephen (2006-10-12). "Novell makes file storage software shift". Business Tech (cnet). http://news.com.com/Novell+makes+file-storage+software+shift/2100-1016_3-6125509.html.
  4. ^ Shankland, Stephen (January 16, 2008). "Namesys vanishes, but Reiser project lives on". CNet. http://www.news.com/8301-13580_3-9851703-39.html. Retrieved 2008-01-26. 
  5. ^ Reiser, Hans. "Reiser4 is released!". http://www.namesys.com/v4/v4.html#BLOBs. Retrieved 2006-07-15. 
  6. ^ "kill-the-BKL". git.kernel.org. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8ebc423238341b52912c7295b045a32477b33f0. 
  7. ^ a b c Jeff Mahoney (2006-09-14). "Proposal: Change in default fs for releases >= 10.2". gmane.org. http://article.gmane.org/gmane.linux.suse.opensuse.devel/4312. Retrieved 2009-08-23.
  8. ^ discussion thread stored at gmane.org
  9. ^ Daniel Robbins (2001), "Advanced file system implementor's guide". Retrieved 2006-07-05.
  10. ^ Matthias Andree (2001), LKML post on Postfix synchronity assumptions. Retrieved 2006-07-15.
  11. ^ NEOHAPSIS - Peace of Mind Through Integrity and Insight
  12. ^ Hans Reiser, Reiser4 design, repacker. Retrieved 2006-07-05.
  13. ^ Theodore Ts'o LKML post. Retrieved 2006-07-05.
  14. ^ ReiserFS download page, see warning. Retrieved 2006-07-05.
  15. ^ a b comment by Jeff Mahoney (2006-10-04). "SUSE 10.2 Ditching ReiserFS as its’ default FS? (comment 29)". linux.wordpress.com / archive.org. Archived from the original on 2006-11-09. http://web.archive.org/web/20061109162537/http://linux.wordpress.com/2006/09/27/suse-102-ditching-reiserfs-as-it-default-fs/#comment-28534. Retrieved 2009-08-23.
  16. ^ CBS 5 / AP / BCN (2006-09-14). "Oakland Police Search Home Of Missing Woman's Ex". cbs5.com / archive.org. Archived from the original (cbs5.com/topstories/local_story_256204954.html) on 2006-11-06. http://web.archive.org/web/20061106173127/cbs5.com/topstories/local_story_256204954.html. Retrieved 2009-08-23.
  17. ^ Mingming Cao, Theodore Y. Ts'o, Badari Pulavarty, Suparna Bhattacharya (2005-07-26). "State of the Art: Where we are with the Ext3 file system". 2005 Linux Symposium. Ottawa, Canada: IBM Linux Technology Center. http://ext2.sourceforge.net/2005-ols/paper-html/node40.html. Retrieved 2007-03-08. 
