Talk:Comparison of file systems

This page was recently split from Talk:File system; old discussions are still at Talk:File system/archive 1.

Incorrect or incomplete

I think this article is very incorrect and incomplete, and should be largely rewritten. Suggestions here, please; if no one does it first, I'll do the job this weekend. --Claunia 02:37, 2 December 2005 (UTC)

  • Please explain what you think is incorrect and incomplete, and why you think that rewriting is the answer, instead of merely correcting the parts that are incorrect and filling in the parts that are incomplete. Uncle G 03:55, 2 December 2005 (UTC)
    • Well, I mean correcting, but there are so many errors that it will be almost a rewrite. For example, it says that all filesystems support any character but NUL, but that is not true; the specifications list illegal characters. It treats HFS and HFS+ as the same, when they are very different filesystems, and FFS and UFS1 as different filesystems, when they are the same. It says that a lot of filesystems are not ADS-aware when they are (ext2, ext3, XFS, JFS, Reiser, etc.). And the most important thing I saw is that it says HFS/HFS+ do not use extents, when they are basically extent-based (Extents Overflow File). There is also information in the infoboxes that is not here, and vice versa. Both should be enriched. --Claunia 13:31, 2 December 2005 (UTC)
      • Most filesystems do support any characters except NUL. It is only the filesystem drivers that implement additional restrictions. There's even a clear footnote on this, footnote number 25. And, no, Unix File System is not the same as Berkeley Fast File System, as our articles on them, and the documents that they link to, make clear. (As discussed above on this very page.) If this is your idea of "correction", please do not put it into practice. Uncle G 16:22, 2 December 2005 (UTC)
        • Did you read the FAT specification, for example? The implementation (DOS) doesn't support " " (spaces), but the specification says they are supported (ever seen the "EA DATA. SF" file?). There are filesystems in the table whose specifications clearly list illegal characters, and this should be corrected. Just one question: if the 4.3BSD filesystem is FFS, and the FreeBSD filesystem is UFS, and both drivers can read both revisions, why are they different? So should we consider the Atari ST's FAT12, the DOS <3.0 FAT12 and the rest of the FAT12s to be different filesystems? And each revision of NTFS to be a different filesystem? The only differences, as far as I can see, are in some non-critical structures (even the magic number is the same for FFS and UFS).
          • The article is correct. Spaces are supported in FAT. And if you want your question answered, read the documents linked to by the articles mentioned above. Uncle G 18:43, 11 December 2005 (UTC)
            • Spaces are supported in FAT, but nothing between 0x00 (0x0000 in LFN) and 0x1F (0x001F in LFN) is, yet the article says those values are supported. The article says "anything but NUL". Read the Microsoft FAT specification. ARTICLE IS INCORRECT! --Claunia 19:57, 11 December 2005 (UTC)
              • Wrong. It is the Microsoft specification that is incorrect. Uncle G 18:26, 12 December 2005 (UTC)

OK, this is a bit long, so I'm resetting the indentation, so that we can also, hopefully, reset the discussion.

There are several layers at which you can ask what characters are supported in file systems. At the lowest layer of the on-disk data structures, most file systems probably support, in file names, either all byte values ("byte" for the benefit of those talking about non-8-bit-byte systems...) or all two-byte values if file names are stored as counted strings, or all byte values or two-byte values except 0 if file names are stored as null-terminated strings. (Byte vs. two-byte depending on whether the count in a counted string counts single bytes or byte pairs, and on whether the terminator in a null-terminated string is one byte or two.) Other limitations are imposed by the layer into which a particular implementation of the file system (there can, of course, be more than one implementation of a given file system with a given on-disk layout, plugging into one or more pluggable file system frameworks on OSes that have those frameworks), or by the code above that layer.
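The difference between counted and null-terminated name storage can be sketched in a few lines of Python. This is only an illustration: the one-byte count and the helper names (`store_counted`, `load_terminated`, and so on) are assumptions made for the example, not the layout of any particular file system.

```python
def store_counted(name: bytes) -> bytes:
    """Store a name as a one-byte length followed by the raw bytes (assumed layout)."""
    if len(name) > 255:
        raise ValueError("name too long for a one-byte count")
    return bytes([len(name)]) + name


def load_counted(field: bytes) -> bytes:
    """Read back exactly as many bytes as the count says, whatever their values."""
    count = field[0]
    return field[1:1 + count]


def store_terminated(name: bytes) -> bytes:
    """Store a name followed by a single NUL terminator."""
    return name + b"\x00"


def load_terminated(field: bytes) -> bytes:
    """Read back everything up to the first NUL; later bytes are unreachable."""
    return field.split(b"\x00", 1)[0]


name = b"foo\x00bar"
# A counted string round-trips every byte value, including 0.
assert load_counted(store_counted(name)) == name
# A null-terminated string silently ends at the embedded NUL.
assert load_terminated(store_terminated(name)) == b"foo"
```

The same reasoning applies to two-byte units: a counted string of byte pairs can hold 0x0000, while a string terminated by 0x0000 cannot.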

For example, UFS's on-disk structure can support names with any byte value in them, including not only '/', but even NUL, as there's a string count in the directory entry - you still have to append a '\0', however, as the count doesn't include a terminating NUL that's supposed to be there (so a file named "foo/\0bar" would have a count of 8 and the bytes 'f', 'o', 'o', '/', '\0', 'b', 'a', 'r', '\0' in the entry). However, the VFS layers into which it plugs on most UN*Xes pass null-terminated strings to it, so, at least on those systems, the name can't include NUL. It can, however, include '/' - and, in fact, older UN*X-based NFS servers would cheerfully create files named "foo/bar" if you sent them an NFS packet requesting that a file/directory/link/symlink/etc. be created with that name; you couldn't remove the name locally, though, you'd have to do it over NFS. Most if not all NFS servers should now have that fixed, either by checking for '/' in the file system or in the NFS server code.
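To make the "foo/\0bar" example concrete, here is a rough Python sketch of a counted directory entry. The header fields (inode, record length, type, name length), the little-endian byte order, and the padding to a 4-byte boundary are assumptions loosely modelled on a BSD-style directory entry; this is not a faithful reproduction of the UFS on-disk format, and `pack_entry` and `unpack_name` are hypothetical helpers.

```python
import struct

# Assumed header: 32-bit inode, 16-bit record length, 8-bit type, 8-bit name length.
HEADER = struct.Struct("<IHBB")


def pack_entry(inode: int, name: bytes, file_type: int = 8) -> bytes:
    """Build one directory entry; the count excludes the trailing NUL."""
    if not 0 < len(name) <= 255:
        raise ValueError("name must be 1..255 bytes")
    body = name + b"\x00"                      # terminator that the count does not cover
    reclen = (HEADER.size + len(body) + 3) & ~3  # round up to a 4-byte boundary
    header = HEADER.pack(inode, reclen, file_type, len(name))
    return header + body.ljust(reclen - HEADER.size, b"\x00")


def unpack_name(entry: bytes) -> bytes:
    """Recover the name using the stored count, not the terminator."""
    _inode, _reclen, _type, namlen = HEADER.unpack_from(entry)
    return entry[HEADER.size:HEADER.size + namlen]


entry = pack_entry(2, b"foo/\x00bar")
assert unpack_name(entry) == b"foo/\x00bar"   # the count of 8 preserves '/' and NUL
# Code that treats the field as a C string sees only "foo/".
assert entry[HEADER.size:].split(b"\x00", 1)[0] == b"foo/"
```

The second assertion mirrors what a VFS layer passing null-terminated strings would see, which is why such a name cannot normally be created or looked up through it.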

Similarly, NTFS's on-disk structure supports either all two-byte values or all but 0x0000 - and, with the proper (or, if you will, improper :-)) use of smbclient, you can create files with at least some of the names that the Win32 API doesn't allow (I've done that), just as you can do with the POSIX subsystem.

Further limits might be imposed by OS APIs, e.g. you cannot create files with '/' in their name through a POSIX/UN*X API, as that's a pathname separator.

In practice, implementations for OSes other than the "native" OS (or OS family - I'm lumping all UN*Xes together in one family, and both "Windows OT", i.e. 95/98/Me, and "Windows NT", i.e. NT 4.0, W2K, WXP, WServer2K3, WVista, etc., together in another family here) probably impose the same limitations the "native" OS does, at least if one of the purposes of the implementation is data exchange with the "native" OS or family.

Footnote 25 clearly states that the limitations being discussed are those of the on-disk data structure. FAT's on-disk data structure does not, as far as I know, forbid a byte with the value 0x01 in a file name. Microsoft's specification might forbid it, but that doesn't, for example, mean that you couldn't implement FAT for some UN*X and allow control-A in a file name. However, it might be unwise to do that if the goal is data interchange with DOS and Windows, as those systems won't be able to handle those files. If the goal is to hide those files (e.g., if you're trying to implement features not present in standard FAT by, for example, having, for a file named "foo", a secondary file named "^Afoo" containing something such as file permissions), however, it might actually be a wise choice (modulo file names of that sort causing DOS or Windows to crash, or destroy those files, if you do intend to read those file systems on DOS or Windows machines - or even if you don't, as somebody's probably going to try to do it anyway).

Footnote 25 also clearly states that the file system implementation, or the OS into which it plugs, might impose other restrictions.

So:

  • the article correctly describes the limitations imposed by the on-disk data structure (except that some of them might also support NUL) and states in the footnote that these are the limits imposed by the on-disk data structure and that there might be other limits imposed by the OS;
  • however, allowing, in an implementation, all of the byte values allowed by those limitations might be impossible on some OSes and unwise in many situations on other OSes;
  • it might, therefore, be useful, and avoid some confusion (and dispute) if it were made clearer in the table itself that the on-disk data structure allows more byte values than "conventional use" of the file system does, and also indicate what the "conventional use" of the file system allows (where "conventional use" would, in most cases, be use on the "native" OS or OS family for the OS - although that might differ depending on the "Windows NT" subsystem you're using). The preceding unsigned comment was added by Guy Harris (talk • contribs) 09:33, 12 December 2005 (UTC)
    • I just happened across this on WP:RC, but Guy - very nice explanation. A pleasure to read. (BTW, please use ~~~~ to sign posts on talk pages.) JesseW, the juggling janitor 09:43, 12 December 2005 (UTC)
      • (Yeah, I knew about ~~~~, I just forgot to sign the article; sorry about that. I was going to add some more to the discussion anyway, and sign that, and note that the previous comment was also mine, but I'll let this parenthetical note retrocredit the previous comment. :-))

        For file systems with explicit specifications it might be worth giving limitations imposed by the specification (the specification might affect other limits as well, e.g. it might limit file sizes to a value lower than the on-disk structure could support). Those limits might be different from the "conventional use" limit, in that the limit might allow certain characters that the "native OS" doesn't allow.

        Note also that the HFS+ specification in Apple Tech Note 1150 doesn't mention any restrictions on characters in file names, but, in practice, colons aren't allowed in file names, as they're traditional Mac OS path name component separators, and in OS X (and possibly other UN*Xes that include HFS+ implementations), a colon passed into HFS+ is converted to a slash on disk, and a slash on disk is passed out of HFS+ as a colon, so that OS X can read HFS+ volumes from traditional Mac OS and traditional Mac OS can (assuming OS X didn't use any new features in the on-disk format that traditional Mac OS can't handle) read HFS+ volumes from OS X. Thus, saying that the "conventional use" limit on file names is "any Unicode character other than colon" is technically true, but, in practice, OS X will accept from the VFS layer file names with colons in them and will return them, with colons, to the caller of the VFS layer. That would deserve to be noted in a footnote on HFS+, if we add "specification limits" and "conventional use limits" columns to the table. Guy Harris 10:00, 12 December 2005 (UTC)

        • Just as a filesystem developer, I think that including any information that contradicts the filesystem specifications (like saying FAT supports anything but NUL; note that by your description it supports everything, NUL included, as the names are fixed-length strings) is misinformative and against the spirit of an encyclopedia. --Claunia 15:07, 12 December 2005 (UTC)
          • It's not misinformative. What's misinformative is the very approach that you describe. Many specifications, especially those for FAT, are written long after the fact, are attempts to revise history, and are downright wrong. Reporting only what the specifications say is to report erroneous information. Uncle G 18:26, 12 December 2005 (UTC)
            • So you think that because a paper wasn't published, developers work on thin air? Just like in the NTFS article discussion: as its specification is private, there isn't one, and the filesystem is allowed to have any kind of data in its structures? Or that if someone other than the inventor modifies a structure, the change should be taken as official? -- Claunia 22:56, 12 December 2005 (UTC)
          • As a file system developer, I think that describing what the on-disk structure can support might be useful, and including it, as well as separately describing limitations imposed by the file system specifications if any exist, and limitations of the "conventional use" of the file system, would make the description more, well, encyclopedic.

            Note, though, that if we're discussing the on-disk characteristics of the file system, "case-sensitive" and "case-preserving" aren't on-disk characteristics, they're specification or "conventional use" characteristics (except maybe for HFSX, where there's actually a per-volume case-sensitivity attribute) - a FAT or VFAT implementation could, in theory, be case-sensitive - so one could ask which on-disk characteristics are relevant. File system sizes and file sizes clearly are, but some file name characteristics might not be. Guy Harris 17:27, 12 December 2005 (UTC)

        • The convention in this article has been for the table to discuss the actual on-disc data structures, and for the limits of particular operating systems and filesystem drivers (which vary from platform to platform, which aren't inherent to the filesystems themselves, and which do not adhere to a fixed set of standards) to be discussed in footnotes. Indeed, my opinion has been for some time that the "maximum pathname length" column does not belong in the table at all, because it has nothing whatsoever to do with the actual filesystems. If it can be confirmed that the only two remaining filesystems with pathname length limits (ODS5 and UDF) in fact have no inherent pathname limits and what pathname limits exist are nothing to do with the actual filesystems themselves, it should be removed, and be discussed in comparison of operating systems instead. Uncle G 18:26, 12 December 2005 (UTC)
          • At that point, "allowable characters in entry names" largely devolves to "number of bytes per character in file name" + "are file names counted or null-terminated" (and I suspect that's true even for NSS and NWFS, unless they use very odd string encodings that really do prevent storage of some character values on disk). If that's the intent, the column should probably be changed. For most if not all file systems, the file name character encoding isn't part of the on-disk data structure; should that be given, or is it the case that, as you state, there is no "native" operating system for a given file system, and maybe IBM will use JFS2 on z/OS and encode file names in EBCDIC? In that sense, file names are uninterpreted strings of bytes or 2-byte characters, or strings interpreted only to the extent that they're zero-terminated.

            BTW, comparison of operating systems doesn't currently have anything about file name limitations, etc. Guy Harris 21:08, 12 December 2005 (UTC)

    • We shouldn't be promulgating the notion that there is a "native" operating system for a filesystem, because in the vast majority of cases many operating systems have filesystem drivers for the filesystem, and there is no objective criterion by which any one operating system can be said to be more "native" than all of the others. "native" should not be conflated with "first implementation". Uncle G 18:26, 12 December 2005 (UTC)
  • I don't know if it's appropriate or worth mentioning, but there's nothing in the on-disk structure of FAT16 that disallows hard links. MS-DOS (at least 5.0) will behave in a fairly sane fashion if this is done, although chkdsk will report the directory or file as corrupt (cross-linked or multiple links; I forget exactly which). Phredward 23:24, 1 May 2006 (UTC)

If you bypass the filesystem API to manually hard link a file, and chkdsk rejects the resulting filesystem state as invalid, that would seem to be evidence that the FAT designers never intended to support that filesystem state. Similarly, nobody disputes that the ability to include a slash in a Unix filename was a bug, or that a filesystem containing such a file is in error, at least in part due to the strong Unix cultural expectation that (for example) every inode with a non-zero reference count should be addressable by at least one full pathname (and in fact fsck will detect and correct many situations where this isn't the case).

The question of a FAT filename containing CTRL-A is murkier, since so much was unwritten for so long (or written by third-party reverse engineers like Peter Norton). Microsoft now says it was never intended to be allowed, but if it's true that this was never initially communicated, and that the community of implementors developed and acted on a different prevailing belief, it's not obvious to me that Microsoft's current opinion automatically takes precedence. I do lean in the direction of taking Microsoft's word for it, however. Perhaps a new footnote is in order -- I gather you can never have too many footnotes. ;) --Saucepan 01:26, 13 September 2006 (UTC)
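The HFS+ colon/slash translation that Guy Harris describes earlier in this thread can be sketched as a pair of string mappings. This is only an illustration of the behaviour as described there; `vfs_to_disk` and `disk_to_vfs` are hypothetical names, not functions from any real HFS+ implementation.

```python
def vfs_to_disk(component: str) -> str:
    """Map a name component arriving from a UN*X-style VFS layer to its on-disk form."""
    if "/" in component:
        # '/' separates components above the file system, so it never arrives here.
        raise ValueError("'/' cannot appear inside a path component")
    return component.replace(":", "/")


def disk_to_vfs(on_disk: str) -> str:
    """Map an on-disk name back to what the VFS caller sees."""
    return on_disk.replace("/", ":")


# A name with a colon, as seen by the VFS caller, is stored with a slash.
assert vfs_to_disk("Notes 1:2") == "Notes 1/2"
# Reading it back restores the colon, so the round trip is invisible to the caller.
assert disk_to_vfs(vfs_to_disk("Notes 1:2")) == "Notes 1:2"
```

Under this mapping a traditional Mac OS user would see the stored slash, and the VFS caller would see the colon, which is the interoperability property the comment describes.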

It looks like Talk:Comparison of file systems is still redirecting to Talk:File system. --Saucepan 01:40, 13 September 2006 (UTC)

JFS2 --> JFS

Why does JFS2 wikilink to JFS? If there is not a good reason, I will create the redlink. -- Unixguy 18:59, 23 March 2006 (UTC)

Both JFS and JFS2 in the tables wiki link to JFS, although this is a disambiguation page. It seems the JFS2 page includes historical talk about JFS1, so should both JFS and JFS2 in the tables be redirected to JFS2 (the disambiguated link)? --Pekster 01:40, 4 January 2007 (UTC)
Yes, they should. I've fixed all the JFS links I found in regular pages to point to IBM Journaled File System 2 (JFS2). (JFS2 goes to IBM Journaled File System 2 (JFS2) already; I didn't bother fixing those, but if somebody wants to get rid of the redirection, they can.) Guy Harris 09:16, 4 January 2007 (UTC)
Thanks. I just got rid of the redirection for those JFS2 links. --Unixguy 18:28, 28 March 2007 (UTC)

Timestamping granularity and date limitations

It may be a good idea to add the limitations of each filesystem's timestamp(s).

  • Granularity (precision): FAT-based systems save based on a 2-second interval, while others operate at 1-second intervals (I think).
  • Limitations: I believe many systems start measuring dates at Jan 1, 1970, so they can't register a time before that. It would also be interesting to see which filesystems are susceptible to problems such as the 2038 problem.

Bajenkins 21:59, 26 November 2006 (UTC)
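Both points can be illustrated with a short Python sketch. The FAT part assumes the usual 16-bit DOS time-of-day layout (5 bits of hours, 6 bits of minutes, 5 bits holding seconds divided by two); the second part assumes a signed 32-bit count of seconds since 1 January 1970. The function names are made up for the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def fat_pack_time(hour: int, minute: int, second: int) -> int:
    """Pack a time of day into 16 bits; only 5 bits remain for seconds, so they are halved."""
    return (hour << 11) | (minute << 5) | (second // 2)


def fat_unpack_time(value: int) -> tuple:
    """Unpack the 16-bit field; the seconds come back as an even number."""
    return ((value >> 11) & 0x1F, (value >> 5) & 0x3F, (value & 0x1F) * 2)


def signed32_range() -> tuple:
    """Earliest and latest instants a signed 32-bit seconds counter can hold."""
    return (EPOCH + timedelta(seconds=-2**31),
            EPOCH + timedelta(seconds=2**31 - 1))


# An odd second is lost: 13:45:31 is stored as 13:45:30.
assert fat_unpack_time(fat_pack_time(13, 45, 31)) == (13, 45, 30)

earliest, latest = signed32_range()
print(earliest.isoformat())  # 1901-12-13T20:45:52+00:00
print(latest.isoformat())    # 2038-01-19T03:14:07+00:00
```

Note that a signed counter does reach back before 1970; it is an unsigned counter, or an implementation that rejects negative values, that cannot represent earlier dates.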

Would really like to have the information about timestamp precision too. Only reference I could find is this Java bug entry: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4697792 where it is said:

Without having a good reference source: FAT12, FAT16, and FAT32 file systems have a 2 second file time resolution. NTFS has a 100 nanosecond file time resolution. Unix/Linux has a 1 second file time resolution.

Rngadam 20:33, 18 January 2007 (UTC)
The original Berkeley Fast File System reserved space in the inode to expand timestamps to 64 bits. McKusick's expectation was that this would be used to extend the range of the timestamps, to deal with the Y2038 problem. When 4.4BSD was released (with what would come to be called UFS1), the extra space was instead used to add nanosecond resolution. (The utimes() system call had always supported setting microsecond timestamps, because it used a struct timeval, but stat() did not return them; you can see in the 4.3BSD Networking/2 manual pages that struct stat includes only st_spareX fields where the microseconds would have gone. In 4.4, struct stat has struct timespec and macros for compatibility.) When Kirk redid the inode format for UFS2, he widened the timestamp to 96 bits -- 64 bits for the time_t and 32 bits for the nanoseconds part. 121a0012 02:56, 10 February 2007 (UTC)
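As a rough illustration of the two layouts mentioned in this thread, the sketch below converts a Unix time in nanoseconds into 100-nanosecond ticks counted from 1 January 1601 (the NTFS-style resolution quoted above) and into a 64-bit seconds value plus a 32-bit nanoseconds value (the 96-bit split described for UFS2). The helper names and the little-endian packing are assumptions for the example, not the actual inode or MFT layouts.

```python
import struct

# Seconds between 1601-01-01 and 1970-01-01.
FILETIME_EPOCH_OFFSET = 11_644_473_600


def unix_ns_to_100ns_ticks(unix_ns: int) -> int:
    """Convert Unix nanoseconds to 100 ns ticks since 1601; sub-100 ns detail is dropped."""
    return unix_ns // 100 + FILETIME_EPOCH_OFFSET * 10_000_000


def unix_ns_to_sec_nsec(unix_ns: int) -> tuple:
    """Split Unix nanoseconds into whole seconds and a nanoseconds remainder."""
    return divmod(unix_ns, 1_000_000_000)


def pack_96bit(unix_ns: int) -> bytes:
    """Pack as 64-bit seconds plus 32-bit nanoseconds: 12 bytes, i.e. 96 bits."""
    sec, nsec = unix_ns_to_sec_nsec(unix_ns)
    return struct.pack("<qi", sec, nsec)


stamp = 1_500_000_123  # 1.500000123 seconds after the Unix epoch
assert unix_ns_to_sec_nsec(stamp) == (1, 500_000_123)
assert len(pack_96bit(stamp)) == 12
assert unix_ns_to_100ns_ticks(0) == 116_444_736_000_000_000
```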

File Change notification

Some filesystems can send notifications when certain events occur.

(to complete)

For HFS+ under OS X, it dates back to Tiger, and uses kqueues.
This, BTW, is a characteristic of the OS you're using and the OS code for the file system, not of the file system "in the abstract". A given file system might support it on one OS but not on another OS, and a given OS might support it on some file systems but not others. Guy Harris 18:28, 1 December 2006 (UTC)
Agreed, it depends on the functionality of the operating system's virtual file system layer and not the file system itself. -- intgr 19:26, 16 December 2006 (UTC)

Footnotes on comparison page

The comparison of file systems article currently uses the old footnote system, instead of the new (and arguably better) m:Cite.php one (see WP:FOOT). I have already added one <ref></ref>-style footnote. Would anyone be opposed if I started gradually converting footnotes using the old format into the newer one? I realize that while in progress, the conversion could result in confusion, but I think the task is too big to do it all in one run. -- intgr 19:32, 16 December 2006 (UTC)

GPFS / ZFS Limits

Questions about the "Limits" Table...

Question 1: In GPFS, "Maximum Volume Size" says "2^99 bytes". But according to page 2 of http://www.linuxnetworx.com/file_redirect.jsp?siteObjectID=913&fname=GPFSDataSht-web.pdf GPFS supports 2^63-1 bytes. Is the PDF's information incorrect or outdated?

Question 2: In GPFS, "Maximum File Size" says "No limit found". But according to page 2 of http://www.almaden.ibm.com/StorageSystems/file_systems/GPFS/Fast02.pdf GPFS supports 2^63-1 bytes. Is the PDF's information incorrect or outdated?

Question 3: In ZFS, "Maximum Volume Size" says "2^128 bytes". But according to http://en.wikipedia.org/wiki/ZFS#Capacity ZFS supports 16 exbibytes. And *also* according to http://www.opensolaris.org/os/community/zfs/faq/#whatlimits ZFS supports 16 exbibytes. Can someone please clarify this?

Comment was added by Tinho 05:46, 9 January 2007 (UTC).
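To compare the figures in these questions on one scale, the following Python sketch expresses each power of two in exbibytes (1 EiB = 2^60 bytes). It only does the arithmetic; it does not settle which source is right, and the figures may well describe different limits (for example file size versus pool size). `whole_eib` is a made-up helper name.

```python
EIB = 2**60  # one exbibyte, in bytes


def whole_eib(byte_count: int) -> int:
    """Number of complete exbibytes in a byte count (integer arithmetic, no rounding error)."""
    return byte_count // EIB


limits = {
    "2^63 - 1": 2**63 - 1,
    "2^64": 2**64,
    "2^99": 2**99,
    "2^128": 2**128,
}

for label, value in limits.items():
    print(f"{label:>8}: about {value / EIB:.3g} EiB")

# 16 EiB is exactly 2^64 bytes, so it matches neither 2^63 - 1 nor 2^128.
assert whole_eib(2**64) == 16
# 2^63 - 1 bytes is just under 8 EiB.
assert whole_eib(2**63 - 1) == 7
# 2^128 bytes is 2^68 EiB, vastly more than 16 EiB.
assert whole_eib(2**128) == 2**68
```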

I don't see a "Limits" table -- where did it go? 69.87.200.105 14:25, 22 January 2007 (UTC)

Tiny files

A column that elaborates on how the filesystem handles tiny file data (of a few bytes, like 5) would be nice. I.e. does it waste an entire sector for it (or worse), does it store it alongside the directory entry (inside the directory "file"), does it store it inside the inode of some other file?

(This doesn't really overlap with "extended attribute", because that's just another tiny file and really, regular tiny files are as interesting) -- dannym 12:12:46, 30 January 2007 (UTC)

This is called tail packing 62.31.67.29 09:49, 29 March 2007 (UTC)
And block suballocation, which is a similar concept. I am unsure about file systems that keep short files with the directory or inode entry. I've heard something vague about NTFS keeping short files in the MFT; if this is true, it indeed isn't represented right now. -- intgr 11:17, 29 March 2007 (UTC)
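The cost being asked about can be estimated with simple arithmetic. The sketch below assumes a file system that gives every non-empty file a whole number of fixed-size allocation units and does no tail packing, suballocation, or inline storage; the unit sizes used are examples, not properties of any particular file system.

```python
def allocated_bytes(file_size: int, unit_size: int) -> int:
    """Space consumed when a file occupies a whole number of allocation units."""
    units = -(-file_size // unit_size)  # ceiling division
    return units * unit_size


def slack_bytes(file_size: int, unit_size: int) -> int:
    """Bytes allocated to the file but not holding file data."""
    return allocated_bytes(file_size, unit_size) - file_size


# A 5-byte file in a 512-byte sector leaves 507 bytes unused.
assert slack_bytes(5, 512) == 507
# With a 4096-byte allocation unit the waste grows to 4091 bytes.
assert slack_bytes(5, 4096) == 4091
# An empty file needs no data units at all under this model.
assert allocated_bytes(0, 4096) == 0
```

Tail packing and block suballocation reduce this slack by sharing a unit between files, and storing the data with the directory entry or inode avoids allocating a data unit at all.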

/ allowed on ext3?

If I recall correctly, the ext3 page reads that the filenames can contain any Unicode characters except NUL and /. This page states that only NUL is prohibited. Which one is it? --The preceding unsigned comment was added by B^4 (talk • contribs) 11:08, 18 February 2007 (UTC).

The virtual file system layer is responsible for splitting paths at slashes into directory names, so you cannot normally create files or directories with a slash in their name. However, the low-level file system code normally does not verify file names, so technically, the file system itself does allow names with slashes in them. -- intgr 11:23, 29 March 2007 (UTC)
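A toy version of that splitting step, in Python, shows why a slash never reaches the file system code as part of a name. This is a simplified sketch, not the lookup code of any real kernel; `lookup_components` is a hypothetical helper, and handling of "." and ".." is deliberately left out.

```python
def lookup_components(path: str) -> list:
    """Split a path the way a VFS-style lookup would, one name per directory level."""
    if "\x00" in path:
        # The path arrives as a null-terminated string, so NUL cannot be part of it.
        raise ValueError("embedded NUL in path")
    # Empty strings come from leading, trailing, or doubled slashes and are skipped.
    return [component for component in path.split("/") if component]


# "foo/bar" is always seen as two names, never as one name containing a slash.
assert lookup_components("/home/user/foo/bar") == ["home", "user", "foo", "bar"]
# No component handed to the file system can contain '/'.
assert all("/" not in c for c in lookup_components("a//b/c/"))
```

Because the file system only ever receives the individual components, it has no need to check for '/' itself, which is consistent with the on-disk format permitting it.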

QNX

What about Qnx4fs?? --200.59.172.38 18:17, 12 March 2007 (UTC)

Probably nobody knows enough about it. -- intgr 11:19, 29 March 2007 (UTC)

Change request for Comparison of Filesystems

Please change the release date of Amiga FFS from 1987 to 1988 and remove the "This article contradicts the article Amiga Fast File System" notice. This is the only contradiction. References for the change were added to the Amiga Fast File System page. 62.31.67.29 09:48, 29 March 2007 (UTC)

Thanks, will fix. Please note that Wikipedia is a wiki, and thus you are invited to make changes yourself. -- intgr 11:18, 29 March 2007 (UTC)
I would love to, but the page is locked. I am not convinced of the merits of registering an account, but that's an argument for some other time. 62.31.67.29 11:29, 29 March 2007 (UTC)
Ahh, okay, that explains it. -- intgr 11:35, 29 March 2007 (UTC)