XFS

Developer(s)
Full name XFS
Introduced 1994 with IRIX 5.3
Partition identifier 0x83 (Master Boot Record)
Structures
Directory contents B+ trees
File allocation B+ trees
Limits
Max. volume size 8 exbibytes − 1 byte
Max. file size 8 exbibytes − 1 byte
Max. number of files 2^64[1]
Max. filename length 255 bytes
Allowed characters in filenames All except NUL and "/"
Features
Dates recorded atime, mtime, ctime[2], version 5: crtime[3]
Date range December 14, 1901 to January 18, 2038[2], proposed: 8 bit epoch[4]
Date resolution 1 ns
Attributes Yes
File system permissions Yes
Transparent compression No
Transparent encryption No (provided at the block device level)
Data deduplication Experimental, Linux only[5]
Other
Supported operating systems

XFS is a high-performance 64-bit journaling file system created by Silicon Graphics, Inc. (SGI) in 1993.[6] It was the default file system in SGI's IRIX operating system starting with version 5.3; the file system was ported to the Linux kernel in 2001. As of June 2014, XFS is supported by most Linux distributions, some of which use it as the default file system.

XFS excels at parallel input/output (I/O) operations because its design is based on allocation groups (AGs), subdivisions of the physical volumes on which XFS is used. Because of this, XFS enables extreme scalability of I/O threads, file system bandwidth, and the size of files and of the file system itself when spanning multiple physical storage devices.

XFS ensures the consistency of data by employing metadata journaling and supporting write barriers. Space allocation is performed via extents with data structures stored in B+ trees, improving the overall performance of the file system, especially when handling large files. Delayed allocation assists in the prevention of file system fragmentation; online defragmentation is also supported. A feature unique to XFS is the pre-allocation of I/O bandwidth at a pre-determined rate, which is suitable for many real-time applications; however, this feature was supported only on IRIX, and only with specialized hardware.

A notable XFS user, the NASA Advanced Supercomputing Division, takes advantage of these capabilities by deploying two 300+ terabyte XFS filesystems on two SGI Altix archival storage servers, each of which is directly attached to multiple Fibre Channel disk arrays.[7]

History

Silicon Graphics began development of the Extents File System or XFS[8][9] in 1993, including it in IRIX for the first time in IRIX version 5.3 in 1994. The file system was released under the GNU General Public License (GPL) in May 2000. A team led by Steve Lord at SGI ported it to Linux,[10] and first support by a Linux distribution came in 2001. This support gradually became available in almost all Linux distributions.

Initial support for XFS in the Linux kernel came through patches from SGI. It was merged into the Linux kernel mainline for the 2.6 series, and separately merged in February 2004 into the 2.4 series in version 2.4.25,[11] making XFS almost universally available on Linux systems.[12] Gentoo Linux became the first Linux distribution to introduce an option for XFS as the default filesystem in mid-2002.[13]

FreeBSD added read-only support for XFS in December 2005, and in June 2006 introduced experimental write support. However, this was intended only as an aid in migration from Linux, not as a "main" file system. FreeBSD 10 removed support for XFS.[14]

In 2009, version 5.4 of the 64-bit Red Hat Enterprise Linux (RHEL) distribution contained the necessary kernel support for creating and using XFS file systems, but lacked the corresponding command-line tools. The tools available from CentOS could be used for that purpose, and Red Hat also provided them to RHEL customers on request.[15] RHEL 6.0, released in 2010, included XFS support for a fee as part of Red Hat's "scalable file system add-on".[16] Oracle Linux 6, released in 2011, also included an option for using XFS.[17]

RHEL 7.0, released in June 2014, uses XFS as its default file system,[18] including support for using XFS for the /boot partition, which previously was not practical due to bugs in the GRUB bootloader.[19]

Linux 4.8 added a new feature, "reverse mapping". This is the foundation for a large set of planned features: snapshots, copy-on-write (COW) data, data deduplication, online data and metadata scrubbing, highly accurate reporting of data loss or bad sectors, and significantly improved reconstruction of damaged or corrupted filesystems. This work required changes to XFS's on-disk format.[20]

Features

Capacity

XFS is a 64-bit file system[21] and supports a maximum file system size of 8 exbibytes minus one byte (2^63 − 1 bytes), but limitations imposed by the host operating system can decrease this limit. 32-bit Linux systems limit the size of both the file and file system to 16 tebibytes.
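The arithmetic behind these two limits can be checked directly. The sketch below assumes that the 8 exbibyte figure corresponds to a signed 64-bit byte offset and that the 16 tebibyte figure comes from a 32-bit page index combined with 4 KiB pages; the second assumption is not stated in this article and is used only for illustration.

```python
# Binary unit sizes in bytes.
TIB = 2**40
EIB = 2**60

# Largest value of a signed 64-bit byte offset.
max_xfs_bytes = 2**63 - 1

# Assumption (not from the article): a 32-bit page index addressing 4 KiB pages.
PAGE_SIZE = 4096
max_32bit_linux_bytes = 2**32 * PAGE_SIZE

print(max_xfs_bytes == 8 * EIB - 1)   # True
print(max_32bit_linux_bytes // TIB)   # 16
```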

Journaling

In modern computing, journaling is a capability which ensures consistency of data in the file system, despite any power outages or system crashes that may occur. XFS provides journaling for file system metadata, where file system updates are first written to a serial journal before the actual disk blocks are updated. The journal is a circular buffer of disk blocks that is not read in normal file system operation.

The XFS journal can be stored within the data section of the file system (as an internal log), or on a separate device to minimize disk contention. It is limited to a maximum size of both 64 KB blocks and 128 MB, with the minimum size dependent upon a calculation of the file system block size and directory block size. Placing the journal on an external device larger than the maximum journal size will simply leave the extra space unused by the journal.

In XFS, the journal contains "logical" entries that describe, in human-readable terms, what operations are being performed (as opposed to a "physical" journal that stores a copy of the blocks modified during each operation). Journal updates are performed asynchronously to avoid degrading performance.

In the event of a system crash, file system operations which occurred immediately prior to the crash can be reapplied and completed as recorded in the journal, which is how data stored in XFS file systems remain consistent. Recovery is performed automatically the first time the file system is mounted after the crash. The speed of recovery is independent of the size of the file system, instead depending on the number of file system operations to be reapplied.
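The idea of a logical journal and its replay can be illustrated with a toy model. The sketch below is not XFS's on-disk log format: the class ToyJournal, the function replay and the two operation names are hypothetical, and the "disk" is just a set of file names. It only shows that operations are recorded before they are applied, and that recovery work is proportional to the number of pending entries rather than to the size of the file system.

```python
class ToyJournal:
    """Ordered log of logical operations that have not yet been written back."""

    def __init__(self):
        self.pending = []

    def log(self, op, name):
        # The journal entry is recorded before the "disk" state is touched.
        self.pending.append((op, name))


def replay(entries, disk):
    """Apply logged operations to `disk` (a set of file names), in order."""
    for op, name in entries:
        if op == "create":
            disk.add(name)
        elif op == "remove":
            disk.discard(name)
    return disk


# State on disk before the crash; two operations were journaled but not applied.
disk = {"a.txt"}
journal = ToyJournal()
journal.log("create", "b.txt")
journal.log("remove", "a.txt")

# Recovery at mount time: the loop runs once per pending journal entry.
recovered = replay(journal.pending, set(disk))
print(sorted(recovered))  # ['b.txt']
```

Because each toy operation is idempotent, replaying the same entries a second time leaves the state unchanged.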

Allocation groups

XFS file systems are internally partitioned into allocation groups, which are equally sized linear regions within the file system. Files and directories can span allocation groups. Each allocation group manages its own inodes and free space separately, providing scalability and parallelism so multiple threads and processes can perform I/O operations on the same file system simultaneously.

This architecture helps to optimize parallel I/O performance on systems with multiple processors and/or cores, as metadata updates can also be parallelized. The internal partitioning provided by allocation groups can be especially beneficial when the file system spans multiple physical devices, allowing for optimal usage of throughput of the underlying storage components.
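As a rough illustration of how equally sized allocation groups partition the block space, the sketch below maps a block number to a group index and an offset within that group, then buckets a batch of requests by group. The function names and the simple linear division are assumptions made for the example; the real on-disk addressing scheme is not described in this article.

```python
from collections import defaultdict


def locate(block, ag_blocks):
    """Map a file-system block number to (group index, block within group)."""
    return divmod(block, ag_blocks)


def group_by_ag(blocks, ag_blocks):
    """Bucket block requests by allocation group, keeping arrival order."""
    groups = defaultdict(list)
    for block in blocks:
        ag, offset = locate(block, ag_blocks)
        groups[ag].append(offset)
    return dict(groups)


# Four requests against groups of 1000 blocks each land in three groups,
# so up to three workers could service them without sharing group state.
print(locate(2500, 1000))                         # (2, 500)
print(group_by_ag([10, 1005, 2500, 20], 1000))    # {0: [10, 20], 1: [5], 2: [500]}
```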

Striped allocation

If an XFS file system is to be created on a striped RAID array, a stripe unit can be specified when the file system is created. This maximizes throughput by ensuring that data allocations, inode allocations and the internal log (the journal) are aligned with the stripe unit.
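The benefit of alignment can be seen with a little arithmetic. In the sketch below, the helper names and the 64 KiB stripe unit are illustrative assumptions; the code only counts how many stripe units a write touches, which is a simplified stand-in for the extra device I/O caused by misalignment.

```python
def align_up(offset, stripe_unit):
    """Round `offset` up to the next multiple of `stripe_unit`."""
    return -(-offset // stripe_unit) * stripe_unit


def stripes_touched(offset, length, stripe_unit):
    """Number of stripe units covered by the byte range [offset, offset + length)."""
    first = offset // stripe_unit
    last = (offset + length - 1) // stripe_unit
    return last - first + 1


STRIPE_UNIT = 64 * 1024  # hypothetical stripe unit in bytes

# A write of one stripe unit at an unaligned offset straddles two units.
print(stripes_touched(4096, STRIPE_UNIT, STRIPE_UNIT))                          # 2
# The same write placed at the next aligned offset stays within one unit.
print(stripes_touched(align_up(4096, STRIPE_UNIT), STRIPE_UNIT, STRIPE_UNIT))   # 1
```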

Extent based allocation

Blocks used in files stored on XFS file systems are managed with variable length extents where one extent describes one or more contiguous blocks. This can shorten the list of blocks considerably, compared to file systems that list all blocks used by a file individually.

Block-oriented file systems manage space allocation with one or more block-oriented bitmaps; in XFS, these structures are replaced with an extent oriented structure consisting of a pair of B+ trees for each file system allocation group. One of the B+ trees is indexed by the length of the free extents, while the other is indexed by the starting block of the free extents. This dual indexing scheme allows for the highly efficient allocation of free extents for file system operations.
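The dual indexing scheme can be sketched with two sorted lists standing in for the two B+ trees. The class FreeSpace and its method names are hypothetical, and sorted Python lists searched with bisect are a substitute for the real tree structures; the point is only that one index answers "which free extent is large enough?" and the other answers "which free extent is nearest to this block?".

```python
import bisect


class FreeSpace:
    """Free extents of one allocation group, indexed two ways."""

    def __init__(self, extents):
        # extents: iterable of (start_block, length) pairs
        extents = list(extents)
        self.by_start = sorted(extents)
        self.by_size = sorted((length, start) for start, length in extents)

    def best_fit(self, want):
        """Smallest free extent holding at least `want` blocks, or None."""
        i = bisect.bisect_left(self.by_size, (want, 0))
        if i == len(self.by_size):
            return None
        length, start = self.by_size[i]
        return (start, length)

    def first_at_or_after(self, block):
        """First free extent starting at or after `block`, or None."""
        i = bisect.bisect_left(self.by_start, (block, 0))
        if i == len(self.by_start):
            return None
        return self.by_start[i]


free = FreeSpace([(0, 4), (10, 16), (40, 8)])
print(free.best_fit(5))            # (40, 8)
print(free.first_at_or_after(5))   # (10, 16)
print(free.best_fit(17))           # None
```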

Variable block sizes

The file system block size represents the minimum allocation unit. XFS allows file systems to be created with block sizes ranging between 512 bytes and 64 KB, allowing the file system to be tuned for the expected usage. When many small files are expected, a small block size would typically maximize capacity, but for a system dealing mainly with large files, a larger block size can provide a performance advantage.
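The capacity side of this trade-off follows from rounding each file up to whole blocks. The sketch below uses a hypothetical helper, allocated_bytes, and ignores metadata and any other per-file overhead; it simply compares the space consumed by a 700-byte file at the two extremes of the block size range.

```python
def allocated_bytes(file_size, block_size):
    """Space consumed by one file when allocation is rounded up to whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


small_file = 700  # bytes

print(allocated_bytes(small_file, 512))         # 1024
print(allocated_bytes(small_file, 64 * 1024))   # 65536
```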

Delayed allocation

XFS makes use of lazy evaluation techniques for file allocation. When a file is written to the buffer cache, rather than allocating extents for the data, XFS simply reserves the appropriate number of file system blocks for the data held in memory. The actual block allocation occurs only when the data is finally flushed to disk. This improves the chance that the file will be written in a contiguous group of blocks, reducing fragmentation problems and increasing performance.
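A toy simulation shows why deferring allocation helps. In the sketch below, two files are written in interleaved order, one block at a time. The allocator is a deliberately naive "next free block" counter and all function names are hypothetical; this is not how XFS chooses blocks, only an illustration of the effect of allocating at write time versus at flush time.

```python
def count_extents(blocks):
    """Number of contiguous runs in an ascending list of block numbers."""
    return sum(1 for i, b in enumerate(blocks) if i == 0 or b != blocks[i - 1] + 1)


def allocate_immediately(writes):
    """Assign the next free block at the moment each one-block write arrives."""
    next_free = 0
    layout = {}
    for name in writes:
        layout.setdefault(name, []).append(next_free)
        next_free += 1
    return layout


def allocate_at_flush(writes):
    """Only count blocks per file while writing; assign real blocks at flush."""
    reserved = {}
    for name in writes:
        reserved[name] = reserved.get(name, 0) + 1
    next_free = 0
    layout = {}
    for name, count in reserved.items():
        layout[name] = list(range(next_free, next_free + count))
        next_free += count
    return layout


writes = ["a", "b", "a", "b", "a", "b"]  # two files written in interleaved order
print(count_extents(allocate_immediately(writes)["a"]))  # 3
print(count_extents(allocate_at_flush(writes)["a"]))     # 1
```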

Sparse files

XFS provides a 64-bit sparse address space for each file, which allows both for very large file sizes, and for "holes" within files in which no disk space is allocated. As the file system uses an extent map for each file, the file allocation map size is kept small. Where the size of the allocation map is too large for it to be stored within the inode, the map is moved into a B+ tree which allows for rapid access to data anywhere in the 64-bit address space provided for the file.
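Holes can be created from user space on any file system that supports sparse files, simply by seeking past the end of a file before writing. The sketch below is generic and not specific to XFS; the helper name and the 16 MiB hole are illustrative. Whether the hole actually consumes no disk space depends on the file system holding the temporary directory, so the code prints the allocation figure without asserting a value for it.

```python
import os
import tempfile


def make_sparse_file(path, hole_bytes, payload=b"end"):
    """Write `payload` after a hole of `hole_bytes` bytes; return the stat result."""
    with open(path, "wb") as f:
        f.seek(hole_bytes)   # skip ahead without writing anything
        f.write(payload)
    return os.stat(path)


with tempfile.TemporaryDirectory() as tmp:
    st = make_sparse_file(os.path.join(tmp, "sparse.bin"), 16 * 1024 * 1024)
    apparent_size = st.st_size
    # st_blocks (512-byte units on Linux) is not available on every platform.
    allocated_units = getattr(st, "st_blocks", None)
    print(apparent_size, allocated_units)
```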

Extended attributes

XFS provides multiple data streams for files; this is made possible by its implementation of extended attributes. These allow the storage of a number of name/value pairs attached to a file. Names are NUL-terminated printable character strings of up to 256 bytes in length, while their associated values can contain up to 64 KB of binary data.

They are further subdivided into two namespaces: root and user. Extended attributes stored in the root namespace can be modified only by the superuser, while attributes in the user namespace can be modified by any user with permission to write to the file.

Extended attributes can be attached to any kind of XFS inode, including symbolic links, device nodes, directories, etc. The attr utility can be used to manipulate extended attributes from the command line, and the xfsdump and xfsrestore utilities are aware of extended attributes, and will back up and restore their contents. Most other backup systems do not support working with extended attributes.
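Besides the attr utility, extended attributes can be set programmatically. The sketch below uses Python's os.setxattr and os.getxattr, which are available only on Linux, and assumes the Linux convention of prefixing user-namespace attribute names with "user."; the helper name and the attribute name are hypothetical. The helper returns None when the platform or the underlying file system does not support user attributes, so it is safe to run on any file system.

```python
import os
import tempfile


def roundtrip_xattr(path, name, value):
    """Set and read back an extended attribute; return None if unsupported here."""
    if not hasattr(os, "setxattr"):   # these functions exist only on Linux
        return None
    try:
        os.setxattr(path, name, value)
        return os.getxattr(path, name)
    except OSError:                   # e.g. a file system without user attributes
        return None


with tempfile.NamedTemporaryFile() as f:
    result = roundtrip_xattr(f.name, "user.comment", b"archived 2014")
    print(result)
```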

Direct I/O

For applications requiring high throughput to disk, XFS provides a direct I/O implementation that allows non-cached I/O operations to work directly on user-space buffers. Data is transferred between the buffer of the application and the disk using DMA, which allows access to the full I/O bandwidth of the underlying disk devices.

Guaranteed-rate I/O

The XFS guaranteed-rate I/O system provides an API that allows applications to reserve bandwidth from the filesystem. XFS dynamically calculates the performance available from the underlying storage devices, and will reserve bandwidth sufficient to meet the requested performance for a specified time. This is a feature unique to the XFS file system. Guaranteed rates can be "hard" or "soft", representing a trade-off between reliability and performance; however, XFS will only allow "hard" guarantees if the underlying storage subsystem supports it. This facility is used mostly for real-time applications, such as video streaming.

Guaranteed-rate I/O was only supported under IRIX, and required special hardware for that purpose.[22]

DMAPI

XFS implemented the DMAPI interface to support Hierarchical Storage Management in IRIX. As of October 2010, the Linux implementation of XFS supported the required on-disk metadata for DMAPI implementation, but the kernel support was reportedly not usable. For some time, SGI hosted a kernel tree which included the DMAPI hooks, but this support has not been adequately maintained, although kernel developers have stated an intention to bring this support up to date.[23]

Snapshots

XFS does not yet[24] provide direct support for snapshots, as it currently expects the snapshot process to be implemented by the volume manager. Taking a snapshot of an XFS filesystem involves temporarily halting I/O to the filesystem using the xfs_freeze utility, having the volume manager perform the actual snapshot, and then resuming I/O to continue with normal operations. The snapshot can then be mounted read-only for backup purposes.
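The freeze, snapshot and resume sequence described above can be sketched as a small wrapper. Only xfs_freeze with its -f (freeze) and -u (unfreeze) options is taken as given; the function name, the mount point and the snapshot command are placeholders, and the volume manager's command is supplied by the caller rather than assumed. The run parameter exists so the ordering can be exercised without touching a real file system.

```python
import subprocess


def take_snapshot(mountpoint, snapshot_cmd, run=subprocess.run):
    """Freeze the file system, run the caller's snapshot command, then thaw."""
    run(["xfs_freeze", "-f", mountpoint], check=True)
    try:
        run(snapshot_cmd, check=True)
    finally:
        # Always resume I/O, even if the snapshot command fails.
        run(["xfs_freeze", "-u", mountpoint], check=True)


# Dry run: record the commands instead of executing them.
calls = []
take_snapshot(
    "/srv/data",                                   # hypothetical mount point
    ["volume-manager-snapshot", "placeholder"],    # hypothetical command
    run=lambda cmd, check: calls.append(cmd),
)
print([cmd[0] for cmd in calls])
# ['xfs_freeze', 'volume-manager-snapshot', 'xfs_freeze']
```

The try/finally ensures the file system is thawed on every path, since a file system left frozen would block all further writes.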

Releases of XFS in IRIX incorporated an integrated volume manager called XLV. This volume manager has not been ported to Linux, and XFS works with standard LVM in Linux systems instead.

In recent Linux kernels, the xfs_freeze functionality is implemented in the VFS layer, and is executed automatically when the Volume Manager's snapshot functionality is invoked. This was once a valuable advantage as the ext3 file system could not be suspended[25] and the volume manager was unable to create a consistent "hot" snapshot to back up a heavily busy database.[26] Fortunately this is no longer the case. Since Linux 2.6.29, the file systems ext3, ext4, GFS2 and JFS have the freeze feature as well.[27]

Online defragmentation

Although the extent-based nature of XFS and the delayed allocation strategy it uses significantly improve the file system's resistance to fragmentation problems, XFS provides a filesystem defragmentation utility (xfs_fsr, short for XFS filesystem reorganizer) that can defragment the files on a mounted and active XFS filesystem.[28]

Online resizing

XFS provides the xfs_growfs utility to perform online resizing of XFS file systems. XFS filesystems can be grown so long as there is remaining unallocated space on the device holding the filesystem. This feature is typically used in conjunction with volume management, as otherwise the partition holding the filesystem will need enlarging separately. XFS partitions cannot (as of August 2017) be shrunk in place,[29] although several possible workarounds have been discussed.[30]

Native backup/restore utilities

XFS provides the xfsdump and xfsrestore utilities to aid in the backup of data stored in XFS file systems. The xfsdump utility backs up an XFS filesystem in inode order, and in contrast to traditional UNIX file systems which must be unmounted before dumping to guarantee a consistent dump image, XFS file systems can be dumped while the file system is in use. This is not the same as a snapshot, since files are not frozen during the dump.

XFS dumps and restores are also resumable, and can be interrupted without difficulty. The multi-threaded operation of xfsdump provides high performance of backup operations by splitting the dump into multiple streams, which can be sent to different dump destinations. The multi-stream capabilities have not been fully ported to Linux yet, however.

Atomic disk quotas

Quotas for XFS filesystems are turned on when the filesystem is initially mounted; this closes a race window present in most other filesystems, which must first be mounted and enforce no quotas until quotaon(8) is called.

Performance considerations

Write barriers

XFS filesystems mount with "write barriers" enabled by default. This feature will cause the write back cache of the underlying storage device to be flushed at appropriate times, particularly on write operations to the XFS log. This feature is intended to assure filesystem consistency, and its implementation is device-specific because not all underlying hardware will support cache flush requests.

When an XFS filesystem is used on a logical device provided by a hardware RAID controller with battery backed cache, this feature can slow performance significantly, as the filesystem code is not aware that the cache is nonvolatile, and if the controller honors the flush requests, data will be written to the disk more often than is necessary. To avoid this problem in cases where the data in the device cache is protected from power failure or other host problems, the filesystem can be mounted with the "nobarrier" option.

Journal placement

By default, XFS filesystems are created with an "internal" log, which places the filesystem journal on the same block device as the filesystem data. Filesystem writes are preceded by metadata updates to the journal, which can be a cause of disk contention. Under most workloads, the level of contention caused is too low to impact performance, but random-write heavy workloads, such as those seen on busy database servers, can suffer from less than optimal performance as a result of this I/O contention. An additional factor that may increase the severity of this problem is that writes to the journal are committed synchronously because they must complete successfully before the associated write operation can begin.

Where optimum filesystem performance is required, XFS provides the option of placing the log on a separate physical device, with its own I/O path. This requires little physical space, and if a low-latency path can be provided for synchronous writes, it can greatly improve performance in the operation of the filesystem. The required performance characteristics make this a suitable candidate for the use of a solid-state drive (SSD) device, or a RAID system with write-back cache, though the latter can reduce data safety in the event of power interruptions. The use of an external log requires the filesystem to be mounted with the logdev option, indicating a suitable journal device.

Disadvantages

Historical

References

  1. "What is the maximum number of inodes in Linux filesystems?". 2014-06-17. Retrieved 2016-07-24.
  2. http://oss.sgi.com/projects/xfs/papers/xfs_filesystem_structure.pdf p. 25
  3. https://git.kernel.org/pub/scm/fs/xfs/xfs-documentation.git/tree/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
  4. http://oss.sgi.com/archives/xfs/2014-06/msg00008.html
  5. "Duperemove". github.com. Retrieved 21 August 2016.
  6. "xFS: the extension of EFS - "x" for to-be-determined (but the name stuck)", xfs.org
  7. "Archival Storage System". Nas.nasa.gov. 2013-03-04. Retrieved 2013-04-29.
  8. Smith, Roderick W. (2007). Linux Administrator Street Smarts: A Real World Guide to Linux Certification Skills. Street smarts series. John Wiley & Sons. p. 204. ISBN 9780470116746. Retrieved 2016-03-21. Silicon Graphics (SGI) created its Extents File System (XFS) for its IRIX OS and [...] later donated the code to Linux.
  9. "XFS file system". linux-bible.com. linux-bible.com. Retrieved 2016-03-21. XFS (Extents File System) is a 64-bit, high performance journaling file system for Linux. It was initially created by Silicon Graphics for its IRIX OS, but the code was later donated to Linux.
  10. "Porting XFS to Linux". Olstrans.sourceforge.net. 2000-07-21. Retrieved 2013-04-29.
  11. "Linux kernel 2.4.25 changelog". kernel.org. 2004-02-18. Retrieved 2014-08-14.
  12. Daniel Robbins (January 1, 2002). "Common threads: Advanced filesystem implementor's guide, Part 9, Introducing XFS". Developer Works. IBM. Archived from the original on September 4, 2015. Retrieved November 6, 2011.
  13. Daniel Robbins (April 1, 2002). "Common threads: Advanced filesystem implementor's guide, Part 10, Deploying XFS". Developer Works. IBM. Retrieved November 6, 2011.
  14. "Has FreeBSD 10 Dropped Support For XFS?". Lists.freebsd.org. 2013-10-27. Retrieved 2014-03-30.
  15. "Bug 521173 -xfsprogs is missing in RHEL-5.4". Bug report. Redhat.com. May 24, 2010. Retrieved November 6, 2011.
  16. "Red Hat Enterprise Linux Scalable File System Add-On". redhat.com. Retrieved 2014-05-22.
  17. "Oracle Linux 6 Release Notes". Oracle Corporation. February 2011. Retrieved 2013-04-07. Oracle Linux 6 includes many new features, including [...] XFS [:] Oracle Linux 6 includes XFS as an optional filesystem.
  18. "Red Hat Unveils Red Hat Enterprise Linux 7, Redefining the Enterprise Operating System". Red Hat. 2014-06-10. Retrieved 2014-06-10.
  19. "Bug 250843 -grub-install hangs on xfs". Bug report. Redhat.com. May 4, 2009. Retrieved November 6, 2011.
  20. https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0cbbc422d56668528f6efd1234fe908010284082
  21. "XFS Overview". Silicon Graphics International Corp. 2013-07-02. Retrieved 2013-07-02.
  22. Dave Chinner (July 30, 2012). "Re: Re: realtime section bugs still around". XFS mailing list (Mailing list). SGI. Retrieved April 13, 2014.
  23. Christoph Hellwig (October 3, 2010). "Re: Linux and DMAPI". XFS mailing list (Mailing list). SGI. Retrieved November 6, 2011.
  24. https://lwn.net/Articles/638546/#The%20near%20future
  25. Linux questions about freezing Ext3
  26. Linux questions on LVM snapshots for database backup
  27. Freeze Feature Commit to Linux kernel
  28. Bitubique.com Archived April 1, 2009, at the Wayback Machine.
  29. XFS.org, FAQ
  30. SGI.com
  31. Dave Chinner (December 23, 2010). "Improving Metadata Performance By Reducing Journal Overhead". XFS.org wiki. Retrieved November 6, 2011.
  32. Dave Chinner (May 24, 2010). "Re: PATCH 0/12 xfs: delayed logging V6". xfs mailing list message (Mailing list). Retrieved November 6, 2011.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.