Shared disk file system


A shared disk file system, also known as a cluster file system or SAN file system, is an enterprise storage file system that can be shared (concurrently accessed for reading and writing) by multiple computers. The accessing machines are usually clustered servers, which connect to the underlying block device over an external network, most commonly a storage area network (SAN).

Shared disk file systems are necessary because, with regular file systems, if multiple computers were to mount and access the same physical device concurrently, the data would rapidly become corrupted: nothing prevents two machines from modifying the same part of the file system at the same time. Conventional file locking is no help here, since it operates above the file system level; it can protect individual files against concurrent access, but it offers no protection for the file system structures themselves.
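The failure mode can be illustrated with a toy model. The following sketch (plain Python, not any real file system; the eight-byte "block" and the node function are invented for illustration) has two nodes each perform an uncoordinated read-modify-write of a counter stored in the same block of a shared device, and one of the two updates is silently lost.

    import struct
    import threading
    import time

    shared_block = bytearray(8)   # stands in for one block on the shared disk

    def node():
        # read-modify-write with no cluster-wide coordination
        value = struct.unpack_from(">Q", shared_block, 0)[0]   # read the counter
        time.sleep(0.01)                                       # simulated I/O latency
        struct.pack_into(">Q", shared_block, 0, value + 1)     # write it back

    nodes = [threading.Thread(target=node) for _ in range(2)]
    for t in nodes: t.start()
    for t in nodes: t.join()

    # Two increments were performed, but the block records only one of them.
    print(struct.unpack_from(">Q", shared_block, 0)[0])        # prints 1, not 2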

A shared disk file system extends the file system concept by adding a mechanism for concurrency control. It provides each node accessing the file system with a consistent and serializable view of the file system, avoiding corruption and unintended data loss. Such file systems also usually employ some form of fencing mechanism to prevent data corruption in the case of node failures.
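A minimal sketch of these two ideas (plain Python with invented classes, not any particular distributed lock manager or fencing implementation): a cluster-wide lock serializes updates, and an epoch number acts as a crude fence, so that writes carrying a stale epoch, for example from a node presumed dead, are rejected by the storage layer instead of being applied.

    import threading

    class SharedDisk:
        """Pretend shared block device that checks a fencing epoch on every write."""
        def __init__(self):
            self.epoch = 1                   # bumped whenever a node is fenced
            self.blocks = {}

        def write(self, block_no, data, epoch):
            if epoch != self.epoch:          # fenced node: refuse the I/O
                raise PermissionError("stale epoch: node has been fenced")
            self.blocks[block_no] = data

    disk = SharedDisk()
    cluster_lock = threading.Lock()          # stands in for a cluster-wide lock manager

    # Healthy node: take the cluster-wide lock and write with the current epoch.
    with cluster_lock:
        disk.write(0, b"superblock v2", epoch=disk.epoch)

    # A node that lost contact is fenced: the epoch is bumped, so any I/O it
    # still has in flight is rejected rather than applied to the disk.
    stale_epoch = disk.epoch
    disk.epoch += 1
    try:
        disk.write(0, b"late write from the fenced node", epoch=stale_epoch)
    except PermissionError as reason:
        print("rejected:", reason)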

There are different architectural approaches to a shared disk file system. Some distribute file information across all the servers in a cluster (fully distributed), while others use a centralized metadata server. Both approaches achieve the same result: every server can access all of the data on the shared storage device.
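The centralized-metadata variant can be sketched as follows (plain Python with invented names, not the design of any particular product): clients contact a metadata server only to learn which blocks belong to a file, then read and write those blocks directly on the shared device, so the data path bypasses the server.

    class MetadataServer:
        """Invented, minimal metadata service: maps file names to block numbers."""
        def __init__(self):
            self.next_free = 0
            self.files = {}                  # file name -> list of block numbers

        def allocate(self, name, nblocks):
            blocks = list(range(self.next_free, self.next_free + nblocks))
            self.next_free += nblocks
            self.files[name] = blocks
            return blocks

        def lookup(self, name):
            return self.files[name]

    class SharedDisk:
        """Stands in for the block device every node can reach over the SAN."""
        def __init__(self, nblocks):
            self.blocks = [b""] * nblocks

    mds = MetadataServer()
    disk = SharedDisk(1024)

    # Node A creates a file: metadata traffic goes to the server,
    # while the data itself goes straight to the shared disk.
    for bno, chunk in zip(mds.allocate("results.dat", 2), [b"part-1 ", b"part-2"]):
        disk.blocks[bno] = chunk

    # Node B, attached to the same shared disk, reads it back directly.
    print(b"".join(disk.blocks[bno] for bno in mds.lookup("results.dat")))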

Examples of such file systems include IBM GPFS, Red Hat GFS, SGI CXFS, Lustre, StorNext and the Panasas file system; several of these are compared below.

Comparison of shared file systems

Shared disk file systems were introduced in the early 1980s, predominantly in VAX VMS clusters. They rely on a SAN, usually based on Fibre Channel, iSCSI or InfiniBand technology.

The IBM General Parallel File System (GPFS), PolyServe storage solutions, Silicon Graphics clustered file system (CXFS), Red Hat Global File System (GFS) and TerraScale Technologies TerraFS are all SAN-based shared file systems. The architecture of these file systems mirrors that of a local disk file system. Performance for a single client is good, although concurrent behavior is limited by an architecture that is not optimized for scalability.

These systems offer failover with varying degrees of robustness. GPFS has been successful for clusters of up to a few hundred nodes.

Typically, SAN performance over Fibre Channel is reasonable, but it cannot compete with that of clients using InfiniBand, Quadrics or Myricom networks with their native protocols. To limit the scalability issues encountered by shared disk file systems, systems such as GPFS, CXFS, GFS and PolyServe Matrix Server are often used on an I/O sub-cluster that exports NFS. Isilon Systems offers an appliance for this purpose. Each of the I/O nodes then exports the file system through NFS version 2 or 3.

For NFS version 4, such exports are more complex because of the requirement to manage shared state among the NFS servers. While this layering improves the scalability of NFS, it introduces further performance degradation, and NFS failover is rarely completely transparent to applications. NFS also does not provide full POSIX semantics. A well-tuned Lustre cluster will normally outperform an NFS protocol-based cluster.[1]

Several systems offer novel architectures to address scalability and performance issues. Ibrix offers a symmetric solution, but little is publicly known about its architecture, semantics, and scalability. Panasas offers a server hardware solution, combined with client file system software. It makes use of smart object iSCSI storage devices and a metadata server that can serve multiple file sets. Good scaling and security are achievable, even though all file locking is done by a single metadata server. The Panasas system uses TCP/IP networking. Lustre's architecture is similar, but Lustre is an open-source, software-only solution running on commodity hardware. The Lustre file system has been scaled to 25,000 clients.

Shared file systems compared

Lustre
    License: Open source (GPL)
    Type of solution: Software
    Networks supported: Most networks
    Architecture: Object storage architecture

GPFS
    License: Proprietary
    Type of solution: Software, generally bundled with IBM hardware
    Networks supported: IP, InfiniBand, Federation
    Architecture: Traditional VAX cluster file system architecture

Panasas
    License: Proprietary
    Type of solution: Storage blades with disks
    Networks supported: IP
    Architecture: Object storage architecture with central lock server

StorNext
    License: Proprietary
    Type of solution: Software
    Networks supported: SAN
    Architecture: Traditional VAX cluster file system architecture

Ibrix
    License: Proprietary
    Type of solution: Software
    Networks supported: IP
    Architecture: Unknown

NFS
    License: Clients open, most servers proprietary
    Type of solution: Software and hardware
    Networks supported: IP
    Architecture: Not a cluster file system, but a well-known standard

GFS
    License: GPL (Linux kernel)
    Type of solution: Software
    Networks supported: IP
    Architecture: Cluster file system


References

  1. Cope, Jason, et al. “Shared Parallel Filesystems in Heterogeneous Linux Multi-Cluster Environments”, Proceedings of the 6th LCI International Conference on Linux Clusters: The HPC Revolution, Chapel Hill, North Carolina, April 2005.