Distributed file system
From Wikipedia, the free encyclopedia
- For the Microsoft distributed file system (DFS), see Distributed File System (Microsoft). For the distributed file system from The Open Group (and earlier from IBM), see DCE Distributed File System.
A distributed file system (DFS) is a file system that supports the sharing of files and resources, in the form of persistent storage, over a network. The first file servers were developed in the 1970s, and Sun's Network File System (NFS) became the first widely used distributed file system after its introduction in 1985. Notable distributed file systems besides NFS include the Andrew File System (AFS) and the Common Internet File System (CIFS).
Clients and servers
A file server provides file services to clients. A client interface for a file service is formed by a set of primitive file operations, such as creating a file, deleting a file, reading from a file, and writing to a file. The primary hardware component that a file server controls is a set of local secondary-storage devices on which files are stored, and from which they are retrieved according to the client requests.
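The set of primitive operations described above can be sketched as a minimal in-process server interface. This is an illustrative sketch only, not the API of any real DFS; the class and method names are hypothetical, and an in-memory dictionary stands in for the server's local secondary-storage devices.

```python
# Sketch of the primitive file operations a file-server client
# interface exposes: create, delete, read, write. Hypothetical names;
# a dict stands in for the server's secondary storage.

class FileServer:
    def __init__(self):
        self._storage = {}  # path -> bytes

    def create(self, path):
        if path in self._storage:
            raise FileExistsError(path)
        self._storage[path] = b""

    def delete(self, path):
        del self._storage[path]

    def read(self, path, offset=0, size=-1):
        data = self._storage[path]
        return data[offset:] if size < 0 else data[offset:offset + size]

    def write(self, path, offset, data):
        buf = bytearray(self._storage[path])
        buf[offset:offset + len(data)] = data
        self._storage[path] = bytes(buf)

server = FileServer()
server.create("/docs/note.txt")
server.write("/docs/note.txt", 0, b"hello")
print(server.read("/docs/note.txt"))  # b'hello'
```

In a real DFS the same four operations would travel over the network as remote procedure calls rather than local method calls, but the client-visible interface keeps this shape.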
Distribution
A DFS is a file system whose clients, servers, and storage devices are dispersed among the machines of a distributed system or intranet. Accordingly, service activity has to be carried out across the network, and instead of a single centralized data repository, the system has multiple and independent storage devices. The concrete configuration and implementation of a DFS may vary — in some configurations, servers run on dedicated machines while in others a machine can be both a server and a client. A DFS can be implemented as part of a distributed operating system, or alternatively, by a software layer whose task is to manage the communication between conventional operating systems and file systems. The distinctive features of a DFS are the multiplicity and autonomy of clients and servers in the system.
Transparency
Ideally, a DFS should appear to its users to be a conventional, centralized file system. The multiplicity and dispersion of its servers and storage devices should be made invisible. That is, the client interface used by programs should not distinguish between local and remote files. It is up to the DFS to locate the files and to arrange for the transport of the data.
Performance
The most important performance measurement of a DFS is the amount of time needed to satisfy service requests. In conventional systems, this time consists of a disk-access time and a small amount of CPU-processing time. In a DFS, however, a remote access has the additional overhead attributed to the distributed structure. This overhead includes the time to deliver the request to a server, as well as the time to get the response across the network back to the client. For each direction, in addition to the transfer of the information, there is the CPU overhead of running the communication protocol software. The performance of a DFS can be viewed as another dimension of its transparency: the performance of an ideal DFS would be comparable to that of a conventional file system.
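The cost breakdown above can be made concrete with a back-of-the-envelope model: a remote access pays the local disk and CPU cost plus one network traversal in each direction, plus protocol-processing CPU time per direction. All figures below are hypothetical examples chosen for illustration, not measurements of any real system.

```python
# Toy latency model for local vs. remote file access.
# All millisecond figures are illustrative assumptions.

def local_access_ms(disk_ms=8.0, cpu_ms=0.1):
    # Conventional system: disk access plus a little CPU time.
    return disk_ms + cpu_ms

def remote_access_ms(disk_ms=8.0, cpu_ms=0.1,
                     network_one_way_ms=0.5, protocol_cpu_ms=0.2):
    # Request and response each cross the network once, and the
    # communication protocol software runs for each direction.
    return (disk_ms + cpu_ms
            + 2 * network_one_way_ms   # deliver request, return response
            + 2 * protocol_cpu_ms)     # protocol processing per direction

print(local_access_ms())   # 8.1
print(remote_access_ms())  # 9.5
```

Even with these modest assumed numbers the remote path is measurably slower, which is why techniques such as client-side caching matter for making DFS performance approach that of a local file system.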
Concurrent file updates
A DFS should allow multiple client processes on multiple machines not only to access but also to update the same files. Updates to a file from one client should therefore not interfere with access and updates from other clients. Concurrency control or locking may be built into the file system or provided by an add-on protocol.
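One simple form such an add-on protocol can take is advisory locking via an atomically created lock file: whichever client succeeds in creating it holds the lock. This is a sketch under stated assumptions, not the locking mechanism of any particular DFS, and the function names are hypothetical; real systems must also handle backoff, timeouts, and crashed lock holders.

```python
# Sketch of add-on advisory locking using an atomically created
# lock file (os.O_CREAT | os.O_EXCL). Illustrative only.
import contextlib
import os
import tempfile

@contextlib.contextmanager
def file_lock(path):
    """Hold an advisory lock on `path` by creating `path + '.lock'`.
    Creation with O_CREAT | O_EXCL either succeeds atomically or
    raises FileExistsError, so only one client holds the lock."""
    lock = path + ".lock"
    while True:
        try:
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            # Another client holds the lock; real code would back off,
            # retry with a delay, and eventually time out.
            pass
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock)

path = os.path.join(tempfile.mkdtemp(), "shared.txt")
with file_lock(path):
    with open(path, "a") as f:
        f.write("update from client A\n")
print(open(path).read())
```

Lock-file schemes like this are a classic workaround when the underlying network file system provides no locking of its own, though their atomicity guarantees vary across DFS implementations.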
Distributed data store
A distributed data store is a network in which a user stores their information on a number of peer network nodes. The user usually reciprocates, allowing other users to use their computer as a storage node as well. Information may or may not be accessible to other users, depending on the design of the network.
Most peer-to-peer networks do not have distributed data stores, in that a user's data is available only while their node is on the network. This distinction is somewhat blurred in a system such as BitTorrent, where it is possible for the originating node to go offline while the content continues to be served. Still, this is only the case for individual files requested by the redistributors, in contrast with a network such as Freenet, where all computers are made available to serve all files.
Distributed datastore networks
- 9P
- Amazon S3
- Andrew File System (AFS) distributed filesystem
- BitTorrent
- Ceph
- Chord project
- Coda distributed filesystem
- DCE/DFS
- GNUnet
- Google File System
- Freenet
- Hadoop Distributed File System
- Lustre
- Microsoft Distributed File System
- Mnet
- Groove shared workspace, used for DoHyki.
- NNTP (the distributed data storage protocol used for Usenet news)
- Mnesia database (http://www.erlang.org/~hakan/mnesia_overview.pdf, http://c2.com/cgi/wiki?MnesiaDatabase)
- Secure File System (SFS) (http://elbe.borg.umn.edu/; not to be confused with the Self-certifying File System (SFS), http://fs.net/)
- SVK distributed version control (http://www.bieberlabs.com/wordpress/svk-tutorials, http://svk.elixus.org/)
External links
- A distributed file system for distributed conferencing system ("A DFS for the DCS") by Philip S. Yeager, thesis, University of Florida, 2003 (PDF)