GridFTP

GridFTP is an extension of the standard File Transfer Protocol (FTP) for high-speed, reliable, and secure data transfer.[1] The protocol was defined within the GridFTP working group of the Open Grid Forum. There are multiple implementations of the protocol; the most widely used is the one provided by the Globus Toolkit.

The aim of GridFTP is to provide more reliable, higher-performance file transfer, for example to enable the transmission of very large files. GridFTP is used extensively within large science projects such as the Large Hadron Collider and by many supercomputer centers and other scientific facilities.

GridFTP also addresses the problem of incompatibility between storage and access systems. Previously, each data provider made its data available in its own way, typically through a provider-specific library of access functions. This made it difficult to obtain data from multiple sources, since each required a different access method, and it effectively split the total available data into separate partitions. GridFTP provides a uniform way of accessing data that encompasses the functions of these different access modes, building on and extending the widely adopted FTP standard. FTP was chosen as the basis because of its widespread use and because it has a well-defined architecture for protocol extensions (which may be discovered dynamically).

Numerous GridFTP clients have been developed. The Globus Online software-as-a-service system is particularly popular.

Features of GridFTP

GridFTP is useful for a number of reasons, including faster transfers and built-in security. It achieves these through the following extensions to standard FTP.

Security with GSI

GSI (Grid Security Infrastructure) provides authentication and encryption for file transfers, with user-specified levels of confidentiality and data integrity. FTP itself is inherently insecure: it sends credentials and data in cleartext, leaving it open to packet sniffing and eavesdropping, and it has traditionally relied on external mechanisms such as SSH and SSL for security.

Third party transfers

A useful feature of FTP is that it allows a local client to initiate a transfer directly between two remote servers, without the data passing through the client. GridFTP builds on this and adds security and authentication for the local initiator. In FTP terminology, this feature is similar to the File eXchange Protocol (FXP).
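The control-channel sequence behind a third-party transfer can be sketched with plain RFC 959 commands: the client asks the destination server to listen (PASV), tells the source server to connect to that address (PORT), then issues STOR to the destination and RETR to the source. This is a minimal sketch under stated assumptions. It omits the GSI authentication GridFTP adds, the actual network I/O, and GridFTP's own extended commands for striped setups. The function names and file paths are hypothetical.

```python
import re
from typing import List, Tuple

# Matches the "(h1,h2,h3,h4,p1,p2)" tuple in an RFC 959 "227" passive-mode reply.
_PASV_TUPLE = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Return (host, port) from a reply such as
    '227 Entering Passive Mode (192,168,1,10,195,80)'."""
    match = _PASV_TUPLE.search(reply)
    if match is None or not reply.startswith("227"):
        raise ValueError(f"not a passive-mode reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(g) for g in match.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


def third_party_plan(pasv_reply: str, src_path: str, dst_path: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: the (server, command) pairs the client sends
    after it has sent PASV to the destination and received pasv_reply."""
    host, port = parse_pasv_reply(pasv_reply)
    # PORT takes the same six comma-separated numbers that PASV returned.
    port_arg = ",".join(host.split(".") + [str(port // 256), str(port % 256)])
    return [
        ("source", f"PORT {port_arg}"),       # source will connect to the destination
        ("destination", f"STOR {dst_path}"),  # destination accepts and writes the file
        ("source", f"RETR {src_path}"),       # source reads and sends the file
    ]


if __name__ == "__main__":
    reply = "227 Entering Passive Mode (10,0,0,5,4,1)"
    for server, command in third_party_plan(reply, "/data/in.dat", "/data/out.dat"):
        print(server, command)
```

Because the data connection runs between the two servers, the client only exchanges short control messages, which is what makes this useful when the client sits on a slow link.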

Parallel and striped transfer

GridFTP makes much greater use of available bandwidth by allowing multiple simultaneous TCP streams. A file can be downloaded in pieces simultaneously from multiple sources, or in separate parallel streams from a single source, which still uses the bandwidth better than a single stream does. Striped and interleaved transfers, again from either multiple or single sources, allow further speed increases.
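Parallel streams work because each block of data is labelled with its position in the file, so blocks can arrive on any stream in any order and still be written to the right place; GridFTP's extended block mode (MODE E) tags each block with its offset for this reason. The sketch below is a simplified, in-memory illustration under that assumption: it deals fixed-size blocks round-robin across a chosen number of streams and reassembles them by offset. The function names and block size are illustrative, not part of any GridFTP API, and no real sockets are involved.

```python
from typing import List, Tuple

Block = Tuple[int, bytes]  # (byte offset in the file, payload)


def split_into_streams(data: bytes, num_streams: int, block_size: int) -> List[List[Block]]:
    """Cut data into fixed-size blocks and deal them round-robin to streams."""
    if num_streams < 1 or block_size < 1:
        raise ValueError("num_streams and block_size must be positive")
    streams: List[List[Block]] = [[] for _ in range(num_streams)]
    for index, offset in enumerate(range(0, len(data), block_size)):
        streams[index % num_streams].append((offset, data[offset:offset + block_size]))
    return streams


def reassemble(blocks: List[Block], total_size: int) -> bytes:
    """Write each block into place by offset, so arrival order does not matter."""
    buffer = bytearray(total_size)
    for offset, payload in blocks:
        buffer[offset:offset + len(payload)] = payload
    return bytes(buffer)


if __name__ == "__main__":
    original = bytes(range(256)) * 40  # 10,240 bytes of sample data
    streams = split_into_streams(original, num_streams=4, block_size=1000)
    # Simulate out-of-order arrival by draining the streams last-to-first.
    arrived = [block for stream in reversed(streams) for block in stream]
    assert reassemble(arrived, len(original)) == original
    print([len(s) for s in streams])  # [3, 3, 3, 2]
```

A striped transfer follows the same idea, except that the block lists are served by different hosts rather than by different connections to one host.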

Partial file transfer

Although FTP has the ability to resume an interrupted file transfer from a specific point in a file, it does not support the transmission of only a certain portion of a file. GridFTP allows a subset of a file to be sent. Such a feature is useful in applications where only small sections of a very large data file are required for processing (a motivating example being the processing of data from a high energy physics experiment, a traditional use of Grid technology).
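On the serving side, a partial transfer amounts to sending one byte range of a file. The sketch below streams the range in bounded chunks, so that a small slice of a very large file can be served without loading it into memory. It is a minimal illustration assuming a seekable binary file object; the function name and the 1 MiB default chunk size are hypothetical choices, and the GridFTP command that carries the offset and length is not modelled.

```python
import io
from typing import BinaryIO, Iterator


def iter_range(f: BinaryIO, offset: int, length: int, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the bytes in [offset, offset + length) in chunks of at most
    chunk_size, stopping early if the file ends before the range does."""
    if offset < 0 or length < 0 or chunk_size < 1:
        raise ValueError("offset and length must be non-negative, chunk_size positive")
    f.seek(offset)
    remaining = length
    while remaining > 0:
        chunk = f.read(min(chunk_size, remaining))
        if not chunk:  # end of file reached inside the requested range
            break
        remaining -= len(chunk)
        yield chunk


if __name__ == "__main__":
    sample = io.BytesIO(bytes(range(256)))
    piece = b"".join(iter_range(sample, offset=10, length=20, chunk_size=7))
    assert piece == bytes(range(10, 30))
```

Because the function is a generator, the argument checks run on first iteration rather than at call time.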

Fault tolerance and restart

GridFTP provides a fault-tolerant implementation of FTP that handles network unavailability and server problems. Transfers can also be restarted automatically if a problem occurs.
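When blocks arrive out of order over parallel streams, a single "resume from byte N" point is not enough to restart a transfer; the receiver must know which byte ranges it already holds. The sketch below shows one way to do this bookkeeping, under the assumption that received data is recorded as half-open ranges: it merges the ranges and computes the gaps that still need to be requested after a failure. The function names are hypothetical and this is not GridFTP's restart-marker format.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open byte range [start, end)


def merge_ranges(ranges: List[Range]) -> List[Range]:
    """Sort ranges and coalesce any that overlap or touch."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def missing_ranges(received: List[Range], total_size: int) -> List[Range]:
    """Return the ranges of [0, total_size) not yet covered by `received`."""
    gaps: List[Range] = []
    cursor = 0
    for start, end in merge_ranges(received):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_size:
        gaps.append((cursor, total_size))
    return gaps


if __name__ == "__main__":
    got = [(0, 100), (200, 300), (100, 150)]  # ranges received before a failure
    print(merge_ranges(got))         # [(0, 150), (200, 300)]
    print(missing_ranges(got, 400))  # [(150, 200), (300, 400)]
```

Only the gaps need to be re-sent after a restart, which matters when most of a multi-gigabyte file has already arrived.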

Automatic TCP optimisation

The underlying TCP connection in FTP has numerous settings, such as the TCP window size and the socket buffer sizes. GridFTP allows automatic (or manual) negotiation of these settings to provide optimal transfer speed and reliability; the best settings are likely to differ between transfers of large files and transfers of large groups of files.
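A common rule of thumb for sizing TCP buffers on a long, fast path is the bandwidth-delay product (BDP): to keep the link full, a sender needs roughly bandwidth × round-trip time bytes in flight. The sketch below computes the BDP, splits it across parallel streams (a simple heuristic, assumed here rather than taken from GridFTP), and requests that buffer size on a socket using the standard `SO_SNDBUF`/`SO_RCVBUF` options. The link figures are illustrative, and the operating system may adjust the requested size.

```python
import socket


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: (bits/s * seconds) / 8."""
    return bandwidth_bps * rtt_ms // (8 * 1000)


def per_stream_buffer(bandwidth_bps: int, rtt_ms: int, streams: int) -> int:
    """Split the path's BDP evenly across parallel streams, rounding up."""
    if streams < 1:
        raise ValueError("streams must be positive")
    return -(-bdp_bytes(bandwidth_bps, rtt_ms) // streams)  # integer ceiling


def apply_buffer_size(sock: socket.socket, size: int) -> None:
    """Request send/receive buffers of `size` bytes. The OS may clamp or
    adjust the value (Linux, for example, doubles it and caps it at a
    system-wide maximum), so read it back with getsockopt if it matters."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)


if __name__ == "__main__":
    # Illustrative 10 Gbit/s path with a 50 ms round-trip time.
    print(bdp_bytes(10_000_000_000, 50))             # 62500000
    print(per_stream_buffer(10_000_000_000, 50, 4))  # 15625000
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        apply_buffer_size(s, 1 << 20)
        print(s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```

The example also shows why per-stream tuning interacts with parallelism: four streams each need only about a quarter of the buffer that a single stream would need to fill the same path.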

References

  1. Allcock, W.; Bresnahan, J.; Kettimuthu, R.; Link, M. (2005). "The Globus Striped GridFTP Framework and Server". ACM/IEEE SC 2005 Conference (SC'05). p. 54. doi:10.1109/SC.2005.72. ISBN 1-59593-061-2.
