Rsync

From Wikipedia, the free encyclopedia

rsync
Developer: Wayne Davison
Latest release: 2.6.9 / November 6th, 2006
OS: Cross-platform
Use: Data transfer, differential backup
License: GNU GPL
Website: rsync.samba.org
The correct title of this article is rsync. The initial letter is shown capitalized due to technical restrictions.

rsync is a free software computer program for Unix systems that synchronizes files and directories from one location to another while minimizing data transfer, using delta encoding when appropriate. An important feature of rsync not found in most similar programs and protocols is that the mirroring takes place with only one transmission in each direction.

rsync can copy or display directory contents and copy files, optionally using compression and recursion.

rsyncd, the rsync protocol daemon, uses TCP port 873 by default. rsync can also be used to synchronize local directories, or to synchronize with a remote host via a remote shell such as RSH or SSH. In the latter case, the rsync executable must be installed on both the local host and the remote host (the computer running the remote shell daemon).


Algorithm

The rsync utility uses an algorithm (invented by Australian computer programmer Andrew Tridgell) for efficiently transmitting a structure (such as a file) across a communications link when the receiving computer already has a different version of the same structure.

The recipient splits its copy of the file into fixed-size non-overlapping chunks, say of size S, and computes two checksums for each chunk: the MD4 hash, and a weaker 'rolling checksum'. It sends these checksums to the sender.
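As a minimal sketch of the recipient's side, the Python code below splits a byte string into non-overlapping blocks of size S and computes a weak and a strong checksum for each block. It is an illustration, not rsync's source: MD5 stands in for MD4 (many Python builds no longer expose MD4 through `hashlib`), and the weak checksum assumes the two-part, 16-bit form commonly described for rsync. The names `weak_checksum`, `strong_checksum` and `signature` are hypothetical.

```python
import hashlib

M = 1 << 16  # assumed: each half of the weak checksum is kept modulo 2**16

def weak_checksum(block: bytes) -> int:
    """a = plain sum of the bytes; b = position-weighted sum in which the first
    byte counts len(block) times and the last byte once; packed as a + 2**16 * b."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a + (b << 16)

def strong_checksum(block: bytes) -> bytes:
    # MD5 stands in for MD4, which many hashlib builds no longer provide.
    return hashlib.md5(block).digest()

def signature(data: bytes, S: int) -> list[tuple[int, bytes]]:
    """Split data into non-overlapping blocks of size S (the last one may be
    shorter) and return one (weak, strong) pair per block, in file order."""
    blocks = [data[i:i + S] for i in range(0, len(data), S)]
    return [(weak_checksum(blk), strong_checksum(blk)) for blk in blocks]
```

The list returned by `signature` is what the recipient would send to the sender; it is much smaller than the file itself because each block of S bytes is summarized by one fixed-size pair.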

The sender computes the rolling checksum for every chunk of size S in its own version of the file, including overlapping chunks (that is, a chunk starting at every byte offset). It can do this efficiently because of a special property of the rolling checksum: if the rolling checksum of bytes n through n + S − 1 is R, the rolling checksum of bytes n + 1 through n + S can be computed from R, byte n, and byte n + S, without examining the intervening bytes. Thus, if the rolling checksum of bytes 1–25 has already been calculated, the rolling checksum of bytes 2–26 can be calculated solely from the previous checksum and from bytes 1 and 26.
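The rolling property can be written out explicitly for the weak checksum as it is commonly described for rsync (the exact constants are stated here as an assumption). With byte values X_i, a window running from byte k to byte l (so k = n, l = n + S − 1, and l − k + 1 = S), M = 2^16, and R = s(k, l):

```latex
\begin{aligned}
a(k,l) &= \Bigl(\textstyle\sum_{i=k}^{l} X_i\Bigr) \bmod M \\
b(k,l) &= \Bigl(\textstyle\sum_{i=k}^{l} (l-i+1)\,X_i\Bigr) \bmod M \\
s(k,l) &= a(k,l) + 2^{16}\,b(k,l) \\[4pt]
a(k+1,l+1) &= \bigl(a(k,l) - X_k + X_{l+1}\bigr) \bmod M \\
b(k+1,l+1) &= \bigl(b(k,l) - (l-k+1)\,X_k + a(k+1,l+1)\bigr) \bmod M
\end{aligned}
```

In the example of bytes 1–25 (k = 1, l = 25), moving to bytes 2–26 needs only X_1, X_26 and the previous values of a and b. Because the update of b reuses the freshly updated a, each one-byte slide costs a constant amount of work regardless of S.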

The rolling checksum used in rsync is based on Mark Adler's Adler-32 checksum, which is used in zlib and which is itself based on Fletcher's checksum.
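To illustrate the relationship, the sketch below computes standard Adler-32 by hand (checked against Python's `zlib.adler32`) next to an rsync-style variant that keeps the same running-sum structure but starts the first sum at 0 and reduces modulo 2^16 instead of the prime 65521. The variant's constants are an assumption based on common descriptions of rsync, not taken from its source; the function names are hypothetical.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16, used by Adler-32

def adler32(data: bytes) -> int:
    """Standard Adler-32: A starts at 1, B accumulates every running A value."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    return (b << 16) | a

def rsync_style_weak(data: bytes) -> int:
    """Same Fletcher-style structure, but A starts at 0 and both sums are
    reduced mod 2**16 (assumed rsync-style constants)."""
    a, b = 0, 0
    for byte in data:
        a = (a + byte) & 0xFFFF
        b = (b + a) & 0xFFFF
    return (b << 16) | a

print(hex(adler32(b"Wikipedia")))  # 0x11e60398
```

Because B is the sum of the running values of A, each byte is effectively weighted by how many positions remain after it, which is exactly the weighting that makes the one-byte rolling update possible.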

The sender then compares its rolling checksums with the set sent by the recipient to determine whether any matches exist. If they do, it verifies each match by computing the MD4 checksum of the matching block and comparing it with the MD4 checksum sent by the recipient.
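The sender's search can be sketched as follows. This is an illustration under stated assumptions rather than rsync's implementation: the weak checksum is an assumed two-part, 16-bit form that is rolled one byte at a time; every weak hit is confirmed with a strong hash (MD5 standing in for MD4); after a confirmed match the window jumps a whole block ahead; and the recipient's short final block, if any, is not indexed. `find_matches` and its return format are hypothetical.

```python
import hashlib

def weak(block: bytes) -> tuple[int, int]:
    """(a, b) halves of an assumed rsync-style weak checksum, each mod 2**16."""
    a = sum(block) & 0xFFFF
    b = sum((len(block) - i) * x for i, x in enumerate(block)) & 0xFFFF
    return a, b

def strong(block: bytes) -> bytes:
    return hashlib.md5(block).digest()  # MD5 stands in for MD4

def find_matches(new: bytes, old: bytes, S: int) -> list[tuple[int, int]]:
    """Return (offset in new, block index in old) for every verified match."""
    # Recipient's signature, indexed by weak checksum (full-size blocks only).
    table: dict[tuple[int, int], list[tuple[int, bytes]]] = {}
    for idx in range(len(old) // S):
        blk = old[idx * S:(idx + 1) * S]
        table.setdefault(weak(blk), []).append((idx, strong(blk)))

    matches: list[tuple[int, int]] = []
    if len(new) < S:
        return matches
    pos = 0
    a, b = weak(new[:S])
    while True:
        hit = None
        candidates = table.get((a, b))
        if candidates:
            digest = strong(new[pos:pos + S])  # confirm the weak hit
            hit = next((idx for idx, d in candidates if d == digest), None)
        if hit is not None:
            matches.append((pos, hit))
            pos += S  # skip past the matched block
            if pos + S > len(new):
                break
            a, b = weak(new[pos:pos + S])  # start a fresh window
        else:
            if pos + S >= len(new):
                break  # no more bytes to roll in
            out_byte, in_byte = new[pos], new[pos + S]
            a = (a - out_byte + in_byte) & 0xFFFF  # rolling update of a
            b = (b - S * out_byte + a) & 0xFFFF    # rolling update of b
            pos += 1
    return matches
```

For example, if the sender's file is the recipient's file with two extra bytes at the front, every block is still found, just two bytes later than its original position.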

The sender then sends the recipient those parts of its file that did not match any of the recipient's blocks, along with assembly instructions for merging these parts with the recipient's existing blocks to create a file identical to the sender's copy.
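The reconstruction step can be sketched with a hypothetical instruction format: an integer means "copy block i of the recipient's existing file" and a byte string means "insert these literal bytes". Here `make_delta` turns a sorted list of (offset in the sender's file, block index in the recipient's file) matches into such instructions, and `apply_delta` rebuilds the sender's file. Neither the format nor the function names come from rsync, whose wire protocol is more compact.

```python
from typing import Union

# Hypothetical instruction format: int = copy that block of the recipient's
# file; bytes = literal data the recipient does not have.
Instruction = Union[int, bytes]

def make_delta(new: bytes, matches: list[tuple[int, int]], block_size: int) -> list[Instruction]:
    """Turn sorted (offset in new, block index in old) matches into instructions.
    Assumes every matched block is a full block of block_size bytes."""
    delta: list[Instruction] = []
    pos = 0
    for offset, idx in matches:
        if offset > pos:
            delta.append(new[pos:offset])  # unmatched bytes are sent literally
        delta.append(idx)
        pos = offset + block_size
    if pos < len(new):
        delta.append(new[pos:])
    return delta

def apply_delta(old: bytes, block_size: int, delta: list[Instruction]) -> bytes:
    """Rebuild the sender's file from the recipient's old copy plus the delta."""
    out = bytearray()
    for op in delta:
        if isinstance(op, int):
            start = op * block_size
            out += old[start:start + block_size]  # block the recipient already has
        else:
            out += op  # literal bytes sent by the sender
    return bytes(out)
```

With the recipient holding `b"the quick brown fox"` and blocks of 5 bytes, the sender's `b"A the quick red fox"` needs only the literals `b"A "` and `b"red fox"` plus references to blocks 0 and 1.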

If the sender's and recipient's versions of the file have many sections in common, the utility needs to transfer relatively little data to synchronize the files.

While the rsync algorithm forms the heart of the rsync application, which essentially optimizes transfers between two computers over TCP/IP, the application also supports other key features that significantly aid data transfer and backup. These include block-by-block compression and decompression of data using zlib at the sending and receiving ends, respectively, and support for protocols such as ssh, which enables encrypted transmission of the compressed, incremental data produced by the rsync algorithm. Instead of ssh, stunnel can also be used to create an encrypted tunnel to secure the transmitted data.
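Block-by-block compression over a single stream can be sketched with Python's `zlib` module. In this sketch (not rsync's actual wire format), the sender flushes one shared deflate stream with `Z_SYNC_FLUSH` after each block, so the receiver can decompress each chunk as soon as it arrives, while later blocks can still use earlier data as compression context. The function names are hypothetical.

```python
import zlib

def compress_blocks(blocks: list[bytes]) -> list[bytes]:
    """Compress each block on one shared deflate stream, flushing after every
    block so that each chunk is independently decodable on arrival."""
    comp = zlib.compressobj()
    return [comp.compress(blk) + comp.flush(zlib.Z_SYNC_FLUSH) for blk in blocks]

def decompress_blocks(chunks: list[bytes]) -> list[bytes]:
    """Feed chunks, in order, to one shared inflate stream."""
    decomp = zlib.decompressobj()
    return [decomp.decompress(chunk) for chunk in chunks]
```

A sync flush (rather than a full flush) keeps the compression dictionary, which is why repeated content across blocks still compresses well.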

One of the earliest applications of rsync was to implement mirroring or backup of multiple Unix clients onto a central Unix server, using rsync over ssh and standard Unix accounts. With a scheduling utility such as cron, one can even schedule automated, encrypted rsync-based mirroring between multiple host computers and a central server.

Variations

rdiff and rdiff-backup

There also exists a utility called rdiff, which uses the rsync algorithm to generate delta files with the difference from file A to file B (like the utility diff, but in a different delta format). The delta file can then be applied to file A, turning it into file B (similar to the patch utility).

Unlike diff, creating a delta file takes two steps: first, a signature file is created from file A; then this (relatively small) signature and file B are used to create the delta file. Also unlike diff, rdiff works well with binary files.

A utility called rdiff-backup has been built using rdiff; it can maintain a backup mirror of a file or directory on another server over the network. rdiff-backup stores incremental rdiff deltas with the backup, from which any backup point can be recreated.

duplicity

duplicity is a variation on rdiff-backup that allows for backups without cooperation from the storage server, as with simple storage services like Amazon S3. It works by generating the hashes for each block in advance, encrypting them, and storing them on the server, then retrieving them when doing an incremental backup. The rest of the data is also stored encrypted for security purposes.

rsyncX and rsyncXCD

There is a special version of rsync for the Mac OS X file system, rsyncX, which can transfer resource forks. This feature is not currently supported by rsync itself, nor by most other Unix programs, although as of Mac OS X 10.4, Apple has updated its bundled version of rsync to provide this functionality.

rsyncXCD is another variant of rsync, which is able to make a bootable partition.

Windows

Since rsync was designed for Unix, Linux and BSD systems, running it on Microsoft Windows requires either the Cygwin package or Microsoft's SFU (Services for Unix) package to provide the expected system interfaces. A few packages are available that bundle rsync, Cygwin, and an installer, making rsync easier to install and more familiar to Windows users. These include:

Packages based on Cygwin rsync have two limitations: Cygwin is not yet Unicode-aware, and file paths are limited to a maximum length of 255 characters.

Novell NetWare

Novell maintains its own port of rsync for the NetWare operating system in a Subversion repository at Novell Forge.

Platform Independent

Vivian De Smedt has made available a Python script that imitates rsync's behaviour on local system folders. It runs on any platform with Python installed.
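For comparison, a minimal one-way local mirror can be written in a few lines of Python. This is an illustrative sketch, not Vivian De Smedt's script: it walks the source tree, skips files whose size and whole-second modification time already match at the destination (similar in spirit to rsync's default "quick check"), and copies everything else with `shutil.copy2`, which also preserves timestamps. It does not delete extra files at the destination or transfer deltas; `sync_dirs` is a hypothetical name.

```python
import os
import shutil
from pathlib import Path

def sync_dirs(src: str, dst: str) -> list[str]:
    """One-way copy of src into dst; returns the relative paths that were copied."""
    src_root, dst_root = Path(src), Path(dst)
    copied: list[str] = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = Path(dirpath).relative_to(src_root)
        (dst_root / rel_dir).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            s_path = Path(dirpath) / name
            d_path = dst_root / rel_dir / name
            s_stat = s_path.stat()
            if d_path.exists():
                d_stat = d_path.stat()
                # Quick check: same size and same whole-second mtime means "unchanged".
                if (d_stat.st_size == s_stat.st_size
                        and int(d_stat.st_mtime) == int(s_stat.st_mtime)):
                    continue
            shutil.copy2(s_path, d_path)  # copy2 also copies the modification time
            copied.append((rel_dir / name).as_posix())
    return copied
```

Running `sync_dirs` twice in a row should copy nothing the second time, because the first run left matching sizes and timestamps behind.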

See also

External links