Data synchronization

From Wikipedia, the free encyclopedia

Data synchronization is the process of establishing consistency among data on remote sources. It is fundamental to a wide variety of applications, including file synchronization,[1] Personal Digital Assistant synchronization,[2] and Public Key Server synchronization.[3]

Contents

[edit] Theoretical models

Several theoretical models of data synchronization exist in the research literature, and the problem is also related to problem of Slepian-Wolf coding in information theory. The models are classified based on how they consider the data to be synchronized.

[edit] Unordered data

The problem of synchronizing unordered data (also known as the set reconciliation problem) is modeled as an attempt to compute the symmetric difference S_A \oplus S_B = (S_A - S_B) \cup (S_B - S_A) between two remote sets SA and SB of b-bit numbers.[4] Some solutions to this problem are typified by:

Wholesale transfer
In this case all data is transferred to one host for a local comparison.
Timestamp synchronization
In this case all changes to the data are marked with timestamps. Synchronization proceeds by transferring all data with a timestamp later than the previous synchronization.[5]
Mathematical synchronization
In this case data are treated as mathematical objects and synchronization corresponds to a mathematical process.[6][7][8]

[edit] Ordered data

In this case, two remote strings σA and σB need to be reconcilied. Typically, it is assumed that these strings differ by up to a fixed number of edits (i.e. character insertions, deletions, or modifications). Some solution approaches to this problem include:

  1. rsync
  2. shingling - splitting the strings into shingles in order to reduce this problem into an unordered synchronization problem.[9]

[edit] Notes

  1. ^ A. Tridgell. Efficient algorithms for sorting and synchronization. PhD thesis, The Australian National University, 2000.
  2. ^ S. Agarwal, D. Starobinski, and A. Trachtenberg, On the Scalability of Data Synchronization Protocols for PDAs and Mobile Devices, IEEE Network Magazine: Scalability in Communication Networks, July 2002.
  3. ^ sks.dnsalias.net
  4. ^ Y. Minsky, A. Trachtenberg, and R. Zippel. Set reconciliation with nearly optimal communication complexity. IEEE Trans. on Info. Theory, September 2003.
  5. ^ Palm developer knowledgebase manuals
  6. ^ Y. Minsky, A. Trachtenberg, and R. Zippel. Set reconciliation with nearly optimal communication complexity. IEEE Trans. on Info. Theory, September 2003.
  7. ^ A. Trachtenberg, D. Starobinski, and S. Agarwal, Fast PDA Synchronization Using Characteristic Polynomial Interpolation, IEEE INFOCOM 2002
  8. ^ Y. Minsky and A. Trachtenberg, Scalable set reconciliation, Allerton Conference on Communication, Control, and Computing, Oct. 2002
  9. ^ S. Agarwal, V. Chauhan, and A. Trachtenberg. Bandwidth efficient string reconciliation using puzzles. IEEE Transactions on Parallel and Distributed Systems, 17(11):1217{1226, November 2006.