Replication (computer science)

From Wikipedia, the free encyclopedia

Replication is the use of redundant resources, such as software or hardware components, to improve reliability, fault tolerance, or performance. It can take the form of replication in space, in which the same data is stored on multiple storage devices or the same computing task is executed on multiple devices, or replication in time, in which a computing task is executed repeatedly on a single device.
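A minimal sketch of replication in time, assuming the task is deterministic and returns a hashable result. The task is run several times on one device, and a majority vote masks a transient fault that corrupts a minority of the runs. The function name `replicate_in_time` and the strict-majority rule are illustrative choices, not a standard API.

```python
from collections import Counter
from typing import Callable, Hashable, TypeVar

T = TypeVar("T", bound=Hashable)

def replicate_in_time(task: Callable[[], T], runs: int = 3) -> T:
    """Execute `task` `runs` times on this device and return the majority result."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Replication in time: the same computation, repeated sequentially.
    results = [task() for _ in range(runs)]
    value, count = Counter(results).most_common(1)[0]
    # A strict majority is required; otherwise the fault cannot be masked.
    if count <= runs // 2:
        raise RuntimeError("no majority among repeated executions")
    return value

# Usage: the second run returns a wrong value (a simulated transient fault).
readings = iter([42, 41, 42])
print(replicate_in_time(lambda: next(readings)))
```

Replication in space would apply the same vote to results from several devices instead of several runs on one device.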

Replication in distributed systems

There are two main approaches to replication in distributed systems: active and passive replication. In active replication, also known as state machine replication, every replica processes each request. In passive replication, each request is usually processed by a single replica, and the resulting state is then transferred to the other replicas. If a single designated machine processes all requests, the arrangement is called the primary-backup scheme. If, on the other hand, any machine can process a request, the arrangement is a multi-primary scheme, which requires some form of distributed concurrency control.
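The difference between the two approaches can be sketched with a toy deterministic state machine whose state is one integer. This is an illustration under stated assumptions: `Replica`, `active_replication`, and `passive_replication` are hypothetical names, and a plain loop stands in for the totally ordered delivery (for example, atomic broadcast) that real active replication needs so that every replica sees requests in the same order.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Replica:
    """A deterministic state machine whose entire state is one integer."""
    value: int = 0

    def apply(self, request: int) -> None:
        # Deterministic transition: same requests in same order -> same state.
        self.value += request

def active_replication(replicas: List[Replica], requests: Iterable[int]) -> None:
    # Every replica executes every request, in the same order.
    for request in requests:
        for replica in replicas:
            replica.apply(request)

def passive_replication(primary: Replica, backups: List[Replica],
                        requests: Iterable[int]) -> None:
    # Only the primary executes requests; backups receive the resulting state.
    for request in requests:
        primary.apply(request)
        for backup in backups:
            backup.value = primary.value  # state transfer, not re-execution
```

Active replication requires determinism but no state transfer; passive replication tolerates nondeterministic processing on the primary at the cost of shipping state.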

Database replication

Database replication is supported by many database management systems, usually with a master/slave relationship between the original and the copies. The master logs updates, which then propagate to the slaves. Each slave acknowledges that it has received an update successfully, which allows the master to send subsequent updates (re-sending an update until it has been applied). See also Coda and RAID. Multi-master replication, in which updates can be submitted to any database node and then propagate to the other servers, is often desirable, but it substantially increases cost and complexity, which may make it impractical in some situations.
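A minimal in-memory sketch of the master/slave scheme described above, assuming a one-update-at-a-time protocol with sequence numbers. `Master`, `Slave`, `ack_loss_rate`, and `max_retries` are hypothetical names; real database systems use their own log formats and transports. Lost acknowledgements are simulated, so the master re-sends, and the slave recognises duplicates instead of applying them twice.

```python
import random
from typing import List, Optional

class Slave:
    def __init__(self, ack_loss_rate: float = 0.0, seed: int = 0) -> None:
        self.applied: List[str] = []         # updates applied so far, in log order
        self.ack_loss_rate = ack_loss_rate
        self._rng = random.Random(seed)

    def receive(self, seq: int, update: str) -> Optional[int]:
        """Apply update number `seq` and return `seq` as an acknowledgement, or None."""
        expected = len(self.applied)
        if seq > expected:
            return None                      # gap: an earlier update is missing
        if seq == expected:
            self.applied.append(update)      # next expected update: apply it
        # If seq < expected, this is a re-sent duplicate and is not reapplied.
        if self._rng.random() < self.ack_loss_rate:
            return None                      # simulate a lost acknowledgement
        return seq

class Master:
    def __init__(self, slaves: List[Slave], max_retries: int = 10) -> None:
        self.log: List[str] = []
        self.slaves = slaves
        self.max_retries = max_retries

    def write(self, update: str) -> None:
        seq = len(self.log)
        self.log.append(update)              # log the update first
        for slave in self.slaves:
            # Re-send until the slave acknowledges this sequence number.
            for _ in range(self.max_retries):
                if slave.receive(seq, update) == seq:
                    break
            else:
                raise RuntimeError(f"update {seq} not acknowledged")
```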

The most common challenge in multi-master replication is conflict resolution. For instance, if the same record is changed on two systems simultaneously, the conflict can be resolved in several ways. One simple method is timestamp-based: the change with the earliest timestamp wins. Alternatively, the change with the latest timestamp can be kept as the most current. Another approach uses hierarchical rules, in which designated sites or users have greater rights, so that their changes supersede those of lower-ranked sites or users. Finally, logic-based conflict resolution can be employed, which is more configurable but also more complex.
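The first three policies above can be sketched as functions that pick a winner from two conflicting changes. The `Change` record, its fields, and the site-name tie-break are assumptions made for illustration; the tie-break makes the result independent of argument order, so every node resolves the same conflict the same way.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class Change:
    value: str
    timestamp: float  # when the change was made
    site: str         # which node made the change

Resolver = Callable[[Change, Change], Change]

def first_writer_wins(a: Change, b: Change) -> Change:
    # Earliest timestamp wins; the site name breaks ties so all nodes agree.
    return min(a, b, key=lambda c: (c.timestamp, c.site))

def last_writer_wins(a: Change, b: Change) -> Change:
    # Latest timestamp wins, with the same deterministic tie-break.
    return max(a, b, key=lambda c: (c.timestamp, c.site))

def hierarchical(priority: Dict[str, int]) -> Resolver:
    # Higher-ranked sites supersede lower ones; equal ranks fall back to
    # last writer wins. Unknown sites get rank 0.
    def resolve(a: Change, b: Change) -> Change:
        return max(a, b, key=lambda c: (priority.get(c.site, 0), c.timestamp, c.site))
    return resolve
```

A logic-based resolver would be any other function with the same `Resolver` signature, for example one that merges fields instead of choosing a whole record.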

Filesystem replication

Active (real-time) filesystem replication is usually implemented by distributing updates of a virtual block device to several physical hard disks. This way, any filesystem supported by the operating system can be replicated without modification, because the filesystem code operates at a level above the block device layer. The most popular method for filesystem replication is RAID mirroring (RAID 1), which is typically limited to locally connected disks.
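A toy sketch of a mirrored virtual block device, assuming in-memory lists stand in for physical disks. `MirroredBlockDevice` and its methods are hypothetical; the point is that a filesystem calling only `read_block` and `write_block` cannot tell that each write lands on several disks.

```python
from typing import List, Set

class MirroredBlockDevice:
    """A virtual block device that mirrors every write to several backing disks."""

    def __init__(self, num_disks: int, num_blocks: int, block_size: int = 512) -> None:
        self.block_size = block_size
        # Each "disk" is a list of fixed-size blocks held in memory.
        self.disks: List[List[bytes]] = [
            [bytes(block_size)] * num_blocks for _ in range(num_disks)
        ]
        self.failed: Set[int] = set()  # indices of disks considered failed

    def write_block(self, n: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        # Replicate the write to every healthy disk.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[n] = data

    def read_block(self, n: int) -> bytes:
        # Every healthy mirror holds the same data, so the first one suffices.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk[n]
        raise OSError("all mirrors have failed")
```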

Alternatively, updates to a block device can be replicated (that is, distributed) over a computer network. This has the advantage that the replication slaves can be placed in physically distant locations, which protects the data against local failures or disasters and improves availability when they occur. An example of this kind of replication is the DRBD module for Linux.
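Network block replication is commonly configured as either synchronous (a write completes only after the remote copy is updated) or asynchronous (remote updates are queued and shipped in the background). The sketch below shows that trade-off under heavy assumptions: `RemoteReplicatedDevice` is hypothetical, a dictionary stands in for the remote peer's disk, and a background thread stands in for the network link. It does not model DRBD's actual interface.

```python
import queue
import threading
from typing import Dict, Tuple

class RemoteReplicatedDevice:
    """Local block store whose writes are also sent to a (simulated) remote peer."""

    def __init__(self, synchronous: bool = True) -> None:
        self.local: Dict[int, bytes] = {}
        self.remote: Dict[int, bytes] = {}   # stands in for the peer's disk
        self.synchronous = synchronous
        self._pending: "queue.Queue[Tuple[int, bytes]]" = queue.Queue()
        threading.Thread(target=self._ship, daemon=True).start()

    def _ship(self) -> None:
        # Background "network link": deliver queued writes to the peer in order.
        while True:
            n, data = self._pending.get()
            self.remote[n] = data
            self._pending.task_done()

    def write_block(self, n: int, data: bytes) -> None:
        self.local[n] = data
        self._pending.put((n, data))
        if self.synchronous:
            self._pending.join()   # wait until the peer has the write

    def flush(self) -> None:
        # Block until every queued write has reached the peer.
        self._pending.join()
```

Synchronous mode never loses an acknowledged write if the local site is destroyed, but every write pays the network round trip; asynchronous mode is faster but may lose the writes still in the queue.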

Distributed shared memory replication

Replication also appears in distributed shared memory systems, where several nodes may share the same page of memory. This usually means that each node holds a separate copy (replica) of that page.
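Keeping such page replicas consistent requires a coherence protocol. One common choice, assumed here for illustration, is write-invalidate: any node may cache a read-only copy, and a write discards every other node's copy. `DSMPage` is a hypothetical name, and the single authoritative copy is a simplification.

```python
from typing import Dict

class DSMPage:
    """Tracks which nodes hold a replica of one shared page (write-invalidate)."""

    def __init__(self, initial: bytes) -> None:
        self.master = initial               # authoritative contents
        self.copies: Dict[str, bytes] = {}  # node id -> that node's replica

    def read(self, node: str) -> bytes:
        # On a miss the node fetches a fresh replica; later reads are local.
        if node not in self.copies:
            self.copies[node] = self.master
        return self.copies[node]

    def write(self, node: str, data: bytes) -> None:
        # Invalidate every other replica so no node can read stale data.
        self.master = data
        self.copies = {node: data}
```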

Replication transparency

If a resource is replicated among several locations, it should appear to the user as a single resource: the user should not need to know how many replicas exist or which one serves a given request.
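One way to provide this transparency is a client-side wrapper that hides the replica set behind a single call. In this sketch, `ReplicatedResource` is a hypothetical name, each replica is modelled as a plain function, and `ConnectionError` is assumed to be the signal that a replica is unavailable, in which case the wrapper silently tries the next one.

```python
from typing import Callable, Optional, Sequence

class ReplicatedResource:
    """Presents several replicas to the caller as one resource."""

    def __init__(self, replicas: Sequence[Callable[[str], str]]) -> None:
        self._replicas = list(replicas)

    def get(self, key: str) -> str:
        last_error: Optional[ConnectionError] = None
        for replica in self._replicas:
            try:
                return replica(key)
            except ConnectionError as e:
                last_error = e  # fail over to the next replica
        raise ConnectionError("no replica available") from last_error

# Usage: the first replica is down, but the caller just sees a value.
def down(key: str) -> str:
    raise ConnectionError("replica down")

def up(key: str) -> str:
    return f"value-of-{key}"

print(ReplicatedResource([down, up]).get("x"))
```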

Active/active replication

Traditional approaches to replication are based on a master/slave model, in which one device or process has unilateral control over one or more other devices. This approach has many flaws, not least that only one process may be active at any one time, which means that only one user can edit a given unit of information at a time. The other approach is multi-master technology, whose drawback is that it cannot operate efficiently over a wide area network (WAN), only over a local area network (LAN).

WANdisco has developed a new mathematical theory that enables active/active replication, in which every node on the network is an exact copy (replica) and every node is active at the same time. There is therefore no master/slave paradigm. WANdisco is able to achieve this replication over a WAN.[1]


References

  1. Distributed computing systems and system components thereof. USPTO (20 January 2006). Retrieved on 2006-01-03.
