Two-node cluster
From Wikipedia, the free encyclopedia
A two-node cluster is the smallest high-availability cluster that can be built. Should one node fail (because of a hardware or software problem), the other must acquire the resources previously managed by the failed node, in order to restore access to these resources. This process is known as failover.
Introduction and some definitions
There are various kinds of resources, notably:
- Storage space (containing data, binaries, or everything else that needs to be accessed)
- Network address(es) (the users can reach the resources via a network connection)
- Application software (that acts as an interface between users and other resources)
Typical services provided by a computer cluster are built from a combination of the resource types listed above.
As an example, an Oracle database service might be composed of:
- some storage space, to hold the database files (and, ultimately, the data);
- an Oracle installation, configured to be remotely (or locally) accessed; and,
- an IP address to listen on; the users must connect to this address in order to use Oracle to access the data.
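The composition described above can be sketched as a small data model. This is only an illustration: the class names (`ResourceKind`, `Resource`, `Service`), the storage path, the instance name and the IP address are all hypothetical and do not come from any particular cluster product.

```python
from dataclasses import dataclass, field
from enum import Enum


class ResourceKind(Enum):
    STORAGE = "storage"
    NETWORK_ADDRESS = "network address"
    APPLICATION = "application"


@dataclass(frozen=True)
class Resource:
    kind: ResourceKind
    name: str


@dataclass
class Service:
    """A cluster service, modelled as a named group of resources."""
    name: str
    resources: list[Resource] = field(default_factory=list)

    def kinds(self) -> set[ResourceKind]:
        # The set of resource kinds this service is built from.
        return {r.kind for r in self.resources}


# Hypothetical values, chosen only for illustration.
oracle_service = Service(
    name="oracle-db",
    resources=[
        Resource(ResourceKind.STORAGE, "/oradata"),
        Resource(ResourceKind.APPLICATION, "oracle-instance"),
        Resource(ResourceKind.NETWORK_ADDRESS, "192.0.2.10"),
    ],
)
```

In this model, failing over the service means moving all three resources to the other node together.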
Hardware components
Required
- Two hosts, each with its own local storage device(s)
- Shared storage that each host (or node) can access, such as a file server
- Some method of interconnection that enables one node to see if the other is dead, and to help coordinate resource access
Interconnection topologies
- A serial crossover cable is the simplest (and most reliable) way to ensure proper intracluster communication
- An Ethernet crossover cable needs each host's TCP/IP stack to be functional to ensure proper intracluster communication
- A shared disk (in advanced setups), usually used for heartbeat only
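Whichever interconnect is used, each node typically decides that its peer is dead when no heartbeat has arrived for some time. The following is a minimal sketch of that timeout logic only; the class name `HeartbeatMonitor`, the `timeout_s` parameter and the injectable `clock` are assumptions made for illustration, and the actual transport (serial line, Ethernet or shared disk) is left out.

```python
import time


class HeartbeatMonitor:
    """Tracks when the peer node was last heard from."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen = clock()    # treat start-up as the first heartbeat

    def record_heartbeat(self) -> None:
        # Call this whenever a heartbeat arrives over the interconnect.
        self._last_seen = self._clock()

    def peer_presumed_dead(self) -> bool:
        # True once the peer has been silent for longer than the timeout.
        return self._clock() - self._last_seen > self.timeout_s
```

A silent peer is only presumed dead: a broken cable looks the same as a crashed node, which is why the exclusive-access mechanisms described in the next section are recommended.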
Optional but strongly recommended
Gear to eliminate other single points of failure:
- Three Uninterruptible Power Supplies, one for each node and one for the shared storage
- Redundant network connections (using dual NICs and dual switches with bonding or trunking software on the server)
Some method of exclusive access to shared resources. This can be:
- Physical, in order to forcefully eject the other machine:
  - Two power switches, allowing each node to remotely cut the power to the other when it becomes stuck or inoperative
- Logical, using one of (as appropriate for each resource):
  - SCSI-3 persistent reservation, for denying access to shared storage
  - Controlling access to Fibre Channel or Ethernet network through manageable switches, and disabling the other node's port
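The point of these mechanisms is that a node should take over shared resources only once the other node can no longer touch them. A minimal sketch of that ordering follows; `take_over`, `fence_peer` and `acquire` are hypothetical names, and in a real setup `fence_peer` would wrap whatever power switch, SCSI-3 reservation or switch-port control is actually available.

```python
from typing import Callable


def take_over(fence_peer: Callable[[], bool],
              acquire: Callable[[], None]) -> bool:
    """Acquire shared resources only after the peer has been fenced.

    fence_peer returns True if the peer was successfully cut off.
    """
    if not fence_peer():
        # Without confirmed fencing, both nodes might write to the
        # shared storage at once, so the takeover is refused.
        return False
    acquire()
    return True
```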
Classification by role symmetry
There are two kinds of two-node clusters, from this perspective:
Active/Passive
One node owns the services, while the other remains idle (on standby).
Should the primary node fail, the secondary or backup node takes the resources and reactivates the services, while the former primary in turn remains idle.
In this configuration, only one node is operative at any point in time.
Active/Active
There is no concept of a primary or backup node: both nodes provide some service. Should one of these nodes fail, the other must also take over the failed node's services.
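In both configurations, the outcome of a failover is that every service ends up on the surviving node; the two differ only in where the services run beforehand. The sketch below illustrates this with a plain mapping from service to node. The function name `fail_over` and the service and node names are hypothetical.

```python
def fail_over(placement: dict[str, str], failed: str,
              survivor: str) -> dict[str, str]:
    """Return a new service-to-node mapping after `failed` goes down."""
    return {
        service: (survivor if node == failed else node)
        for service, node in placement.items()
    }


# Active/passive: node1 owns everything, node2 is idle.
active_passive = {"web": "node1", "mail": "node1"}

# Active/active: each node provides some service.
active_active = {"web": "node1", "mail": "node2"}

after_ap = fail_over(active_passive, failed="node1", survivor="node2")
after_aa = fail_over(active_active, failed="node1", survivor="node2")
```

After the failure of node1, both mappings place every service on node2. In the active/active case this means node2 carries its own load plus the failed node's.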
Classification by service aggregation level
Again, there are two kinds of cluster:
Service based
Every service provided by the cluster is independent of the others. For example, when a web server and a mail server run on the cluster, each one can be managed independently and switched from one node to the other without affecting the functionality of the other services.
One open source example of this kind of cluster is Kimberlite.
Logical-host based
In more complex configurations, there can be dependencies among services.
For example, a mail server receives e-mail for local users and stores it on a local storage resource, but the users also need to read this e-mail from a remote site. This requires a mail retrieving server, such as an IMAP server.
Both of these services need access to the same storage resource: the first to write the e-mail messages that arrive from the Internet, the second to read, move or delete them.
This means that the mail server cannot simply be failed over from one node to the other, as the mail retrieving server will still need access to the same data. These two services must be grouped together, forming a so-called logical host. To be more precise, this logical host will consist of three resources:
- the storage resource, needed by both server applications;
- the mail transfer service that receives e-mail from the Internet; and,
- the mail retrieval service that acts as an interface, permitting users to view their e-mail.
If it becomes necessary to fail over this logical host, the following steps need to be performed, automatically or manually, in this order:
- stop the mail retrieving service on the failed node (if possible)
- stop the mail transfer service on the failed node (if possible)
- release the storage resource on the failed node (if possible)
- acquire the storage resource on the failover node
- start the mail transfer service on the failover node
- start the mail retrieving service on the failover node
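The steps above follow a simple pattern: take the resources down on the failed node in reverse dependency order, then bring them up on the failover node in dependency order. A minimal sketch that generates such a plan is shown below. The names `Resource`, `failover_plan` and `LOGICAL_HOST` are hypothetical, and the sketch only produces a list of step descriptions; it does not talk to any real cluster software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    up: str = "start"     # verb used when bringing the resource up
    down: str = "stop"    # verb used when taking the resource down


def failover_plan(resources: list[Resource], failed: str, target: str,
                  failed_reachable: bool = True) -> list[str]:
    """List the failover steps for a logical host.

    `resources` must be given in start order (dependencies first).
    """
    plan = []
    if failed_reachable:
        # Take-down steps are only possible if the failed node responds.
        for r in reversed(resources):
            plan.append(f"{r.down} {r.name} on {failed}")
    for r in resources:
        plan.append(f"{r.up} {r.name} on {target}")
    return plan


# The mail logical host from the example: storage first, then the services.
LOGICAL_HOST = [
    Resource("storage", up="acquire", down="release"),
    Resource("mail transfer service"),
    Resource("mail retrieval service"),
]

for step in failover_plan(LOGICAL_HOST, failed="node1", target="node2"):
    print(step)
```

With `failed_reachable=False`, the plan contains only the three steps on the failover node, which corresponds to the "(if possible)" qualifier in the list.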
One open source example of this kind of cluster is Linux-HA; one commercial example for systems running the Solaris Operating System is called Sun Cluster.