Two-node cluster
From Wikipedia, the free encyclopedia
A two-node cluster is the minimal high-availability cluster that can be built. Should one node fail (because of a hardware or software problem), the other must acquire the resources previously managed by the failed node in order to restore access to them. This process is known as failover.
Introduction and some definitions
There are various kinds of resources, notably:
- Storage space (containing data, binaries, or everything else that needs to be accessed)
- IP address(es) (the users can reach the resources via TCP/IP connection)
- Application software (that acts as an interface between the users and the data)
Typical services provided by a computer cluster are built from a combination of the resources defined above.
So, you can have an Oracle database service, composed of:
- Some storage space, to hold the database files (and, ultimately, the data)
- An Oracle installation, configured to be remotely (or locally) accessed
- An IP address to listen on: the users must connect to this address in order to use Oracle to access the data
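The composition described above can be sketched as a small data model. This is only an illustration: the class names, the resource names, and the address (taken from the documentation-only range 192.0.2.0/24) are assumptions, not part of any real cluster product.

```python
from dataclasses import dataclass
from enum import Enum


class ResourceKind(Enum):
    STORAGE = "storage"
    IP_ADDRESS = "ip_address"
    APPLICATION = "application"


@dataclass(frozen=True)
class Resource:
    kind: ResourceKind
    name: str


@dataclass(frozen=True)
class Service:
    """A cluster service is a named combination of resources."""
    name: str
    resources: tuple[Resource, ...]

    def kinds(self) -> set[ResourceKind]:
        # The set of resource kinds this service is built from.
        return {r.kind for r in self.resources}


# Hypothetical example mirroring the Oracle service described in the text.
oracle_service = Service(
    name="oracle-db",
    resources=(
        Resource(ResourceKind.STORAGE, "/shared/oradata"),
        Resource(ResourceKind.APPLICATION, "oracle-instance"),
        Resource(ResourceKind.IP_ADDRESS, "192.0.2.10"),
    ),
)
```

Modelling a service as a group of resources makes it explicit that all three must move together when the service changes node.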
Hardware components
Required
- Two hosts, each with its own local storage device(s)
- Shared storage, that can be accessed by each host (or node), such as a file server
- Some method of interconnection (that enables one node to see if the other is dead, and to help coordinate resource access)
Interconnection topologies
- A serial crossover cable is the simplest (and most reliable) way to ensure proper intracluster communication
- An Ethernet crossover cable needs the hosts' TCP/IP stack to be functional to ensure proper intracluster communication
- A shared disk (in advanced setups), usually used for heartbeat only
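Whichever interconnect is used, each node typically decides that its peer is dead when heartbeats stop arriving for some time. The following is a minimal sketch of that timeout logic only; the class name, the timeout parameter, and the injectable clock are assumptions made for illustration, and the transport (serial, Ethernet, or disk) is deliberately left out.

```python
import time


class HeartbeatMonitor:
    """Tracks when the peer node was last heard from."""

    def __init__(self, timeout_seconds: float, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen = clock()    # assume the peer is alive at start-up

    def record_heartbeat(self) -> None:
        # Call this whenever a heartbeat arrives over the interconnect.
        self._last_seen = self._clock()

    def peer_is_presumed_dead(self) -> bool:
        # The peer is presumed dead once the silence exceeds the timeout.
        return self._clock() - self._last_seen > self.timeout_seconds
```

A missed heartbeat only shows that the peer is silent, not that it has stopped touching shared resources, which is why the exclusive-access methods in the next section matter.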
Optional but strongly recommended
Gear to eliminate other single points of failure:
- Three uninterruptible power supplies, one for each node and one for the shared storage
- Redundant network connections (using dual NICs and dual switches with bonding or trunking software on the server)
Some method of exclusive access to shared resources. This can be:
- Physical, in order to forcefully eject the other machine:
  - Two power switches, allowing each node to remotely cut the power to the other when it becomes stuck or inoperative
- Logical, using one of the following (as appropriate for each resource):
  - SCSI-3 persistent reservations, for denying access to shared storage
  - Controlling access to the Fibre Channel or Ethernet network through manageable switches, by disabling the other node's port
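One common way to use these methods is to require that the peer be cut off before the surviving node touches shared resources. The sketch below illustrates that ordering under stated assumptions: the function name is hypothetical, and each fencing method is represented by a caller-supplied callable that returns True on success rather than by any real power-switch or SCSI API.

```python
from typing import Callable, Iterable


def take_over_after_fencing(
    fence_methods: Iterable[Callable[[], bool]],
    acquire_resources: Callable[[], None],
) -> bool:
    """Try each fencing method in turn; acquire resources only once one succeeds."""
    for fence in fence_methods:
        if fence():
            acquire_resources()
            return True
    # No method confirmed that the peer is fenced: do not touch shared resources.
    return False
```

Refusing to proceed when every method fails trades availability for safety, since two nodes writing to the same storage is usually worse than a service staying down.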
Classification: by Role Symmetry
From this perspective, there are two kinds of two-node clusters:
- Active/Passive: one node owns the services while the other remains idle. Should the primary node fail, the secondary (or backup) node takes over the resources and reactivates the services, while the former primary remains idle in turn.
- Active/Active: there is no concept of a primary or backup node. Both nodes provide some services; should one of them fail, the other must also take care of the failed node's services.
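The difference between the two kinds lies only in how services are distributed before a failure; the reassignment itself is the same. A minimal sketch, assuming a hypothetical mapping from service name to owning node:

```python
def fail_over(assignment: dict[str, str], failed_node: str, surviving_node: str) -> dict[str, str]:
    """Return a new service-to-node mapping with the failed node's services moved."""
    return {
        service: (surviving_node if node == failed_node else node)
        for service, node in assignment.items()
    }


# Active/passive: node-a owns everything, node-b is idle until a failure.
active_passive = {"web": "node-a", "mail": "node-a"}

# Active/active: each node owns some services and backs up the other.
active_active = {"web": "node-a", "mail": "node-b"}
```

In both cases the surviving node ends up running every service, so in an active/active cluster each node needs enough capacity to carry the combined load.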
Classification: by Service Aggregation Level
Again, there are two kinds of computer cluster:
Service based
Every service provided by the cluster is independent of the others. For example, you can run a web server and a mail server on the cluster, and each one can be independently managed and switched from one node to the other without affecting the functionality of the other services.
One open-source example of this kind of cluster is Kimberlite.
Logical-host based
In a more complex setup, some services depend on others.
Say you run a mail server that receives e-mail for the local users and stores it on its storage resource. How can the users read their e-mail from a remote location?
You must implement some kind of mail retrieval server, such as an IMAP server.
Both of these services need access to the same storage resource: the first to write the e-mail messages that arrive from the Internet, the second to read, move, or delete them.
So you cannot simply fail over the mail server from one node to the other on its own, because the mail retrieval server needs the data written by the first service.
These two services have to be grouped together, forming a so-called logical host. More precisely, this logical host is built from three resources:
- the storage resource, needed by both server applications
- the mail transfer service that receives e-mail from the Internet
- the mail retrieval service that acts as an interface, letting users view their e-mail
So, to fail over this logical host, you must perform the following steps in order:
- stop the mail retrieval service on the failed node (if possible)
- stop the mail transfer service on the failed node (if possible)
- release the storage resource on the failed node (if possible)
- acquire the storage resource on the failover node
- start the mail transfer service on the failover node
- start the mail retrieval service on the failover node
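The steps above follow a simple pattern: resources are stopped on the failed node in the reverse of their start order, then started on the failover node in start order. The sketch below generates such a plan as a list of strings. The function name and node names are hypothetical, and it uses the generic verbs "stop" and "start" for every resource, where the text says "release" and "acquire" for storage.

```python
def failover_plan(resources_in_start_order: list[str], failed_node: str, failover_node: str) -> list[str]:
    """Build the ordered list of steps needed to move a logical host."""
    # Dependents are stopped first, so walk the start order backwards.
    stop_steps = [
        f"stop {name} on {failed_node} (if possible)"
        for name in reversed(resources_in_start_order)
    ]
    # Dependencies are started first, so walk the start order forwards.
    start_steps = [
        f"start {name} on {failover_node}"
        for name in resources_in_start_order
    ]
    return stop_steps + start_steps


plan = failover_plan(
    ["storage", "mail transfer", "mail retrieval"],
    failed_node="node-a",
    failover_node="node-b",
)
```

Keeping a single start order and deriving the stop order from it means a new resource only has to be placed once in the list.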
One open-source example of this kind of cluster is Linux-HA. One commercial example (limited to Sun Microsystems Solaris machines) is Sun Cluster (see the Sun Cluster documentation for more information).