Talk:Computer cluster

Where is PVM? MPI?--80.98.246.107 22:39, 14 Jun 2004 (UTC)


The article talks about VA Tech's X system but doesn't even mention the top 500 or any other clusters... sounds like a shameless plug to me...--Dhuss 16:27, 21 Jun 2004 (UTC)

As the previous commenter stated, there must be a mention of PVM and MPI-compliant resource distribution frameworks, MPICH (an open-source MPI implementation) and DIPC (Network System V IPC). The article also asserts that Beowulf is software. Beowulf is not software so much as a philosophy and methodology: applying open and free operating systems, such as GNU/Linux, to commodity hardware on a high-performance network to achieve a distributed computing environment.

Computer Clusters

Clustering was indeed "invented" by Digital Equipment Corp. in the 1980s. Actually, the VAXcluster software (as it was then known) was a response to the needs of VMS users who were already sharing files between systems over DECnet and needed a more robust, more integrated strategy for sharing resources among a group of computer systems.

OpenVMS clusters remain in use today. Some of the few operations to survive 9/11 were OpenVMS clusters whose redundant "shadow" sites were outside the area destroyed in the attack.

To date, no other clustering solution has provided the redundancy, parallelism and disaster survivability offered by OpenVMS clusters. Some approaches are now catching up to where OpenVMS (then known as VAX/VMS) was twenty (20) years ago.

Reference: http://www.hp.com/go/openvms

David J Dachtera

misnomers, etc.

The taxonomy is one I've never heard before, and I run one of the projects it refers to (http://linux-ha.org), perform technical reviews of books on the subject, and am the author of a commonly-referred-to web site on the subject. This is basically about 70% right and about 30% wrong.

There are lots of taxonomies of clusters, but the one chosen is really not very good, and the article isn't very well organized, and will likely confuse people more than it will enlighten them.

Director clusters are more often called "Load Balancing Clusters", or server farms, or various other things.

Two node versus multi-node is not a useful distinction. A more useful distinction (at least for the present time) is "High-Availability (HA) Clusters" versus "High-Performance Clusters". Some HA clusters have 2 nodes, and some have more than 2. Two is a minimum.

"Massively Parallel" isn't defined - but it's probably intended to mean High Performance Clusters. But, high-performance clusters aren't necessarily massively parallel - they're just faster than 1 machine.

And Grid computing is not the next phase of cluster computing. It's a related idea, just as a network of workstations is a related idea. A grid is also commonly not located in one place. But the distinction isn't in how well connected the computers are; it's in how trust between them is managed.

A cluster is a collection of computers that are part of a single domain of trust (political entity), tasked to perform a set of jobs as though they were one computer. They spread this set of tasks across themselves in various ways and for many purposes.

A grid is a collection of computers that are part of many domains of trust, tasked to act as a common computing utility for a plurality of user communities spread across these many domains of trust.

As for who invented clusters, that is arguable at best; perhaps the article should use wording like "some of the earliest clusters were" rather than "XXX invented clusters". Pyramid also produced some of the earliest clusters around. And, like most things, clusters weren't "invented" by any company; it's an idea that grew up over time from things like collections of PDP-11s given a single task to do.


I may have a go at a partial rewrite of the page.

Cluster Software

I'll be adding IBM's HACMP, since it's not mentioned in the article. Gbeeker 14:02, 16 September 2005 (UTC)

Also, MC/Service Guard is for HP-UX. And I am not sure why the list of software is split into open source and other. Gbeeker 14:28, 16 September 2005 (UTC)

Notability of Cluster Software Products

I would like to open a discussion as to the notability of some of the cluster software products which are on the article's list right now. I know that there are tens of thousands of SunCluster and Veritas Cluster Server HA clusters out there; they're standard in enterprise organizations, and have been for most of a decade. HACMP and Beowulf and MSCS and LSF and Sun N1 Grid are all well known. Linux-HA is sufficiently strong and growing that I think it is notable. MC Service Guard and the old VMS cluster stuff are from major vendors and have market presence. BOINC is widely known as the SETI@HOME successor, etc.

Less well known would include Moab, NEC ExpressCluster, Parallel Sysplex (I know, it's real and legit and has been around for a long time, but pretty small market presence), Novell (same).

Possibly not notable: KeyCluster (so not notable that its two-paragraph WP article is nominated for deletion), PolyServe, SteelEye.

I would like to propose that a line be drawn on notability, and not-notable products not be in the list. I think that the first group are clearly above the line, the last group are clearly below the line, and the middle group ... I don't know. Other opinions sought. Georgewilliamherbert 20:17, 16 February 2006 (UTC)

History

I did a fairly significant edit of the History section just now. See what you all think. I used as references both Greg Pfister's In Search of Clusters and numerous references from around Wikipedia (and a few I googled up outside as well). I think that it provides a much more detailed view of how the development of clusters and the development of networking have gone hand in hand. It also removes what can only be called a commercial POV -- the idea that DEC invented the cluster. Pfister addresses this directly and I heartily concur -- compute clusters were probably invented by the first group that could afford to purchase more than one computer. I won't go to the extreme of claiming that the Bombe computers used to decrypt Enigma transmissions in the Ultra project formed a cluster, although I think it is arguable that they did, albeit one with human-based IPCs. However, it is almost certain that Bombes were linked into clusters very shortly thereafter by intelligence agencies, and by the 1950s the probability that there were at least some covert clusters doing cryptography, among other things, approaches unity.

The less covert history of clusters goes hand in hand with the invention of packet-switched networks, the Arpanet, and Unix, as I note. The Internet itself is basically the first cluster built on a packet-switched network, all grown up. Socket-based computing was being used for both local and grid-like computing at the research level pretty much continuously from 1969 on, although the formal protocols weren't specified by means of RFC until much later. While DEC certainly was involved in the creation of a network stack in the form of DECnet at about the same time that IBM was introducing their own in the form of SNA, TCP/IP was clearly first and indeed provided a clear demonstration that any "player" in the world of computing needed a network protocol for interconnecting machines into a cluster. If I had the patience to do the research or build the links, I'd do a more careful set of cross-references that include DECnet and SNA, even though really they turned out to be nothing more than intermediate states in the development of large-scale clusters based on commodity networks.

I also added the missing reference to the invention of PVM as pivotal to the widespread implementation of HPC cluster computing and added explicit mention of the beowulf project, but didn't add a discussion of MPI (as it was developed by commercial big iron supercomputer vendors and their users, and didn't become a basis of commodity HPC clustering until long after PVM was well established and the beowulf project itself was getting under way).

The history section still, in my opinion, lacks a discussion of the history of HA compute clusters (aside from my passing mention of Tandem and IBM) -- this is not my speciality and so I leave it for somebody else who is more of an expert here. There are also still corrections that I agree need to be made in the original discussion of big famous clusters.

In particular, Va Tech's cluster was something of a joke (seriously) when it was first built at great expense and with much fanfare -- and with no actual plan for the research that was to be done on it. It was clearly a commercial venture by Apple trying to break into the cluster market, and almost totally irrelevant from the point of view of "important clusters". It is REALLY inappropriate for a discussion of the cost-benefit of clusters (something I'm something of an expert on :-), as cost-benefit tends to be obscured when a cluster is built with an undisclosed price tag, obviously "very special pricing" on the part of the vendor, and no particular purpose but to put the cluster and hosting institution on the clustering map, so to speak (whatever it is being used for by now).

Finally, why in the world is there a statement that John Koza owns the largest cluster owned by an individual? First, the cluster referenced is owned by a corporation, not an individual. Second, nowhere in the universe that I know of is there a list of people who own large clusters and the size of the clusters owned. I've personally owned a cluster that contains between 8 and 10 nodes depending on how rich I feel (it costs around $1000/year to run a 10-node private/personal cluster for power and cooling alone). I know several other people on the beowulf list who own small clusters, ballpark the same size as mine. I have a hard time imagining that there are really rich people who can afford to pop $100K/year on a personal cluster with 1000 nodes unless it is really owned as a corporate entity, used for business, deducted on tax forms, and paid for with income derived from same. There one has to really ask whether the words "owned as an individual" still mean anything. Is there any point whatsoever in including this in the article, or is this just somebody's personally inserted POV?

Anyway, just some thoughts. Since clustering is near and dear to my heart, I'll likely return to this page and make another round of fairly significant edits when next I have time. Let me know what you think.

Rgbatduke 16:56, 27 April 2006 (UTC)


I added the note on C.mmp/Hydra because that work generated a significant number of papers on tightly-coupled vs. loosely-coupled (C.mmp vs CM*), fault-tolerant architecture (C.vmp) and how to do security and capability-based permissions in the OS (Hydra). There were other research clusters at the time. -Smallpond 18:37, 17 July 2006 (UTC)

Shared Everything, Shared Nothing, Shared Disk

Should these be added here as a distinction in clustering technology? I cannot find any reference to them on Wikipedia and I understood them to be different clustering architectures.

Pixie2000 12:37, 16 October 2006 (UTC)

We should address them... ok. Taken as a request for enhancement. I'll work on it when time allows, or if someone else has bandwidth before then... Georgewilliamherbert 21:10, 16 October 2006 (UTC)