Grid computing

Grid computing is a term for either of two broad subcategories of distributed computing:

- Online computation or storage offered as a service supported by a pool of distributed computing resources, also known as utility computing, on-demand computing, or cloud computing. Data grids provide controlled sharing and management of large amounts of distributed data, often used in combination with computational grids.

- The creation of a "virtual supercomputer" composed of a network of loosely coupled computers, acting in concert to perform very large tasks. This technology has been applied to computationally intensive scientific, mathematical, and academic problems through volunteer computing, and it is used in commercial enterprises for such diverse applications as drug discovery, economic forecasting, seismic analysis, and back-office data processing in support of e-commerce and web services.

What distinguishes grid computing from typical cluster computing systems is that grids tend to be more loosely coupled, heterogeneous, and geographically dispersed. Also, while a computing grid may be dedicated to a specialized application, it is often constructed with the aid of general-purpose grid software libraries and middleware.

[Figure: Virtual organizations accessing different and overlapping sets of resources]

Grids versus conventional supercomputers

"Distributed" or "grid" computing in general is a special type of parallel computing which relies on complete computers (with onboard CPU, storage, power supply, network interface, etc.) connected to a network (private, public or the Internet) by a conventional network interface, such as Ethernet. This is in contrast to the traditional notion of a supercomputer, which has many processors connected by a local high-speed computer bus.

The primary advantage of distributed computing is that each node can be purchased as commodity hardware, which when combined can produce similar computing resources to a multiprocessor supercomputer, but at lower cost. This is due to the economies of scale of producing commodity hardware, compared to the lower efficiency of designing and constructing a small number of custom supercomputers. The primary performance disadvantage is that the various processors and local storage areas do not have high-speed connections. This arrangement is thus well-suited to applications in which multiple parallel computations can take place independently, without the need to communicate intermediate results between processors.

The high-end scalability of geographically dispersed grids is generally favorable, due to the low need for connectivity between nodes relative to the capacity of the public Internet. Conventional supercomputers also create physical challenges in supplying sufficient electricity and cooling capacity in a single location. Both supercomputers and grids can be used to run multiple parallel computations at the same time, which might be different simulations for the same project, or computations for completely different applications. The infrastructure and programming considerations needed to do this on each type of platform are different, however.

There are also some differences in programming and deployment. It can be costly and difficult to write programs so that they can be run in the environment of a supercomputer, which may have a custom operating system, or require the program to address concurrency issues. If a problem can be adequately parallelized, a "thin" layer of "grid" infrastructure can allow conventional, standalone programs to run on multiple machines (but each given a different part of the same problem). This makes it possible to write and debug programs on a single conventional machine, and eliminates complications due to multiple instances of the same program running in the same shared memory and storage space at the same time.
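The "thin layer" idea can be illustrated with a minimal sketch. Everything named here is hypothetical: `count_primes` stands in for an ordinary standalone program, `split_range` cuts the problem into work units, and `run_on_grid` plays the role of the grid layer. Threads stand in for remote machines purely for illustration; a real grid would ship each unit to a different computer.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(lo: int, hi: int) -> int:
    """Standalone program logic: count primes in [lo, hi). Knows nothing about the grid."""
    def is_prime(n: int) -> bool:
        if n < 2:
            return False
        i = 2
        while i * i <= n:
            if n % i == 0:
                return False
            i += 1
        return True

    return sum(1 for n in range(lo, hi) if is_prime(n))


def split_range(lo: int, hi: int, parts: int) -> list[tuple[int, int]]:
    """Cut [lo, hi) into at most `parts` contiguous, non-overlapping work units."""
    if hi <= lo:
        return []
    step = -(-(hi - lo) // parts)  # ceiling division
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


def run_on_grid(lo: int, hi: int, nodes: int = 4) -> int:
    """Thin 'grid' layer: hand each unit to a worker and sum the partial results.

    Threads stand in for remote machines (an assumption for illustration only).
    """
    units = split_range(lo, hi, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = pool.map(lambda unit: count_primes(*unit), units)
        return sum(partials)


if __name__ == "__main__":
    print(run_on_grid(0, 100))  # same answer as count_primes(0, 100)
```

Because each unit is computed without reference to any other, `count_primes` can be written and debugged on a single machine and then reused unchanged by the distribution layer.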

Design considerations and variations

One feature of distributed grids is that they can be formed from computing resources belonging to multiple individuals or organizations (known as multiple administrative domains). This can facilitate commercial transactions, as in utility computing, or make it easier to assemble volunteer computing networks.

One disadvantage of this feature is that the computers which are actually performing the calculations might not be entirely trustworthy. The designers of the system must thus introduce measures to prevent malfunctions or malicious participants from producing false, misleading, or erroneous results, and from using the system as an attack vector. This often involves assigning work randomly to different nodes (presumably with different owners) and checking that at least two different nodes report the same answer for a given work unit. Discrepancies would identify malfunctioning and malicious nodes.
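The redundancy check described above might be sketched as follows. This is an illustrative outline, not the scheme used by any particular grid: the function names, the `quorum` parameter, and the tie-handling rule are all assumptions.

```python
import random
from collections import Counter


def assign_replicas(nodes, replicas=2, rng=random):
    """Pick `replicas` distinct nodes at random to compute the same work unit."""
    return rng.sample(list(nodes), replicas)


def validate(reports, quorum=2):
    """Accept a result only if at least `quorum` nodes reported it.

    `reports` maps node id -> reported result (results must be hashable).
    Returns (accepted_result, suspects). accepted_result is None when no
    single value reaches the quorum or when the top two values tie;
    suspects lists the nodes that disagreed with the accepted value.
    """
    if not reports:
        return None, []
    top = Counter(reports.values()).most_common(2)
    value, votes = top[0]
    tied = len(top) == 2 and top[1][1] == votes
    if votes < quorum or tied:
        return None, []  # no agreement yet: send the unit to more nodes
    suspects = sorted(node for node, result in reports.items() if result != value)
    return value, suspects


if __name__ == "__main__":
    print(validate({"node-a": 42, "node-b": 42, "node-c": 41}))
```

A node that appears repeatedly in the suspects list across many work units is a candidate for exclusion, whether it is faulty or malicious.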

Due to the lack of central control over the hardware, there is no way to guarantee that nodes will not drop out of the network at random times. Some nodes (like laptops or dialup Internet customers) may also be available for computation but not network communications for unpredictable periods. These variations can be accommodated by assigning large work units (thus reducing the need for continuous network connectivity) and reassigning work units when a given node fails to report its results as expected.
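Reassignment of unreported work units can be sketched with a small queue that tracks a deadline per unit. The class name `WorkQueue`, its methods, and the fixed `timeout` are hypothetical, and time is passed in explicitly so that the behaviour is easy to follow.

```python
from collections import deque


class WorkQueue:
    """Hands out work units and reclaims those whose node went silent."""

    def __init__(self, unit_ids, timeout):
        self.pending = deque(unit_ids)  # units waiting for a node
        self.in_flight = {}             # unit id -> (node, deadline)
        self.results = {}               # unit id -> result
        self.timeout = timeout

    def checkout(self, node, now):
        """Give `node` the next pending unit, or None if nothing is waiting."""
        self.requeue_expired(now)
        if not self.pending:
            return None
        unit = self.pending.popleft()
        self.in_flight[unit] = (node, now + self.timeout)
        return unit

    def report(self, node, unit, result):
        """Record a result; ignore reports from a node that no longer owns the unit."""
        owner = self.in_flight.get(unit)
        if owner is None or owner[0] != node:
            return False
        del self.in_flight[unit]
        self.results[unit] = result
        return True

    def requeue_expired(self, now):
        """Put units whose deadline has passed back on the pending queue."""
        expired = [u for u, (_, deadline) in self.in_flight.items() if deadline <= now]
        for unit in expired:
            del self.in_flight[unit]
            self.pending.append(unit)


if __name__ == "__main__":
    queue = WorkQueue(["unit-1"], timeout=10)
    queue.checkout("laptop", now=0)          # laptop takes the unit, then goes offline
    print(queue.checkout("desktop", now=10)) # deadline passed, so the unit is handed out again
```

A longer timeout tolerates nodes that are computing but temporarily disconnected, at the cost of waiting longer before a genuinely lost unit is reissued.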

The impacts of trust and availability on performance and development difficulty can influence the choice of whether to deploy onto a dedicated computer cluster, to idle machines internal to the developing organization, or to an open external network of volunteers or contractors.

In many cases, the participating nodes must trust the central system not to abuse the access that is being granted, by interfering with the operation of other programs, mangling stored information, transmitting private data, or creating new security holes. Other systems employ measures to reduce the amount of trust "client" nodes must place in the central system, such as placing applications in virtual machines.

Public systems or those crossing administrative domains (including different departments in the same organization) often result in the need to run on heterogeneous systems, using different operating systems and hardware architectures. With many languages, there is a tradeoff between investment in software development and the number of platforms that can be supported (and thus the size of the resulting network). Cross-platform languages can reduce the need to make this tradeoff, though potentially at the expense of high performance on any given node (due to run-time interpretation or lack of optimization for the particular platform).

Various middleware projects have created generic infrastructure, to allow diverse scientific and commercial projects to harness a particular associated grid, or for the purpose of setting up new grids. BOINC is a common one for academic projects seeking public volunteers; more are listed at the end of the article.

In fact, the middleware can be seen as a layer between the hardware and the software. On top of the middleware, a number of technical areas have to be considered, and these may or may not be middleware independent. Example areas include SLA management, trust and security, virtual organization (VO) management, license management, portals, and data management. These technical areas may be taken care of in a commercial solution, though the cutting edge of each area is often found within specific research projects examining the field.

CPU scavenging

CPU-scavenging, cycle-scavenging, cycle stealing, or shared computing creates a "grid" from the unused resources in a network of participants (whether worldwide or internal to an organization). Typically this technique uses desktop computer instruction cycles that would otherwise be wasted at night, during lunch, or even in the scattered seconds throughout the day when the computer is waiting for user input or slow devices.

Volunteer computing projects use the CPU scavenging model almost exclusively.

In practice, participating computers also donate some supporting amount of disk storage space, RAM, and network bandwidth, in addition to raw CPU power. Since nodes are apt to go "offline" from time to time, as their owners use their resources for their primary purpose, this model must be designed to handle such contingencies.
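A cycle-scavenging client can be pictured as a loop that advances its work only while the machine looks idle. The sketch below is hypothetical throughout: the thresholds, the `should_compute` policy, and the form of the observations are assumptions, and a real client would obtain idle time and load from the operating system.

```python
def should_compute(idle_seconds: float, cpu_load: float,
                   min_idle: float = 300.0, max_load: float = 0.25) -> bool:
    """Hypothetical policy: compute only when the user is away and the CPU is quiet."""
    return idle_seconds >= min_idle and cpu_load <= max_load


def scavenge(work, observations):
    """Advance `work` (an iterator of small steps) once per idle observation.

    `observations` yields (idle_seconds, cpu_load) pairs, for example one per
    polling interval. Returns the number of steps completed; unfinished work
    stays in the iterator and can be resumed later.
    """
    done = 0
    for idle_seconds, cpu_load in observations:
        if not should_compute(idle_seconds, cpu_load):
            continue  # the owner is using the machine, so leave the CPU alone
        try:
            next(work)
        except StopIteration:
            break  # nothing left to compute
        done += 1
    return done


if __name__ == "__main__":
    steps = iter(range(3))
    samples = [(600, 0.1), (10, 0.1), (600, 0.9), (900, 0.0)]
    print(scavenge(steps, samples))  # counts only the samples where the machine was idle
```

Breaking the computation into small resumable steps is what lets the client give the machine back to its owner promptly.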

History

The term grid computing originated in the early 1990s as a metaphor for making computer power as easy to access as an electric power grid; the metaphor is set out in Ian Foster and Carl Kesselman's seminal work, "The Grid: Blueprint for a New Computing Infrastructure".

CPU scavenging and volunteer computing were popularized beginning in 1997 by distributed.net and later in 1999 by SETI@home to harness the power of networked PCs worldwide, in order to solve CPU-intensive research problems.

The ideas of the grid (including those from distributed computing, object-oriented programming, cluster computing, web services and others) were brought together by Ian Foster, Carl Kesselman and Steve Tuecke, widely regarded as the "fathers of the grid".^[1] They led the effort to create the Globus Toolkit, which incorporates not just computation management but also storage management, security provisioning, data movement, and monitoring, along with a toolkit for developing additional services based on the same infrastructure, including agreement negotiation, notification mechanisms, trigger services and information aggregation. While the Globus Toolkit remains the de facto standard for building grid solutions, a number of other tools have been built that answer some subset of the services needed to create an enterprise or global grid.

During 2006 the term "Great Global Grid" was coined by Robert Marcus in his book "Emerging Technology Strategies". Many organizations working on grid computing name their servers "ggg.<domain>.com" as an analogy to the "www.<domain>.com" naming convention of the World Wide Web.^[citation needed]

During 2007 the term cloud computing became popular. It is conceptually identical to the canonical Foster definition of grid computing below.^[clarify] In practice all clouds are grids, but not all grids manage a cloud.

Fastest virtual supercomputers

BOINC - 1,064 teraflops, as of May 03, 2008 ^[2]
Folding@Home - 1,949 teraflops, as of June 10, 2008 ^[3]

Current projects and applications

Main article: List of distributed computing projects

Grids offer a way to solve Grand Challenge problems like protein folding, financial modeling, earthquake simulation, and climate/weather modeling. They also offer a way to use information technology resources optimally inside an organization, and a means of offering information technology as a utility to commercial and non-commercial clients, with those clients paying only for what they use, as with electricity or water.

Grid computing is presently being applied successfully by the National Science Foundation's National Technology Grid, NASA's Information Power Grid, Pratt & Whitney, Bristol-Myers Squibb Co., and American Express.^[citation needed]

One of the most famous cycle-scavenging networks is SETI@home, which was using more than 3 million computers to achieve 23.37 sustained teraflops (979 lifetime teraflops) as of September 2001 [3].

As of March 2008, Folding@home had achieved peaks of 1502 teraflops on over 270,000 machines.

Another well-known project is distributed.net, which was started in 1997 and has run a number of successful projects in its history.

The NASA Advanced Supercomputing facility (NAS) has run genetic algorithms using the Condor cycle scavenger running on about 350 Sun and SGI workstations.

Until April 27, 2007, United Devices operated the United Devices Cancer Research Project based on its Grid MP product, which cycle scavenges on volunteer PCs connected to the Internet. As of June 2005, the Grid MP ran on about 3,100,000 machines [4].

The Enabling Grids for E-sciencE (EGEE) project, which is based in the European Union and includes sites in Asia and the United States, is a follow-up project to the European DataGrid (EDG) and is arguably the largest computing grid on the planet. EGEE and the LHC Computing Grid ^[4] (LCG) were developed to support the experiments using the CERN Large Hadron Collider. The LCG project is driven by CERN's need to handle huge amounts of data, where storage rates of several gigabytes per second (10 petabytes per year) are required. A list of active sites participating within LCG can be found online,^[5] as can real-time monitoring of the EGEE infrastructure.^[6] The relevant software and documentation are also publicly accessible.^[7]
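The two storage figures quoted for the LCG measure different things, and a short calculation shows how they relate. The sketch below assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and a 365-day year; the function name is illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, ignoring leap years


def average_rate_gb_per_s(petabytes_per_year: float) -> float:
    """Convert a yearly data volume to a mean rate, using decimal units
    (1 PB = 10**15 bytes, 1 GB = 10**9 bytes)."""
    bytes_per_year = petabytes_per_year * 10**15
    return bytes_per_year / SECONDS_PER_YEAR / 10**9


rate = average_rate_gb_per_s(10)
print(f"{rate:.2f} GB/s")  # 0.32 GB/s averaged over a full year
```

Under these assumptions, 10 petabytes per year averages to roughly a third of a gigabyte per second, so a rate of several gigabytes per second would describe peak recording periods rather than the year-round mean.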

Definitions

Today there are many definitions of Grid computing:

In his article "What is the Grid? A Three Point Checklist"^[8], Ian Foster lists these primary attributes:
- Computing resources are not administered centrally.
- Open standards are used.
- Non-trivial quality of service is achieved.

Plaszczak/Wellner^[9] define grid technology as "the technology that enables resource virtualization, on-demand provisioning, and service (resource) sharing between organizations."
IBM defines grid computing as "the ability, using a set of open standards and protocols, to gain access to applications and data, processing power, storage capacity and a vast array of other computing resources over the Internet. A grid is a type of parallel and distributed system that enables the sharing, selection, and aggregation of resources distributed across 'multiple' administrative domains based on their (resources) availability, capacity, performance, cost and users' quality-of-service requirements" ^[10]
An earlier statement of the notion of computing as a utility was made in 1965 by MIT's Fernando Corbató. Corbató and the other designers of the Multics operating system envisioned a computer facility operating "like a power company or water company" (http://www.multicians.org/fjcc3.html).
Buyya defines a grid as "a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements".^[11]
CERN, one of the largest users of grid technology, talk of The Grid: "a service for sharing computer power and data storage capacity over the Internet." ^[12]

Grids can be categorized with a three-stage model of departmental grids, enterprise grids and global grids. The first stage corresponds to a firm initially utilising resources within a single group, for example an engineering department connecting desktop machines, clusters and equipment. This progresses to enterprise grids, where the computing resources of non-technical staff can also be used for cycle-stealing and storage. A global grid is a connection of enterprise and departmental grids which can be used in a commercial or collaborative manner.

See also

Alliances and organizations

Open Grid Forum (Formerly Global Grid Forum)
Object Management Group

Production grids

Enabling Grids for E-sciencE
NorduGrid
Open Science Grid
OurGrid
Sun Grid
Xgrid
Distributed European Infrastructure for Supercomputing Applications DEISA
FusionGrid

INFN Production Grid [5]
UC Grid [6]

International Grid Projects

Name	Region	Start	End	Link
Open Middleware Infrastructure Institute Europe (OMII-Europe)	Europe	May 2006	May 2008
Enabling Grids for E-sciencE (EGEE)	Europe	March 2004	March 2006
Enabling Grids for E-sciencE II (EGEE II)	Europe	April 2006	April 2008
E-science grid facility for Europe and Latin America (EELA-2)	Europe and Latin America	April 2008	March 2010	[7]
E-Infrastructure shared between Europe and Latin America (EELA)	Europe and Latin America	January 2006	December 2008	[8]
Business Experiments in GRID (BEinGRID) Also see Gridipedia	Europe	June 2006	November 2009	[9]
BREIN	Europe	September 2006	August 2009	[10]
KnowARC	Europe	June 2006	August 2009	[11]
Nordic Data Grid Facility	Scandinavia and Finland	June 2006	December 2010	[12]
DataTAG	Europe and North America	January 2001	January 2003	[13]
European DataGrid (EDG)	Europe	March 2001	March 2004	[14]
BalticGrid	Europe (Baltic States)	November 2005	April 2008	[15]
EUFORIA (EU Fusion fOR Iter Applications)	Europe	January 2008	December 2010	[16]
XtreemOS	Europe	June 2006	June 2010	[17]

National Grid Projects

D-Grid (German)
Grid5000 (French)
GARUDA (Indian)
National Grid Service (UK)
Open Science Grid (USA)
VECC (Calcutta, India)
China Grid Project
INFN Grid (Italian)
KnowledgeGrid Malaysia
NAREGI Project
Singapore National Grid Project
Thai National Grid Project
LitGRID (Lithuanian)
EestiGrid (Estonia) [18]
Hellasgrid (Greek) [19]
Swiss National Grid Association [20]
Swegrid (Swedish National computational resource) [21]
RDIG - Russian Data Intensive Grid [22]
NorGrid - Norwegian Grid Initiative [23]
Rogrid - Romanian Grid Initiative [24]
Austrian Grid - Austrian Grid Initiative [25]
TR-Grid - Turkish National Grid Initiative [26]

Software implementations and middleware

Advanced Resource Connector (NorduGrid's ARC)
Berkeley Open Infrastructure for Network Computing (BOINC)
Globus Toolkit
Load Sharing Facility (LSF)
Message Passing Interface (MPI)
OMII UK development kit [27]
Parallel Virtual Machine (PVM)
Simple Grid Protocol
Sun Grid Engine
ProActive
UNICORE
SDSC Storage resource broker (data grid)
gLite (EGEE)
NInf GridRPC
IceGrid
Invisionix Roaming System Remote (IRSR)
Java CoG Kit
GridWay

Alchemi [28]
GridGain [29]
gridGISTICS [30]
Gridbus Middleware [31]
Java Parallel Processing Framework (JPPF) [32]
Vishwa [33]
UGP [34]
GRIA [35]
iRODS [36] (data grid)

References

Notes

^ Father of the Grid.
^ [1], accessed 03 May 2008
^ [2], accessed 10 June 2008
^ Large Hadron Collider Computing Grid official homepage
^ GStat: 02:05:55 03/25/08 GMT - @wgoc01
^ Real Time Monitor @ Imperial College London HEP e-Science
^ LCG - Deployment
^ What is the Grid? A Three Point Checklist (PDF).
^ P Plaszczak, R Wellner, Grid computing, 2005, Elsevier/Morgan Kaufmann, San Francisco
^ IBM Solutions Grid for Business Partners: Helping IBM Business Partners to Grid-enable applications for the next phase of e-business on demand.
^ A Gentle Introduction to Grid Computing and Technologies (PDF). Retrieved on 2005-05-06.
^ The Grid Café - What is Grid?. CERN. Retrieved on 2005-02-04.

Bibliography

Davies, Antony (June 2004). "Computational Intermediation and the Evolution of Computation as a Commodity" (PDF). Applied Economics.
Foster, Ian; Carl Kesselman. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers. ISBN 1-55860-475-8.
Plaszczak, Pawel; Rich Wellner, Jr. Grid Computing "The Savvy Manager's Guide". Morgan Kaufmann Publishers. ISBN 0-12-742503-9.
Berman, Fran; Anthony J. G. Hey, Geoffrey C. Fox. Grid Computing: Making The Global Infrastructure a Reality. Wiley. ISBN 0-470-85319-0.
Li, Maozhen; Mark A. Baker. The Grid: Core Technologies. Wiley. ISBN 0-470-09417-6.
Catlett, Charlie; Larry Smarr (June 1992). "Metacomputing". Communications of the ACM 35 (6).
Smith, Roger (2005). Grid Computing: A Brief Technology Analysis. CTO Network Library.
Buyya, Rajkumar (July 2005). "Grid Computing: Making the Global Cyberinfrastructure for eScience a Reality". CSI Communications 29 (1). Mumbai, India: Computer Society of India (CSI). ISSN 0970-647X.
Berstis, Viktors. Fundamentals of Grid Computing. IBM.
Ferreira, Luis; et al. Grid Computing Products and Services. IBM.
Ferreira, Luis; et al. Introduction to Grid Computing with Globus. IBM.
Jacob, Bart; et al. Enabling Applications for Grid Computing. IBM.
Ferreira, Luis; et al. Grid Services Programming and Application Enablement. IBM.
Jacob, Bart; et al. Introduction to Grid Computing. IBM.
Ferreira, Luis; et al. Grid Computing in Research and Education. IBM.
Ferreira, Luis; et al. Globus Toolkit 3.0 Quick Start. IBM.
Surridge, Mike; et al. Experiences with GRIA – Industrial applications on a Web Services Grid. IEEE.
Stockinger, Heinz; et al. (to be published in 2007). "Defining the Grid: A Snapshot on the Current View" (PDF). Supercomputing.
Global Grids and Software Toolkits: A Study of Four Grid Middleware Technologies

External links

News Sites

GridComputingPlanet - Part of the JupiterMedia empire
GridToday
International Science Grid This Week
Primeur magazine - HPC and Grid computing news

Information sites

Gridipedia - The European Grid Marketplace Contains reports and software components relating to Grid computing
Grid Computing Info Center
IEEE Distributed Systems Online, Grid Computing Section

Portals and grid projects

ASKALON
World Community Grid: Focuses on advancing scientific projects to benefit humanity, such as researching possible cures for cancer and muscular dystrophy, sequencing human genomes, finding better drug molecular structures to combat AIDS, etc. Open to anyone who wants to contribute idle PC processing time.
DistributedComputing.info: Contains information and links to mathematical, scientific, security, biological, rendering, economic, gaming and other worldwide distributed computing projects.
Wikipedia article on the World Community Grid: Contains additional links for each project being conducted on the World Community Grid.
3tera AppLogic
Gigaspaces
Appistry
DataSynapse
GridGain Systems
EnterTheGrid directory on Grid computing
EELA: E-Infrastructure shared between Europe and Latin America
Enabling Grids for E-sciencE (EGEE)
BREIN: Business objective driven reliable and intelligent grids for real business.
Fura GPL Ready to use grid
IBM Grid Computing website
ICEAGE: International Collaboration to Extend and Advance Grid Education
Java Parallel Processing Framework
ParadisEO, a C++ framework coupled with Globus and Condor-G for combinatorial and continuous optimization on grid support
GridSphere Portal Framework (JSR-168 compliant)
GridSummit.com
Gridalogy
BigBlueRiver
Grid Computing Now!: Knowledge Transfer Network
NICE EnginFrame: Grid computing portals for research and industry
Nivio: Virtual Desktop Based on Grid Computing
Rechenkraft.net (German)
gridGistics: service virtualization and grid computing.
myGrid: bioinformatics and eScience research project built by several UK universities and EMBL-EBI.
Consortium SIRENE (Sharing Infrastructure and REsources iN Europe)
ECSS: European Community for Software and Software Services - Architectures, Infrastructures, Engineering
Vendor-independent documentation on Grid-compliant open source portals

Articles

O'Reilly article about grid computing software
Grid Café, the place for everyone to learn about the Grid
Describing the Elephant: The Different Faces of IT as Service, positions grid in a broader context

Associations and conferences

Past events

GridWorld Washington 2006
IEEE Richmond Section Blog. Meeting, 5 October 2006: "Autonomic Grid Computing: Concepts, Infrastructure and Applications" (PDF, 3.61 MB).
