TeraGrid
From Wikipedia, the free encyclopedia
TeraGrid is an open scientific discovery infrastructure that combines large computing resources (including supercomputers, storage, and scientific visualization systems) at nine Resource Provider partner sites into an integrated, persistent computational resource. Deployment of TeraGrid was completed in September 2004. As of April 2006, it provides over 100 teraflops of computing power, over 3 petabytes of rotating storage, and specialized data analysis and visualization resources, all interconnected at 10-30 gigabits per second via a dedicated national network.
TeraGrid is coordinated through the Grid Infrastructure Group (GIG) at the University of Chicago, working in partnership with the Resource Provider sites. Funding for TeraGrid is provided by the National Science Foundation Office of Cyberinfrastructure. Access to TeraGrid is available through scientific peer review, at no cost, to any academic researcher in the US.
TeraGrid History
The TeraGrid project was launched by the National Science Foundation (NSF) in August 2001 with $53 million in funding to four sites: the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign; the San Diego Supercomputer Center (SDSC) at the University of California, San Diego; Argonne National Laboratory, operated by the University of Chicago; and the Center for Advanced Computing Research (CACR) at the California Institute of Technology in Pasadena, California.
In October 2002, the Pittsburgh Supercomputing Center (PSC) at Carnegie Mellon University and the University of Pittsburgh joined TeraGrid as a major new partner when NSF announced $35 million in supplementary funding. Through the Extensible Terascale Facility (ETF) project, the TeraGrid network was transformed from a four-site mesh into a dual-hub backbone network with connection points in Los Angeles and at the StarLight facility in Chicago.
In October 2003, NSF awarded $10 million to add four sites to TeraGrid as well as to establish a third network hub, in Atlanta. These new sites were Oak Ridge National Laboratory (ORNL), Purdue University, Indiana University, and the Texas Advanced Computing Center (TACC) at The University of Texas at Austin.
TeraGrid construction was also made possible through key corporate partnerships with Sun Microsystems, IBM, Intel Corporation, Qwest Communications, Juniper, Myricom, Hewlett-Packard Company, and Oracle Corporation.
TeraGrid construction was completed in October 2004, at which time the TeraGrid facility began full production.
In August 2005, NSF awarded $148 million for a five-year program to operate and enhance the TeraGrid facility, comprising eight resource provider awards and one system integration award, the latter to the Grid Infrastructure Group at the University of Chicago.
TeraGrid Architecture
TeraGrid resources are integrated through a service-oriented architecture: each resource provides a "service" that is defined in terms of its interface and operation. Computational resources run a set of software packages called "Coordinated TeraGrid Software and Services" (CTSS). CTSS provides a familiar user environment on all TeraGrid systems, allowing scientists to port code more easily from one system to another. CTSS also provides integrative functions such as single sign-on, remote job submission, workflow support, and data movement tools. CTSS includes the Globus Toolkit, Condor, distributed accounting and account management software, verification and validation software, and a set of compilers, programming tools, and environment variables.
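The idea of a service "defined in terms of interface and operation" can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `ComputeService`, `ToySite`, `run_on`, and the site names are invented for illustration and are not part of CTSS, the Globus Toolkit, or Condor. The sketch only shows why client code written against a shared interface can move between sites without changes.

```python
from typing import Protocol


class ComputeService(Protocol):
    """Hypothetical interface; not an actual CTSS or Globus API."""

    def submit(self, script: str) -> str:
        """Queue a job and return its identifier."""
        ...

    def status(self, job_id: str) -> str:
        """Return the current state of a job."""
        ...


class ToySite:
    """In-memory stand-in for one resource provider (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._jobs = {}  # job id -> state

    def submit(self, script: str) -> str:
        job_id = f"{self.name}-{len(self._jobs) + 1}"
        self._jobs[job_id] = "QUEUED"
        return job_id

    def status(self, job_id: str) -> str:
        return self._jobs[job_id]


def run_on(site: ComputeService, script: str) -> str:
    """Client code depends only on the interface, not on the site behind it."""
    job_id = site.submit(script)
    return f"{job_id}: {site.status(job_id)}"


# The same client call works against either (made-up) site.
for site in (ToySite("site-a"), ToySite("site-b")):
    print(run_on(site, "echo hello"))
```

Because `run_on` relies only on `submit` and `status`, swapping one site object for another requires no change to the calling code, which is the portability property the paragraph above attributes to a common software environment.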
TeraGrid resources are interconnected by a dedicated optical network, with each resource provider site connecting at either 10 or 30 gigabits per second. TeraGrid users access the facility through national research networks such as the Internet2 Abilene backbone and National LambdaRail.
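To give a sense of scale for these link speeds, the following sketch computes the ideal time to move a dataset between sites. The 1-terabyte dataset size and the function name `transfer_time_seconds` are assumptions chosen for illustration, and the result is a theoretical lower bound: it ignores protocol overhead, storage throughput, and contention on the link.

```python
def transfer_time_seconds(size_bytes: float, link_gbps: float) -> float:
    """Ideal time to move size_bytes over a link of link_gbps gigabits/second."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


ONE_TERABYTE = 1e12  # decimal terabyte in bytes (assumed example size)

# Compare the two site connection speeds mentioned in the text.
for rate in (10, 30):
    seconds = transfer_time_seconds(ONE_TERABYTE, rate)
    print(f"{rate} Gb/s: {seconds:.0f} s (~{seconds / 60:.1f} min)")
```

Under these idealized assumptions, a terabyte takes roughly 13 minutes at 10 gigabits per second and under 5 minutes at 30 gigabits per second.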
TeraGrid Usage
TeraGrid users come primarily from US universities: there are roughly 4,000 users at over 200 universities. Academic researchers in the US can obtain exploratory or development allocations (measured, roughly, in "CPU hours") based on an abstract describing the work to be done. More extensive allocations require a proposal that is reviewed during a quarterly peer-review process. All allocation proposals are handled through the TeraGrid website. Proposers select the scientific discipline that most closely describes their work, which enables reporting on the allocation and use of TeraGrid by scientific discipline. The following table shows the scientific profile of TeraGrid allocations and usage as of July 2006.
Allocated (%) | Used (%) | Scientific Discipline |
---|---|---|
19 | 23 | Molecular Biosciences |
17 | 23 | Physics |
14 | 10 | Astronomical Sciences |
12 | 21 | Chemistry |
10 | 4 | Materials Research |
8 | 6 | Chemical, Thermal Systems |
7 | 7 | Atmospheric Sciences |
3 | 2 | Advanced Scientific Computing |
2 | 0.5 | Earth Sciences |
2 | 0.5 | Biological and Critical Systems |
1 | 0.5 | Ocean Sciences |
1 | 0.5 | Cross-Disciplinary Activities |
1 | 0.5 | Computer and Computation Research |
0.5 | 0.25 | Integrative Biology and Neuroscience |
0.5 | 0.25 | Mechanical and Structural Systems |
0.5 | 0.25 | Mathematical Sciences |
0.5 | 0.25 | Electrical and Communication Systems |
0.5 | 0.25 | Design and Manufacturing Systems |
0.5 | 0.25 | Environmental Biology |
Each of these discipline categories corresponds to a specific program area of the National Science Foundation (more detail can be found at the NSF website).
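One way to read the table is to compare each discipline's share of usage with its share of allocations. The sketch below copies the table rows and computes that ratio. The helper names `totals` and `usage_ratio` are illustrative, and the interpretation rests on an assumption: that both columns are shares of the respective totals. Since the published percentages are rounded, the ratios are approximate.

```python
# (allocated %, used %, discipline), copied from the July 2006 table above
ROWS = [
    (19, 23, "Molecular Biosciences"),
    (17, 23, "Physics"),
    (14, 10, "Astronomical Sciences"),
    (12, 21, "Chemistry"),
    (10, 4, "Materials Research"),
    (8, 6, "Chemical, Thermal Systems"),
    (7, 7, "Atmospheric Sciences"),
    (3, 2, "Advanced Scientific Computing"),
    (2, 0.5, "Earth Sciences"),
    (2, 0.5, "Biological and Critical Systems"),
    (1, 0.5, "Ocean Sciences"),
    (1, 0.5, "Cross-Disciplinary Activities"),
    (1, 0.5, "Computer and Computation Research"),
    (0.5, 0.25, "Integrative Biology and Neuroscience"),
    (0.5, 0.25, "Mechanical and Structural Systems"),
    (0.5, 0.25, "Mathematical Sciences"),
    (0.5, 0.25, "Electrical and Communication Systems"),
    (0.5, 0.25, "Design and Manufacturing Systems"),
    (0.5, 0.25, "Environmental Biology"),
]


def totals(rows):
    """Return (sum of allocated %, sum of used %) as a sanity check."""
    return sum(r[0] for r in rows), sum(r[1] for r in rows)


def usage_ratio(rows):
    """Map each discipline to used % divided by allocated %."""
    return {name: used / alloc for alloc, used, name in rows}


print(totals(ROWS))  # both columns should sum to about 100

# List the three disciplines whose usage share most exceeds their allocation share.
ratios = usage_ratio(ROWS)
for name in sorted(ratios, key=ratios.get, reverse=True)[:3]:
    print(f"{name}: {ratios[name]:.2f}")
```

A ratio above 1 means a discipline accounts for a larger share of usage than of allocations. With the figures above, Chemistry has the highest ratio (21 / 12 = 1.75).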
In 2006, TeraGrid began providing application-specific services to Science Gateway partners, who serve discipline-specific scientific and education communities, generally via a web portal. Through the Science Gateways program, TeraGrid aims to increase the number of scientists, students, and educators who are able to use TeraGrid by at least an order of magnitude.
External links
Similar Projects
- DEISA: Distributed European Infrastructure for Supercomputing Applications, a facility integrating eleven European supercomputing centers
- HPC-UK: a strategic collaboration between the UK's three leading supercomputer centres (Manchester Computing, EPCC, and Daresbury Laboratory)
- NAREGI: Japanese NAtional REsearch Grid Initiative, involving several supercomputer centers
- OSG: Open Science Grid, a distributed computing infrastructure for scientific research
TeraGrid Resource Provider Sites
- Argonne National Laboratory
- Indiana University
- National Center for Atmospheric Research (NCAR)
- National Center for Supercomputing Applications (NCSA)
- Oak Ridge National Laboratory
- Pittsburgh Supercomputing Center (PSC), operated by the University of Pittsburgh and Carnegie Mellon University
- Purdue University
- San Diego Supercomputer Center (SDSC)
- Texas Advanced Computing Center (TACC)