Human Proteome Folding Project
From Wikipedia, the free encyclopedia
The Human Proteome Folding Project (HPF) is a collaborative effort between New York University (Bonneau Lab), the Institute for Systems Biology (ISB) and the University of Washington (Baker Lab), using the Rosetta software developed by the latter (Rosetta@home project).
The HPF project is currently in Phase 2, which is running exclusively on the World Community Grid. Phase 1 ran on two distributed computing grids: the World Community Grid - an IBM philanthropic initiative - and on United Devices' grid.org.
ISB designed the Human Proteome Folding project for World Community Grid and will use the results within its larger research efforts. For more information about the Human Proteome Folding project, please visit the Institute For Systems Biology web site. Visit the About the Project page for a non-scientist description of the HPF project.
Project description
In 2003, scientists completed the sequencing of the human genome. While our genes are an amazing repository of information, knowing the genes is only the beginning. It is the proteins made from these genes that actually carry out all the functions that keep us alive.
However, scientists still do not know the functions of a large fraction of human proteins. With an understanding of how each protein affects human health, scientists can develop new cures for human disease.
Today, only a fraction of the 30,000 proteins encoded by the human genome have known structures and functions. Being able to predict the structure of every protein in an organism will contribute to our overall understanding of how those predicted proteins interact with the organism as a system. Can you imagine trying to fix a car or a machine knowing the function of only 30% of the components? That is the situation in which biomedical and biological researchers, to their credit, operate. Thus, anything that can shed light on these mystery proteins is highly valuable to the fields of biology and medicine.
Huge amounts of data exist that can identify the role of individual proteins, but the data must be analyzed to be useful. This analysis could take years to complete on supercomputers.
Human Proteome
Proteins are long and disordered chains folded into globs. The number of shapes that proteins can fold into is enormous. Searching through all of the possible shapes to identify the correct function of an individual protein is a tremendous challenge.
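To give a rough sense of "enormous", here is a back-of-the-envelope count in the style of Levinthal's well-known thought experiment. Every figure in it is an illustrative assumption rather than an HPF or Rosetta parameter: 3 possible conformations per residue, a 100-residue chain, and a hypothetical sampling rate of 10^13 shapes per second.

```python
# Back-of-the-envelope estimate of the size of a protein's shape space.
# All constants are illustrative assumptions, not HPF or Rosetta parameters.
STATES_PER_RESIDUE = 3        # assumed conformations available to each residue
CHAIN_LENGTH = 100            # assumed number of residues in the chain
SHAPES_PER_SECOND = 10**13    # hypothetical rate of trying out shapes
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def count_shapes(states_per_residue: int, chain_length: int) -> int:
    """Number of distinct shapes if every residue varies independently."""
    return states_per_residue ** chain_length


def years_to_enumerate(total_shapes: int, shapes_per_second: int) -> float:
    """Years needed to visit every shape once at the given rate."""
    return total_shapes / shapes_per_second / SECONDS_PER_YEAR


total = count_shapes(STATES_PER_RESIDUE, CHAIN_LENGTH)
years = years_to_enumerate(total, SHAPES_PER_SECOND)
print(f"{total:.2e} possible shapes, about {years:.1e} years to try them all")
```

Under these assumptions the count is on the order of 10^47 shapes, which is why exhaustive search is out of the question and prediction programs sample the space instead.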
The Human Proteome Folding project will provide scientists with data that predicts the shape of a very large number of human proteins. These predictions will give scientists the clues they need to identify the biological functions of individual proteins within the human body. With an understanding of how each protein affects human health, scientists can develop new cures for human diseases such as cancer, HIV/AIDS, SARS, and malaria.
Computing platform
The HPF science application (Rosetta v4.2x) is available for the Windows and Linux platforms. The project requires a PC with 256 MB of RAM or more.
The project can run on either of the following:
- Berkeley Open Infrastructure for Network Computing (BOINC) platform for Linux and Windows users
- United Devices agent for Windows users
How to join HPF via BOINC
- Read the project's rules and policies
- Download, install and run the BOINC software version 5.2 or later
- Register in WCG
- On your PC, open BOINC Manager, go to Projects → Attach to Project and, when prompted, enter the project's URL, http://www.worldcommunitygrid.org/, and the username and password you selected in the previous step
Debian GNU/Linux and Ubuntu users are advised to install the BOINC package from http://wiki.debian.org/BOINC, which includes automatic startup/shutdown scripts.
Once you complete the steps, BOINC will automatically download the science software (Rosetta) and your first work-unit from WCG and start working on it.
Joining WCG automatically gives you work on all projects running on it. Currently there are four (4) projects: HPF, FightAIDS@Home, Help Defeat Cancer, and Genome Comparison. You can opt in to or out of any project at any time via WCG My Projects.
Information on joining WCG via BOINC can be found in WCG Setup and Installation Help and WCG Information about BOINC. More on how to install and run the BOINC software can be found in the BOINC install instructions on the BOINC-Wiki.
Current project status
HPF Phase-1 applied the Rosetta v4.2x software to the human genome and 89 others, starting in November 2004. It is expected to end in April 2006. HPF Phase-2 (HPF2) will apply the latest Rosetta v4.8x software in a higher-resolution, "full atom refinement" mode, concentrating on cancer biomarkers (proteins found at dramatically increased levels in cancer tissues), human secreted proteins and malaria.
Relation of HPF with Rosetta@home
Dr. Bonneau (head scientist for HPF/WCG) answers "How does the human proteome folding project (HPF) on the world community grid (WCG) relate to Rosetta@home?"
- "It is important to differentiate Rosetta@home from the HPF project (the one currently running on the WCG and grid.org), so I'll take a few lines to explain each from the perspective of motivation. The two grids HPF and Rosetta@home are not competing grids and we would like to see them both thrive.
- Rosetta@home is run by the bakerlab as a way to accelerate development of the Rosetta code. With the focus on all-atom refinement and protein design even their benchmarks (to see if they broke the code or improved the code) are taking a large amount of time. Thus, Rosetta@home is primarily for testing/developing new versions of the Rosetta code and making Rosetta better. Also the robetta server could be hooked up to this project. This grid meets the spikes in the compute demands of the robetta server and the bakerlab. Rosetta@home does not aim to produce databases that will in turn be used by biologists but it helps to make the code better, which in turn helps efforts, like HPF, that use the code to give biologists usable fold and function predictions. In that way it is an essential part of the field-wide effort to fold proteomes.
- Conversely, our project, HPF on the WCG, aims to use stable versions of Rosetta to make predictions that can be presented to biologists and biomedical researchers in comprehensive databases with intuitive front ends. HPF on the WCG can be thought of as the production phase of the project, where we produce function annotation for many genomes and then distribute this product to biologists. Due to the large number of proteins we're folding (comprehension is essential) we need a great deal of computer time if we want to make our databases comprehensive and available to biologists.
- We are working with the bakerlab on HPF on the WCG and think of the two grids as very different parts of the solution of getting function out of fold prediction: one using Rosetta and one improving Rosetta (D.Baker is involved in both projects)."
More background information about Rosetta, HPF and Rosetta@home
- "The Baker laboratory at the University of Washington has developed a protein folding program named Rosetta. It has 3 major sections. The first section tries to fold a protein, going from a long string of amino acids to a crumpled up 3D structure. The second section tries to reverse this process. Given the surface of a crumpled up protein molecule, it attempts to design a chain of amino acids that will fold up to form that molecule. The third section tries to dock 2 different protein molecules to see how they will interact with each other.
- A number of universities (such as the University of Warsaw) and research institutes use this Rosetta program for different purposes (see Rosetta Commons at http://www.rosettacommons.org/ ). David Baker maintains a server on the Internet called the Robetta server which allows other scientists to use Rosetta for their projects without maintaining local servers with Rosetta.
- Recently (3Q05) the Baker Lab has started a BOINC project named Rosetta@home ( http://boinc.bakerlab.org/rosetta/ ). The Baker Lab only has a 500-node Linux cluster, so it is very time-consuming to test variations while trying to improve Rosetta. The first section of Rosetta which folds proteins (called the ab initio prediction section) uses 2 methods. The first method is a speedy low resolution method. The second method is a computationally intensive high resolution method which takes a fold prediction from the low resolution method and attempts to refine it to produce a more accurate prediction. The cluster of computers created for Rosetta@home is used to test various improvements in the high resolution method. Eventually, Dr. Baker also intends to use this cluster to run queries from other scientists that are currently queued up to run on the Robetta server.
- Rosetta has been producing the most accurate computer predictions of protein folds, as you can see at CASP6. The most accurate predictions are still made by human scientists, assisted by computer programs, but like human chess players, the computers are putting some pressure on them. Also see the 'Gene Machine' in the July 2001 issue of Wired.
- Now, getting down to particulars. Where do we come in? The Institute for Systems Biology (ISB) in Seattle, WA, USA, has started a project called the Human Proteome Folding Project (HPF) to fold all the unknown proteins found in the human genome plus a number of proteins from 80 other genomes. See HPF
- This project uses the low resolution method of protein folding that is in the ab initio section of Rosetta. Each unknown protein is folded to produce about 10,000 predictions. Variable conditions are established by a random seed. Both grid.org and the World Community Grid are running this project for ISB. Each Work Unit makes 100-500 fold predictions for a previously unknown protein. The ISB creates a batch of proteins and puts it on the ISB server. Then either grid.org or WCG downloads the batch, sends out the work units, reassembles the results returned and finally uploads the corresponding batch of results back to the ISB server.
- Both grids (WCG and grid.org) are using the same version of Rosetta to fold the proteins. There were some bug fixes made in Rosetta back in December 2004. The WCG took the lead and then sent the patched version to grid.org which beta tested the new version, then deployed it in January 2005. This was the only cross-grid transfer of Rosetta code that I know of since the HPF project went live.
- On June 23, 2006, the Human Proteome Folding Project Phase II started running on the World Community Grid, using the new high resolution folding method being developed at Rosetta@home to refine the folding predictions made by HPF for some selected proteins. The main aims of this new project are "1) obtain higher resolution structures for specific human proteins and pathogen proteins and 2) further explore the limits of protein structure prediction by further developing Rosetta software structure prediction."
http://www.worldcommunitygrid.org/projects_showcase/viewHpf2Research.do
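The work-unit arithmetic described above (about 10,000 predictions per protein, 100-500 predictions per work unit, runs varied by a random seed) can be sketched in a few lines of Python. This is a minimal illustration, not the scheduler used by WCG, grid.org or ISB: the function names, the fixed work-unit size, the record layout and the seed scheme are all hypothetical, and only the two figures just quoted come from the text.

```python
import math
import random

PREDICTIONS_PER_PROTEIN = 10_000  # "about 10,000" per the description above


def work_units_needed(total_predictions: int, per_unit: int) -> int:
    """Work units required if each one makes `per_unit` predictions."""
    return math.ceil(total_predictions / per_unit)


def plan_work_units(protein_id: str, total_predictions: int,
                    per_unit: int, master_seed: int) -> list[dict]:
    """Hypothetical plan: one record per work unit, each with its own seed.

    Distinct seeds stand in for the "variable conditions" that make each
    work unit explore different folds of the same protein.
    """
    rng = random.Random(master_seed)
    n_units = work_units_needed(total_predictions, per_unit)
    seeds = rng.sample(range(2**31), n_units)  # sampled without repetition
    plan = []
    remaining = total_predictions
    for seed in seeds:
        count = min(per_unit, remaining)  # the last unit may be smaller
        plan.append({"protein": protein_id, "seed": seed, "predictions": count})
        remaining -= count
    return plan


# The quoted range of 100-500 predictions per work unit implies
# between 100 and 20 work units per protein.
for per_unit in (100, 500):
    print(per_unit, work_units_needed(PREDICTIONS_PER_PROTEIN, per_unit))
```

Because each record carries its own seed, a grid could hand the records to different volunteer machines in any order and still reassemble a full set of predictions for the protein, which matches the batch-out, results-back flow described above.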