Protein Data Bank

For the file format that describes the 3D structures of molecules found in the Protein Data Bank, see Protein Data Bank (file format).

The Protein Data Bank (PDB) is a repository for 3-D structural data of proteins and nucleic acids. These data, typically obtained by X-ray crystallography or NMR spectroscopy and submitted by biologists and biochemists from around the world, are released into the public domain, and can be accessed for free. See also protein structure.

1 History
- 1.1 Growth
2 Contents
- 2.1 Statistics
3 File format
4 Viewing the data
5 References
- 5.1 Printed
6 External links

History

Founded in 1971 by Drs. Edgar Meyer and Walter Hamilton at Brookhaven National Laboratory, the Protein Data Bank was transferred in 1998 to the management of members of the Research Collaboratory for Structural Bioinformatics (RCSB). Rutgers University is the lead site, and the project is currently under the direction of Helen M. Berman.^[1]

The Worldwide Protein Data Bank (wwPDB) consists of organizations that act as deposition, data processing and distribution centers for PDB data. The founding members are RCSB PDB (USA), MSD-EBI (Europe) and PDBj (Japan). The BMRB (USA) group joined the wwPDB in 2006. The mission of the wwPDB is to maintain a single Protein Data Bank Archive of macromolecular structural data that is freely and publicly available to the global community.

The PDB is a key resource in structural biology and is critical to more recent work in structural genomics.

Many derived databases and projects have been developed to integrate and classify the PDB in terms of protein structure, protein function and protein evolution.

Growth

When the PDB was founded, it contained just 7 protein structures. Since then the number of structures has grown approximately exponentially, and the growth shows no sign of falling off.

The growth rate of the PDB has been the subject of fairly extensive analysis.

Contents

As of 15 April 2008, the database contained 50,277 released atomic coordinate entries (or "structures"), 46,400 of them proteins, the rest being nucleic acids, nucleic acid-protein complexes, and a few other molecules. About 5,000 new structures are released each year. Data are stored in the mmCIF format, which was developed specifically for the purpose. It is estimated that the PDB archive will triple in size to 150,000 structures by the year 2014.^[2]
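The growth implied by these figures can be worked out directly. The sketch below assumes a constant annual growth factor between the 2008 count and the 2014 estimate; that assumption, and the function name, are introduced here for illustration only and are not part of the cited estimate.

```python
def implied_annual_growth(start_count, end_count, years):
    """Return the constant yearly growth factor that turns start_count into end_count."""
    return (end_count / start_count) ** (1.0 / years)

# Figures quoted in the text: 50,277 entries in 2008, an estimated 150,000 by 2014.
factor = implied_annual_growth(50_277, 150_000, 2014 - 2008)
print(f"implied growth: {100 * (factor - 1):.1f}% per year")

# For comparison: a flat 5,000 releases per year over the same six years.
linear_2014 = 50_277 + 5_000 * (2014 - 2008)
print(f"linear projection for 2014: {linear_2014:,}")
```

Under this assumption the estimate corresponds to growth of roughly 20% per year, whereas a constant 5,000 releases per year would give only 80,277 entries by 2014. The estimate therefore presupposes that the release rate itself keeps rising.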

Note that the database stores the exact location of every atom in a large biomolecule (usually excluding the hydrogen atoms, whose positions are more of a statistical estimate). If one is only interested in sequence data, i.e., the list of amino acids making up a particular protein or the list of nucleotides making up a particular nucleic acid, the much larger databases from Swiss-Prot and the International Nucleotide Sequence Database Collaboration should be used instead.

Statistics

As of 9 April 2008, the "PDB Holdings List" at RCSB reported the following statistics:

Experimental method	Proteins	Nucleic Acids	Protein/NA complexes	Other	Total
X-ray diffraction	39791	1024	1813	24	42652
NMR	6291	804	137	7	7239
Electron microscopy	117	11	43	0	171
Other	88	4	4	2	98
Total	46287	1843	1997	33	50160
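The totals in the table can be checked, and each method's share of the archive derived, with a few lines of Python. This is an illustrative sketch: the variable names are made up here, and the counts are copied from the table above.

```python
# Counts copied from the holdings table (rows: method; columns: molecule type).
COLUMNS = ["Proteins", "Nucleic acids", "Protein/NA complexes", "Other"]
HOLDINGS = {
    "X-ray diffraction": [39791, 1024, 1813, 24],
    "NMR": [6291, 804, 137, 7],
    "Electron microscopy": [117, 11, 43, 0],
    "Other": [88, 4, 4, 2],
}

# Sum across each row (per method) and down each column (per molecule type).
row_totals = {method: sum(counts) for method, counts in HOLDINGS.items()}
column_totals = [sum(column) for column in zip(*HOLDINGS.values())]
grand_total = sum(row_totals.values())

for method, total in row_totals.items():
    print(f"{method:<20} {total:>6} {100 * total / grand_total:5.1f}%")
```

The computed sums agree with the "Total" row and column of the table. X-ray diffraction accounts for roughly 85% of the entries and NMR for roughly 14%.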

Note that theoretical models are no longer accepted in the PDB.

22,461 structures in the PDB have a structure factor file. 3,138 structures in the PDB have an NMR restraint file.

The current breakdown of holdings is updated weekly.

File format

Over the years the PDB file format has undergone many changes and revisions. Its original layout was dictated by the width of computer punch cards.
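As a consequence of this punch-card heritage, a PDB file consists of fixed-width records of up to 80 columns, in which each field occupies a set range of columns rather than being separated by delimiters. The sketch below slices a few fields out of a single ATOM record. The column ranges are those commonly documented for the ATOM record and should be verified against the format guide; the function name and the sample record are illustrative.

```python
def parse_atom_record(line):
    """Slice selected fields out of one fixed-width ATOM record.

    Column ranges (1-based, inclusive) are assumed from the PDB format guide:
    serial 7-11, atom name 13-16, residue name 18-20, chain 22,
    residue number 23-26, x 31-38, y 39-46, z 47-54.
    """
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return {
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "residue_name": line[17:20].strip(),
        "chain": line[21],
        "residue_number": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }

# Illustrative record, padded so that each field sits in its assumed columns.
sample = "ATOM      1  N   VAL A   1       6.204  16.869   4.854  1.00 49.05"
atom = parse_atom_record(sample)
print(atom["atom_name"], atom["residue_name"], atom["chain"],
      atom["x"], atom["y"], atom["z"])
```

Because the fields are positional, splitting a record on whitespace is unreliable: adjacent fields may touch when values are wide, and blank fields would shift everything that follows.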

PDB Format Guide (prepared by the PDB staff at BNL): the PDB format specification. It should be read before working with the raw data.
PDBML: more recently, the PDB has also provided its data in an XML representation, the PDBML format.
ftp.rcsb.org: the raw data can be downloaded from this server.
PDB format files can be downloaded over HTTP with URLs like this: http://www.pdb.org/pdb/files/4hhb.pdb.gz
PDBML (XML) files can be downloaded over HTTP with URLs like this: http://www.pdb.org/pdb/files/4hhb.xml.gz
ftp.ebi.ac.uk/pub/databases/rcsb/: an alternate download location for the PDB archive.
www.pdb.org: statistics about the PDB can be found on this site.
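The HTTP download URLs above follow a simple pattern, which can be sketched as a small helper. The function name is hypothetical, lowercasing the identifier is an assumption based on the two example URLs, and the host is the one quoted in the text, which may have changed since.

```python
# URL pattern taken from the examples in the text; the host may have changed since.
BASE_URL = "http://www.pdb.org/pdb/files"

def pdb_download_url(pdb_id, xml=False):
    """Build the download URL for an entry, in PDB format or PDBML (XML)."""
    extension = "xml" if xml else "pdb"
    # Assumption: identifiers appear in lowercase in the URL, as in the examples.
    return f"{BASE_URL}/{pdb_id.lower()}.{extension}.gz"

print(pdb_download_url("4HHB"))
print(pdb_download_url("4HHB", xml=True))
```

The files are gzip-compressed, so a downloaded payload would need to be decompressed (for example with Python's standard `gzip` module) before it can be read as text.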

The legacy format has caused many problems, and consequently there are several 'clean-up' projects:

The MMDB uses ASN.1 (and an XML conversion of this format). The wwPDB members RCSB PDB, MSD-EBI, and PDBj are working together to make the data uniform across the archive. Some believe this to be desirable; others argue that, without a universal repository of information (i.e., a common dictionary), it is not possible to draw comparisons.^{[citation needed]}

Each structure published in PDB receives a four-character alphanumeric identifier, its PDB ID. This should not be used as an identifier for biomolecules, since often several structures for the same molecule (in different environments or conformations) are contained in PDB with different PDB IDs.
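A basic syntactic check for such identifiers might look like the following sketch. It tests only the property stated above (four alphanumeric characters) and does not confirm that an entry with that identifier exists; the function name is hypothetical.

```python
import re

# Exactly four ASCII letters or digits, as described in the text.
_PDB_ID_PATTERN = re.compile(r"[0-9A-Za-z]{4}")

def looks_like_pdb_id(text):
    """Return True if text has the shape of a PDB ID (four alphanumeric characters)."""
    return _PDB_ID_PATTERN.fullmatch(text) is not None

for candidate in ["4hhb", "4HHB", "4hh", "4hh-", "hemoglobin"]:
    print(candidate, looks_like_pdb_id(candidate))
```

Since one molecule may appear under several PDB IDs, software that tracks biomolecules would typically map each molecule to a list of such identifiers rather than use a single ID as its key.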

When a biologist submits structure data for a protein or nucleic acid, wwPDB staff review and annotate the entry. The data are then automatically checked for plausibility. The source code for this validation software has been released free of charge. The main database accepts only experimentally derived structures, not theoretically predicted ones (see protein structure prediction).

Various funding agencies and scientific journals now require scientists to submit their structure data to PDB.

Viewing the data

The structural data can be used to visualize the biomolecules with appropriate software, such as VMD, RasMol, PyMOL, Jmol, MDL Chime, QuteMol, a web browser VRML plugin, or web-based software designed to visualize and analyse protein structures, such as STING. A recent desktop addition is Sirius. The RCSB PDB website also contains resources for education, structural genomics, and related software.

References

Printed

H.M. Berman, K. Henrick, H. Nakamura (2003): Announcing the worldwide Protein Data Bank. Nature Structural Biology 10 (12), p. 980 PMID 14634627.
H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne: The Protein Data Bank. Nucleic Acids Research, 28 pp. 235-242 (2000). PMID 10592235
Bernstein FC, Koetzle TF, Williams GJ, Meyer Jr EF, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M. The Protein Data Bank: a computer-based archival file for macromolecular structures. J Mol Biol 1977;112:535-542. PMID 875032.
E.F. Meyer: "The First Years of the Protein Data Bank". Protein Science 6:1591-1597 (1997).
Sussman, JL, Lin, D, Jiang, J, Manning, NO, Prilusky, J, Ritter, O & Abola, EE. Protein data bank (PDB): a database of 3D structural information of biological macromolecules. Acta Cryst 1998; D54:1078-1084. PMID 10089483.

External links

The Worldwide Protein Data Bank (wwPDB) — parent site to regional hosts (below)
RCSB Protein Data Bank - home page
MSD-EBI - home page
Protein Data Bank Japan - home page
The PDB FAQ - frequently asked questions about the PDB and working with structural models.

Related (derived) resources

Main article: Protein structure databases

Macromolecular Structure Database — MSD Home Page - project for data about macromolecular structures from the PDB.
PDBWiki — PDBWiki Home Page - a website for community annotation of PDB structures.
PDBsum — PDBsum Home Page - an overview of macromolecular structures in the PDB.
Proteopedia - the collaborative 3D encyclopedia of proteins and other molecules.

Enzyme database data

EBI. The best mapping is provided by Kim Henrick's group at EBI as part of the MSD SIFTS initiative.
The PDB provides a mapping on its beta site, but only at the level of whole PDB entries, not individual chains.
Search at BRENDA enzyme database portal.
PDBSProtEC

Molecular graphic visualisation tools

PyMOL — PyMol Home Page
Sirius — Sirius Home Page
STING — STING Home Page
RasMol — RasMol Home Page
Garlic
Swiss-PDB Viewer
Jmol Viewer — Jmol Home Page Open Source, Java based interactive molecular viewer
QuteMol — QuteMol Home Page Open Source, Win & Mac, high quality interactive molecular viewer
StarBiochem - Java-based interactive molecular viewer with integrated search of the Protein Data Bank
