Biological database

From Wikipedia, the free encyclopedia

As of 2006, there were over 1,000 public and commercial biological databases. They usually contain genomics and proteomics data, though databases are also used in taxonomy. The core data are nucleotide sequences of genes or amino acid sequences of proteins. Many databases also hold information about function, structure, chromosomal localisation, the clinical effects of mutations, and similarities between biological sequences.

Overview

Biological databases have become an important tool that helps scientists understand and explain a host of biological phenomena, from the structure of biomolecules and their interactions, to the whole metabolism of organisms, to the evolution of species. This knowledge helps in the fight against diseases, in the development of medications, and in discovering basic relationships among species in the history of life.

Biological knowledge is usually distributed, often across separate locations, among many different specialized databases. This makes it difficult to keep the information consistent, which sometimes leads to low data quality.
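The consistency problem can be illustrated with a minimal sketch that compares the records two databases hold for the same identifiers. The identifiers, the sequences, and the function name `find_conflicts` are invented for illustration and do not come from any real database.

```python
def find_conflicts(source_a, source_b):
    """Return {identifier: (value_a, value_b)} for shared identifiers whose values differ."""
    shared = source_a.keys() & source_b.keys()
    return {
        key: (source_a[key], source_b[key])
        for key in sorted(shared)
        if source_a[key] != source_b[key]
    }


# Hypothetical records: two databases storing sequences under the same identifiers.
db_one = {"GENE_X": "ATGGCC", "GENE_Y": "ATGTTT"}
db_two = {"GENE_X": "ATGGCC", "GENE_Y": "ATGTTA", "GENE_Z": "ATGAAA"}

conflicts = find_conflicts(db_one, db_two)
print(conflicts)
```

Only identifiers present in both sources are compared; a record that exists in just one database (here `GENE_Z`) is not reported as a conflict.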

By far the most important resource for finding biological databases is the special yearly Database Issue of the journal "Nucleic Acids Research" (NAR). The Database Issue is freely available and categorizes the publicly available online databases related to computational biology (or bioinformatics).

The Database Issue of NAR

See also: NCBI, PubMed

Most important public databases for molecular biology

(from www.kokocinski.net)

Primary sequence databases

The International Nucleotide Sequence Database (INSD) consists of the following databases.

  1. DDBJ (DNA Data Bank of Japan)
  2. EMBL Nucleotide DB (European Molecular Biology Laboratory)
  3. GenBank [1] (National Center for Biotechnology Information)

Together these databanks represent current knowledge about the sequences of all organisms. They exchange their stored data with one another and are the source for many other databases.

Meta-databases

  1. MetaDB (A Metadatabase for the Biological Sciences), containing links and descriptions for over 1,200 biological databases
  2. Entrez[2] (National Center for Biotechnology Information)
  3. euGenes (Indiana University)
  4. GeneCards (Weizmann Inst.)
  5. SOURCE (Stanford University)
  6. mGen, containing four of the world's biggest databases (GenBank, RefSeq, EMBL and DDBJ), for simple, program-friendly gene extraction
  7. Harvester (EMBL Heidelberg), the Bioinformatic Harvester, integrating 16 major protein resources

Strictly speaking, a meta-database is a database of databases rather than any one integration project or technology. It collects information from several other sources and usually makes it available in a new and more convenient form.
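The "database of databases" idea can be sketched as a small registry whose records describe other databases rather than biological data. The entries are taken from the lists in this article; the record layout (`DatabaseEntry`) and the helper `find_by_category` are assumptions made for illustration, not the schema of any actual meta-database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseEntry:
    """One record in the registry: a description of a database, not its contents."""
    name: str
    category: str
    maintainer: str


# Entries drawn from this article; the schema itself is hypothetical.
REGISTRY = [
    DatabaseEntry("DDBJ", "primary sequence", "DNA Data Bank of Japan"),
    DatabaseEntry("GenBank", "primary sequence",
                  "National Center for Biotechnology Information"),
    DatabaseEntry("UniProt", "protein sequence", "UniProt Consortium"),
    DatabaseEntry("ArrayExpress", "microarray", "European Bioinformatics Institute"),
]


def find_by_category(registry, category):
    """Return the sorted names of all registered databases in the given category."""
    return sorted(entry.name for entry in registry if entry.category == category)


print(find_by_category(REGISTRY, "primary sequence"))
```

A real meta-database would typically also store links and longer descriptions for each entry, as MetaDB is described as doing above.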

Genome Browsers

  1. UCSC Genome Bioinformatics Genome Browser and Tools (UCSC)
  2. Ensembl Genome Browser (Sanger Institute and EBI)
  3. Integrated Microbial Genomes Microbial Genome Browser (Joint Genome Institute, Department of Energy)
  4. GBrowse The GMOD GBrowse Project

Genome browsers enable researchers to visualize and browse entire genomes (most host many complete genomes) together with annotation data, including gene predictions and structure, proteins, expression, regulation, variation and comparative analysis. The annotation data usually come from multiple diverse sources.
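The core lookup behind browsing a genome region, finding every annotation that overlaps the visible window, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Feature` record, the coordinates, and the source names are invented, and real browsers use indexed data structures rather than a linear scan.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """A hypothetical annotation record with 1-based, inclusive coordinates."""
    chrom: str
    start: int
    end: int
    kind: str
    source: str


# Invented annotations merged from several (hypothetical) sources.
FEATURES = [
    Feature("chr1", 100, 500, "gene", "source_a"),
    Feature("chr1", 450, 460, "variation", "source_b"),
    Feature("chr1", 900, 1200, "gene", "source_a"),
    Feature("chr2", 100, 500, "gene", "source_c"),
]


def features_in_window(features, chrom, start, end):
    """Return features on `chrom` that overlap the closed interval [start, end]."""
    return [
        f for f in features
        if f.chrom == chrom and f.start <= end and f.end >= start
    ]


visible = features_in_window(FEATURES, "chr1", 400, 950)
print([(f.kind, f.start, f.end) for f in visible])
```

Two intervals overlap exactly when each one starts no later than the other ends, which is the condition tested in `features_in_window`.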

Specialized databases

  1. CGAP Cancer Genes (National Cancer Institute)
  2. Clone Registry Clone Collections (National Center for Biotechnology Information)
  3. DBGET H.sapiens (Univ. of Kyoto)
  4. GDB Hum. Genome Db (Human Genome Organisation)
  5. I.M.A.G.E Clone Collections (Image Consortium)
  6. MGI Mouse Genome (Jackson Lab.)
  7. SHMPD The Singapore Human Mutation and Polymorphism Database
  8. NCBI-UniGene (National Center for Biotechnology Information)
  9. OMIM Inherited Diseases (Online Mendelian Inheritance in Man)
  10. Off. Hum. Genome Db (HUGO Gene Nomenclature Committee)
  11. List with SNP-Databases
  12. p53 The p53 Knowledgebase

Expression, regulation & pathways databases

  1. KEGG PATHWAY Database[3] (Univ. of Kyoto)
  2. Reactome[4] (Cold Spring Harbor Laboratory, EBI, Gene Ontology Consortium)

Protein sequence databases

  1. UniProt[5] Universal Protein Resource (UniProt Consortium: EBI, Expasy, PIR)
  2. PIR Protein Information Resource (Georgetown University Medical Center (GUMC))
  3. Swiss-Prot[6] Protein Knowledgebase (Swiss Institute of Bioinformatics)
  4. PEDANT Protein Extraction, Description and ANalysis Tool (Forschungszentrum f. Umwelt & Gesundheit)
  5. PROSITE Database of Protein Families and Domains
  6. DIP Database of Interacting Proteins (Univ. of California)
  7. Pfam Protein families database of alignments and HMMs (Sanger Institute)
  8. ProDom Comprehensive set of Protein Domain Families (INRA/CNRS)
  9. SignalP Server for signal peptide prediction

Protein structure databases

  1. Protein Data Bank[7] (PDB) (Research Collaboratory for Structural Bioinformatics (RCSB))
  2. CATH Protein Structure Classification
  3. SCOP Structural Classification of Proteins
  4. SWISS-MODEL Server and Repository for Protein Structure Models
  5. ModBase Database of Comparative Protein Structure Models (Sali Lab, UCSF)

Microarray-databases

  1. ArrayExpress (European Bioinformatics Institute)
  2. Gene Expression Omnibus (National Center for Biotechnology Information)
  3. maxd (Univ. of Manchester)
  4. SMD (Stanford University)
  5. GPX (Scottish Centre for Genomic Technology and Informatics)

Protein-Protein Interactions

  1. BioGRID[8] A General Repository for Interaction Datasets (Samuel Lunenfeld Research Institute)
  2. STRING Database of known and predicted protein-protein interactions (EMBL)

See also