Protein superfamily

From Wikipedia, the free encyclopedia

A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred (see homology). Usually this common ancestry is inferred from structural alignment[1] and mechanistic similarity, even when no sequence similarity is evident.[2] Superfamilies typically contain several protein families, each of which shows sequence similarity among its members. The term protein clan is commonly used for protease superfamilies, following the MEROPS protease classification system.[2]

Identification

Above, secondary structure conservation of 80 members of the PA protease clan (superfamily). H indicates α-helix, E indicates β-sheet, L indicates loop. Below, sequence conservation for the same alignment. Arrows indicate catalytic triad residues. Aligned on the basis of structure by DALI.
Structural homology in the PA superfamily (PA clan). The double beta-barrel that characterises the superfamily is highlighted in red. Shown are representative structures from several families within the PA superfamily. Note that some proteins show partially modified structure. The structures are chymotrypsin (1gg6), tobacco etch virus protease (1lvm), calicivirin (1wqs), West Nile virus protease (1fp7), exfoliatin toxin (1exf), HtrA protease (1l1j), snake venom plasminogen activator (1bqy), chloroplast protease (4fln) and equine arteritis virus protease (1mbm).

Sequence homology

Superfamily members typically show no detectable sequence similarity. Indeed, their sequences are often impossible to align because of frequent insertions and deletions. In the PA clan of proteases, for example, not a single residue is conserved throughout the superfamily, not even those in the catalytic triad. Conversely, the individual families that make up a superfamily are defined on the basis of their sequence alignment, for example the C04 protease family within the PA clan.
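
The contrast between family-level and superfamily-level similarity is usually expressed as percent identity over an alignment. The following is a minimal sketch of one common way to compute it, assuming two sequences that have already been aligned and use "-" for gaps. The function name and the toy fragments are illustrative and are not real protein sequences; other definitions of percent identity use a different denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing an insertion or deletion
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy, made-up aligned fragments (assumption: not taken from any database).
seq_a = "MKV-LTGAHC"
seq_b = "MRVQL-GSHC"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity")
```

Members of one family would be expected to score well above distantly related superfamily members, for which the alignment itself is often unreliable.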

Structural homology

Structure is much more evolutionarily conserved than sequence, as exemplified by the PA clan of proteases. Very few residues show much amino acid sequence conservation; however, secondary structural elements are highly conserved, as is their arrangement in tertiary structural motifs. Structural alignment programs such as DALI can use the 3D structure of a protein of interest to find proteins with similar folds. Comparing 3D structures can therefore identify evolutionary relatedness that sequence comparison cannot.
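
The difference between structural and sequence conservation can be quantified column by column over a structure-based alignment. The sketch below assumes each protein is represented twice over the same alignment columns: once as a string of secondary structure codes (H, E, L) and once as its amino acid sequence. The scoring function, its name, and the three toy rows are illustrative assumptions, not the method used to produce any published figure.

```python
from collections import Counter


def column_conservation(rows: list[str]) -> list[float]:
    """Fraction of rows that share the most common symbol in each column."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must be non-empty and of equal length")
    scores = []
    for column in zip(*rows):
        symbols = [s for s in column if s != "-"]  # gaps never count as conserved
        if not symbols:
            scores.append(0.0)
            continue
        top_count = Counter(symbols).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


# Toy data (assumption): three proteins aligned on structure.
secondary_structure = ["EEELLHHH", "EEELLHHH", "EEELLHHL"]
sequences = ["AVKGDPLT", "SIRNEQMW", "TLHGKAFY"]

ss_score = mean(column_conservation(secondary_structure))
seq_score = mean(column_conservation(sequences))
print(f"secondary structure: {ss_score:.2f}, sequence: {seq_score:.2f}")
```

In this toy alignment the secondary structure rows agree in almost every column while the sequence rows rarely share a residue, which is the pattern described above.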

Mechanistic similarity

The catalytic mechanism of enzymes within a superfamily is typically conserved, although substrate specificity may differ significantly. Catalytic residues also tend to occur in the same order in the protein sequence. Once again, the PA clan of proteases serves as an example: even though families within the superfamily use different nucleophiles, they all perform covalent, nucleophilic catalysis on proteins, peptides or amino acids through a similar mechanism.
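
Whether catalytic residues occur in the same order can be checked by sorting the catalytic roles by sequence position. The sketch below is illustrative only: the chymotrypsin positions follow the conventional chymotrypsinogen numbering (His57, Asp102, Ser195), whereas the second entry is a hypothetical cysteine protease with made-up positions, and the role labels and function names are assumptions.

```python
def catalytic_order(residues: dict[str, int]) -> tuple[str, ...]:
    """Return catalytic roles sorted by their position in the sequence."""
    return tuple(sorted(residues, key=residues.get))


def same_order(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if both enzymes place their catalytic roles in the same order."""
    return catalytic_order(a) == catalytic_order(b)


# Serine protease: His57 (base), Asp102 (acid), Ser195 (nucleophile).
chymotrypsin = {"base": 57, "acid": 102, "nucleophile": 195}

# Hypothetical cysteine protease; positions are invented for illustration.
hypothetical_cys_protease = {"base": 40, "acid": 85, "nucleophile": 150}

print(catalytic_order(chymotrypsin))
print(same_order(chymotrypsin, hypothetical_cys_protease))
```

The comparison ignores the identity of the nucleophile (serine or cysteine) and the absolute positions, and looks only at the relative order of the roles.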

Evolutionary significance

Protein superfamilies represent the current limits of our ability to identify common ancestry.[3] They are the largest evolutionary groupings that can currently be based on direct evidence. The divergence of their members is therefore amongst the most ancient evolutionary events currently studied. Some superfamilies have members present in all kingdoms of life, indicating that the last common ancestor of that superfamily was present in the last universal common ancestor of all life (LUCA).[4]

Superfamily members may be in different species, with the ancestral protein being the form of the protein that existed in the ancestral species (orthology). Conversely, the proteins may be in the same species, but evolved from a single protein whose gene was duplicated in the genome (paralogy).

Examples

PA clan of chymotrypsin-like proteases - Members share a double β-barrel fold and similar proteolysis mechanisms but have <10% sequence identity. The clan contains both cysteine and serine proteases (different nucleophiles).[2][5]

α/β hydrolase superfamily - Members share an α/β sheet containing 8 strands connected by helices, with catalytic triad residues in the same order.[6] Activities include proteases, lipases, peroxidases, esterases, epoxide hydrolases and dehalogenases.[7]

TIM barrel superfamily - Members share a large α8β8 barrel structure. It is one of the most common protein folds, and the monophyly of this superfamily is still contested.[8][9]

Alkaline phosphatase superfamily -

Ras superfamily - Members share a common catalytic G domain.

Protein superfamily resources

Several biological databases document protein superfamilies and protein folds, for example:

  • Pfam - Protein families database of alignments and HMMs
  • PROSITE - Database of protein domains, families and functional sites
  • PIRSF - SuperFamily Classification System
  • PASS2 - Protein Alignment as Structural Superfamilies v2
  • SUPERFAMILY - Library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms
  • SCOP and CATH - Classifications of protein structures into superfamilies, families and domains

Similarly, there are algorithms that search the PDB for proteins with structural homology to a target structure, for example:

  • DALI - Structural alignment based on a distance alignment matrix method
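
The idea behind comparing structures through distance matrices can be sketched as follows. Distances between residues within one protein do not change when the whole structure is rotated or translated, so two structures can be compared without first superimposing them. This is only a minimal illustration of that representation and not the DALI algorithm itself: the coordinates are made-up points standing in for Cα atoms, both structures are assumed to have the same number of residues in the same order, and all function names are hypothetical.

```python
import math

Point = tuple[float, float, float]


def distance_matrix(coords: list[Point]) -> list[list[float]]:
    """Pairwise distances between all points of one structure."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def mean_abs_difference(m1: list[list[float]], m2: list[list[float]]) -> float:
    """Average absolute difference between two equally sized matrices."""
    n = len(m1)
    if n == 0 or n != len(m2):
        raise ValueError("matrices must be non-empty and the same size")
    total = sum(abs(m1[i][j] - m2[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)


def rotate_z_and_shift(coords: list[Point], angle: float, shift: Point) -> list[Point]:
    """Rotate about the z axis, then translate (a rigid-body move)."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


# Made-up coordinates (assumption): four points standing in for C-alpha atoms.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 1.5)]
moved = rotate_z_and_shift(toy, math.pi / 3, (10.0, -5.0, 2.0))  # same shape
stretched = [(2 * x, y, z) for x, y, z in toy]                   # different shape

same = mean_abs_difference(distance_matrix(toy), distance_matrix(moved))
different = mean_abs_difference(distance_matrix(toy), distance_matrix(stretched))
print(f"rigid move: {same:.6f}, stretched: {different:.3f}")
```

A real structural search must additionally handle proteins of different lengths and find which residues correspond, which this sketch deliberately leaves out.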

References

  1. Holm, L; Rosenström, P (July 2010). "Dali server: conservation mapping in 3D". Nucleic Acids Research 38 (Web Server issue): W545–9. doi:10.1093/nar/gkq366. PMID 20457744.
  2. Rawlings, ND; Barrett, AJ; Bateman, A (January 2012). "MEROPS: the database of proteolytic enzymes, their substrates and inhibitors". Nucleic Acids Research 40 (Database issue): D343–50. doi:10.1093/nar/gkr987. PMID 22086950.
  3. Shakhnovich, BE; Deeds, E; Delisi, C; Shakhnovich, E (March 2005). "Protein structure and evolutionary history determine sequence space topology". Genome Research 15 (3): 385–92. PMID 15741509.
  4. Ranea, JA; Sillero, A; Thornton, JM; Orengo, CA (October 2006). "Protein superfamily evolution and the last universal common ancestor (LUCA)". Journal of Molecular Evolution 63 (4): 513–25. PMID 17021929.
  5. Bazan, JF; Fletterick, RJ (November 1988). "Viral cysteine proteases are homologous to the trypsin-like family of serine proteases: structural and functional implications". Proceedings of the National Academy of Sciences of the United States of America 85 (21): 7872–6. PMID 3186696.
  6. Carr, PD; Ollis, DL (2009). "Alpha/beta hydrolase fold: an update". Protein and Peptide Letters 16 (10): 1137–48. PMID 19508187.
  7. Nardini, M; Dijkstra, BW (December 1999). "Alpha/beta hydrolase fold enzymes: the family keeps growing". Current Opinion in Structural Biology 9 (6): 732–7. doi:10.1016/S0959-440X(99)00037-8. PMID 10607665.
  8. Nagano, N; Orengo, CA; Thornton, JM (August 2002). "One fold with many functions: the evolutionary relationships between TIM barrel families based on their sequences, structures and functions". Journal of Molecular Biology 321 (5): 741–65. PMID 12206759.
  9. Farber, G (1993). "An α/β-barrel full of evolutionary trouble". Current Opinion in Structural Biology 3 (3): 409–412. doi:10.1016/S0959-440X(05)80114-9.
This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.