Protein superfamily

A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred (see homology). Usually this common ancestry is inferred from structural alignment^[1] and mechanistic similarity, even when no sequence similarity is evident.^[2] Superfamilies typically contain several protein families, each of which shows sequence similarity among its members. The term protein clan is commonly used for protease superfamilies, following the MEROPS protease classification system.^[2]

Identification

Above, secondary structural conservation of 80 members of the PA protease clan (superfamily). H indicates α-helix, E indicates β-sheet, L indicates loop. Below, sequence conservation for the same alignment. Arrows indicate catalytic triad residues. Aligned on the basis of structure by DALI.

Structural homology in the PA superfamily (PA clan). The double beta-barrel that characterises the superfamily is highlighted in red. Shown are representative structures from several families within the PA superfamily. Note that some proteins show partially modified structure. Chymotrypsin (1gg6), tobacco etch virus protease (1lvm), calicivirin (1wqs), West Nile virus protease (1fp7), exfoliatin toxin (1exf), HtrA protease (1l1j), snake venom plasminogen activator (1bqy), chloroplast protease (4fln) and equine arteritis virus protease (1mbm).

Sequence homology

Superfamily members typically show no detectable sequence similarity. Indeed, their sequences are often impossible to align because of frequent insertions and deletions. In the PA clan of proteases, for example, not a single residue is conserved throughout the superfamily, not even those in the catalytic triad. Conversely, the individual families that make up a superfamily are defined on the basis of their sequence alignment, for example the C04 protease family within the PA clan.
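The two measures discussed above, pairwise identity and residues conserved across a whole alignment, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `percent_identity` and `conserved_columns` are hypothetical, the sequences are made-up toy strings rather than real protease sequences, and identity is counted only over columns where neither sequence has a gap (other conventions exist).

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def conserved_columns(alignment: list[str], gap: str = "-") -> list[int]:
    """Zero-based indices of columns with one identical, non-gap residue in every row."""
    return [
        i
        for i, column in enumerate(zip(*alignment))
        if gap not in column and len(set(column)) == 1
    ]


# Toy, made-up aligned sequences (not real protein data).
toy_alignment = ["MKT-AYLG", "MRTQAFLG", "-KTQSYLG"]

pairwise = percent_identity(toy_alignment[0], toy_alignment[1])
invariant = conserved_columns(toy_alignment)
```

In a superfamily-wide alignment such as the PA clan example, the list returned by a function like `conserved_columns` would be empty, whereas alignments within a single family would be expected to retain some invariant columns.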

Structural homology

Structure is much more evolutionarily conserved than sequence, as the PA clan of proteases exemplifies. Very few residues show much amino acid sequence conservation, but secondary structural elements are highly conserved, as is their arrangement in tertiary structural motifs. Structural alignment programs such as DALI can use the 3D structure of a protein of interest to find proteins with similar folds. Comparing 3D structures can identify instances of evolutionary relatedness that sequence comparison cannot.
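As a rough illustration of this contrast, the sketch below scores each column of a structure-based alignment twice: once on secondary-structure states (using the H, E and L codes from the figure caption above) and once on residues. It is a minimal sketch, not the method used by DALI; the function name `column_conservation`, the scoring rule (fraction of rows sharing the most common state) and the toy strings are all assumptions made for the example.

```python
from collections import Counter
from statistics import fmean


def column_conservation(rows: list[str], gap: str = "-") -> list[float]:
    """Per-column fraction of non-gap rows that share the most common symbol."""
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all aligned rows must have equal length")
    scores = []
    for column in zip(*rows):
        symbols = [s for s in column if s != gap]
        if not symbols:
            scores.append(0.0)  # column made up entirely of gaps
            continue
        most_common_count = Counter(symbols).most_common(1)[0][1]
        scores.append(most_common_count / len(symbols))
    return scores


# Toy, made-up rows for the same three hypothetical proteins.
toy_structure = ["EEELLHHH", "EEELLHHH", "EELLLHHL"]  # H helix, E strand, L loop
toy_sequence = ["MKTAAYLG", "QRSDAFIG", "LKEQSWVG"]

structure_mean = fmean(column_conservation(toy_structure))
sequence_mean = fmean(column_conservation(toy_sequence))
```

With these toy inputs the mean conservation of secondary-structure states is higher than that of residues, which mirrors the pattern described for real superfamily alignments.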

Mechanistic similarity

The catalytic mechanism of enzymes within a superfamily is typically conserved, although substrate specificity may be significantly different. Catalytic residues also tend to occur in the same order in the protein sequence. Once again, the PA clan of proteases acts as an example. Even though families within the superfamily use different nucleophiles, they all perform covalent, nucleophilic catalysis on proteins, peptides or amino acids through a similar mechanism.
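The point about residue order can be made concrete with a small sketch. Here each catalytic triad is described by role (base, acid, nucleophile) rather than by residue type, so that a serine protease and a cysteine protease can be compared directly. The function name `triad_order` and the data layout are assumptions for illustration; the first entry uses the standard chymotrypsin numbering (His57, Asp102, Ser195), while the second entry is a hypothetical cysteine protease with invented positions.

```python
Triad = dict[str, tuple[str, int]]  # role -> (residue type, sequence position)


def triad_order(triad: Triad) -> tuple[str, ...]:
    """Return the catalytic roles sorted by their position in the sequence."""
    return tuple(
        role
        for role, (_residue, _position) in sorted(
            triad.items(), key=lambda item: item[1][1]
        )
    )


# Chymotrypsin numbering for the classic serine protease triad.
chymotrypsin = {
    "base": ("His", 57),
    "acid": ("Asp", 102),
    "nucleophile": ("Ser", 195),
}

# Hypothetical cysteine protease; positions are invented for the example.
hypothetical_cys_protease = {
    "nucleophile": ("Cys", 150),
    "base": ("His", 45),
    "acid": ("Asp", 80),
}

same_order = triad_order(chymotrypsin) == triad_order(hypothetical_cys_protease)
different_nucleophile = (
    chymotrypsin["nucleophile"][0] != hypothetical_cys_protease["nucleophile"][0]
)
```

Comparing by role rather than by residue identity reflects the statement above: the nucleophile may differ between families while the order of catalytic residues along the sequence stays the same.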

Evolutionary significance

Protein superfamilies represent the current limits of our ability to identify common ancestry.^[3] They are the largest evolutionary grouping based on direct evidence that is currently possible. The divergence of their members is therefore amongst the most ancient evolutionary events currently studied. Some superfamilies have members present in all kingdoms of life, indicating that the last common ancestor of that superfamily was in the last universal common ancestor of all life (LUCA).^[4]

Superfamily members may be in different species, with the ancestral protein being the form of the protein that existed in the ancestral species (orthology). Conversely, the proteins may be in the same species, but evolved from a single protein whose gene was duplicated in the genome (paralogy).

Examples

PA clan - Members share a chymotrypsin-like double β-barrel fold and similar proteolysis mechanisms but have sequence identity of <10%. The clan contains both cysteine and serine proteases (different nucleophiles).^[2]^[5]

α/β hydrolase superfamily - Members share an α/β sheet containing 8 strands connected by helices, with catalytic triad residues in the same order.^[6] Activities include proteases, lipases, peroxidases, esterases, epoxide hydrolases and dehalogenases.^[7]

TIM barrel superfamily - Members share a large α₈β₈ barrel structure. It is one of the most common protein folds, and the monophyly of this superfamily is still contested.^[8]^[9]

Alkaline phosphatase superfamily - Members share an αβα sandwich structure^[10] and perform common promiscuous reactions by a shared mechanism.^[11]

Immunoglobulin superfamily - Members share a sandwich-like structure of two sheets of antiparallel beta strands (Ig-fold), and are involved in recognition, binding, and adhesion.^[12]^[13]

Globin superfamily - Members share an 8-α-helix globular globin fold.^[14]^[15]

Ras superfamily - Members share a common catalytic G domain of a 6-strand beta sheet surrounded by 5 alpha helices.^[16]