Protein domain


Pyruvate kinase, a protein with three domains (PDB 1pkn)

A protein domain is a part of a protein's sequence and structure that can evolve, function, and exist independently of the rest of the protein chain. Each domain forms a compact three-dimensional structure and can often be independently stable and folded. Many proteins consist of several structural domains. One domain may appear in a variety of evolutionarily related proteins. Domains vary in length from about 25 to about 500 amino acids. The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin. Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeric proteins.


Background

The concept of the domain was first proposed in 1973 by Wetlaufer, after X-ray crystallographic studies of hen lysozyme (Phillips, 1966) and papain (Drenth et al., 1968) and limited proteolysis studies of immunoglobulins (Porter, 1973; Edelman, 1973). Wetlaufer defined domains as stable units of protein structure that could fold autonomously. In the past domains have been described as units of:

compact structure^[1]
function and evolution^[2]
folding (Wetlaufer, 1973).

Each definition is valid and will often overlap, i.e. a compact structural domain that is found amongst diverse proteins is likely to fold independently within its structural environment. Nature often brings several domains together to form multidomain and multifunctional proteins with a vast number of possibilities (Chothia, 1992). In a multidomain protein, each domain may fulfil its own function independently, or in a concerted manner with its neighbours. Domains can either serve as modules for building up large assemblies such as virus particles or muscle fibres, or can provide specific catalytic or binding sites as found in enzymes or regulatory proteins.

An appropriate example is pyruvate kinase, a glycolytic enzyme that plays an important role in regulating the flux from fructose-1,6-bisphosphate to pyruvate. It contains an all-β regulatory domain, an α/β substrate-binding domain and an α/β nucleotide-binding domain, connected by several polypeptide linkers (George and Heringa, 2002a) (see figure). Each domain in this protein occurs in diverse sets of protein families.

The central α/β-barrel substrate-binding domain is one of the most common enzyme folds. It is seen in many different enzyme families catalysing completely unrelated reactions (Hegyi and Gerstein, 1999). The α/β-barrel is commonly called the TIM barrel, named after triose phosphate isomerase, which was the first such structure to be solved.^[3] It is currently classified into 26 homologous families in the CATH domain database (Orengo et al., 1997). The TIM barrel is formed from a sequence of β-α-β motifs closed by the first and last strands hydrogen bonding together, forming an eight-stranded barrel. There is debate about the evolutionary origin of this domain. One study has suggested that a single ancestral enzyme could have diverged into several families,^[4] while another suggests that a stable TIM-barrel structure has evolved through convergent evolution (Lesk et al., 1989).

The TIM-barrel in pyruvate kinase is 'discontinuous', meaning that more than one segment of the polypeptide is required to form the domain. This is likely to be the result of the insertion of one domain into another during the protein's evolution. It has been shown from known structures that about a quarter of structural domains are discontinuous (Jones et al., 1998; Holm and Sander, 1994). The inserted β-barrel regulatory domain is 'continuous', made up of a single stretch of polypeptide.

Covalent association of two domains represents a functional and structural advantage, since there is an increase in stability when compared with the same structures non-covalently associated (Ghelis and Yon, 1979). Other advantages are the protection of intermediates within inter-domain enzymatic clefts that may otherwise be unstable in aqueous environments, and a fixed stoichiometric ratio of the enzymatic activity necessary for a sequential set of reactions (Ostermeier and Benkovic, 2000).

Domains are units of protein structure

Main article: Protein structure

Primary structure

The primary structure (string of amino acids) of a protein encodes its uniquely folded 3D conformation.^[5] The most important factor governing the folding of a protein into its 3D structure is the distribution of polar and non-polar side chains.^[6] Folding is driven by the burial of hydrophobic side chains in the interior of the molecule so as to avoid contact with the aqueous environment.

Sequence alignment is an important tool for determining domains.

Secondary structure

Generally proteins have a core of hydrophobic residues surrounded by a shell of hydrophilic residues. Since the peptide bonds themselves are polar they are neutralised by hydrogen bonding with each other when in the hydrophobic environment. This gives rise to regions of the polypeptide that form regular 3D structural patterns called 'secondary structure'. There are two main types of secondary structure:

α-helices
β-sheet

Secondary structure motifs

Some simple combinations of secondary structure elements have been found to occur frequently in protein structure and are referred to as 'super-secondary structure' or motifs. For example, the β-hairpin motif consists of two adjacent antiparallel β-strands joined by a small loop. It is present in most antiparallel β structures, both as an isolated ribbon and as part of more complex β-sheets. Another common super-secondary structure is the β-α-β motif, which is frequently used to connect two parallel β-strands. The central α-helix connects the C-terminus of the first strand to the N-terminus of the second strand, packing its side chains against the β-sheet and therefore shielding the hydrophobic residues of the β-strands from the surface.

Tertiary structure

Several motifs pack together to form compact, local, semi-independent units called domains.^[1] The overall 3D structure of the polypeptide chain is referred to as the protein's 'tertiary structure'. Domains are the fundamental units of tertiary structure, each domain containing an individual hydrophobic core built from secondary structural units connected by loop regions. The packing of the polypeptide is usually much tighter in the interior than the exterior of the domain producing a solid-like core and a fluid-like surface.^[7] In fact, core residues are often conserved in a protein family, whereas the residues in loops are less conserved, unless they are involved in the protein's function. Protein tertiary structure can be divided into four main classes based on the secondary structural content of the domain.^[8]

All-α domains have a domain core built exclusively from α-helices. This class is dominated by small folds, many of which form a simple bundle with helices running up and down.
All-β domains have a core composed of antiparallel β-sheets, usually two sheets packed against each other. Various patterns can be identified in the arrangement of the strands, often giving rise to the identification of recurring motifs, for example the Greek key motif.^[9]
α+β domains are a mixture of all-α and all-β motifs. Classification of proteins into this class is difficult because of overlaps with the other three classes, and it is therefore not used in the CATH domain database.^[10]
α/β domains are made from a combination of β-α-β motifs that predominantly form a parallel β-sheet surrounded by amphipathic α-helices. The secondary structures are arranged in layers or barrels.

Structural alignment is an important tool for determining domains.

Domains have limits on size

Domains have limits on size.^[11] The size of individual structural domains varies from 36 residues in E-selectin to 692 residues in lipoxygenase-1,^[12] but the majority, 90%, have fewer than 200 residues,^[13] with an average of approximately 100 residues.^[14] Very short domains, of fewer than 40 residues, are often stabilised by metal ions or disulfide bonds. Larger domains, of more than 300 residues, are likely to consist of multiple hydrophobic cores.^[15]

Relationship between primary and tertiary structure

Modules

Nature is a tinkerer and not an inventor:^[16] new sequences are adapted from pre-existing sequences rather than invented. Domains are the common material used by nature to generate new sequences; they can be thought of as genetically mobile units, referred to as 'modules'. Often, the C and N termini of domains are close together in space, allowing them to easily be "slotted into" parent structures during the process of evolution. Many domain families are found in all three forms of life: Archaea, Bacteria and Eukarya. Domains that are repeatedly found in diverse proteins are often referred to as modules; examples can be found among extracellular proteins associated with clotting, fibrinolysis, complement, the extracellular matrix, cell surface adhesion molecules and cytokine receptors.^[17]

Protein families

Molecular evolution gives rise to families of related proteins with similar sequence and structure. However, sequence similarities can be extremely low between proteins that share the same structure. Protein structures may be similar because proteins have diverged from a common ancestor. Alternatively, some folds may be more favored than others because they represent stable arrangements of secondary structures, and some proteins may converge towards these folds over the course of evolution. There are currently about 45,000 experimentally determined protein 3D structures deposited within the Protein Data Bank (PDB).^[18] However, this set contains many identical or very similar structures. All proteins should be classified into structural families to understand their evolutionary relationships. Structural comparisons are best achieved at the domain level. For this reason many algorithms have been developed to automatically assign domains in proteins with known 3D structure; see 'Domain definition from structural co-ordinates'.

Super-folds

The CATH domain database classifies domains into approximately 800 fold families; ten of these folds are highly populated and are referred to as 'super-folds'. Super-folds are defined as folds for which there are at least three structures without significant sequence similarity.^[19] The most populated is the α/β-barrel super-fold, as described previously.

Multidomain proteins

The majority of genomic proteins, two-thirds in unicellular organisms and more than 80% in metazoa, are multidomain proteins created as a result of gene duplication events.^[20] Many domains in multidomain structures could have once existed as independent proteins. More and more domains in eukaryotic multidomain proteins can be found as independent proteins in prokaryotes.^[21] For example, vertebrates have a multi-enzyme polypeptide containing the GAR synthetase, AIR synthetase and GAR transformylase modules (GARs-AIRs-GARt; GAR: glycinamide ribonucleotide synthetase/transferase; AIR: aminoimidazole ribonucleotide synthetase). In insects, the polypeptide appears as GARs-(AIRs)2-GARt, in yeast GARs-AIRs is encoded separately from GARt, and in bacteria each domain is encoded separately.^[22]

Origin

Multidomain proteins are likely to have emerged from a selective pressure during evolution to create new functions. Various proteins have diverged from common ancestors by different combinations and associations of domains. Modular units frequently move about, within and between biological systems through mechanisms of genetic shuffling:

transposition of mobile elements including horizontal transfers (between species);^[23]
gross rearrangements such as inversions, translocations, deletions and duplications;
homologous recombination;
slippage of DNA polymerase during replication.

Difference in proliferation

It is likely that all these and organisms. For example, the ABC transporter domain constitutes one of the largest domain families that appear in all organisms.^[24] Many other families that appear in all organisms show much less proliferation. These include metabolic enzymes and components of translational apparatus.

Types of organisation

The simplest multidomain organisation seen in proteins is that of a single domain repeated in tandem.^[25] The domains may interact with each other or remain isolated, like beads on a string. The giant 30,000-residue muscle protein titin comprises about 120 fibronectin-III-type and Ig-type domains.^[26] In the serine proteases, a gene duplication event has led to the formation of an enzyme with two β-barrel domains.^[27] The repeats have diverged so widely that there is no obvious sequence similarity between them. The active site is located at a cleft between the two β-barrel domains, in which functionally important residues are contributed from each domain. Genetically engineered mutants of the chymotrypsin serine protease were shown to have some proteinase activity even though their active site residues were abolished, and it has therefore been postulated that the duplication event enhanced the enzyme's activity.^[27]

Connectivity

Modules frequently display different connectivity relationships, as illustrated by the kinesins and ABC transporters. The kinesin motor domain can be at either end of a polypeptide chain that includes a coiled-coil region and a cargo domain.^[28] ABC transporters are built with up to four domains consisting of two unrelated modules, ATP-binding cassette and an integral membrane module, arranged in various combinations.

Domain insertion

Not only do domains recombine, but there are many examples of a domain having been inserted into another. Sequence or structural similarities to other domains demonstrate that homologues of inserted and parent domains can exist independently. An example is that of the 'fingers' inserted into the 'palm' domain within the polymerases of the Pol I family.^[29]

Difference between structural and evolutionary domain

Since a domain can be inserted into another, there should always be at least one continuous domain in a multidomain protein. This is the main difference between definitions of structural domains and evolutionary/functional domains. An evolutionary domain will be limited to one or two connections between domains, whereas structural domains can have unlimited connections, within a given criterion of the existence of a common core. Several structural domains could be assigned to an evolutionary domain.

Domains are autonomous folding units

Folding

Main article: protein folding

History

"Protein folding: the unsolved problem". Since the seminal work of Anfinsen over forty years ago,^[5] the goal of completely understanding the mechanism by which a polypeptide rapidly folds into its stable native conformation has remained elusive. Many experimental folding studies have contributed much to our understanding, but the principles that govern protein folding are still based on those discovered in the very first studies of folding. Anfinsen showed that the native state of a protein is thermodynamically stable, the conformation being at a global minimum of its free energy.

Folding pathway

Folding is a directed search of conformational space, allowing the protein to fold on a biologically feasible time scale. The Levinthal paradox states that if an average-sized protein were to sample all possible conformations before finding the one with the lowest energy, the whole process would take billions of years.^[30] Proteins typically fold within 0.1 to 1000 seconds; the folding process must therefore be directed in some way through a specific folding pathway. The forces that direct this search are likely to be a combination of local and global influences whose effects are felt at various stages of the reaction.^[31]
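The scale of the Levinthal estimate can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the chain length (100 residues), the number of conformations per residue (3) and the sampling rate (10^13 conformations per second) are assumptions chosen for the example, not values taken from the cited work.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_search_time_years(n_residues=100,
                                states_per_residue=3,
                                samples_per_second=1e13):
    """Return (number of conformations, years needed to sample them all).

    All three parameters are illustrative assumptions, not measured values.
    """
    # Each residue is assumed to choose its conformation independently.
    conformations = states_per_residue ** n_residues
    seconds = conformations / samples_per_second
    return conformations, seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    n_conf, years = levinthal_search_time_years()
    print(f"conformations: {n_conf:.2e}")
    print(f"exhaustive search: {years:.2e} years")
```

Under these assumptions the exhaustive search takes on the order of 10^27 years, which is why an undirected search cannot account for folding times of seconds.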

Advances in experimental and theoretical studies have shown that folding can be viewed in terms of energy landscapes,^[32] where folding kinetics is considered as a progressive organisation of an ensemble of partially folded structures through which a protein passes on its way to the folded structure. This has been described in terms of a folding funnel, in which an unfolded protein has a large number of conformational states available and the folded protein has far fewer. A funnel implies that for protein folding there is a decrease in energy and a loss of entropy with increasing tertiary structure formation. The local roughness of the funnel reflects kinetic traps, corresponding to the accumulation of misfolded intermediates. A folding chain progresses toward lower intra-chain free energies by increasing its compactness. The chain's conformational options become increasingly narrowed, ultimately toward one native structure.

Order of folding

Many experimental studies suggest that protein folding begins with the formation of secondary structure, followed by the co-operative assembly into tertiary structure mainly driven by hydrophobic interactions.^[33] During the initial stages of folding, regions in the polypeptide will spontaneously form elements of secondary structure stabilised by a combination of local and long range interactions, both of which are primarily hydrophobic.^[34] Secondary and tertiary structure are expected to appear simultaneously in a co-operative process,^[35] where largely pre-formed secondary structures are joined together in a cluster-induced collapse.^[36] This process of folding has been termed 'nucleation-condensation'.^[37] There will be many possible folding pathways because of the many different combinations of secondary structure, but some pathways will be related since they involve the same secondary structural units. Only the proper combination of secondary structural elements will condense to the native structure.

The fact that algorithms used to predict secondary structure are at best 75% accurate suggests that some of the observed elements of secondary structure in proteins are formed from non-local interactions.^[38] Also, the free energy difference between α-helical, β-sheet and unstructured coil conformations for most sequences is small enough that their structures can be interchangeable, demonstrating that proteins have a structural plasticity that allows them to change conformation readily. The tertiary interactions make the final selection of the actual native topology.

Advantage of domains in protein folding

The organisation of large proteins by structural domains represents an advantage for protein folding, with each domain being able to individually fold, accelerating the folding process and reducing a potentially large combination of residue interactions. Furthermore, given the observed random distribution of hydrophobic residues in proteins,^[39] domain formation appears to be the optimal solution for a large protein to bury its hydrophobic residues while keeping the hydrophilic residues at the surface.^[40] However, the role of inter-domain interactions in protein folding and in energetics of stabilisation of the native structure, probably differs for each protein. In T4 lysozyme, the influence of one domain on the other is so strong that the entire molecule is resistant to proteolytic cleavage. In this case, folding is a sequential process where the C-terminal domain is required to fold independently in an early step, and the other domain requires the presence of the folded C-terminal domain for folding and stabilisation.^[41]

It has been found that the folding of an isolated domain can take place at the same rate as, or sometimes faster than, that of the integrated domain,^[42] suggesting that unfavourable interactions with the rest of the protein can occur during folding. Several arguments suggest that the slowest step in the folding of large proteins is the pairing of the folded domains.^[43] This is either because the domains are not folded entirely correctly or because the small adjustments required for their interaction are energetically unfavourable,^[44] such as the removal of water from the domain interface.

Domains and quaternary structure

About quaternary structures

Main article: quaternary structure

Many proteins have a quaternary structure, which consists of several polypeptide chains that associate into an oligomeric molecule. Each polypeptide chain in such a protein is called a subunit. Hemoglobin, for example, consists of two α and two β subunits. Each of the four chains has an all-α globin fold with a heme pocket.

Domain swapping

Domain swapping is a mechanism for forming oligomeric assemblies.^[45] In domain swapping, a secondary or tertiary element of a monomeric protein is replaced by the same element of another protein. Domain swapping can range from secondary structure elements to whole structural domains. It also represents a model of evolution for functional adaptation by oligomerisation, e.g. oligomeric enzymes that have their active site at subunit interfaces.^[46]

Domains and protein flexibility

The presence of multiple domains in proteins gives rise to a great deal of flexibility and mobility. One of the largest observed domain motions is the 'swivelling' mechanism in pyruvate phosphate dikinase. The phosphoinositide domain swivels between two states in order to bring a phosphate group from the active site of the nucleotide binding domain to that of the phosphoenolpyruvate/pyruvate domain.^[47] The phosphate group is moved over a distance of 45 Å, involving a domain motion of about 100 degrees around a single residue. Domain motions are important for:^[48]

catalysis;
regulatory activity;
transport of metabolites;
formation of protein assemblies and
cellular locomotion.

In enzymes, the closure of one domain onto another captures a substrate by an induced fit, allowing the reaction to take place in a controlled way. Such motions can be observed when two or more crystallographic 3D structures of a protein are experimentally determined in alternate environments, or from the analysis of nuclear magnetic resonance (NMR) derived structures. A detailed analysis by Gerstein et al. (1994) led to the classification of two basic types of domain motion: hinge and shear. Only a relatively small portion of the chain, namely the inter-domain linker and side chains, undergoes significant conformational changes upon domain rearrangement.^[49]

Hinges by secondary structures

A study by Hayward^[50] found that the termini of α-helices and β-sheets form hinges in a large number of cases. Many hinges were found to involve two secondary structure elements acting like the hinges of a door, allowing an opening and closing motion to occur. This can arise when two neighbouring strands within a β-sheet situated in one domain diverge as they join the other domain. The two resulting termini then form the bending regions between the two domains. α-helices that preserve their hydrogen bonding network when bent are found to behave as mechanical hinges, storing 'elastic energy' that drives the closure of domains for rapid capture of a substrate.^[51]

Helical to extended conformation

The interconversion of helical and extended conformations at the site of a domain boundary is not uncommon. In calmodulin, torsion angles change for five residues in the middle of a domain-linking α-helix. The helix is split into two almost perpendicular, smaller helices separated by four residues of an extended strand.^[52]^[53]

Shear motions

Shear motions involve a small sliding movement of domain interfaces, controlled by the amino acid side chains within the interface. Proteins displaying shear motions often have a layered architecture: stacking of secondary structures. The interdomain linker has merely the role of keeping the domains in close proximity.

Domain definition from structural co-ordinates

The importance of domains as structural building blocks and elements of evolution has brought about many automated methods for their identification and classification in proteins of known structure. Automatic procedures for reliable domain assignment are essential for the generation of the domain databases, especially as the number of protein structures is increasing. Although the boundaries of a domain can be determined by visual inspection, construction of an automated method is not straightforward. Problems occur when faced with domains that are discontinuous or highly associated.^[54] The fact that there is no standard definition of what a domain really is has meant that domain assignments have varied enormously, with each researcher using a unique set of criteria.^[55]

A structural domain is a compact, globular sub-structure with more interactions within it than with the rest of the protein.^[49] Therefore, a structural domain can be determined by two visual characteristics: its compactness and its extent of isolation.^[56] Measures of local compactness in proteins have been used in many of the early methods of domain assignment^[57]^[58] and in several of the more recent methods.^[59]^[60]^[61]

Considering proteins as small segments

One of the first algorithms^[57] used a Cα-Cα distance map together with a hierarchical clustering routine that considered proteins as several small segments, 10 residues in length. The initial segments were clustered one after another based on inter-segment distances; segments with the shortest distances were clustered and considered as single segments thereafter. The stepwise clustering finally included the full protein. Go (1978) also exploited the fact that inter-domain distances are normally larger than intra-domain distances; all possible Cα-Cα distances were represented as diagonal plots in which there were distinct patterns for helices, extended strands and combinations of secondary structures.
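The segment-clustering idea described above can be sketched as follows. This is a minimal illustration, not the published algorithm: the use of the mean Cα-Cα distance between segments as the inter-segment distance, the stopping rule (a requested number of clusters, `n_clusters`) and all function names are assumptions made for the example.

```python
import math
from itertools import combinations


def make_segments(ca_coords, segment_length=10):
    """Cut the chain of C-alpha coordinates into consecutive segments."""
    return [list(ca_coords[i:i + segment_length])
            for i in range(0, len(ca_coords), segment_length)]


def mean_distance(points_a, points_b):
    """Mean C-alpha to C-alpha distance between two groups of residues."""
    total = sum(math.dist(p, q) for p in points_a for q in points_b)
    return total / (len(points_a) * len(points_b))


def cluster_segments(ca_coords, segment_length=10, n_clusters=2):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Returns a sorted list of clusters, each a sorted list of segment indices.
    """
    segments = make_segments(ca_coords, segment_length)
    # Each cluster keeps its member segment indices and its pooled coordinates.
    clusters = [([i], seg) for i, seg in enumerate(segments)]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: mean_distance(clusters[ij[0]][1],
                                         clusters[ij[1]][1]),
        )
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(ids) for ids, _ in clusters)


if __name__ == "__main__":
    # Toy chain: two stretches of 20 residues placed far apart on the x axis.
    toy = [(float(i), 0.0, 0.0) for i in range(20)]
    toy += [(100.0 + i, 0.0, 0.0) for i in range(20)]
    print(cluster_segments(toy, segment_length=10, n_clusters=2))
```

Because clusters are tracked as sets of segment indices rather than as chain intervals, segments that are far apart in sequence can still end up in the same cluster.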

Sowdhamini and Blundell's method

The method by Sowdhamini and Blundell (1995) clusters secondary structures in a protein based on their Cα-Cα distances and identifies domains from the pattern in their dendrograms. As the procedure does not consider the protein as a continuous chain of amino acids, there are no problems in treating discontinuous domains. Specific nodes in these dendrograms are identified as tertiary structural clusters of the protein; these include both super-secondary structures and domains.

The DOMAK algorithm is used to create the 3Dee domain database.^[62] It calculates a 'split value' from the number of each type of contact when the protein is divided arbitrarily into two parts. This split value is large when the two parts of the structure are distinct.
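A split value of this kind can be illustrated with a short sketch. The exact formula is not given in the text above; the form used here, (internal contacts of part A / contacts between A and B) × (internal contacts of part B / contacts between A and B), is an assumption chosen because it grows when the two parts are distinct. The Cα-based contact definition, the 8 Å cutoff, the restriction to a single cut point along the chain and the function names are likewise assumptions made for the example.

```python
import math


def contact_counts(ca_coords, part_a, cutoff=8.0):
    """Count contacts inside part A, inside part B, and between A and B.

    A contact is (by assumption) any residue pair whose C-alpha atoms lie
    within `cutoff` angstroms. `part_a` holds the residue indices of part A.
    """
    in_a = set(part_a)
    int_a = int_b = ext_ab = 0
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) > cutoff:
                continue
            if i in in_a and j in in_a:
                int_a += 1
            elif i not in in_a and j not in in_a:
                int_b += 1
            else:
                ext_ab += 1
    return int_a, int_b, ext_ab


def split_value(ca_coords, part_a, cutoff=8.0):
    """Assumed split value: (int_A / ext_AB) * (int_B / ext_AB)."""
    int_a, int_b, ext_ab = contact_counts(ca_coords, part_a, cutoff)
    if ext_ab == 0:
        return math.inf  # the two parts do not touch at all
    return (int_a / ext_ab) * (int_b / ext_ab)


def best_single_cut(ca_coords, cutoff=8.0):
    """Scan every single cut point k (part A = residues 0..k-1)."""
    best_k, best_value = None, -1.0
    for k in range(1, len(ca_coords)):
        value = split_value(ca_coords, range(k), cutoff)
        if value > best_value:
            best_k, best_value = k, value
    return best_k, best_value
```

Scanning only single cut points keeps the sketch short but cannot find discontinuous domains; passing an arbitrary set of residue indices as `part_a` to `split_value` lifts that restriction.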

Method of Wodak and Janin

The method of Wodak and Janin^[63] is based on the calculated interface areas between two chain segments repeatedly cleaved at various residue positions. Interface areas are calculated by comparing the surface areas of the cleaved segments with that of the native structure. Potential domain boundaries are identified at sites where the interface area is at a minimum.

Other methods have used measures of solvent accessibility to calculate compactness.^[64]^[65]

PUU algorithm

The PUU algorithm^[66] incorporates a harmonic model used to approximate inter-domain dynamics. The underlying physical concept is that many rigid interactions will occur within each domain and loose interactions will occur between domains. This algorithm is used to define domains in the FSSP domain database.^[67]

DETECTIVE

Swindells (1995) developed a method, DETECTIVE, for identification of domains in protein structures based on the idea that domains have a hydrophobic interior. Deficiencies were found to occur when hydrophobic cores from different domains continue through the interface region.

Example domains

Armadillo repeats. Named after the β-catenin-like Armadillo protein of the fruit fly Drosophila.

Basic leucine zipper domain (bZIP domain) is found in many DNA-binding eukaryotic proteins. One part of the domain contains a region that mediates sequence-specific DNA binding, and the other is the leucine zipper that is required for the dimerization of two DNA-binding regions. The DNA-binding region comprises a number of basic amino acids such as arginine and lysine.

Cadherin repeats. Cadherins function as Ca²⁺-dependent cell-cell adhesion proteins. Cadherin domains are extracellular regions which mediate cell-to-cell homophilic binding between cadherins on the surface of adjacent cells.

Death effector domain (DED) allows protein-protein binding by homotypic interactions (DED-DED). Caspase proteases trigger apoptosis via proteolytic cascades. Procaspase-8 and procaspase-10 bind to specific adaptor molecules via DED domains, and this leads to autoactivation of the caspases.

EF hand, a helix-loop-helix structural motif found in each structural domain of the signaling protein calmodulin and in the muscle protein troponin-C.

Immunoglobulin-like domains are found in proteins of the immunoglobulin superfamily (IgSF).^[68] They contain about 70-110 amino acids and are classified into different categories (IgV, IgC1, IgC2 and IgI) according to their size and function. They possess a characteristic fold in which two beta sheets form a "sandwich" that is stabilized by interactions between conserved cysteines and other charged amino acids. They are important for protein-to-protein interactions in processes of cell adhesion, cell activation, and molecular recognition. These domains are commonly found in molecules with roles in the immune system.

Phosphotyrosine-binding domain (PTB). PTB domains usually bind to phosphorylated tyrosine residues. They are often found in signal transduction proteins. PTB-domain binding specificity is determined by residues to the amino-terminal side of the phosphotyrosine. For example, the PTB domains of both SHC and IRS-1 bind to an NPXpY sequence. PTB-containing proteins such as SHC and IRS-1 are important for insulin responses of human cells.

Pleckstrin homology domain (PH). PH domains bind phosphoinositides with high affinity. Specificity for PtdIns(3)P, PtdIns(4)P, PtdIns(3,4)P2, PtdIns(4,5)P2, and PtdIns(3,4,5)P3 has been observed in each case. Because phosphoinositides are sequestered to various cell membranes (due to their long lipophilic tail), PH domains usually cause recruitment of the protein in question to a membrane, where the protein can exert a certain function in cell signalling, cytoskeletal reorganization or membrane trafficking.

Src homology 2 domain (SH2). SH2 domains are often found in signal transduction proteins. SH2 domains confer binding to phosphorylated tyrosine (pTyr). Named after the phosphotyrosine binding domain of the src viral oncogene, which is itself a tyrosine kinase. See also: SH3 domain.

Zinc finger DNA binding domain (ZnF_GATA). ZnF_GATA domain-containing proteins are typically transcription factors that usually bind to the DNA sequence [AT]GATA[AG] of promoters.

The preceding text and figures originate from George, R. A. (2002), "Predicting Structural Domains in Proteins".

See also

External links

The Protein Families (Pfam) database clan browser provides easy access to information about protein structural domains. A clan contains two or more Pfam families that have arisen from a single evolutionary origin.

Structural domain databases

Sequence domain databases

InterPro
Pfam
PROSITE
ProDom
SMART
NCBI Conserved Domain Database
SUPERFAMILY: a library of HMMs representing superfamilies, and a database of superfamily and family annotations for all completely sequenced organisms

References

^ ^a ^b Richardson, J. S. (1981). "The anatomy and taxonomy of protein structure". Adv Protein Chem, 34:167-339.
^ Bork, P. (1991). "Shuffled domains in extracellular proteins". FEBS Lett, 286:47-54.
^ Banner, D. W., Bloomer, A. C., Petsko, G. A., Phillips, D. C., Pogson, C. I., Wilson, I. A., Corran, P. H., Furth, A. J., Milman, J. D., Offord, R. E., Priddle, J. D., and Waley, S. G. (1975). "Structure of chicken muscle triose phosphate isomerase determined crystallographically at 2.5 angstrom resolution using amino acid sequence data". Nature, 255:609-614.
^ Copley, R. R. and Bork, P. (2000). "Homology among (βα)8 barrels: implications for the evolution of metabolic pathways". J Mol Biol, 303:627-641.
^ ^a ^b Anfinsen, C. B., Haber, E., Sela, M., and White, Jr, F. H. (1961). "The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain". Proc Natl Acad Sci U S A, 47:1309-1314. See also Anfinsen's dogma.
^ Cordes, M. H., Davidson, A. R., and Sauer, R. T. (1996). "Sequence space, folding and protein design". Curr Opin Struct Biol, 6:3-10.
^ Zhou, Y., Vitkup, D., and Karplus, M. (1999). "Native proteins are surface-molten solids: application of the Lindemann criterion for the solid versus liquid state". J Mol Biol, 285:1371-1375.
^ Levitt and Chothia, 1976
^ Hutchinson and Thornton, 1993
^ Orengo et al., 1997
^ Savageau, 1986
^ Jones et al., 1998
^ Siddiqui and Barton, 1995
^ Islam et al., 1995
^ Garel, 1992
^ Jacob, 1977
^ Campbell and Downing, 1994
^ http://www.pdb.org/
^ Orengo et al., 1994
^ Apic, G., Gough, J., and Teichmann, S. A. (2001). "Domain combinations in archaeal, eubacterial and eukaryotic proteomes". J Mol Biol, 310:311-325.
^ Davidson et al., 1993
^ Henikoff et al., 1997
^ Bork, P. and Doolittle, R. F. (1992). "Proposed acquisition of an animal protein domain by bacteria". Proc Natl Acad Sci U S A, 89:8990-8994.
^ Henikoff et al., 1997
^ Heringa, 1998
^ Politou, A. S., Gautel, M., Improta, S., Vangelista, L., and Pastore, A. (1996). "The elastic I-band region of titin is assembled in a 'modular' fashion by weakly interacting Ig-like domains". J Mol Biol, 255:604-616.
^ ^a ^b McLachlan, A. D. (1979). "Gene duplications in the structural evolution of chymotrypsin". J Mol Biol, 128:49-79.
^ Moore and Endow, 1996
^ Russell, R. B. (1994). "Domain insertion". Protein Eng, 7:1407-1410.
^ Levinthal, 1968
^ Dill, 1999
^ Leopold et al., 1992; Dill and Chan, 1997
^ Dobson, C. M. and Karplus, M. (1999). "The fundamentals of protein folding: bringing together theory and experiment". Curr Opin Struct Biol, 9:92-101.
^ Dyson et al., 1992; Yang and Honig, 1995b; Yang and Honig, 1995a
^ Kim and Baldwin, 1990
^ Heringa and Argos, 1991
^ Fersht, 1997
^ Honig, 1999
^ White and Jacobs, 1990
^ George and Heringa, 2002b; George et al., 2005
^ Desmadril, M. and Yon, J. M. (1981). "Existence of intermediates in the refolding of T4 lysozyme at pH 7.4". Biochem Biophys Res Commun, 101:563-569.
^ Teale and Benjamin, 1977
^ Garel, 1992
^ Creighton, T. E. (1983). Proteins: Structures and molecular properties. Freeman, New York. Second edition.
^ Bennett, M. J., Schlunegger, M. P., and Eisenberg, D. (1995). "3D domain swapping: a mechanism for oligomer assembly". Protein Sci, 4:2455-2468.
^ Heringa and Taylor, 1997
^ Herzberg et al., 1996
^ Gerstein et al., 1994
^ ^a ^b Janin, J. and Wodak, S. J. (1983). "Structural domains in proteins and their role in the dynamics of protein function". Prog Biophys Mol Biol, 42:21-78.
^ Hayward, 1999
^ Hayward, 1999
^ Meador, W. E., Means, A. R., and Quiocho, F. A. (1992). "Target enzyme recognition by calmodulin: 2.4 Å structure of a calmodulin-peptide complex". Science, 257:1251-1255.
^ Ikura, M., Clore, G. M., Gronenborn, A. M., Zhu, G., Klee, C. B., and Bax, A. (1992). "Solution structure of a calmodulin-target peptide complex by multidimensional NMR". Science, 256:632-638.
^ Sowdhamini and Blundell, 1995
^ Swindells, M. B. (1995). "A procedure for detecting structural domains in proteins". Protein Sci, 4:103-112.
^ Tsai and Nussinov, 1997
^ ^a ^b Crippen, G. M. (1978). "The tree structural organisation of proteins". J Mol Biol, 126:315-332.
^ Rossmann et al., 1974; Rose, 1979; Go, 1978
^ Holm and Sander, 1994; Islam et al., 1995; Siddiqui and Barton, 1995
^ Zehfus, M. H. (1997). "Identification of compact, hydrophobically stabilized domains and modules containing multiple peptide chains". Protein Sci, 6:1210-1219.
^ Taylor, 1999
^ Siddiqui and Barton, 1995
^ Wodak, S. J. and Janin, J. (1981). "Location of structural domains in protein". Biochemistry, 20:6544-6552.
^ Rashin, 1985; Islam et al., 1995
^ Zehfus, M. H. and Rose, G. D. (1986). "Compact units in proteins". Biochemistry, 25:5759-5765.
^ Holm and Sander, 1994
^ Holm and Sander, 1997
^ Barclay A (2003). "Membrane proteins with immunoglobulin-like domains--a master superfamily of interaction molecules". Semin Immunol 15 (4): 215–23. doi:10.1016/S1044-5323(03)00047-2. PMID 14690046.

Key papers

Bastian, H. C. (1872). The beginnings of life: being some account of the nature, modes of origin and transformation of lower organisms. Macmillan and Co., England.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., and Bourne, P. E. (2000). "The Protein Data Bank". Nucleic Acids Res, 28:235-242.
Branden, C.-I. and Tooze, J. (1991). Introduction to protein structure. Garland, New York.
Campbell, I. D. and Downing, A. K. (1994). "Building protein structure and function from modular units". Trends Biotech, 12:168-172.
Chothia, C. (1992). "Proteins. One thousand families for the molecular biologist". Nature, 357:543-544.
Das, S. and Smith, T. F. (2000). "Identifying nature's protein Lego set". Adv Protein Chem, 54:159-183.
Davidson, J. N., Chen, K. C., Jamison, R. S., Musmanno, L. A., and Kern, C. B. (1993). "The evolutionary history of the first three enzymes in pyrimidine biosynthesis". Bioessays, 15:157-164.
Dietmann, S., Park, J., Notredame, C., Heger, A., Lappe, M., and Holm, L. (2001). "A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3". Nucleic Acids Res, 29:55-57.
Dill, K. A. and Chan, H. S. (1997). "From Levinthal to pathways to funnels". Nat Struct Biol, 4:10-19.
Dill, K. A. (1999). "Polymer principles and protein folding". Protein Sci, 8:1166-1180.
Drenth, J., Jansonius, J. N., Koekoek, R., Swen, H. M., and Wolthers, B. G. (1968). "Structure of papain". Nature, 218:929-932.
Dyson, H. J., Sayre, J. R., Merutka, G., Shin, H. C., Lerner, R. A., and Wright, P. E. (1992). "Folding of peptide fragments comprising the complete sequence of proteins. Models for initiation of protein folding. II. Plastocyanin". J Mol Biol, 226:819-835.
Edelman, G. M. (1973). "Antibody structure and molecular immunology". Science, 180:830-840.
Fersht, A. R. (1997). "Nucleation mechanisms in protein folding". Curr Opin Struct Biol, 7:3-9.
Garel, J. (1992). "Folding of large proteins: Multidomain and multisubunit proteins". In Creighton, T., editor, Protein Folding, pages 405-454. W.H. Freeman and Company, New York, first edition.
George, D. G., Hunt, L. T., and Barker, W. C. (1996). "PIR-international protein sequence database". Methods Enzymol, 266:41-59.
George, R. A. (2002). "Predicting Structural Domains in Proteins". Thesis, University College London.
George, R. A. and Heringa, J. (2002a). "An analysis of protein domain linkers: their classification and role in protein folding". Protein Eng, 15:871-879.
George, R. A. and Heringa, J. (2002b). "SnapDRAGON - a method to delineate protein structural domains from sequence data". J Mol Biol, 316:839-851.
George, R. A., Lin, K., and Heringa, J. (2005). "Scooby-domain: prediction of globular domains in protein sequence". Nucleic Acids Res, 33:W160-W163.
Gerstein, M., Lesk, A. M., and Chothia, C. (1994). "Structural mechanisms for domain movements in proteins". Biochemistry, 33:6739-6749.
Ghelis, C. and Yon, J. M. (1979). "Conformational coupling between structural units. A decisive step in the functional structure formation". C R Seances Acad Sci D, 289:197-199.
Go, M. (1978). "Correlation of DNA exonic regions with protein structural units in haemoglobin". Nature, 291:90-92.
Hadley, C. and Jones, D. T. (1999). "A systematic comparison of protein structure classifications SCOP, CATH and FSSP". Struct Fold Des, 7:1099-1112.
Hayward, S. (1999). "Structural principles governing domain motions in proteins". Proteins, 36:425-435.
Hegyi, H. and Gerstein, M. (1999). "The relationship between protein structure and function: a comprehensive survey with application to the yeast genome". J Mol Biol, 288:147-164.
Henikoff, S., Greene, E. A., Pietrokovski, S., Bork, P., Attwood, T. K., and Hood, L. (1997). "Gene families: the taxonomy of protein paralogs and chimeras". Science, 278:609-614.
Heringa, J. and Argos, P. (1991). "Side-chain clusters in protein structures and their role in protein folding". J Mol Biol, 220:151-171.
Heringa, J. (1998). "Detection of internal repeats: how common are they". Curr Opin Struct Biol, 8:338-345.
Heringa, J. and Taylor, W. R. (1997). "Three-dimensional domain duplication, swapping and stealing". Curr Opin Struct Biol, 7:416-421.
Herzberg, O., Chen, C. C., Kapadia, G., McGuire, M., Carroll, L. J., Noh, S. J., and Dunaway-Mariano, D. (1996). "Swiveling-domain mechanism for enzymatic phosphotransfer between remote reaction sites". Proc Natl Acad Sci U S A, 93:2652-2657.
Holm, L. and Sander, C. (1994). "Parser for protein folding units". Proteins, 19:256-268.
Holm, L. and Sander, C. (1997). "Dali/FSSP classification of three-dimensional protein folds". Nucleic Acids Res, 25:231-234.
Honig, B. (1999). "Protein folding: from the Levinthal paradox to structure prediction". J Mol Biol, 293:283-293.
Hutchinson, E. G. and Thornton, J. M. (1993). "The Greek key motif - extraction, classification and analysis". Protein Eng, 6:233-245.
Islam, S. A., Luo, J., and Sternberg, M. J. E. (1995). "Identification and analysis of domains in proteins". Prot Eng, 8:513-525.
Jacob, F. (1977). "Evolution and tinkering". Science, 196:1161-1166.
Jones, S., Stewart, M., Michie, A., Swindells, M. B., Orengo, C., and Thornton, J. M. (1998). "Domain assignment for protein structures using a consensus approach: characterization and analysis". Protein Sci, 7:233-242.
Kim, P. S. and Baldwin, R. L. (1990). "Intermediates in the folding reactions of small proteins". Annu Rev Biochem, 59:631-660.
Larsen, T. M., Laughlin, L. T., Holden, H. M., Rayment, I., and Reed, G. H. (1994). "Structure of rabbit muscle pyruvate kinase complexed with Mn2+, K+, and pyruvate". Biochemistry, 33:6301-6309.
Leopold, P. E., Montal, M., and Onuchic, J. N. (1992). "Protein folding funnels: a kinetic approach to the sequence-structure relationship". Proc Natl Acad Sci U S A, 89:8721-8725.
Lesk, A. M., Branden, C. I., and Chothia, C. (1989). "Structural principles of alpha/beta barrel proteins: the packing of the interior of the sheet". Proteins, 5:139-148.
Levinthal, C. (1968). "Are there pathways for protein folding?" J Chim Phys, 65:44-45.
Levitt, M. and Chothia, C. (1976). "Structural patterns in globular proteins". Nature, 261:552-558.
Moore, J. D. and Endow, S. A. (1996). "Kinesin proteins: a phylum of motors for microtubule-based motility". Bioessays, 18:207-219.
Murvai, J., Vlahovicek, K., Barta, E., Cataletto, B., and Pongor, S. (2000). "The SBASE protein domain library, release 7.0: a collection of annotated protein sequence segments". Nucleic Acids Res, 28:260-262.
Murzin, A. G., Brenner, S. E., Hubbard, T., and Chothia, C. (1995). "SCOP: a structural classification of proteins database for the investigation of sequences and structures". J Mol Biol, 247:536-540.
Nissen, P., Hansen, J., Ban, N., Moore, P. B., and Steitz, T. A. (2000). "The structural basis of ribosome activity in peptide bond synthesis". Science, 289:920-930.
Orengo, C. A., Jones, D. T., and Thornton, J. M. (1994). "Protein superfamilies and domain superfolds". Nature, 372:631-634.
Orengo, C. A., Michie, A. D., Jones, S., Jones, D. T., Swindells, M. B., and Thornton, J. M. (1997). "CATH - a hierarchic classification of protein domain structures". Structure, 5:1093-1108.
Ostermeier, M. and Benkovic, S. J. (2000). "Evolution of protein function by domain swapping". Adv Protein Chem, 55:29-77.
Phillips, D. C. (1966). "The three-dimensional structure of an enzyme molecule". Sci Am, 215:78-90.
Porter, R. R. (1973). "Structural studies of immunoglobulins". Science, 180:713-716.
Rashin, A. (1985). "Location of domains in globular proteins". Methods Enzymol, 115:420-440.
Rose, G. D. (1979). "Hierarchic organisation of domains in globular proteins". J Mol Biol, 134:447-470.
Rossmann, M. G., Moras, D., and Olsen, K. W. (1974). "Chemical and biological evolution of nucleotide binding proteins". Nature, 250:194-199.
Savageau, M. A. (1986). "Proteins of Escherichia coli come in sizes that are multiples of 14 kDa: domain concepts and evolutionary implications". Proc Natl Acad Sci U S A, 83:1198-1202.
Schultz, J., Copley, R. R., Doerks, T., Ponting, C. P., and Bork, P. (2000). "SMART: a web-based tool for the study of genetically mobile domains". Nucleic Acids Res, 28:231-234.
Siddiqui, A. S. and Barton, G. J. (1995). "Continuous and discontinuous domains - an algorithm for the automatic generation of reliable protein domain definitions". Protein Sci, 4:872-884.
Siddiqui, A. S., Dengler, U., and Barton, G. J. (2001). "3Dee: a database of protein structural domains". Bioinformatics, 17:200-201.
Sowdhamini, R. and Blundell, T. (1995). "An automatic method involving cluster analysis of secondary structures for the identification of domains in proteins". Protein Sci, 4:506-520.
Srinivasarao, G. Y., Yeh, L. S., Marzec, C. R., Orcutt, B. C., Barker, W. C., and Pfeiffer, F. (1999). "Database of protein sequence alignments: PIR-ALN". Nucleic Acids Res, 27:284-285.
Tatusov, R. L., Natale, D. A., Garkavtsev, I. V., Tatusova, T. A., Shankavaram, U. T., Rao, B. S., Kiryutin, B., Galperin, M. Y., Fedorova, N. D., and Koonin, E. V. (2001). "The COG database: new developments in phylogenetic classification of proteins from complete genomes". Nucleic Acids Res, 29:22-28.
Taylor, W. R. and Orengo, C. A. (1989). "Protein structure alignment". J Mol Biol, 208:1-22.
Taylor, W. R. (1999). "Protein structure domain identification". Protein Eng, 12:203-216.
Teale, J. M. and Benjamin, D. C. (1977). "Antibody as immunological probe for studying refolding of bovine serum albumin. Refolding within each domain". J Biol Chem, 252:4521-4526.
Tsai, C. J. and Nussinov, R. (1997). "Hydrophobic folding units derived from dissimilar monomer structures and their interactions". Protein Sci, 6:24-42.
Wetlaufer, D. B. (1973). "Nucleation, rapid folding, and globular intrachain regions in proteins". Proc Natl Acad Sci U S A, 70:697-701.
White, S. H. and Jacobs, R. E. (1990). "Statistical distribution of hydrophobic residues along the length of protein chains. Implications for protein folding and evolution". Biophys J, 57:911-921.
Yang, A. S. and Honig, B. (1995a) "Free energy determinants of secondary structure formation: I. alpha-Helices". J Mol Biol, 252:351-365.
Yang, A. S. and Honig, B. (1995b). "Free energy determinants of secondary structure formation: II. Antiparallel beta-sheets". J Mol Biol, 252:366-376.
