Conserved sequence

A sequence alignment, produced by ClustalO, of mammalian histone proteins.
Sequences are the amino acids for residues 120-180 of the proteins. Residues that are conserved across all sequences are highlighted in grey. Below the protein sequences is a key denoting conserved sequence (*), conservative mutations (:), semi-conservative mutations (.), and non-conservative mutations ( ).^[1]

In biology, conserved sequences are similar or identical sequences that occur within nucleic acid sequences (such as RNA and DNA sequences), protein sequences, protein structures or polymeric carbohydrates across species (orthologous sequences) or within different molecules produced by the same organism (paralogous sequences). Cross-species conservation indicates that a particular sequence may have been maintained by evolution despite speciation. The further back up the phylogenetic tree a particular conserved sequence occurs, the more highly conserved it is said to be. Since sequence information is normally transmitted from parents to progeny by genes, a conserved sequence implies that there is a conserved gene.

It is widely believed that a mutation in a "highly conserved" region leads to a non-viable life form, or to a form that is eliminated through natural selection.

Nucleic acid and protein sequences

Highly conserved DNA sequences are thought to have functional value, although the role of many highly conserved non-coding DNA sequences is not understood. Ultra-conserved elements or sequences (UCEs or UCRs, ultra-conserved regions) that share 100% identity among human, mouse and rat were first described by Bejerano and colleagues in 2004.^[2] A 2007 study that deleted four highly conserved non-coding DNA sequences in mice yielded viable mice with no significant phenotypic differences; the authors described their findings as "unexpected".^[3] Many regions of DNA, including highly conserved DNA sequences, consist of repeated sequence elements. One possible explanation of this null result is that removing only one copy, or a subset, of a repeated sequence could preserve the phenotype if a single copy is sufficient and the repetitions are superfluous to essential life processes; the paper did not specify whether the deleted sequences were repeated sequences. Although the biological function of most conserved sequences is still unknown, transcripts derived from a few conserved sequences have been shown to have deregulated expression in human cancer tissues.^[4]
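The idea of a region sharing 100% identity across several species can be illustrated with a short Python sketch. It assumes the sequences have already been aligned to equal length, with "-" marking gaps, and it reports runs of columns in which every sequence carries the same residue. The function name identical_runs, the min_len parameter and the toy sequences are hypothetical and are not taken from the cited studies.

```python
def identical_runs(aligned, min_len):
    """Return half-open (start, end) column ranges where every
    sequence is identical and ungapped for at least min_len columns."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    length = len(aligned[0])
    runs = []
    start = None  # first column of the run currently being extended
    for i, column in enumerate(zip(*aligned)):
        same = column[0] != "-" and all(c == column[0] for c in column)
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and length - start >= min_len:
        runs.append((start, length))
    return runs


# Toy, made-up sequences standing in for three aligned species.
toy = ["ACGTACGTTT", "ACGTACCTTT", "ACGTACGTTA"]
print(identical_runs(toy, min_len=3))
```

The length threshold is left as a parameter because what counts as "ultra-conserved" depends on the definition adopted by a given study.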

Residues conserved among various G protein coupled receptors are highlighted in green.

Notation

A common notation for the level of sequence conservation is used by the Clustal alignment programs. Below a set of aligned sequences, each residue column is marked as fully conserved (*), containing only conservative mutations (:), containing semi-conservative mutations (.), or containing non-conservative mutations ( ).^[5]
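A minimal Python sketch of how such a symbol line could be derived from aligned protein sequences is given here. The "strong" and "weak" residue groups are an assumption based on the groups commonly listed in Clustal documentation, and should be checked against the version of the program in use. The function names and the example sequences are hypothetical. Columns containing a gap are treated as non-conserved, which is also an assumption.

```python
# Assumed residue groups (as commonly listed for Clustal; verify before relying on them).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = {c.upper() for c in column}
    if "-" in residues:
        return " "  # assumption: any gap makes the column non-conserved
    if len(residues) == 1:
        return "*"  # fully conserved
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall within one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall within one weak group
    return " "


def consensus_line(aligned):
    """Build the symbol line for equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(column_symbol(column) for column in zip(*aligned))


# Made-up sequences chosen to show each symbol once.
toy = ["MKSAW", "MRTGD", "MKSAW"]
for seq in toy:
    print(seq)
print(consensus_line(toy))
```

In this toy alignment the first column is identical, the second and third each fall within a single strong group (K/R and S/T), the fourth falls only within a weak group (A/G), and the last shares no group.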

Biological role

Highly conserved sequences are often required for basic cellular function, stability or reproduction. Sequence similarity is used as evidence of structural and functional conservation, and of evolutionary relationships between sequences. Consequently, functional elements are frequently identified by searching for conserved sequences in a genome.
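One simple way to search an alignment for conserved regions is to score each column and then average the scores over a sliding window. The Python sketch below is illustrative only: it scores a column by the fraction of sequences that carry its most common residue, which is a deliberately crude measure chosen for brevity and not the method of any particular genome-analysis tool. The function names, the window size and the toy sequences are hypothetical.

```python
from collections import Counter


def column_identity(column):
    """Fraction of sequences sharing the most common non-gap residue.
    Gaps count against the score because the divisor is the column size."""
    counts = Counter(c for c in column if c != "-")
    if not counts:
        return 0.0
    return counts.most_common(1)[0][1] / len(column)


def windowed_conservation(aligned, window):
    """Mean column identity for each window of consecutive columns."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    scores = [column_identity(column) for column in zip(*aligned)]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


# Made-up alignment: conserved at the start, divergent at the end.
toy = ["AAAA", "AAAT", "AACG"]
print(windowed_conservation(toy, window=2))
```

Windows with a high mean score are candidates for further inspection; a practical analysis would normally also account for the evolutionary distance between the species compared.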

Conservation of protein-coding sequences leads to the presence of identical amino acid residues at analogous regions of the protein structure and hence to similar function. Conservative mutations replace amino acids with chemically similar residues and so may not affect the protein's function. Among the most highly conserved sequences are the active sites of enzymes and the binding sites of protein receptors.

Conserved non-coding sequences do not encode protein, but often harbour cis-regulatory elements. Some deletions of highly conserved sequences in humans (hCONDELs) and other organisms have been suggested to be a potential cause of the anatomical and behavioural differences between humans and other mammals.^[6]^[7] The TATA promoter sequence is an example of a highly conserved DNA sequence found in most eukaryotes.

Polymeric carbohydrate sequences

The monosaccharide sequence of the glycosaminoglycan heparin is conserved across a wide range of species.

References

↑ "Clustal FAQ #Symbols". Clustal. Retrieved 8 December 2014.
↑ Bejerano, G; Pheasant, M; Makunin, I; Stephen, S; Kent, WJ; Mattick, JS; Haussler, D (2004-05-28). "Ultraconserved elements in the human genome". Science 304 (5675): 1321–5. doi:10.1126/science.1098119. PMID 15131266.
↑ Ahituv N, Zhu Y, Visel A; et al. (2007). "Deletion of ultraconserved elements yields viable mice". PLoS Biol. 5 (9): e234. doi:10.1371/journal.pbio.0050234. PMC 1964772. PMID 17803355.
↑ Calin, GA; Liu, CG; Ferracin, M; Hyslop, T; Spizzo, R; Sevignani, C; Fabbri, M; Cimmino, A; Lee, EJ; Wojcik, SE; Shimizu, M; Tili, E; Rossi, S; Taccioli, C; Pichiorri, F; Liu, X; Zupo, S; Herlea, V; Gramantieri, L; Lanza, G; Alder, H; Rassenti, L; Volinia, S; Schmittgen, TD; Kipps, TJ; Negrini, M; Croce, CM (September 2007). "Ultraconserved regions encoding ncRNAs are altered in human leukemias and carcinomas". Cancer Cell 12 (3): 215–29. doi:10.1016/j.ccr.2007.07.027. PMID 17785203.
↑ "Clustal FAQ #Symbols". Clustal. Retrieved 8 December 2014.
↑ McLean, Cory Y.; et al. (10 March 2011). "Human-specific loss of regulatory DNA and the evolution of human-specific traits". Nature 471 (7337): 216–219. doi:10.1038/nature09774. PMC 3071156. PMID 21390129.
↑ Gross, Liza (September 2007). "Are "Ultraconserved" Genetic Elements Really Indispensable?". PLOS Biology 5 (9): e253. doi:10.1371/journal.pbio.0050253. PMC 1964769. PMID 20076686.