Gene cluster

Not to be confused with Human genetic clustering.

A gene family is a set of homologous genes within one organism, and a gene cluster is part of a gene family. A gene cluster is a group of two or more genes in an organism's DNA that encode similar polypeptides, or proteins, which collectively share a generalized function and are often located within a few thousand base pairs of each other. The size of gene clusters can vary significantly, from a few genes to several hundred genes.^[1] Portions of the DNA sequence of each gene within a gene cluster are identical; however, the protein encoded by each gene is distinct from the proteins encoded by the other genes in the cluster. Genes in a gene cluster may lie near one another on the same chromosome or on different but homologous chromosomes. An example of a gene cluster is the Hox cluster, which is made up of eight genes in Drosophila and is part of the homeobox gene family.

Hox genes have been observed in various phyla. Eight genes make up the Hox cluster of Drosophila. The number of Hox genes varies among organisms, but the Hox genes collectively belong to the homeobox family.

Formation

Historically, four models have been proposed for the formation and persistence of gene clusters.

Gene duplication and divergence

This model has been generally accepted since the mid-1970s. It postulates that gene clusters were formed as a result of gene duplication and divergence.^[2] These gene clusters include the Hox gene cluster, the human β-globin gene cluster, and four clustered human growth hormone (hGH)/chorionic somatomammotropin genes.^[3]

Conserved gene clusters, such as Hox and the human β-globin gene cluster, may form through gene duplication and divergence. A gene is duplicated during cell division, so that descendants carry two end-to-end copies of the gene where the ancestor had one; initially both copies code for the same protein or otherwise have the same function. In the course of subsequent evolution the copies diverge, so that their products have different but related functions, while the genes remain adjacent on the chromosome.^[4] Ohno theorized that the origin of new genes during evolution depended on gene duplication. If only a single copy of a gene existed in the genome of a species, the proteins encoded by that gene would be essential to the species' survival. A single-copy essential gene could not accumulate the mutations that might produce a new gene; gene duplication, however, allows mutations to accumulate in the duplicated copy, which can ultimately give rise to new genes over the course of evolution.^[5] Mutations in the duplicated copy are tolerated because the original copy retains the genetic information for the essential gene's function. Species that have gene clusters are thought to have a selective advantage, because natural selection acts to keep the genes together.^[1]^[6] Over a short span of time, the new genetic information in the duplicated copy would offer no practical advantage; over a long evolutionary period, however, the duplicated copy may undergo additional and drastic mutations, so that its protein serves a different role from that of the original essential gene.^[5] Over this period, the two similar genes diverge until the protein of each gene is unique in its function. Hox gene clusters of various sizes are found in several phyla.
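
The duplication-and-divergence process described above can be sketched in a few lines of Python. This is only a toy illustration: the 21-base gene, the flanking bases, and the choice of five substitutions are invented for the example, and real divergence also involves insertions, deletions, and selection, which are not modelled here.

```python
import random

BASES = "ACGT"

def duplicate_in_tandem(chromosome, start, end):
    """Copy chromosome[start:end] and place the copy directly after the original."""
    gene = chromosome[start:end]
    return chromosome[:end] + gene + chromosome[end:]

def mutate(sequence, n_substitutions, rng):
    """Substitute a different base at n distinct, randomly chosen positions."""
    seq = list(sequence)
    for pos in rng.sample(range(len(seq)), n_substitutions):
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)

def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
gene = "ATGGCCAAGTTGACCGGATAA"        # hypothetical gene, not a real sequence
chromosome = "TTTT" + gene + "CCCC"   # hypothetical flanking DNA
start, end = 4, 4 + len(gene)

# Step 1: duplication leaves two identical end-to-end copies.
duplicated = duplicate_in_tandem(chromosome, start, end)
original_copy = duplicated[start:end]
second_copy = duplicated[end:end + len(gene)]

# Step 2: the second copy accumulates mutations while the original is unchanged.
diverged_copy = mutate(second_copy, 5, rng)

# 5 of the 21 positions now differ, so the identity is 16/21.
print(identity(original_copy, diverged_copy))
```

The two copies stay adjacent throughout, which is the feature that makes the result a cluster rather than a dispersed gene family.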

Hox cluster

When gene duplication produces a gene cluster, one or multiple genes may be duplicated at once. In the case of the Hox cluster, a shared ancestral ProtoHox cluster was duplicated, giving rise to the Hox cluster as well as the ParaHox cluster, an evolutionary sister complex of the Hox cluster.^[7] The exact number of genes in the duplicated ProtoHox cluster is unknown; however, models suggest that it originally contained four, three, or two genes.^[8]

When a gene cluster is duplicated, some genes may be lost. The loss of genes depends on the number of genes in the original cluster. In the four-gene model, the ProtoHox cluster contained four genes and gave rise to two twin clusters: the Hox cluster and the ParaHox cluster.^[7] As its name indicates, in the two-gene model the Hox cluster and the ParaHox cluster arose from a ProtoHox cluster that contained only two genes. The three-gene model was originally proposed in conjunction with the four-gene model;^[8] however, rather than the Hox cluster and the ParaHox cluster arising from a cluster of three genes, they were the result of single-gene tandem duplication, which places identical genes adjacent to one another on the same chromosome.^[7] This was independent of duplication of the ancestral ProtoHox cluster.

Intrachromosomal duplication is the duplication of genes within the same chromosome over the course of evolution (a-1). Mutations may occur in the duplicated copy, such as observed with the substitution of Guanine with Adenine (a-2). Alignment of DNA sequences exhibits homology between the two chromosomes (a-3). All segments were duplicated from the same ancestral DNA sequence as observed by the comparisons in b(i-iii).

Cis vs. trans duplication

Gene duplication may occur via cis-duplication or trans-duplication. Cis-duplication, or intrachromosomal duplication, is the duplication of genes within the same chromosome, whereas trans-duplication, or interchromosomal duplication, is the duplication of genes onto neighboring but separate chromosomes.^[7] The Hox cluster and the ParaHox cluster were formed by intrachromosomal duplication, although they were initially thought to be interchromosomal.^[8]

Fisher Model

The Fisher Model was proposed in 1930 by Ronald Fisher. Under the Fisher Model, gene clusters are a result of two alleles working well with one another. In other words, gene clusters may exhibit co-adaptation.^[3] The Fisher Model was considered unlikely and later dismissed as an explanation for gene cluster formation.^[2]^[3]

Coregulation Model

Under the coregulation model, genes are organized into clusters, each consisting of a single promoter and a cluster of coding sequences, which are therefore co-regulated, showing coordinated gene expression.^[3] Coordinated gene expression was once considered to be the most common mechanism driving the formation of gene clusters.^[1] However, coregulation, and thus coordinated gene expression, cannot drive the formation of gene clusters.^[3]

Molarity Model

The Molarity Model considers the constraints of cell size. Transcribing and translating genes together is beneficial to the cell;^[9] thus, the formation of clustered genes generates a high local concentration of cytoplasmic protein products. Spatial segregation of protein products has been observed in bacteria; however, the Molarity Model does not consider co-transcription or distribution of genes found within an operon.^[2]

Gene clusters vs. tandem arrays

Tandem duplication is the process in which one gene is duplicated and the resulting copy is found adjacent to the original gene. Tandemly arrayed genes are formed as a result of tandem duplications.

Repeated genes can occur in two major patterns: gene clusters and tandem repeats, formerly called tandemly arrayed genes. Although similar, gene clusters and tandemly arrayed genes can be distinguished from one another.

Gene Clusters

The genes of a gene cluster are found close to one another on the same chromosome. They are dispersed randomly; however, they normally lie within, at most, a few thousand bases of each other. The distance between each gene in the gene cluster can vary. The DNA found between each repeated gene in the gene cluster is non-conserved.^[10] Portions of the DNA sequence are identical among the genes contained in a gene cluster.^[5] Gene conversion is the only method by which gene clusters may become homogenized. Although the size of a gene cluster may vary, it rarely comprises more than 50 genes, making clusters stable in number. Gene clusters change over a long evolutionary time period, which does not result in genetic complexity.^[10]
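
The positional part of this description (genes lying within a few thousand bases of each other) can be expressed as a simple grouping rule. The following Python sketch is an illustration under stated assumptions, not a published cluster-finding method: the gene names and coordinates are hypothetical, and the 5,000-base gap threshold is an arbitrary stand-in for "a few thousand bases". It checks position only; a real analysis would also require sequence similarity and shared function.

```python
def group_by_proximity(genes, max_gap):
    """Group genes on one chromosome whose neighbors lie within max_gap bases.

    genes: list of (name, start, end) tuples with start <= end.
    Returns a list of groups, each a list of gene names in chromosomal order.
    """
    groups, current, prev_end = [], [], None
    for name, start, end in sorted(genes, key=lambda g: g[1]):
        # A gap larger than max_gap closes the current group.
        if prev_end is not None and start - prev_end > max_gap:
            groups.append(current)
            current = []
        current.append(name)
        prev_end = end if prev_end is None else max(prev_end, end)
    if current:
        groups.append(current)
    return groups

# Hypothetical coordinates (in bases) on a single chromosome.
genes = [
    ("geneA", 1_000, 2_200),
    ("geneB", 3_500, 4_800),
    ("geneC", 6_000, 7_100),
    ("geneD", 250_000, 251_500),
    ("geneE", 600_000, 601_000),
    ("geneF", 602_500, 603_900),
]

MAX_GAP = 5_000  # assumed threshold for "a few thousand bases"
groups = group_by_proximity(genes, MAX_GAP)

# A cluster needs two or more genes, so the isolated geneD is dropped.
clusters = [g for g in groups if len(g) >= 2]
print(clusters)  # [['geneA', 'geneB', 'geneC'], ['geneE', 'geneF']]
```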

Tandem arrays

Tandem arrays are a group of genes with the same or similar function that are repeated consecutively without space between each gene. The genes are organized in the same orientation.^[10] Unlike gene clusters, tandemly arrayed genes consist of consecutive, identical repeats, separated only by a nontranscribed spacer region.^[11] While the genes contained in a gene cluster encode similar proteins, tandemly arrayed genes encode identical proteins or functional RNAs. The number of repeats changes through unequal recombination, which places duplicated genes next to the original gene. Unlike gene clusters, tandemly arrayed genes change rapidly in response to the needs of the environment, causing an increase in genetic complexity.^[11]
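
How unequal recombination changes the number of repeats can be shown with a minimal Python sketch. It assumes two homologous arrays of five identical repeats that pair out of register, so that the break falls after the fourth repeat on one array and after the second on the other; the repeat label and the break positions are hypothetical.

```python
def unequal_crossover(array_a, array_b, break_a, break_b):
    """Exchange the tails of two repeat arrays that are broken at different positions."""
    recombinant_1 = array_a[:break_a] + array_b[break_b:]
    recombinant_2 = array_b[:break_b] + array_a[break_a:]
    return recombinant_1, recombinant_2

# Two homologous tandem arrays, each with five copies of the same repeat unit.
array_a = ["repeat"] * 5
array_b = ["repeat"] * 5

# Misaligned pairing: break after repeat 4 on one array, after repeat 2 on the other.
longer, shorter = unequal_crossover(array_a, array_b, break_a=4, break_b=2)

print(len(longer), len(shorter))  # 7 3
```

The total number of repeats is conserved (ten before and after), but one product gains two copies and the other loses two, which is how an array can expand or contract from one generation to the next.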

Gene conversion allows tandemly arrayed genes to become homogenized, or identical.^[11] Gene conversion may be allelic or ectopic. Allelic gene conversion occurs when one allele of a gene is converted to the other allele as a result of mismatched base pairing during meiotic homologous recombination.^[12] Ectopic gene conversion occurs when one homologous DNA sequence is replaced by another. Ectopic gene conversion is the driving force for concerted evolution of gene families.^[13]

Tandemly arrayed genes are essential to maintaining large gene families, such as the ribosomal RNA genes. In eukaryotic genomes, the ribosomal RNA genes are tandemly arrayed. Tandem repetition of rRNA genes is essential to maintain the supply of RNA transcript, because a single RNA gene may not be able to provide a sufficient amount of RNA. For example, human embryonic cells contain 5–10 million ribosomes and double in number within 24 hours. To provide enough ribosomes, multiple RNA polymerases must consecutively transcribe multiple rRNA genes.^[11]
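
The scale of this demand follows from simple arithmetic on the figures above. The Python sketch below assumes that the cell's ribosome complement must be duplicated once per 24-hour doubling and that synthesis is spread evenly over that period; both are simplifications made for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def new_ribosomes_per_second(ribosomes_per_cell, doubling_time_s=SECONDS_PER_DAY):
    """Average synthesis rate needed to duplicate the ribosome pool once per doubling."""
    return ribosomes_per_cell / doubling_time_s

low = new_ribosomes_per_second(5_000_000)
high = new_ribosomes_per_second(10_000_000)

print(f"{low:.0f} to {high:.0f} ribosomes per second")  # 58 to 116 ribosomes per second
```

A sustained output of roughly 60 to 120 ribosomes every second is far beyond what a single gene could supply, which is consistent with the need for many rRNA gene copies transcribed in parallel.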

Types

Prokaryotic gene clusters

Gene clusters may be similar to an operon in which all genes are controlled by a single promoter and operator. All genes are transcribed simultaneously. In the case of bacterial operons, genes are transcribed as a polycistronic messenger RNA. Operon-like gene clusters are primarily, but not exclusively, formed by horizontal gene transfer in prokaryotes. This type of gene cluster has been observed in the bacterium Escherichia coli.^[14] The lac operon of Escherichia coli is the most well-studied operon-like gene cluster.^[15]

The lac operon is required for the metabolism of lactose in Escherichia coli as well as several other bacteria. It is composed of three genes: lacZ, lacY, and lacA. Each gene encodes a protein that plays a role in lactose metabolism: lacZ encodes β-galactosidase, lacY encodes β-galactoside permease (lactose permease), and lacA encodes thiogalactoside transacetylase. The three genes are transcribed together as one polycistronic mRNA. That is, a single transcript is translated into three polypeptide chains, one for each gene of the lac operon.^[16]
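
The idea that one polycistronic mRNA carries several separate coding regions can be illustrated with a toy Python sketch. The mRNA below is invented and very short; it is not the lac operon sequence. The scan simply looks for an AUG start codon and reads to the next in-frame stop codon, whereas bacterial ribosomes actually locate each start site through a ribosome-binding site, which is not modelled. The codon table covers only the codons used in this example.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Partial codon table: only the codons that occur in the toy mRNA below.
CODON_TABLE = {"AUG": "M", "GCU": "A", "AAA": "K", "CCC": "P", "UUU": "F"}

def open_reading_frames(mrna):
    """Return the codons of each AUG...stop stretch, scanning 5' to 3'."""
    orfs = []
    i = 0
    while i + 3 <= len(mrna):
        if mrna[i:i + 3] != "AUG":
            i += 1
            continue
        codons = []
        j = i
        # Read codon by codon until a stop codon or the end of the mRNA.
        while j + 3 <= len(mrna) and mrna[j:j + 3] not in STOP_CODONS:
            codons.append(mrna[j:j + 3])
            j += 3
        if j + 3 > len(mrna):
            break  # no stop codon found, so this is not a complete reading frame
        orfs.append(codons)
        i = j + 3  # continue scanning after the stop codon
    return orfs

def translate(codons):
    """Translate a list of codons into a one-letter amino acid string."""
    return "".join(CODON_TABLE[c] for c in codons)

# Hypothetical polycistronic mRNA with three short coding regions and spacers.
mrna = "GG" + "AUGGCUUAA" + "CCGG" + "AUGAAACCCUGA" + "C" + "AUGUUUUAG"

peptides = [translate(orf) for orf in open_reading_frames(mrna)]
print(peptides)  # ['MA', 'MKP', 'MF']
```

One transcript yields three distinct peptides here because each coding region has its own start and stop codon, mirroring how lacZ, lacY, and lacA are each translated from the same mRNA.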

Eukaryotic gene clusters

Although operon-like gene clusters are more common in prokaryotes, they have been observed in the nematode Caenorhabditis elegans^[1] as well as the tunicate Ciona intestinalis.^[14] These eukaryotic organisms are thought to exhibit the most characteristics of a true operon.^[1] Eukaryotic operons were first discovered in 1993 during investigation of the nematode Caenorhabditis elegans. These operons were found to produce polycistronic pre-mRNAs. The polycistronic pre-mRNA is processed into monocistronic mature mRNAs, each of which carries a single gene. Primitive chordates have also exhibited these types of gene clusters.^[17]

Gene clusters have also been observed in eukaryotic organisms such as yeast, fungi, insects, vertebrates, and plants. A variety of well-known gene clusters, such as the DAL and GAL clusters, are found in yeast.^[1] Filamentous fungal gene clusters play a key role in the biosynthesis of primary or secondary metabolites.^[14] Metabolic pathway gene clusters differ vastly in structure from operon-like gene clusters.^[1] In general, eukaryotic gene clusters differ greatly from prokaryotic gene clusters. While prokaryotic gene clusters are thought to form as a result of horizontal gene transfer, this mechanism is highly unlikely in eukaryotes. Despite isolated observations of fungal gene clusters arising through horizontal gene transfer, each gene of a eukaryotic gene cluster is transcribed as an independent, or monocistronic, messenger RNA.^[14]

Although insects and plants are eukaryotes, some of these organisms have exhibited gene clusters similar to bacterial operons in that they produce polycistronic pre-mRNAs that result in multiple polypeptides.^[17]

References

1. Yi, Gangman; Sing-Hoi Sze; Michael Thon (2007). "Identifying clusters of functionally related genes in genomes". Bioinformatics 23 (9): 1053–1060. doi:10.1093/bioinformatics/btl673.
2. Lawrence, Jeffrey (1999). "Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes". Current Opinion in Genetics & Development 9 (6): 642–648. doi:10.1016/s0959-437x(99)00025-8. PMID 10607610.
3. Lawrence, Jeffrey; John Roth (1996). "Selfish Operons: Horizontal Transfer May Drive the Evolution of Gene Clusters". Genetics 143 (4): 1843–1860. PMC 1207444. PMID 8844169.
4. Ohno, Susumu (1970). Evolution by gene duplication. Springer-Verlag. ISBN 0-04-575015-7.
5. Klug, William; Michael Cummings; Charlotte Spencer; Michael Palladino (2009). "Chromosome Mutations: Variation in chromosome number and arrangement". In Beth Wilbur. Concepts of Genetics (9 ed.). San Francisco, CA: Pearson Benjamin Cummings. pp. 213–214. ISBN 9780321540980.
6. Overbeek, Ross; M. Fonstein; M. D'Souza; G. Pusch; N. Maltsev (1999). "The Use of Gene Clusters to Infer Functional Coupling". Proceedings of the National Academy of Sciences USA 96 (6): 2896–2901. doi:10.1073/pnas.96.6.2896. PMC 15866. PMID 10077608.
7. Garcia-Fernàndez, Jordi (2005). "Hox, ParaHox, ProtoHox: facts and guesses". Heredity 94 (2): 145–152. doi:10.1038/sj.hdy.6800621.
8. Garcia-Fernàndez, Jordi (2005). "The genesis and evolution of homeobox gene clusters". Nature Reviews Genetics 6: 881–892. doi:10.1038/nrg1723.
9. Gomez, Manuel; Ildefonso Cases; Alfonso Valencia (2004). "Gene order in Prokaryotes: conservation and implications". In Miguel Vincent, Javier Tamames, Alfonso Valencia, Jesus Mingorance. Molecules in Time and Space: Bacterial Shape, Division, and Phylogeny. New York: Kluwer Academic/Plenum Publishers. pp. 221–224. ISBN 0-306-48578-8.
10. Graham, Geoffrey (July 1995). "Tandem genes and clustered genes". Journal of Theoretical Biology 175 (1): 71–87. doi:10.1006/jtbi.1995.0122.
11. Lodish, Harvey; Arnold Berk; Chris Kaiser; Monty Krieger; Anthony Bretscher; Hidde Ploegh; Angelika Amon; Matthew Scott (2013). "Genes, Genomics, and Chromosomes". In Beth McHenry. Molecular Cell Biology (7 ed.). New York: W.H. Freeman Company. pp. 227–230. ISBN 9781429234139.
12. Galtier, N.; G. Piganeau; D. Mouchiroud; L. Duret (2001). "GC-Content Evolution in Mammalian Genomes: the Biased Gene Conversion Hypothesis". Genetics 159 (2): 907–911.
13. Duret, L.; N. Galtier (2009). "Biased Gene Conversion and the Evolution of Mammalian Genomic Landscapes". Annual Review of Genomics and Human Genetics 10: 285–311. doi:10.1146/annurev-genom-082908-150001. PMID 19630562.
14. Boycheva, Svetlana; Laurent Daviet; Jean-Luc Wolfender; Teresa B. Fitzpatrick (2014). "The rise of operon-like gene clusters in plants". Trends in Plant Science. doi:10.1016/j.tplants.2014.01.013.
15. Ralston, A. (2008). "Operons and Prokaryotic Gene Regulation". Nature Education 1 (1): 216.
16. Hames, David; Nigel Hooper (2000). "Section G: RNA synthesis and processing". Instant Notes in Biochemistry (2 ed.). Oxford, UK: BIOS Scientific Publishers Limited. pp. 173–174. ISBN 0-203-68108-8.
17. Blumenthal, Thomas (2004). "Operons in eukaryotes". Briefings in Functional Genomics and Proteomics 3 (3): 199–211. doi:10.1093/bfgp/3.3.199. PMID 15642184.