Coalescent theory

From Wikipedia, the free encyclopedia

In genetics, coalescent theory holds that all copies of a gene or allele in a given population are ultimately inherited from a single ancestor shared by all members of the population, known as the most recent common ancestor. If the inheritance relationships are drawn as a phylogenetic tree, termed a gene genealogy, the gene or allele of interest is said to undergo coalescence to the common ancestor (sometimes termed the coancestor to emphasize the coalescent relationship[1]). Basic coalescent theory assumes that genes do not undergo recombination and models genetic drift as a stochastic process known as a Markov process[2]. Because gene fixation by genetic drift is a central component of coalescent theory, the theory is most useful when the genetic locus under study is not under natural selection. Extensions of the basic coalescent, however, can incorporate recombination, selection, and virtually any arbitrarily complex evolutionary model into population genetic analyses. The theory was developed by Sir John Kingman.

Theory

Consider two distinct haploid organisms that differ at a single nucleotide. If the ancestry of these two individuals is traced backwards in time, there will eventually be a point at which their most recent common ancestor (MRCA) is encountered; at that point the two lineages are said to have coalesced.

Probability of fixation

Under genetic drift alone, every finite set of genes or alleles has a "coalescent point" at which all descendants converge to a single ancestor (i.e. they "coalesce"). This fact can be used to derive the rate of gene fixation of a neutral allele (that is, one not under any form of selection) in a population of any size, provided that it is finite and nonzero. Because the effect of natural selection is stipulated to be negligible, the probability at any given time that an allele will eventually become fixed is simply its frequency p in the population at that time. For a diploid population of size N and neutral mutation rate μ, the initial frequency of a novel mutation is \frac{1}{2N}, and the number of new mutations per generation is 2Nμ. Since the fixation rate is the rate of novel neutral mutation multiplied by their probability of fixation, the overall fixation rate is 2N\mu \times \frac{1}{2N} = \mu. Thus the rate of fixation for a mutation not subject to selection is simply the rate at which such mutations are introduced, independent of population size.
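As a rough numerical check of this argument, the following Python sketch assumes a neutral Wright-Fisher model, in which each of the 2N gene copies in a generation independently copies a randomly chosen gene from the previous generation (the reproduction model is an assumption; the text above does not specify one). It estimates how often a single new copy eventually fixes, which should come out near \frac{1}{2N}, and repeats the 2N\mu \times \frac{1}{2N} arithmetic. The function names are hypothetical.

```python
import random


def fixation_probability(two_n: int, trials: int, seed: int = 1) -> float:
    """Estimate the chance that one new neutral copy among `two_n` copies fixes.

    Assumes a Wright-Fisher model: every generation, each of the `two_n`
    copies independently picks a parent copy from the previous generation.
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        count = 1  # a novel mutation starts as a single copy
        while 0 < count < two_n:
            p = count / two_n
            # Binomial(two_n, p) draw using only the standard library.
            count = sum(rng.random() < p for _ in range(two_n))
        if count == two_n:
            fixed += 1
    return fixed / trials


def neutral_fixation_rate(n: int, mu: float) -> float:
    """New mutations per generation (2N*mu) times fixation probability 1/(2N)."""
    return (2 * n * mu) * (1 / (2 * n))


if __name__ == "__main__":
    print(fixation_probability(two_n=20, trials=10_000))  # close to 1/20 = 0.05
    print(neutral_fixation_rate(n=1_000, mu=1e-8))  # mu, up to float rounding
```

A small population keeps the simulation fast; the estimate's sampling noise shrinks as the number of trials grows.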

Time to coalescence

A useful analysis based on coalescent theory seeks to predict the amount of time that has elapsed between the introduction of a mutation and the present-day distribution of a particular allele or gene in a population. This time period is equal to how long ago the most recent common ancestor existed.

The probability that two lineages coalesce in the immediately preceding generation is the probability that they share a parent. In a diploid population of constant size with 2N copies of each locus, there are 2N "potential parents" in the previous generation, so the probability that two alleles share a parent is \frac{1}{2N} and correspondingly, the probability that they do not coalesce is 1-\frac{1}{2N}.

Going further back, the time to coalescence is geometrically distributed: the probability that the two lineages coalesce exactly t generations ago is the probability of noncoalescence in each of the t − 1 more recent generations multiplied by the probability of coalescence in generation t:

P_{c}(t) = \left( 1 - \frac{1}{2N} \right)^{t-1} \left(\frac{1}{2N}\right).
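The geometric form can be checked by simulating the backward process directly. In this minimal Python sketch (function names hypothetical), each of two lineages picks one of the 2N parental gene copies uniformly at random in every generation, and the simulation stops at the first generation in which both pick the same copy; the observed frequencies are printed next to P_c(t).

```python
import random
from collections import Counter


def coalescence_time(two_n: int, rng: random.Random) -> int:
    """Generations back until two lineages first share a parent copy."""
    t = 0
    while True:
        t += 1
        # Each lineage independently picks one of the 2N copies in the parental generation.
        if rng.randrange(two_n) == rng.randrange(two_n):
            return t


def geometric_pmf(t: int, two_n: int) -> float:
    """P_c(t) = (1 - 1/(2N))**(t - 1) * (1/(2N))."""
    return (1 - 1 / two_n) ** (t - 1) * (1 / two_n)


if __name__ == "__main__":
    two_n, samples = 10, 20_000  # 2N = 10 keeps the run short
    rng = random.Random(7)
    counts = Counter(coalescence_time(two_n, rng) for _ in range(samples))
    for t in range(1, 6):
        # observed frequency vs. the geometric formula
        print(t, counts[t] / samples, round(geometric_pmf(t, two_n), 4))
```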

For sufficiently large values of N, the geometric distribution of P_c(t) is well approximated by the continuous exponential distribution

P_{c}(t) = \frac{1}{2N} e^{-\frac{t}{2N}}.
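To gauge the quality of the approximation, the Python sketch below evaluates both expressions for an arbitrarily chosen 2N = 100. The largest absolute gap occurs at t = 1, where it is about 1% of \frac{1}{2N}; the relative gap shrinks further as N grows.

```python
import math


def geometric_pmf(t: int, two_n: int) -> float:
    """Exact discrete-generation probability of coalescing t generations back."""
    return (1 - 1 / two_n) ** (t - 1) / two_n


def exponential_density(t: float, two_n: int) -> float:
    """Continuous approximation: (1/(2N)) * exp(-t/(2N))."""
    return math.exp(-t / two_n) / two_n


if __name__ == "__main__":
    two_n = 100  # N = 50 diploid individuals
    for t in (1, 10, 100, 500):
        print(t, f"{geometric_pmf(t, two_n):.6f}", f"{exponential_density(t, two_n):.6f}")
```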

An exponential distribution with rate \frac{1}{2N} has both its expected value and its standard deviation equal to 2N. Therefore, although the expected time to coalescence is 2N generations, actual coalescence times vary widely.
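The spread can be made concrete with a short Python sketch. It draws coalescence times with `random.expovariate` (rate \frac{1}{2N}) and computes a central 95% interval from the exponential quantile function -2N \ln(1-p); the population size 2N = 2,000 and the function name are arbitrary illustrations.

```python
import math
import random
import statistics


def coalescence_interval(two_n: float, coverage: float = 0.95) -> tuple[float, float]:
    """Central interval of an exponential distribution with mean 2N."""
    tail = (1 - coverage) / 2
    # Quantile function of the exponential: q(p) = -2N * ln(1 - p).
    return -two_n * math.log(1 - tail), -two_n * math.log(tail)


if __name__ == "__main__":
    two_n = 2_000  # e.g. N = 1,000 diploid individuals
    rng = random.Random(42)
    times = [rng.expovariate(1 / two_n) for _ in range(100_000)]
    print(round(statistics.mean(times)), round(statistics.pstdev(times)))  # both near 2000
    print(coalescence_interval(two_n))  # roughly (51, 7378)
```

For 2N = 2,000 the interval runs from roughly 51 to roughly 7,400 generations, a more than hundredfold range around the same mean.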

Neutral variation

Coalescent theory can also be used to model the amount of variation in DNA sequences expected from genetic drift alone. This value is termed the mean heterozygosity, represented as \bar{H}. Tracing two lineages back in time, mean heterozygosity is calculated as the probability of a mutation occurring in a given generation divided by the probability of any "event" in that generation (either a mutation or a coalescence). The probability that the event is a mutation is the probability of a mutation in either of the two lineages: 2\mu. Thus the mean heterozygosity is equal to

\bar{H} = \frac{2\mu}{2\mu + \frac{1}{2N}} = \frac{4N\mu}{1+4N\mu}.

For 4N\mu \gg 1, \bar{H} approaches 1, so the vast majority of allele pairs have at least one difference in nucleotide sequence.
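A minimal Python sketch of the calculation follows. It assumes the same continuous-time picture used for the exponential approximation: looking back from two sampled copies, mutations on either lineage arrive at total rate 2μ and coalescence at rate \frac{1}{2N}, so the chance that the first event is a mutation is 2μ divided by the sum of the two rates. The Monte Carlo function is a hypothetical check, not part of the theory itself.

```python
import random


def mean_heterozygosity(n: int, mu: float) -> float:
    """H = 4N*mu / (1 + 4N*mu) for a diploid population of size N."""
    theta = 4 * n * mu
    return theta / (1 + theta)


def simulated_heterozygosity(n: int, mu: float, trials: int, seed: int = 0) -> float:
    """Fraction of sampled pairs whose first event back in time is a mutation.

    Assumes competing exponential waiting times: mutation on either lineage
    at total rate 2*mu, coalescence at rate 1/(2N).
    """
    rng = random.Random(seed)
    mutation_first = 0
    for _ in range(trials):
        t_mut = rng.expovariate(2 * mu)
        t_coal = rng.expovariate(1 / (2 * n))
        if t_mut < t_coal:
            mutation_first += 1
    return mutation_first / trials


if __name__ == "__main__":
    n, mu = 1_000, 2.5e-4  # chosen so that 4*N*mu = 1
    print(round(mean_heterozygosity(n, mu), 6))  # 0.5
    print(simulated_heterozygosity(n, mu, trials=20_000))  # close to 0.5
    print(round(mean_heterozygosity(n, 1e-2), 3))  # 4*N*mu = 40, H = 0.976
```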

Graphical representation

Coalescents can be visualised using dendrograms, which show the relationships of branches of the population to each other. The point where two branches meet indicates the most recent common ancestor (MRCA) of the lineages descending from it.
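A gene genealogy can be stored as a mapping from each node to its parent; the MRCA of two sampled copies is then the first node at which their paths back towards the root meet, which is the point where their branches join in the dendrogram. The Python sketch below uses a small, made-up four-leaf tree; the node names and the parent-pointer representation are illustrative assumptions.

```python
def ancestors(node: str, parent: dict[str, str]) -> list[str]:
    """Path from `node` back to the root, including `node` itself."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def mrca(a: str, b: str, parent: dict[str, str]) -> str:
    """Most recent common ancestor: first ancestor of `b` that is also ancestral to `a`."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes do not share an ancestor")


if __name__ == "__main__":
    # Hypothetical genealogy of four sampled gene copies (child -> parent):
    #        root
    #       /    \
    #      x      y
    #     / \    / \
    #    A   B  C   D
    parent = {"A": "x", "B": "x", "C": "y", "D": "y", "x": "root", "y": "root"}
    print(mrca("A", "B", parent))  # x
    print(mrca("A", "D", parent))  # root
```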

Applications

Phylogeny

Coalescent theory seeks to reconstruct the ancestral relationship of individuals and is therefore of great utility in reconstructing the phylogenetic relationships of species based on information at the molecular level.

Disease gene mapping

The utility of coalescent theory in disease mapping is slowly gaining more appreciation. Although the application of the theory is still in its infancy, a number of researchers are actively developing algorithms for the analysis of human genetic data that utilise coalescent theory[3][4][5].

History

Coalescent theory is a natural extension of the more classical population genetics concept of neutral evolution, and can be considered an approximation of the Fisher-Wright (or Wright-Fisher) model for large populations. It was "discovered" independently by several researchers in the 1980s[6][7][8][9], but the definitive formalisation is attributed to Kingman[10][11]. Major contributions to the development of coalescent theory have been made by Peter Donnelly[12], Robert Griffiths, Richard R. Hudson[13] and Simon Tavaré[14], including the incorporation of variation in population size[15], recombination and selection[16][17].

Software

A large body of software exists for simulating data sets under the coalescent process, and gradually software is emerging that allows the analysis of human genetics data for the mapping of disease susceptibility loci.

References and notes

Articles

  •  Browning, S.R. (2006) Multilocus association mapping using variable-length Markov chains. American Journal of Human Genetics 78:903-913
  •  Donnelly, P., Tavaré, S. (1995) Coalescents and genealogical structure under neutrality. Annual Review of Genetics 29:401-421
  •  Hein, J., Schierup, M.H., Wiuf, C. (2004) Gene Genealogies, Variation and Evolution: A Primer in Coalescent Theory. Oxford University Press. ISBN 978-0198529965
  •  Hellenthal, G., Stephens, M. (2006) msHOT: modifying Hudson's ms simulator to incorporate crossover and gene conversion hotspots. Bioinformatics (advance online publication)
  •  Hudson, R.R. (1983a) Testing the constant-rate neutral allele model with protein sequence data. Evolution 37:203-207
  •  Hudson, R.R. (1983b) Properties of a neutral allele model with intragenic recombination. Theoretical Population Biology 23:183-201
  •  Hudson, R.R. (1991) Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary Biology 7:1-44
  •  Hudson, R.R. (2002) Generating samples under a Wright-Fisher neutral model. Bioinformatics 18:337-338
  •  Kaplan, N.L., Darden, T., Hudson, R.R. (1988) The coalescent process in models with selection. Genetics 120:819-829
  •  Kingman, J.F.C. (1982) On the genealogy of large populations. Journal of Applied Probability 19A:27-43
  •  Kingman, J.F.C. (2000) Origins of the coalescent: 1974-1982. Genetics 156:1461-1463
  •  Mailund, T., Schierup, M.H., Pedersen, C.N.S., Mechlenborg, P.J.M., Madsen, J.N., Schauser, L. (2005) CoaSim: a flexible environment for simulating genetic data under coalescent models. BMC Bioinformatics 6:252
  •  Morris, A.P., Whittaker, J.C., Balding, D.J. (2002) Fine-scale mapping of disease loci via shattered coalescent modeling of genealogies. American Journal of Human Genetics 70:686-707
  •  Neuhauser, C., Krone, S.M. (1997) The genealogy of samples in models with selection. Genetics 145:519-534
  •  Rosenberg, N.A., Nordborg, M. (2002) Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nature Reviews Genetics 3:380-390
  •  Slatkin, M. (2001) Simulating genealogies of selected alleles in populations of variable size. Genetical Research 78:49-57
  •  Tajima, F. (1983) Evolutionary relationship of DNA sequences in finite populations. Genetics 105:437-460
  •  Zöllner, S., Pritchard, J.K. (2005) Coalescent-based association mapping and fine mapping of complex trait loci. Genetics 169:1071-1092

Books

  •  Dawkins, R. (2004) The Ancestor's Tale: A Pilgrimage to the Dawn of Evolution. Houghton Mifflin: New York, NY.
  •  Nordborg, M. (2001) Introduction to coalescent theory. Chapter 7 in Balding, D., Bishop, M., Cannings, C., editors, Handbook of Statistical Genetics. Wiley. ISBN 978-0471860945
  •  Rice, S.H. (2004) Evolutionary Theory: Mathematical and Conceptual Foundations. Sinauer Associates: Sunderland, MA. See esp. ch. 3 for detailed derivations.
  •  Wakeley, J. (2006) An Introduction to Coalescent Theory. Roberts & Co. ISBN 0-9747077-5-9
