Galton–Watson process

Figure: Galton–Watson survival probabilities for different exponential rates of population growth, assuming that the number of children of each parent node follows a Poisson distribution with mean λ. For λ ≤ 1, eventual extinction occurs with probability 1. But the probability of survival of a new type may be quite low even if λ > 1 and the population as a whole is experiencing quite strong exponential increase.

The Galton–Watson process is a branching stochastic process arising from Francis Galton's statistical investigation of the extinction of family names. The process models family names as patrilineal (passed from father to son), while offspring are randomly either male or female, and a name becomes extinct if its line dies out (all holders of the family name die without male descendants). This is an accurate description of Y chromosome transmission in genetics, so the model is useful for understanding human Y-chromosome DNA haplogroups, and it is also of use in understanding other processes (as described below); but its application to the actual extinction of family names is fraught. In practice, family names change for many other reasons, and the dying out of a name line is only one factor, as discussed in the examples below; the Galton–Watson process is thus of limited applicability in understanding actual family name distributions.

There was concern amongst the Victorians that aristocratic surnames were becoming extinct. Galton originally posed the question regarding the probability of such an event in an 1873 issue of The Educational Times,^[1] and the Reverend Henry William Watson replied with a solution.^[2] Together, they then wrote an 1874 paper entitled "On the probability of the extinction of families" in the Journal of the Anthropological Institute of Great Britain and Ireland (now the Journal of the Royal Anthropological Institute).^[3] Galton and Watson appear to have derived their process independently of the earlier work by I. J. Bienaymé; see Heyde and Seneta 1977. For a detailed history see Kendall (1966 and 1975).

Concepts

Assume, for the sake of the model, that surnames are passed on to all male children by their father. Suppose the number of a man's sons to be a random variable distributed on the set { 0, 1, 2, 3, ... }. Further suppose the numbers of different men's sons to be independent random variables, all having the same distribution.

Then the simplest substantial mathematical conclusion is that if the average number of a man's sons is 1 or less, then his surname will almost surely die out (except in the degenerate case where every man has exactly one son), and if it is more than 1, then there is a positive probability that the surname will survive indefinitely.

Modern applications include the survival probabilities for a new mutant gene, the initiation of a nuclear chain reaction, the dynamics of disease outbreaks in their first generations of spread, and the chances of extinction of a small population of organisms. The process also explains (perhaps closest to Galton's original interest) why only a handful of males in the deep past of humanity now have any surviving male-line descendants, reflected in a rather small number of distinctive human Y-chromosome DNA haplogroups.

A corollary of high extinction probabilities is that if a lineage has survived, it is likely to have experienced, purely by chance, an unusually high growth rate in its early generations at least when compared to the rest of the population.

Mathematical definition

A Galton–Watson process is a stochastic process {X_n} which evolves according to the recurrence formula X₀ = 1 and

$$X_{n+1}=\sum_{j=1}^{X_{n}}\xi_{j}^{(n)}$$

where $\{\xi_{j}^{(n)} : n, j \in \mathbb{N}\}$ is a set of independent and identically distributed (IID) random variables taking values in the natural numbers { 0, 1, 2, 3, ... }.

In the analogy with family names, X_n can be thought of as the number of descendants (along the male line) in the nth generation, and $\xi_{j}^{(n)}$ can be thought of as the number of (male) children of the jth of these descendants. The recurrence relation states that the number of descendants in the (n+1)th generation is the sum, over all nth-generation descendants, of the number of children of each descendant.
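
The recurrence can be simulated directly. The following is a minimal sketch, not a reference implementation: the function names and the offspring distribution (0, 1 or 2 sons with probabilities 1/4, 1/2, 1/4) are assumptions chosen purely for illustration.

```python
import random


def simulate_galton_watson(offspring, generations, rng):
    """Return [X_0, X_1, ..., X_generations] for one realisation, with X_0 = 1."""
    sizes = [1]
    for _ in range(generations):
        current = sizes[-1]
        # X_{n+1} is the sum of one independent offspring draw per individual
        # alive in generation n; an empty sum gives 0, so extinction is absorbing.
        sizes.append(sum(offspring(rng) for _ in range(current)))
    return sizes


def example_offspring(rng):
    # Hypothetical offspring law (an assumption for this sketch):
    # 0, 1 or 2 sons with probabilities 1/4, 1/2, 1/4, so the mean is exactly 1.
    return rng.choices([0, 1, 2], weights=[1, 2, 1])[0]


rng = random.Random(0)  # fixed seed so the run is reproducible
path = simulate_galton_watson(example_offspring, 10, rng)
print(path)
```

Passing the offspring law as a function keeps the simulation independent of any particular distribution, which mirrors the definition: only the IID assumption matters.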

The extinction probability (i.e. the probability of final extinction) is given by

$$\lim_{n\to\infty}\Pr(X_{n}=0).$$

This is clearly equal to zero if each member of the population has exactly one descendant. Excluding this case (usually called the trivial case), there exists a simple necessary and sufficient condition for extinction to occur with probability one, which is given in the next section.

Extinction criterion for Galton–Watson process

In the non-trivial case the probability of final extinction is equal to one if E{ξ₁} ≤ 1 and strictly less than one if E{ξ₁} > 1.

The process can be treated analytically using the method of probability generating functions.
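
The generating-function argument can be sketched as follows. The symbols $p_k$, $f$ and $q$ are notation introduced here for illustration. Let $p_k = \Pr(\xi_1 = k)$ and define the probability generating function of the offspring distribution by

$$f(s)=\mathrm{E}\{s^{\xi_1}\}=\sum_{k=0}^{\infty}p_k s^k,\qquad 0\le s\le 1.$$

Write $x_n = \Pr(X_n = 0)$. Conditioning on the number of children of the initial individual, each of whom starts an independent copy of the process, gives

$$x_0=0,\qquad x_{n+1}=\sum_{k=0}^{\infty}p_k x_n^{k}=f(x_n).$$

Since $f$ is non-decreasing, the sequence $x_n$ is non-decreasing and bounded by 1, so it converges to a limit $q$ satisfying $q=f(q)$; an induction on $n$ shows that $x_n\le r$ for every root $r\in[0,1]$ of $f(s)=s$, so $q$ is the smallest such root. Because $f$ is convex on $[0,1]$ with $f(1)=1$ and $f'(1)=\mathrm{E}\{\xi_1\}$, in the non-trivial case the smallest root is $q=1$ when $\mathrm{E}\{\xi_1\}\le 1$ and satisfies $q<1$ when $\mathrm{E}\{\xi_1\}>1$.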

If the number of children ξ_j at each node follows a Poisson distribution with parameter λ, a particularly simple recurrence can be found for x_n, the probability that a process starting with a single individual at time n = 0 is extinct by generation n (so that x₀ = 0, and the total extinction probability is the limit of x_n as n grows):

$$x_{n+1}=e^{\lambda (x_{n}-1)},$$

giving the above curves.
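
As an illustration, the Poisson recurrence can be iterated numerically until it stops changing. This is a minimal sketch under stated assumptions: the function name, the tolerance and the iteration cap are choices made here, and the starting value 0 reflects a process that begins with one living individual. Convergence is slow near λ = 1, so the cap may be reached there before the tolerance is met.

```python
import math


def poisson_extinction_probability(lam, tol=1e-12, max_iter=100_000):
    """Iterate x_{n+1} = exp(lam * (x_n - 1)) from x_0 = 0 until it settles."""
    x = 0.0  # probability of extinction by generation 0 is zero
    for _ in range(max_iter):
        x_next = math.exp(lam * (x - 1.0))
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x  # iteration cap reached (expected only very close to lam = 1)


for lam in (0.5, 1.5, 2.0):
    q = poisson_extinction_probability(lam)
    # Survival probability is one minus the extinction probability.
    print(f"lambda={lam}: extinction={q:.4f}, survival={1.0 - q:.4f}")
```

Starting from 0 matters: the iteration then increases towards the smallest fixed point in [0, 1], which is the extinction probability, whereas 1 is always a fixed point of the recurrence.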

Bisexual Galton–Watson process

In the classical Galton–Watson process described above, only men are considered, effectively modeling reproduction as asexual. A model more closely following actual sexual reproduction is the so-called "bisexual Galton–Watson process", where only couples reproduce. (Bisexual in this context refers to the number of sexes involved, not sexual orientation.) In this process, each child is assumed to be male or female, independently of the others, with a specified probability, and a so-called "mating function" determines how many couples will form in a given generation. As before, the reproduction of different couples is considered to be independent. The analogue of the trivial case is now the case in which each male and each female reproduces in exactly one couple, each couple has exactly one male and one female descendant, and the mating function takes the value of the minimum of the number of males and females (which are then the same from the next generation onwards).
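
The description above can be sketched as a simulation of the number of couples per generation. Everything specific here is an assumption made for illustration: the function names, the offspring law (0 to 3 children per couple, equally likely), the probability 0.5 of a child being female, and the use of the minimum of the two counts as the mating function.

```python
import random


def simulate_bisexual_gw(couples0, offspring, p_female, mating, generations, rng):
    """Return the number of couples in generations 0..generations."""
    couples = [couples0]
    for _ in range(generations):
        # Each couple reproduces independently of the others.
        children = sum(offspring(rng) for _ in range(couples[-1]))
        # Each child is female with probability p_female, independently.
        females = sum(rng.random() < p_female for _ in range(children))
        males = children - females
        # The mating function decides how many couples form from these counts.
        couples.append(mating(females, males))
    return couples


def example_offspring(rng):
    # Hypothetical offspring law (an assumption): 0, 1, 2 or 3 children, equally likely.
    return rng.randrange(4)


rng = random.Random(1)  # fixed seed so the run is reproducible
# The built-in min serves as the mating function min(females, males).
couples_path = simulate_bisexual_gw(1, example_offspring, 0.5, min, 10, rng)
print(couples_path)
```

Passing the mating function as an argument makes it easy to compare alternatives, since the behaviour of the process depends strongly on that choice.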

Since the total reproduction within a generation now depends strongly on the mating function, there is in general no simple necessary and sufficient condition for final extinction, as there is in the classical Galton–Watson process. However, excluding the trivial case, the concept of the averaged reproduction mean (Bruss (1984)) allows for a general sufficient condition for final extinction, treated in the next section.

Extinction criterion

If, in the non-trivial case, the averaged reproduction mean per couple stays bounded over all generations and does not exceed 1 for sufficiently large population sizes, then the probability of final extinction is 1.

Examples

Citing historical examples of the Galton–Watson process is complicated because the history of family names often deviates significantly from the theoretical model. Notably, new names can be created, existing names can be changed over a person's lifetime, and people historically have often assumed the names of unrelated persons, particularly nobility. Thus, a small number of family names at present is not in itself evidence that names have become extinct over time, or that they did so because family name lines died out; that would require that there were more names in the past and that they disappeared because the line died out, rather than because the name changed for other reasons, such as vassals assuming the name of their lord.

Chinese names are a well-studied example of surname extinction: there are currently only about 3,100 surnames in use in China, compared with close to 12,000 recorded in the past,^[4]^[5] with 22% of the population sharing the names Li, Wang and Zhang (numbering close to 300 million people), and the top 200 names covering 96% of the population. Names have changed or become extinct for various reasons such as people taking the names of their rulers, orthographic simplifications, taboos against using characters from an emperor's name, among others.^[5] While family name lines dying out may be a factor in the surname extinction, it is by no means the only or even a significant factor. Indeed, the most significant factor affecting the surname frequency is other ethnic groups identifying as Han and adopting Han names.^[5] Further, while new names have arisen for various reasons, this has been outweighed by old names disappearing.^[5]

By contrast, some nations have adopted family names only recently. This means both that they have not experienced surname extinction for an extended period, and that the names were adopted when the nation had a relatively large population, rather than the smaller populations of ancient times.^[5] Further, these names have often been chosen creatively and are very diverse. Examples include:

Japanese family names came into general use only with the Meiji Restoration in the late 19th century (when the population was over 30,000,000). There are over 100,000 of them, they are very varied, and the government requires married couples to use the same surname.
Many Dutch names have included a family name only since the Napoleonic Wars in the early 19th century, and there are over 68,000 Dutch family names.
Thai names have included a family name only since 1920, and only a single family can use a given family name, hence there are a great number of Thai names. Furthermore, Thai people change their family names with some frequency, complicating the analysis.

On the other hand, some examples of high concentration of family names are not primarily due to the Galton–Watson process:

There are about 100 Vietnamese family names, and 60% of the population share three of them. The name Nguyễn alone is estimated to be used by almost 40% of the Vietnamese population, and 90% share 15 names. However, as the history of the Nguyễn name makes clear, this is in no small part due to names being forced on people or adopted for reasons unrelated to genetic relation.

References

[1] Francis Galton, Problem 4001, Educational Times 25(143) p.300 (March 1, 1873)
[2] H.W. Watson, Problem 4001, Educational Times 26(148) p.115 (August 1, 1873). (A first offering submitted by G.S. Carr, Educational Times 26(144) p.17 (April 1, 1873), according to Galton was "totally erroneous".)
[3] Galton, F., & Watson, H. W. (1875). On the probability of the extinction of families. Journal of the Royal Anthropological Institute, 4, 138–144.
[4] "O rare John Smith", The Economist (US ed.): 32, June 3, 1995. Quote: "Only 3,100 surnames are now in use in China [...] compared with nearly 12,000 in the past. An 'evolutionary dwindling' of surnames is common to all societies. [...] [B]ut in China, [Du] says, where surnames have been in use far longer than in most other places, the paucity has become acute."
[5] Du, Ruofu; Yida, Yuan; Hwang, Juliana; Mountain, Joanna L.; Cavalli-Sforza, L. Luca (1992), Chinese Surnames and the Genetic Differences between North and South China (PDF), Journal of Chinese Linguistics Monograph Series (5), pp. 18–22 (History of Chinese surnames and sources of data for the present research), archived from the original (PDF) on 2015-09-11, also part of Morrison Institute for Population and Resource Studies Working papers

F. Thomas Bruss (1984). "A Note on Extinction Criteria for Bisexual Galton–Watson Processes". Journal of Applied Probability 21: 915–919.
C. C. Heyde and E. Seneta (1977). I. J. Bienaymé: Statistical Theory Anticipated. Berlin, Germany.
Kendall, D. G. (1966). "Branching Processes Since 1873". Journal of the London Mathematical Society. s1-41: 385–406. ISSN 0024-6107. doi:10.1112/jlms/s1-41.1.385.
Kendall, D. G. (1975). "The Genealogy of Genealogy: Branching Processes before (and after) 1873". Bulletin of the London Mathematical Society. 7 (3): 225–253. ISSN 0024-6093. doi:10.1112/blms/7.3.225.

External links

"Survival of a Single Mutant" by Peter M. Lee of the University of York


This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.