F-statistics

This article is not about F-statistics as that term is understood in statistical inference, especially analysis of variance and linear regression. See F-test and F-distribution.

In population genetics, F-statistics (also known as fixation indices) describe the level of heterozygosity in a population; more specifically, they measure the degree of (usually) a reduction in heterozygosity compared with the Hardy–Weinberg expectation. Such changes can be caused by the Wahlund effect, inbreeding, natural selection, or any combination of these.

The concept of F-statistics was developed during the 1920s by the American geneticist Sewall Wright, who was interested in inbreeding in cattle. However, because complete dominance causes the phenotypes of homozygote dominants and heterozygotes to be the same, it was not until the advent of molecular genetics from the 1960s onwards that heterozygosity in populations could be measured.

Definitions and equations

The measures F_IS, F_ST, and F_IT are related to the amounts of heterozygosity at various levels of population structure. Together, they are called F-statistics, and are derived from F, the inbreeding coefficient. In a simple two-allele system with inbreeding, the genotypic frequencies are:

$p^2 + F\,p\,q \!$

for AA;

$2\,p\,q\,(1 - F) \!$

for Aa; and

$q^2 + F\,p\,q \!$

for aa.
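
As a minimal sketch, the three genotype frequencies above can be computed directly from p and F. The function name `genotype_frequencies` and the input values are hypothetical, chosen only for illustration; with F = 0 the result reduces to the Hardy–Weinberg proportions.

```python
def genotype_frequencies(p, F):
    """Return (f_AA, f_Aa, f_aa) for a two-allele locus.

    p is the frequency of allele A and F is the inbreeding coefficient.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    f_AA = p * p + F * p * q        # homozygotes AA gain F*p*q
    f_Aa = 2.0 * p * q * (1.0 - F)  # heterozygotes are reduced by a factor (1 - F)
    f_aa = q * q + F * p * q        # homozygotes aa gain F*p*q
    return f_AA, f_Aa, f_aa


# Hypothetical inputs, for illustration only.
inbred = genotype_frequencies(p=0.7, F=0.2)
hardy_weinberg = genotype_frequencies(p=0.7, F=0.0)
```

Whatever the value of F, the three frequencies sum to one, because the heterozygotes lost (2Fpq) are split equally between the two homozygote classes.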

The value for F is found by solving the heterozygote equation above for F. This gives one minus the observed heterozygosity in a population divided by the heterozygosity expected under Hardy–Weinberg equilibrium:

$F = 1- \frac{\operatorname{O}(f(\mathbf{Aa}))} {\operatorname{E}(f(\mathbf{Aa}))} = 1- \frac{\operatorname{ObservedNumber}(\mathbf{Aa})} {n \operatorname{E}(f(\mathbf{Aa}))}, \!$

where the expected value from Hardy–Weinberg equilibrium is given by

$\operatorname{E}(f(\mathbf{Aa})) = 2\, p\, q\, \!$

where p and q are the allele frequencies of A and a, respectively. F is also the probability that, at any locus, the two alleles of a randomly chosen individual from the population are identical by descent.

For example, consider the data from E.B. Ford (1971) on the scarlet tiger moth:

**Table 1:**

| Genotype | White-spotted (AA) | Intermediate (Aa) | Little spotting (aa) | Total |
|---|---|---|---|---|
| Number | 1469 | 138 | 5 | 1612 |

From this, the allele frequencies can be calculated, and the expectation of f(Aa) derived:

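$p = \frac{2 \times \operatorname{obs}(AA) + \operatorname{obs}(Aa)}{2 \times (\operatorname{obs}(AA) + \operatorname{obs}(Aa) + \operatorname{obs}(aa))} = \frac{2 \times 1469 + 138}{2 \times 1612} \approx 0.954$

$q = 1 - p \approx 0.046$

$F = 1 - \frac{\operatorname{obs}(Aa)}{n \times 2pq} = 1 - \frac{138}{1612 \times 2\,(0.954)(0.046)} \approx 0.023$

The value 0.023 is obtained with unrounded p and q; substituting the rounded values shown gives approximately 0.025.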
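
The calculation can be reproduced with a short script. This is a sketch only: the function name `inbreeding_coefficient` is hypothetical, and the counts are those of Table 1.

```python
def inbreeding_coefficient(n_AA, n_Aa, n_aa):
    """Estimate p, q and F from genotype counts at a two-allele locus."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no individuals sampled")
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele a
    expected_het = 2.0 * p * q        # Hardy-Weinberg expectation of f(Aa)
    if expected_het == 0.0:
        raise ValueError("locus is monomorphic; F is undefined")
    observed_het = n_Aa / n           # observed f(Aa)
    F = 1.0 - observed_het / expected_het
    return p, q, F


# Scarlet tiger moth counts from Table 1.
p, q, F = inbreeding_coefficient(1469, 138, 5)
print(round(p, 3), round(q, 3), round(F, 3))  # prints: 0.954 0.046 0.023
```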

The different F-statistics look at different levels of population structure. F_IT is the inbreeding coefficient of an individual (I) relative to the total (T) population, as above; F_IS is the inbreeding coefficient of an individual (I) relative to the subpopulation (S), using the above for subpopulations and averaging them; and F_ST is the effect of subpopulations (S) compared to the total population (T), and is calculated by solving the equation:

$(1 - F_{IS})\,(1 - F_{ST}) = (1 - F_{IT}), \!$

as shown in the next section.

Partition due to population structure

F_IT can be partitioned into F_ST due to the Wahlund effect and F_IS due to inbreeding.

Consider a population that has a population structure of two levels: one from the individual (I) to the subpopulation (S) and one from the subpopulation to the total (T). Then the total F, known here as F_IT, can be partitioned into F_IS (or f) and F_ST (or θ):

$1 - F_{IT} = (1 - F_{IS})\,(1 - F_{ST}). \!$
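
One way to see the partition at work is to estimate the three indices from heterozygosities. The sketch below assumes the common heterozygosity-based definitions F_IS = 1 - H_I/H_S, F_ST = 1 - H_S/H_T and F_IT = 1 - H_I/H_T, with subpopulations weighted by sample size. These definitions, the function name `f_statistics` and the genotype counts are assumptions made for illustration, not taken from the text above.

```python
def f_statistics(subpops):
    """Estimate (F_IS, F_ST, F_IT) from per-subpopulation genotype counts.

    subpops is a list of (n_AA, n_Aa, n_aa) tuples, one per subpopulation.
    Assumes every subpopulation is non-empty and the locus is polymorphic.
    """
    total_n = sum(sum(counts) for counts in subpops)
    if total_n == 0:
        raise ValueError("no individuals sampled")

    # H_I: observed heterozygosity over all individuals.
    h_i = sum(n_Aa for _, n_Aa, _ in subpops) / total_n

    # H_S: size-weighted mean of the expected heterozygosity within subpopulations.
    h_s = 0.0
    total_A = 0
    for n_AA, n_Aa, n_aa in subpops:
        n = n_AA + n_Aa + n_aa
        p = (2 * n_AA + n_Aa) / (2 * n)
        h_s += (n / total_n) * 2.0 * p * (1.0 - p)
        total_A += 2 * n_AA + n_Aa

    # H_T: expected heterozygosity from the pooled allele frequency.
    p_bar = total_A / (2 * total_n)
    h_t = 2.0 * p_bar * (1.0 - p_bar)

    f_is = 1.0 - h_i / h_s
    f_st = 1.0 - h_s / h_t
    f_it = 1.0 - h_i / h_t
    return f_is, f_st, f_it


# Hypothetical genotype counts (AA, Aa, aa) for two subpopulations.
f_is, f_st, f_it = f_statistics([(60, 30, 10), (10, 30, 60)])
```

For these counts the estimates are approximately F_IS = 0.2, F_ST = 0.25 and F_IT = 0.4, and (1 - 0.2)(1 - 0.25) = 0.6 = 1 - 0.4. Under these definitions the identity holds algebraically, since (H_I/H_S)(H_S/H_T) = H_I/H_T.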

This may be further partitioned for population substructure, and it expands according to the rules of binomial expansion, so that for I partitions:

$1 - F = \prod_{i=0}^{i=I} (1 - F_{i,i+1}) \!$
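
The product rule can be sketched in a few lines. The function name `total_f` and the per-level values are hypothetical; each entry stands for one F between adjacent levels of the hierarchy.

```python
from math import prod


def total_f(level_fs):
    """Combine per-level coefficients F_{i,i+1} into the total F.

    Implements 1 - F = product over levels of (1 - F_{i,i+1}).
    """
    return 1.0 - prod(1.0 - f for f in level_fs)


# Two levels: individual-to-subpopulation and subpopulation-to-total.
two_level = total_f([0.2, 0.25])

# Three levels, e.g. with an extra regional grouping (values are illustrative).
three_level = total_f([0.1, 0.2, 0.05])
```

With two levels the function reproduces the partition of F_IT into F_IS and F_ST; here 1 - (0.8)(0.75) = 0.4.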

Fst

The definition of F can be reformulated as one minus the ratio of the average number of differences between pairs of chromosomes sampled within diploid individuals to the average number obtained when sampling chromosomes randomly from the population (ignoring the grouping by individual). One can modify this definition and consider a grouping by subpopulation instead of by individual. Population geneticists have used that idea to measure the degree of structure in a population.

Unfortunately, there are many definitions of F_ST, which has caused some confusion in the scientific literature. A common definition is the following:

$F_{ST} = \frac{\operatorname{var}(p)}{p\,(1 - p)} \!$