Algebraic statistics

From Wikipedia, the free encyclopedia

Algebraic statistics is a fairly recent field of statistics that uses the tools of algebraic geometry and commutative algebra to study problems related to discrete random variables with finite state spaces. Such problems include parameter estimation, hypothesis testing, and experimental design. The key connection between statistics and algebra is the observation that many commonly used classes of discrete random variables can be viewed as algebraic varieties.

Introductory example

Consider a random variable X which can take on the values 0, 1, 2. Such a variable is completely characterized by the three probabilities

p_i=\mathrm{Pr}(X=i),\quad i=0,1,2

and these numbers clearly satisfy

\sum_{i=0}^2 p_i = 1 \quad \mbox{and}\quad 0\leq p_i \leq 1.

Conversely, any three such numbers unambiguously specify a random variable, so we can identify the random variable X with the tuple (p_0, p_1, p_2) ∈ R^3.

Now suppose X is a binomial random variable with parameters n = 2 and q, i.e. X represents the number of successes when a certain experiment is repeated independently two times, where each repetition has success probability q. Then

p_i=\mathrm{Pr}(X=i)={2 \choose i}q^i (1-q)^{2-i}

and it is not hard to show that the tuples (p_0, p_1, p_2) which arise in this way are precisely the probability tuples satisfying

4 p_0 p_2 - p_1^2 = 0.

The latter is a polynomial equation defining an algebraic variety (or surface) in R^3, and this variety, when intersected with the simplex given by

\sum_{i=0}^2 p_i = 1 \quad \mbox{and}\quad 0\leq p_i \leq 1,

yields a piece of an algebraic curve which may be identified with the set of all 3-state binomial variables. Determining the parameter q amounts to locating one point on this curve; testing the hypothesis that a given variable X is binomial amounts to testing whether a certain point lies on that curve or not.
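
The geometric picture above can be made concrete with a short numerical sketch. It is only an illustration under stated assumptions: the function names (binomial_point, invariant, in_simplex, on_curve, recover_q) and the tolerance tol are hypothetical choices made here, not part of any library, and the check works on an exactly known probability tuple rather than on sampled data, so it is a membership test for the curve and not a statistical test.

```python
def binomial_point(q):
    """Return (p_0, p_1, p_2) for two independent trials with success probability q."""
    return ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)


def invariant(p):
    """Evaluate the polynomial 4*p_0*p_2 - p_1^2, which vanishes on the curve."""
    p0, p1, p2 = p
    return 4 * p0 * p2 - p1 ** 2


def in_simplex(p, tol=1e-9):
    """Check that the entries lie in [0, 1] and sum to 1 (up to the assumed tolerance)."""
    return all(-tol <= x <= 1 + tol for x in p) and abs(sum(p) - 1) <= tol


def on_curve(p, tol=1e-9):
    """Check that p lies on the piece of the curve inside the simplex."""
    return in_simplex(p, tol) and abs(invariant(p)) <= tol


def recover_q(p):
    """Locate the point on the curve: the mean of X is 2q, so q = (p_1 + 2*p_2) / 2."""
    p0, p1, p2 = p
    return (p1 + 2 * p2) / 2


if __name__ == "__main__":
    p = binomial_point(0.3)
    print(on_curve(p), recover_q(p))      # a point on the curve and its parameter
    print(on_curve((1 / 3, 1 / 3, 1 / 3)))  # the uniform distribution is off the curve
```

For the uniform distribution (1/3, 1/3, 1/3) the polynomial evaluates to 4/9 - 1/9 = 1/3, so the point is not on the curve. With observed counts instead of exact probabilities, one would need a proper statistical test rather than a fixed tolerance.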

References

  • Algebraic Statistics Short Course, lecture notes by Seth Sullivant
  • L. Pachter and B. Sturmfels. Algebraic Statistics and Computational Biology. Cambridge University Press 2005.
  • G. Pistone, E. Riccomagno, H. P. Wynn. Algebraic Statistics. CRC Press, 2001.