Rule of succession
In probability theory, the rule of succession is a formula introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem.
The formula is still used, particularly to estimate underlying probabilities for events which have not been observed to occur at all in (finite) sample data. Assigning such events a zero probability would contravene Cromwell's rule, and is not justified by the evidence.
Statement of the rule of succession
Suppose p is uniformly distributed on the interval [0, 1]. Suppose X1, ..., Xn+1 are conditionally independent random variables given the value of p, and conditional on p are Bernoulli-distributed with expected value p, i.e., each has probability p of being equal to 1 and probability 1 − p of being equal to 0. Then

P(X_{n+1} = 1 \mid X_1 + \cdots + X_n = s) = \frac{s + 1}{n + 2}.
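The following Python sketch (illustrative only, not part of the original treatment) evaluates the formula directly for s observed successes in n trials under the uniform prior on p:

```python
# Minimal sketch (not from the article): the rule of succession for
# s observed successes in n trials, assuming a uniform prior on p.
def rule_of_succession(s: int, n: int) -> float:
    """Posterior probability that trial n + 1 is a success."""
    return (s + 1) / (n + 2)

print(rule_of_succession(9, 10))  # (9 + 1) / (10 + 2) = 10/12 ≈ 0.833
print(rule_of_succession(0, 0))   # with no data the rule returns 1/2, the prior mean
```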
Mathematical details
The proportion p is treated as a uniformly distributed random variable. (Some who take an extreme Bayesian approach to applied probability insist that the word random should be banished altogether from probability theory, on the grounds of examples like this one. This proportion is not random, but uncertain. We assign a probability distribution to p to express our uncertainty, not to attribute randomness to p.)
Let Xi be the number of "successes" on the ith trial, with probability p of success on each trial. Thus each Xi is 0 or 1, and each Xi has a Bernoulli distribution. Suppose these Xi are conditionally independent given p.
Bayes' theorem says that in order to get the conditional probability distribution of p given the data Xi, i = 1, ..., n, one multiplies the "prior" (i.e., marginal) probability measure assigned to p by the likelihood function

L(p) = p^s (1 - p)^{n - s},

where s = x1 + ... + xn is the number of "successes" and n is of course the number of trials, and then normalizes, to get the "posterior" (i.e., conditional on the data) probability distribution of p. (We are using capital X to denote a random variable and lower-case x either as the dummy in the definition of a function or as the data actually observed.)
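As an illustration of this prior-times-likelihood step, the sketch below (assumed grid size and example counts, not from the article) computes the unnormalized posterior on a grid and normalizes it numerically:

```python
# Sketch under the stated assumptions (uniform prior, Bernoulli likelihood):
# the posterior of p is proportional to prior * p^s * (1 - p)^(n - s).
import numpy as np

def posterior_on_grid(s: int, n: int, num_points: int = 1001):
    p = np.linspace(0.0, 1.0, num_points)
    prior = np.ones_like(p)                   # uniform prior density on [0, 1]
    likelihood = p**s * (1.0 - p)**(n - s)    # likelihood of s successes in n trials
    unnormalized = prior * likelihood
    dp = p[1] - p[0]
    posterior = unnormalized / (unnormalized.sum() * dp)  # normalize (Riemann sum)
    return p, posterior

p, post = posterior_on_grid(s=7, n=10)
print((p * post).sum() * (p[1] - p[0]))  # posterior mean ≈ (7 + 1) / (10 + 2) ≈ 0.667
```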
The prior probability density function is equal to 1 for 0 < p < 1 and equal to 0 for p < 0 or p > 1. To get the normalizing constant, we find

\int_0^1 p^s (1 - p)^{n - s} \, dp = \frac{s! \, (n - s)!}{(n + 1)!}

(see beta function for more on integrals of this form).
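A quick numerical check of this normalizing constant (an illustrative sketch with example values of s and n, assuming SciPy is available):

```python
# Sketch: the beta-function integral of p^s (1 - p)^(n - s) over [0, 1]
# equals s! (n - s)! / (n + 1)!.
from math import factorial
from scipy.integrate import quad

s, n = 7, 10
numeric, _ = quad(lambda p: p**s * (1 - p)**(n - s), 0.0, 1.0)
exact = factorial(s) * factorial(n - s) / factorial(n + 1)
print(numeric, exact)  # both ≈ 7.58e-04
```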
The posterior probability density function is therefore

f(p \mid X_1, \ldots, X_n) = \frac{(n + 1)!}{s! \, (n - s)!} \, p^s (1 - p)^{n - s}.

This is a beta distribution with expected value

E(p \mid X_1, \ldots, X_n) = \frac{s + 1}{n + 2}.
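Since the posterior is Beta(s + 1, n − s + 1), its mean can be read off directly; the sketch below (assuming SciPy, with example counts) confirms that it matches the rule:

```python
# Sketch: the posterior Beta(s + 1, n - s + 1) has mean (s + 1) / (n + 2).
from scipy.stats import beta

s, n = 7, 10
posterior = beta(s + 1, n - s + 1)
print(posterior.mean())   # 8/12 ≈ 0.667
print((s + 1) / (n + 2))  # same value, directly from the rule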
Since the conditional probability of tomorrow's sunrise, given the value of p, is just p, the law of total probability tells us that the probability of tomorrow's sunrise is just the expected value of p. Since all of this is conditional on the observed data Xi for i = 1, ..., n, we have

P(X_{n+1} = 1 \mid X_1, \ldots, X_n) = \frac{s + 1}{n + 2}.
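This averaging over the posterior can also be checked by simulation. The Monte Carlo sketch below (illustrative sample sizes assumed) draws p uniformly, keeps only the runs that reproduce the observed success count s, and records how often trial n + 1 succeeds:

```python
# Monte Carlo sketch: conditioning on the observed count s, the frequency of
# success on trial n + 1 approaches (s + 1) / (n + 2), as the law of total
# probability predicts.
import numpy as np

rng = np.random.default_rng(0)
s, n, draws = 7, 10, 1_000_000

p = rng.uniform(size=draws)
counts = rng.binomial(n, p)                 # successes in n trials for each draw of p
match = counts == s                         # condition on the observed data
next_success = rng.uniform(size=draws) < p  # outcome of trial n + 1
print(next_success[match].mean())           # ≈ (7 + 1) / (10 + 2) ≈ 0.667
print((s + 1) / (n + 2))
```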