Probability interpretations

From Wikipedia, the free encyclopedia

The word probability has been used in a variety of ways since it was first coined in relation to games of chance. Does probability measure the real, physical tendency of something to occur, or is it just a measure of how strongly one believes it will occur? In answering such questions, we interpret the probability values of probability theory.

There are two broad categories of probability interpretations, which can be called physical and evidential probabilities. Physical probabilities, which are also called objective or frequency probabilities, are associated with random physical systems such as roulette wheels, rolling dice and radioactive atoms. In such systems, a given type of event (such as the die yielding a six) tends to occur at a persistent rate, or relative frequency, in a long run of trials. Physical probabilities either are, or are invoked to explain, these stable frequencies. Thus talk about physical probability makes sense only when dealing with well-defined random experiments. The two main kinds of theory of physical probability are frequentist accounts (such as those of Venn, Reichenbach and von Mises) and propensity accounts (such as those of Popper, Miller, Giere and Fetzer).

Evidential probability, also called Bayesian probability, can be assigned to any statement whatsoever, even when no random process is involved, as a way to represent its subjective plausibility, or the degree to which the statement is supported by the available evidence. On most accounts, evidential probabilities are considered to be degrees of belief, defined in terms of dispositions to gamble at certain odds. The four main evidential interpretations are the classical (e.g. Laplace's) interpretation, the subjective interpretation (de Finetti and Savage), the epistemic interpretation (Ramsey, Cox) and the logical interpretation (Keynes and Carnap).

Some interpretations of probability are associated with approaches to statistical inference, including theories of estimation and hypothesis testing. The physical interpretation, for example, is taken by followers of "frequentist" statistical methods, such as R. A. Fisher, Jerzy Neyman and Egon Pearson. Statisticians of the opposing Bayesian school typically accept the existence and importance of physical probabilities, but also consider the calculation of evidential probabilities to be both valid and necessary in statistics. This article, however, focuses on the interpretations of probability rather than theories of statistical inference.

The terminology of this topic is rather confusing, in part because probabilities are studied within so many different academic fields. The word "frequentist" is especially tricky. To philosophers it refers to a particular theory of physical probability, one that has more or less been abandoned. To scientists, on the other hand, "frequentist probability" is just what philosophers call physical (or objective) probability, and "frequentist statistics" is an approach to statistical inference that recognises only physical probabilities. Also, the word "objective", as applied to probability, sometimes means exactly what "physical" means here, but is also used of evidential probabilities that are fixed by rational constraints, such as logical and epistemic probabilities.

These interpretations of probability are presented in more detail below.

1 Classical definition
2 Frequentism
3 Propensity
4 Subjectivism
5 Practical controversy
6 Axiomatic probability
7 See also
8 External links

Classical definition

Main article: Classical definition of probability

The first attempt at mathematical rigour in the field of probability, championed by Pierre-Simon Laplace, was known as the classical definition. Developed from studies of games of chance (such as rolling dice), it states that probability is shared equally between all the possible outcomes^[1].

The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought. The ratio of this number to that of all the cases possible is the measure of this probability, which is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible.

– Pierre-Simon Laplace, A Philosophical Essay on Probabilities^[2]

The classical definition of probability works well for situations with only a finite number of equally-likely outcomes.

This can be represented mathematically as follows: If a random experiment can result in N mutually exclusive and equally likely outcomes and if N_A of these outcomes result in the occurrence of the event A, the probability of A is defined by $\Pr(A) = {N_A \over N}$ .
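As a minimal sketch of this ratio, the following Python counts favourable cases among a finite list of outcomes that are simply assumed to be equally likely. The function name `classical_probability` and the dice examples are illustrative choices, not part of the definition itself.

```python
from fractions import Fraction


def classical_probability(outcomes, is_favorable):
    """Return N_A / N for a finite collection of equally likely outcomes."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one possible outcome")
    n_favorable = sum(1 for o in outcomes if is_favorable(o))
    return Fraction(n_favorable, len(outcomes))


# One fair die: the event A is "the die yields a six".
die = range(1, 7)
p_six = classical_probability(die, lambda face: face == 6)

# Two fair dice: the event A is "the faces sum to seven".
two_dice = [(a, b) for a in die for b in die]
p_seven = classical_probability(two_dice, lambda roll: sum(roll) == 7)

print(p_six, p_seven)  # 1/6 1/6
```

Note that the code never checks that the outcomes are equally likely; that assumption has to be supplied from outside, which is exactly where the definition is open to the charge of circularity.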

There are two clear limitations to the classical definition^[3]. Firstly, it is applicable only to situations in which there is only a finite number of possible outcomes. But some important random experiments, such as tossing a coin until it comes up heads, give rise to an infinite set of outcomes. Secondly, the condition that each possible outcome is equally likely renders the definition circular, since probability is used to define the idea of probability.
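The coin example can be made concrete with a short worked calculation. It assumes a fair coin and independent tosses, which the text above does not state, and writes $H$ and $T$ for heads and tails.

```latex
\[
\Omega = \{H,\ TH,\ TTH,\ TTTH,\ \dots\}, \qquad
\Pr(\text{first head on toss } k) = \left(\tfrac{1}{2}\right)^{k}, \qquad
\sum_{k=1}^{\infty} \left(\tfrac{1}{2}\right)^{k} = 1 .
\]
```

Under these assumptions the outcomes are not equally likely, and no equal assignment is available in any case: giving every one of infinitely many outcomes the same probability $c$ makes the total either $0$ (if $c = 0$) or infinite (if $c > 0$), never $1$.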

Frequentism

For frequentists, the probability of the ball landing in any pocket can be determined only by repeated trials in which the observed result converges to the underlying probability in the long run.

Main article: Frequency probability

Frequentists posit that the probability of an event is its relative frequency over time^[1], i.e., its relative frequency of occurrence after repeating a process a large number of times under similar conditions. This is also known as aleatory probability. The events are assumed to be governed by some random physical phenomena, which are either phenomena that are predictable, in principle, with sufficient information (see Determinism); or phenomena which are essentially unpredictable. Examples of the first kind include tossing dice or spinning a roulette wheel; an example of the second kind is radioactive decay. In the case of tossing a fair coin, frequentists say that the probability of getting heads is 1/2, not because there are two equally likely outcomes but because repeated series of large numbers of trials demonstrate that the empirical frequency converges to the limit 1/2 as the number of trials goes to infinity.

If we denote by $n_A$ the number of occurrences of an event A in n trials, then if $\lim_{n \to \infty}{n_A \over n}=P_A$ we say that $\Pr(A)=P_A$.

The frequentist view has its own problems. It is of course impossible to actually perform an infinity of repetitions of a random experiment to determine the probability of an event. But if only a finite number of repetitions of the process are performed, different relative frequencies will appear in different series of trials. If these relative frequencies are to define the probability, the probability will be slightly different every time it is measured. But the real probability should be the same every time. If we acknowledge that we can only measure a probability with some error of measurement attached, we still run into problems, as the error of measurement can only be expressed as a probability, the very concept we are trying to define. This renders even the frequency definition circular.
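The point about finite series can be illustrated with a small simulation. This is only a sketch: it uses Python's pseudo-random generator as a stand-in for a physical coin, and the function name `relative_frequency`, the seeds and the series lengths are arbitrary choices made here.

```python
import random


def relative_frequency(n_trials, p_heads=0.5, seed=None):
    """Simulate n_trials coin tosses and return n_A / n for A = 'heads'."""
    rng = random.Random(seed)
    n_heads = sum(1 for _ in range(n_trials) if rng.random() < p_heads)
    return n_heads / n_trials


# Three separate series of trials at each length; the seeds are arbitrary.
for n in (10, 1_000, 100_000):
    freqs = [relative_frequency(n, seed=s) for s in (1, 2, 3)]
    print(n, [round(f, 4) for f in freqs])
```

Different series of the same length generally report different relative frequencies. Longer series tend to cluster more tightly around the value of `p_heads`, but no finite run pins that value down exactly.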

Propensity

Propensity theorists think of probability as a physical propensity, or disposition, or tendency of a given type of physical situation to yield an outcome of a certain kind, or to yield a long run relative frequency of such an outcome^[4]. This kind of objective probability is sometimes called 'chance'.

Propensities, or chances, are not relative frequencies, but purported causes of the observed stable relative frequencies. Propensities are invoked to explain why repeating a certain kind of experiment will generate a given outcome type at a persistent rate. A central aspect of this explanation is the Law of large numbers. This law, which is a consequence of the axioms of probability, says that if (for example) a coin is tossed many times, in such a way that its probability of landing heads is the same on each toss, and the outcomes are probabilistically independent, then the relative frequency of heads will (with high probability) be close to the probability of heads on each single toss. This law suggests that stable long-run frequencies are a manifestation of invariant single-case probabilities. Frequentists are unable to take this approach, since relative frequencies do not exist for single tosses of a coin, but only for large ensembles or collectives. Hence, these single-case probabilities are known as propensities or chances.
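The law described above can be stated a little more precisely. As a sketch, write $n_H$ for the number of heads in $n$ independent tosses and $p$ for the probability of heads on each toss; these symbols are introduced here for illustration. One standard way to obtain the weak form of the law is Chebyshev's inequality, which gives, for every $\varepsilon > 0$:

```latex
\[
\Pr\!\left( \left| \frac{n_H}{n} - p \right| \ge \varepsilon \right)
\;\le\; \frac{p\,(1-p)}{n\,\varepsilon^{2}}
\;\longrightarrow\; 0 \quad \text{as } n \to \infty .
\]
```

The bound itself is a statement about probability, so the law links single-case probabilities to long-run frequencies only "with high probability", as the paragraph above says.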

In addition to explaining the emergence of stable relative frequencies, the idea of propensity is motivated by the desire to make sense of single-case probability attributions in quantum mechanics, such as the probability of decay of a particular atom at a particular time.

The main challenge facing propensity theories is to say exactly what propensity means. (And then, of course, to show that propensity thus defined has the required properties.) At present, unfortunately, none of the well-recognised accounts of propensity comes close to meeting this challenge.

The first propensity theory, due to philosopher Karl Popper, noted that the outcome of a physical experiment is produced by a certain set of "generating conditions". When we repeat an experiment, as the saying goes, we really perform another experiment with a (more or less) similar set of generating conditions. To say that a set of generating conditions has propensity p of producing the outcome E means that those exact conditions, if repeated indefinitely, would produce an outcome sequence in which E occurred with limiting relative frequency p. For Popper then, a deterministic experiment would have propensity 0 or 1 for each outcome, since those generating conditions would have the same outcome on each trial. In other words, non-trivial propensities (those that differ from 0 and 1) only exist for genuinely indeterministic experiments.

Popper's propensities, while they are not relative frequencies, are nevertheless defined in terms of relative frequency. As a result, they face many of the serious problems that plague frequency theories. First, propensities cannot be empirically ascertained, on this account, since the limit of a sequence is a tail event, and is thus independent of its finite initial segments. Seeing a coin land heads every time for the first million tosses, for example, tells one nothing about the limiting proportion of heads on Popper's view. Moreover, the use of relative frequency to define propensity assumes the existence of stable relative frequencies, so one cannot then use propensity to explain the existence of stable relative frequencies, via the Law of large numbers.

A number of other philosophers, including David Miller and Donald Gillies, have proposed propensity theories somewhat similar to Popper's, in that propensities are defined in terms of either long-run or infinitely long-run relative frequencies.

Other propensity theorists (e.g. Ronald Giere) do not explicitly define propensities at all, but rather see propensity as defined by the theoretical role it plays in science. They argue, for example, that physical magnitudes such as electrical charge cannot be explicitly defined either, in terms of more basic things, but only in terms of what they do (such as attracting and repelling other electrical charges). In a similar way, propensity is whatever fills the various roles that physical probability plays in science.

What roles does physical probability play in science? What are its properties? One central property of chance is that, when known, it constrains rational belief to take the same numerical value. David Lewis called this the Principal Principle, a term that philosophers have mostly adopted. For example, suppose you are certain that a particular biased coin has propensity 0.32 to land heads every time it is tossed. What, then, is the correct price for a gamble that pays $1 if the coin lands heads, and nothing otherwise? According to the Principal Principle, the fair price is 32 cents.
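The arithmetic behind the 32 cents can be sketched as follows. The sketch assumes that the "fair price" of a gamble is its expected payoff computed with a credence equal to the known chance. The function name `fair_price` and its parameters are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction


def fair_price(chance, payoff_if_win, payoff_if_lose=Fraction(0)):
    """Expected payoff of a gamble when credence is set equal to the known chance."""
    credence = chance  # Principal Principle: credence matches the known chance
    return credence * payoff_if_win + (1 - credence) * payoff_if_lose


# Propensity 0.32 of heads; the gamble pays $1 on heads and nothing otherwise.
price = fair_price(Fraction(32, 100), Fraction(1))
print(f"{float(price):.2f}")  # 0.32
```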


Subjectivism

Gambling odds reflect bookies' 'degree of belief' in the outcome.

Main article: Bayesian probability

Subjectivists, also known as Bayesians or followers of epistemic probability, give the notion of probability a subjective status by regarding it as a measure of the 'degree of belief' of the individual assessing the uncertainty of a particular situation. Subjective probability is sometimes called 'credence' (as opposed to the term 'chance' for a propensity probability).

Examples of epistemic probability include assigning a probability to the proposition that a proposed law of physics is true, and determining how "probable" it is that a suspect committed a crime, based on the evidence presented.

Gambling odds don't reflect the bookies' belief in a likely winner, so much as the other bettors' belief, because the bettors are actually betting against one another. The odds are set based on how many people have bet on a possible winner, so that even if the high odds players always win, the bookie would always make his percentage anyway.

The use of Bayesian probability raises the philosophical debate as to whether it can contribute valid justifications of belief.

Bayesians point to the work of Ramsey and de Finetti as proving that subjective beliefs must follow the laws of probability if they are to be coherent.
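The flavour of such coherence arguments, often called Dutch book arguments, can be shown with a toy case. The sketch below covers only one requirement (that the credences in A and in not-A sum to 1) and is not a reconstruction of Ramsey's or de Finetti's actual results. It assumes an agent who treats credence times stake as a fair price for a bet paying the stake if the event occurs, and who will buy or sell at that price. The function name `agent_net_payoffs` is illustrative.

```python
from fractions import Fraction


def agent_net_payoffs(credence_a, credence_not_a, stake=Fraction(1)):
    """Agent's net payoff in each possible world after a bookmaker exploits
    the agent's prices for a bet on A and a bet on not-A."""
    total = credence_a + credence_not_a
    # If the credences sum to more than 1, the bookmaker sells both bets to
    # the agent; otherwise the bookmaker buys both bets from the agent.
    direction = 1 if total > 1 else -1
    payoffs = {}
    for a_is_true in (True, False):
        win_a = stake if a_is_true else 0
        win_not_a = 0 if a_is_true else stake
        cost = total * stake  # combined price of the two bets
        payoffs[a_is_true] = direction * (win_a + win_not_a - cost)
    return payoffs


# Incoherent credences: 3/5 in A and 3/5 in not-A.
print(agent_net_payoffs(Fraction(3, 5), Fraction(3, 5)))
# Coherent credences: no guaranteed loss can be extracted this way.
print(agent_net_payoffs(Fraction(3, 5), Fraction(2, 5)))
```

With the incoherent credences the agent loses the same amount whether or not A is true, while with credences that sum to 1 the net payoff is zero in both cases.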

The use of Bayesian probability involves specifying a prior probability. This may be obtained from consideration of whether the required prior probability is greater or lesser than a reference probability associated with an urn model or a thought experiment. The issue is that for a given problem, multiple thought experiments could apply, and choosing one is a matter of judgement, so different people may assign different prior probabilities. This is known as the reference class problem. The "sunrise problem" provides an example.
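The sensitivity to the prior can be sketched for the sunrise problem. The example assumes a Beta prior on the unknown daily success rate, which is a modelling choice made here rather than something the text prescribes. With the uniform Beta(1, 1) prior the result is Laplace's rule of succession, (s + 1) / (n + 2). The function name, the count of 1000 days and the second prior are hypothetical.

```python
from fractions import Fraction


def predictive_probability(successes, trials, prior_a=1, prior_b=1):
    """Probability that the next trial succeeds, given a Beta(prior_a, prior_b)
    prior on the unknown success rate and the observed record."""
    return Fraction(successes + prior_a, trials + prior_a + prior_b)


days = 1000  # hypothetical number of consecutive observed sunrises
uniform = predictive_probability(days, days)         # Beta(1, 1) prior
sceptic = predictive_probability(days, days, 1, 50)  # prior weighted toward failure
print(uniform, sceptic)  # 1001/1002 1001/1051
```

Both observers saw the same 1000 sunrises, yet they assign different probabilities to the next one because they started from different priors.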

Practical controversy

This difference in point of view also has many implications, both for the methods by which statistics is practiced and for the way in which conclusions are expressed. When comparing two hypotheses and using some information, frequency methods would typically result in the rejection or non-rejection of the original hypothesis at a particular significance level, and frequentists would all agree that the hypothesis should be rejected or not at that level of significance. Bayesian methods would suggest that one hypothesis was more probable than the other, but individual Bayesians might differ about which was the more probable and by how much, by virtue of having used different priors. Bayesians would argue that this is right and proper: if the issue is such that reasonable people can put forward different, but plausible, priors and the data are such that the likelihood does not swamp the prior, then the issue is not resolved unambiguously at the present stage of knowledge, and Bayesian statistics highlights this fact. They would argue that any approach that purports to produce a single, definitive answer to the question at hand in these circumstances is obscuring the truth.
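The contrast can be sketched on a toy data set. Everything specific here is an assumption made for illustration: the data (14 heads in 20 tosses), the null hypothesis of a fair coin, the single alternative of a 0.7 bias, the one-sided test, and the two prior probabilities. Real analyses on either side are usually more elaborate.

```python
from math import comb


def one_sided_p_value(heads, tosses, p_null=0.5):
    """P(at least `heads` heads in `tosses` tosses) if the null hypothesis holds."""
    return sum(
        comb(tosses, k) * p_null**k * (1 - p_null) ** (tosses - k)
        for k in range(heads, tosses + 1)
    )


def posterior_of_alternative(heads, tosses, prior_alt, p_null=0.5, p_alt=0.7):
    """Posterior probability of the alternative when only two simple
    hypotheses (bias p_null versus bias p_alt) are in competition."""
    like_null = p_null**heads * (1 - p_null) ** (tosses - heads)
    like_alt = p_alt**heads * (1 - p_alt) ** (tosses - heads)
    evidence = prior_alt * like_alt + (1 - prior_alt) * like_null
    return prior_alt * like_alt / evidence


heads, tosses = 14, 20  # hypothetical data
print("p-value:", round(one_sided_p_value(heads, tosses), 4))
for prior_alt in (0.5, 0.1):
    posterior = posterior_of_alternative(heads, tosses, prior_alt)
    print("prior", prior_alt, "-> posterior", round(posterior, 3))
```

Every frequentist running this test gets the same p-value and so the same verdict at a given significance level. The two Bayesians, who differ only in their prior for the alternative, end up on opposite sides of one half.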

An alternative solution is the eclectic view, which accepts both interpretations: depending on the situation, one selects one of the two interpretations for pragmatic, or principled, reasons.

Axiomatic probability

The mathematics of probability can be developed on an entirely axiomatic basis that is independent of any interpretation: see the articles on probability theory and probability axioms for a detailed treatment.