Bayesian average

From Wikipedia, the free encyclopedia

A Bayesian average is a method of estimating the mean of a data set when there is prior knowledge about the value being estimated, such as the mean of the larger population from which the data are drawn. It is particularly useful when comparing the means of multiple data sets of different sizes drawn from a larger population.

Calculation

Calculating the Bayesian average of a data set of n values x_1, ..., x_n uses the prior mean \bar{m} and a constant C. C is assigned a value proportional to the typical data set size. It is larger when the expected variation between data sets within the larger population is small, and smaller when the data sets are expected to vary substantially from one another.

\bar{x} = {{C\bar{m} + \sum_{i=1}^n{x_i}} \over {(n + C)}}
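
One way to see what C does is to rewrite the same formula as a weighted mix of the prior mean and the plain sample mean. The symbol \bar{s} is introduced here only for illustration and denotes the ordinary mean of the n observed values:

```latex
\bar{x} = \frac{C}{n + C}\,\bar{m} + \frac{n}{n + C}\,\bar{s},
\qquad
\bar{s} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

When n is much smaller than C the result stays close to \bar{m}; when n is much larger than C it approaches \bar{s}. In this reading, C acts like a number of imaginary observations, each equal to the prior mean.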

When only the relative values of the averages matter, \bar{m} can be replaced with zero, although this preserves their ordering only if the data sets are all the same size or the values are expressed as deviations from the prior mean. C can be derived from prior assumptions about the variance between data sets, but where that level of rigor is desired, other, more expressive measures of statistical power are likely to be used instead. As a result, C is usually assigned a value in an ad hoc manner.
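
The formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `bayesian_average`, its parameter names, and the sample values are all hypothetical and chosen only for this example.

```python
from typing import Sequence


def bayesian_average(values: Sequence[float], prior_mean: float, c: float) -> float:
    """Return (c * prior_mean + sum(values)) / (len(values) + c)."""
    denominator = len(values) + c
    if denominator <= 0:
        raise ValueError("len(values) + c must be positive")
    return (c * prior_mean + sum(values)) / denominator


# Hypothetical data: an assumed prior mean of 5.0 and an ad hoc C of 4.
small_set = [10.0, 10.0]   # two observations, sample mean 10.0
large_set = [8.0] * 40     # forty observations, sample mean 8.0

small_result = bayesian_average(small_set, prior_mean=5.0, c=4)
large_result = bayesian_average(large_set, prior_mean=5.0, c=4)

# The small set is pulled strongly toward the prior mean,
# while the large set stays close to its own sample mean.
print(round(small_result, 2), round(large_result, 2))
```

With these assumed inputs, the two-value set ends up below the forty-value set even though its raw mean is higher, which is the behaviour the constant C is meant to produce. An empty data set simply returns the prior mean.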

Example

The goal is to calculate the Bayesian average height of adult American men in several occupations. In the larger population of adult American men, the average height is 176 cm, which serves as the prior mean \bar{m}. C is set to 10. The occupations used in this example are "Basketball players", "Actors" and "Students". For the basketball players, a group of 15 individuals is identified with an average height of 191 cm. For the students, a group of 10 individuals is identified with an average height of 179 cm. For the actors, only James Cromwell is available, giving a group mean of 201 cm.

Group                 N     Group mean (cm)    Bayesian average (cm)
Basketball players    15    191                185
Students              10    179                177.5
Actors                1     201                178

Here, the Bayesian average correctly reduces the effect of a single anomalously large value. Had the sample size for basketball players been similarly small, however, the Bayesian average would have placed basketball players far closer to the population average than is likely the case.
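
The figures in this example can be checked with a short Python sketch. It works from each group's size and mean rather than from individual heights, using the fact that the sum of the values equals n times the group mean. The helper name `bayesian_average_from_mean` is hypothetical, and the final case (a single basketball player who happens to be exactly 191 cm tall) is an assumption made only to illustrate the small-sample caveat.

```python
def bayesian_average_from_mean(n: int, group_mean: float,
                               prior_mean: float, c: float) -> float:
    """Bayesian average from summary statistics: sum(x_i) == n * group_mean."""
    return (c * prior_mean + n * group_mean) / (n + c)


PRIOR_MEAN_CM = 176.0  # average height of adult American men, from the example
C = 10.0               # constant chosen in the example

# (group size, group mean in cm), as given in the example
groups = {
    "Basketball players": (15, 191.0),
    "Students": (10, 179.0),
    "Actors": (1, 201.0),
}

results = {
    name: bayesian_average_from_mean(n, mean, PRIOR_MEAN_CM, C)
    for name, (n, mean) in groups.items()
}
for name, value in results.items():
    print(f"{name}: {value:.1f} cm")
# Basketball players: 185.0 cm
# Students: 177.5 cm
# Actors: 178.3 cm

# Assumed counterfactual: only one basketball player, 191 cm tall.
single_player = bayesian_average_from_mean(1, 191.0, PRIOR_MEAN_CM, C)
print(f"Basketball players (n=1): {single_player:.1f} cm")
# Basketball players (n=1): 177.4 cm
```

The results agree with the table up to rounding. In the assumed single-player case, the estimate falls to about 177.4 cm, slightly below the figure for students, which illustrates how a very small sample is pulled toward the prior mean regardless of how typical its value is for the group.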