Pythagorean expectation

From Wikipedia, the free encyclopedia

Pythagorean expectation is a formula invented by Bill James to estimate how many games a baseball team "should" have won based on the number of runs it scored and allowed. The name comes from the formula's resemblance to the Pythagorean theorem, which gives the length of the hypotenuse of a right triangle from the lengths of its other two sides.

The basic formula is:

$\mathrm{Win\%} = \frac{\mathrm{Runs\ Scored}^2}{\mathrm{Runs\ Scored}^2 + \mathrm{Runs\ Allowed}^2} = \frac{1}{1+(\mathrm{Runs\ Allowed}/\mathrm{Runs\ Scored})^2}$

where Win% is the winning percentage generated by the formula. Multiplying this by the number of games a team has played (today, a season in the Major Leagues is 162 games) gives the number of wins one would expect from the team based on its runs scored and runs allowed.
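As a rough illustration of the arithmetic, the basic formula can be sketched in Python. The function names and the team totals (800 runs scored, 700 allowed) are made up for the example and are not taken from any real season.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("at least one run total must be positive")
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def expected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected number of wins over `games` games."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# Hypothetical team: 800 runs scored, 700 runs allowed over 162 games.
pct = pythagorean_win_pct(800, 700)   # 640000 / 1130000, about 0.566
wins = expected_wins(800, 700)        # about 91.8 wins
print(f"Win% = {pct:.3f}, expected wins = {wins:.1f}")
```

A team that scores exactly as many runs as it allows comes out at 0.500, as one would expect.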

1 Empirical origin
2 Use in basketball
3 Use in hockey
4 Statistical derivation
5 See also
6 External links

Empirical origin

Empirically, this formula correlates fairly well with how baseball teams actually perform, although an exponent of 1.81 is slightly more accurate. This correlation is one justification for using runs as a unit of measurement for player performance. Efforts have been made to find the ideal exponent for the formula. The most widely known is the pythagenport formula[1], invented by Clay Davenport, which sets the exponent to 1.5 log((r + ra)/g) + 0.45, where r is runs scored, ra is runs allowed, and g is games played. A less well known but equally effective alternative, invented by David Smyth, sets the exponent to ((r + ra)/g)^0.287.
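The two variable-exponent formulas can be compared with a short Python sketch. The function names and the team totals are hypothetical. The logarithm is assumed to be base 10, since that is the reading that yields exponents close to 2 for typical run totals.

```python
import math


def pythagenport_exponent(r, ra, g):
    """Davenport's exponent: 1.5 * log((r + ra) / g) + 0.45 (base-10 log assumed)."""
    runs_per_game = (r + ra) / g
    return 1.5 * math.log10(runs_per_game) + 0.45


def smyth_exponent(r, ra, g):
    """Smyth's exponent: ((r + ra) / g) ** 0.287."""
    runs_per_game = (r + ra) / g
    return runs_per_game ** 0.287


def win_pct(r, ra, exponent):
    """Pythagorean winning percentage in ratio form; requires r > 0."""
    return 1.0 / (1.0 + (ra / r) ** exponent)


# Hypothetical team: 800 runs scored, 700 allowed, 162 games.
r, ra, g = 800, 700, 162
x_davenport = pythagenport_exponent(r, ra, g)
x_smyth = smyth_exponent(r, ra, g)
for name, x in [("fixed 2", 2.0), ("pythagenport", x_davenport), ("Smyth", x_smyth)]:
    print(f"{name:>12}: exponent = {x:.3f}, Win% = {win_pct(r, ra, x):.4f}")
```

For a run environment of about nine combined runs per game, both formulas give an exponent a little below 2, so the predicted winning percentages differ only slightly from the fixed-exponent version.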

It is important to keep in mind that Pythagorean estimation is an empirical observation that correlates with winning percentage. It was not originally derived from theory; there was no known reason why a team's winning percentage should correlate with this formula, except that it does. However, Steven J. Miller later provided a statistical derivation of the formula under some assumptions about how runs are distributed.

That said, actual and expected winning percentages do deviate from each other, and factors cited for these deviations include bullpen quality and luck. Teams that win a lot of games tend to be underrepresented by the formula (meaning they "should" have won fewer games), and teams that lose a lot of games tend to be overrepresented (they "should" have won more).

Use in basketball

When noted basketball analyst Dean Oliver applied James's Pythagorean formula to his own sport, the result was similar, except for the exponent:

$\mathrm{Win\%} = \frac{\mathrm{Points\ For}^{14}}{\mathrm{Points\ For}^{14} + \mathrm{Points\ Against}^{14}}.$
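To see what the larger exponent does in practice, the sketch below evaluates the same scoring ratio with the basketball exponent of 14 and with the baseball exponent of 2. The per-game averages (110 points for, 105 against) and the function name are invented for illustration.

```python
def pythagorean_pct(points_for, points_against, exponent):
    """Winning percentage in ratio form: 1 / (1 + (against / for) ** exponent)."""
    if points_for <= 0:
        raise ValueError("points_for must be positive")
    return 1.0 / (1.0 + (points_against / points_for) ** exponent)


# Hypothetical team averaging 110 points scored and 105 allowed per game.
basketball_pct = pythagorean_pct(110.0, 105.0, exponent=14)
baseball_style_pct = pythagorean_pct(110.0, 105.0, exponent=2)

print(f"exponent 14: {basketball_pct:.3f}")
print(f"exponent  2: {baseball_style_pct:.3f}")
```

With the exponent of 14, a scoring edge of under five percent already maps to a winning percentage well above .600, whereas the exponent of 2 would leave the same team only slightly above .500. The higher exponent makes the estimate far more sensitive to the ratio of points scored to points allowed.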

Use in hockey

Statistical derivation

A paper by Steven J. Miller models the runs scored and allowed per game in professional sports leagues as random variables following Weibull distributions, and arrives at a formula of the same form with a general exponent a:

$\mathrm{Win\%} = \frac{\mathrm{Runs\ Scored}^a}{\mathrm{Runs\ Scored}^a + \mathrm{Runs\ Allowed}^a}$
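A simplified Monte Carlo check of this idea can be sketched in Python. It is not a reproduction of Miller's paper. It assumes that runs scored and runs allowed in each game are independent, continuous Weibull variables with a common shape parameter, which here plays the role of the exponent a, and it ignores any further adjustments the paper makes. The parameter values and function names are hypothetical.

```python
import random


def simulated_win_fraction(scale_scored, scale_allowed, shape, n_games=200_000, seed=0):
    """Fraction of simulated games in which runs scored exceed runs allowed."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_games):
        # weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
        scored = rng.weibullvariate(scale_scored, shape)
        allowed = rng.weibullvariate(scale_allowed, shape)
        if scored > allowed:
            wins += 1
    return wins / n_games


def pythagorean_form(scale_scored, scale_allowed, shape):
    """Closed form scored**a / (scored**a + allowed**a) with a equal to the shape."""
    s = scale_scored ** shape
    a = scale_allowed ** shape
    return s / (s + a)


# Assumed illustrative parameters: shape 1.8, scales 5.0 (scored) and 4.5 (allowed).
simulated = simulated_win_fraction(5.0, 4.5, 1.8)
predicted = pythagorean_form(5.0, 4.5, 1.8)
print(f"simulated = {simulated:.4f}, formula = {predicted:.4f}")
```

With a shared shape parameter, the mean of each distribution is proportional to its scale, so using scales in place of average runs leaves the ratio in the formula unchanged. Under these assumptions the simulated fraction should land close to the closed-form value.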