Pythagorean expectation

From Wikipedia, the free encyclopedia

Pythagorean expectation is a formula invented by Bill James to estimate how many games a baseball team "should" have won based on the number of runs it scored and allowed. The name comes from the formula's resemblance to the Pythagorean theorem, which gives the length of the hypotenuse of a right triangle from the lengths of its other two sides.

The basic formula is:

$\mathrm{Win\%} = \frac{\mathrm{Runs Scored}^2}{\mathrm{Runs Scored}^2 + \mathrm{Runs Allowed}^2} = \frac{1}{1+(\mathrm{Runs Allowed}/\mathrm{Runs Scored})^2}$

where Win% is the winning percentage generated by the formula. Multiplying this by the number of games a team has played (today, a season in the Major Leagues is 162 games) gives the number of wins one would expect based on the team's runs scored and runs allowed.
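
As an illustration, the calculation can be sketched in a few lines of Python. The function names, the default 162-game season, and the sample run totals are assumptions made for this example rather than part of the formula itself.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("formula is undefined when both run totals are zero")
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def expected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins: winning percentage times games played."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


if __name__ == "__main__":
    # Hypothetical team: 800 runs scored, 700 runs allowed over 162 games.
    print(round(pythagorean_win_pct(800, 700), 3))
    print(round(expected_wins(800, 700), 1))
```

For this hypothetical team the formula gives a winning percentage of 640000/1130000, or about .566, which works out to roughly 92 wins over 162 games.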

Empirical origin

Empirically, this formula correlates fairly well with how baseball teams actually perform, although an exponent of 1.81 is slightly more accurate. This correlation is one justification for using runs as a unit of measurement for player performance. Efforts have been made to find the ideal exponent for the formula. The most widely known is the Pythagenport formula,[1] developed by Clay Davenport of Baseball Prospectus, which sets the exponent to 1.5 log((rs + ra)/g) + 0.45, where rs is runs scored, ra is runs allowed, and g is games played. Less well known but equally effective is the Pythagenpat formula, developed by David Smyth, which sets the exponent to ((rs + ra)/g)^0.287.[2] Indeed, Davenport has endorsed the Smyth/Patriot or Pythagenpat formula as "a better fit to the data. In that, X = ((rs + ra)/g)^0.285, although there is some wiggle room for disagreement in the exponent. Anyway, that equation is simpler, more elegant, and gets the better answer over a wider range of runs scored than Pythagenport, including the mandatory value of 1 at 1 rpg."[3]
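
The two variable-exponent formulas can be compared with a short Python sketch. The function names are hypothetical, and the logarithm in Pythagenport is taken here to be base 10, which is an assumption since the text above does not state the base.

```python
import math


def pythagenport_exponent(rs, ra, g):
    """Davenport's exponent: 1.5 * log((rs + ra) / g) + 0.45.

    Assumption: the logarithm is base 10.
    """
    return 1.5 * math.log10((rs + ra) / g) + 0.45


def pythagenpat_exponent(rs, ra, g, power=0.287):
    """Smyth's exponent: total runs per game raised to a small power."""
    return ((rs + ra) / g) ** power


def win_pct(rs, ra, exponent):
    """Pythagorean winning percentage with an arbitrary exponent."""
    return rs ** exponent / (rs ** exponent + ra ** exponent)


if __name__ == "__main__":
    # Hypothetical season: 800 runs scored, 700 allowed, 162 games.
    rs, ra, g = 800, 700, 162
    for name, x in [
        ("fixed 2", 2.0),
        ("pythagenport", pythagenport_exponent(rs, ra, g)),
        ("pythagenpat", pythagenpat_exponent(rs, ra, g)),
    ]:
        print(f"{name}: exponent={x:.3f}, win%={win_pct(rs, ra, x):.3f}")
```

At exactly one total run per game, the Pythagenpat exponent is 1 regardless of the power used, which is the "mandatory value" Davenport mentions. The Pythagenport expression gives 0.45 at that point under the base-10 assumption.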

However, these formulas offer a meaningful improvement only in extreme situations, in which the average number of runs scored per game is either very high or very low. For most situations, simply squaring each variable yields accurate results.

It is important to keep in mind that Pythagorean estimation is an empirical observation that correlates with winning percentage. It was not theoretically derived; for a long time there was no known reason why a team's winning percentage should correlate with this formula, except that it does. However, Steven J. Miller later provided a statistical derivation of the formula under some assumptions about how runs are distributed.
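
To give a feel for how such a derivation can work, the following Python sketch assumes that runs scored and runs allowed in a game are independent draws from Weibull distributions with a common shape parameter. This is a simplification chosen for the example (it treats runs as continuous and ignores ties), and it is not a full account of the published derivation. Under that assumption, the probability of outscoring the opponent has exactly the Pythagorean form, with the shape parameter playing the role of the exponent. The function names and parameter values are hypothetical.

```python
import random


def weibull_win_probability(scale_for, scale_against, shape):
    """Closed form for P(X > Y) when X and Y are independent Weibull
    variables with the same shape and the given scale parameters."""
    a = scale_for ** shape
    b = scale_against ** shape
    return a / (a + b)


def simulate_win_fraction(scale_for, scale_against, shape,
                          trials=200_000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # random.weibullvariate takes the scale first, then the shape.
        scored = rng.weibullvariate(scale_for, shape)
        allowed = rng.weibullvariate(scale_against, shape)
        if scored > allowed:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    exact = weibull_win_probability(5.0, 4.5, 2.0)
    estimate = simulate_win_fraction(5.0, 4.5, 2.0)
    print(f"closed form: {exact:.4f}, simulated: {estimate:.4f}")
```

Because the mean of a Weibull variable is proportional to its scale parameter when the shape is fixed, the same ratio holds if average runs are substituted for the scale parameters.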

That said, there are statistical deviations between actual winning percentage and expected winning percentage, with causes that include bullpen quality and luck. Teams that win a lot of games tend to be underestimated by the formula (meaning they "should" have won fewer games), and teams that lose a lot of games tend to be overestimated (they "should" have won more).