PECOTA

PECOTA, an acronym for Player Empirical Comparison and Optimization Test Algorithm, is a sabermetric system for predicting Major League Baseball player performance.[1] It was invented by Nate Silver of Baseball Prospectus. It relies on fitting a given player's past performance statistics to the performance of "comparable" Major League ballplayers by means of similarity scores. Although drawing on the underlying concept of Bill James' similarity scores, PECOTA calculates these scores in a distinct way that leads to a very different set of "comparables" than James' method.[2] Separate sets of predictions are developed for hitters and pitchers. The comparable players are drawn from a database of all major league player-seasons since 1946. The raw statistics in this database are first adjusted to take into account park effects and the era in which a player played.
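The idea of selecting "comparables" by similarity score can be illustrated with a minimal sketch. PECOTA's actual formulas are proprietary, so everything here is an assumption made for illustration: the choice of statistics (`age`, `bb_pct`, `k_pct`, `iso_pts`), the weights, the 100-point scale, and the player data are all hypothetical, and the distance measure is a plain weighted Euclidean distance rather than Silver's or James' method.

```python
import math

# Hypothetical weights: how strongly each stat counts when comparing players.
# These are illustrative only and are not PECOTA's weights.
WEIGHTS = {"age": 4.0, "bb_pct": 1.0, "k_pct": 1.0, "iso_pts": 0.01}


def similarity(a, b, weights=WEIGHTS):
    """Toy score: 100 minus a weighted Euclidean distance (higher = more alike)."""
    dist = math.sqrt(sum(w * (a[k] - b[k]) ** 2 for k, w in weights.items()))
    return 100.0 - dist


def top_comparables(target, pool, n=3, weights=WEIGHTS):
    """Return the n most similar player-seasons as (score, name) pairs."""
    scored = sorted(
        ((similarity(target, p, weights), p["name"]) for p in pool),
        reverse=True,
    )
    return scored[:n]


# Invented player-seasons, assumed to be already adjusted for park and era.
target = {"name": "Target", "age": 27, "bb_pct": 10.0, "k_pct": 20.0, "iso_pts": 180}
pool = [
    {"name": "Player A", "age": 27, "bb_pct": 9.0, "k_pct": 21.0, "iso_pts": 170},
    {"name": "Player B", "age": 31, "bb_pct": 10.0, "k_pct": 20.0, "iso_pts": 180},
    {"name": "Player C", "age": 26, "bb_pct": 5.0, "k_pct": 28.0, "iso_pts": 120},
]

for score, name in top_comparables(target, pool, n=2):
    print(f"{name}: {score:.1f}")
```

In this toy setup the heavy weight on age makes a same-age player with slightly different rates rank above an older player with identical rates, which shows how the choice of weights shapes the resulting set of comparables.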

PECOTA also draws on Clay Davenport's translations (the so-called Davenport Translations, or DTs) of minor league and international baseball statistics to estimate the major league equivalent performance of each player.[3] In this way, PECOTA is able to make projections for more than 1,600 players each year, including many players with little or no prior major league experience.

Unlike forecasting systems that assume a single pattern of change over a player's career, PECOTA employs several models that take into account not just a player's performance in the previous three years but also his age, speed, handedness, and body type (essentially, body mass index). Furthermore, instead of producing only point estimates of a player's future performance (such as batting average, home runs, and strikeouts), PECOTA uses the historical performance of the player's "comparables" to produce a probability distribution of his predicted performance over the next five years.
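One simple way to turn the outcomes of a set of comparables into a distribution rather than a point estimate is to read off empirical percentiles. The sketch below assumes this approach for illustration; it is not PECOTA's published method, and the function name `percentile`, the chosen percentile levels, and the OPS values are all hypothetical.

```python
def percentile(values, p):
    """Empirical percentile (0-100) with linear interpolation between ranks."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


# Invented next-season OPS figures for nine hypothetical comparable players.
comparables_next_ops = [0.650, 0.700, 0.720, 0.750, 0.780, 0.800, 0.820, 0.850, 0.900]

# Summarize the spread of outcomes instead of reporting a single number.
forecast = {p: percentile(comparables_next_ops, p) for p in (10, 25, 50, 75, 90)}

for p, ops in forecast.items():
    print(f"{p}th percentile OPS: {ops:.3f}")
```

The gap between the low and high percentiles gives a rough sense of how uncertain the forecast is; a weighted version would let closer comparables count for more.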

First introduced in 2003, PECOTA projections are produced each year and published both in the Baseball Prospectus annual monographs and on the BaseballProspectus.com website.[16] PECOTA has undergone several improvements since 2003. The 2006 version introduced metrics for the market valuation of players based on the predicted performance levels. The 2007 version introduced adjustments for league effects, to account for differences in the competitive environment of the two major leagues.[4]

The logic and methodology underlying PECOTA have been described in several publications (see References), but the detailed formulas are proprietary and have not been shared with the baseball research community. The test of PECOTA is its ability to make accurate forecasts in comparison with alternative forecasting methods. A comparison for the 2006 season shows that PECOTA outperformed several other forecasting systems in predicting hitting (OPS) and performed nearly as well as the best of the other systems in predicting pitching (ERA).[5]

Although designed primarily for predicting individual player performance, PECOTA has also been applied to predicting team performance. For this purpose, projected team rosters are established with projected playing times for each team member, drawing on the expert advice of the Baseball Prospectus staff. The number of runs a team will score and allow during the coming season is estimated from the playing times and PECOTA's predicted individual performance of each player, using a "Marginal Lineup Value" algorithm created by David Tate and further developed by Keith Woolner.[6] A team's expected win total is obtained by applying an improved version of Bill James' Pythagorean Formula to the estimated number of runs scored and allowed by the roster of players under the given playing-time assumptions.[7] PECOTA has been used in preseason forecasts of how many wins teams will attain and in mid-season simulations of the number of wins each team will attain and its odds of reaching the playoffs.[8] In 2006, PECOTA's preseason forecasts compared favorably to other forecasting systems (including Las Vegas betting line odds) in predicting the number of wins teams would earn during the season.[9]
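The final step, converting projected runs into expected wins, can be sketched as follows. The classic Pythagorean formula uses an exponent of 2; the "Pythagenport" variant mentioned in the notes is commonly cited as deriving the exponent from the run environment, 1.50 × log10((RS + RA) / G) + 0.45. The function names and the run totals below are hypothetical, and this is not presented as the exact computation Baseball Prospectus performs.

```python
import math


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Bill James' Pythagorean expectation with a configurable exponent."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagenport_exponent(runs_scored, runs_allowed, games):
    """Commonly cited Pythagenport exponent, based on total runs per game."""
    return 1.5 * math.log10((runs_scored + runs_allowed) / games) + 0.45


def expected_wins(runs_scored, runs_allowed, games=162):
    """Expected wins over a season using the Pythagenport exponent."""
    x = pythagenport_exponent(runs_scored, runs_allowed, games)
    return games * pythagorean_win_pct(runs_scored, runs_allowed, x)


# Invented team projection: 800 runs scored, 700 allowed over 162 games.
print(f"exponent: {pythagenport_exponent(800, 700, 162):.3f}")
print(f"expected wins: {expected_wins(800, 700):.1f}")
```

A team projected to score exactly as many runs as it allows comes out at a .500 record under any exponent, which is a quick sanity check on an implementation.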

Notes

  1. ^ The acronym was actually based on the name of journeyman major league player Bill Pecota,[1] who with a lifetime batting average of .249 is perhaps representative of the typical PECOTA entry.
  2. ^ This difference is explained and illustrated in Nate Silver, "Introducing PECOTA," Baseball Prospectus 2003 (Dulles, VA: Brassey's Publishers, 2003): 507-514. Elsewhere, Silver also describes the following distinctive feature: "The PECOTA similarity scores are based primarily on looking at a three-year window of a pitcher’s performance. Thus, we might look at what a pitcher did from ages 35-37, and compare that against the most similar age 35-37 performances, after adjusting for parks, league effects, and a whole host of other things. This is different from the similarity scores you might see at baseball-reference.com or in other places, which attempt to evaluate the totality of a player’s career up to a given age."[2] Also see Baseball Prospectus' glossary entry for "Comparable Players."[3]
  3. ^ See Clay Davenport, "DT's vs. MLEs — A Validation Study," BaseballProspectus.com, January 30, 1998[4]; Clay Davenport, "Winter and Fall League Translations: Just How Good Are These Leagues, Anyway?," BaseballProspectus.com, January 27, 2004[5]; and Clay Davenport, "Over There! A Second Review of Translating Japanese Statistics, and Translating the Mexican League," Baseball Prospectus 2004 (New York: Workman, 2004): 585-590.
  4. ^ "Baseball Prospectus Chat: Nate Silver," BaseballProspectus.com, January 19, 2007.[6]
  5. ^ Dan Szymborski, "2006 Projections," BaseballThinkFactory.com (December 14, 2006).[7]
  6. ^ Keith Woolner, "Marginal Lineup Value," StatHead.com[8].
  7. ^ On the Pythagenport formula, see Clay Davenport and Keith Woolner, "Revisiting the Pythagorean Theorem: Putting Bill James' Pythagorean Theorem To the Test," BaseballProspectus.com, June 30, 1999[9] as well as the Baseball Prospectus glossary entry for "Pythagenport"[10]. On the construction of the depth charts for each team and the application of PECOTA to estimating team wins, see Nate Silver, "PECOTA Projects the American League," BaseballProspectus.com, March 21, 2005[11]; and Nate Silver, "PECOTA Breaks Hearts," BaseballProspectus.com, March 29, 2006[12].
  8. ^ See Clay Davenport, "Playoff Odds Report: The Addition of PECOTA," BaseballProspectus.com, May 3, 2006[13] and Baseball Prospectus Statistics[14].
  9. ^ Nate Silver, "Projection Reflection," BaseballProspectus.com, October 11, 2006[15].

References

  • William Hageman, "Baseball by the Numbers," Chicago Tribune, January 4, 2006.
  • Jonah Keri, "'Tis the Season to Project Stats," ESPN.com, February 14, 2007[17].
  • Rich Lederer, "An Unfiltered Interview with Nate Silver," BaseballAnalysts.com, February 12, 2007[18].
  • Alan Schwarz, "Numbers Suggest Mets Are Gambling on Zambrano," New York Times, August 22, 2004.
  • Alan Schwarz, "Predicting Futures in Baseball, and the Downside of Damon," New York Times, November 13, 2005.
  • Nate Silver, "The Science of Forecasting," BaseballProspectus.com, March 11, 2004[19].
  • Nate Silver, "Introducing PECOTA," Baseball Prospectus 2003 (Dulles, VA: Brassey's Publishers, 2003): 507-514.
  • Nate Silver, "PECOTA Takes on the Field: How'd It Fare Against Six Other Projections Systems?" BaseballProspectus.com, January 16, 2004[20].
  • Nate Silver, "PECOTA 2004: A Look Back and a Look Ahead," Baseball Prospectus 2004 (New York: Workman Publishers, 2004): 5-10.
  • Nate Silver, "Rearranging PECOTA," Baseball Prospectus 2006 (New York: Workman Publishers, 2006): 6-11.
  • Nate Silver, "Why Was Kevin Maas a Bust?" Baseball Between the Numbers, Ed. Jonah Keri (New York: Basic Books, 2006): 253-271.
  • Childs Walker, "Baseball Prospectus Makes Predicting Future Thing of Past," Baltimore Sun, February 21, 2006.