Base Runs


Base Runs (BsR) is a baseball statistic invented by sabermetrician David Smyth to estimate the number of runs a team "should" have scored given its component offensive statistics, as well as the number of runs a hitter creates or a pitcher allows. It measures essentially the same thing as Bill James' Runs Created, but as sabermetrician Tom M. Tango points out, Base Runs models the reality of the run-scoring process significantly better than any other "run estimator".

Purpose and Formulae

From Smyth's BaseRuns Primer (no longer available on the internet):

"In the late 1990s, I published Base Runs articles on J Fraser's site and on rsbba. Neither received much attention, and both articles are probably no longer available. The formula has become somewhat better-known as a result of Tangotiger's series of Baseball Primer articles on run creation.

Some interested people have suggested that Base Runs is not user-friendly enough-- there are no 'official' versions, no source for a brief overview, and too many minor differences in the component weights in basic and advanced versions. So I figured I'd try to 'set in stone' as much of the formula as possible, and keep the weights as simple as possible. I believe that the metric is robust enough to handle this.

First, a few words on the 'model'. Some people don't care about models; they only care whether one formula has an RMSE of 21.4 vs another formula which has an RMSE of 22.4. The problem is that once you step outside the bounds of those MLB teams, you're in trouble. Not so with Base Runs, which seems to work well over the entire range of run production, from OBA of 0 to OBA of 1.000. It therefore seems reasonable to conclude that Base Runs, despite its relative simplicity, does a good job of modeling the scoring process. In short, it is an effective 'simulation of a simulation'.

And the model is simply: runs equals baserunners times the proportion of baserunners who score, plus home runs. This statement is so obviously true that some people have called it an 'identity' instead of an equation or theory. An identity is something like A=A. The only difference is that you can't build a successful run formula from A=A. There are possibly other such 'true statements' which could also serve as foundations. Having tried to do this, however, I'm pretty sure that the math would be more involved than the simple add, subtract, multiply, and divide of Base Runs.

Following the coding of Runs Created, we have the A component (baserunners), the B component (advancement), the C component (outs), and the D component (homers). The proportion of runners who score is given by B/(B+C). The entire equation is therefore:

Base Runs = A*B/(B+C) +D.


Here is the Basic version:

A = H + BB - HR

B = [1.4*TB -.6*H -3*HR +.1*BB] *1.02

C = AB - H

D = HR

Here is a version which includes some minor categories in which certain players excel:

A = H + BB + HBP - HR - .5*IBB

B = [1.4*TB -.6*H -3*HR +.1*(BB+HBP-IBB) +.9*(SB-CS-GDP)] *1.1

C = AB - H + CS + GDP

D = HR

And here is a version for pitchers:

A = H + BB - HR

B = [1.4*TBe -.6*H -3*HR +.1*BB] * 1.1

C = 3*IP

D = HR

where TBe = Total Base estimate = 1.12*H + 4*HR

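If you want to tailor a version to a particular dataset (such as 1993-2004, or the 1975 AL), all you have to do is determine the overall B multiplier."

The B multiplier, sometimes referred to as the variable X, is the value that makes the formula reproduce the league's actual run total. With A, C, and D computed from league totals, and B taken as the bracketed sum before any multiplier is applied, it is found by setting X = ((LgRuns - D) * C) / (B * (A - (LgRuns - D))).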
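The Basic version and the B multiplier calculation can be sketched in Python as follows. This is a minimal illustration, not an official implementation: the function names are hypothetical, the league totals are invented for the example, and the zero-denominator guard is an assumption added so the function is defined when there are no outs and no advancement.

```python
def base_runs_components(ab, h, tb, hr, bb):
    """Return (A, B, C, D) for the Basic version, with B left unscaled."""
    a = h + bb - hr                              # baserunners
    b = 1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb   # advancement, before the multiplier
    c = ab - h                                   # outs
    d = hr                                       # home runs
    return a, b, c, d


def base_runs(a, b, c, d, multiplier=1.02):
    """Base Runs = A*B/(B+C) + D, with B scaled by the multiplier."""
    scaled_b = multiplier * b
    if scaled_b + c == 0:
        # Assumed guard: with no advancement and no outs, only home runs score.
        return float(d)
    return a * scaled_b / (scaled_b + c) + d


def fit_b_multiplier(a, b, c, d, league_runs):
    """Solve A*X*B/(X*B + C) + D = league_runs for X."""
    target = league_runs - d  # runs that must come from baserunners scoring
    return target * c / (b * (a - target))


# Hypothetical league totals, for illustration only.
a, b, c, d = base_runs_components(ab=5500, h=1400, tb=2200, hr=150, bb=500)
x = fit_b_multiplier(a, b, c, d, league_runs=720)
print(x, base_runs(a, b, c, d, multiplier=x))
```

Plugging the fitted multiplier back into the formula should return the league run total it was fitted to, which is a quick sanity check on any tailored version.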

Individual Base Runs

The formulas above are intended to estimate runs scored and allowed by teams, not by individual players. Applying them to individuals produces the same problem one encounters with Bill James' Runs Created formula: good hitters will see their BsR artificially inflated because of the multiplicative nature of the formula.

To calculate individual Base Runs, one must calculate team Base Runs with and without the player's statistics included in the team totals; the difference between the two is the number of Base Runs that the player has contributed to the team.
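The with-and-without calculation might look like the following Python sketch. It assumes the Basic version with its 1.02 multiplier, and it assumes the player's line is simply subtracted from the team totals. The `BattingLine` class, the function names, and the stat lines are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BattingLine:
    ab: int
    h: int
    tb: int
    hr: int
    bb: int

    def without(self, other):
        """Totals with another line's statistics removed."""
        return BattingLine(
            self.ab - other.ab,
            self.h - other.h,
            self.tb - other.tb,
            self.hr - other.hr,
            self.bb - other.bb,
        )


def basic_base_runs(line, multiplier=1.02):
    """Basic version: A*B/(B+C) + D."""
    a = line.h + line.bb - line.hr
    b = (1.4 * line.tb - 0.6 * line.h - 3 * line.hr + 0.1 * line.bb) * multiplier
    c = line.ab - line.h
    d = line.hr
    if b + c == 0:
        return float(d)  # assumed guard for an empty line
    return a * b / (b + c) + d


def individual_base_runs(team, player):
    """Team Base Runs with the player minus team Base Runs without the player."""
    return basic_base_runs(team) - basic_base_runs(team.without(player))


# Hypothetical team and player lines, for illustration only.
team = BattingLine(ab=5500, h=1400, tb=2200, hr=150, bb=500)
player = BattingLine(ab=550, h=170, tb=300, hr=30, bb=70)
print(individual_base_runs(team, player))
```

Because the player is evaluated in the context of his team rather than as a lineup of nine copies of himself, the multiplicative term no longer compounds his own statistics.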

Advantages of Base Runs

Base Runs was designed primarily to provide an accurate model of the run-scoring process at the Major League Baseball level, and it accomplishes that goal very well: in recent seasons, BsR has had the lowest RMSE of any of the major run estimation methods. In addition, Base Runs can claim something no other run estimator can: its accuracy holds up in even the most extreme circumstances and leagues.

For instance, when a solo home run is hit, Base Runs correctly predicts that the batting team scored one run. By contrast, Runs Created predicts 4 runs for a solo HR, and most linear weights-based formulas predict about 1.4 runs. This is because each of these models was developed to fit the sample of a 162-game MLB season; they work well when applied to that sample, but are woefully inaccurate when taken out of the environment for which they were designed.

Base Runs, on the other hand, can be applied to any sample at any level of baseball (provided you can calculate the B multiplier), because it models the way the game of baseball operates, not just a 162-game season at the highest professional level. This means Base Runs can be applied to high school or even Little League statistics.
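The solo home run comparison can be checked directly. The Python sketch below assumes the Basic version of Base Runs and the basic form of Runs Created, (H + BB) * TB / (AB + BB); the function names are hypothetical.

```python
def basic_base_runs(ab, h, tb, hr, bb, multiplier=1.02):
    """Basic Base Runs: A*B/(B+C) + D."""
    a = h + bb - hr
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * multiplier
    c = ab - h
    d = hr
    if b + c == 0:
        return float(d)  # assumed guard for an empty line
    return a * b / (b + c) + d


def basic_runs_created(ab, h, tb, bb):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


# One plate appearance: a solo home run (1 AB, 1 H, 4 TB, 1 HR, 0 BB).
solo_bsr = basic_base_runs(ab=1, h=1, tb=4, hr=1, bb=0)
solo_rc = basic_runs_created(ab=1, h=1, tb=4, bb=0)
print(solo_bsr, solo_rc)
```

With a solo home run, the A component is zero (the batter is not left on base), so the baserunner term vanishes and only the D component remains. Runs Created has no such separation and multiplies the single baserunner by all four total bases.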
