The Evolution of Batting Metrics

Last week, I started off this year’s set of MLB write-ups with an introduction to cluster luck and the BaseRuns metric. This week, I thought it’d be best to expand on the offensive side of baseball and dive deeper into the evolution of batting metrics. The collective understanding of batting performance and efficiency has evolved over time and continues to see significant developments to this day. As a result, it can be difficult to determine which metrics are best for your own personal use when trying to model the offensive side of baseball. There is no all-encompassing “right” answer when it comes to batting metrics, but there is a good chance there is a “right” answer for you personally: the metrics that best reflect your own beliefs about what should matter and how much it should matter.

Batting Average (AVG)

Batting average is obviously the most widely known batting metric there is, and is calculated by simply dividing total hits by total at bats. At best, batting average is a surface-level measure of how often a player gets a hit (duh). Its primary shortcomings are that it fails to account for the plate appearances that don’t register as at bats (walks, sacrifice hits, etc.) and that it gives no weight to the varying types of hits (a single and a home run each count as just one hit).
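
To make those two blind spots concrete, here is a quick sketch using an entirely made-up stat line:

    # Hypothetical stat line, for illustration only
    hits = 150        # singles, doubles, triples, and home runs all count the same here
    at_bats = 500
    walks = 70        # walks aren't at bats, so AVG ignores them entirely

    avg = hits / at_bats
    print(f"AVG: {avg:.3f}")   # 0.300

    # A 150-hit season of nothing but singles and a 150-hit season with 40 home runs
    # both print .300 -- batting average can't tell them apart.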

On-Base Percentage (OBP) and Slugging Percentage (SLG)

On-base percentage is exactly what it says it is and aims to address the first aforementioned shortcoming of batting average. OBP takes the batter’s instances of reaching base (H + BB + HBP), divides them by total plate appearances, and tells us the rate at which a batter gets on base. Slugging percentage aims to address the second shortcoming of batting average by weighting each type of hit by the number of bases the batter takes on it:
SLG = [ 1B + (2 x 2B) + (3 x 3B) + (4 x HR) ] / AB
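
Here is a minimal sketch of both calculations with a hypothetical stat line, using the simplified definitions above (on-base events over total plate appearances for OBP):

    # Hypothetical counting stats, for illustration only
    singles, doubles, triples, home_runs = 100, 30, 5, 15
    hits = singles + doubles + triples + home_runs
    walks, hbp = 60, 5
    at_bats = 500
    plate_appearances = 570

    # OBP: rate at which the batter reaches base
    obp = (hits + walks + hbp) / plate_appearances

    # SLG: total bases per at bat
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    slg = total_bases / at_bats

    print(f"OBP: {obp:.3f}, SLG: {slg:.3f}")   # OBP: 0.377, SLG: 0.470
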
But although OBP and SLG each present a solution to one of the two major shortcomings of BA, they also each fail to address the remaining one. Which brings us to…

On-Base Plus Slugging (OPS)

OPS brings us the first official step into “sabermetrics” territory, first popularized in 1984 in The Hidden Game of Baseball by John Thorn and Pete Palmer. As the name suggests, OPS is calculated by adding OBP and SLG together, and aims to represent a player’s ability to get on base and hit for power. For your own reference, an OPS of ≥0.900 is typically considered a great mark to hit. The underlying problem with OPS is one it inherits from SLG: it weights the different types of hits linearly according to the number of bases they equate to, rather than by how much each type of hit actually contributes to run scoring.
A variant of OPS called OPS+ has since been developed that accounts for park factors and normalizes the stat across the two leagues (NL and AL). OPS+ is also scaled so that 100 is league average, and each point of deviation above/below 100 equates to the percentage by which the player is better/worse than league average. For example, a player with a 120 OPS+ is 20% better than league average, whereas a player with an 85 OPS+ is 15% worse than league average.
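
As a rough sketch, OPS is just the sum of the two rates, and a commonly cited simplified form of OPS+ compares a player’s OBP and SLG to league averages. The real stat also applies a park adjustment, which is omitted here, and the league-average figures below are placeholders rather than actual league data:

    # Player rates (hypothetical, carried over from the earlier sketch)
    obp, slg = 0.377, 0.470
    ops = obp + slg                       # 0.847

    # League-average rates (placeholder values, not real league data)
    lg_obp, lg_slg = 0.320, 0.410

    # Simplified, park-neutral OPS+; 100 is league average
    ops_plus = 100 * (obp / lg_obp + slg / lg_slg - 1)
    print(f"OPS: {ops:.3f}, OPS+: {ops_plus:.0f}")   # OPS+ of ~132, i.e. ~32% better than average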

Weighted On-Base Average (wOBA)

wOBA is a much more recently developed sabermetric, originally introduced in The Book in 2007. The metric was actually developed and presented as an improvement on OPS, as the authors felt OBP and SLG had significant overlap and that the on-base element of the statistic was being underrepresented. wOBA’s formula assigns “linear weights” that represent the average number of runs scored in a half-inning after each such event occurs. These run value weights are then scaled to fit wOBA onto the same scale as OBP (0.000 to 1.000). The formula for wOBA has evolved over time, from the original iteration introduced in The Book to the FanGraphs version used for the 2018 season, which carries recalibrated weights.
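
Here is a minimal sketch of the calculation. The linear weights below are rounded approximations in the neighborhood of the published FanGraphs values, since the exact coefficients are recalibrated every season; treat them (and the stat line) as illustrative only:

    # Hypothetical event counts, for illustration only
    ubb, hbp = 55, 5                      # unintentional walks, hit-by-pitch
    singles, doubles, triples, home_runs = 100, 30, 5, 15
    at_bats, sac_flies = 500, 4

    # Approximate linear weights (rounded, FanGraphs-style; recalibrated yearly)
    w = {"ubb": 0.69, "hbp": 0.72, "1b": 0.88, "2b": 1.25, "3b": 1.58, "hr": 2.03}

    numerator = (w["ubb"] * ubb + w["hbp"] * hbp + w["1b"] * singles
                 + w["2b"] * doubles + w["3b"] * triples + w["hr"] * home_runs)
    denominator = at_bats + ubb + sac_flies + hbp

    woba = numerator / denominator
    print(f"wOBA: {woba:.3f}")            # lands on the familiar OBP scale (~.364 here)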


Despite the creators of wOBA hailing their newly created sabermetric as superior to OPS in nearly every way, some research since has shown otherwise. In 2013, a professor from San Antonio College compared the predictive performance of OPS and wOBA, and his results using the 2003-2012 seasons showed that OPS had a higher correlation to team run production rates than wOBA did. In 2018, Baseball Prospectus stepped in and conducted their own research, expanding the sample (1986-2016) and expanding the analysis to include a look at the following (a rough sketch of how to run these checks yourself comes after the list):

  • Descriptive performance: the correlation between the metric and same-year team runs/PA
  • Reliability performance: the correlation between the metric and itself in the following year
  • Predictive performance: the correlation between the metric and the following year’s team runs/PA
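
To make those checks concrete, here is a rough sketch of how you might run all three yourself with pandas. It assumes a hypothetical team_seasons DataFrame with one row per team-season and columns named season, team, ops, woba, and runs_per_pa (the column names are mine, not Baseball Prospectus’s), and it is only meant to illustrate the idea rather than replicate their study.

    import pandas as pd

    def three_checks(team_seasons: pd.DataFrame, metric: str) -> dict:
        # Sort so each team's seasons are in chronological order
        df = team_seasons.sort_values(["team", "season"])
        # Descriptive: metric vs. same-year team runs/PA
        descriptive = df[metric].corr(df["runs_per_pa"])
        # Pair each team-season with the same team's following season
        nxt = df.groupby("team")[[metric, "runs_per_pa"]].shift(-1)
        paired = df.assign(next_metric=nxt[metric], next_runs=nxt["runs_per_pa"]).dropna()
        # Reliability: metric vs. itself in the following year
        reliability = paired[metric].corr(paired["next_metric"])
        # Predictive: metric vs. the following year's runs/PA
        predictive = paired[metric].corr(paired["next_runs"])
        return {"descriptive": descriptive, "reliability": reliability, "predictive": predictive}

    # Example usage: compare OPS and wOBA on the same sample
    # for m in ["ops", "woba"]:
    #     print(m, three_checks(team_seasons, m))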

The findings essentially confirmed that OPS was superior to wOBA:

Runs Created (RC), Weighted Runs Created (wRC), and Weighted Runs Created Plus (wRC+)

The original Runs Created metric was created by Bill James and serves as an estimate of how many runs a player contributed to his team. Weighted Runs Created was an evolution of Bill James’ original work that incorporated the aforementioned wOBA into the formulation. The inherent problem with both iterations was that the stat was ultimately still just a counting stat, much like HRs or RBIs. Weighted Runs Created Plus (wRC+) did to wRC what OPS+ did to OPS, in that it took an otherwise context-less stat and turned it into a rate (while also controlling for park and league factors). Just like OPS+, a player with a wRC+ of 118 has contributed 18% more runs to his team than the league-average player.
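
For reference, the most basic form of James’ Runs Created multiplies times on base by total bases and divides by opportunities (later versions add stolen bases and other terms, and wRC swaps in wOBA-based weights); the stat line below is made up:

    # Hypothetical counting stats, for illustration only
    hits, walks, at_bats = 150, 60, 500
    total_bases = 235

    # Basic Runs Created: (times on base) x (advancement) / (opportunities)
    rc = (hits + walks) * total_bases / (at_bats + walks)
    print(f"RC: {rc:.1f}")                # ~88 runs contributed -- still a counting stat

    # wRC+ interpretation: 100 is league average, each point is one percent
    wrc_plus = 118
    print(f"{wrc_plus - 100:+d}% runs created vs. the league-average hitter")   # +18%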

Wins Above Replacement Player (WAR or WARP)

The Runs Created trio above aren’t the only metrics that try to estimate a player’s contributions to his team’s offensive production. WAR is the number of wins a player has incrementally added to his team above the number of wins expected if that player were replaced with a replacement-level player. WAR as a whole incorporates batting, baserunning, and defense for position players, but the batting element of WAR can be singled out and is often represented as bWARP. The various baseball analytics sites (Baseball Prospectus, Baseball Reference, FanGraphs, etc.) have different formulations for WAR, so you may see varying WAR figures for the same player.
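
There is no single WAR formula to reproduce here since every site builds it differently, but the general shape of the batting component is runs above a replacement-level baseline converted into wins. The sketch below uses the commonly cited ~10 runs-per-win rule of thumb, and every input is a hypothetical placeholder rather than any site’s actual methodology:

    # Hypothetical inputs, for illustration only
    batting_runs_above_avg = 25.0   # e.g., from a wOBA-based linear-weights calculation
    replacement_runs = 18.0         # credit for playing time vs. a replacement-level hitter
    runs_per_win = 10.0             # common rule-of-thumb conversion from runs to wins

    batting_war = (batting_runs_above_avg + replacement_runs) / runs_per_win
    print(f"Batting WAR (rough sketch): {batting_war:.1f}")   # 4.3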

Batting Average on Balls in Play (BABIP)

BABIP is another self-explanatory metric, as it essentially shows how often a batter’s balls in play go for hits rather than outs. However, BABIP is used much differently than any of the stats discussed thus far. The primary purpose of BABIP is to serve as a potential warning that a batter is possibly performing above (high BABIP) or below expectation (low BABIP). In essence, a BABIP that deviates significantly from the “normal” .300 mark typically signals that the batter is due for regression toward the mean. That regression expectation can be applied within a season or from year to year. To show BABIP in action, here is a chart showing the BABIP leaders from 2017 (minimum 250 plate appearances) and their performance in 2018. Significant increases in performance are highlighted in green, significant decreases are highlighted in red, and relatively similar performances are left uncolored:
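
For the calculation behind that chart, the commonly used formula strips home runs and strikeouts out of the equation so that only balls in play are counted; the stat line below is hypothetical:

    # Hypothetical stat line, for illustration only
    hits, home_runs = 150, 15
    at_bats, strikeouts, sac_flies = 500, 110, 4

    # BABIP: hits on balls in play divided by balls in play
    babip = (hits - home_runs) / (at_bats - strikeouts - home_runs + sac_flies)
    print(f"BABIP: {babip:.3f}")   # .356 here

    # Rough regression flag relative to the league-wide norm of roughly .300
    if abs(babip - 0.300) > 0.040:
        print("Large deviation from ~.300 -- candidate for regression toward the mean")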


BaseRuns (BsR)

BABIP begins to lean into expectation-based analysis, the “should’ve, could’ve, would’ve” side of sabermetrics, which is arguably the largest development in the space to date. We covered BaseRuns in depth last week, but as a refresher, BaseRuns is a metric originally developed by David Smyth that aims to estimate how many runs a team should have scored over the course of a season. Since we played with BaseRuns a lot in the previous write-up, I won’t expand on it any further here.

Deserved Runs Created Plus (DRC+)

Deserved Runs Created Plus is the newest development in batting sabermetrics, having just been introduced by the Baseball Prospectus team this past December. I could try my best to explain the premise of DRC+, but Baseball Prospectus did exactly that in their aptly titled “Why DRC+?” article that accompanied the introduction of the metric:

“Why another batting metric? Because existing batting metrics (including ours) have two serious problems: (1) they purport to offer summaries of player contributions, when in fact they merely average play outcomes in which the players participated; and (2) they treat all outcomes, whether it be a walk or a single, as equally likely to be driven by the player’s skill, even though no one believes that is actually true. DRC+ addresses the first problem by rejecting the assumption that play outcomes automatically equal player contributions, and forces players to demonstrate a consistent ability to generate those outcomes over time to get full credit for them. DRC+ addresses the second problem by recognizing that certain outcomes (walks, strikeouts) are more attributable to player skill than others (singles, triples).”

Like the rest of the “plus” metrics, DRC+ is scaled so that 100 is league average and deviations above/below signal how much better/worse a player is in percentage terms. DRC+ was met with some criticism following its unveiling, and the Baseball Prospectus team has since updated the metric in response. The team also did extensive research comparing the updated DRC+’s descriptive, reliability, and predictive performance against OPS+, wRC+, and the original DRC+:


Bringing It All Together

For your reference, above is a chart showing how many runs each team scored in 2018 as well as how they performed in each of the batting metrics we covered, with the exception of WAR (too many variations to choose from) and BABIP (not really purposeful for this chart). It’s easy to see how our starting point of batting average falls short as an accurate measure of batting production and/or efficiency, as team performance in that category has a lower correlation to runs scored than any other metric represented. More importantly, this chart illustrates that most batting metrics will give you a similar general idea of a team’s batting ability, showing that each metric typically has some value when it comes to evaluating batting performance.
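
If you want to run that kind of comparison yourself, here is a minimal pandas sketch. It assumes a hypothetical teams_2018 DataFrame with one row per team, a runs column, and one column per metric (the column names are placeholders of my own, not tied to any particular data source).

    import pandas as pd

    # Batting metrics to compare against team runs scored (placeholder column names)
    METRICS = ["avg", "obp", "slg", "ops", "woba", "wrc_plus", "drc_plus"]

    def rank_by_run_correlation(teams: pd.DataFrame) -> pd.Series:
        # Correlate each metric's column with the runs column, sorted best-to-worst
        return teams[METRICS].corrwith(teams["runs"]).sort_values(ascending=False)

    # Example usage:
    # print(rank_by_run_correlation(teams_2018))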

And that is essentially the abridged version of the evolution of batting metrics. Hopefully this gives you a better understanding of exactly what you’re looking at the next time you pull up a stats page on Baseball Prospectus or FanGraphs. As always, if you have any questions about the topics covered in this write-up, you are more than encouraged to reach out to me via Twitter.

Until next time.
