It is officially time for us to set our eyes on Major League Baseball. From now until the end of the season, I will be providing a weekly baseball write-up (hopefully) every Wednesday. Between now and Opening Day, I thought it’d be best to cover some introductory (yet still very comprehensive and higher level) topics, metrics, and ideas that populate the analytical side of professional baseball. For today’s write-up in particular, I will be answering the question “Is it better to be lucky than good?” by taking a look at cluster luck and BaseRuns.

Cluster luck is a term coined by *Trading Bases* author Joe Peta to explain why the number of games a team actually won differs from the number of games it should have won. Cluster luck itself is not a metric with a formula, but there are plenty of ways to calculate expected wins and see cluster luck in action. The Pythagorean expectation, a formula created by Bill James (the founding father of sabermetrics), estimates the percentage of games a team *should* have won given the runs it scored and the runs it allowed. The original formula was:

##### (Runs Scored)^2 / [(Runs Scored)^2 + (Runs Allowed)^2]

Since the original formula was published, sabermetricians have found that an exponent of 1.83 fits actual results more accurately. Putting the formula into action, the World Series-winning 2018 Red Sox scored 876 runs and allowed 647 runs. Plugging those into the modern iteration of the Pythagorean expectation formula with 1.83 as the exponent, the Red Sox should have had a 0.635 win percentage, good for 102.9 wins over 162 games. The Red Sox finished the regular season with 108 actual wins, a difference of 5.1 wins. Generally speaking, a gap of four or more wins between actual wins and Pythagorean expected wins is considered significant and non-repeatable. In other words, the 2018 Boston Red Sox very likely benefited from cluster luck.
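For anyone who wants to check the arithmetic, here is a minimal Python sketch of the Pythagorean expectation. The function names and the 162-game default are my own choices for illustration; the 1.83 exponent and the Red Sox run totals are the ones quoted above.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=1.83):
    """Expected win percentage from runs scored and runs allowed."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def expected_wins(runs_scored, runs_allowed, games=162, exponent=1.83):
    """Expected wins over a season of `games` games."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# 2018 Red Sox run totals from the text
win_pct = pythagorean_win_pct(876, 647)
wins = expected_wins(876, 647)
print(f"{win_pct:.3f}")  # 0.635
print(f"{wins:.1f}")     # 102.9
```

Passing `exponent=2` reproduces Bill James's original version of the formula.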

But what exactly is “cluster luck”? Cluster luck is the idea that the particular sequencing of plate appearance outcomes can lead to very different run-scoring and run-allowing results. Each plate appearance by any given player can be boiled down numerically to a set of probabilities assigned to each possible event, so pure chance can cluster positive or negative events together and produce very different outcomes from the same set of events. Take an inning in which a team has two walks, one single, one triple, two strikeouts, and one popout. Depending on the sequencing of those events, you can have two very different outcomes:

**Sequence A:** Single, BB, Strikeout, BB, Strikeout, Triple, Popout

**Outcome A:** Three runs scored / allowed

**Sequence B:** Triple, Single, BB, Strikeout, Strikeout, BB, Popout

**Outcome B:** One run scored / allowed
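The two sequences can be replayed with a tiny inning simulator. This is only a sketch under simplifying assumptions of my own: every runner advances exactly as many bases as the batter on a hit, walks only move forced runners, and outs never advance anyone. The event codes (`1B`, `BB`, `K`, `PO`, and so on) are just labels I picked.

```python
# Bases gained by the batter (and, in this simplified model, by every runner)
HIT_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def simulate_inning(events):
    """Return runs scored for a sequence of plate appearance outcomes."""
    bases = [False, False, False]  # first, second, third
    outs = 0
    runs = 0
    for event in events:
        if outs >= 3:
            break
        if event in ("K", "PO"):  # strikeout or popout
            outs += 1
        elif event == "BB":
            # A walk only pushes runners who are forced to move
            if bases[0]:
                if bases[1]:
                    if bases[2]:
                        runs += 1
                    bases[2] = True
                bases[1] = True
            bases[0] = True
        elif event in HIT_BASES:
            n = HIT_BASES[event]
            new_bases = [False, False, False]
            for i, occupied in enumerate(bases):
                if occupied:
                    if i + n >= 3:
                        runs += 1  # runner crosses home
                    else:
                        new_bases[i + n] = True
            if n == 4:
                runs += 1  # batter scores on a home run
            else:
                new_bases[n - 1] = True
            bases = new_bases
        else:
            raise ValueError(f"unknown event: {event}")
    return runs


sequence_a = ["1B", "BB", "K", "BB", "K", "3B", "PO"]
sequence_b = ["3B", "1B", "BB", "K", "K", "BB", "PO"]
print(simulate_inning(sequence_a))  # 3
print(simulate_inning(sequence_b))  # 1
```

Same seven events, same three outs, and a two-run swing based on nothing but the order.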

In Sequence A, the *offense* benefited from cluster luck, whereas in Sequence B the *defense* did. Over the course of 162 games, teams can become significant victims or beneficiaries of cluster luck. Here is a table showing the lucky teams and the unlucky teams of 2018:

As always (and especially with baseball), we have an ability to go deeper with our evaluation. In particular there is the BaseRuns (BsR) metric, which was originally developed by David Smyth and aims to estimate how many runs a team *should* have scored or allowed over the course of a season. Much like the Pythagorean expectation formula (and the vast majority of sabermetrics), the BaseRuns formula has evolved over time and there are some slight variances depending on where you look, but the current iteration I utilize is the FanGraphs version that currently looks like this:
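As a rough sketch, the BaseRuns structure is `A * B / (B + C) + D`, where A is baserunners, B is an advancement factor, C is outs, and D is home runs. The coefficients below are the FanGraphs version as I recall it from their Sabermetrics Library, so treat them as an assumption and check them against the source before relying on them. The function name and the stat line at the bottom are hypothetical and not any real team.

```python
def base_runs(pa, h, tb, hr, bb, hbp=0, ibb=0, sb=0, cs=0, gdp=0, sf=0, sh=0):
    """Estimate runs via BaseRuns: A * B / (B + C) + D.

    Coefficients are assumed to match the FanGraphs formulation.
    """
    a = h + bb + hbp - 0.5 * ibb - hr  # baserunners
    b = 1.1 * (
        1.4 * tb
        - 0.6 * h
        - 3 * hr
        + 0.1 * (bb + hbp - ibb)
        + 0.9 * (sb - cs - gdp)
    )  # advancement
    c = pa - bb - sf - sh - hbp - h + cs + gdp  # outs
    d = hr  # home runs always score
    return a * b / (b + c) + d


# Hypothetical season stat line, for illustration only
estimate = base_runs(
    pa=6200, h=1400, tb=2300, hr=200, bb=550,
    hbp=60, ibb=30, sb=90, cs=30, gdp=120, sf=45, sh=20,
)
print(round(estimate))
```

The same function works for runs allowed if you feed it the stats a team's pitchers gave up instead of the stats its hitters produced.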

The above table shows BsR scored, with the “lucky” teams on the left having more actual runs scored than BaseRuns scored and the “unlucky” teams having the opposite. We can use the above table to begin answering the “Is it better to be lucky than good?” question. Of the top nine teams in offensive BaseRuns differential (actual runs minus BaseRuns), six played in the division tiebreaker games or made the playoffs outright. On the other hand, ten of the bottom eleven teams in differential missed the playoffs. BaseRuns alone is also a solid measure of a team’s strength in run production, as last year all ten teams who played in a division tiebreaker or made the playoffs outright finished in the top thirteen in BaseRuns scored. Next, let’s look at BsR allowed:

The inverse is true for BsR allowed, with the “lucky” teams on the left having fewer actual runs allowed than BaseRuns allowed and the “unlucky” teams having the opposite. Five of the top eight teams with a beneficial differential made the division tiebreakers or playoffs outright, whereas eight of the ten teams with the most unfortunate differentials missed the playoffs. Nine of the ten teams who made it that far also ranked in the top twelve in fewest BaseRuns allowed. The Rockies are the only such team to miss that cut, but given that they play in run-friendly Coors Field it’s easy to understand why they might not rank favorably in BaseRuns allowed. Nevertheless, now that we have an expected runs scored metric and an expected runs allowed metric, we can use the Pythagorean win formula to see how many games each team *should* have won in 2018. Below shows exactly that, with the left table being sorted by differential and the right table being sorted by Pythagorean expected wins using BaseRuns. I’ve also highlighted the ten teams that played past Game 162.

As you can see in the right table, Pythagorean expected wins seem to be a good measure of team strength, with nine of the ten “postseason” teams ranking in the top eleven. The Rays might be the team that suffered the worst cluster luck fate given the circumstances of their misfortune. They finished fifth in expected wins but eleventh in actual wins. Furthermore, they finished seven games behind the Athletics for the final Wild Card spot. If you were to bring them from -5.67 to 0.00 in Pythagorean vs. actual win differential, it would have put the Rays at 95.67 wins. If you do the same for the Athletics with their differential (moving them from 2.49 to 0.00), the Rays would have had a better record and beaten them out for that Wild Card spot. On the other side of the coin, the 91-win Rockies certainly benefited the most from cluster luck, as they should have had 84.39 expected wins using BaseRuns, which would have been bested by either the Nationals or the Cardinals when bringing their expected vs. actual win differential to zero. So to answer the question, yes, it is in fact better to be lucky than good (sometimes).
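The Rays and Athletics comparison is simple enough to verify in a few lines. In this sketch I am assuming the differential is actual wins minus expected wins (negative means unlucky), which matches the signs quoted above. The actual win totals of 90 and 97 are inferred from the text (95.67 minus 5.67, and seven games ahead of the Rays), and the function name is my own.

```python
def luck_neutral_wins(actual_wins, differential):
    """Remove the actual-minus-expected differential from a team's win total."""
    return actual_wins - differential


rays = luck_neutral_wins(90, -5.67)
athletics = luck_neutral_wins(97, 2.49)

print(f"Rays: {rays:.2f}")            # Rays: 95.67
print(f"Athletics: {athletics:.2f}")  # Athletics: 94.51
print(rays > athletics)               # True
```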

That’s going to wrap it up for my inaugural MLB write-up. I hope this was an insightful and educational start to what is hopefully a very insightful and long-lasting series of write-ups for the MLB. If you have any questions regarding the topics I covered today or have any topics in mind that you would like me to cover in future write-ups, don’t be afraid to give me a shout!

Until next time.
