Episode 7 | MLB + counting stats
- Jonah Vega-Reid
- Jul 15, 2024
- 2 min read
Finally, we make a video about the best sport there ever was, baseball. Any statistician (or sabermetrician) worth their salt will tell you that baseball lends itself to nuanced statistical approaches better than any other major sport. The reason is temporal isolation, which is a fancy way of saying that there is no running clock on plays. This means that each pitch and subsequent play is an isolated event that can be studied and used as an individual observation in analysis. This also means that fatigue and athleticism play a reduced role while skill and technical ability are showcased.
The other unique feature of baseball is the fixed inning start and the complete separation of offense and defense. In a sport like American football, you may have separate offensive and defensive players, but offensive field position and number of possessions are partly determined by the defense, making the two interconnected in a way that baseball simply is not. Instead, each team starts its offensive (and defensive) half-inning in exactly the same situation: no outs and nobody on. This means that analysis of either side of the ball is much cleaner because the other can be ignored.
In our first analysis of baseball, we stuck to the basics. Hits, home runs, walks, strikeouts, and stolen bases were each analyzed in single-variable models: logistic regression with a binary win/loss outcome, and linear regression with run differential as a continuous outcome. Each stat was effectively a 'differential' stat: the total for the home team minus the total for the away team. For the purpose of interpretation, the numbers reported in the video describe the home team winning the 'battle' of that particular stat by one. For example, if a team wins the home run battle by 1, their odds of winning that game increase by 57%. Because odds ratios in a logistic model compound multiplicatively, winning by two multiplies the odds by 1.57 twice, for an increase of about 146%. So a bigger gap in a stat means a bigger shift in the expected result.
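A logistic regression coefficient acts on the log-odds, so a per-unit effect compounds across larger margins instead of adding up. Here is a minimal sketch of that arithmetic using the 57% home run figure quoted above; the function name `odds_increase_pct` is hypothetical and only for illustration, not something from the video.

```python
import math


def odds_increase_pct(per_unit_odds_ratio: float, margin: int) -> float:
    """Percent change in the odds of winning for a given stat margin.

    per_unit_odds_ratio: odds ratio for winning the stat 'battle' by one
                         (1.57 corresponds to a 57% increase in odds).
    margin: how many units the team wins that stat by.
    """
    # The logistic regression coefficient is the log of the odds ratio.
    beta = math.log(per_unit_odds_ratio)
    # A margin of k shifts the log-odds by k * beta, so the odds are
    # multiplied by exp(k * beta) = per_unit_odds_ratio ** k.
    return (math.exp(beta * margin) - 1.0) * 100.0


for margin in (1, 2, 3):
    print(margin, round(odds_increase_pct(1.57, margin), 1))
```

Note that this is a change in odds, not in win probability; converting to a probability requires a baseline, which the write-up does not give.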
Each of the stats was individually significant with respect to game outcome, which isn't very surprising. They are objectively positive events, provided your team is leading in that category, and each has an impact on the game. Not surprisingly, hits was the most important stat in this respect, with an R^2 value of 33%, indicating that about a third of the variation in game outcome can be explained by hits alone. Hits are a very clean baseball stat, all signal and almost no noise. A hit means you got the better of the pitcher, got on base, and possibly even got yourself into scoring position, all without an out being recorded (unlike a sacrifice hit or a fielder's choice, where reaching or advancing costs an out).
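For readers who want to see what a single-variable R^2 on a differential stat looks like in practice, here is a rough sketch in plain Python. The box-score totals are made up for illustration and are NOT real MLB data, and the variable names are assumptions, not the ones used for the video.

```python
# Toy, made-up totals for five games (NOT real MLB data).
home_hits = [9, 6, 11, 7, 8]
away_hits = [7, 8, 6, 7, 10]
home_runs_scored = [5, 2, 8, 3, 4]
away_runs_scored = [3, 4, 1, 4, 6]

# Differential stats: home total minus away total for each game.
hit_diff = [h - a for h, a in zip(home_hits, away_hits)]
run_diff = [h - a for h, a in zip(home_runs_scored, away_runs_scored)]

n = len(hit_diff)
mean_x = sum(hit_diff) / n
mean_y = sum(run_diff) / n

# Ordinary least squares fit of run differential on hit differential.
sxx = sum((x - mean_x) ** 2 for x in hit_diff)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hit_diff, run_diff))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R^2 = 1 - (residual sum of squares / total sum of squares).
ss_res = sum((y - (slope * x + intercept)) ** 2
             for x, y in zip(hit_diff, run_diff))
ss_tot = sum((y - mean_y) ** 2 for y in run_diff)
r_squared = 1.0 - ss_res / ss_tot

print(f"slope={slope:.2f}, R^2={r_squared:.2f}")
```

With only five invented games the resulting numbers mean nothing on their own; the point is just the mechanics of building the differential and reading R^2 as the share of run-differential variation the stat accounts for.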
We analyzed some of the qualitative nuance of the numbers and speculated as to why they turned out that way. Hope you enjoy the video, we certainly had fun yapping about baseball!