Smart Baseball




  Dedication

  To my wife, Christa, and my daughter, Kendall Joy

  Contents

  Cover

  Title Page

  Dedication

  Introduction

  PART ONE: Smrt Baseball

  1 Below Average: The Fundamental Flaws of Batting Average

  2 Pitcher Wins: One Guy Gets the Credit for Everyone Else’s Work

  3 RBI: Baseball’s Unreliable Narrator

  4 Holtzman’s Folly: How the Save Rule Has Ruined Baseball

  5 Stolen Bases: Crime Only Pays If You Never Get Caught

  6 Fielding Percentage: The Absolute Worst Way to Measure Defense

  7 Bulfinch’s Baseball Mythology: Clutch Hitters, Lineup Protection, and Other Things That Don’t Exist

  PART TWO: Smart Baseball

  8 OBP Is Life: Why On-Base Percentage Is the Measure of a Hitter

  9 The Power and the Glory: Slugging Percentage and OPS

  10 wOBA/WRC: The Ultimate Measure of the Hitter (Until the Next One)

  11 ERA and the Riddle of Pitching Versus Defense

  12 WPA: Measuring Clutch, If You Must

  13 The Black Box: How Baseball Teams Measure Defense Today

  14 No Puns Intended: Going to WAR to Value the Whole Player

  PART THREE: Smarter Baseball

  15 Applied Math: Looking at Hall of Fame Elections Using Newer Stats

  16 No Trouble with the Curve: How Scouting Works, and How the Statistical Revolution Is Changing It

  17 The Next Big Thing Is Here, the Revolution’s Near: MLB Statcast

  18 The Edge of Tomorrow: Where the Future of Stats Might Take Us

  Epilogue

  Acknowledgments

  Index

  About the Author

  Credits

  Copyright

  About the Publisher

  Introduction

  Like many of you, I imagine, I grew up in a Pleasantville-esque world of baseball statistics, where everything you might want to know about a baseball player was displayed in tabular form on the back of his baseball card (until you destroyed it by flipping it against a brick wall or sticking it in the spokes of a bicycle tire). A hitter’s home runs, average, and RBI were right there, along with the obscure and intimidating OBP and SLG. A pitcher’s won-lost record, saves, and ERA were shown, along with strikeouts, innings, and the undefined GS, which for a few of my elementary school years I could only assume meant “grand slams,” which never made mathematical sense to me. (It means games started.) I was born in 1973 and the eighties were my formative years as a fan. For most of that time, it didn’t even occur to me that there might be more information out there to learn about players’ performances, or that the stats were birthed from the sanctum sanctorum of baseball accounting. This was what there was, and if it was good enough for Topps and Newsday and WPIX, it was good enough for me.

  Of course, there came a point where I realized that these stats weren’t doing a particularly good job of telling me what was happening on the field or helping me predict what players might do in the future. I played fantasy baseball for thirteen years, from my senior year of high school (1990) until my first year as a front office employee of the Toronto Blue Jays, and in the first few years of playing, I was awful at it. I founded the league and finished dead last. I thought being good at math gave me some kind of advantage at the game, but it turned out it gave me a lot of false confidence and nothing else.

  Eventually, the desire to be better at a frivolous endeavor—we never played for money—drove me to seek out some new perspectives on baseball, which led me to the small but active online sabermetrics community of the time, and eventually to books like Baseball Prospectus and works by Bill James and Eddie Epstein. None of these were specifically guides to statistics, but they all looked at the game a different way, often incorporating new stats—James was a sort of Thomas Edison of the field, generating new stats as easily as most of us breathe—to tell the reader something new about a player. The more I read, the more I wanted to read. Baseball had always been my favorite sport, likely because it was my parents’ favorite and my grandmother’s as well, but now I could watch and follow the sport with a totally new set of eyes.

  In the twenty years since I wrote my first public piece on baseball, in 1996, the field of baseball analysis has undergone a quantum-state change, going from one or two consultants providing statistical insight to a handful of interested teams to all thirty clubs employing departments of full-time quants. Where media coverage of baseball in the 1990s was homogeneous in people and in content, today it is exploding with diversity of faces, voices, and opinions. This revolution has had, at its heart, the rising adoption of statistical analysis within and around the game. If you said OBP was better than batting average in 1996, you’d be looked at as if you were a little strange. If you say it now, you’ll be asked why you’re not looking at wOBA or wRC+ instead.

  Why has baseball as an industry, including the media covering it and the fans who follow it, stuck with outdated statistics for so long? The answer is largely a giant appeal to tradition, a common type of fallacious argument that says we should keep doing it this way because we’ve always done it this way. Baseball has always suffered from a sort of inertia. Whether it’s about the rules of the game, replay, or the unwritten code of player behavior, old ideas are hard to unseat. For too long people have put faith in old numbers and stats precisely because they’re old; these are the numbers that the baseball gods graced us with all those years ago, so we must follow them—even if there are numbers out there that actually work better. A game with a century and a half’s worth of history has a hard time escaping the gravitational pull of that past.

  The fact that baseball’s irrational reliance on tradition, gut instinct, and flawed stats continued even as better stats became widely available to everyone isn’t just an academic concern. Because allegiance to these old stats is not rooted in accuracy or success, people who’ve repeatedly failed at their jobs are often given new opportunities to fail some more. Using the wrong measurements has resulted in bad decisions on contracts, playing time, trades, and draft picks. It’s led the voters in the Baseball Writers’ Association of America (BBWAA) to pick the wrong players for MVP, Cy Young, and Rookie of the Year awards, and often to screw up even obvious stuff like which players to put in the Hall of Fame. It drives conversations around teams and players—often driving the conversations right off a cliff. Even now, in 2017, you will still hear broadcasters refer to and rely on outdated or flat-out useless statistics to try to analyze what’s happening on the field, to advocate bad strategies, or to praise a player for doing something that actually wasn’t very good. This isn’t just an issue for Major League Baseball (MLB)—it’s a problem at all levels of the game. Just go to a college or high school baseball game and watch the bench empty as players rush to congratulate a hitter who just advanced a runner via a bunt or an out. “Yay! We’re in worse shape than we were a few pitches ago!”

  But even as commentators, managers, writers, and talking heads have resisted the statistical sea change of the last decade, most front offices around the league have long recognized that these and other numbers lie at the heart of the game precisely because they work better. They describe in-game events with greater accuracy and they predict what players will do in the future with greater accuracy. Baseball might be a sport fueled by nostalgia and adherence to the past, but no team wants to go back to a time when they used to lose more often.

  As such, teams evaluate players substantially differently today than they did in 2000, and it’s time that we as journalists, bloggers, and fans adapt. For that to happen, the conversation has to go beyond merely pointing out that batting average and the pitcher win are bad, into a discussion of what stats are better, allowing us to reframe how we discuss player performance. Communicating that is my main goal in writing this book. (Also money. But mostly communicating that.)

  The world of baseball is changing, and has been for some time now, but the mainstream discussion and coverage of the sport has lagged behind the changes within major-league team operations. You’ll still read elegies to the pitcher win in your local paper, arguments that poor defenders are actually great because they don’t make errors, and claims that managers are brilliant for employing “small ball” tactics that lead to fewer runs. There’s no reason on earth for any baseball fan to cling to old, anachronistic, or disproven notions like these. I coined the Twitter term #smrtbaseball a few years ago, an homage to a Simpsons joke, to refer to managerial moves and executive comments that were, in fact, the opposite of smart. I’ve restored the “a” to smart here because the point of this book is to try to educate the reader on the way front offices look at player statistics and valuation today, and where their thinking is likely to head in the future.

  As 2016 drew to a close, Major League Baseball was coming off one of its most successful postseasons ever, one full of drama, narratives, and rising young stars, where the Chicago Cubs managed to end the longest championship drought in US professional sports, and did so in no small part because they went from also-rans in the stats department to industry leaders. You couldn’t watch or follow the 2016 playoffs without noticing, reading, or hearing about the statistical revolution—players’ Wins Above Replacement values, defensive positioning, advanced fielding metrics like Ultimate Zone Rating, and the use of leverage to determine when to use your best reliever. This was unthinkable when I first started dabbling in baseball analysis in my early twenties. It’s now standard, with every MLB owner who wasn’t already on board looking at the most successful teams the last few seasons and realizing that if they didn’t add this capability in-house they’d only fall further behind their direct competitors.

  You don’t have to understand FIP or dRS or exit velocity to enjoy a baseball game or follow a team. Granted, there are folks out there who’ll make you feel like you have to—I’m sure I’ve been guilty of that a few times—but the truth is you don’t need to know all this. It will make you a more educated fan, and to me, becoming educated makes me enjoy the game even more. It will help you when you hear your team made a trade or a signing and you don’t immediately get why they did it. It will help you understand a pitching change or a decision to bunt or bring the infield in—or maybe help you question it. And with coverage of every aspect of the sport, from games to transactions to postseason awards to the Hall of Fame, now suffused with the vernacular of sabermetrics, it’ll help you keep up with all of the great content being written and spoken about our national pastime.

  Smart Baseball is, more than anything else, a book for the reader. If we were sitting at a game together—something I’ve done with a handful of fans over the years—and you asked me why the save statistic is a travesty on the order of the Alien & Sedition Acts, or what I’m looking for when I scout a player in person, this book gives you the monologue version of the conversation we’d have.

  I try to build up from zero here, assuming you come into this book without knowledge of advanced statistics, or that you come into it knowing some stats are bad and some are good but would like a rational explanation of why. In Part One, I cover most of the traditional statistics that just don’t tell us what they purport to tell us. RBI, batting average, wins, saves—they’re a bunch of filthy liars, really, and they’ve been lying to us for decades now. In Part Two, I work my way up through some better traditional statistics, like on-base percentage (OBP), on my way to discussing entirely new stats that show how teams and analysts try to value a player’s production. If you want to pay for a player, first you have to know what he’s worth, and to do that, you have to know how much baseball value he produced. In Part Three, I apply these concepts to Hall of Fame debates, explain how traditional scouting works and is changing in light of new data, and discuss the MLB Statcast product, an entirely new stream of data that dwarfs anything teams have worked with previously. The future of baseball analysis revolves around Statcast, which has the potential to change the way teams look at everything from contracts to scouting to player development to keeping players (especially pitchers) healthy.

  Sabermetrics is baseball math, but I’ve tried to keep the math in this book to a bare minimum. This isn’t a manual to build your own, better sabermetric mousetrap, although I won’t discourage you from trying; this is about a new way of thinking about the game, a general philosophy of player valuation and evaluation that over the last fifteen years has gone from the lunatic fringe to the predominant way of thinking. Every MLB team has made or is making statistical analysis a core part of its baseball decision-making process, and the effects of this revolution were all over the 2016 postseason, from Cleveland’s unconventional use of closer Andrew Miller to the World Champion Chicago Cubs exploiting new data to become defensive wizards. Even if you just want to follow the conversation around the game, it will help to know where we came from and where the world of baseball statistics is going. This book will take you there.

  PART ONE

  Smrt Baseball

  1

  Below Average:

  The Fundamental Flaws of Batting Average

  The language of baseball is built around some of its most basic statistics. Batting average, the simple division of a hitter’s hits recorded by the number of at bats he had, is the foundation of baseball’s “batting title.” The player in each league with the highest batting average is named the “batting champion.” When hitters retire, we count their batting titles and compare them to other batting champions’ totals. We revere the “lifetime .300 hitter” as if he ascended to a higher plane of existence than the mere .299 hitter. But the batting title and the stat behind it are both guilty of telling us half-truths, giving us a less-than-complete story of the hitter’s performance.

  Consider the descriptions found on the plaques for these Hall of Famers:

  An artisan with a bat whose daily pursuit of excellence produced a .338 lifetime batting average, 3141 hits, and a National League record–tying eight batting titles . . .

  —Hall of Fame plaque for Tony Gwynn

  A five-time batting champion who also led the league in on-base percentage and intentional walks six times each . . .

  —Hall of Fame plaque for Wade Boggs

  Led American League in batting twelve times . . .

  —Hall of Fame plaque for Ty Cobb

  Accomplished as these players were, their lionization in Cooperstown ignores a crucial question: If you’re only leading the league in batting average, one flawed and incomplete stat, should we really say you led the league in “batting”? Are you the “batting champion” if other players hit better than you did?

  Batting average has been at the top of the heap of hitter stats for as long as hitters have been putting bat to ball. The English-American statistician Henry Chadwick is credited with creating batting average (among many other common baseball stats) in the late 1800s, designing it along the lines of cricket’s version of batting average, which is runs divided by outs. Baseball in the nineteenth century resembled today’s game, but had several significant differences, such as times when batters could tell the pitcher where they wanted the ball thrown, or periods where the number of balls required for a walk or strikes required for a strikeout varied from today’s 4 and 3. Hitting the ball over the fence for a home run was rare—in 1895, the National League leader in home runs had 18—as most hitters were just trying to put the ball in play. So, at the time, Chadwick’s idea had merit: when batters rarely walk and are focused on making contact, hits divided by at bats probably is a good measure of their performance.

  Batting average today still has some value, albeit a limited one; batting average’s primary problem is one of marketing. If batting average were content with second-tier statistical duty, to impart some small amount of information, without claiming to be the be-all and end-all of hitting statistics, then it would probably fly under the radar without attracting much notice from traditionalists or statheads.

  Ah, but when you claim to be the King of All Stats and fail to deliver, then you have earned my ire—and that of analysts and executives around the sport, who now recognize that you can get all the information batting average is supposed to give you in other, more complete, less flawed statistics. So while we still celebrate the player who “won the batting title” or “led the league in hitting” for having the highest batting average in the league, the stat itself has been falling out of favor for twenty years already—and its decline is only accelerating.

  All this history may be impressive, but it obscures what batting average actually tells you. Batting average is a simple calculation any third grader could do: take a player’s hits, divide them by that player’s at bats, and round to three decimal places. That’s batting average, and while in tiny samples it can range from .000 to 1.000, in the modern era batting averages have typically fallen in the .200 to .400 range. In the five seasons from 2011 to 2015, no player qualified for the “batting title” with an average above .350, and only two players even cracked .340 (Jose Altuve once, Miguel Cabrera twice).
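
  For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of that calculation. The function names and the 180-for-530 stat line are made up for illustration and do not belong to any real player; the only thing taken from the text is the rule itself, hits divided by at bats, rounded to three places.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by at bats, rounded to three decimal places."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    if hits < 0 or hits > at_bats:
        raise ValueError("hits must be between 0 and at_bats")
    return round(hits / at_bats, 3)


def format_average(avg: float) -> str:
    """Render an average the way box scores do, without the leading zero."""
    text = f"{avg:.3f}"
    return text[1:] if text.startswith("0") else text


# Hypothetical season line, not a real player's numbers.
example = batting_average(hits=180, at_bats=530)
print(format_average(example))
```

  Notice what never enters the calculation: walks, extra bases, or anything else a hitter does outside of an official at bat.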

  Did you notice that odd phrase in there, “qualified for the batting title”? Because batting average is a rate stat, a statistic that measures something per something else (here, hits per at bat), MLB sets a minimum threshold to appear on its leaderboards: a reasonable 3.1 plate appearances per team game played. Since a full season for most teams is 162 games played, that means a player must have 502 plate appearances on the year to qualify for the league’s batting title or appear anywhere on the leaderboard.
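
  The qualifying threshold is simple enough to sketch in a few lines of Python. The 3.1 multiplier and the 162-game season come from the text; the function names are hypothetical, and rounding the product to the nearest whole plate appearance is an assumption made for this illustration. Multiplying 3.1 by 162 gives 502.2.

```python
def qualifying_plate_appearances(team_games: int = 162) -> int:
    """Minimum plate appearances to qualify: 3.1 per team game.

    Integer math (31 per 10 games) avoids floating-point noise.
    Rounding half up to a whole number is an assumption here.
    """
    if team_games <= 0:
        raise ValueError("team_games must be positive")
    return (31 * team_games + 5) // 10


def qualifies(plate_appearances: int, team_games: int = 162) -> bool:
    """True if a hitter has enough plate appearances to be listed."""
    return plate_appearances >= qualifying_plate_appearances(team_games)


print(qualifying_plate_appearances())  # threshold for a 162-game season
print(qualifies(480))                  # hypothetical hitter who falls short
```

  A threshold like this exists only because the stat is a rate: without it, a hitter who went 1-for-1 in his lone at bat would top the leaderboard.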