If You Don't Want to Read the Whole Thing: I assume you know what batting average is. On base percentage, which measures a batter's ability to get on base via hits, walks, and hit by pitches (hits by pitches?), is more closely related to run scoring than batting average is. So when you want to learn how good a hitter is, on base percentage does a better job than batting average.
But If You Do: I haven't been posting anything during the postseason. This is the time of year when you have many ways to enjoy baseball, from nationally televised games to wall-to-wall coverage by major media to Twitter to baseball-specific sites that are absolutely on top of their games. Easy to get lost in that shuffle. My only post has been to lament the demise of small-market teams in the postseason, though I'll concede that a World Series matching the teams with the best regular season records in their respective leagues is aesthetically, if not emotionally, pleasing.
I'm going to use this hiatus to start what I expect will be a series of posts going into the basics of baseball analysis. One of the things I've learned in a long career as a financial analyst is that you've got to constantly question your assumptions. The rules that worked a decade ago may not still be applicable. So let's start questioning.
I'm going to start with on base percentage, or OBP. It measures the frequency with which a batter gets on base. The formula is (hits + walks + hit by pitch) / (at bats + walks + hit by pitch + sacrifice flies). You've seen me use it several times in posts. Why is it important? Or, as the casual fan may ask, why should I care about it when I have batting average?
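For readers who like to see the arithmetic, here is a minimal Python sketch of the formula above, next to batting average for comparison. The function names and the sample stat line are made up for illustration; they aren't any real player's numbers.

```python
def batting_average(hits, at_bats):
    """BA = H / AB."""
    return hits / at_bats if at_bats else 0.0


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    return times_on_base / opportunities if opportunities else 0.0


# Hypothetical stat line: 150 hits in 500 at bats, 60 walks,
# 5 hit by pitch, 5 sacrifice flies.
ba = batting_average(150, 500)
obp = on_base_percentage(150, 60, 5, 500, 5)
print(f"BA {ba:.3f}, OBP {obp:.3f}")
```

Note that walks and hit by pitches appear in both the top and the bottom of OBP, while batting average ignores them entirely.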
To partially answer the second question, batting average (BA) is just hits divided by at bats. It measures the percentage of time that a player gets a base hit. A good hitter hits .300. Ty Cobb, Babe Ruth, Ted Williams, Rod Carew, Tony Gwynn...all .300 hitters. A .300 batting average is good. A .200 batting average isn't. So what's wrong with that? Why do I need OBP?
Well, let's look at batting. The object on offense is to score runs. Something that helps you score runs is good. Something that impedes scoring runs is bad. We can measure how good or how bad something is by using a statistic called correlation. Correlation measures how closely related two sets of numbers are. The higher the correlation, the stronger the relationship. The correlation coefficient is the statistic that correlation yields. For our purposes, it runs from 0 (no correlation at all) to 1 (perfect correlation). (Strictly speaking, it can also be negative, down to -1, when one number goes up as the other goes down.) Once you get above 0.7 or so, you're talking about a pretty decent correlation, in general. (I've put a longer discussion of correlation in the tab Statistics at the top of the blog. You can go there to read it if you want. I figure many of you already know about correlation, and many others of you didn't come here for a math lesson.)
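If you'd like to see how a correlation coefficient is computed, here is a minimal sketch of the standard Pearson formula in Python. The function name `pearson_r` and the five-team data set are invented for illustration; they are not the numbers behind this post.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, 2 or more items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two lists move together, and how much each moves on its own.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a list never varies")
    return covariation / sqrt(spread_x * spread_y)


# Invented example: team OBP and runs scored for five made-up teams.
team_obp = [0.300, 0.310, 0.320, 0.330, 0.340]
team_runs = [610, 650, 640, 720, 760]
print(round(pearson_r(team_obp, team_runs), 2))
```

With real data you would feed in one OBP (or BA) and one runs total per team season.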
In order to measure how much a statistic like BA or OBP contributes to run production, I ran correlations. I used every season from 1995 through this year, all played under the three-division format. That works out to 564 team seasons: 28 teams for each of 1995-97, and 30 teams starting in 1998, when Arizona and Tampa Bay joined the league.
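As a quick check on that count, here is the arithmetic in Python. It assumes the sample ends with the 2013 season, the year the rest of this post discusses.

```python
# 28 teams in each of 1995, 1996, and 1997.
seasons_with_28_teams = len(range(1995, 1998))
# 30 teams from 1998 through 2013 (end year assumed from the post).
seasons_with_30_teams = len(range(1998, 2014))

team_seasons = 28 * seasons_with_28_teams + 30 * seasons_with_30_teams
print(team_seasons)  # 564
```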
The correlation coefficient between runs and batting average is 0.80. That's pretty high. What it means is that batting average explains a lot of the team-to-team variation in runs scored (technically, about 65%).
Again, that's a pretty high correlation. If we're trying to measure how runs are scored, batting average does a nice job, and that's based on 564 data points, so it's unlikely to be a fluke.
The thing is: on base percentage does better. The correlation between runs and OBP is 0.88. That's a good bit higher than for BA. If batting average provides 65% of the explanation of how runs are scored, on base percentage explains 78%. That's a big difference, big enough to make OBP much more useful than BA. Simply stated, on base percentage contributes more to scoring runs than batting average.
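The "percent explained" figures come from squaring the correlation coefficient. Here is a small Python sketch of that step, using the two rounded coefficients quoted above; the function name `share_explained` is just a label I made up.

```python
def share_explained(r):
    """Share of the variation in runs accounted for: r squared."""
    return r * r


# Rounded coefficients quoted in the post.
correlations = {"BA": 0.80, "OBP": 0.88}
for stat, r in correlations.items():
    print(f"{stat}: r = {r:.2f}, r squared = {share_explained(r):.1%}")
```

Squaring the rounded coefficients gives 64.0% and 77.4%, a shade under the 65% and 78% in the text, which presumably were computed from coefficients carried to more decimal places.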
We're talking about teams here, but this relationship applies to individuals as well. Consider, for example, two catchers: the Tigers' Alex Avila and the Angels' Chris Iannetta. They played about the same amount this year (379 plate appearances for Avila, 399 for Iannetta) with similar power (14 doubles, 1 triple, 11 homers for Avila; 15 doubles, no triples, 11 homers for Iannetta). Their batting averages were similar: .227 for Avila, .225 for Iannetta. Identical players? Not at all: Iannetta drew 68 walks and was hit twice, while Avila walked 44 times and was hit once. That difference gives Iannetta a .358 on base percentage, fifth best among the 24 catchers who played in 100 or more games, compared to Avila's .317, which ranks 16th. That makes Iannetta a more valuable offensive performer, given the importance of OBP.
Or compare two second basemen, the Mets' Daniel Murphy and Tampa Bay's Ben Zobrist. In an almost identical number of plate appearances (697 for Murphy, 698 for Zobrist) they displayed similar power (38 doubles, 4 triples, 13 home runs for Murphy; 36, 3, and 12 for Zobrist) and Murphy had a better batting average, .286-.275. So was Murphy better? Nope. Compared to Murphy, Zobrist got on base an additional 45 times in ways that don't show up in batting average (72 walks and 7 hit by pitch for Zobrist, 32 and 2 for Murphy). That gives Zobrist an edge in OBP (.354-.319) that is much bigger than Murphy's edge in the less important BA.
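The 45 comes from adding up walks and hit by pitches for each player, which is easy to check in Python. The figures are the ones quoted above; the dictionary layout and the name `free_passes` are just my own shorthand.

```python
# Walks and hit by pitches for 2013, as quoted in the post.
players = {
    "Murphy": {"walks": 32, "hit_by_pitch": 2},
    "Zobrist": {"walks": 72, "hit_by_pitch": 7},
}


def free_passes(stats):
    """Times on base that batting average never sees."""
    return stats["walks"] + stats["hit_by_pitch"]


extra = free_passes(players["Zobrist"]) - free_passes(players["Murphy"])
print(extra)  # 45
```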
As you can tell, the biggest difference between BA and OBP is walks. A player who walks a lot boosts his OBP more than one who walks infrequently. You know that saying from the playground when you were a kid, "a walk's as good as a hit"? From the perspective of OBP (and from the perspective of scoring runs), that's exactly right.
What's a good OBP? This year the major league average OBP was .320 compared to .256 for batting average. The top three were Miguel Cabrera (.442), Joey Votto (.435), and Mike Trout (.432). The bottom three were Alcides Escobar (.259), Darwin Barney (.266), and Adeiny Hechavarria (.267). The 75th percentile was Adam Lind's .357, and the 25th percentile was Ryan Doumit's .314. If we're looking for a rule of thumb, like .300 for batting average, well, there were 24 hitters who batted .300 or better in 2013. Chris Davis was 24th in OBP with .370. So let's say a .370 on base percentage is like a .300 batting average.
When you're watching the Series, you might see that Mike Napoli hit .259 this season compared to his likely first base counterpart, Matt Adams, who batted .284. Here are two other numbers: Napoli had a .360 on base percentage, while Adams' was .335. So you'll know that Napoli was actually the better offensive performer, and that's before you start comparing beards.