James’ Cy Young Predictor

A book I received for Christmas, and one I consult often, is The Neyer/James Guide to Pitchers. If you can imagine such a volume, it lists every pitcher on record who has thrown a ball in a Major League game. Beyond that, it lists their relevant statistics, their tendencies, what pitches they threw, and in what part of their careers they threw those pitches. It also includes a history of every type of pitch ever recorded, from fastballs to eephus pitches to foshballs.

In addition to the stats, lists, and commentaries, my favorite part of the book is the collection of articles at the end, written by Bill James, Rob Neyer, and others, on a variety of projects and experiments they undertook for the book or for their own interest. One of the more interesting and thought-provoking pieces is about Bill James’ Cy Young Predictor, which he developed and tweaked for the book and explains in a chapter entitled E = M Cy Squared. James claims that after years of working on it, he has developed one of the most accurate formulas for predicting Cy Young winners. The formula, explained in more detail in the book and in a link I will provide later, is simply:

Wins times 6
minus Losses times 2
plus Strikeouts divided by 12
plus Saves times 2.5
plus Shutouts
plus Runs Saved
plus 12 points for pitching for a first-place team
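
For anyone who wants to play with the numbers, here is a minimal Python sketch of the list above. The class and function names are my own, the stat line is made up for illustration, and Runs Saved is taken as a given input because its definition is in the book rather than here.

```python
from dataclasses import dataclass


@dataclass
class PitcherLine:
    """One pitcher's season totals (hypothetical container, not from the book)."""
    wins: int
    losses: int
    strikeouts: int
    saves: int
    shutouts: int
    runs_saved: float   # supplied as-is; see the book for how it is computed
    first_place: bool   # True if the pitcher's team is in first place


def cy_young_points(p: PitcherLine) -> float:
    """Add up the seven terms of the formula exactly as listed above."""
    points = 6 * p.wins
    points -= 2 * p.losses
    points += p.strikeouts / 12
    points += 2.5 * p.saves
    points += p.shutouts
    points += p.runs_saved
    if p.first_place:
        points += 12
    return points


# Made-up stat line, only to show the arithmetic.
example = PitcherLine(wins=15, losses=5, strikeouts=120, saves=0,
                      shutouts=1, runs_saved=30.0, first_place=True)
print(cy_young_points(example))  # 90 - 10 + 10 + 0 + 1 + 30 + 12 = 133.0
```

Ranking a league is then just a matter of sorting pitchers by this score, highest first.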

What James claims, and there is data to back it up, is that his formula has predicted the Cy Young winner accurately almost 81% of the time. In the book you will see formula results compared to Cy Young voting for each league dating back to 1990. For example:

1990 American League
Formula:
1. Bob Welch
2. Roger Clemens
3. Dave Stewart

Voting:
1. Bob Welch
2. Dave Stewart
3. Roger Clemens

The only complaint I can come up with about the formula is that it tends to overvalue closers somewhat. The formula predicted Cy Youngs for Eric Gagne and Billy Wagner in 2004 and 2006, respectively; Roger Clemens and Brandon Webb were the eventual winners in those years. But overall, you can’t complain about 81%.

So what about this year? Fortunately, ESPN runs the formula continuously on its website, and the results can be found at this link. For the 2007 season, the formula presently predicts the top three this way:

NL
Jake Peavy
Brad Penny
Brandon Webb

AL
John Lackey
Dan Haren
JJ Putz

After looking at the numbers, I have to say I really don’t have a problem with Lackey and Peavy winning the award this year, if that is what ends up happening. Lackey is tied for the MLB lead in wins with 15, with an ERA of just 3.07 and an ERA+ of 140, easily the highest of his career. Plus, whether you feel it is deserved or not, the Angels are in first place, which will count in voters’ minds and is why James gives this factor an additional 12 points on the scale. Peavy is 13-5 and leads the NL in strikeouts with 164. His ERA+ of 184 is off the charts and is also the highest of his career.

One funny thing about the NL race is that I wonder where Chris Young would be if his team had won him more decisions and if he had not spent two weeks on the DL. Young’s ERA, WHIP, ERA+, HR/9, and LOB% are all better than Peavy’s. Young has four starts this season in which he allowed one or zero runs but recorded either a loss or a no-decision. Turn those games into wins (and a 13-4 record) and you are probably looking at Young in the top three in the NL in the formula.
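
To put a rough number on that what-if, here is a small Python sketch that uses only the win and loss terms of the formula (6 points per win, minus 2 per loss). I have not broken down how many of those four starts were losses versus no-decisions, so the sketch simply tries every possible split. The function name is my own.

```python
WIN_POINTS = 6    # "Wins times 6"
LOSS_POINTS = -2  # "minus Losses times 2"


def swing(losses_to_wins: int, no_decisions_to_wins: int) -> int:
    """Points gained when that many losses and no-decisions become wins."""
    per_loss = WIN_POINTS - LOSS_POINTS  # gain the win and shed the penalty: 8
    per_no_decision = WIN_POINTS         # gain the win only: 6
    return losses_to_wins * per_loss + no_decisions_to_wins * per_no_decision


# Four starts in total; the loss/no-decision split is an unknown here.
for losses in range(5):
    no_decisions = 4 - losses
    print(losses, no_decisions, swing(losses, no_decisions))
```

Whatever the split, the swing lands somewhere between 24 and 32 points, before counting anything else that would change.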

One more thing before I finish here. What the heck is going on in the West divisions to produce all of these great pitchers?! All six pitchers listed above pitch in the West in their leagues, and there are also two more from each Western division in each league’s top ten. So five of the top ten in each league are from the West. Not listed above are Francisco Rodriguez, Kelvim Escobar, Jose Valverde, and Takashi Saito.

That just struck me as odd, is all. Who gets your vote for Cy Young in each league this year? Do you dare disagree with the mighty Bill James?!

2 Responses to “James’ Cy Young Predictor”

  1. My vote for the NL Cy Young would go to Jake Peavy as well. The AL? I think it’s a little more cloudy in that league. Right now, I would vote for Santana (who is leading as of today, according to the CYP). But Bedard would be a really close 2nd for me, with Santana getting the nod simply because Minnesota is doing a little better than the Orioles. What Bedard has done thus far is pretty amazing.
