Projecting projections

It seems quite hard to believe, but we are only three weeks from February 2008. And since February is still the month when pitchers and catchers report, we are drawing very close to a new baseball season.

After the first of the year, many, MANY websites and media outlets will begin posting predictions and projections for the 2008 season: who will win the divisions, win the big awards, have the best fantasy seasons, etc. And while most of those are fun to read on a slow work day, they are just the guesses of the writer or team of writers who put the information together, and are very rarely based on anything quantitative or conclusive. You will get a lot of people saying A-Rod or Ortiz or Miguel Cabrera will win the AL MVP next year. Wow. Was that hard to come up with? My wife could tell you that much. It’s the projections that are actually based on something other than gut feelings and conjecture that are truly interesting to look at.

And that’s what I want to explore today. Most of what you will read below is still very much unknown by John and Jane Baseball Fan, who rely on their local paper and ESPN for projections of a new season. But in the past few years, a few dedicated people and ambitious websites have taken on the goal of trying to come up with the most accurate projections possible. Whether the goal is winning a fantasy title, creating an accurate formulaic projection, or just the desire to be right, all the examples you will see here want the title of the most accurate season forecaster.

And while it may seem intimidating to many (including myself) to stop thinking of projections in terms of, “well this player is a little older, he won’t do as well this year,” or “this team added some great hitters, they will definitely score more runs (ahem, Astros),” and start thinking of things like park effects and team run differential and pitcher LD/FB ratios, in the end we will come up with a much better product.

So let’s get started with the most familiar of the bunch first:

The Bill James Handbook 2008 & BIS Pitcher Projections

Every year, Bill James and his partnering organization, Baseball Info Solutions, release a set of projections for the upcoming season. These are usually the first to arrive, and if you search the player pages on www.fangraphs.com, you can find the simple projections for most players next season (more on that in a minute).

What these projections offer is pretty simple. For hitters, you will find your basic projections for stats like average, RBI, HR, plate appearances, hits, walks, OBP, SLG, OPS, etc. But there is also valuable information found in projections for less popular stats like BB%, ISO, BABIP, Runs Created, and RC/27. Lance Berkman, for example, is projected to have a .954 OPS and create 124 runs in 2008. And those numbers do not account for Miguel Tejada being in the lineup. Your pitching stats will encompass the normal wins, losses, ERA, innings, strikeouts, etc. But also available are K/9, HR/9, WHIP, and pitcher BABIP numbers, amongst others.

Without the exact numbers and formula, all I can say is that the projections for players in the Handbook are based on past performance, with three things then factored in: age, projected playing time, and park effects. The only downside to the Bill James projections is that they are the first projections out, usually by the first of December, and therefore cannot account for things like recent trades, injuries during spring training, or anything else that might affect where a player plays or how much he plays. And some numbers are not available for rookies of the previous year. So if you want to know how Troy Patton will perform in 2008, you will have to look elsewhere.

You can pick up the Bill James Handbook at the publisher’s site here for $21.95.

PECOTA - Baseball Prospectus

PECOTA, which stands for Player Empirical Comparison and Optimization Test Algorithm, is the creation of Nate Silver at Baseball Prospectus and is possibly the most comprehensive projection system out there. It not only covers every major league player, but also projects team records for the upcoming season.

Using a vast series of computations and algorithms including the not-so-well-known reliever leverage, BatDelta, PitDelta, and others that are incomprehensible to a run-of-the-mill numbskull like me, every player currently in a major league uniform is given a projection for the next season. Trying to simplify it all, the process works like this:

Using some of the same methods such as projected playing time, age, park index, risk for injury, past performance, etc., a projection is made for each hitter and pitcher.

The team totals for hitters in stats like Avg, OBP, SLG, are used and plugged into David Tate’s Marginal Lineup Value Formula, and an estimate of runs scored for each team is then spit out taking all of the factors for each hitter into account. So, an estimate for a team is based on the 650 PAs the everyday leadoff hitter might have, plus the 75 PAs a bench player might have who rarely gets in a game, and everyone in-between. Even forecasts for stolen bases are included.

For pitchers, the exercise is simpler, as they just total up the projections for runs allowed from each pitcher on the staff and tally those together for a team. You now have, for each team, an estimate of the number of runs it will score and the number of runs it will allow.

These runs scored and runs allowed totals for each team are then plugged into the Pythagenport formula, which translates those numbers into a win-loss total for every team in the Majors. Once you have those, it is easy to see who the formula thinks will have the most wins, losses, etc. for a season, and you can rank them by division, which BP does before every season.
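To make that last step concrete, here is a minimal sketch of a Pythagenport-style calculation. It uses the commonly published form of the formula, in which the exponent depends on the run environment; the function names are my own, and BP’s actual implementation may well differ in its details.

```python
import math

def pythagenport_win_pct(runs_scored, runs_allowed, games):
    """Estimate winning percentage from team run totals.

    Assumes the commonly cited Pythagenport exponent:
    1.5 * log10(total runs per game) + 0.45.
    """
    runs_per_game = (runs_scored + runs_allowed) / games
    exponent = 1.5 * math.log10(runs_per_game) + 0.45
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def projected_record(runs_scored, runs_allowed, games=162):
    """Turn the estimated winning percentage into a (wins, losses) pair."""
    wins = round(pythagenport_win_pct(runs_scored, runs_allowed, games) * games)
    return wins, games - wins

# Hypothetical team projected to score 800 runs and allow 700.
print(projected_record(800, 700))
```

A team that scores exactly as many runs as it allows comes out at .500, which is a quick sanity check on any formula in this family.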

PECOTA usually goes as far as to project up to five seasons for all players that are input in the calculations. But, because of the variance of numbers from year to year, players changing teams, and schedules that shift from year to year, the team won-loss records are typically done a year at a time.

The criticism I hear about PECOTA is that it very often seems to be too conservative. But some of that criticism tends to be unfounded. Using Berkman as an example again, he had that monster 2006 season, but PECOTA projected him at 32 HR and 100 RBI for 2007. This was laughed at more than once, but sure enough, Berkman finished the 2007 season at 34/102.

Every player has a PECOTA page on Baseball Prospectus, but you have to pay for a membership to be able to view them. But I can vouch that it can be very helpful for fantasy purposes. Team record projections are usually posted in an article on the front page, but also can only be fully viewed by members. A one-year subscription is $34.95 for the site.

Ron Shandler & Baseball Forecaster

Ron Shandler is the fantasy guru and stats projectionist made famous in the book Fantasyland by Sam Walker (a great read by the way) and the famed experts fantasy league, Tout Wars. Shandler’s Baseball Forecaster website (www.baseballhq.com) claims that it is “Since 1986, the industry’s leading resource for creating fantasy baseball winners.”

Shandler claims to be the first to integrate sabermetrics, including some stats of his own creation, into fantasy baseball projections and predictions. And his numerous expert league titles, along with the awards won by his website and projections, definitely show that he is one of the best in the business, if not the best.

The mantra that Shandler uses to lure people to his product is that winning in fantasy is NOT, in fact, all about luck. Any fantasy player will try and tell you that he was lucky to pick up Magglio Ordonez in the 11th round last year or something similar, but Shandler doesn’t see it that way. He believes luck can be managed and even predicted.

In a world of fantasy- and real-life-based projections that are clouded by simple and non-descriptive stats, he takes what he calls a “components-based approach” to player projections. While we often talk on this site about the numbers beneath the numbers and how certain things lead to the more common stats we see like batting average, RBI, ERA, etc., Shandler really gets into the numbers beneath the numbers beneath the numbers of the common fantasy stats out there.

It can sometimes seem like a daunting task to try and grapple with some of Shandler’s stats like qERA, the RIMA plan, and reliability score, but fortunately with each purchase of his book or online projections, a comprehensive glossary with explanations is included. The final outcome is very comprehensive in its format, showing all stats, rates, and formulas used to pull together the numbers for a specific player, culminating in the final projections for that player for the upcoming season.

This book is also now on sale and can be purchased here for $24.95. And if you order the book directly from Shandler Enterprises, he will also send the book in an online format, complete with PDF and Excel files for all charts; a handy tool for draft day.

There also is some valuable information that can be had for free on the website, beginning with an option to sign up for a free e-newsletter that comes every Friday from January to September with projections, trends, FAQs, etc. Also check out Shandler’s essay on projective accuracy located here.

Marcels and Tangotiger

This set of projections is hot off the presses for 2008 and is available on the incredibly helpful website www.hardballtimes.com. Marcel projections are comically named for Ross Geller’s monkey on the show Friends, as in they are so simple that even a monkey could do them.

Worked out by the famous sabermetric/projectionist blogger Tangotiger, Marcel projections (from the THT website) “simply consist of averaging a player’s previous experience (with greatest weight on the most recent years) and regressing to the major league average depending on the number of years the player has been in the majors. This is done for each component (home runs, doubles, walks, etc.) A simple aging factor is applied, but no park factor.”

Marcels are split into projections for hitters and pitchers on The Hardball Times. By default, hitters are ranked by GPA and pitchers by ERA. GPA (gross production average) is a stat developed by THT that is in the same family as OPS, but more accurate, and it is read on the same scale as batting average when determining good and bad. The formula for GPA is (OBP*1.8+SLG)/4, adjusted for ballpark factor. Each chart also has at least a dozen sortable stat projections for 2008.
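For anyone who wants to run GPA on their own numbers, here is a small sketch of the formula as quoted above. The function name is my own, and the ballpark adjustment is left out because the details of that adjustment are not given here.

```python
def gross_production_average(obp, slg):
    """Unadjusted GPA: (OBP * 1.8 + SLG) / 4.

    The park-factor adjustment mentioned in the text is NOT applied here.
    """
    return (obp * 1.8 + slg) / 4

# Hypothetical hitter with a .400 OBP and a .500 SLG.
print(round(gross_production_average(0.400, 0.500), 3))
```

Because OBP is weighted 1.8 times as heavily as SLG, a hitter who gets on base a lot will look better by GPA than by plain OPS.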

This link specifically lays out how Tangotiger determines Marcels, with numerous helpful comments below the essay that provide more insight into the specifics. He does not use GPA in his projections; that is done after the fact by THT.

Essentially, Marcels start from a baseline average for all major leaguers. They then rely on Tango’s 5/4/3 method of weighting a player’s three previous seasons of data, and use that alongside projected plate appearances to reach the final projection. One downside is that all rookies with no MLB experience will be rated as league average, which is not always the case.
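The weighting-and-regression idea can be sketched in a few lines. This is only an illustration in the spirit of Marcel, not Tango’s actual code: the function name and the 1,200 plate appearances of league-average “ballast” are my assumptions, and the aging factor is left out entirely.

```python
def marcel_style_rate(seasons, league_rate, weights=(5, 4, 3), regression_pa=1200):
    """Project a per-plate-appearance rate for one component (e.g. home runs).

    seasons: list of (events, plate_appearances) tuples, most recent first.
    league_rate: league-average events per plate appearance.
    regression_pa: assumed amount of league-average playing time blended in.
    """
    weighted_events = sum(w * events for w, (events, pa) in zip(weights, seasons))
    weighted_pa = sum(w * pa for w, (events, pa) in zip(weights, seasons))
    # Blending in league-average plate appearances pulls players with
    # little playing time strongly toward the league rate.
    return (weighted_events + regression_pa * league_rate) / (weighted_pa + regression_pa)

# Hypothetical hitter: 30 HR in 600 PA in each of the last three seasons,
# in a league that homers in 3% of plate appearances.
veteran = marcel_style_rate([(30, 600), (30, 600), (30, 600)], league_rate=0.03)
rookie = marcel_style_rate([], league_rate=0.03)
print(veteran, rookie)
```

Note that a player with no seasons on record comes out at exactly the league rate, which is the rookie downside described above.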

Also, in an interesting piece, Tango looks at Marcels in relation to other prognosticators (circa 2007) and compares them mathematically to the actual stats and to each other, so it proves helpful for seeing how these different systems rated against at least one of their peers. Look specifically at comment number five by Nate Silver.

And while Marcels are free, THT also sells The Hardball Times Season Preview 2008, which includes newly developed player projections by The Hardball Times as well as projections for each team and also for players’ careers. The book is available here and sells for $17.95.

ZiPS and Dan Szymborski

Right now, on the website www.baseballthinkfactory.org, Dan Szymborski is a little more than halfway through his yearly set of team-by-team articles, complete with their ZiPS projections for 2008. ZiPS is a computer-based projection system whose name stands for sZymborskI Projection System.

Like many others, ZiPS relies on weighted yearly stats (four years for this system). But where this projection system differs from many others is that Szymborski doesn’t try to compare his numbers and averages to individual players with similar past performances or results, but rather to large groups with similar characteristics, where he can concentrate on comparing larger sets of data against a more balanced field.

Stats such as BABIP, K-rate, and Speed Score are relied upon heavily in these projections for hitters and pitchers. And like the others, playing time and PAs are based on the most recent playing time and PAs for specific players.

Szymborski also projects league offensive totals for upcoming seasons by weighing recent years for the separate leagues and looking at trends in offense, defense and pitching to determine new numbers. Projections for hitters can then be compared to these offensive totals or averages to see how they will compare. He also breaks them down by position in each league to truly compare apples to apples.

ZiPS projections are done for a very large number of players, some of whom realistically have no shot at much playing time in the major leagues. But Szymborski takes all of their factors into account as well and predicts how they would perform should they get that chance. If you are using these for fantasy purposes, he leaves it up to you to decide whether to take a shot on someone who may or may not have that playing time.

For now, the ZiPS projections are completely free and accessible through Baseball Think Factory. The articles on each team are a very good read and can easily be found on the front page. Most recently completed are the 2008 White Sox.

So there you have it, 2500 words about some of the most well-known and accurate projections out there. The question I hear the most is “well yeah, but how accurate were they in their projections? Which one should I use?” Well, find out for yourself. Pick up a couple of the books or check out a couple of the websites, look at the predictions from last year, and compare them to the real numbers we now have. You will find a favorite eventually; one that piques your interest based on what you really want to use them for.

If you know of any others that you would like me to research and get some info on (like CHONE, Rotowire, etc.), let me know and I will see what I can do.

4 Responses to “Projecting projections”

  1. Ryan, thanks for the THT reference, but I want to clear up one thing. The THT Season Preview will have our own specially-developed projections (we don’t have a fancy name for them), not the Marcels.

  2. Thanks for the update. It has been corrected in the post.

    RK

  3. [...] Stats Glossary Projecting projections [...]

  4. What about this service?
