Baseball Notes » Stats
http://somebaseballnotes.com
Searching for truth behind the numbers of this great game

What interested me online this week…
http://somebaseballnotes.com/2008/04/05/what-interested-me-online-this-week/
Sat, 05 Apr 2008 06:24:50 +0000, Ryan Kirksey

A lot of great stuff this week online from a baseball stat and sabermetric perspective. I don’t want to take too much time building it up, so I will just get right into the details.

The Great Clutch Project

I mentioned this a few weeks ago: Tom Tango is asking fans to join him in the clutch debate, pitting his “good” hitters against the fans’ choices for “clutch hitters” to see whether people really can see and perceive clutch. The stats he is using are his Leverage Index scores and wOBA (weighted on-base average), so if you are not familiar with those, read up on them. You can find the summary of the project here, and Fangraphs will be running the season tally here for 2008.

Never thought I would see a Ginger/Mary Ann and clutch/non-clutch analogy used, but I guess nothing should surprise me anymore.

Hardball Times OF Arms

John Walsh of THT reveals his new defensive metric to measure OF arms, something that has always been missing and that is sorely needed in the defense discussion that has escalated in the past few years. You can search by year on their stats page here, and there is a lengthy description of the methodology at this link. The stats for OF arms go back to 2004.

Richard Justice’s war with the stat guys

Richard Justice, a local sports writer for the Houston Chronicle, has the stats world up in arms over his reaction to a post by Mitchel Lichtman on Justice’s blog piece this week about bunting and how it is always a bad idea. Apparently, Justice has long been a target of some bloggers for his inability to look past emotion and personal feelings and look at the numbers. The blogosphere just can’t get enough of it, and all of this stems from a couple of sentences about how bunting is always bad in the situation the Astros faced.

For the record, I fall somewhere in the middle of what Lichtman and what Keith Law propose (you need to read all the threads to understand where that is). A manager has to make a yes/no decision in that moment, but his job should be to be as prepared as he possibly can with all the available data that will help him make an educated decision. The ones that are too close to call? Well, that’s why managers are paid the way they are.

Lineup Analysis

This is not a new tool by any means, but it is something I have been messing around with this week and that I recommend. Baseball Musings hosts a Lineup Analyzer put together by Morong, Arneson, and Armbrust that lets you enter any nine players with their OBP and SLG. It then constructs the ideal lineup based on those numbers and the authors’ calculated comparison and analysis of the two stats.

Here is the page; use it on your favorite team for this year or any year.

To be the absolute worst and the absolute best
http://somebaseballnotes.com/2008/04/04/to-be-the-aboslute-worst-and-the-aboslute-best/
Fri, 04 Apr 2008 05:06:07 +0000, Ryan Kirksey

One of the local FM radio stations here in Houston is celebrating “Baseball Week” all this week, with baseball-themed interviews (including Jose Canseco and the guy who is auctioning off Barry Bonds’ 762nd home run ball and who sounds like he is perpetually stoned), Astros updates, and other various items. And, God bless ‘em, they are really trying. In fact, on Monday they posted a poll on their website that had the question, how many games will the Astros win this year?

So I figure this will look like most of these things I have seen before and have some options like less than 70, 71-75, 76-80, 81-85, etc. But that is assuredly not what I found. You had four options that looked like this:

1. 0-40
2. 41-80
3. 81-120
4. 121-162

Huh? I mean winning 40 or fewer or 121 or more - is that even possible? And of course a few real smart folks voted for the top and bottom options, but otherwise it was like 45% for number 2 and 45% for number 3. One of the more scientific polls ever created, if you ask me.

But that got me thinking. What would it really take for the 2008 Astros, or any team, to win 40 or fewer or 121 or more? So I decided to look back in history. First, the worst teams in major league baseball history, by number of wins:

1. 1899 Cleveland Spiders - 20 Wins, 134 Losses
2. 1916 Philadelphia Athletics - 36 Wins, 117 Losses
3. 1962 New York Mets - 40 Wins, 120 Losses
4. 2003 Detroit Tigers - 43 Wins, 119 Losses

And now the best teams, by wins, in MLB history:

1. 1906 Chicago Cubs - 116 Wins, 36 Losses
2. 2001 Seattle Mariners - 116 Wins, 46 Losses
3. 1998 New York Yankees - 114 Wins, 48 Losses

Essentially, we have had three teams win 40 or fewer over a full season (though two of those teams played 154-game seasons), and no team has ever reached 120 wins. But I did not want to stop there; I wanted to look at how futile or magnificent a team would have to be to reach these win milestones. First, the 40-wins-or-fewer plateau.

Using our trusty Pythagenpat formula again, we can work backwards to find out how many runs a team would have to score and allow to only win 40 games.

Winning 40 out of 162 gives you a winning percentage of roughly .250 or 25%. So let’s say for argument’s sake you have an average park, pitching staff and defense, and your team allows exactly the average number of runs in a season to their opponents. From 2001-2007, the average runs allowed by a major league team was 768, so we will start with that number.

With the formula, you first solve for the exponent using X = ((rs+ra)/g)^.285, where rs is runs scored, ra is runs allowed, and g is games played. With X being the exponent, you then calculate rs^X/(rs^X+ra^X) = Winning Percentage. Working backward, and sparing you the math, a team that allowed 768 runs (thus being an average team in that department) would need to score about 415-420 runs to fall right into that .250 winning percentage.
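For anyone who does want the math, here is a minimal sketch of the calculation in Python. It assumes the second step is meant to read rs^X/(rs^X + ra^X), keeps the .285 exponent constant from the formula above, and uses a simple bisection search to work backward. The bisection is just one convenient way to invert the formula, not necessarily how the numbers above were produced, and the function names are made up for illustration.

```python
def pythagenpat_win_pct(rs, ra, games=162, k=0.285):
    """Expected winning percentage from runs scored (rs) and runs allowed (ra)."""
    # Step 1: the exponent depends on the total runs per game.
    exponent = ((rs + ra) / games) ** k
    # Step 2: the usual Pythagorean ratio, using that exponent.
    return rs ** exponent / (rs ** exponent + ra ** exponent)


def runs_scored_for(target_pct, ra, games=162, k=0.285, lo=1.0, hi=3000.0):
    """Work backward: find the runs scored that produce target_pct.

    Winning percentage rises as runs scored rises, so bisection between
    lo and hi converges on the answer. The bounds are arbitrary guesses
    that comfortably bracket any realistic season total.
    """
    for _ in range(100):
        mid = (lo + hi) / 2
        if pythagenpat_win_pct(mid, ra, games, k) < target_pct:
            lo = mid  # too few runs, search higher
        else:
            hi = mid  # enough runs, search lower
    return (lo + hi) / 2


if __name__ == "__main__":
    # An average pitching staff (768 runs allowed) and a .250 target.
    needed = runs_scored_for(0.250, ra=768)
    print(f"Runs scored needed: {needed:.0f}")
    print(f"Check: {pythagenpat_win_pct(needed, 768):.3f}")
```

Running it should land in the same neighborhood as the figure quoted above; small differences come down to rounding and to whether you target exactly .250 or 40/162.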

For some context, no one this decade has had fewer than the 574 runs the Dodgers scored in 2003. So we are talking about more than 150 runs fewer than that team, assuming an average runs allowed total. 420 runs scored means only 2.59 runs per game. Now I don’t want to steer you in the wrong direction here. In the dead ball era, this number was routinely matched and even bested. Your NL record for fewest runs is 371 by the 1908 St. Louis Cardinals, led by the great Red Murray and his .282 batting average and 62 RBI. For the AL, the record for fewest runs is 380 by the 1909 Washington Senators, captained by Bob Unglaub and his .265 average with 41 RBI.

Now for the other side. To win at least 121 games would mean a winning percentage of 75%. We will use our same 768 runs to define our average defense/pitching staff/park. Working backwards, using the same formula as before, we can find that a team that allowed 768 runs over a 162-game season would have to score right at 1300 runs (give or take 10 or so on each side) to equal a .750 winning percentage.

This number certainly is more than any team has ever scored in one season in major league baseball, and would equal more than eight runs per game for 162 games. It doesn’t, however, outpace the historical leaders in runs scored by one team by all that much. In the AL, the 1931 New York Yankees (did you think it would be anyone else?) scored 1,067 runs, led of course by Ruth and Gehrig. More surprisingly, in the NL, the 1894 Boston Beaneaters scored 1,220 runs, led by an amazing nine players who batted .320 or better - and five of them had better than 100 RBI. Of course that was a different time and game, so just in the context of this decade, no one has scored more than the 978 of the 2000 Chicago White Sox, still more than 300 runs behind our 121-win team.

So certainly both of these situations are unlikely to happen this season, or in any season in the near future. Seeing what a team would have to achieve to accomplish these makes it seem as though we may not see either happen the way the game is played today.

Southeast Invitational: Preseason best and worst value pick predictions
http://somebaseballnotes.com/2008/03/26/southeast-invitational-preseason-best-and-worst-value-pick-predictions/
Wed, 26 Mar 2008 22:12:35 +0000, Ryan Kirksey

In another one of my posts that will interest exactly nine other people, I plan to look at the recent 10-team fantasy baseball draft in which I participated on Monday, March 24 and make my predictions as to who will be the picks that give each team the most value over the 2008 season as well as the picks that will have the worst value relative to where they were drafted.

First of all, the league format: We play in a 10-team Yahoo! custom points league. Categories for offense are HR, RBI, SB, R, BB, 1B, 2B, and 3B. For pitchers, the categories are W, L, ER, IP, K, Hld, and SV. We have a MLB universe roster of 20 players (5 of which are bench) plus two DL spots per team. There are unlimited trades (with an August deadline) and unlimited other moves as well. The league is a daily league that has been running for five years (with a bit of turnover), but it is not a keeper league.

And by value I am, again, talking about relative to the spot where they were picked. A-Rod will have tremendous value going with the first pick, but that is expected value - what we are looking for is unexpected value or expected value not achieved.

So without any more rambling, here are (in my opinion) the best and worst value draft picks by team, in order of how we drafted.

1. Eric Ramirez

Best pick - Josh Hamilton - 13th round

In half a season last year, the rookie with the troubled past totaled 19 HR and 47 RBI in only 337 plate appearances. This year, he has a guaranteed starting spot on the Rangers and is going to a park that might be the best in the majors for left-handed power hitters. Some of the OF taken in the two rounds before Hamilton include Jermaine Dye, Matt Kemp, Jose Guillen and Delmon Young. My guess is that Hamilton out-homers all those guys. In a quick glance at eight popular projection systems, his average projected home run total is 21. If he stays healthy, he will fly past that number in Arlington.

Worst pick - Derrek Lee - 6th round

Not so much that it was a bad pick in the 6th round, but because power is the most sought-after commodity in our league, it was a questionable strategy to wait until the 6th for a 1B - the premium power position. At that point, it may have been wise to wait and grab Carlos Pena, Paul Konerko or Ryan Garko - all of whom went in much later rounds. At age 32, Lee is not the 46/107 guy from three years ago, but he is also not the 22/82 guy from last year. His mean lies somewhere in the middle - but he is just too inconsistent for me as a sixth pick.

2. Kirk Kornegay

Best pick - Joakim Soria - 20th round

I almost had this pegged as Scott Kazmir in the 9th round, but his current injury gives me just a little bit of concern, so Soria is the choice. The Royals play a lot of close games due to an offense that is in the bottom half of the AL. And that is good for a closer, especially one with Soria’s numbers. Soria posted a VORP of 26.4 in 2007 - higher than any of his Royals teammates. His ERA, WHIP, K/9, K/BB, and BAA were all outstanding in 2007. His BABIP was .264, which suggests he was a bit lucky and will likely see a slight bump up in his other peripherals in 2008. However, a closer with 25+ save potential in the last round is a steal.

Worst pick - John Lackey - 7th round

We all know how incredible Lackey has been as a pitcher recently. He was the third best pitcher in the AL last season, and thanks to a significant decrease in his BB/9 and a sharp increase in his LOB%, Lackey made the jump from 13-14 wins per season to 19, and an ERA in the mid-3’s to exactly 3.01. But a strained triceps injury on his throwing arm currently leaves Lackey on the shelf for all of April. Five months from Lackey could still make up for the lost time, especially with an improved offense, but pitchers such as Harang, Oswalt, Beckett and Matsuzaka were still on the board for this pick.

3. Jeremy Gibson

Best pick - H. Bell and R. Betancourt - Rounds 16 and 17

The combined numbers for these two pitchers in 2007 look like this:

11 wins, 172 IP, 34 ER, 182 Ks, 65 holds, and an ERA around 1.80

Those are fabulous numbers for the number of innings they account for (out of our 1200 total). Betancourt has some numbers from 2007 that make you feel like his stats will dip a little, such as an astonishingly low BABIP of .246 and astonishingly high LOB% of 86.4%. Bell also makes you think a little bit because his ERAs in 2005 and 2006 were in the 5’s and his 102 Ks more than double anything he has ever done, but even if these pitchers see their numbers decline by 20% across the board, the 16th and 17th rounds are great value for their stats.
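As a quick check on the combined line, here is a small Python sketch that derives the rate stats from the raw totals listed above. It treats the 172 IP as a whole number (ignoring any thirds of an inning), and the helper names are just for illustration.

```python
def era(earned_runs, innings_pitched):
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_pitched


def k_per_nine(strikeouts, innings_pitched):
    """Strikeouts per nine innings."""
    return 9 * strikeouts / innings_pitched


if __name__ == "__main__":
    # Combined 2007 totals for Bell and Betancourt, as listed above.
    combined_er, combined_ip, combined_k = 34, 172, 182
    print(f"Combined ERA: {era(combined_er, combined_ip):.2f}")
    print(f"Combined K/9: {k_per_nine(combined_k, combined_ip):.2f}")
```

The ERA comes out just under 1.80, which matches the "around 1.80" figure, and the strikeout rate works out to better than one per inning.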

Worst pick - Chipper Jones - 6th round

Jones is a great player, and after last year, he clearly has plenty of offense left in that bat. Despite missing almost 30 games, Jones was 6th in the Majors in batting average and VORP and also top 12 in Runs Created (keep in mind, a cumulative stat). So while normally this would be a great pick, there is just too much injury concern for me. Jones has not played in more than 137 games since 2003 and in 2005-2006 he missed at least 50 games each season. Jeremy has an admirable back-up plan with Mike Lowell also on the roster, and I just have this feeling that Lowell will be forced into active duty on Jeremy’s team before too long. But of course this could easily end up being his best pick if it pans out and Jones stays healthy.

4. Ryan Kirksey

Best pick - Dustin McGowan - 19th round

The one pitcher I was targeting most in the late rounds this year, and I kept waiting and waiting, forcing myself to be patient in hopes no one else would take him. McGowan, now 26, saw so many things trend in the right direction from 2006 to 2007: namely his K/9, BB/9, ERA, and WHIP. Plus, he is an extreme groundball pitcher (a great thing seeing as he will be facing some of the toughest lineups in the league on a regular basis) to the tune of 53% in 2007. An improved Blue Jays lineup certainly won’t hurt his case for wins, but a questionable bullpen might. Still, if his Ks and IP continue to go up and his BB and earned runs continue to go down, I should be happy.

Worst pick - Francisco Liriano - 12th round

I will be the hardest on myself. This was a stupid pick here. I thought a lot of people were looking for him about this time, but it turns out most people were going to wait a couple more rounds. They had probably all seen the note that came out Monday that I missed: Liriano will likely open the season with some starts in Triple-A before moving up. Guys like Carmona, Wang and Bonderman went around that time, and I probably would have been better off with one of those. Even for a talent like Liriano, there are so many questions surrounding a return from Tommy John that I should have waited a few more picks or waited to see how he responds in real game action.

5. Regan Boudra

Best pick - Chien-Ming Wang - 12th round

Probably the hardest draft to pick a best/worst value. Every pick seemed to be pretty much in line with the perceived value of the player, so these may be a stretch in a couple of areas. Regan’s quote during the draft of “I’ll take 17 wins even if he only gets 17 strikeouts” was one of the highlights, but it also rings somewhat true. Actually, Wang’s K rate improved to 4.70 per nine last year, up from 3.14 in 2006. Your standard for acceptable K/9 rates for a starter really should be around 5.6 or so, but when 59% of your balls in play are grounders, you are getting plenty of people out that way (even with Jeter and whatever lead glove they will play at first). So 17 wins is certainly achievable; I might even go one or two more for him.

Worst pick - Carlos Guillen - 6th round

I am lukewarm on Carlos Guillen as I had him in the second half of last year (the half he was just good, and not incredible), but he is just a bit too streaky in my opinion. His HR the past 5 years have been 21, 19, 5, 20, 7. His OPS: 859, 920, 803, 921, 753. RC: 99, 112, 51, 107, 55. Still, playing first should keep him on the field more and cut down on his pesky injuries. Guillen is not a bad pick here by any means, but he was the fifth SS taken after Rollins, Ramirez, Reyes and Tulo - so there were options available.

6. Tim Miller

Best pick - Kerry Wood - 20th round

Wood was officially named the Cubs’ closer earlier that day, and the Cubs look to compete in the division, so Wood should factor in many of those decisions, provided he remains healthy. A K/9 rate of 8.9 last year in limited work is very promising (especially since it was 5.95 in 2006), but a BB/9 rate of 4.81 is decidedly not promising. He has not had a number that high since 2000, so he can certainly get it under control, but with Marmol and Howry breathing down his neck, Piniella does have options if this arm-saving move of having Wood close does not pan out.

Worst pick - Justin Verlander - 5th round

This may be a bit of a reach on my part, because Verlander is going to be great, but he was the fifth pitcher taken overall in our draft. Only one fantasy preview source that I found (out of about 15) had Verlander ranked fifth. The average of all of those pegged Verlander at 9.4. Still on the board when Verlander went were Beckett, Sabathia, Harang, and Haren - all of whom could make a case to go ahead of Verlander. One thing to watch is the interesting fact that despite the velocity of Verlander’s fastball decreasing the past three years, his K rate has increased. Watch that and his innings count (high for a young pitcher), but the offense behind him should support plenty of wins. Still, JV is Tim’s ace, and should serve him well.

7. Joel Ramirez

Best pick - Edgar Renteria - 15th round

While nothing with the glove, Renteria has again turned into a force with the bat. Hitting in that stacked Detroit lineup won’t hurt anything, either. Since our league does not have a middle infielder spot, once the top ten SS went, it was a while until number 11 (Renteria) went off the board. While Joel also drafted Hanley Ramirez, a quick check of his roster shows that Renteria would be a better fit at the Util spot than anyone else in his lineup (Giambi, Rolen, etc.). Renteria has always been a great source for hits and average (although last year he was off the charts and won’t repeat that), but his OBP has also improved four straight years, something that is vital to the Tigers and to fantasy points leagues.

Worst pick - Jacque Jones and Derrick Turnbow - 17th and 18th rounds

Jones also has the benefit of being in the Tigers’ lineup, but he will be batting ninth and is generally projected with a line around .265/.325/.410 and 10-15 HR - in other words no real value in our league. Jones’ Runs Created numbers last year dropped from 85 to 58, thanks in part to a 100-point drop in SLG. His best days are behind him.

Turnbow is in line to replace Gagne if he fails in Milwaukee, but he has two years of overall numbers that are just bad. Actually, the strikeouts have been great at over 11 K/9 the past two years, but the walk rate has been over 6 BB/9 over the same two years. Value comes only if he gets the chance to save again.

8. David Gilly

Best pick - Chad Billingsley - 15th round

There is only one real number I can find where Billingsley did not improve from 2006 to 2007 (his first true chance to start), and that is his HR/9 number of .92, up from .70. Billingsley’s career FB% is around 37.6% or just better than average, so the HR rate is not too much of a concern, especially pitching in Dodger Stadium. Besides that, all of his rate stats such as ERA, WHIP, K/9, BB/9, BABIP, LOB%, and BAA made significant improvements in 2007. And he is only 24 - which means he is still maturing.

Worst pick - Jorge Posada - 8th round

Posada is a fine catcher and should put up great positional numbers in the Yankees lineup. But he was taken as the fourth catcher overall behind the 3M’s - Martinez, Martin and McCann. Fourth is not crazy for Posada, but I think it started a run on catchers too early. In fact, 6 teams drafted their catcher before the 10th round was over. And that may not sound so bad, but that means that 6 catchers were drafted in the top 100 players. And that just can’t be justified. For example, in MLB.com’s top 100 fantasy list, only three are catchers. I imagine David drafted with a few thoughts of Posada’s 2007 numbers in his head. We have discussed this before, but Posada’s 2007 base stats just do not coincide with the rest of his career, and at 37, he can’t be expected to be that lucky again - especially with his incredible .389 BABIP.

9. Justin Jones

Best pick - Curtis Granderson - 8th round

News of Granderson’s broken finger dropped the value of this CF who had a 20-20-20-20-20-20-20-20-20 season (or something like that) in 2007. So this is a guarded pick, but even if Granderson misses three to four weeks from the time of the injury, he should be a steal in the 8th round. Granderson led all AL leadoff men in OPS in 2007 with .926 and the projections of Bill James, CHONE, Marcel and ZiPS all feel that Granderson will repeat his 2007 numbers in 2B, HR and RBI - all very valuable in our league.

Worst pick - Nick Swisher - 4th round

Again, I sort of understand this because first basemen were flying off the board so he had to have somebody, but Atkins, Adrian Gonzalez, Lee, and Konerko were still on the board - all of whom will probably have better power numbers than Swisher. Swisher is famous for his ability to get on base, but with his power, he might be sacrificing some HR (which were down by 13 from ’06) for walks and other hits. That’s great in real life, “get on base… help the team,” but doesn’t do too much for us. He can play 1B or OF, so depending on how Justin’s other 1B (Todd Helton) does, this may all be a moot point.

10. Jason Kirksey

Best pick - Adam Wainwright - 15th round

I was all ready to put his pick of Kelvim Escobar in this slot, but then word comes today that his injury might be career-threatening, so that shot that idea. Anyway, Wainwright’s move from reliever to starter was an overwhelming success. While most of his numbers saw a bit of an uptick, understandable due to his innings being almost three times that of 2006, the one number that decreased significantly was his HR/9 from 0.72 to 0.58. This is promising for someone who is going to be the veritable ace of the staff at least until the all-star break. Wainwright is another groundball pitcher (48%), but with more than a third of his batted balls being flyballs, it is good that he is developing a skill of keeping balls in the park at a better rate.

Worst pick - Dontrelle Willis - 14th round

We all like looking for great value in the last quarter of the draft, but…

[Image: dontrelle.jpg]

And now he moves to a tougher league for pitchers, albeit with a stellar offense behind him. There is also always talk of Comerica being a pitchers’ park, but in 2006 and 2007, it was very average, and actually a better park for hitters on average over the last three years than the Marlins’ stadium in Miami. That could spell trouble.

So there you have it, too much information that too few people care about. I will do a brief recap mid-season of how things are going and then do a seasonal review looking back at these picks and deciding what the REAL best and worst value picks were for 2008.

I would love to hear your complaints, praise, or thoughts on the picks in general.

Ultimate 2007 Batting Order
http://somebaseballnotes.com/2008/03/20/ultimate-2007-batting-order/
Thu, 20 Mar 2008 18:48:38 +0000, Ryan Kirksey

Using a relatively new tool on Baseball-Reference.com known as Batting Order Outcomes, I thought it might be fun to go back and look at last season and construct the ultimate lineup, spots 1-8, using each team’s production in each of those spots as our data.

The way this page works is you can put in any team and any spot in the lineup (1-9) and BR will pull up a page with stats on how that team performed in that season at that spot in the lineup, with all PA included throughout the season.

So, I can quickly go back and see that in 1972, the Boston Red Sox had an OPS of .625 in the 7th spot, with the famous Doug Griffin getting the majority of the plate appearances that year.

Using OPS as our gauge, I will lay out the ultimate 2007 batting order from across the Majors. While the batting order page has incredible splits and breakouts of stats per month, player, inning, relative score, and more, the stats used are pretty basic, so OPS is probably our best bet for this exercise.

Starting with the leadoff position, here is the best from each spot in 2007, with a couple of my random comments associated with each:

1. Florida Marlins - .897 OPS

This one makes sense especially when you consider that Hanley Ramirez was given 706 of the 780 plate appearances for the Marlins in the leadoff spot in ‘07. May and June were actually not kind to Ramirez and the Marlins’ leadoff spot; the OPS totals for those two months in that spot were .738 and .694, respectively. But the next three months had totals of 1.094, .875 and .944 - so he certainly finished strong. In comparison, Ramirez’s two counterparts, Rollins in Philly and Reyes in NY, contributed to totals of .869 and .772 for their teams, respectively. Ramirez is expected to move to third in the order in 2008, so don’t look for the Fish to repeat in this spot.

2. St Louis Cardinals - .870 OPS

This one mildly surprised me. No Derek Jeter, Kevin Youkilis, Placido Polanco, or even Hunter Pence took this spot. Rather, the combination of Chris Duncan and Rick Ankiel gives the Cardinals the top spot. Certainly helping the cause, Ankiel slugged .603 batting second. Also contributing to the solid .870 number were Scott Spiezio and Skip Schumaker, who both had an OPS over 1.000 in 131 total plate appearances.

3. Boston Red Sox - 1.034 OPS

No surprises here. David Ortiz ate up 89% of the 751 total plate appearances in the third spot. I have heard some people say that Ortiz had a down year last year because his home runs and RBI were down from the previous two seasons, but that argument is truly ridiculous. His batting average, OBP, OPS+, Runs Created, and Runs Above Replacement were all the best of his career. His 52 doubles made up for “only” 35 HRs - a number which will likely trend upward in 2008. And in September, during the playoff push, Ortiz’s OPS was a mere 1.355.

4. New York Yankees - 1.069 OPS

Again, no surprises at this spot. Of the 744 plate appearances in the #4 spot in 2007, A-Rod had 700, with an OPS of 1.081. In the few times someone else actually hit in this spot, Jorge Posada, Miguel Cairo and Hideki Matsui all had an OPS of at least 1.000 as well. And quite possibly even more impressive, the Yankees had the number four spot come up with RISP 243 times and totaled an OPS of 1.127 in those plate appearances.

5. Toronto Blue Jays - .939 OPS

I probably could have given you a dozen guesses to this one, and you wouldn’t have said the Blue Jays. But there they are - with big Frank Thomas leading the way with his .935 OPS. Actually, while Thomas had the most PAs in that spot, he only accounted for about a third of the total plate appearances. Some of the other notable names hitting in that spot: Aaron Hill, Troy Glaus and Matt Stairs totaled OPS scores of .946, 1.145 and 1.003, respectively. All of these numbers represent significant increases over their seasonal totals.

6. Colorado Rockies - .908 OPS

This spot makes sense as well, with Brad Hawpe demanding 73% of the PAs for the Rockies in 2007. And while Hawpe’s OPS in 2007 in that spot was an incredible .918, it is severely overshadowed by Ryan Spilborghs who had an OPS of 1.212 over 74 PAs in the six hole. In another interesting note, the Rockies only had one month all season (April) where they did not slug at least .500 from the 6th spot in the lineup. Perhaps not surprisingly, that was the month they had a losing record.

7. Philadelphia Phillies - .850 OPS

This spot in the Phillies’ lineup was distributed pretty evenly amongst Abraham Nunez, Jayson Werth, Wes Helms, Greg Dobbs, and Aaron Rowand. Except for Nunez, all other batters had an OPS of at least .847 in the seven spot, with Rowand leading the way with a 1.070 over 87 plate appearances. One entertaining and interesting note here looks at when throughout the course of the game the Phillies really produced in the 7th spot. In the 1st-6th innings, the Phillies had an OPS of .885 in the seventh spot, but that number drops to .783 from the 7th-9th innings.

8. Pittsburgh Pirates - .800 OPS

I could probably give you 25 guesses and you would not have picked the Pirates in this spot. I certainly thought it would be Robinson Cano’s Yankees or some other powerhouse offense, not the team that was 12th in the National League in runs scored. But with Jack Wilson and his .825 OPS getting exactly half of the plate appearances, the Cesar Izturises, Jose Castillos and Jose Bautistas of the world could not drag the total below .800. The second half of 2007 is what tells the story for the Pirates earning this spot - as a team the OPS in the 8th spot after the All-Star Break was an amazing .899.

In an exercise like this, the Magglio Ordonezes, Matt Hollidays and Miguel Cabreras unfortunately get stuck on the outside. But I certainly think that a team composed of this lineup would score an astonishing number of runs. But just how many? Well, using the basic Runs Created formula, we can come up with a good guess as to just how many.

Formula: ((H+BB)*(1B+(2*2B)+(3*3B)+(4*HR)))/(AB+BB)

Total estimated Runs Created: 1024

In context, the team with the most runs in 2007 was the Yankees with 968, and the average across MLB was 777.

So in other words, we have quite an offensive machine on our hands, even including batters from the Pirates, Cardinals and Blue Jays.
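For anyone who wants to try the basic Runs Created formula quoted above on other lineups, here is a minimal Python sketch. The function name is made up for illustration, and the stat line in the example is a round, invented set of team totals chosen only to show the arithmetic; it is not the actual composite line behind the 1024 estimate.

```python
def runs_created(h, bb, singles, doubles, triples, hr, ab):
    """Basic Runs Created: (H + BB) * total bases / (AB + BB)."""
    # Total bases weight each hit type by the number of bases it is worth.
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return (h + bb) * total_bases / (ab + bb)


if __name__ == "__main__":
    # Hypothetical team totals (NOT real data): 1,500 hits in 5,600 AB,
    # made up of 1,000 singles, 300 doubles, 30 triples and 170 HR,
    # plus 550 walks.
    estimate = runs_created(
        h=1500, bb=550, singles=1000, doubles=300, triples=30, hr=170, ab=5600
    )
    print(f"Estimated Runs Created: {estimate:.0f}")
```

Plugging in the summed stats for the eight lineup spots would give the composite estimate for a lineup like this one.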

Who can beat my two aces?
http://somebaseballnotes.com/2008/03/07/who-can-beat-my-two-aces/
Fri, 07 Mar 2008 22:25:56 +0000, Ryan Kirksey

I work at a place that often gets involved in the political arena, specifically policy recommendation and research, so times like these are often quite entertaining and quite busy. My past couple of weeks have been spent working on plans for various presidential candidates we have invited to come and also hosting an event for Senator John McCain ahead of the Texas Primary. While I am not using all of that as an excuse for the delay in writing, I am using it as a segue into what I will discuss today.

You see, when you host an event for a presidential candidate, there are always questions from the guests or from the audience that they would like answered. Inevitably, the two questions always raised are “would you have done X differently if you were president at that time,” and “if you are president, what will happen when you are faced with X problem?”

The candidates are a little more comfortable with the first question because hindsight is always 20/20 and they can come up with a solution that most people would approve and state how much better their solution is than the one that was made. Conversely, they get a little bit more uneasy when it comes to the second question: there are no decisions that have already been made about the hypothetical problem, thus nothing to base their answer on. And who knows, maybe they will be faced with this same problem in office. Do they stick with their answer even though it may not be the best one, or decide differently and risk looking like a liar or a flip-flopper?

I think the same holds true for baseball. It is a bit easier to look back and plug in a different solution/player/strategy than to predict the course of action for a hypothetical game situation or how a season will play out.

And thus is the nature of projections - a lot is usually right and a lot is usually wrong. However, it’s much easier to look back, take numbers we know are facts, and plug in a few new variables to make educated guesses than it is to base future numbers on unknowns.

And with that we turn our attention to the two new aces of the National League: Johan Santana and Dan Haren of the Mets and Diamondbacks, respectively. Fortunately, for the purposes of this research, both of their new teams were involved in tight races towards the end of the season, with the D’backs turning out a lot better than the now-famous collapse by the Mets the last 17 games of 2007.

While we can’t know for sure how these pitchers will perform in 2008, can we at least try to plug them onto their teams last year and see what kind of difference they would have made? Would it have caused Arizona to miss the playoffs? Caused the Mets to make the playoffs? And what is the best way to find out?

Well, also fortunate for us, we know exactly who these two new pitchers will be replacing on their new teams. Haren will replace Livan Hernandez in the rotation (who left for the Twins), while Santana will replace Tom Glavine (now with the Braves). Otherwise, the rotations seem to be the same.

With a little tweaking, and some playing with the numbers, adjusting them from league to league, I think we can tell how Haren and Santana might have affected their new teams had they been pitching instead of Hernandez and Glavine. Comparatively, Haren had 34 starts to Hernandez’s 33, and Santana started 33 to Glavine’s 34 - so we almost come out even there already.

Here is what I think we should do:

First, we will remove one perfectly average game from Haren’s line and add one perfectly average game to Santana’s so that each matches the number of starts (33 or 34) made by the pitcher he is replacing (I want to leave in the best and worst games because those are what make a pitcher’s season and define his consistency. See Ron Shandler’s PQS scores for more on that topic).

Second, we will subtract all of the runs allowed by Hernandez and Glavine for their teams last year from the team’s runs allowed total. We will work with both earned and unearned runs here so that the defensive aspect stays constant - it is something pitchers cannot control.

Third, we have to add back in to the teams’ runs allowed totals the number of runs allowed by Santana and Haren last year. This is where it gets a bit tricky and where we have to adjust for context. In 2007, the average ERA in the AL was 4.50. In the NL, it was 4.43. So, the AL was about 2% tougher for pitchers than the NL. Keeping unearned numbers the same, we can adjust Santana’s and Haren’s earned run totals by that 2% to get a sensible estimate of what each pitcher would have done in the NL.

We will then check each team’s actual 2007 won-loss record compared to their expected won-loss record using runs scored vs. runs allowed and the Pythagenpat formula: X = ((rs+ra)/g)^.285 for the exponent and then rs^X/(rs^X + ra^X) for winning percentage. It has been documented that Clay Davenport, who modified Bill James’s original Pythagorean formula for win/loss records, believes this Pythagenpat method is an even further improvement, so we will use that one. We will see how many wins better or worse the two teams were in 2007.

Then using the new runs allowed totals and adding them back into their new teams’ 2007 numbers, we can plug these in for runs allowed, adjust for the number of games better or worse they were above the expected outcome, and see where each team would have ended their 2007 regular season. Would the Mets have held off the Phillies? Would the D’Backs have won the division outright? Won the Wild Card? Missed the playoffs?

Here’s the math, starting with Santana:

2007 Mets - 88-74 record - 804 runs scored, 750 runs allowed for Pythagenpat record of 86-76 - two games better than projected
Glavine accounts for 102 runs - subtract from 750 to get 648
Santana accounts for 88 real runs in 2007, 81 earned
Add three earned runs to Santana’s total (an average start for Johan) to make him equal to 34 starts
Santana now has 91 total runs, 84 of them earned
Take 2% away from 84, leaving him with 82 earned runs, 89 total runs
Add 89 back into the 648 left for runs allowed for 737
Pythagenpat formula:
X = ((804+737)/162)^.285, X = 1.90
W% = 804^1.90/(804^1.90 + 737^1.90), W% = .541
New Pythagenpat record: 88-74
New actual record, 2 games better: 90-72

So now the Mets hang on and beat the Phillies (89-73) by one game to represent the NL East in the playoffs. And the 17-game collapse is all but forgotten. Until they get swept by the D’Backs.

And now for Haren:

2007 Diamondbacks - 90-72 record - 712 runs scored, 732 runs allowed for Pythagenpat record of 79-83 - 11 games better than projected
Hernandez accounts for 116 runs - subtract from 732 to get 616
Haren accounts for 91 real runs in 2007, 76 earned
Subtract three earned runs from Haren’s total (an average start for Haren) to make him equal to 33 starts
Haren now has 88 total runs, 73 of them earned
Take 2% away from 73, leaving him with 72 earned runs, 87 total runs
Add 87 back into the 616 left for runs allowed for 703
Pythagenpat formula:
X = ((712+703)/162)^.285, X = 1.855
W% = 712^1.855/(712^1.855 + 703^1.855), W% = .506
New Pythagenpat record: 82-80
New actual record, 11 games better: 93-69
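If you would rather check the arithmetic with a script than by hand, the steps above can be sketched in Python. The function names are my own, and the 0.98 league factor is simply the 2% adjustment described earlier; treat this as a rough sketch of the method in this post, not a polished tool.

```python
def pythagenpat_win_pct(rs, ra, games=162):
    """Pythagenpat: exponent X = ((rs + ra) / g)^.285, then rs^X / (rs^X + ra^X)."""
    x = ((rs + ra) / games) ** 0.285
    return rs ** x / (rs ** x + ra ** x)


def swap_starter(team_ra, old_runs, new_runs, new_earned, league_factor=0.98):
    """Remove the old starter's runs, then add the new starter's runs with
    earned runs scaled by the league factor and unearned runs left alone."""
    unearned = new_runs - new_earned
    adjusted_earned = round(new_earned * league_factor)
    return team_ra - old_runs + adjusted_earned + unearned


def projected_wins(rs, ra, actual_wins, new_ra, games=162):
    """New Pythagenpat wins plus the team's original over/under-performance."""
    luck = actual_wins - round(pythagenpat_win_pct(rs, ra, games) * games)
    return round(pythagenpat_win_pct(rs, new_ra, games) * games) + luck


# Mets: Glavine (102 runs) out, Santana (91 runs, 84 earned after the extra start) in
mets_ra = swap_starter(750, 102, 91, 84)
mets_wins = projected_wins(804, 750, 88, mets_ra)

# D'Backs: Hernandez (116 runs) out, Haren (88 runs, 73 earned after removing a start) in
dbacks_ra = swap_starter(732, 116, 88, 73)
dbacks_wins = projected_wins(712, 732, 90, dbacks_ra)

print(mets_ra, mets_wins, dbacks_ra, dbacks_wins)
```

Rounding at each step the same way I did by hand, this reproduces the 737 and 703 runs allowed totals and the 90-win and 93-win seasons.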

The D’Backs had the best record in the NL to begin with, edging out the Rockies for the NL West title and finishing one game ahead of Philadelphia, so it might not look like the swap would have affected Arizona’s season much, let alone their sweep of the Cubs or their being swept by the Rockies in the NLCS. But Hernandez did start game 3 of the NLCS, losing it 4-1. Who knows what would have happened if Haren had started that game (especially since Arizona only scored once). But a 2-1 deficit at that stage would have been much less daunting than being down 3-0 with another game to play at Coors.

So while this is not ground-breaking stuff by any means, don’t be surprised when these guys make a significant difference on their clubs this year, especially if races end up being close like in 2007. It’s impossible to know for sure what will happen this time, but it just goes to show that one guy could make a difference between the playoffs and going home.

If you catch any errors in my math, please let me know.

]]>
http://somebaseballnotes.com/2008/03/07/who-can-beat-my-two-aces/feed/ rkirksey
Two guys taking on clutch http://somebaseballnotes.com/2008/01/23/two-guys-taking-on-clutch/ http://somebaseballnotes.com/2008/01/23/two-guys-taking-on-clutch/#comments Wed, 23 Jan 2008 17:43:24 +0000 Ryan Kirksey http://somebaseballnotes.com/2008/01/23/two-guys-taking-on-clutch/ ]]>

If I hear one more person talk or write about how clutch-hitting ability or the perception of clutch hitting is the most debated and written about statistical anomaly in baseball, I might start going a little crazy. So much has been written and discussed on this topic over the past 20 years that it is getting somewhat ridiculous these days. There is the side that believes certain hitters, whether they are typically good or not-so-good hitters, can somehow routinely deliver in the most crucial of circumstances, and then there is the side that believes it is just about perception, and the people who are commonly referred to as “clutch hitters” are only defined as such because they have performed well a few times maybe on the biggest of stages and they stick out in our minds as having this unique ability.

But finally, two different sites are asking people to put their money where their mouths are when it comes to clutchiness (only one involves actual money, but still…).

The famous sabermetric blogger Tom Tango, aka Tangotiger, is challenging readers, clutch advocates, and the like to a simple contest to see if clutch hitting really can be predicted or continued after it is recognized. In this post on his website, Tango challenges the following:

He wants fans, readers, critics, or whoever to pick the one guy from “their team” who they feel is the most clutch, assuming they believe in that sort of thing: the one guy they would want to have come up in the most crucial, pressure-filled situations. That player is then nominated for their side. Tango will then pick who he thinks is the best hitter from that same team to compare at the end of the season.

Recognizing that some teams will have one player represent both sides (i.e. the Cardinals with Albert Pujols), the thought is that if that happens he will use the next most requested clutch hitter and who he perceives to be the next best hitter on the team. And this will go on down the line until there is a difference in who the readers pick and who Tango picks. Once this is done, you will be comparing 30 “clutch” hitters to 30 good hitters and Tango is predicting that his hitters will come out on top in the clutch situations.

How will he measure it? A while back Tango invented a measurement tool for in-game situations called Leverage Index. Essentially, it takes each moment in a game, a batter’s at-bat or a pitcher’s confrontation with that hitter, and places it in the context of the game. As the game gets closer to the end, the LI goes up for the hitter or pitcher because you have less time to put your team ahead, tie the score, hold the lead, etc. At the beginning of a game when there is no score, there may be an LI of 1.0 or lower per situation. A batter coming up in the 9th with a tie score could have an LI of 10.0 or so. So games that are blowouts by the 8th or 9th inning would have low LI scores for each situation, but tied games that progress into the 6th, 7th, 8th, and 9th innings would have increasingly higher LI scores.

Tango proposes that we compare the plate appearances with the 50 highest LI scores for each player that is chosen for the project. So you would end up comparing 1500 plate appearances for the “clutch” side to 1500 plate appearances for the “Tango” side. At the end of the season, he will look at the aggregated lines of the two groups to see which performed better in their most crucial situations.
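To make the mechanics concrete, here is a rough Python sketch of that comparison. The data layout (a list of plate appearances per player, each a leverage-index value paired with a True/False for reaching base) is my own assumption, and I am using a simple on-base rate as a stand-in for whatever batting line Tango ends up aggregating.

```python
def top_leverage_line(plate_appearances, n=50):
    """Keep a player's n highest-leverage plate appearances.

    plate_appearances: list of (leverage_index, reached_base) tuples.
    Returns (times_on_base, plate_appearances_kept).
    """
    top = sorted(plate_appearances, key=lambda pa: pa[0], reverse=True)[:n]
    times_on_base = sum(1 for _, reached in top if reached)
    return times_on_base, len(top)


def pooled_on_base_rate(players, n=50):
    """Pool the top-n high-leverage plate appearances across a group of players."""
    on_base = 0
    total = 0
    for plate_appearances in players:
        reached, kept = top_leverage_line(plate_appearances, n)
        on_base += reached
        total += kept
    return on_base / total if total else 0.0


# Tiny made-up example with n=2 instead of 50
clutch_side = [
    [(0.5, True), (3.0, False), (2.0, True)],
    [(1.0, True), (4.0, True)],
]
print(pooled_on_base_rate(clutch_side, n=2))
```

Run it once for the “clutch” group and once for the “Tango” group and you have the two aggregated lines to compare.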

He believes if the players actually chosen as clutch hitters do perform better in those situations, there will be some sort of statistical significance separating the two groups. Not to spoil the surprise, but he is not expecting that separation to be there.

In a related story inspired by Tango’s challenge, blogger Phil Birnbaum has laid out his own challenge to clutch advocates in this post on his website. His project is a little more risky on his part and actually involves money changing hands. The details look like this:

Birnbaum is not proposing to compare clutch hitters to other good hitters, but rather perceived clutch hitters to proposed choke hitters - those who absolutely fail in pressure situations. He challenges readers to pick any number of clutch and choke hitters: one vs. one, 30 vs. 30, 100 vs. 100, whatever. And he has finally settled on odds for the bet - 2:3. So if your clutch hitters out-perform your choke hitters, he will pay you $10, but if your choke hitters out-perform your clutch hitters, you have to pay Birnbaum $15. The reason he does not offer 1:1 odds is that accepting those odds would basically be saying that whether or not a player is clutch is essentially “a coin flip”, and this bet is supposed to attract those who believe clutch hitting exists.

If you accept this bet, you can define the players, you can define the amount of money, you can define the metric used to measure the players (as long as it revolves around batting average, such as BA in close and late, or LIPS), and you can control your sample size. Hitters suggested with obviously different skill levels (someone wanted to use Ortiz vs. Kevin Millar) will be judged on clutch differentials from seasonal numbers and not overall performance in the defined situations.

What he wants to see is if there is anyone out there who believes there is at least a 60% probability (hence the 2:3 odds) that you can predict hitters who will perform extraordinarily in the clutch. Some have accepted and some have declined, but there has definitely been some action on this post. You can also make the bet for charity, with the loser paying the amount to the charity of the winner’s choice. Email Phil Birnbaum from his website, http://sabermetricresearch.blogspot.com/, if you want to take part.
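The 60% figure falls straight out of the stakes: a bet that wins $10 and loses $15 breaks even when 10p = 15(1 - p), which gives p = 0.6. A quick Python check (the function name is my own):

```python
def break_even_probability(win_amount, lose_amount):
    """Win probability at which the bet's expected value is zero:
    p * win_amount = (1 - p) * lose_amount."""
    return lose_amount / (win_amount + lose_amount)


print(break_even_probability(10, 15))  # Birnbaum's 2:3 bet
print(break_even_probability(10, 10))  # an even-money bet
```

At even money the break-even point is 50%, which is exactly the coin flip Birnbaum wants to avoid.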

As you know, clutch is a very tricky thing. It has been shown to have patterns across seasons but not necessarily across careers, and every new study that comes out seems to contradict or challenge the rest. And you have so many competing ways to measure it, that it all gets lost in the shuffle anyway. Perhaps, at least for 2008, these two projects will use some real-life examples to put some of the issue to rest. Until it all comes up again next year, that is….

]]>
http://somebaseballnotes.com/2008/01/23/two-guys-taking-on-clutch/feed/ rkirksey
Projecting projections http://somebaseballnotes.com/2008/01/07/projecting-projections/ http://somebaseballnotes.com/2008/01/07/projecting-projections/#comments Mon, 07 Jan 2008 22:55:54 +0000 Ryan Kirksey http://somebaseballnotes.com/2008/01/07/projecting-projections/ ]]>

It seems quite hard to believe, but we are only three weeks from February 2008. And since February is still the month when pitchers and catchers report, we are drawing very close to a new baseball season.

After the first of the year, many, MANY websites and media outlets will begin posting predictions and projections for the 2008 season: who will win the divisions, win the big awards, have the best fantasy seasons, etc. And while most of those are fun to read on a slow work day, they are just the guesses of the writer or team of writers that put the information together, and are very rarely based on anything quantitative or conclusive. You will get a lot of people saying A-Rod or Ortiz or Miguel Cabrera will win the AL MVP next year. Wow. Was that hard to come up with? My wife could tell you that much. It’s the projections that are actually based on something other than gut feelings and conjecture that are truly interesting to look at.

And that’s what I want to explore today. Most of what you will read below is still very much unknown by John and Jane Baseball Fan, who rely on their local paper and ESPN for projections of a new season. But in the past few years, a few dedicated people and ambitious websites have taken on the goal of trying to come up with the most accurate projections possible. Whether the goal is winning a fantasy title, creating an accurate formulaic projection, or just the desire to be right, all the examples you will see here want the title of the most accurate season forecaster.

And while it may seem intimidating to many (including myself) to stop thinking of projections in terms of, “well this player is a little older, he won’t do as well this year,” or “this team added some great hitters, they will definitely score more runs (ahem, Astros),” and start thinking of things like park effects and team run differential and pitcher LD/FB ratios, in the end we will come up with a much better product.

So let’s get started with the most familiar of the bunch first:

The Bill James Handbook 2008 & BIS Pitcher Projections

Every year, Bill James and his partnering organization, Baseball Info Solutions, release a set of projections for the upcoming season. These are usually the first to arrive, and if you search the player pages on www.fangraphs.com, you can find the simple projections for most players next season (more on that in a minute).

What these projections offer is pretty simple. For hitters, you will find your basic projections for stats like average, RBI, HR, plate appearances, hits, walks, OBP, SLG, OPS, etc. But, there is also valuable information found in projections for less popular stats like BB%, ISO, BABIP, Runs Created, and RC/27. Lance Berkman, for example, is projected to have a .954 OPS and create 124 runs in 2008. And those are numbers not based on Miguel Tejada in the lineup. Your pitching stats will encompass the normal wins, losses, ERA, innings, strikeouts, etc. But also available are K/9, HR/9, WHIP, and pitcher BABIP numbers amongst others.

Without the exact numbers and formula, all I can say is that the projections for players in the Handbook are based on past performance, and then three things are factored in: age, projected playing time, and park effects. The only downside to the Bill James projections is that they are the first projections out, usually by the first of December, and therefore cannot account for things like recent trades, injuries during spring training, or anything else that might affect where a player plays or how much he plays. And some numbers are not available for rookies of the previous year. So if you want to know how Troy Patton will perform in 2008, you will have to look elsewhere.

You can pick up the Bill James Handbook at the publisher’s site here for $21.95.

PECOTA - Baseball Prospectus

PECOTA, which stands for Player Empirical Comparison and Optimization Test Algorithm, is the creation of Nate Silver at Baseball Prospectus and is possibly the most comprehensive projection system out there. It covers not only every major league player, but also projections for team records for the upcoming season.

Using a vast series of computations and algorithms including the not-so-well-known reliever leverage, BatDelta, PitDelta, and others that are incomprehensible to a run-of-the-mill numbskull like me, every player currently in a major league uniform is given a projection for the next season. Trying to simplify it all, the process works like this:

Using some of the same methods such as projected playing time, age, park index, risk for injury, past performance, etc., a projection is made for each hitter and pitcher.

The team totals for hitters in stats like Avg, OBP, SLG, are used and plugged into David Tate’s Marginal Lineup Value Formula, and an estimate of runs scored for each team is then spit out taking all of the factors for each hitter into account. So, an estimate for a team is based on the 650 PAs the everyday leadoff hitter might have, plus the 75 PAs a bench player might have who rarely gets in a game, and everyone in-between. Even forecasts for stolen bases are included.

For pitchers, the exercise is simpler, as they just total up the projections for runs allowed from each pitcher on the staff and tally those together for a team. You now have a team where you guess the number of runs they will score and the number of runs they will allow.

These runs scored and runs allowed totals for each team are then plugged into the Pythagenport formula, which translates those numbers into a win-loss total for every team in the Majors. Once you have those, it is easy to see who the formula thinks will have the most wins, losses, etc. for a season, and you can rank them by division, which BP does before every season.

PECOTA usually goes as far as to project up to five seasons for all players that are input in the calculations. But, because of the variance of numbers from year to year, players changing teams, and schedules that shift from year to year, the team won-loss records are typically done a year at a time.

The criticism I hear about PECOTA is that it very often seems to be too conservative. But some of that criticism tends to be unfounded. Using Berkman as an example again, he had that monster 2006 season, but PECOTA projected him at 32 HR and 100 RBI for 2007. This was laughed at more than once, but sure enough, Berkman finished the season at 34/102.

Every player has a PECOTA page on Baseball Prospectus, but you have to pay for a membership to be able to view them. But I can vouch that it can be very helpful for fantasy purposes. Team record projections are usually posted in an article on the front page, but also can only be fully viewed by members. A one-year subscription is $34.95 for the site.

Ron Shandler & Baseball Forecaster

Ron Shandler is the fantasy guru and stats projectionist made famous in the book Fantasyland by Sam Walker (a great read by the way) and the famed experts fantasy league, Tout Wars. Shandler’s Baseball Forecaster website (www.baseballhq.com) claims that it is “Since 1986, the industry’s leading resource for creating fantasy baseball winners.”

Shandler claims to be the first to integrate sabermetrics, including some stats of his own creation, into fantasy baseball projections and predictions. And his numerous expert league titles and the awards won by his website and projections definitely show that he is one of the best in the business, if not the best.

The mantra that Shandler uses to lure people to his product is that winning in fantasy is NOT, in fact, all about luck. Any fantasy player will try and tell you that he was lucky to pick up Magglio Ordonez in the 11th round last year or something similar, but Shandler doesn’t see it that way. He believes luck can be managed and even predicted.

In a world of fantasy- and real-life-based projections that are clouded by simple and non-descriptive stats, he takes what he calls a “components-based approach” to player projections. While a lot of times on this site, we talk about the numbers beneath the numbers and how certain things lead to more common stats we see like batting average, RBI, ERA, etc, Shandler really gets into the numbers beneath the numbers beneath the numbers of the common Fantasy stats out there.

It can sometimes seem like a daunting task to try and grapple with some of Shandler’s stats like qERA, the RIMA plan, and reliability score, but fortunately with each purchase of his book or online projections, a comprehensive glossary with explanations is included. The final outcome is very comprehensive in its format, showing all stats, rates, and formulas used to pull together the numbers for a specific player, culminating in the final projections for that player for the upcoming season.

This book is also now on sale and can be purchased here for $24.95. And if you order the book directly from Shandler Enterprises, he will also send the book in an online format, complete with PDF and Excel files for all charts; a handy tool for draft day.

There also is some valuable information that can be had for free on the website, beginning with an option to sign up for a free e-newsletter that comes every Friday from January to September with projections, trends, FAQs, etc. Also check out Shandler’s essay on projective accuracy located here.

Marcels and Tangotiger

This set of projections is hot off the presses for 2008 and is available on the incredibly helpful website www.hardballtimes.com. Marcel projections are comically named for Ross Geller’s monkey on the show Friends, as in they are so simple that even a monkey could do them.

Worked out by the famous sabermetric/projectionist blogger Tangotiger, Marcel projections (from the THT website) “simply consist of averaging a player’s previous experience (with greatest weight on the most recent years) and regressing to the major league average depending on the number of years the player has been in the majors. This is done for each component (home runs, doubles, walks, etc.) A simple aging factor is applied, but no park factor.”

Marcels are split into projections for hitters and pitchers. On The Hardball Times, hitters are ranked by default by GPA and pitchers by ERA. GPA (gross production average) is a stat developed by THT that is in the same family as OPS, but more accurate, and it is read on the same scale as batting average when determining good and bad. The formula for GPA is (OBP*1.8+SLG)/4, adjusted for ballpark factor. Each chart also has at least a dozen sortable stat projections for 2008.
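The GPA formula is simple enough to compute yourself. Here is a small Python sketch; the function name and the .400/.500 sample line are my own, and it leaves out the ballpark adjustment, which I do not have the details for.

```python
def gross_production_average(obp, slg):
    """GPA = (1.8 * OBP + SLG) / 4, before any ballpark adjustment."""
    return (1.8 * obp + slg) / 4


# A hypothetical hitter with a .400 OBP and a .500 SLG
print(round(gross_production_average(0.400, 0.500), 3))
```

That hitter's .900 OPS translates to a GPA of .305, which reads like a very good batting average.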

This link specifically lays out how Tangotiger determines Marcels, with numerous helpful comments below the essay that provide more insight into the specifics. He does not use GPA in his projections; that is done after the fact by THT.

Essentially, Marcels counts on a baseline average for all major leaguers and uses that as a starting point for comparing hitters and pitchers and then relies on Tango’s 5/4/3 method of weighting the three previous seasons of data for a player and using that compared to projected plate appearances to come to the final conclusion. One downside is that all rookies with no MLB experience will be rated as being league average, which is not always the case.
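Here is a stripped-down Python sketch of the 5/4/3 weighting and the regression toward league average, for one stat expressed as a per-plate-appearance rate. The function name and the 1200 plate appearances of league-average ballast are my assumptions for illustration (this post does not give the regression amount), and the aging factor is left out entirely, so read Tango's own write-up for the real recipe.

```python
def marcel_style_rate(seasons, league_rate, regression_pa=1200):
    """Weighted, regressed rate for one stat.

    seasons: up to three (events, plate_appearances) tuples, most recent first.
    league_rate: league-average events per plate appearance.
    regression_pa: assumed amount of league-average performance mixed in.
    """
    weights = (5, 4, 3)
    weighted_events = sum(w * events for w, (events, _) in zip(weights, seasons))
    weighted_pa = sum(w * pa for w, (_, pa) in zip(weights, seasons))
    return (weighted_events + regression_pa * league_rate) / (weighted_pa + regression_pa)


# Hypothetical hitter: 30, 25 and 20 HR over three 600-PA seasons, most recent first
veteran = marcel_style_rate([(30, 600), (25, 600), (20, 600)], league_rate=0.03)
# A rookie with no MLB seasons falls back to the league rate
rookie = marcel_style_rate([], league_rate=0.03)
print(round(veteran, 4), rookie)
```

Multiply the resulting rate by projected plate appearances to get a counting-stat projection. Note how the rookie comes out exactly league average, which is the downside mentioned above.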

Also, in an interesting piece, Tango looks at Marcels relative to other prognosticators (circa 2007) and compares them mathematically to the actual stats and to each other, so it proves helpful to see how these different systems rated against at least one of their peers. Specifically, look at comment number five by Nate Silver.

And while Marcels are free, THT also sells The Hardball Times Season Preview 2008, which includes newly developed player projections by The Hardball Times as well as projections for each team and also for players’ careers. The book is available here and sells for $17.95.

ZiPS and Dan Szymborski

Right now, on the website www.baseballthinkfactory.org, Dan Szymborski is a little more than halfway through his yearly set of team by team articles complete with their ZiPS projections for 2008. ZiPS is a computer-based projection system that stands for sZymborskI Projection System.

Like many others, ZiPS relies on weighted yearly stats (four years for this system). But a difference in this projections system from many others is that Szymborski doesn’t try to compare his numbers and averages to individual players with similar past performances or results, but rather large groups with similar characteristics where he can concentrate on comparing larger sets of data against a more balanced field.

Stats such as BABIP, K-rate, and Speed Score are relied upon heavily in these projections for hitters and pitchers. And like the others, playing time and PAs are based on most recent playing time and PAs for specific players.

Szymborski also projects league offensive totals for upcoming seasons by weighing recent years for the separate leagues and looking at trends in offense, defense and pitching to determine new numbers. Projections for hitters can then be compared to these offensive totals or averages to see how they will compare. He also breaks them down by position in each league to truly compare apples to apples.

ZiPS projections are done for a very large number of players, some of which realistically have no shot at much playing time in the major leagues, but should they get that chance, Szymborski takes all of their factors into account as well and predicts performance should they get a shot. If using these for fantasy purposes, he leaves it up to the reader to determine whether they feel like they should take a shot on someone who may or may not have that playing time.

For now, the ZiPS projections are completely free and accessible through Baseball Think Factory. The articles on each team are a very good read and can easily be found on the front page. Most recently completed are the 2008 White Sox.

So there you have it, 2500 words about some of the most well-known and accurate projections out there. The question I hear the most is “well yeah, but how accurate were they in their projections? Which one should I use?” Well, find out for yourself. Pick up a couple of the books or check out a couple of the websites and look at predictions from last year and compare to the real numbers we now have. You will find a favorite eventually; one that piques your interest based on what you really want to use them for.

If you know of any others that you would like me to research and get some info (like CHONE, Rotowire, etc.), let me know and I will see what I can do.

]]>
http://somebaseballnotes.com/2008/01/07/projecting-projections/feed/ rkirksey
Beauty and the perception of beauty http://somebaseballnotes.com/2007/12/09/beauty-and-the-perception-of-beauty/ http://somebaseballnotes.com/2007/12/09/beauty-and-the-perception-of-beauty/#comments Sun, 09 Dec 2007 05:13:27 +0000 Ryan Kirksey http://somebaseballnotes.com/2007/12/09/beauty-and-the-perception-of-beauty/ ]]>

There are not many things more beautiful to me than a ballpark open for the first time in the spring, or a perfectly executed hit and run, or a majestic home run that clears a park. I can always find beauty in the simplest of forms at a baseball game, and there are not many things that rival what I see at the park.

But something that tops everything on that list is my new baby girl.

I have taken the past eight weeks off from doing something I love that in the end means nothing to spend time with someone I now adore and that now means everything. Now that things are starting to get back to a normal schedule (or as normal as it will be), I hope to be able to pick back up where I left off and get back to some research.

While I will never doubt the beauty of my new daughter, beauty on the baseball field or in the box score is something that has been debated for more than a century. Specifically with statistics, as we have seen in the past, the naked eye can often lie when it comes to observing and, in turn, trying to qualify a “good” player. Everyone knows the old quote from Bull Durham about the difference between a .250 hitter and a .300 hitter:

“…one extra flare a week, a ground ball, a dying quail… you’re in Yankee Stadium.”

Essentially, it’s VERY hard to tell the difference between a mediocre .250 hitter and a great .300 hitter just by watching. So when fans, announcers, managers, or anyone else make general statements about how hitters perform based on what they see or what they believe, it’s always best to take it with a grain of salt.

A situation like this came up towards the end of the 2007 regular season as I was watching an Astros/Brewers game in late September.

In a game that featured two of the Majors’ top rookies for the season, the announcers on Fox Sports began discussing the value that Hunter Pence and Ryan Braun had on their teams this past year. In noting that both of them had very good batting averages (Braun finished the year at .324, Pence at .322) a comment was made along the lines of “rookies will typically hit for a higher average when they arrive in the majors because the quality of the pitchers is much better in the majors and they are able to be around the plate much more than their minor league counterparts.”

I don’t have the transcript of the game in my possession, so please don’t take that word for word, but the general idea is there. That because hitters see more hittable pitches when they come to the majors, they will be better hitters when it comes to average.

So I immediately thought, can this be true? Never mind that pitchers in the majors hit their spots better and their fastballs are faster and their breaking balls have more movement. And forget that defenses are better, travel is more brutal, and playing time for rookies is usually more sporadic; does that actually translate into better stats for rookies when they are facing tougher competition? That got me thinking about 2007 and using it as a case study for rookie production in the majors vs. their minor league numbers.

These broadcasters did not qualify their statement by specifying any level of the minor leagues, so it is pretty easy to pull a list of rookies and their 2007 MLB batting averages and compare them to their minor league career averages. I chose rookies with at least 150 plate appearances so we could see hitters who at least had routine/daily at bats. Here is the list of the 55 who qualified (actually there were 56, but Akinori Iwamura has no minor league stats to work with) ranked in order of their 2007 MLB batting average:

rookie-average-2007.jpg

A simple count of these rookies shows that only 14 out of 55 (or 25%) out-performed their career minor league batting averages in their first major league season. And out of those 14, four of them beat their minor league average by .005 or less. Running a simple correlation of the two sets of numbers shows that the relationship between the two (minors and MLB 2007) is weak and not statistically significant (r=.191, p=.162). Simply speaking, looking at a player’s minor league average before 2007 would not be a good way to predict or even estimate his batting average as a major leaguer in 2007.
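For anyone who wants to repeat the count and the correlation, here is a minimal sketch in Python. The table itself is only available as an image, so the five pairs in `sample` are invented placeholders, not the real figures, and the function names are my own.

```python
import math

def count_outperformers(pairs):
    """Count players whose MLB average beat their minor league average."""
    return sum(1 for minors, mlb in pairs if mlb > minors)

def pearson_r(pairs):
    """Pearson correlation between the first and second value of each pair."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)

# Placeholder (minor league career average, 2007 MLB average) pairs.
# These numbers are invented for illustration, NOT taken from the table.
sample = [
    (0.310, 0.324),
    (0.295, 0.247),
    (0.301, 0.270),
    (0.288, 0.291),
    (0.320, 0.260),
]

print(count_outperformers(sample), "of", len(sample), "beat their minor league average")
print("r =", round(pearson_r(sample), 3))
```

Swapping the real 55 pairs into `sample` should reproduce the count and the r value quoted above, assuming the same data.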

You will always have your studs coming out of the minors who find a way to translate that talent into almost instant success in the majors such as Ryan Braun, Hunter Pence, and Troy Tulowitzki. But does everyone remember all of the experts’ preseason Rookie of the Year, Kansas City’s Alex Gordon? He was actually being hailed as the next Mike Schmidt. But after a few benchings and a .247 average on the year, he did not receive a single vote in the category. And what about Justin Upton, Elijah Dukes, and others who were supposed to pay immediate dividends? There are plenty just like them who did not pan out as originally advertised. And not to say Gordon won’t become Schmidt….just not this year.

So, if average is not a good predictor of success from the minors to the majors, what might be? We need to look at a more cumulative offensive statistic, not just one that says, “I got this many hits in this many at-bats.”

What I want to propose is Runs Created per Game or RC/27. We are all pretty familiar with the stat Runs Created. It simply takes into account a player’s offensive production based on runs he created for himself and for others on his team and tallies it into a calculable, sum total. What RC/27 does is ask the question, “what if there was a whole lineup of X player? How many runs would that lineup score per game?” For example, in 2007, the top three in the category were David Ortiz (surprisingly first at 10.86 runs/game), Alex Rodriguez (10.49), and Magglio Ordonez (10.12). That tells you how good these guys were - can you imagine a team that would average more than 10 runs per game? The Yankees had the highest average in 2007 with 5.83 runs per game (and their best month was September at 6.67).
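To make the idea concrete, here is a rough sketch of the calculation. It assumes the basic Bill James version of Runs Created, (H + BB) × TB / (AB + BB), and counts outs simply as at bats minus hits. Published RC/27 figures generally use more detailed versions of both, so this will not match them exactly. The stat line at the bottom is hypothetical.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (hits + walks) * total_bases / (at_bats + walks)

def rc_per_27(hits, walks, total_bases, at_bats):
    """Runs a lineup of nine copies of this hitter would score per 27 outs.

    Assumption: outs are approximated as AB - H (no caught stealing,
    double plays, or sacrifices).
    """
    outs = at_bats - hits
    return runs_created(hits, walks, total_bases, at_bats) * 27 / outs

# Hypothetical season: 500 AB, 150 H, 50 BB, 250 TB.
rc = runs_created(hits=150, walks=50, total_bases=250, at_bats=500)
per_game = rc_per_27(hits=150, walks=50, total_bases=250, at_bats=500)
print(round(rc, 1), "runs created,", round(per_game, 2), "per 27 outs")
```

Dividing by outs rather than games is what lets a part-time rookie be compared with an everyday player.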

Anyway, RC/27 takes into account not only the runs created by the batter himself but also the opportunities presented to that player by teammates and how he performed in those circumstances. Using the same 55 players, here is the list of their career minor league RC/27 numbers vs. their numbers in their rookie seasons of 2007:

rookie-rc-27-2007.jpg

Running the correlation again, we see that the relationship between minor league and 2007 MLB RC/27 IS statistically significant, if only just (r=.268 and p=.05). So while not perfect, Runs Created per Game would be a more reliable stat than batting average to judge performance across levels of competition.
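The p-values for both correlations can be checked from just r and the sample size. The sketch below assumes the usual two-tailed t-test for a correlation coefficient with n = 55 players, which may or may not be exactly what the stats package did. It uses only the standard library and integrates the t distribution numerically.

```python
import math

def t_statistic(r, n):
    """t statistic for testing whether a correlation r differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def correlation_p_value(r, n, steps=2000):
    """Two-tailed p-value, using Simpson's rule on the t density.

    steps must be even for Simpson's rule.
    """
    df = n - 2
    upper = abs(t_statistic(r, n))
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t

for label, r in [("batting average", 0.191), ("RC/27", 0.268)]:
    print(label, "r =", r, "p =", round(correlation_p_value(r, 55), 3))
```

With only 55 players, an r of about .27 is roughly the smallest correlation that clears the .05 threshold, which is why the RC/27 result sits right on the line.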

My guess is that this would be partially due to the fact that a player’s pure talent should eventually translate across the levels he plays in, whether good or bad, in looking at how he performs on offense individually. Average only accounts for one piece of the offensive puzzle: how many times did I get a hit in my times at bat? It doesn’t account for walks, what type of hit it was, who was on base, whether they got the hit with one out or two outs, etc.

Another theory of mine is that in the majors, these rookies will be playing and batting in a lineup of players that (should) actually belong in the majors. I imagine that would lead to more consistent opportunities of plate appearances with men on base, men in scoring position, and also competent hitters batting behind them, allowing something like RC/27 to stabilize quicker with less variance than something like average where it is solely reliant upon batter and pitcher; one at bat. But, then again, that’s just my opinion, and the topic of a whole other post with different numbers to crunch.

Unfortunately, this is a difficult study to continue to quantify. The statement proposed by the announcers about the averages in their rookie seasons qualifies the research and limits the set of data we can use for the players. Once their second year comes around, they are not rookies anymore and their MLB numbers can’t be used anymore.

But if someone wanted to take on the task of comparing the numbers from say 1986 to 2006 for rookies and see how they correlate, I would be very interested to see it. Would average then become significant over 20 years? Would RC/27 become less so? I would be curious to know.

Just be sure to always question what you hear if it doesn’t sound right to you. There’s a good chance it’s not based on facts.

And welcome back to Baseball Notes. More to come soon…

]]>
The Worst of the Worst for 2007 (or, anyone can rank the best players, that’s boring) http://somebaseballnotes.com/2007/09/30/the-worst-of-the-worst-for-2007-or-anyone-can-rank-the-best-players-thats-boring/ http://somebaseballnotes.com/2007/09/30/the-worst-of-the-worst-for-2007-or-anyone-can-rank-the-best-players-thats-boring/#comments Mon, 01 Oct 2007 04:47:59 +0000 Ryan Kirksey http://somebaseballnotes.com/2007/09/30/the-worst-of-the-worst-for-2007-or-anyone-can-rank-the-best-players-thats-boring/ ]]>

With another regular season come and gone, you will hear a lot of debate amongst the experts over the next couple of months as to who should win the particular offseason awards. But, let’s face it, besides AL Cy Young and NL MVP, the names are already engraved on the trophies for these prizes.

So I thought it appropriate to recognize those that never get their due, those that won’t even come close to being mentioned, those who don’t even deserve to be mentioned. And what I came up with was the 2007 Worsts Team. This team defines the absolute worst there is in offensive baseball. The most god-awful at each position on the diamond, save the pitcher. But there is one caveat; for some unknown reason, their respective teams stuck with them.

To qualify for this list, you had to qualify for MLB’s offensive categories, which means 3.1 PA per team game, or 503 PA on the season. That means that no matter how bad the player was (and we will see some bad ones), their team sent them to the plate more than 500 times over the course of the season.

The stats I will be using are Value Over Replacement Player and Runs Created for the season (remember, if you need a definition of each, check out the Stats Glossary tab). I looked at the list of the worst qualified performers in each and took the composite score of the two rankings for each player on this list, ranked by position. With only one exception, you will see the eight players with the worst overall rankings in VORP and RC for 2007. One for each position.
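Here is a small sketch of how a composite ranking like this can be computed. It assumes the composite score is the plain average of a player's worst-first rank in VORP and in RC. It uses only the eight VORP and RC figures listed further down in this post, so the scores it prints will not match ranks taken from the full list of qualifiers. Ties are not handled.

```python
# (VORP, RC) for the eight players named later in this post.
players = {
    "A.J. Pierzynski": (9.8, 53),
    "Kevin Millar": (12.1, 73),
    "Jose Lopez": (-10.9, 48),
    "Tony Pena, Jr.": (-7.8, 47),
    "Nick Punto": (-26.9, 41),
    "Jason Bay": (4.2, 78),
    "Andruw Jones": (5.2, 74),
    "Brian Giles": (10.8, 72),
}

def worst_first_ranks(values):
    """Rank names from worst (1) to best; assumes no tied values."""
    ordered = sorted(values, key=values.get)
    return {name: position + 1 for position, name in enumerate(ordered)}

def composite_scores(stats):
    """Average each player's VORP rank and RC rank (assumed method)."""
    vorp_rank = worst_first_ranks({n: s[0] for n, s in stats.items()})
    rc_rank = worst_first_ranks({n: s[1] for n, s in stats.items()})
    return {n: (vorp_rank[n] + rc_rank[n]) / 2 for n in stats}

scores = composite_scores(players)
for name in sorted(scores, key=scores.get):
    print(f"{scores[name]:.1f}  {name}")
```

A lower score means a worse season, so the top of the printed list is the worst of the group.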

First, the boring stuff. Here are the top ten for both VORP and RC in 2007:

VORP
Alex Rodriguez - 95.1
Hanley Ramirez - 90.2
David Ortiz - 86.6
Magglio Ordonez - 85.7
David Wright - 81.3
Chipper Jones - 76.6
Matt Holliday - 74.2
Albert Pujols - 73.8
Jorge Posada - 73.8
Miguel Cabrera - 71.4

RC
Alex Rodriguez - 164
David Ortiz - 156
Magglio Ordonez - 149
Matt Holliday - 148
David Wright - 146
Prince Fielder - 143
Hanley Ramirez - 142
Albert Pujols - 133
Carlos Pena - 132
Jimmy Rollins - 132

These end up being pretty consistent lists, with seven of the top ten being the same in both lists (and all of the top ten in VORP are in the top 20 of RC). But that’s not why you are here.

So without further ado, the worst of 2007:

Catcher

A.J. Pierzynski
9.8 VORP, 53 RC

Only nine catchers qualified with at least 503 PAs, and five of those beat 503 by 15 PA or fewer, but Pierzynski was the worst of the lot. With an OPS that barely reached .700, Pierzynski was one of several White Sox who succumbed to their inevitable decline after great years in 2006. His average dropped by more than 30 points from the previous year and he ended up being only 10 runs better than the catchers Chicago had on the bench.

First Baseman

Kevin Millar
12.1 VORP, 73 RC

A major league first baseman with 16 HR and 62 RBI over 558 PA might be OK if he was playing for the 2007 Yankees or Red Sox, and they didn’t need his bat. But that is not what the 2007 Orioles were. Baltimore ranked 10th in the AL in OPS, and could have used a first baseman with some legitimate power. His 75 walks are commendable, however.

Second Baseman

Jose Lopez
-10.9 VORP, 48 RC

Yes, that is a negative VORP you see there, meaning any old scrub playing second would have been 11 runs better than Lopez given the same PAs over the season. 538 PAs for this kind of production is inexcusable, as his VORP and RC numbers were both in the bottom five for all of MLB. The fact that Seattle was once so close to a playoff spot makes this even more of a head-scratcher. Why would they leave Lopez in the lineup for so long?

Shortstop

Tony Pena, Jr.
-7.8 VORP, 47 RC

Another player with bottom five numbers in both VORP and RC for all of MLB - and that is over 533 PAs. Actually Pena and Omar Vizquel both had the same composite ranking score of 4.5 on the lists, but since Vizquel is once again tops on everyone’s list of the best defensive shortstops, the award goes to Pena.

Third Baseman

Nick Punto
-26.9 VORP, 41 RC

Here we are, the worst of the worst for 2007. Punto, in 533 PA, was the worst in both of these categories while batting .211 and posting an OPS below .600. This means the Twins must have had absolutely NO ONE on the bench who they thought could replace Punto, because his numbers are by far the worst amongst MLB regulars in 2007. Forget talking about extra runs here; the Twins could have had almost three extra wins if Punto had never been in the lineup and some other replacement-level player was. A horrible year.

Left Fielder

Jason Bay
4.2 VORP, 78 RC

The rotisserie darling of so many for two years, Bay struggled mightily this year, batting only .248 in 2007 with an OBP of .328 and SLG of only .419. His power numbers of 25 doubles and 21 HR dropped significantly compared to the past two years. And forget the 21 SB from two years ago. He had but four this year. And he stayed in the Pirates lineup all year, totaling 613 PA over the season.

Center Fielder

Andruw Jones
5.2 VORP, 74 RC

Here is the one exception to my rule of the composite rankings, because Bill Hall actually was worse than Jones according to those rankings, but Hall totaled 503 PA (for his 6.6 VORP and 60 RC) while Jones did it over 659 PA, the most for any player on this list. So much for players having that extra little something in contract years. Jones has not had numbers this poor since his 20-year-old rookie season of 1997. His strikeout rate increased this year as well; he finished with the third highest total of his career. See my other thoughts on Jones here.

Right Fielder

Brian Giles
10.8 VORP, 72 RC

Another former All Star makes the list in 2007. I wonder if moving Giles to the leadoff spot midway through the year had anything to do with his decline in numbers this year - I guess we will have to see what the Padres do in ‘08. Missing a significant amount of time due to injury surely hurt Giles here, and could move Delmon Young up to this spot, but Giles did post the lowest OBP and SLG numbers of his career when he was in the lineup. Even Giles’ BB rate, something he has been famous for, dropped to a career-low 11.6% this year.

So there you have it: My Worst of the Worst team for 2007. Not surprisingly, only one of these players is on a (potential) playoff team - Giles. Teams with a hole as big as these players in their lineups generally will have a tough time making up for the missed production elsewhere, particularly in the NL, where these batters are always asked to bat higher than ninth.

Any disagreements? Let me know in the comments.

]]>
The defensive spectrum and its offensive correlation http://somebaseballnotes.com/2007/09/21/the-defensive-spectrum-and-its-offensive-correlation/ http://somebaseballnotes.com/2007/09/21/the-defensive-spectrum-and-its-offensive-correlation/#comments Fri, 21 Sep 2007 19:05:40 +0000 Ryan Kirksey http://somebaseballnotes.com/2007/09/21/the-defensive-spectrum-and-its-offensive-correlation/ ]]>

Since the first Baseball Abstract was published some 30 years ago, Bill James has been labeled a lot of things: revolutionary, heretic, genius, fraud, etc. His analysis and research have been praised and trashed at the same time. His work has been studied and acutely used by some big league clubs, and laughed at by others.

He has been credited for starting the suddenly not-so-underground sabermetric style of baseball analysis, and accused of reducing players and games down to mere data, not taking into account a manager’s keen eye or a player’s “makeup.”

Some of his research has even been labeled as unfounded, unusable and as something with no data to back it up. One of these innovations is the defensive spectrum.

Quite simply, it is a spectrum drawn out from all defensive positions ranging in order of the least difficult to play to the most difficult. The idea is that a player can easily go from right to left on the spectrum as he gets older and some of his speed diminishes, but it is much harder to go from left to right, no matter what point of his career the player is in. It looks like this:

1B - LF - RF - 3B - CF - 2B - SS - C

(We will leave pitchers out of the equation since this will inevitably be related to offense, but they typically fall on the far right)

With no easy way to compare defensive difficulty across positions, this spectrum looks pretty good at first glance. Obviously the skill set of a catcher is much more rare than that of a second baseman, which is much more rare than what a first baseman can provide.

Also associated with this spectrum, James said, is an expected offensive production. As you move more to the left on the chart, and you are not expected to provide as much defensively, you better be carrying your weight offensively. And that makes sense at first glance. Any baseball fan or Rotisserie player can tell you, your sluggers tend to be your first basemen, outfielders, and third basemen, while you see very few guys who play up the middle slug 40 homers or accumulate an OPS of 1.000.

What I am going to look at is whether offensive expectations have changed since this spectrum was first introduced in the 1980s. Or, more precisely, how accurate is the assumption that, just as catchers offer more defensively than shortstops, who offer more than second basemen, etc., first basemen offer more offensively than left fielders, who offer more than right fielders, etc.?

You see, while we can look at metrics such as zone rating and Fielding Runs Above Average, there is no way to compare specific, defined skill sets like blocking wild pitches or turning the double play or routes taken to flyballs across positions. But with offense, we can equally compare contributions. We know the sole objective of offense, which is to create runs for your team and do this at a better rate than the other team. That is the responsibility of every player whether they are a catcher, DH, right fielder, or whatever. And, yes, there may be a number of different ways to accomplish that goal of pushing runs across the plate, but the eventual desired outcome is always the same: score runs.

So, for 2007, let’s see how close James’ spectrum is to predicting its reciprocal: offense.

From what I can tell, a little research was done on this a few years ago. This link takes you to a post where Mike Mehl used the OPS of the top players in the 2003 season to plot the offensive range by position. His conclusions were that the majority of your low OBP, low SLG batters fall on the right side of the spectrum, with only a few exceptions. And most of the batters on the opposite side had high OPS, although admittedly skewed by the Barry Bondses of the world.

For this post, I am going to use Runs Created and VORP as the comparative metrics for offense. Runs Created because it is an easy way to compare apples to apples. A run is a run is a run. Did you create a lot of runs or did you not? And VORP because it compares each player to replacement level at his own position. There are stated, specific offensive factors when discussing VORP for each defensive position and its relative value of replacement level. All OF positions are weighted the same in VORP, but that will suffice for our exercise.

What I have done is take the average number of Runs Created and VORP for the top 30 in each of the eight listed positions of the spectrum. I then charted it on a graph in the order of the positions on the spectrum. So, if the theory is correct (at least for this year), we can expect to see two lines that go from the upper left corner of the graph to the bottom right. Below is the graph and the numbers represented with each. Click on the graph link to show the full image.
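The averaging step can be sketched like this. The real player data is not reproduced in the post, so the RC values in `placeholder_players` are invented purely to show the mechanics, and the function names are my own. The same approach would work for VORP.

```python
from statistics import mean

# Spectrum order from the post, easiest position to hardest.
SPECTRUM = ["1B", "LF", "RF", "3B", "CF", "2B", "SS", "C"]

def position_averages(players, top_n=30):
    """Average the top_n values at each position.

    players is a list of (position, runs_created) tuples.
    """
    by_position = {}
    for position, value in players:
        by_position.setdefault(position, []).append(value)
    return {
        pos: mean(sorted(by_position[pos], reverse=True)[:top_n])
        for pos in SPECTRUM if pos in by_position
    }

def breaks_in_trend(averages):
    """Adjacent pairs where the harder position out-produced the easier one."""
    ordered = [pos for pos in SPECTRUM if pos in averages]
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if averages[b] > averages[a]]

# Invented placeholder values, NOT real 2007 numbers.
placeholder_players = [
    ("1B", 110), ("1B", 90), ("LF", 95), ("LF", 85),
    ("RF", 100), ("RF", 70), ("3B", 88), ("3B", 76),
    ("CF", 80), ("CF", 74), ("2B", 78), ("2B", 66),
    ("SS", 92), ("SS", 72), ("C", 60), ("C", 50),
]

averages = position_averages(placeholder_players)
print(averages)
print("Breaks in the downward trend:", breaks_in_trend(averages))
```

If the spectrum predicted offense perfectly, `breaks_in_trend` would return an empty list. Any pair it does return is a spot where the line on the graph ticks upward.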

rc-and-vorp-2007-positions.jpg

numbers-for-defense-rc-vorp.jpg

So what do we notice here with these numbers? Generally the line we would expect is there, but with a few interesting variations to point out:

1. Shortstops - While this position is still generally considered to be one where you have to have a decent glove to play, you can see that the average RC and VORP for SS in 2007 has now reached 3B and CF levels. Someday, I will do this exercise again for 1987, 1967, and 1947, for example, to see how the numbers compare. But the facts are that there are only 31 players in MLB with over 100 Runs Created, and four of them are shortstops (Ramirez, Reyes, Jeter, and Rollins). These four carry the group, but you still have Guillen, Cabrera, Young, and Renteria with more than 85. And another nine have 70 or more RC. The total of SS over 70 RC is 17, or more than half of MLB. Clearly, this position is becoming one where both defense and offense are valued. But I bet if you polled GMs across the league, they would pick great defense if they were forced to pick one quality their SS had.

Unfortunately, only one of these top eight SS (Reyes at #2) is in the top nine for Revised Zone Rating for 2007. Jeter, Ramirez, and Guillen are all in the bottom 10. So it’s still tough to find a SS who does both well.

2. Right field - RF in 2007 has been the Magglio Ordonez and Vladimir Guerrero show. Can you name a player who is clearly the third best RF this year? It’s not easy. In fact, there is a 26 point gap in VORP between second place (Guerrero - 62) and third place (Corey Hart - 36.9). The same goes for RC: there is a 25 point gap between Guerrero (123) and Abreu in third (98). In fact, there are only eight RF with more than 90 RC while there are seven SS!

But is this a growing trend? Looking back, there have not been more than eight RF with more than 90 RC in a season since at least 2003. Looking at our graph, RF VORP is right where it should be based on those numbers, while Magglio’s astounding 144 RC skews that plot line north a little bit.

3. Catchers - Obviously, you see how wide the gap is between your average catchers and the other seven positions. There are only three catchers (Martinez, Posada, and Martin) with more than 70 RC so far this season. And the same three catchers are the only ones at their position with VORPs over 27 for the season. I actually count eight starting catchers with a negative VORP, meaning whoever the team could call up from Triple-A would probably be better offensively.

While the defensive spectrum is nonscientific and purely speculation based on perceived defensive attributes and responsibilities, it does seem to serve the purpose of evaluating where a player can move on the diamond as his skills diminish. The 2007 numbers also seem to support the idea, except for the recent outlier of shortstops, that offensive production rises as you move toward the easier end of the spectrum.

If you know of any examples of players moving in the opposite direction of the spectrum as their careers moved on, let me know. I would be interested to study them and see why that was. Of course, the most recent famous example of the spectrum at work is Craig Biggio who moved from catcher to second base to center field to left field in his career. Now back at second because of lack of options for the Astros, I hear he is going to play all of his old positions in his final game on September 30.

]]>