1 00:00:04,500 --> 00:00:06,110 Towards the beginning of this lecture, 2 00:00:06,110 --> 00:00:08,310 we stated that the goal of a baseball team 3 00:00:08,310 --> 00:00:11,700 is to make the playoffs and we built predictive models 4 00:00:11,700 --> 00:00:13,790 to achieve this goal. 5 00:00:13,790 --> 00:00:16,140 But why isn't the goal of a baseball team 6 00:00:16,140 --> 00:00:20,180 to win the playoffs or win the World Series? 7 00:00:20,180 --> 00:00:23,060 Billy Beane and Paul DePodesta see their job 8 00:00:23,060 --> 00:00:26,080 as making sure the team makes it to the playoffs, 9 00:00:26,080 --> 00:00:28,180 and after that, all bets are off. 10 00:00:28,180 --> 00:00:32,800 The A's made it to the playoffs four years in a row-- 2000, 11 00:00:32,800 --> 00:00:38,680 2001, 2002, and 2003-- but they didn't win the World Series. 12 00:00:38,680 --> 00:00:39,920 Why not? 13 00:00:39,920 --> 00:00:43,740 In Moneyball, they say that "over a long season luck 14 00:00:43,740 --> 00:00:46,350 evens out, and skill shines through. 15 00:00:46,350 --> 00:00:48,450 But in a series of three out of five, 16 00:00:48,450 --> 00:00:52,360 or even four out of seven, anything can happen." 17 00:00:52,360 --> 00:00:55,570 In other words, the playoffs suffer from the sample size 18 00:00:55,570 --> 00:00:56,620 problem. 19 00:00:56,620 --> 00:01:00,440 There are not enough games to make any statistical claims. 20 00:01:00,440 --> 00:01:05,000 Let's see if we can verify this using our data set. 21 00:01:05,000 --> 00:01:06,760 The number of teams in the playoffs 22 00:01:06,760 --> 00:01:08,510 has changed over the years. 23 00:01:08,510 --> 00:01:10,840 So let's only use the years with eight teams 24 00:01:10,840 --> 00:01:13,820 in the playoffs, which was the number of teams in the playoffs 25 00:01:13,820 --> 00:01:17,450 in 2002, the year Moneyball discusses.
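The sample size problem quoted from Moneyball above can be made concrete with a little arithmetic. The sketch below is not from the lecture: it assumes, purely for illustration, that the better team wins any single game with probability 0.55 and that games are independent, then computes how often that team wins a best-of-five series, a best-of-seven series, and more than half of a 162-game season. The function names and the 0.55 figure are hypothetical.

```python
from math import comb


def prob_at_least(wins_needed, games, p_game):
    """Probability of winning at least `wins_needed` of `games` games,
    assuming independent games with a fixed win probability `p_game`."""
    return sum(
        comb(games, k) * p_game**k * (1 - p_game) ** (games - k)
        for k in range(wins_needed, games + 1)
    )


def series_win_probability(p_game, best_of):
    """Chance of taking a best-of-`best_of` series (odd length).

    A real series stops once one team clinches, but the winner is the
    same as if all `best_of` games were played, so the binomial tail
    gives the same probability.
    """
    return prob_at_least(best_of // 2 + 1, best_of, p_game)


# ASSUMPTION: an illustrative per-game edge, not a figure from the lecture.
P_GAME = 0.55

if __name__ == "__main__":
    for best_of in (5, 7):
        p = series_win_probability(P_GAME, best_of)
        print(f"best of {best_of}: better team wins with probability {p:.3f}")
    # Winning more than half of a 162-game season means at least 82 wins.
    p_season = prob_at_least(82, 162, P_GAME)
    print(f"162 games: better team wins a majority with probability {p_season:.3f}")
```

Under these assumptions the short series leave the better team only modestly favored, while over a full season the same edge shows through far more reliably, which is the point the quote makes.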
26 00:01:17,450 --> 00:01:20,520 We can compute the correlation between whether or not 27 00:01:20,520 --> 00:01:24,250 the team wins the World Series-- a binary variable-- 28 00:01:24,250 --> 00:01:26,650 and the number of regular season wins, 29 00:01:26,650 --> 00:01:29,200 since we would expect teams with more wins 30 00:01:29,200 --> 00:01:32,180 to be more likely to win the World Series. 31 00:01:32,180 --> 00:01:37,140 This correlation is 0.03, which is very low. 32 00:01:37,140 --> 00:01:40,660 So it turns out that winning regular season games gets you 33 00:01:40,660 --> 00:01:43,610 to the playoffs, but in the playoffs, there are too 34 00:01:43,610 --> 00:01:46,580 few games for luck to even out. 35 00:01:46,580 --> 00:01:49,280 Next week, we'll discuss logistic regression, 36 00:01:49,280 --> 00:01:52,289 which we'll be able to use to predict whether or not 37 00:01:52,289 --> 00:01:54,789 the team will win the World Series.
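The correlation described above, between a binary "won the World Series" variable and regular season wins, restricted to years with eight playoff teams, could be computed roughly as follows. This is a minimal sketch, not the lecture's own code: the field names (`wins`, `playoff_teams`, `won_world_series`) are hypothetical, and the rows are made-up placeholder values rather than the lecture's data set, so the printed number will not match the 0.03 reported in the lecture.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def wins_vs_title_correlation(rows, playoff_teams=8):
    """Correlate regular season wins with the 0/1 World Series indicator,
    keeping only rows from seasons with `playoff_teams` playoff teams."""
    kept = [r for r in rows if r["playoff_teams"] == playoff_teams]
    wins = [r["wins"] for r in kept]
    titles = [r["won_world_series"] for r in kept]
    return pearson(wins, titles)


# ASSUMPTION: made-up placeholder rows, one per playoff team. These are NOT
# real results; they only show the shape of the data.
toy_rows = [
    {"wins": 103, "playoff_teams": 8, "won_world_series": 0},
    {"wins": 101, "playoff_teams": 8, "won_world_series": 0},
    {"wins": 99, "playoff_teams": 8, "won_world_series": 1},
    {"wins": 98, "playoff_teams": 8, "won_world_series": 0},
    {"wins": 97, "playoff_teams": 8, "won_world_series": 0},
    {"wins": 95, "playoff_teams": 8, "won_world_series": 0},
    {"wins": 95, "playoff_teams": 8, "won_world_series": 0},
    {"wins": 94, "playoff_teams": 8, "won_world_series": 0},
    {"wins": 104, "playoff_teams": 4, "won_world_series": 1},  # filtered out
    {"wins": 90, "playoff_teams": 4, "won_world_series": 0},   # filtered out
]

if __name__ == "__main__":
    r = wins_vs_title_correlation(toy_rows)
    print(f"correlation between wins and winning the World Series: {r:.3f}")
```

With a 0/1 variable on one side, this Pearson coefficient is the point-biserial correlation; a value near zero means regular season wins say little about which playoff team takes the title.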