1 00:00:04,150 --> 00:00:07,010 The goal of a basketball team is similar to that of a baseball 2 00:00:07,010 --> 00:00:10,300 team, making the playoffs. 3 00:00:10,300 --> 00:00:12,140 So how many games does a team need 4 00:00:12,140 --> 00:00:15,370 to win in order to make the playoffs? 5 00:00:15,370 --> 00:00:17,400 Recall that in the lecture we found this number 6 00:00:17,400 --> 00:00:19,520 by looking at a graph. 7 00:00:19,520 --> 00:00:21,770 Here in R, let's use the table command 8 00:00:21,770 --> 00:00:24,490 to figure this out for the NBA. 9 00:00:24,490 --> 00:00:26,240 So that's just table(NBA$W, NBA$Playoffs). 10 00:00:38,460 --> 00:00:39,760 So our table pops up. 11 00:00:39,760 --> 00:00:44,670 Let's scroll to the top so we see what's going on. 12 00:00:44,670 --> 00:00:50,350 OK so as we typed in, we've got the number of wins 13 00:00:50,350 --> 00:00:52,620 here as the rows. 14 00:00:52,620 --> 00:00:56,950 And 0 if a team didn't make the playoffs, 1 if a team did 15 00:00:56,950 --> 00:00:59,470 make the playoffs in the columns. 16 00:00:59,470 --> 00:01:03,730 So for all of our data, for example, 17 00:01:03,730 --> 00:01:07,940 consider all the times that a team won 17 games. 18 00:01:07,940 --> 00:01:10,500 So this happened 11 times in total. 19 00:01:10,500 --> 00:01:13,920 And all 11 times the teams didn't make it to the playoffs 20 00:01:13,920 --> 00:01:16,400 when they won 17 games. 21 00:01:16,400 --> 00:01:19,060 Let's scroll down and look at a much higher number 22 00:01:19,060 --> 00:01:21,060 for contrast. 23 00:01:21,060 --> 00:01:24,060 For example, 61 wins. 24 00:01:24,060 --> 00:01:28,890 If a team won 61 games then 10 of those times they made it 25 00:01:28,890 --> 00:01:31,930 to the playoffs, and 0 times they didn't. 26 00:01:31,930 --> 00:01:34,220 So it seems like if you win 61 games 27 00:01:34,220 --> 00:01:37,229 you are definitely going to make it to the playoffs.
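As a rough sketch of the table step described above, the snippet below builds a tiny stand-in for the NBA data frame and tabulates wins against playoff appearances. The values in the toy data frame are invented purely for illustration and are not the lecture's data; only the column names W and Playoffs come from the command shown in the video. The names WinsTable and PlayoffRate are hypothetical.

```r
# Toy stand-in for the NBA data frame; these values are invented for
# illustration and do NOT reproduce the lecture's data set.
NBA <- data.frame(
  W        = c(17, 17, 35, 35, 42, 42, 61, 61),
  Playoffs = c(0,  0,  0,  1,  1,  0,  1,  1)
)

# Rows: number of wins. Columns: 0 = missed the playoffs, 1 = made them.
WinsTable <- table(NBA$W, NBA$Playoffs)
print(WinsTable)

# Share of team-seasons at each win total that reached the playoffs
# (row proportions, keeping only the "made the playoffs" column).
PlayoffRate <- prop.table(WinsTable, margin = 1)[, "1"]
print(PlayoffRate)
```

Looking at row proportions instead of raw counts is one possible way to spot the win total at which making the playoffs becomes likely; the video itself reads the threshold off the raw counts.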
28 00:01:37,229 --> 00:01:39,710 But I'm sure we can find a much better threshold. 29 00:01:39,710 --> 00:01:42,300 Let's take a look at the table, say around the middle section. 30 00:01:46,490 --> 00:01:51,180 OK, so here we can see that a team who 31 00:01:51,180 --> 00:01:57,310 wins say about 35 games or fewer almost never 32 00:01:57,310 --> 00:01:58,360 makes it to the playoffs. 33 00:01:58,360 --> 00:02:02,820 We see a lot of 0s and 1s in this column up until 35. 34 00:02:02,820 --> 00:02:05,330 After 35 we start seeing some numbers over here. 35 00:02:05,330 --> 00:02:08,690 So teams are starting to make it to the playoffs. 36 00:02:08,690 --> 00:02:15,050 And if we scroll down, we see that after about 45 wins, 37 00:02:15,050 --> 00:02:16,970 teams almost always make it to the playoffs. 38 00:02:16,970 --> 00:02:22,360 We see very few 1s and 0s in the category of not making it. 39 00:02:22,360 --> 00:02:24,150 So it seems like a good goal would 40 00:02:24,150 --> 00:02:26,950 be to try to win about 42 games. 41 00:02:26,950 --> 00:02:29,260 If a team can win about 42 games then 42 00:02:29,260 --> 00:02:31,680 they have a very good chance of making it to the playoffs. 43 00:02:37,490 --> 00:02:40,650 So in basketball, games are won by scoring more points 44 00:02:40,650 --> 00:02:43,200 than the other team. 45 00:02:43,200 --> 00:02:45,550 Can we use the difference between points scored 46 00:02:45,550 --> 00:02:48,790 and points allowed throughout the regular season in order 47 00:02:48,790 --> 00:02:52,150 to predict the number of games that a team will win? 48 00:02:52,150 --> 00:02:54,180 Let's give it a try. 49 00:02:54,180 --> 00:02:57,210 First we add a variable that is the difference between points 50 00:02:57,210 --> 00:02:59,320 scored and points allowed. 51 00:02:59,320 --> 00:03:00,920 Let's call this NBA$PTSdiff. 
52 00:03:05,180 --> 00:03:10,700 And that's just the difference between points scored, 53 00:03:10,700 --> 00:03:14,760 which is points, and points allowed, which 54 00:03:14,760 --> 00:03:16,270 is opponent's points. 55 00:03:18,970 --> 00:03:22,310 All right, so we've created a variable. 56 00:03:22,310 --> 00:03:23,900 Let's first make a scatter plot to see 57 00:03:23,900 --> 00:03:26,730 if it looks like there's a linear relationship 58 00:03:26,730 --> 00:03:28,970 between the number of games that a team wins 59 00:03:28,970 --> 00:03:31,300 and the point difference. 60 00:03:31,300 --> 00:03:34,740 So this is easy to do just with the plot command. 61 00:03:34,740 --> 00:03:43,850 NBA$PTSdiff and NBA$W. 62 00:03:43,850 --> 00:03:46,490 So our graph pops up and it looks 63 00:03:46,490 --> 00:03:49,170 like there's an incredibly strong linear relationship 64 00:03:49,170 --> 00:03:51,940 between these two variables. 65 00:03:51,940 --> 00:03:53,630 So it seems like linear regression 66 00:03:53,630 --> 00:03:56,090 is going to be a good way to predict how many wins 67 00:03:56,090 --> 00:03:59,960 a team will have given the point difference. 68 00:03:59,960 --> 00:04:01,010 Let's try to verify this. 69 00:04:06,410 --> 00:04:08,400 So we're going to have points diff 70 00:04:08,400 --> 00:04:11,760 as our independent variable in our regression, 71 00:04:11,760 --> 00:04:15,130 and W for wins as the dependent variable. 72 00:04:15,130 --> 00:04:16,940 So let's call this WinsReg. 73 00:04:20,060 --> 00:04:26,580 And we just use the lm command as before, regressing W 74 00:04:26,580 --> 00:04:33,270 on the points diff and using the NBA data. 75 00:04:33,270 --> 00:04:35,070 All right, so we've created our regression. 76 00:04:35,070 --> 00:04:36,440 Let's take a look at the summary.
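The three steps narrated above (create the difference variable, plot it against wins, fit the regression) might look roughly like the following. This is a sketch under assumptions: the column names PTS and oppPTS are guesses at how "points" and "opponent's points" are stored, and the data frame is simulated so the snippet runs on its own, so its fitted coefficients will not match the lecture's output.

```r
# Simulated stand-in for the NBA data frame; PTS and oppPTS are ASSUMED
# column names, and every value here is randomly generated for illustration.
set.seed(42)
n <- 100
NBA <- data.frame(
  PTS    = round(rnorm(n, mean = 8300, sd = 400)),
  oppPTS = round(rnorm(n, mean = 8300, sd = 400))
)
# Simulated wins that depend roughly linearly on the points difference.
NBA$W <- round(41 + 0.03 * (NBA$PTS - NBA$oppPTS) + rnorm(n, sd = 3))

# Step 1: points scored minus points allowed.
NBA$PTSdiff <- NBA$PTS - NBA$oppPTS

# Step 2: scatter plot to check for a linear relationship.
plot(NBA$PTSdiff, NBA$W)

# Step 3: regress wins on the points difference and inspect the fit.
WinsReg <- lm(W ~ PTSdiff, data = NBA)
summary(WinsReg)
```

With the real data, the same three commands would be run directly on the loaded NBA data frame instead of the simulated one.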
77 00:04:41,380 --> 00:04:45,450 OK, so the first thing that we notice 78 00:04:45,450 --> 00:04:49,000 is that we've got very significant variables 79 00:04:49,000 --> 00:04:50,180 over here. 80 00:04:50,180 --> 00:04:54,360 And an R squared of 0.9423, which is very high. 81 00:04:54,360 --> 00:04:58,060 And this is verifying the scatter plot 82 00:04:58,060 --> 00:05:01,290 we saw before that there's a very strong linear relationship 83 00:05:01,290 --> 00:05:03,080 between the wins and the points difference. 84 00:05:06,770 --> 00:05:08,290 So let's write down the regression 85 00:05:08,290 --> 00:05:10,490 equation that we found. 86 00:05:10,490 --> 00:05:18,290 We see that the number of wins, W, is equal to 41. 87 00:05:18,290 --> 00:05:22,360 That's coming from the coefficient estimate 88 00:05:22,360 --> 00:05:23,950 for the intercept. 89 00:05:23,950 --> 00:05:42,240 Plus 0.0326*PTSdiff. 90 00:05:42,240 --> 00:05:46,690 And that 0.0326 is coming from the coefficient estimate 91 00:05:46,690 --> 00:05:49,720 for points difference. 92 00:05:49,720 --> 00:05:53,130 So we saw earlier with the table that a team 93 00:05:53,130 --> 00:05:56,250 would want to win at least about 42 games 94 00:05:56,250 --> 00:05:59,920 in order to have a good chance of making it to the playoffs. 95 00:05:59,920 --> 00:06:03,070 So what does this mean in terms of their points difference? 96 00:06:03,070 --> 00:06:04,700 Well, we can calculate it. 97 00:06:04,700 --> 00:06:09,830 If we want this to be greater than or equal to 42, 98 00:06:09,830 --> 00:06:15,940 that means that the points difference 99 00:06:15,940 --> 00:06:26,290 would need to be greater than or equal to (42 minus 41) 100 00:06:26,290 --> 00:06:34,120 divided by 0.0326. 101 00:06:34,120 --> 00:06:36,530 So if we actually do that calculation, 102 00:06:36,530 --> 00:06:47,590 we see that this is equal to 30.67.
103 00:06:47,590 --> 00:06:50,510 So we need to score at least 31 more points 104 00:06:50,510 --> 00:06:54,950 than we allow in order to win at least 42 games.
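The calculation just described can be checked with a few lines of R. The intercept (41), slope (0.0326), and target of 42 wins are the values quoted in the video; the variable names are hypothetical and chosen only for this sketch.

```r
# Coefficients as quoted in the video: W = 41 + 0.0326 * PTSdiff
intercept   <- 41
slope       <- 0.0326
target_wins <- 42

# Solve 41 + 0.0326 * PTSdiff >= 42 for PTSdiff.
min_diff <- (target_wins - intercept) / slope
print(round(min_diff, 2))  # 30.67

# Points are whole numbers, so round up to the next integer.
print(ceiling(min_diff))   # 31
```

Rounding up rather than to the nearest integer matters here: a difference of 30 points would give a predicted win total just under 42.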