1 00:00:04,500 --> 00:00:08,310 In this video, we'll try to make predictions for the 2012-2013 2 00:00:08,310 --> 00:00:09,560 season. 3 00:00:09,560 --> 00:00:13,020 We'll need to load our test set because our training set only 4 00:00:13,020 --> 00:00:18,870 included data from 1980 up until the 2011-2012 season. 5 00:00:18,870 --> 00:00:19,960 So let's call it NBA_test. 6 00:00:25,470 --> 00:00:27,550 And we'll read it in the same way as always, 7 00:00:27,550 --> 00:00:28,600 read.csv("NBA_test.csv"). 8 00:00:36,220 --> 00:00:40,160 All right, so now let's try to predict, using our model 9 00:00:40,160 --> 00:00:43,090 that we made in the previous video, how many points we'll 10 00:00:43,090 --> 00:00:46,070 see in the 2012-2013 season. 11 00:00:46,070 --> 00:00:47,450 Let's call this PointsPrediction. 12 00:00:54,120 --> 00:00:57,450 And so we use the predict command here. 13 00:00:57,450 --> 00:00:59,990 And we give it the previous model that we made. 14 00:01:02,770 --> 00:01:05,620 We'll give it PointsReg4, because that 15 00:01:05,620 --> 00:01:09,030 was the model we determined at the end to be the best one. 16 00:01:09,030 --> 00:01:11,880 And the new data, which is NBA_test. 17 00:01:16,150 --> 00:01:21,430 OK, so now that we have our prediction, how good is it? 18 00:01:21,430 --> 00:01:24,570 We can compute the out-of-sample R-squared. 19 00:01:24,570 --> 00:01:25,980 This is a measurement of how well 20 00:01:25,980 --> 00:01:28,580 the model predicts on test data. 21 00:01:28,580 --> 00:01:31,200 The R-squared value we had before from our model, 22 00:01:31,200 --> 00:01:35,300 the 0.8991, you might remember, is a measure 23 00:01:35,300 --> 00:01:37,610 of the in-sample R-squared, which is 24 00:01:37,610 --> 00:01:40,580 how well the model fits the training data. 25 00:01:40,580 --> 00:01:43,490 But to get a measure of the prediction's goodness of fit, 26 00:01:43,490 --> 00:01:46,740 we need to calculate the out-of-sample R-squared.
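The load-and-predict steps described so far can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the data frames `train` and `test`, their columns `x` and `y`, and the numbers in them are made up for the sketch and stand in for the NBA training set and NBA_test; the model here is a one-variable toy, not PointsReg4.

```r
# Toy stand-ins for the training and test sets (made-up numbers, not NBA data).
# In the lecture the test set would instead come from read.csv("NBA_test.csv").
train <- data.frame(x = c(1, 2, 3, 4, 5), y = c(3, 5, 7, 9, 11))
test  <- data.frame(x = c(6, 7), y = c(13, 16))

# Fit a linear regression on the training data only.
model <- lm(y ~ x, data = train)

# Predict the outcome for the unseen test rows.
# newdata must contain the same predictor columns the model was fit on.
pred <- predict(model, newdata = test)
print(pred)
```

The point of the sketch is the shape of the call: `predict()` takes the fitted model first and the held-out data frame through `newdata`, and returns one prediction per test row.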
27 00:01:46,740 --> 00:01:48,310 So let's do that here. 28 00:01:48,310 --> 00:01:50,190 We need to compute the sum of squared errors. 29 00:01:55,240 --> 00:02:02,370 And so this here is just the predicted amount 30 00:02:02,370 --> 00:02:11,470 minus the actual amount of points, squared and summed. 31 00:02:11,470 --> 00:02:15,670 And we need the total sum of squares, 32 00:02:15,670 --> 00:02:24,890 which is just the average number of points 33 00:02:24,890 --> 00:02:27,680 minus the actual number of points in the test set, squared and summed. 34 00:02:38,380 --> 00:02:41,530 So the R-squared here then is calculated 35 00:02:41,530 --> 00:02:45,630 as usual, 1 minus the sum of squared errors divided 36 00:02:45,630 --> 00:02:48,500 by the total sum of squares. 37 00:02:48,500 --> 00:02:53,100 And we see that we have an R-squared value of 0.8127. 38 00:02:53,100 --> 00:02:56,810 We can also calculate the root mean squared error the same way 39 00:02:56,810 --> 00:02:58,940 as before. Root mean squared error 40 00:02:58,940 --> 00:03:03,080 is going to be the square root of the sum of squared errors 41 00:03:03,080 --> 00:03:06,140 divided by n, which is the number of rows in our test data 42 00:03:06,140 --> 00:03:06,640 set. 43 00:03:13,810 --> 00:03:21,280 OK, and the root mean squared error here is 196.37. 44 00:03:21,280 --> 00:03:23,120 So it's a little bit higher than before. 45 00:03:23,120 --> 00:03:24,400 But it's not too bad. 46 00:03:24,400 --> 00:03:29,680 We're making an average error of about 196 points. 47 00:03:29,680 --> 00:03:31,220 We'll stop here for now. 48 00:03:31,220 --> 00:03:33,220 Good luck with the homework.
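The out-of-sample calculations walked through above can be sketched in R as below. All names and numbers are hypothetical toy values, not the NBA data. One assumption to flag: the sketch uses the training-set average as the baseline in the total sum of squares, which is a common convention for out-of-sample R-squared; the transcript only says "the average number of points".

```r
# Toy values (made up for illustration, not NBA data).
train_y <- c(3, 5, 7, 9, 11)   # outcomes seen during training
test_y  <- c(13, 16)           # actual outcomes in the test set
pred    <- c(13, 15)           # model predictions for the test set

# Sum of squared errors: (predicted - actual), squared and summed.
SSE <- sum((pred - test_y)^2)

# Total sum of squares: (baseline average - actual), squared and summed.
# Assumption: the baseline is the training-set mean.
SST <- sum((mean(train_y) - test_y)^2)

# Out-of-sample R-squared.
R2 <- 1 - SSE / SST

# Root mean squared error: sqrt(SSE / n), with n = number of test rows.
RMSE <- sqrt(SSE / length(test_y))

print(c(SSE = SSE, SST = SST, R2 = R2, RMSE = RMSE))
```

Because the baseline comes from the training data, this R-squared can come out negative when the model predicts the test set worse than the training average does, which cannot happen with the in-sample version.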