In this video we will compare all the different methods we have seen so far in this course and review what they are used for, their benefits, and their limitations.

Linear regression is used to predict a continuous outcome. It is simple and commonly used, and it works on small and large data sets. The downside is that it assumes a linear relationship. If we have a nonlinear relationship, we need to add variables to our analysis. For instance, suppose y = a*log(x) + b, where x is our data and y is what we need to predict. To find the coefficients a and b through linear regression, we need to view log(x) as a new variable. Remember that we did this in the Google homework problem.

Logistic regression is used to predict a categorical outcome. We mainly focused on binary outcomes, like yes or no, sell or buy, accept or reject, and so on. We have seen it applied to predict the quality of care, good or bad; the winner of the US presidential election, Republican or Democrat; and parole violation and loan payment, yes or no. In addition to its relative simplicity, logistic regression computes probabilities that can be used to assess the confidence of our prediction. The downside is again similar to that of linear regression: it assumes a linear relationship, this time between the variables and the log-odds of the outcome.

In the trees week we learned CART, which is used to predict a categorical outcome with possibly more than two categories, like a quality rating from one to five, or a three-way decision, say buy, sell, or hold. It can also predict a continuous outcome, such as salary or price. We have seen it applied to predict life expectancy, earnings from census data, and letter recognition. The power of CART lies in the fact that it can handle nonlinear relationships between variables, and the tree representation makes it easy to visualize and interpret the results. The downside is that CART may not work very well on small data sets.

Random forest is also used to predict categorical or continuous outcomes. Its benefit over CART is that it can improve the prediction accuracy. However, we need to adjust many parameters, and it is not as easy to explain as CART.

This week, we learned hierarchical clustering, which is used to find similar groups. An important aspect of clustering data into smaller groups is that we can improve our prediction accuracy by applying our predictive methods, like logistic regression, on each cluster separately. We expand on this cluster-then-predict idea in one of our homework problems. Hierarchical clustering is an attractive technique because we do not need to select the number of clusters before running the algorithm, and we can visualize the clusters using a dendrogram. The drawback, though, is that hierarchical clustering is hard to use on large data sets, because of the pairwise distance calculation, as we saw in this week's recitation.

An alternative method is k-means clustering, which works well on data sets of any size. However, k-means requires selecting the number of clusters before running the algorithm. This may not be a limitation if we have an intuition about the number of clusters we want to look at, as in the medical image segmentation example.
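To make this review concrete, the short sketches below illustrate each method in code. They use Python with NumPy, scikit-learn, and SciPy; the library choice, data, and every variable name are illustrative assumptions on my part, not taken from the course. First, the log-transform trick for linear regression, where log(x) is treated as a new variable:

    # Fit y = a*log(x) + b by treating log(x) as an engineered variable.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    x = rng.uniform(1, 100, size=200)              # a positive predictor (synthetic)
    y = 3.0 * np.log(x) + 2.0 + rng.normal(0, 0.1, size=200)

    X_log = np.log(x).reshape(-1, 1)               # the new variable log(x)
    model = LinearRegression().fit(X_log, y)
    print(model.coef_[0], model.intercept_)        # recovers a ~ 3.0 and b ~ 2.0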
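Next, logistic regression on a binary outcome. The predicted probabilities can serve as a confidence measure before we commit to a yes-or-no label (synthetic data, with an assumed threshold of 0.5):

    # Binary logistic regression; predict_proba gives class probabilities.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=300, n_features=4, random_state=0)
    clf = LogisticRegression().fit(X, y)

    proba = clf.predict_proba(X[:5])               # columns: P(y=0), P(y=1)
    labels = (proba[:, 1] > 0.5).astype(int)       # threshold the probability
    print(proba)
    print(labels)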
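For CART, a small classification tree on a three-class data set; the printed splits show why trees are easy to interpret. For a continuous outcome, DecisionTreeRegressor works the same way:

    # A CART-style tree; the same API handles more than two categories.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)              # three classes
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    print(export_text(tree))                       # human-readable splits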
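For random forest, a sketch of the accuracy comparison against a single tree, along with a few of the many parameters that typically need adjusting:

    # Random forest vs. a single tree on the same data.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    tree = DecisionTreeClassifier(random_state=0)
    forest = RandomForestClassifier(
        n_estimators=200,        # number of trees
        max_features="sqrt",     # variables considered at each split
        min_samples_leaf=5,      # minimum observations per leaf
        random_state=0,
    )
    print(cross_val_score(tree, X, y).mean())      # single-tree accuracy
    print(cross_val_score(forest, X, y).mean())    # usually higher, but harder to explain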
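For hierarchical clustering, a sketch that builds the full merge hierarchy first, draws the dendrogram, and only then cuts the tree into clusters. The pairwise distance computation is what makes this approach expensive on large data sets:

    # Hierarchical clustering: choose the number of clusters after the fact.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (20, 2)),
                   rng.normal(5, 1, (20, 2))])       # two loose groups (synthetic)

    Z = linkage(X, method="ward")                    # hierarchy from pairwise distances
    dendrogram(Z)                                    # inspect before picking k
    plt.show()
    labels = fcluster(Z, t=2, criterion="maxclust")  # then cut into 2 clusters
    print(labels)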
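For k-means, the number of clusters is fixed before running the algorithm, but the method scales to much larger data sets:

    # k-means: k must be chosen up front.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (500, 2)),
                   rng.normal(5, 1, (500, 2))])

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # k = 2 chosen up front
    print(km.cluster_centers_)                     # one centroid per cluster
    print(km.labels_[:10])                         # cluster assignment per row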
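Finally, a sketch of the cluster-then-predict idea: split the data into clusters, then fit a separate logistic regression on each one. A new observation is first assigned to its nearest cluster and then scored by that cluster's model:

    # Cluster-then-predict: one predictive model per cluster.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # flip_y adds label noise so every cluster contains both classes.
    X, y = make_classification(n_samples=600, n_features=4,
                               flip_y=0.2, random_state=0)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

    models = {}
    for c in np.unique(km.labels_):
        mask = km.labels_ == c
        models[c] = LogisticRegression().fit(X[mask], y[mask])

    x_new = X[:1]                                  # a "new" observation
    c_new = km.predict(x_new)[0]                   # nearest cluster first
    print(models[c_new].predict_proba(x_new))      # then that cluster's model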
I hope that this quick review gave you a good refresher before the competition week. Good luck.