We saw in the previous video that the outcome of a logistic regression model is a probability. Often, we want to make an actual prediction: should we predict 1 for poor care, or should we predict 0 for good care?

We can convert the probabilities to predictions using what's called a threshold value, t. If the probability of poor care is greater than this threshold value, t, we predict poor quality care. But if the probability of poor care is less than the threshold value, t, then we predict good quality care.

But what value should we pick for the threshold, t? The threshold value, t, is often selected based on which type of error is preferable. You might be thinking that making no errors is better, which is, of course, true. But it's rare to have a model that predicts perfectly, so you're bound to make some errors. There are two types of errors that a model can make: ones where you predict 1, or poor care, but the actual outcome is 0, and ones where you predict 0, or good care, but the actual outcome is 1.

If we pick a large threshold value t, then we will predict poor care rarely, since the probability of poor care has to be really large to exceed the threshold. This means that we will make more errors where we say good care, but it's actually poor care. This approach would detect the patients receiving the worst care and prioritize them for intervention.

On the other hand, if the threshold value, t, is small, we predict poor care frequently and good care rarely. This means that we will make more errors where we say poor care, but it's actually good care. This approach would detect all patients who might be receiving poor care.

Decision-makers often have a preference for one type of error over the other, which should influence the threshold value they pick. If there's no preference between the errors, the right threshold to select is t = 0.5, since it just predicts the more likely outcome.
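As a quick sketch of this rule in R (the probabilities below are made-up numbers for illustration; later in this video the real predicted probabilities are stored in predictTrain):

    probs <- c(0.15, 0.55, 0.80, 0.30)   # hypothetical predicted probabilities
    t <- 0.5                             # threshold value
    as.integer(probs > t)                # 1 = poor care, 0 = good care; gives 0 1 1 0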
To make this discussion a little more quantitative, we use what's called a confusion matrix, or classification matrix. This compares the actual outcomes to the predicted outcomes. The rows are labeled with the actual outcome, and the columns are labeled with the predicted outcome. Each entry of the table gives the number of observations that fall into that category.

So the number of true negatives, or TN, is the number of observations that are actually good care and for which we predict good care. The number of true positives, or TP, is the number of observations that are actually poor care and for which we predict poor care. These are the two types that we get correct. The false positives, or FP, are the data points for which we predict poor care, but the actual outcome is good care. And the false negatives, or FN, are the data points for which we predict good care, but the actual outcome is poor care.

We can compute two outcome measures that help us determine what types of errors we are making. They're called sensitivity and specificity. Sensitivity is equal to the true positives divided by the true positives plus the false negatives, TP / (TP + FN), and measures the percentage of actual poor care cases that we classify correctly. This is often called the true positive rate. Specificity is equal to the true negatives divided by the true negatives plus the false positives, TN / (TN + FP), and measures the percentage of actual good care cases that we classify correctly. This is often called the true negative rate.

A model with a higher threshold will have a lower sensitivity and a higher specificity. A model with a lower threshold will have a higher sensitivity and a lower specificity.
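As a sketch, these two formulas can be written as small helper functions in R. The names sens and spec are our own, not built-in functions, and they assume a 2-by-2 table with rows labeled 0 and 1 (the actual outcome) and columns FALSE and TRUE (the predicted outcome), like the tables we build below:

    # tab[2, 2] is TP (actual 1, predicted TRUE); tab[1, 1] is TN (actual 0, predicted FALSE)
    sens <- function(tab) tab[2, 2] / sum(tab[2, ])   # TP / (TP + FN), the true positive rate
    spec <- function(tab) tab[1, 1] / sum(tab[1, ])   # TN / (TN + FP), the true negative rate
    # Note: this assumes both FALSE and TRUE occur among the predictions,
    # so that the table really has two columns.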
Let's compute some confusion matrices in R using different threshold values. In our R console, let's make some classification tables using the table function. First, we'll use a threshold value of 0.5. So type table, and then the first argument, or what we want to label the rows by, should be the true outcome, which is qualityTrain$PoorCare. And then the second argument, or what we want to label the columns by, will be predictTrain, our predictions from the previous video, greater than 0.5. This comparison returns TRUE if our prediction is greater than 0.5, which means we want to predict poor care, and it returns FALSE if our prediction is less than 0.5, which means we want to predict good care.

If you hit Enter, we get a table where the rows are labeled by 0 or 1, the true outcome, and the columns are labeled by FALSE or TRUE, our predicted outcome. So you can see here that for 70 cases, we predict good care and they actually received good care, and for 10 cases, we predict poor care and they actually received poor care. We make 4 mistakes where we say poor care and it's actually good care, and we make 15 mistakes where we say good care but it's actually poor care.

Let's compute the sensitivity, or the true positive rate, and the specificity, or the true negative rate. The sensitivity here would be 10, our true positives, divided by 25, the total number of positive cases. So we have a sensitivity of 0.4. Our specificity here would be 70, the true negative cases, divided by 74, the total number of negative cases. So our specificity here is about 0.95.

Now, let's try increasing the threshold. Use the up arrow to get back to the table command, and change the threshold value to 0.7. Now, if we compute our sensitivity, we get a sensitivity of 8 divided by 25, which is 0.32. And if we compute our specificity, we get a specificity of 73 divided by 74, which is about 0.99. So by increasing the threshold, our sensitivity went down and our specificity went up.

Now, let's try decreasing the threshold. Hit the up arrow again to get to the table function, and change the threshold value to 0.2. Now, if we compute our sensitivity, it's 16 divided by 25, or 0.64. And if we compute our specificity, it's 54 divided by 74, or about 0.73. So with the lower threshold, our sensitivity went up, and our specificity went down.
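Putting the three thresholds side by side, the commands and the measures we just computed look like this (predictTrain holds the training-set probabilities from the previous video):

    table(qualityTrain$PoorCare, predictTrain > 0.5)   # sensitivity 10/25 = 0.40, specificity 70/74, about 0.95
    table(qualityTrain$PoorCare, predictTrain > 0.7)   # sensitivity  8/25 = 0.32, specificity 73/74, about 0.99
    table(qualityTrain$PoorCare, predictTrain > 0.2)   # sensitivity 16/25 = 0.64, specificity 54/74, about 0.73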
But which threshold should we pick? Maybe 0.4 is better, or 0.6. How do we decide? In the next video, we'll see a nice visualization to help us select a threshold.