1
00:00:04,740 --> 00:00:08,380
Logistic regression predicts the probability of the outcome

2
00:00:08,380 --> 00:00:10,420
variable being true.

3
00:00:10,420 --> 00:00:13,550
In this example, a logistic regression model

4
00:00:13,550 --> 00:00:16,230
would predict the probability that the patient

5
00:00:16,230 --> 00:00:18,220
is receiving poor care.

6
00:00:18,220 --> 00:00:20,620
Or if we denote the PoorCare variable

7
00:00:20,620 --> 00:00:24,800
by y, the probability that y = 1.

8
00:00:24,800 --> 00:00:26,870
For the remainder of this lecture,

9
00:00:26,870 --> 00:00:36,440
we will denote poor care by 1, and good care by 0.

10
00:00:39,110 --> 00:00:42,860
So, since the outcome is either 0 or 1,

11
00:00:42,860 --> 00:00:46,790
this means that the probability that the outcome variable is 0

12
00:00:46,790 --> 00:00:51,580
is just 1 minus the probability that the outcome variable is 1.

13
00:00:51,580 --> 00:00:54,930
So by predicting the probability that y = 1,

14
00:00:54,930 --> 00:00:59,400
we also get the probability that y = 0.

15
00:00:59,400 --> 00:01:01,430
Just like in linear regression, we

16
00:01:01,430 --> 00:01:05,770
have a set of independent variables, x1 through xk,

17
00:01:05,770 --> 00:01:10,789
where k is the total number of independent variables we have.

18
00:01:10,789 --> 00:01:14,060
Then to predict the probability that y = 1,

19
00:01:14,060 --> 00:01:18,610
we use what's called the Logistic Response Function.

20
00:01:18,610 --> 00:01:21,850
This seems like a complicated, nonlinear equation,

21
00:01:21,850 --> 00:01:24,980
but you can see the familiar linear regression

22
00:01:24,980 --> 00:01:30,070
equation in this Logistic Response Function.

23
00:01:30,070 --> 00:01:32,039
The Logistic Response Function is

24
00:01:32,039 --> 00:01:35,460
used to produce a number between 0 and 1.

25
00:01:35,460 --> 00:01:37,930
Let's understand this function a little better.
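The Logistic Response Function named above appears on the lecture slide rather than in the transcript. In its standard form it is P(y = 1) = 1 / (1 + e^-(β0 + β1·x1 + ... + βk·xk)), where the exponent contains the familiar linear regression equation. The following is a minimal Python sketch of that formula, not the lecture's own code: the function name `logistic_response` and the coefficient and input values are hypothetical, chosen only for illustration.

```python
import math


def logistic_response(betas, xs):
    """Return P(y = 1) given betas = [b0, b1, ..., bk] and xs = [x1, ..., xk]."""
    if len(betas) != len(xs) + 1:
        raise ValueError("expected one intercept plus one coefficient per variable")
    # The familiar linear regression piece: b0 + b1*x1 + ... + bk*xk
    linear = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    # The logistic response function maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients and inputs (assumptions, not from the lecture)
betas = [-1.5, 0.8, -0.3]
xs = [2.0, 1.0]

p_poor = logistic_response(betas, xs)  # P(y = 1), i.e. poor care
p_good = 1.0 - p_poor                  # P(y = 0), i.e. good care
print(round(p_poor, 4), round(p_good, 4))
```

Because the outcome is either 0 or 1, the second probability needs no separate model: it is simply 1 minus the first.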
26
00:01:40,509 --> 00:01:43,410
This plot shows the logistic response function

27
00:01:43,410 --> 00:01:47,080
for different values of the linear regression piece.

28
00:01:47,080 --> 00:01:49,240
The logistic response function always

29
00:01:49,240 --> 00:01:53,300
takes values between 0 and 1, which makes sense,

30
00:01:53,300 --> 00:01:56,150
since it equals a probability.

31
00:01:56,150 --> 00:01:59,470
A positive coefficient value for a variable

32
00:01:59,470 --> 00:02:02,370
increases the linear regression piece,

33
00:02:02,370 --> 00:02:05,940
which increases the probability that y = 1,

34
00:02:05,940 --> 00:02:08,770
or increases the probability of poor care.

35
00:02:12,530 --> 00:02:16,060
On the other hand, a negative coefficient value

36
00:02:16,060 --> 00:02:20,960
for a variable decreases the linear regression piece,

37
00:02:20,960 --> 00:02:25,560
which in turn decreases the probability that y = 1,

38
00:02:25,560 --> 00:02:30,200
or increases the probability of good care.

39
00:02:30,200 --> 00:02:33,420
The coefficients, or betas, are selected

40
00:02:33,420 --> 00:02:37,500
to predict a high probability for the actual poor care cases,

41
00:02:37,500 --> 00:02:42,780
and to predict a low probability for the actual good care cases.

42
00:02:42,780 --> 00:02:46,040
Another useful way to think about the logistic response

43
00:02:46,040 --> 00:02:50,120
function is in terms of Odds, like in gambling.

44
00:02:50,120 --> 00:02:53,250
The Odds are the probability of 1

45
00:02:53,250 --> 00:02:56,980
divided by the probability of 0.

46
00:02:56,980 --> 00:03:01,960
The Odds are greater than 1 if 1 is more likely, and less than 1

47
00:03:01,960 --> 00:03:03,770
if 0 is more likely.

48
00:03:03,770 --> 00:03:08,690
The Odds are equal to 1 if the outcomes are equally likely.
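The effect of a coefficient's sign and the definition of the Odds can both be checked numerically. The sketch below is illustrative only: it assumes a model with a single independent variable, and the helper names `prob_at` and `odds` and all numeric values are hypothetical. It compares the predicted probability as the variable grows from 1 to 2 under a positive and a negative coefficient, then evaluates the Odds at a few probabilities.

```python
import math


def prob_at(b0, b1, x):
    """P(y = 1) for a one-variable model with intercept b0 and coefficient b1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def odds(p):
    """Odds = P(y = 1) / P(y = 0), defined for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


# Hypothetical coefficients: with b1 > 0 the probability of poor care rises
# as x increases; with b1 < 0 it falls.
rising = (prob_at(0.0, 0.7, 1.0), prob_at(0.0, 0.7, 2.0))
falling = (prob_at(0.0, -0.7, 1.0), prob_at(0.0, -0.7, 2.0))
print("positive coefficient:", rising)
print("negative coefficient:", falling)

# Odds above 1 mean y = 1 is more likely; below 1 mean y = 0 is more likely.
for p in (0.2, 0.5, 0.8):
    print(p, odds(p))
```

The Odds are unbounded above, which is why `odds` rejects p = 1 rather than dividing by zero.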
49
00:03:08,690 --> 00:03:11,680
If you substitute the Logistic Response Function

50
00:03:11,680 --> 00:03:13,470
for the probabilities in the Odds

51
00:03:13,470 --> 00:03:16,050
equation on the previous slide, you

52
00:03:16,050 --> 00:03:19,590
can see that the Odds are equal to "e" raised

53
00:03:19,590 --> 00:03:23,110
to the power of the linear regression equation.

54
00:03:23,110 --> 00:03:27,310
By taking the log of both sides, the log(Odds),

55
00:03:27,310 --> 00:03:30,620
or what we call the Logit, looks exactly

56
00:03:30,620 --> 00:03:33,570
like the linear regression equation.

57
00:03:33,570 --> 00:03:37,780
This helps us understand how the coefficients, or betas, affect

58
00:03:37,780 --> 00:03:40,730
our prediction of the probability.

59
00:03:40,730 --> 00:03:44,520
A positive beta value increases the Logit,

60
00:03:44,520 --> 00:03:48,000
which in turn increases the Odds of 1.

61
00:03:48,000 --> 00:03:51,550
A negative beta value decreases the Logit,

62
00:03:51,550 --> 00:03:55,430
which in turn decreases the Odds of 1.

63
00:03:55,430 --> 00:03:59,740
In the next video, we'll build a logistic regression model in R,

64
00:03:59,740 --> 00:04:02,610
and predict the quality of care.
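The substitution described in this part of the lecture can be written out step by step. The derivation below is a sketch using the standard form of the Logistic Response Function. The shorthand z for the linear regression equation is introduced here for brevity and is not notation from the lecture.

```latex
\begin{align*}
z &= \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \\
P(y = 1) &= \frac{1}{1 + e^{-z}} \\
P(y = 0) &= 1 - P(y = 1) = \frac{e^{-z}}{1 + e^{-z}} \\
\text{Odds} &= \frac{P(y = 1)}{P(y = 0)} = \frac{1}{e^{-z}} = e^{z} \\
\text{Logit} &= \log(\text{Odds}) = z = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\end{align*}
```

Since the Logit is linear in each x, raising a variable by one unit changes the Logit by that variable's beta and multiplies the Odds by e raised to that beta. This is why the sign of a beta determines whether the Odds of 1 go up or down.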