To predict the outcomes of the Supreme Court, Martin used cases from 1994 through 2001. He chose this period of time because the Supreme Court was composed of the same nine justices who were serving when he made his predictions in 2002. These nine justices were Breyer, Ginsburg, Kennedy, O'Connor, Rehnquist (who was the Chief Justice), Scalia, Souter, Stevens, and Thomas. This was a very rare data set since, as we mentioned earlier, this was the longest period of time with the same set of justices in over 180 years. This allowed Martin to use a larger data set than might have been available if he had been doing this experiment at a different time.

In this lecture, we'll focus on predicting Justice Stevens' decisions. He is generally considered a justice who started out moderate but became more liberal during his time on the Supreme Court, although he's a self-proclaimed conservative.

In this problem, our dependent variable is whether or not Justice Stevens voted to reverse the lower court decision. This is a binary variable taking value 1 if Justice Stevens voted to reverse or overturn the lower court decision, and taking value 0 if Justice Stevens voted to affirm or maintain the lower court decision.

Our independent variables are six different properties of the case. The circuit court of origin is the circuit, or lower court, where the case came from. There are 13 different circuit courts in the United States: the 1st through 11th and Washington, DC courts are defined by region, and the Federal Circuit court is defined by the subject matter of the case. The issue area of the case gives each case a category, like civil rights or federal taxation. The type of petitioner and type of respondent define the two parties in the case; some examples are the United States, an employer, or an employee. The ideological direction of the lower court decision describes whether the lower court made what was considered a liberal or a conservative decision. The last variable indicates whether or not the petitioner argued that a law or practice was unconstitutional.
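To make this concrete, here is a minimal sketch of what such a data set might look like as an R data frame. The variable names, factor levels, and example rows below are illustrative assumptions, not the actual coding scheme used in the study.

    # Hypothetical rows illustrating the structure of the data set;
    # names and levels are assumptions for illustration only.
    stevens <- data.frame(
      Circuit    = c("2nd", "4th", "9th"),            # circuit court of origin (13 possible)
      Issue      = c("CivilRights", "FederalTaxation",
                     "EconomicActivity"),             # issue area of the case
      Petitioner = c("US", "EMPLOYER", "OTHER"),      # type of petitioner
      Respondent = c("EMPLOYEE", "OTHER", "US"),      # type of respondent
      LowerCourt = c("liberal", "conser", "liberal"), # ideological direction of lower court
      Unconst    = c(1, 0, 0),                        # petitioner argued unconstitutionality
      Reverse    = c(1, 0, 1)                         # outcome: 1 = reverse, 0 = affirm
    )
    str(stevens)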
To collect this data, Martin and his colleagues read through all of the cases and coded the information. Some of it, like the circuit court, is straightforward, but other information required a judgment call, like the ideological direction of the lower court.

Now that we have our data and variables, we are ready to predict the decisions of Justice Stevens. We can use logistic regression, and we get a model where some of the most significant variables are whether or not the case is from the 2nd circuit court, with a coefficient of 1.66; whether or not the case is from the 4th circuit court, with a coefficient of 2.82; and whether or not the lower court decision was liberal, with a coefficient of negative 1.22. This tells us that the case being from the 2nd or 4th circuit courts is predictive of Justice Stevens reversing the case, and the lower court decision being liberal is predictive of Justice Stevens affirming the case.

It's difficult to understand which factors are more important due to things like the scales of the variables and the possibility of multicollinearity. It's also difficult to quickly evaluate what the prediction would be for a new case.

So instead of logistic regression, Martin and his colleagues used a method called classification and regression trees, or CART. This method builds what is called a tree by splitting on the values of the independent variables. To predict the outcome for a new observation or case, you follow the splits in the tree, and at the end, you predict the most frequent outcome in the training set that followed the same path. Some advantages of CART are that it does not assume a linear model, like logistic regression or linear regression do, and that it's a very interpretable model.
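As a rough sketch of what these two approaches look like in R, the models might be fit as follows. This assumes a data frame shaped like the hypothetical stevens one above, but with many more cases; the formula and variable names are illustrative assumptions, not Martin's actual code.

    # A minimal sketch, not the study's actual code. Assumes `stevens`
    # has many more rows than the three hypothetical ones above.
    library(rpart)

    # Logistic regression: coefficients are hard to compare across
    # variables with different scales and possible multicollinearity.
    logit_model <- glm(Reverse ~ Circuit + Issue + Petitioner + Respondent +
                         LowerCourt + Unconst,
                       data = stevens, family = binomial)
    summary(logit_model)

    # CART: builds an interpretable classification tree by splitting
    # on the values of the independent variables.
    tree_model <- rpart(Reverse ~ Circuit + Issue + Petitioner + Respondent +
                          LowerCourt + Unconst,
                        data = stevens, method = "class")
    plot(tree_model)
    text(tree_model)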
Let's look at an example. This plot shows sample data for two independent variables, x and y, and each data point is colored by the outcome variable, red or gray. CART tries to split this data into subsets so that each subset is as pure, or homogeneous, as possible. The first three splits that CART would create are shown here. The standard prediction made by a CART model is simply the majority outcome in each subset. If a new observation fell into one of these two subsets, then we would predict red, since the majority of the observations in those subsets are red. However, if a new observation fell into one of these two subsets, we would predict gray, since the majority of the observations in those two subsets are gray.

A CART model is represented by what we call a tree. The tree for the splits we just generated is shown on the right. The first split tests whether the variable x is less than 60. If yes, the model says to predict red, and if no, the model moves on to the next split. The second split checks whether or not the variable y is less than 20. If no, the model says to predict gray, but if yes, the model moves on to the next split. The third split checks whether or not the variable x is less than 85. If yes, then the model says to predict red, and if no, the model says to predict gray.

There are a couple of things to keep in mind when reading trees. In this tree, and for the trees we'll generate in R, a yes response is always to the left and a no response is always to the right. Also, make sure you always start at the top of the tree: the x less than 85 split only applies to observations for which x is greater than or equal to 60 and y is less than 20.

In the next video, we'll discuss how CART decides how many splits to generate and how the final predictions are made.
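Before moving on, the decision logic of the example tree above can be written out directly in R as a minimal sketch. The function name is made up for illustration; the cutoffs 60, 20, and 85 come from the splits we just described.

    # A minimal sketch of the example tree's decision logic. Each if
    # statement mirrors one split, read from the top of the tree down.
    predict_example_tree <- function(x, y) {
      if (x < 60) return("red")     # first split: yes branch -> predict red
      if (!(y < 20)) return("gray") # second split: no branch -> predict gray
      if (x < 85) return("red")     # third split: yes branch -> predict red
      "gray"                        # third split: no branch -> predict gray
    }

    predict_example_tree(x = 70, y = 10)  # x >= 60, y < 20, x < 85: "red"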