To predict the outcomes of the Supreme Court, Martin used cases from 1994 through 2001. He chose this period of time because the Supreme Court was composed of the same nine justices who were serving when he made his predictions in 2002. These nine justices were Breyer, Ginsburg, Kennedy, O'Connor, Rehnquist (who was the Chief Justice), Scalia, Souter, Stevens, and Thomas. This was a very rare data set since, as we mentioned earlier, this was the longest period of time with the same set of justices in over 180 years. This allowed Martin to use a larger data set than might have been available if he had been doing this experiment at a different time.

In this lecture, we'll focus on predicting Justice Stevens' decisions. He is generally considered a justice who started out moderate but became more liberal during his time on the Supreme Court, although he's a self-proclaimed conservative.

In this problem, our dependent variable is whether or not Justice Stevens voted to reverse the lower court decision. This is a binary variable taking value 1 if Justice Stevens voted to reverse or overturn the lower court decision, and taking value 0 if Justice Stevens voted to affirm or maintain the lower court decision.

Our independent variables are six different properties of the case. The circuit court of origin is the circuit, or lower court, where the case came from. There are 13 different circuit courts in the United States: the 1st through 11th and Washington, DC courts are defined by region, and the Federal Circuit court is defined by the subject matter of the case. The issue area of the case gives each case a category, like civil rights or federal taxation. The type of petitioner and type of respondent define the two parties in the case; some examples are the United States, an employer, or an employee. The ideological direction of the lower court decision describes whether the lower court made what was considered a liberal or a conservative decision. The last variable indicates whether or not the petitioner argued that a law or practice was unconstitutional.
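To make this concrete, here is a minimal sketch of what such a data set might look like as an R data frame. The variable names, factor levels, and example rows below are illustrative assumptions, not the actual coding scheme used in the study.

    # Hypothetical rows illustrating the structure of the data set;
    # names and levels are assumptions for illustration only.
    stevens <- data.frame(
      Circuit    = c("2nd", "4th", "9th"),            # circuit court of origin (13 possible)
      Issue      = c("CivilRights", "FederalTaxation",
                     "EconomicActivity"),             # issue area of the case
      Petitioner = c("US", "EMPLOYER", "OTHER"),      # type of petitioner
      Respondent = c("EMPLOYEE", "OTHER", "US"),      # type of respondent
      LowerCourt = c("liberal", "conser", "liberal"), # ideological direction of lower court
      Unconst    = c(1, 0, 0),                        # petitioner argued unconstitutionality
      Reverse    = c(1, 0, 1)                         # outcome: 1 = reverse, 0 = affirm
    )
    str(stevens)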
To collect this data, Martin and his colleagues read through all of the cases and coded the information. Some of it, like the circuit court, is straightforward, but other information required a judgment call, like the ideological direction of the lower court.

Now that we have our data and variables, we are ready to predict the decisions of Justice Stevens. We can use logistic regression, and we get a model where some of the most significant variables are whether or not the case is from the 2nd circuit court, with a coefficient of 1.66; whether or not the case is from the 4th circuit court, with a coefficient of 2.82; and whether or not the lower court decision was liberal, with a coefficient of negative 1.22. This tells us that the case being from the 2nd or 4th circuit courts is predictive of Justice Stevens reversing the case, and the lower court decision being liberal is predictive of Justice Stevens affirming the case.

It's difficult to understand which factors are more important due to things like the scales of the variables and the possibility of multicollinearity. It's also difficult to quickly evaluate what the prediction would be for a new case.

So instead of logistic regression, Martin and his colleagues used a method called classification and regression trees, or CART. This method builds what is called a tree by splitting on the values of the independent variables. To predict the outcome for a new observation or case, you follow the splits in the tree, and at the end, you predict the most frequent outcome in the training set that followed the same path. Some advantages of CART are that it does not assume a linear model, like logistic regression or linear regression do, and that it's a very interpretable model.
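As a rough sketch of what these two approaches look like in R, the models might be fit as follows. This assumes a data frame shaped like the hypothetical stevens one above, but with many more cases; the formula and variable names are illustrative assumptions, not Martin's actual code.

    # A minimal sketch, not the study's actual code. Assumes `stevens`
    # has many more rows than the three hypothetical ones above.
    library(rpart)

    # Logistic regression: coefficients are hard to compare across
    # variables with different scales and possible multicollinearity.
    logit_model <- glm(Reverse ~ Circuit + Issue + Petitioner + Respondent +
                         LowerCourt + Unconst,
                       data = stevens, family = binomial)
    summary(logit_model)

    # CART: builds an interpretable classification tree by splitting
    # on the values of the independent variables.
    tree_model <- rpart(Reverse ~ Circuit + Issue + Petitioner + Respondent +
                          LowerCourt + Unconst,
                        data = stevens, method = "class")
    plot(tree_model)
    text(tree_model)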
Let's look at an example. This plot shows sample data for two independent variables, x and y, and each data point is colored by the outcome variable, red or gray. CART tries to split this data into subsets so that each subset is as pure, or homogeneous, as possible. The first three splits that CART would create are shown here. The standard prediction made by a CART model is simply the majority outcome in each subset. If a new observation fell into one of these two subsets, then we would predict red, since the majority of the observations in those subsets are red. However, if a new observation fell into one of these two subsets, we would predict gray, since the majority of the observations in those two subsets are gray.

A CART model is represented by what we call a tree. The tree for the splits we just generated is shown on the right. The first split tests whether the variable x is less than 60. If yes, the model says to predict red, and if no, the model moves on to the next split. The second split checks whether or not the variable y is less than 20. If no, the model says to predict gray, but if yes, the model moves on to the next split. The third split checks whether or not the variable x is less than 85. If yes, then the model says to predict red, and if no, the model says to predict gray.

There are a couple of things to keep in mind when reading trees. In this tree, and for the trees we'll generate in R, a yes response is always to the left and a no response is always to the right. Also, make sure you always start at the top of the tree: the x less than 85 split only applies to observations for which x is greater than or equal to 60 and y is less than 20.

In the next video, we'll discuss how CART decides how many splits to generate and how the final predictions are made.
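Before moving on, the decision logic of the example tree above can be written out directly in R as a minimal sketch. The function name is made up for illustration; the cutoffs 60, 20, and 85 come from the splits we just described.

    # A minimal sketch of the example tree's decision logic. Each if
    # statement mirrors one split, read from the top of the tree down.
    predict_example_tree <- function(x, y) {
      if (x < 60) return("red")     # first split: yes branch -> predict red
      if (!(y < 20)) return("gray") # second split: no branch -> predict gray
      if (x < 85) return("red")     # third split: yes branch -> predict red
      "gray"                        # third split: no branch -> predict gray
    }

    predict_example_tree(x = 70, y = 10)  # x >= 60, y < 20, x < 85: "red"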