In the previous video, we generated a CART tree with three splits, but why not two, or four, or even five? There are different ways to control how many splits are generated. One way is by setting a lower bound on the number of data points in each subset. In R, this is called the minbucket parameter, short for the minimum number of observations in each bucket, or subset. The smaller minbucket is, the more splits will be generated. But if it's too small, overfitting will occur: CART will fit the training set almost perfectly, which is bad because the model will then probably not perform well on the test set or on new data. On the other hand, if the minbucket parameter is too large, the model will be too simple and the accuracy will be poor. Later in the lecture, we will learn about a nice method for selecting this stopping parameter (a short R sketch of setting minbucket appears at the end of this transcript).

In each subset of a CART tree, we have a bucket of observations, which may contain both possible outcomes. In the small example we showed in the previous video, we classified each subset as either red or gray, depending on the majority in that subset. In the Supreme Court case, we'll be classifying observations as either affirm or reverse. Instead of just taking the majority outcome to be the prediction, we can compute the percentage of data in a subset with each type of outcome. As an example, if we have a subset with 10 affirms and 2 reverses, then about 83% of the data (10 out of 12) is affirm. Then, just like in logistic regression, we can use a threshold value to obtain our prediction. For this example, we would predict affirm with a threshold of 0.5, since the majority is affirm. But if we increased that threshold to 0.9, we would predict reverse for this example.

Then, by varying the threshold value, we can compute an ROC curve and an AUC value to evaluate our model. In the next video, we'll build a CART tree in R to predict the decisions of Justice Stevens and evaluate our model using an ROC curve.
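To make the minbucket discussion above concrete, here is a minimal R sketch using the rpart package, a standard CART implementation in R. The data frame stevens, the outcome Decision, and the predictor names are hypothetical stand-ins for illustration; they are not objects defined in this video.

```r
# Minimal sketch: controlling tree size with minbucket. Assumes a
# hypothetical data frame `stevens` with a factor outcome `Decision`
# ("affirm"/"reverse") and some hypothetical case-level predictors.
library(rpart)

# A small minbucket allows tiny buckets, so many splits are generated
# and the tree risks overfitting the training set.
bushy_tree <- rpart(Decision ~ Circuit + Issue + LowerCourt,
                    data = stevens, method = "class",
                    control = rpart.control(minbucket = 5))

# A large minbucket forces every bucket to hold many observations,
# so few splits are generated and the model may be too simple.
stubby_tree <- rpart(Decision ~ Circuit + Issue + LowerCourt,
                     data = stevens, method = "class",
                     control = rpart.control(minbucket = 100))

# Count the leaves (buckets) each setting produces.
sum(bushy_tree$frame$var == "<leaf>")
sum(stubby_tree$frame$var == "<leaf>")
```

rpart.control also exposes related stopping controls, such as minsplit, maxdepth, and the complexity parameter cp, which bound tree growth in similar ways.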
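The bucket percentages and thresholds described above can be reproduced by asking the tree for class probabilities instead of class labels. This sketch continues the same hypothetical setup; the columns of the probability matrix are named after the outcome factor's levels, so a level called "affirm" is assumed.

```r
library(rpart)

# Same hypothetical tree and data as in the sketch above.
tree <- rpart(Decision ~ Circuit + Issue + LowerCourt,
              data = stevens, method = "class",
              control = rpart.control(minbucket = 25))

# type = "prob" returns, for each observation, the fraction of each
# outcome in its bucket. A bucket with 10 affirms and 2 reverses gives
# an affirm probability of 10/12, about 0.83.
pred_probs <- predict(tree, newdata = stevens, type = "prob")
affirm_prob <- pred_probs[, "affirm"]  # assumes "affirm" is a factor level

# With a 0.5 threshold, the 0.83 bucket is classified affirm (majority
# vote); raising the threshold to 0.9 flips that same bucket to reverse.
pred_05 <- ifelse(affirm_prob >= 0.5, "affirm", "reverse")
pred_09 <- ifelse(affirm_prob >= 0.9, "affirm", "reverse")
```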
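Finally, sweeping the threshold across all possible values gives the ROC curve and AUC mentioned at the end of the video. One common way to do this in R is the ROCR package, sketched here under the same hypothetical names; in practice the curve would be computed on held-out test data rather than on the training set.

```r
library(rpart)
library(ROCR)

# Same hypothetical tree and data as in the sketches above.
tree <- rpart(Decision ~ Circuit + Issue + LowerCourt,
              data = stevens, method = "class",
              control = rpart.control(minbucket = 25))

# Probability of "reverse" for each observation, taken from its bucket.
prob_reverse <- predict(tree, newdata = stevens, type = "prob")[, "reverse"]

# ROCR pairs each probability with the true outcome, then sweeps the
# threshold from 0 to 1 to trace out true and false positive rates.
rocr_pred <- prediction(prob_reverse, stevens$Decision)
roc_curve <- performance(rocr_pred, "tpr", "fpr")
plot(roc_curve)

# The AUC summarizes the whole curve in a single number.
as.numeric(performance(rocr_pred, "auc")@y.values)
```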