4.2 Judge, Jury, and Classifier: An Introduction to Trees

Quick Question

Important Note: When creating random forest models, you might still get different answers from the ones you see here even if you set the random seed. This has to do with different operating systems and the random forest implementation.

Let's see what happens if we set the seed to two different values and create two different random forest models.

First, set the seed to 100, and then re-build the random forest model, exactly like we did in the previous video (Video 5). Then make predictions on the test set. What is the accuracy of the model on the test set?

Exercise 1

 

Now, set the seed to 200, and then re-build the random forest model, exactly like we did in the previous video (Video 5). Then make predictions on the test set. What is the accuracy of this model on the test set?

Exercise 2

 

Explanation

You can create the models and compute the accuracies with the following commands in R:

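# Load the package (assumes randomForest is installed)
library(randomForest)

# Assumes Train$Reverse and Test$Reverse were already converted to factors, as in the video
set.seed(100)

StevensForest = randomForest(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, ntree=200, nodesize=25)

PredictForest = predict(StevensForest, newdata = Test)

# Confusion matrix: rows are the actual outcomes, columns are the predictions
table(Test$Reverse, PredictForest)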

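Then repeat the same commands, but call set.seed(200) first. In each case, the accuracy is the sum of the diagonal entries of the table (the correct predictions) divided by the total number of test set observations.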
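The table is a confusion matrix, so the accuracy can also be computed in R rather than by hand. The following is a minimal sketch using made-up labels in place of Test$Reverse and PredictForest; the vectors actual and predicted and their values are assumptions for illustration only, not results from the Stevens data.

```r
# Made-up labels standing in for Test$Reverse and PredictForest
# (illustrative values only, not the Stevens data).
actual    <- factor(c(0, 0, 1, 1, 1, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(0, 1, 1, 1, 0, 0, 1, 0), levels = c(0, 1))

# Confusion matrix: rows are actual outcomes, columns are predictions.
# Both factors share the same levels, so the table is square.
cm <- table(actual, predicted)

# Correct predictions sit on the diagonal; divide by the total count.
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

With the real objects, the same two lines would be applied to table(Test$Reverse, PredictForest), once for each seed.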

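As these two runs show, the random component of the random forest method can change the accuracy: each tree is built from a random sample of the training data, so a different seed produces a different forest. For a more stable dataset the accuracy will not change very much, but for a noisy dataset it can be significantly affected by the random samples.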
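To see why the seed matters, it can help to look at the sampling step on its own. The sketch below uses only base R and a toy vector of row indices; the helper draw_bootstrap and the vector rows are hypothetical names introduced for illustration, and this is not the internal code of the randomForest package. It only shows that sampling with replacement depends on the seed.

```r
# Toy stand-in for the row indices of a training set (not the Stevens data).
rows <- 1:20

# Hypothetical helper: fix the seed, then sample rows with replacement.
draw_bootstrap <- function(seed) {
  set.seed(seed)
  sample(rows, size = length(rows), replace = TRUE)
}

sample_a <- draw_bootstrap(100)
sample_b <- draw_bootstrap(100)  # same seed as sample_a
sample_c <- draw_bootstrap(200)  # different seed

# The same seed reproduces the same sample on a given R installation.
same_seed_matches <- identical(sample_a, sample_b)
print(same_seed_matches)

# A different seed will generally select a different set of rows.
print(identical(sample_a, sample_c))
```

Because every tree in the forest starts from a sample like this, changing the seed changes which observations each tree sees, and the predictions on the test set can shift as a result.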