Other topics

This unit examines the following topics:

  • interpreting random forests
  • training random forests
  • pros and cons of random forests

Interpreting random forests

Random forests are more complex to interpret than decision trees. A random forest contains many decision trees, each trained with injected randomness, so it is harder to draw conclusions from the structure of any single tree. However, we can interpret random forest models in a couple of ways.

One approach to interpreting a random forest is simply to train and interpret a decision tree with the CART algorithm. Because both the random forest and CART are trained with the same core algorithm, they "share the same global view" of the dataset. This option works well for simple datasets and for understanding the overall behavior of the model.
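For instance, the following sketch (assuming the TF-DF library used later in this unit, and a training dataset named tf_train_dataset like the one in the usage example below) trains and plots a single CART tree on the same data:

import tensorflow_decision_forests as tfdf

# Train a single CART tree on the same training data as the random forest.
# tf_train_dataset is assumed to be the tf.data.Dataset from the usage example below.
cart_model = tfdf.keras.CartModel()
cart_model.fit(tf_train_dataset)

# Plot the tree (in a Colab/Jupyter notebook) to read its splits directly.
tfdf.model_plotter.plot_model_in_colab(cart_model, tree_idx=0, max_depth=3)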

Variable importances are another good interpretability approach. For example, the following table ranks the variable importance of different features for a random forest model trained on the Census dataset (also known as Adult).

Table 8. Variable importance of 14 different features.

Feature | Sum score | Mean decrease in accuracy | Mean decrease in AUC | Mean min depth | Num nodes | Mean decrease in PR-AUC | Num as root
--- | --- | --- | --- | --- | --- | --- | ---
relationship | 4203592.6 | 0.0045 | 0.0172 | 4.970 | 57040 | 0.0093 | 1095
capital_gain | 3363045.1 | 0.0199 | 0.0194 | 2.852 | 56468 | 0.0655 | 457
marital_status | 3128996.3 | 0.0018 | 0.0230 | 6.633 | 52391 | 0.0107 | 750
age | 2520658.8 | 0.0065 | 0.0074 | 4.969 | 356784 | 0.0033 | 200
education | 2015905.4 | 0.0018 | -0.0080 | 5.266 | 115751 | -0.0129 | 205
occupation | 1939409.3 | 0.0063 | -0.0040 | 5.017 | 221935 | -0.0060 | 62
education_num | 1673648.4 | 0.0023 | -0.0066 | 6.009 | 58303 | -0.0080 | 197
fnlwgt | 1564189.0 | -0.0002 | -0.0038 | 9.969 | 431987 | -0.0049 | 0
hours_per_week | 1333976.3 | 0.0030 | 0.0007 | 6.393 | 206526 | -0.0031 | 20
capital_loss | 866863.8 | 0.0060 | 0.0020 | 8.076 | 58531 | 0.0118 | 1
workclass | 644208.4 | 0.0025 | -0.0019 | 9.898 | 132196 | -0.0023 | 0
native_country | 538841.2 | 0.0001 | -0.0016 | 9.434 | 67211 | -0.0058 | 0
sex | 226049.3 | 0.0002 | 0.0002 | 10.911 | 37754 | -0.0011 | 13
race | 168180.9 | -0.0006 | -0.0004 | 11.571 | 42262 | -0.0031 | 0

As you can see, different definitions of variable importance have different scales and can lead to differences in the ranking of the features.

Variable importances that come from the model structure (for example, sum score, mean min depth, num nodes, and num as root in the table above) are computed similarly for decision trees (see the section "CART | Variable importance") and random forests.
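In TF-DF, these structure-based importances can be read from a trained model's inspector. The following sketch assumes a random forest model trained as in the usage example below; the exact importance keys (for example "NUM_AS_ROOT" or "SUM_SCORE") depend on the library version:

# List the structure-based variable importances computed during training.
inspector = model.make_inspector()
print(inspector.variable_importances().keys())

# For example, the number of times each feature is used as a root.
print(inspector.variable_importances()["NUM_AS_ROOT"])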

Permutation variable importances (for example, mean decrease in {accuracy, AUC, PR-AUC} in the table above) are model-agnostic measures that can be computed on any machine learning model with a validation dataset. With random forests, however, instead of using a validation dataset, you can compute permutation variable importance with out-of-bag evaluation.
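For instance, TF-DF can compute out-of-bag permutation variable importances when a flag is set at training time. The sketch below assumes the same tf_train_dataset as in the usage example; the exact importance keys (such as "MEAN_DECREASE_IN_ACCURACY") depend on the task and library version:

# Request permutation variable importances computed on the out-of-bag examples.
model = tfdf.keras.RandomForestModel(compute_oob_variable_importances=True)
model.fit(tf_train_dataset)

# Mean decrease in accuracy, measured by permuting each feature on the
# out-of-bag examples.
print(model.make_inspector().variable_importances()["MEAN_DECREASE_IN_ACCURACY"])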

SHAP (SHapley Additive exPlanations) is a model-agnostic method to explain individual predictions or provide model-wise interpretation. (See Interpretable Machine Learning by Molnar for an introduction to model-agnostic interpretation.) SHAP is ordinarily expensive to compute, but it can be sped up significantly for decision forests, so it is a good way to interpret decision forests.
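As an illustration of the tree-specific speed-up, the following sketch uses the shap package's TreeExplainer with a scikit-learn random forest on a toy dataset (not the TF-DF API used elsewhere in this unit), since that combination is directly supported by shap:

import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Train a small random forest on a toy dataset.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
forest = RandomForestClassifier(n_estimators=100).fit(X, y)

# TreeExplainer exploits the tree structure, so it is much faster than
# model-agnostic (kernel) SHAP.
explainer = shap.TreeExplainer(forest)
shap_values = explainer.shap_values(X.iloc[:100])  # per-example, per-feature attributions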

Usage example

In the previous lesson, we trained a CART decision tree on a small dataset by calling tfdf.keras.CartModel. To train a random forest model, simply replace tfdf.keras.CartModel with tfdf.keras.RandomForestModel:

# Train a random forest instead of a single CART tree.
model = tfdf.keras.RandomForestModel()
model.fit(tf_train_dataset)
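After training, calling model.summary() should print a description of the random forest which, depending on the TF-DF version, includes the out-of-bag evaluation and the variable importances:

model.summary()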

Pros and cons

This section contains a quick summary of the pros and cons of random forests.

Pros:

  • Like decision trees, random forests natively support numerical and categorical features and often do not need feature pre-processing.
  • Because the decision trees are independent, random forests can be trained in parallel. Consequently, you can train random forests quickly.
  • Random forests have default parameters that often give great results. Tuning those parameters often has little effect on the model.

Cons:

  • Because decision trees are not pruned, they can be large. Models with more than 1M nodes are common. The size (and therefore inference speed) of the random forest can sometimes be an issue.
  • Random forests cannot learn and reuse internal representations. Each decision tree (and each branch of each decision tree) must relearn the dataset patterns. On some datasets, notably non-tabular datasets (for example, images or text), this leads random forests to produce worse results than other methods.