Appropriate data for decision forests

Page Summary

Decision forests are highly effective for modeling tabular data, making them a primary choice for datasets commonly found in spreadsheets, CSV files, or databases.
Unlike neural networks, decision forests directly handle tabular data without requiring preprocessing steps like feature normalization or imputation.
While decision forests can be adapted for non-tabular data like images or text, neural networks are generally better suited for such data types.
Decision forests are sample efficient, performing well even with small datasets or those with a high feature-to-example ratio, but still benefit from larger datasets.
Decision forests offer faster inference speeds compared to neural networks, typically completing predictions within microseconds on modern CPUs.

Decision forests are most effective when you have a tabular dataset (data you might represent in a spreadsheet, CSV file, or database table). Tabular data is one of the most common data formats, and decision forests should be your “go-to” solution for modeling it.

Table 1. An example of a tabular dataset.

Number of legs	Number of eyes	Weight (lbs)	Species (label)
2	2	12	Penguin
8	6	0.1	Spider
4	2	44	Dog
…	…	…	…

Unlike neural networks, decision forests natively consume tabular data. When developing decision forests, you don't have to do tasks like the following:

Perform preprocessing like feature normalization or one-hot encoding.
Perform imputation (for example, replacing a missing value with -1).

However, decision forests are not well suited to directly consume non-tabular data (also called unstructured data), such as images or text. Workarounds for this limitation exist, but neural networks generally handle unstructured data better.
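One way to see why feature normalization is unnecessary for tabular data is that a threshold condition such as "weight <= t" depends only on the ordering of the feature values, not on their scale. The following is a minimal sketch of that idea, not the algorithm of any particular library: `gini` and `best_split` are hypothetical helper names, the six rows are made-up data in the spirit of Table 1, and Gini impurity is assumed as the split criterion. It covers normalization only and says nothing about how missing values are handled.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return the (left, right) row-index sets of the threshold split on one
    numerical feature that minimizes the weighted Gini impurity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_score, best_partition = None, None
    for k in range(1, len(order)):
        # A threshold can only fall between two distinct consecutive values.
        if values[order[k - 1]] == values[order[k]]:
            continue
        left, right = order[:k], order[k:]
        score = (
            len(left) * gini([labels[i] for i in left])
            + len(right) * gini([labels[i] for i in right])
        ) / len(order)
        if best_score is None or score < best_score:
            best_score = score
            best_partition = (frozenset(left), frozenset(right))
    return best_partition


# Made-up rows for illustration only (assumption, not from a real dataset).
weights_lbs = [12, 0.1, 44, 0.2, 10, 50]
species = ["Penguin", "Spider", "Dog", "Spider", "Penguin", "Dog"]

# Two rescaled versions of the same feature: a unit change and min-max scaling.
weights_kg = [w * 0.4536 for w in weights_lbs]
low, high = min(weights_lbs), max(weights_lbs)
weights_minmax = [(w - low) / (high - low) for w in weights_lbs]

raw_partition = best_split(weights_lbs, species)
kg_partition = best_split(weights_kg, species)
minmax_partition = best_split(weights_minmax, species)

# The chosen grouping of rows is the same regardless of the feature's scale.
print(raw_partition == kg_partition == minmax_partition)
```

Because both rescalings preserve the ordering of the values, the search examines the same candidate partitions and picks the same one. Only the numerical value of the threshold would differ.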

Performance

Decision forests are sample efficient. That is, they are well suited for training on small datasets, or on datasets where the ratio of the number of features to the number of examples is high (possibly greater than 1). Even so, like all machine learning models, decision forests perform best when lots of data is available.

Decision forests typically infer faster than comparable neural networks. For example, a medium-size decision forest runs inference in a few microseconds on a modern CPU.
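To give some intuition for why inference is cheap, the sketch below walks an example through a tiny hand-written forest and counts the comparisons performed. Everything here is an illustrative assumption: the tuple-based tree encoding, the function names `predict_tree` and `predict_forest`, the three trees, and the use of majority voting to combine them (other decision forest algorithms aggregate tree outputs differently). It is not a benchmark, and production libraries rely on optimized compiled code rather than a Python loop.

```python
from collections import Counter

# Assumed encoding: a tree is either a leaf (a class-label string) or a tuple
# (feature_name, threshold, subtree_if_value_le_threshold, subtree_otherwise).


def predict_tree(tree, example):
    """Walk one tree from root to leaf; return (label, comparisons made)."""
    node, comparisons = tree, 0
    while not isinstance(node, str):
        feature, threshold, below, above = node
        node = below if example[feature] <= threshold else above
        comparisons += 1
    return node, comparisons


def predict_forest(forest, example):
    """Majority vote over all trees; return (label, total comparisons)."""
    votes, total = Counter(), 0
    for tree in forest:
        label, comparisons = predict_tree(tree, example)
        votes[label] += 1
        total += comparisons
    return votes.most_common(1)[0][0], total


# Hand-written trees for illustration only; these were not trained on data.
forest = [
    ("legs", 3, "Penguin", ("legs", 6, "Dog", "Spider")),
    ("weight_lbs", 1.0, "Spider", ("weight_lbs", 20, "Penguin", "Dog")),
    ("eyes", 4, ("legs", 3, "Penguin", "Dog"), "Spider"),
]

dog = {"legs": 4, "eyes": 2, "weight_lbs": 44}
label, comparisons = predict_forest(forest, dog)
print(label, comparisons)
```

Each tree costs at most one comparison per level of depth, so the total work per prediction grows with the number of trees times their depth, with no matrix multiplications involved.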
