此页面由 Cloud Translation API 翻译。

种植决策树

与所有监督式机器学习模型一样，决策树经过训练以最好地解释一组训练样本。决策树的最佳训练是一个 NP 难题。因此，通常使用启发法完成训练，这是一种易于创建的学习算法，可提供非最优但接近最优决策树。

用于训练决策树的大多数算法都使用贪心的分而治之策略。该算法首先创建单个节点（根节点），然后以递归方式、贪心的方式生长决策树。

在每个节点中，所有可能的条件都会得到评估和评分。该算法会选择“最佳”条件，即得分最高的条件。现在，您只需要知道得分是与任务相关的指标，并选择条件来使该指标最大化即可。

例如，在巴尔默企鹅数据集（本课程稍后会用于代码示例）中，大多数 Adelie 和 Chinstrap 企鹅的帐单长度大于 16 毫米，而大多数巴尔默企鹅的帐单都较小。因此，条件 bill_length_mm ≥ 16 对根托企鹅的预测几乎是完美的，但无法区分阿德利斯和帽带。算法很可能会选择此条件。

一种情况导致两叶。条件是“bill_length_mm >= 16”。
如果是，叶子是“Adelie or Chinstrap”。如果不是，则叶子是“Gentoo”。

图 7. 种植一棵树的第一步。

然后，该算法在两个子节点上以递归且独立的方式重复。如果未找到满足条件，该节点就会成为叶节点。叶预测被确定为示例中最具代表性的标签值。

算法如下：

def train_decision_tree(training_examples):
  root = create_root() # Create a decision tree with a single empty root.
  grow_tree(root, training_examples) # Grow the root node.
  return root

def grow_tree(node, examples):
  condition = find_best_condition(examples) # Find the best condition.

  if condition is None:
    # No satisfying conditions were found, therefore the grow of the branch stops.
    set_leaf_prediction(node, examples)
    return

  # Create two childrens for the node.
  positive_child, negative_child = split_node(node, condition)

  # List the training examples used by each children.
  negative_examples = [example for example in examples if not condition(example)]
  positive_examples = [example for example in examples if condition(example)]

  # Continue the growth of the children.
  grow_tree(negative_child, negative_examples)
  grow_tree(positive_child, positive_examples)

我们来更详细地了解训练特定决策树的步骤。

第 1 步：创建根：

一个带问号的节点。