Multi-Class Neural Networks: One vs. All

One vs. all provides a way to leverage binary classification. Given a classification problem with N possible solutions, a one-vs.-all solution consists of N separate binary classifiers, one binary classifier for each possible outcome. During training, the model runs through a sequence of binary classifiers, training each to answer a separate classification question. For example, given a picture of a dog, five different recognizers might be trained: four see the image as a negative example (not an apple, not a bear, and so on) and one sees it as a positive example (a dog). That is (a code sketch follows this list):

  1. Is this image an apple? No.
  2. Is this image a bear? No.
  3. Is this image candy? No.
  4. Is this image a dog? Yes.
  5. Is this image an egg? No.
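
The following is a minimal sketch of this scheme: N independent binary classifiers, one per class, with the most confident "yes" winning at prediction time. The use of scikit-learn logistic regression, the toy data, and the helper names are assumptions for illustration only, not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

classes = ["apple", "bear", "candy", "dog", "egg"]

# Toy stand-in data: 500 examples with 20 numeric features each (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = rng.choice(classes, size=500)

# Train one binary classifier per class: "is this example class c, or not?"
models = {}
for c in classes:
    is_c = (y == c).astype(int)          # 1 for class c, 0 for everything else
    models[c] = LogisticRegression().fit(X, is_c)

# To classify a new example, ask all N classifiers and take the most confident "yes".
def predict(x):
    scores = {c: m.predict_proba(x.reshape(1, -1))[0, 1] for c, m in models.items()}
    return max(scores, key=scores.get)

print(predict(X[0]))
```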

This approach is fairly reasonable when the total number of classes is small, but becomes increasingly inefficient as the number of classes rises.

We can create a significantly more efficient one-vs.-all model with a deep neural network in which each output node represents a different class. The following figure suggests this approach:

A neural network with five hidden layers and an output layer of five nodes, one per class.

Figure 1. A one-vs.-all neural network.
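
The sketch below shows one way such a network might be built. The framework (Keras), the layer sizes, and the toy data are assumptions, not something the figure specifies: each class gets its own sigmoid output node, and each node is trained as an independent yes/no question via binary cross-entropy.

```python
import numpy as np
import tensorflow as tf

num_classes = 5      # apple, bear, candy, dog, egg
num_features = 20    # hypothetical input size

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(num_features,)),
    tf.keras.layers.Dense(64, activation="relu"),   # hidden layers
    tf.keras.layers.Dense(64, activation="relu"),
    # One output node per class; sigmoid makes each node an independent yes/no answer.
    tf.keras.layers.Dense(num_classes, activation="sigmoid"),
])

# Per-node binary cross-entropy matches the one-vs.-all framing.
model.compile(optimizer="adam", loss="binary_crossentropy")

# Toy stand-in data; labels are one-hot: 1 for the true class, 0 everywhere else.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, num_features)).astype("float32")
labels = rng.integers(0, num_classes, size=500)
y = np.eye(num_classes, dtype="float32")[labels]

model.fit(X, y, epochs=3, batch_size=32, verbose=0)

# Prediction: run the example through all five output nodes and pick the largest score.
predicted_class = int(np.argmax(model.predict(X[:1], verbose=0), axis=-1)[0])
print(predicted_class)
```

Because every class shares the same hidden layers, the network learns one shared representation instead of N separate models, which is what makes this form of one vs. all more efficient as the number of classes grows.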