Multi-Class Neural Networks

Earlier, you encountered binary classification models that could pick between two possible choices, such as whether:

  • A given email is spam or not spam.
  • A given tumor is malignant or benign.

In this module, we'll investigate multi-class classification, in which a model picks from more than two possibilities. For example:

  • Is this dog a beagle, a basset hound, or a bloodhound?
  • Is this flower a Siberian Iris, Dutch Iris, Blue Flag Iris, or Dwarf Bearded Iris?
  • Is that plane a Boeing 747, Airbus 320, Boeing 777, or Embraer 190?
  • Is this an image of an apple, bear, candy, dog, or egg?

Some real-world multi-class problems entail choosing from millions of separate classes. For example, consider a multi-class classification model that can identify the image of just about anything.

  • Logistic regression gives useful probabilities for binary-class problems.
    • spam / not-spam
    • click / not-click
  • What about multi-class problems?
    • apple, banana, car, cardiologist, ..., walk sign, zebra, zoo
    • red, orange, yellow, green, blue, indigo, violet
    • animal, vegetable, mineral
  • Create a unique output for each possible class
  • Train each output on a signal of "my class" vs "all other classes"
  • Can be done in a single deep network, or with separate models
Figure: A neural network with five hidden layers and five output nodes.
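
The one-vs-all idea in the bullets above can be sketched in a few lines. This is a minimal illustration, not a trained model: the logits and the five class names are made-up values, and `one_vs_all_outputs` is a hypothetical helper name. Each class gets its own sigmoid output, so the scores are independent and need not sum to 1.0.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def one_vs_all_outputs(logits):
    """One independent sigmoid per class: 'my class' vs 'all other classes'."""
    return sigmoid(np.asarray(logits, dtype=float))

# Hypothetical logits for the classes apple, bear, candy, dog, egg.
logits = [2.0, -1.0, 0.5, -3.0, 0.0]
scores = one_vs_all_outputs(logits)

# The predicted class is the one whose output is highest.
predicted = int(np.argmax(scores))

print(scores, scores.sum(), predicted)
```

Because every output is scored on its own, the sum of the scores here is well above 1.0, which is why they cannot be read directly as a probability distribution over classes.
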
  • Add a constraint: require the outputs of all one-vs-all nodes to sum to 1.0
  • The constraint helps training converge quickly
  • It also allows the outputs to be interpreted as probabilities
Figure: A deep neural net with an input layer, two nondescript hidden layers, then a Softmax layer, and finally an output layer with the same number of nodes as the Softmax layer.
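
As a rough sketch of what a Softmax layer computes, the function below turns a vector of logits into values that are all positive and sum to 1.0. The logits are made-up numbers for illustration, and subtracting the maximum is a common numerical-stability step rather than something the slides specify.

```python
import numpy as np

def softmax(logits):
    """Map logits to positive values that sum to 1.0 along the last axis."""
    z = np.asarray(logits, dtype=float)
    # Subtracting the max does not change the result but avoids overflow in exp.
    z = z - np.max(z, axis=-1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

# Hypothetical logits for five classes.
logits = [2.0, -1.0, 0.5, -3.0, 0.0]
probs = softmax(logits)

print(probs, probs.sum())
```

The largest logit still gets the largest output, but the outputs now compete with one another: raising one probability necessarily lowers the others.
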
  • Multi-Class, Single-Label Classification:
    • An example may be a member of only one class.
    • Constraint that classes are mutually exclusive is helpful structure.
    • Useful to encode this in the loss.
    • Use one softmax loss for all possible classes.
  • Multi-Class, Multi-Label Classification:
    • An example may be a member of more than one class.
    • No additional constraints on class membership to exploit.
    • One logistic regression loss for each possible class.
  • Full Softmax
    • Brute force; calculates for all classes.
  • Candidate Sampling
    • Calculates for all the positive labels, but only for a random sample of negatives.
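
The options in the list above can be compared side by side in a short sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the logits are made up, and the sampled version is deliberately simplified. It draws negatives uniformly and applies no correction for the sampling distribution, which library implementations of sampled softmax typically do apply.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log of the softmax over a 1-D array."""
    z = logits - np.max(logits)
    return z - np.log(np.sum(np.exp(z)))

def full_softmax_loss(logits, label):
    """Single-label case: one softmax cross-entropy over all classes."""
    return -log_softmax(np.asarray(logits, dtype=float))[label]

def multi_label_loss(logits, labels):
    """Multi-label case: one logistic (sigmoid) loss per class, summed."""
    z = np.asarray(logits, dtype=float)
    y = np.asarray(labels, dtype=float)
    # Stable form of -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(z).
    return float(np.sum(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))))

def sampled_softmax_loss(logits, label, num_negatives, rng):
    """Candidate sampling (simplified): the positive label plus a random
    sample of negatives, with softmax computed over that subset only."""
    z = np.asarray(logits, dtype=float)
    negatives = np.delete(np.arange(z.size), label)
    sampled = rng.choice(negatives, size=num_negatives, replace=False)
    subset = np.concatenate(([label], sampled))
    # A real model would compute logits only for `subset`; here they are
    # precomputed for all classes to keep the sketch short.
    return -log_softmax(z[subset])[0]

rng = np.random.default_rng(0)
logits = [2.0, -1.0, 0.5, -3.0, 0.0]  # hypothetical logits for five classes

full = full_softmax_loss(logits, label=0)
approx = sampled_softmax_loss(logits, label=0, num_negatives=2, rng=rng)
multi = multi_label_loss(logits, labels=[1, 0, 1, 0, 0])

print(full, approx, multi)
```

With only five classes the savings are negligible; the sampled variant matters when there are so many classes that computing every logit at each training step is too expensive.
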