Multi-Class Neural Networks

Earlier, you encountered binary classification models that choose between exactly two possible options, such as whether:

  • A given email is spam or not spam.
  • A given tumor is malignant or benign.

In this module, we'll investigate multi-class classification, which can pick from multiple possibilities. For example:

  • Is this dog a beagle, a basset hound, or a bloodhound?
  • Is this flower a Siberian Iris, Dutch Iris, Blue Flag Iris, or Dwarf Bearded Iris?
  • Is that plane a Boeing 747, Airbus A320, Boeing 777, or Embraer 190?
  • Is this an image of an apple, bear, candy, dog, or egg?

Some real-world multi-class problems entail choosing from millions of separate classes. For example, consider a multi-class classification model that can identify the image of just about anything.

Multi-Class Neural Networks

  • Logistic regression gives useful probabilities for binary-class problems.
    • spam / not-spam
    • click / not-click
  • What about multi-class problems?
    • apple, banana, car, cardiologist, ..., walk sign, zebra, zoo
    • red, orange, yellow, green, blue, indigo, violet
    • animal, vegetable, mineral
  • Create a unique output for each possible class
  • Train each output on a signal of "my class" vs. "all other classes"
  • Can do in a deep network, or with separate models
Figure: A neural network with five hidden layers and a five-node output layer (one node per class).
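The one-vs-all idea above can be sketched in a few lines: each class gets its own output node with an independent sigmoid, so the scores are individually between 0 and 1 but are not constrained to sum to 1. The logits here are made-up numbers for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def one_vs_all_scores(logits: list[float]) -> list[float]:
    """One-vs-all: each output node applies its own sigmoid, as if it were
    a separate binary classifier for "my class" vs. "all other classes".
    Note the scores do NOT sum to 1."""
    return [sigmoid(z) for z in logits]

# Hypothetical logits for [beagle, basset hound, bloodhound]:
scores = one_vs_all_scores([2.0, -1.0, 0.5])
```

Because nothing ties the outputs together, the scores can sum to more (or less) than 1, which is exactly the gap the softmax constraint below closes.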
  • Add an additional constraint: Require output of all one-vs-all nodes to sum to 1.0
  • The additional constraint helps training converge quickly
  • Plus, allows outputs to be interpreted as probabilities
Figure: A deep neural net with an input layer, two hidden layers, a softmax layer, and an output layer with the same number of nodes as the softmax layer.
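The sum-to-1 constraint is implemented by the softmax function: exponentiate each logit, then normalize by the total. A minimal sketch (the max-subtraction is a standard numerical-stability trick, not part of the mathematical definition):

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Squash raw logits into probabilities that sum to 1.0.
    Subtracting the max logit first avoids overflow in exp()."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Same hypothetical logits as before; now the outputs form a distribution.
probs = softmax([2.0, -1.0, 0.5])
```

Unlike the one-vs-all sigmoids, these outputs can be read directly as class probabilities.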
  • Multi-Class, Single-Label Classification:
    • An example may be a member of only one class.
    • Constraint that classes are mutually exclusive is helpful structure.
    • Useful to encode this in the loss.
    • Use one softmax loss for all possible classes.
  • Multi-Class, Multi-Label Classification:
    • An example may be a member of more than one class.
    • No additional constraints on class membership to exploit.
    • One logistic regression loss for each possible class.
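The two loss choices above can be contrasted directly: single-label training uses one softmax cross-entropy over all classes, while multi-label training sums an independent sigmoid cross-entropy per class. This is a sketch of the math only, not any particular library's API:

```python
import math

def softmax_cross_entropy(logits: list[float], true_index: int) -> float:
    """Single-label loss: -log of the softmax probability of the true class,
    computed via the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[true_index]

def multilabel_logistic_loss(logits: list[float], labels: list[int]) -> float:
    """Multi-label loss: an independent sigmoid cross-entropy per class.
    `labels` holds a 0/1 membership flag for each class."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-z))
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total

single = softmax_cross_entropy([2.0, -1.0, 0.5], true_index=0)
multi = multilabel_logistic_loss([2.0, -1.0, 0.5], labels=[1, 0, 1])
```

The softmax loss bakes in mutual exclusivity; the per-class logistic losses assume nothing about how class memberships relate.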
  • Full Softmax
    • Brute force: calculates a probability for every class.
  • Candidate Sampling
    • Calculates for all the positive labels, but only for a random sample of the negative classes.
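The candidate-sampling idea can be sketched as follows. This is an illustrative skeleton, not a real training loss: `score_fn` stands in for whatever per-class scorer the model uses (e.g. a dot product between the example's embedding and the class's weight vector), and all names here are hypothetical.

```python
import random

def sampled_candidate_scores(all_classes, positive_classes,
                             num_negatives, score_fn, rng=None):
    """Candidate sampling: score every positive class, but only a random
    sample of the negatives, instead of all classes. With millions of
    classes this is far cheaper than a full softmax."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility here
    negatives = [c for c in all_classes if c not in positive_classes]
    sampled = rng.sample(negatives, min(num_negatives, len(negatives)))
    candidates = list(positive_classes) + sampled
    return {c: score_fn(c) for c in candidates}

# Toy usage: 1,000 classes, 2 positives, 10 sampled negatives.
out = sampled_candidate_scores(range(1000), [3, 7], 10, score_fn=float)
```

In practice this trick is only used during training; at inference time the full set of class scores is still computed.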