# Multi-Class Neural Networks

Earlier, you encountered binary classification models that could pick between two possible choices, such as whether:

• A given email is spam or not spam.
• A given tumor is malignant or benign.

In this module, we'll investigate multi-class classification, which can pick from multiple possibilities. For example:

• Is this dog a beagle, a basset hound, or a bloodhound?
• Is this flower a Siberian Iris, Dutch Iris, Blue Flag Iris, or Dwarf Bearded Iris?
• Is that plane a Boeing 747, Airbus 320, Boeing 777, or Embraer 190?
• Is this an image of an apple, bear, candy, dog, or egg?

Some real-world multi-class problems entail choosing from millions of separate classes. For example, consider a multi-class classification model that can identify the image of just about anything.

• Logistic regression gives useful probabilities for binary-class problems:
  • spam / not-spam
  • click / not-click
• Many problems have more than two classes:
  • apple, banana, car, cardiologist, ..., walk sign, zebra, zoo
  • red, orange, yellow, green, blue, indigo, violet
  • animal, vegetable, mineral
• One-vs-all: multi-class classification built from binary classifiers.
  • Create a unique output for each possible class.
  • Train each output on a signal of "my class" vs. "all other classes".
  • This can be done in a single deep network or with separate models.
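
The one-vs-all setup can be sketched in a few lines of plain Python. This is a minimal illustration, not a training loop: the class list is borrowed from the examples above, while the logit values and the helper name `one_vs_all_targets` are assumptions made up for this sketch.

```python
import math

def sigmoid(z):
    """Squash a raw score (logit) into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def one_vs_all_targets(label, classes):
    """Build one 0/1 target per class: 'my class' vs. 'all other classes'."""
    return [1.0 if c == label else 0.0 for c in classes]

classes = ["animal", "vegetable", "mineral"]
targets = one_vs_all_targets("vegetable", classes)  # [0.0, 1.0, 0.0]

# Hypothetical logits from three independent output nodes (assumed values).
logits = [-1.2, 2.0, 0.3]
scores = [sigmoid(z) for z in logits]

# Each score is an independent "is it this class?" estimate,
# so nothing forces the scores to sum to 1.0.
predicted = classes[scores.index(max(scores))]
```

Because each node is scored independently, the highest score picks the class, but the scores together do not form a probability distribution.
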
• Softmax: add a constraint that the outputs of all one-vs-all nodes sum to 1.0.
  • The additional constraint helps training converge quickly.
  • It also allows the outputs to be interpreted as probabilities.
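
A common way to enforce the sum-to-1.0 constraint is the softmax function, which exponentiates each logit and divides by the total. The sketch below uses plain Python; the logit values are assumed for illustration, and subtracting the maximum logit is a standard numerical-stability step that does not change the result.

```python
import math

def softmax(logits):
    """Convert raw scores into non-negative values that sum to 1.0."""
    m = max(logits)                        # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three classes (assumed values).
logits = [-1.2, 2.0, 0.3]
probs = softmax(logits)

# The largest logit still gets the largest share, and the shares sum to 1.0,
# so they can be read as class probabilities.
best_index = probs.index(max(probs))
```
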
• Multi-class, single-label classification:
  • An example may be a member of only one class.
  • The constraint that classes are mutually exclusive is helpful structure.
  • It is useful to encode this constraint in the loss.
  • Use one softmax loss for all possible classes.
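
For the single-label case, one softmax loss over all classes is typically the negative log of the probability assigned to the true class. The following is a minimal sketch under that assumption; the function name `softmax_cross_entropy` and the logit values are illustrative, not taken from any particular library.

```python
import math

def softmax_cross_entropy(logits, true_index):
    """Negative log of the softmax probability of the single true class."""
    m = max(logits)
    # log(sum(exp(z))) computed stably by factoring out the largest logit
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_index]

# Hypothetical logits for three mutually exclusive classes (assumed values).
logits = [-1.2, 2.0, 0.3]

loss_if_true_is_1 = softmax_cross_entropy(logits, 1)  # model favors class 1
loss_if_true_is_0 = softmax_cross_entropy(logits, 0)  # model disfavors class 0
```

Raising the logit of the true class necessarily lowers the probability of every other class, which is how the loss encodes mutual exclusivity.
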
• Multi-class, multi-label classification:
  • An example may be a member of more than one class.
  • There are no additional constraints on class membership to exploit.
  • Use one logistic regression loss for each possible class.
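
In the multi-label case, each class gets its own sigmoid output and its own logistic (binary cross-entropy) loss, and the per-class losses are added up. The sketch below assumes that summation; the helper names, logits, and targets are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(p, y):
    """Logistic regression loss for one class with target y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def multi_label_loss(logits, targets):
    """Sum of independent logistic losses, one per class."""
    return sum(binary_cross_entropy(sigmoid(z), y)
               for z, y in zip(logits, targets))

# Hypothetical example that belongs to classes 0 and 2 at the same time.
logits = [2.5, -3.0, 1.0]
targets = [1, 0, 1]
loss = multi_label_loss(logits, targets)
```

Unlike softmax, the per-class probabilities here are not tied together, so several classes can score close to 1.0 at once.
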
• Softmax options:
  • Full softmax: brute force; calculates a probability for every class.
  • Candidate sampling: calculates a probability for all the positive labels, but only for a random sample of the negative labels.
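
The difference between the two options can be sketched as follows. This is a deliberately simplified illustration: the function name `sampled_softmax_loss` is an assumption, negatives are drawn uniformly, and the sketch omits the sampling-probability correction that practical implementations usually apply.

```python
import math
import random

def sampled_softmax_loss(logits, positive_index, num_sampled, rng):
    """Softmax loss over the positive class plus a random sample of negatives."""
    negatives = [i for i in range(len(logits)) if i != positive_index]
    sampled = rng.sample(negatives, num_sampled)   # uniform, without replacement
    candidates = [positive_index] + sampled
    cand_logits = [logits[i] for i in candidates]
    m = max(cand_logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in cand_logits))
    return log_sum_exp - logits[positive_index], candidates

rng = random.Random(0)
num_classes = 10_000
# Hypothetical logits for a large label space (assumed values).
logits = [rng.gauss(0.0, 1.0) for _ in range(num_classes)]

# Candidate sampling: score the positive class and only 20 negatives.
sampled_loss, candidates = sampled_softmax_loss(logits, 42, 20, rng)

# Full softmax: score every class (here, sample all 9,999 negatives).
full_loss, _ = sampled_softmax_loss(logits, 42, num_classes - 1, rng)
```

With 20 sampled negatives, each step touches 21 logits instead of 10,000, which is the efficiency gain candidate sampling targets when the number of classes is very large.
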