Accuracy
In which of the following scenarios would a high accuracy value suggest
that the ML model is doing a good job?
A deadly, but curable, medical condition afflicts 0.01% of the
population. An ML model uses symptoms as features and predicts
this affliction with an accuracy of 99.99%.
Accuracy is a poor metric here. After all, even a "dumb" model
that always predicts "not sick" would still be 99.99% accurate.
Mistakenly predicting "not sick" for a person who actually is sick
could be deadly.
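As a quick sanity check, here is the arithmetic behind that "dumb" baseline. The population size is an illustrative assumption; only the 0.01% rate comes from the scenario:

```python
# Baseline that always predicts "not sick" for a condition
# afflicting 0.01% of the population.
# The population size of one million is an illustrative assumption.
population = 1_000_000
sick = population // 10_000          # 0.01% -> 100 people
correct = population - sick          # every sick person is misclassified
accuracy = correct / population
print(accuracy)                      # 0.9999, i.e. 99.99% accurate, yet useless
```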
An expensive robotic chicken crosses a very busy road a
thousand times per day. An ML model evaluates traffic patterns and
predicts when this chicken can safely cross the street with an
accuracy of 99.99%.
A 99.99% accuracy value on a very busy road strongly suggests that
the ML model is far better than chance. In some settings, however,
the cost of making even a small number of mistakes is still too high.
99.99% accuracy means that the expensive chicken will need to be
replaced, on average, every 10 days. (The chicken might also cause
extensive damage to cars that it hits.)
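The "every 10 days" estimate follows from a short calculation; the 1,000 crossings per day and the 99.99% accuracy are taken from the scenario:

```python
# 1,000 crossings per day at 99.99% accuracy: how often does the
# chicken get hit?
crossings_per_day = 1000
error_rate = 0.0001                                  # 1 - 0.9999
failures_per_day = crossings_per_day * error_rate    # 0.1 per day
days_between_failures = 1 / failures_per_day         # 10 days, on average
print(failures_per_day, days_between_failures)
```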
In the game of roulette, a ball
is dropped on a spinning wheel and eventually lands in one of 38
slots. Using visual features (the spin of the ball, the position of
the wheel when the ball was dropped, the height of the ball over the
wheel), an ML model can predict the slot that the ball will land in
with an accuracy of 4%.
This ML model is making predictions far better than chance; a random
guess would be correct 1/38 of the time—yielding an accuracy of 2.6%.
Although the model's accuracy is "only" 4%, the benefits of success
far outweigh the disadvantages of failure.
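A quick check of the chance baseline, plus a rough expected-value calculation that assumes the standard 35-to-1 single-number payout (the payout is an assumption; the scenario above doesn't state it):

```python
# Chance baseline on an American wheel (38 slots), and a rough
# expected value per 1-unit bet, assuming the standard 35-to-1
# single-number payout (an assumption not stated in the scenario).
chance_accuracy = 1 / 38                    # about 2.6%
model_accuracy = 0.04
ev_per_bet = model_accuracy * 35 - (1 - model_accuracy)   # ~0.44 units
print(round(chance_accuracy, 3), round(ev_per_bet, 2))
```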
Precision
Consider a classification model that separates email into two categories:
"spam" or "not spam." If you raise the classification threshold, what will
happen to precision?
Definitely increase.
Raising the classification threshold typically increases precision;
however, precision is not guaranteed to increase monotonically
as we raise the threshold.
Probably increase.
In general, raising the classification threshold reduces false
positives, thus raising precision.
Probably decrease.
In general, raising the classification threshold reduces false
positives, thus raising precision.
Definitely decrease.
In general, raising the classification threshold reduces false
positives, thus raising precision.
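To see why "probably" rather than "definitely," here is a toy sketch with made-up scores and labels in which raising the threshold happens to lower precision:

```python
# Toy illustration with made-up scores: precision usually rises with
# the threshold, but not monotonically.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # model's spam scores
labels = [1,   0,   1,   1,   0,   0]     # 1 = spam, 0 = not spam

def precision_at(threshold):
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    return tp / (tp + fp)

print(precision_at(0.5))    # TP=3, FP=1 -> 0.75
print(precision_at(0.75))   # TP=1, FP=1 -> 0.5: precision fell
```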
Recall
Consider a classification model that separates email into two categories:
"spam" or "not spam." If you raise the classification threshold, what will
happen to recall?
Always increase.
Raising the classification threshold will cause both of the following:
- The number of true positives will decrease or stay the same.
- The number of false negatives will increase or stay the same.
Thus, recall will either stay constant or decrease; it will never increase.
Always decrease or stay the same.
Raising our classification threshold will cause the number of
true positives to decrease or stay the same and will cause the
number of false negatives to increase or stay the same. Thus,
recall will either stay constant or decrease.
Always stay constant.
Raising our classification threshold will cause the number of
true positives to decrease or stay the same and will cause the
number of false negatives to increase or stay the same. Thus,
recall will either stay constant or decrease.
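A toy sketch with made-up scores and labels shows recall moving only one way as the threshold rises, since raising it can only shrink the predicted-positive set:

```python
# Toy illustration with made-up scores: recall can only fall or
# hold steady as the threshold rises.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # model's spam scores
labels = [1,   0,   1,   1,   0,   0]     # 3 actual spam emails

def recall_at(threshold):
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    return tp / (tp + fn)

print(recall_at(0.5))    # TP=3, FN=0 -> 1.0
print(recall_at(0.75))   # TP=1, FN=2 -> ~0.33
```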
Precision and Recall
Consider two models—A and B—that each evaluate the same dataset.
Which one of the following statements is true?
If model A has better precision than model B, then
model A is better.
While better precision is good, it might be coming at the expense
of a large reduction in recall. In general, we need to look at
both precision and recall together, or summary metrics like AUC,
which we'll talk about next.
If model A has better recall than model B, then model A is
better.
While better recall is good, it might be coming at the
expense of a large reduction in precision. In general, we need
to look at both precision and recall together, or summary metrics
like AUC, which we'll talk about next.
If model A has better precision and better recall than model B,
then model A is probably better.
In general, a model that outperforms another model on both
precision and recall is likely the better model. Obviously,
we'll need to make sure that the comparison is being done at a
precision/recall point that is useful in practice for this
to be meaningful. For example, suppose our spam detection model
needs to have at least 90% precision to be useful and avoid
unnecessary false alarms. In this case, comparing
one model at {20% precision, 99% recall} to another at
{15% precision, 98% recall} is not particularly instructive, as
neither model meets the 90% precision requirement. But with that caveat
in mind, this is a good way to think about comparing models when using
precision and recall.
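That screening step can be sketched as a simple filter over hypothetical operating points; the numbers below match the example above:

```python
# Screen hypothetical (precision, recall) operating points against a
# minimum-precision requirement before comparing models.
MIN_PRECISION = 0.90
candidates = {"A": (0.20, 0.99),   # (precision, recall)
              "B": (0.15, 0.98)}
usable = {name: pr for name, pr in candidates.items()
          if pr[0] >= MIN_PRECISION}
print(usable)   # {} -- neither point clears the bar, so the
                # comparison isn't instructive
```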