Classification: Check Your Understanding (ROC and AUC)

ROC and AUC

Explore the options below.

Which of the following ROC curves produce AUC values greater than 0.5?

This is the best possible ROC curve, as it ranks all positives
above all negatives. It has an AUC of 1.0.

In practice, if you have a "perfect" classifier with an AUC of 1.0,
you should be suspicious, as it likely indicates a bug in your model. For example,
you may have overfit to your training data, or the label data may be replicated
in one of your features.

This is the worst possible ROC curve; it ranks all negatives above all positives, and has
an AUC of 0.0. If you were to reverse every prediction (flip negatives to positives and
positives to negatives), you'd actually have a perfect classifier!
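
As a rough illustration of the reversal described above, the sketch below computes AUC by brute force over every (positive, negative) pair and then flips each score. The helper name `pairwise_auc`, the toy labels and scores, and the choice to count ties as half a win are all assumptions made for this example, not part of the original exercise.

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win (a common convention, assumed here).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: every negative is scored above every positive.
labels = [1, 1, 0, 0]
scores = [0.1, 0.2, 0.8, 0.9]

worst_auc = pairwise_auc(labels, scores)
# Reverse every prediction by replacing each score s with 1 - s.
flipped_auc = pairwise_auc(labels, [1.0 - s for s in scores])
print(worst_auc, flipped_auc)
```

On this toy data the original scores rank no pair correctly, while the flipped scores rank every pair correctly.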

This ROC curve has an AUC of 0.5, meaning it ranks a random positive example
higher than a random negative example 50% of the time. As such, the
corresponding classification model is basically worthless, as its predictive
ability is no better than random guessing.

This ROC curve has an AUC between 0.5 and 1.0, meaning it ranks a random positive
example higher than a random negative example more than 50% of the time. Real-world
binary classification AUC values generally fall into this range.
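
The "random positive versus random negative" reading of AUC can be checked by counting pairs directly. The following is a minimal sketch under assumed toy data; `pairwise_auc` is a hypothetical helper written for this example, and counting ties as half a win is an assumed convention.

```python
def pairwise_auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Computed by enumerating every (positive, negative) pair.
    Ties count as half a win (assumed convention).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three positives, three negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

# Only one pair is misordered (positive 0.4 vs negative 0.6),
# so 8 of the 9 pairs are ranked correctly.
typical_auc = pairwise_auc(labels, scores)

# A model that gives every example the same score ties on every pair.
constant_auc = pairwise_auc(labels, [0.5] * len(labels))

print(typical_auc, constant_auc)
```

The first model lands between 0.5 and 1.0, while the constant-score model sits at exactly 0.5 under the tie convention assumed here.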

This ROC curve has an AUC between 0 and 0.5, meaning it ranks a random positive
example higher than a random negative example less than 50% of the time.
The corresponding model actually performs worse than random guessing! If you
see an ROC curve like this, it likely indicates there's a bug in your data.

AUC and Scaling Predictions

Explore the options below.

How would multiplying all of the predictions from a given model by 2.0 (for
example, if the model predicts 0.4, we multiply by 2.0 to get a prediction
of 0.8) change the model's performance as measured by AUC?

No change. AUC only cares about relative prediction scores.

Yes, AUC is based on the relative predictions, so any transformation of
the predictions that preserves the relative ranking has no effect on AUC.
This is clearly not the case for other metrics such as squared error,
log loss, or prediction bias (discussed later).
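
To make the contrast concrete, the sketch below doubles a set of predictions and compares AUC with mean squared error before and after. The data and the helper names `pairwise_auc` and `mean_squared_error` are assumptions for illustration only.

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_squared_error(labels, scores):
    """Average squared gap between each label and its predicted score."""
    return sum((y - s) ** 2 for y, s in zip(labels, scores)) / len(labels)


# Hypothetical toy data.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.7, 0.4, 0.2]
scaled = [2.0 * s for s in scores]  # multiply every prediction by 2.0

auc_before = pairwise_auc(labels, scores)
auc_after = pairwise_auc(labels, scaled)
mse_before = mean_squared_error(labels, scores)
mse_after = mean_squared_error(labels, scaled)

print("AUC:", auc_before, auc_after)
print("MSE:", mse_before, mse_after)
```

Doubling preserves the ordering of the scores, so the two AUC values match, while the squared error on this particular data gets noticeably worse.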

It would make AUC terrible, since the prediction values are now way off.

Interestingly enough, even though the prediction values are different (and
likely farther from the truth), multiplying them all by 2.0 would keep the relative
ordering of prediction values the same. Since AUC only cares about relative rankings,
it is not affected by scaling the predictions by any positive constant.

It would make AUC better, because the prediction values are all farther apart.

The amount of spread between predictions does not actually impact AUC. Even if the
prediction score for a randomly drawn true positive is only a tiny epsilon greater than
that of a randomly drawn negative, that pair still counts as a success contributing to
the overall AUC score.
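
The point about spread can be illustrated with two hypothetical score sets that rank the same examples identically, one widely separated and one separated by only a tiny epsilon. The helper `pairwise_auc`, the data, and the value of `eps` are assumptions chosen for this sketch.

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]

# Widely separated scores.
wide = [0.9, 0.8, 0.2, 0.1]

# Same ordering, but neighbouring scores differ by only a tiny epsilon.
eps = 1e-9
tight = [0.5 + 2 * eps, 0.5 + eps, 0.5, 0.5 - eps]

wide_auc = pairwise_auc(labels, wide)
tight_auc = pairwise_auc(labels, tight)
print(wide_auc, tight_auc)
```

Both score sets order every positive above every negative, so they receive the same AUC despite the very different spreads.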