Classification

This module shows how logistic regression can be used for classification tasks, and explores how to evaluate the effectiveness of classification models.

Classification

  • Sometimes we use logistic regression for its probability outputs -- this is regression in (0, 1)
  • Other times, we'll threshold that value to get a discrete binary classification
  • The choice of threshold is important and can be tuned (see the sketch after this list)
  • How do we evaluate classification models?
  • One possible measure: Accuracy
    • the fraction of predictions we got right
  • In many cases, accuracy is a poor or misleading metric
    • Most often when different kinds of mistakes have different costs
    • A typical case is class imbalance, where positives or negatives are extremely rare
  • For class-imbalanced problems, useful to separate out different kinds of errors
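
As a concrete illustration, here is a minimal sketch (assuming NumPy; the probabilities and labels are hypothetical) of thresholding a model's probability outputs into class predictions and computing accuracy, the fraction of predictions we got right:

    import numpy as np

    # Hypothetical model outputs: predicted probabilities and true binary labels.
    probs = np.array([0.9, 0.4, 0.7, 0.2, 0.8, 0.1])
    labels = np.array([1, 0, 1, 0, 0, 0])

    def accuracy_at_threshold(probs, labels, threshold=0.5):
        # Threshold probabilities into 0/1 predictions, then score the fraction correct.
        preds = (probs >= threshold).astype(int)
        return np.mean(preds == labels)

    print(accuracy_at_threshold(probs, labels, threshold=0.5))
    print(accuracy_at_threshold(probs, labels, threshold=0.75))  # the threshold is tunable

Trying several thresholds like this is exactly the tuning choice mentioned above.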
True Positives
We correctly called wolf!
We saved the town.

False Positives
Error: we called wolf falsely.
Everyone is mad at us.

False Negatives
There was a wolf, but we didn't spot it.
It ate all our chickens.

True Negatives
No wolf, no alarm.
Everyone is fine.
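
To make the four outcomes concrete, here is a minimal sketch (the data and function name are hypothetical) that counts true/false positives and negatives from binary predictions and labels (1 = wolf, 0 = no wolf):

    def confusion_counts(preds, labels):
        # Count each of the four outcome types for binary predictions vs. true labels.
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)  # called wolf, wolf was there
        fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)  # called wolf falsely
        fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)  # missed a real wolf
        tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)  # no wolf, no alarm
        return tp, fp, fn, tn

    preds  = [1, 0, 1, 1, 0, 0]   # hypothetical "wolf detector" calls
    labels = [1, 0, 0, 1, 1, 0]   # what actually happened
    print(confusion_counts(preds, labels))  # (2, 1, 1, 2)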

  • Precision: (True Positives) / (All Positive Predictions)
    • When model said "positive" class, was it right?
    • Intuition: Did the model cry "wolf" too often?
  • Recall: (True Positives) / (All Actual Positives)
    • Out of all the possible positives, how many did the model correctly identify?
    • Intuition: Did it miss any wolves? (a sketch of both metrics follows this list)
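
Building on those counts, here is a minimal sketch of precision and recall using the same definitions as the formulas above (the helper names are hypothetical):

    def precision(tp, fp):
        # Of everything the model called positive, how much was actually positive?
        return tp / (tp + fp) if (tp + fp) > 0 else 0.0

    def recall(tp, fn):
        # Of all actual positives, how many did the model find?
        return tp / (tp + fn) if (tp + fn) > 0 else 0.0

    tp, fp, fn, tn = 2, 1, 1, 2    # counts from the previous sketch
    print(precision(tp, fp))       # did we cry "wolf" too often?
    print(recall(tp, fn))          # did we miss any wolves?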

Explore the options below.

Consider a classification model that separates email into two categories: "spam" or "not spam." If you raise the classification threshold, what will happen to precision?
  • Definitely increase. -- Not quite: raising the classification threshold typically increases precision; however, precision is not guaranteed to increase monotonically as we raise the threshold.
  • Probably increase. -- Correct: in general, raising the classification threshold reduces false positives, thus raising precision.
  • Probably decrease. -- Incorrect: in general, raising the classification threshold reduces false positives, thus raising precision.
  • Definitely decrease. -- Incorrect: in general, raising the classification threshold reduces false positives, thus raising precision.

ROC curve showing TP rate vs. FP rate at different classification thresholds; each point is the TP and FP rate at one decision threshold.
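
Here is a minimal sketch (assuming NumPy; the scores and labels are hypothetical) of how those points are produced: sweep the decision threshold and record the true positive rate and false positive rate at each setting:

    import numpy as np

    probs  = np.array([0.95, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.05])
    labels = np.array([1,    1,   0,   1,   0,   1,   0,   0])

    for threshold in [0.1, 0.3, 0.5, 0.7, 0.9]:
        preds = (probs >= threshold).astype(int)
        tp = np.sum((preds == 1) & (labels == 1))
        fp = np.sum((preds == 1) & (labels == 0))
        fn = np.sum((preds == 0) & (labels == 1))
        tn = np.sum((preds == 0) & (labels == 0))
        tpr = tp / (tp + fn)   # true positive rate (recall)
        fpr = fp / (fp + tn)   # false positive rate
        print(f"threshold={threshold:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")

Plotting the (FPR, TPR) pairs traces out the ROC curve.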
  • AUC: "Area under the ROC Curve"
  • Interpretation:
    • If we pick a random positive and a random negative, what's the probability my model ranks them in the correct order?
  • Intuition: gives an aggregate measure of performance across all possible classification thresholds (see the sketch below)
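
The ranking interpretation can be computed directly; here is a minimal sketch (hypothetical data) that estimates AUC as the fraction of positive/negative pairs the model orders correctly:

    def auc_by_ranking(probs, labels):
        # AUC = probability that a random positive is scored above a random negative.
        # Ties count as half-correct.
        positives = [p for p, y in zip(probs, labels) if y == 1]
        negatives = [p for p, y in zip(probs, labels) if y == 0]
        correct = 0.0
        for pos in positives:
            for neg in negatives:
                if pos > neg:
                    correct += 1.0
                elif pos == neg:
                    correct += 0.5
        return correct / (len(positives) * len(negatives))

    probs  = [0.95, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.05]
    labels = [1,    1,   0,   1,   0,   1,   0,   0]
    print(auc_by_ranking(probs, labels))  # 0.8125 for this hypothetical data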
  • Logistic Regression predictions should be unbiased.
    • average of predictions == average of observations
  • Bias is a canary.
    • Zero bias alone does not mean everything in your system is perfect.
    • But it's a great sanity check.
  • If you have bias, you have a problem.
    • Incomplete feature set?
    • Buggy pipeline?
    • Biased training sample?
  • Don't fix bias with a calibration layer, fix it in the model.
  • Look for bias in slices of data -- this can guide improvements (see the sketch below)
A calibration plot
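
Prediction bias can be measured as the gap between the average prediction and the average observed label, overall and per slice. A minimal sketch (assuming NumPy; the data and the slice key are hypothetical):

    import numpy as np

    probs  = np.array([0.9, 0.2, 0.6, 0.3, 0.8, 0.1])        # model predictions
    labels = np.array([1,   0,   1,   0,   0,   0])          # observed outcomes
    slices = np.array(["US", "US", "EU", "EU", "US", "EU"])  # hypothetical slice key

    def prediction_bias(probs, labels):
        # Bias = average prediction - average observation; near zero for an unbiased model.
        return probs.mean() - labels.mean()

    print("overall bias:", prediction_bias(probs, labels))
    for s in np.unique(slices):
        mask = slices == s
        print(f"slice {s}: bias = {prediction_bias(probs[mask], labels[mask]):.3f}")

A large bias in one slice can point to where the model or the data pipeline needs attention.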