# Classification

This module shows how logistic regression can be used for classification tasks, and explores how to evaluate the effectiveness of classification models.

## Classification vs. Regression

• Sometimes, we use logistic regression for its probability output -- this is regression onto values in (0, 1)
• Other times, we threshold that probability to get a discrete binary classification
• The threshold is an important modeling decision, and it can be tuned
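A minimal sketch of the two uses, with made-up scores and the common default threshold of 0.5:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([-2.0, -0.5, 0.3, 1.7])  # raw model outputs (logits)
probs = sigmoid(scores)                    # regression use: probabilities in (0, 1)

threshold = 0.5                            # the tunable decision threshold
labels = (probs >= threshold).astype(int)  # classification use: 0 or 1
```

Lowering or raising `threshold` trades off the kinds of errors the classifier makes, which is what the metrics below measure.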

## Evaluation Metrics: Accuracy

• How do we evaluate classification models?
• One possible measure: accuracy
• The fraction of predictions we got right

• In many cases, accuracy is a poor or misleading metric
• Most often this happens when different kinds of mistakes have different costs
• A typical case is class imbalance, when positives or negatives are extremely rare
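A toy illustration (the numbers are made up) of why accuracy misleads under class imbalance:

```python
import numpy as np

# Toy imbalanced dataset: 1% positives, 99% negatives.
y_true = np.zeros(1000, dtype=int)
y_true[:10] = 1

# A useless model that always predicts the negative class.
y_pred = np.zeros(1000, dtype=int)

accuracy = (y_pred == y_true).mean()  # 990 / 1000 = 0.99
```

Despite 99% accuracy, this model never identifies a single positive, so accuracy alone tells us almost nothing here.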

## True Positives and False Positives

• For class-imbalanced problems, it is useful to separate out the different kinds of errors:

| | Wolf present (actual positive) | No wolf (actual negative) |
|---|---|---|
| **Model cried "wolf"** | **True Positive**: We correctly called wolf! We saved the town. | **False Positive**: Error: we called wolf falsely. Everyone is mad at us. |
| **Model stayed silent** | **False Negative**: There was a wolf, but we didn't spot it. It ate all our chickens. | **True Negative**: No wolf, no alarm. Everyone is fine. |
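The four outcomes can be counted directly from predictions and labels; a small sketch with invented data:

```python
import numpy as np

y_true = np.array([1, 1, 0, 0, 1, 0])  # 1 = wolf actually present
y_pred = np.array([1, 0, 0, 1, 1, 0])  # 1 = model cried "wolf"

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # correctly called wolf
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false alarm
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # missed a wolf
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # correctly stayed quiet
```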

## Evaluation Metrics: Precision and Recall

• Precision: (True Positives) / (All Positive Predictions)
• When model said "positive" class, was it right?
• Intuition: Did the model cry "wolf" too often?
• Recall: (True Positives) / (All Actual Positives)
• Out of all the possible positives, how many did the model correctly identify?
• Intuition: Did it miss any wolves?
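Both metrics follow directly from the true-positive, false-positive, and false-negative counts; a sketch with made-up data:

```python
import numpy as np

y_true = np.array([1, 1, 0, 0, 1, 0])  # 1 = wolf actually present
y_pred = np.array([1, 0, 0, 1, 1, 0])  # 1 = model cried "wolf"

tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

precision = tp / (tp + fp)  # when we cried "wolf", how often were we right?
recall = tp / (tp + fn)     # of the real wolves, how many did we catch?
```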

Explore the options below.

Consider a classification model that separates email into two categories: "spam" or "not spam." If you raise the classification threshold, what will happen to precision?
• Definitely increase. -- Raising the classification threshold typically increases precision; however, precision is not guaranteed to increase monotonically as we raise the threshold, so "definitely" is too strong.
• Probably increase. (Correct.) -- In general, raising the classification threshold reduces false positives, thus raising precision.
• Probably decrease. -- In general, raising the classification threshold reduces false positives, thus raising precision, so a decrease is unlikely.
• Definitely decrease. -- In general, raising the classification threshold reduces false positives, thus raising precision, so a decrease is unlikely.

## A ROC Curve

Each point on the curve shows the true-positive rate and false-positive rate at one decision threshold.
## Evaluation Metrics: AUC

• AUC: "Area under the ROC Curve"
• Interpretation:
• If we pick a random positive and a random negative, what's the probability my model ranks them in the correct order?
• Intuition: an aggregate measure of performance across all possible classification thresholds
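The ranking interpretation can be computed directly by comparing every positive against every negative; a sketch with invented scores (ties, ignored here, would be credited 0.5 in a full implementation):

```python
import numpy as np

# Illustrative model scores for actual positives and actual negatives.
pos_scores = np.array([0.9, 0.8, 0.4])
neg_scores = np.array([0.7, 0.3, 0.2, 0.1])

# AUC = P(a random positive is scored higher than a random negative).
pairs = [(p, n) for p in pos_scores for n in neg_scores]
auc = np.mean([p > n for p, n in pairs])  # fraction of correctly ordered pairs
```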

## Prediction Bias

• Logistic Regression predictions should be unbiased.
• average of predictions == average of observations
• Bias is a canary.
• Zero bias alone does not mean everything in your system is perfect.
• But it's a great sanity check.
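The sanity check itself is one line: compare the mean prediction to the mean label. A sketch with hypothetical values:

```python
import numpy as np

# Hypothetical predicted probabilities and observed labels.
preds  = np.array([0.1, 0.4, 0.6, 0.9])
labels = np.array([0,   0,   1,   1])

# Prediction bias: average prediction minus average observation.
# Near zero suggests the model is calibrated on average; a large gap is a warning sign.
bias = preds.mean() - labels.mean()
```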

## Prediction Bias (continued)

• If you have bias, you have a problem.
• Incomplete feature set?
• Buggy pipeline?
• Biased training sample?
• Don't fix bias with a calibration layer; fix it in the model.
• Look for bias in slices of data -- this can guide improvements.

## Calibration Plots Show Bucketed Bias 