A sarcasm-detection model was trained on 80,000 text messages: 40,000 messages sent by adults (18 years and older) and 40,000 messages sent by minors (less than 18 years old). The model was then evaluated on a test set of 20,000 messages: 10,000 from adults and 10,000 from minors. The following confusion matrices show the results for each group (a positive prediction signifies a classification of "sarcastic"; a negative prediction signifies a classification of "not sarcastic"):

Adults

True Positives (TPs): 512 | False Positives (FPs): 51
False Negatives (FNs): 36 | True Negatives (TNs): 9401
$$\text{Precision} = \frac{TP}{TP+FP} = 0.909$$
$$\text{Recall} = \frac{TP}{TP+FN} = 0.934$$

Minors

True Positives (TPs): 2147 | False Positives (FPs): 96
False Negatives (FNs): 2177 | True Negatives (TNs): 5580
$$\text{Precision} = \frac{TP}{TP+FP} = 0.957$$
$$\text{Recall} = \frac{TP}{TP+FN} = 0.497$$
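
The precision and recall values above can be reproduced directly from the four confusion-matrix counts. The following is a minimal sketch; the `ConfusionMatrix` class and its method names are illustrative helpers introduced here, not part of the original exercise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical container for the four counts of a binary classifier."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def precision(self) -> float:
        # Of everything predicted "sarcastic", the fraction that really is.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Of everything actually sarcastic, the fraction the model caught.
        return self.tp / (self.tp + self.fn)


# Counts copied from the two confusion matrices above.
adults = ConfusionMatrix(tp=512, fp=51, fn=36, tn=9401)
minors = ConfusionMatrix(tp=2147, fp=96, fn=2177, tn=5580)

for name, cm in (("adults", adults), ("minors", minors)):
    print(f"{name}: precision={cm.precision():.3f}, recall={cm.recall():.3f}")
```

Rounded to three decimal places, this gives 0.909 and 0.934 for adults and 0.957 and 0.497 for minors, matching the figures above.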

Explore the options below.

Which of the following statements about the model's test-set performance are true?
Overall, the model performs better on examples from adults than on examples from minors.

The model achieves both precision and recall rates over 90% when detecting sarcasm in text messages from adults.

While the model achieves a slightly higher precision rate for minors than adults, the recall rate is substantially lower for minors, resulting in less reliable predictions for this group.

The model fails to classify approximately 50% of minors' sarcastic messages as "sarcastic."

The recall rate of 0.497 for minors indicates that the model predicts "not sarcastic" for approximately 50% of minors' sarcastic texts.

Approximately 50% of messages sent by minors are classified as "sarcastic" incorrectly.

The precision rate of 0.957 indicates that over 95% of minors' messages classified as "sarcastic" are actually sarcastic.

The 10,000 messages sent by adults are a class-imbalanced dataset.

If we compare the number of messages from adults that are actually sarcastic (TP+FN = 548) with the number of messages that are actually not sarcastic (TN + FP = 9452), we see that "not sarcastic" labels outnumber "sarcastic" labels by a ratio of approximately 17:1.

The 10,000 messages sent by minors are a class-imbalanced dataset.

If we compare the number of messages from minors that are actually sarcastic (TP+FN = 4324) with the number of messages that are actually not sarcastic (TN + FP = 5676), we see that there is a 1.3:1 ratio of "not sarcastic" labels to "sarcastic" labels. Given that the distribution of labels between the two classes is quite close to 50/50, this is not a class-imbalanced dataset.
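
The class-balance ratios quoted above follow from summing each matrix by actual label instead of by prediction. A small sketch of that arithmetic, where `negative_to_positive_ratio` is a hypothetical helper name chosen for this example:

```python
def negative_to_positive_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """Ratio of actual "not sarcastic" examples to actual "sarcastic" examples."""
    actual_positive = tp + fn  # every message that really is sarcastic
    actual_negative = tn + fp  # every message that really is not sarcastic
    return actual_negative / actual_positive


# Counts copied from the confusion matrices for each group.
adults_ratio = negative_to_positive_ratio(tp=512, fp=51, fn=36, tn=9401)
minors_ratio = negative_to_positive_ratio(tp=2147, fp=96, fn=2177, tn=5580)

print(f"adults: {adults_ratio:.1f}:1")  # 9452 / 548
print(f"minors: {minors_ratio:.1f}:1")  # 5676 / 4324
```

The adult ratio comes out near 17:1 and the minor ratio near 1.3:1, which is why only the adult subset is described as class-imbalanced.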

Explore the options below.

Engineers are working on retraining this model to address inconsistencies in sarcasm-detection accuracy across age demographics, but the model has already been released into production. Which of the following stopgap strategies will help mitigate errors in the model's predictions?
Restrict the model's usage to text messages sent by adults.

The model performs well on text messages from adults (with precision and recall rates both above 90%), so restricting its use to this group will sidestep the systematic errors in classifying minors' text messages.

When the model predicts "not sarcastic" for text messages sent by minors, adjust the output so the model returns a value of "unsure" instead.

The precision rate for text messages sent by minors is high, which means that when the model predicts "sarcastic" for this group, it is nearly always correct.

The problem is that recall is very low for minors; the model fails to identify sarcasm in approximately 50% of examples. Given that the model's negative predictions for minors are no better than random guesses, we can avoid these errors by not providing a prediction in these cases.
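
One way to picture this stopgap is as a thin post-processing step wrapped around the model's output. The sketch below is illustrative only: the function name `apply_stopgap`, the label strings, and the assumption that the caller already knows whether the sender is a minor are all hypothetical and not taken from the exercise.

```python
SARCASTIC = "sarcastic"
NOT_SARCASTIC = "not sarcastic"
UNSURE = "unsure"


def apply_stopgap(model_label: str, sender_is_minor: bool) -> str:
    """Post-process a raw model label (hypothetical wrapper).

    Positive predictions are passed through for everyone, because
    precision is high for both groups. Negative predictions for minors
    are replaced with "unsure", because recall for that group is low.
    """
    if sender_is_minor and model_label == NOT_SARCASTIC:
        return UNSURE
    return model_label


# Example usage with each combination of group and raw prediction.
print(apply_stopgap(SARCASTIC, sender_is_minor=True))       # sarcastic
print(apply_stopgap(NOT_SARCASTIC, sender_is_minor=True))   # unsure
print(apply_stopgap(NOT_SARCASTIC, sender_is_minor=False))  # not sarcastic
```

Applied to the minors' test set, a rule like this would leave the 2,243 positive predictions (2,147 TPs + 96 FPs) unchanged and return "unsure" for the 7,757 negative predictions (2,177 FNs + 5,580 TNs).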

Restrict the model's usage to text messages sent by minors.

The systematic errors in this model are specific to text messages sent by minors. Restricting the model's use to the group more susceptible to error would not help.

Adjust the model output so that it returns "sarcastic" for all text messages sent by minors, regardless of what the model originally predicted.

Always predicting "sarcastic" for minors' text messages would increase the recall rate from 0.497 to 1.0, as the model would no longer fail to identify any messages as sarcastic. However, this increase in recall would come at the expense of precision. All the true negatives would be changed to false positives:

True Positives (TPs): 4324 | False Positives (FPs): 5676
False Negatives (FNs): 0 | True Negatives (TNs): 0

which would decrease the precision rate from 0.957 to 0.432. So, adding this calibration would change the type of error but would not mitigate the magnitude of the error.
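
The effect of forcing every prediction to "sarcastic" can be checked by moving the counts between cells and recomputing both metrics. This is a rough sketch; `force_all_positive` is a hypothetical helper written for this example.

```python
def force_all_positive(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Confusion-matrix counts after overriding every prediction to positive.

    Former false negatives become true positives, and former true
    negatives become false positives; no negative predictions remain.
    """
    return {"tp": tp + fn, "fp": fp + tn, "fn": 0, "tn": 0}


# Original counts for minors, copied from the confusion matrix earlier.
overridden = force_all_positive(tp=2147, fp=96, fn=2177, tn=5580)

precision = overridden["tp"] / (overridden["tp"] + overridden["fp"])
recall = overridden["tp"] / (overridden["tp"] + overridden["fn"])

print(overridden)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

The recomputed counts are 4,324 TPs and 5,676 FPs, giving a recall of 1.0 and a precision of about 0.432, consistent with the figures above.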