Once a source of bias has been identified in the training data, we can take proactive steps to mitigate its effects. There are two main strategies that machine learning (ML) engineers typically employ to remediate bias:
- Augmenting the training data.
- Adjusting the model's optimization (loss) function.
Augmenting the training data
If an audit of the training data has uncovered issues with missing, incorrect, or skewed data, the most straightforward way to address the problem is often to collect additional data.
While augmenting the training data can be the ideal solution, the downside of this approach is that it may be infeasible, either due to a lack of available data or resource constraints that impede data collection. For example, gathering more data might be too costly or time-consuming, or might not be viable due to legal or privacy restrictions.
Adjusting the model's optimization function
In cases where collecting additional training data is not viable, another approach for mitigating bias is to adjust how loss is calculated during model training. We typically use an optimization function like log loss to penalize incorrect model predictions. However, log loss does not take subgroup membership into consideration. So instead of using log loss, we can choose an optimization function designed to penalize errors in a fairness-aware fashion that counteracts the imbalances we've identified in our training data.
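To make this concrete, here is a minimal sketch of what a hand-rolled fairness-aware loss could look like: standard log loss plus a penalty on the gap between the mean predictions for two subgroups. The function name, the `group_mask` argument, and the `penalty_weight` value are illustrative assumptions, not part of any particular library.

```python
import tensorflow as tf

def fairness_aware_loss(y_true, y_pred, group_mask, penalty_weight=1.0):
    """Log loss plus a penalty on the prediction gap between two subgroups.

    group_mask is 1.0 for examples in one subgroup and 0.0 for examples
    in the other subgroup.
    """
    y_true = tf.reshape(tf.cast(y_true, tf.float32), [-1])
    y_pred = tf.reshape(tf.cast(y_pred, tf.float32), [-1])
    group_mask = tf.reshape(tf.cast(group_mask, tf.float32), [-1])

    # Standard log loss (binary cross-entropy) over the whole batch.
    log_loss = tf.reduce_mean(
        tf.keras.losses.binary_crossentropy(y_true, y_pred))

    # Mean predicted score for each subgroup in the batch.
    mean_a = tf.reduce_sum(y_pred * group_mask) / (
        tf.reduce_sum(group_mask) + 1e-8)
    mean_b = tf.reduce_sum(y_pred * (1.0 - group_mask)) / (
        tf.reduce_sum(1.0 - group_mask) + 1e-8)

    # Penalize differences between the two subgroups' prediction distributions.
    return log_loss + penalty_weight * tf.abs(mean_a - mean_b)
```

In practice, the penalty term is usually a more robust distribution-matching measure than a simple difference of means; MinDiff, described below, uses maximum mean discrepancy (MMD).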
The TensorFlow Model Remediation Library provides utilities for applying two different bias-mitigation techniques during model training:
- MinDiff: MinDiff aims to balance the errors for two different slices of data (for example, male and female students versus nonbinary students) by adding a penalty for differences in the prediction distributions for the two groups (see the sketch after this list).
- Counterfactual Logit Pairing: Counterfactual Logit Pairing (CLP) aims to ensure that changing a sensitive attribute of a given example doesn't alter the model's prediction for that example. For example, if a training dataset contains two examples whose feature values are identical, except one has a `gender` value of `male` and the other has a `gender` value of `nonbinary`, CLP will add a penalty if the predictions for these two examples are different.
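As an illustration, applying MinDiff with the library might look roughly like the following sketch. It assumes an already-built Keras model (`original_model`) and three `tf.data.Dataset` objects (`train_ds` for the full training data, plus `nonbinary_ds` and `male_female_ds` for the two slices being compared); those names, the penalty weight of 1.0, and the training settings are illustrative assumptions rather than a definitive reference for the library's API.

```python
import tensorflow as tf
from tensorflow_model_remediation import min_diff

# Wrap the existing Keras model so that a MinDiff penalty (an MMD-based
# measure of the gap between the two groups' prediction distributions)
# is added to its training loss.
min_diff_model = min_diff.keras.MinDiffModel(
    original_model,             # the unremediated Keras model
    min_diff.losses.MMDLoss(),  # distribution-matching penalty
    1.0)                        # penalty weight; a hyperparameter to tune

# Compile as usual; the primary task still uses ordinary log loss.
min_diff_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=['accuracy'])

# Pack the main training data together with the two slices being compared:
# the full dataset, the sensitive-group slice (nonbinary students), and the
# nonsensitive-group slice (male and female students).
min_diff_dataset = min_diff.keras.utils.pack_min_diff_data(
    train_ds, nonbinary_ds, male_female_ds)

min_diff_model.fit(min_diff_dataset, epochs=10)
```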
The techniques you choose for adjusting the optimization function depend on the model's use cases. In the next section, we'll take a closer look at how to approach the task of evaluating a model for fairness by considering these use cases.