Machine learning models are not inherently objective. Engineers train models by
feeding them a data set of training examples, and human involvement in the provision
and curation of this data can make a model's predictions susceptible to bias.
When building models, it's important to be aware of common human biases that can
manifest in your data, so you can take proactive steps to mitigate their effects.
Reporting Bias
Reporting bias occurs when the frequency of events, properties, and/or outcomes
captured in a data set does not accurately reflect their real-world frequency. This bias can arise
because people tend to focus on documenting circumstances that are unusual or especially memorable,
assuming that the ordinary can "go without saying."
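One simple check for reporting bias is to compare how often each outcome appears in your data set against a known or estimated real-world base rate. The sketch below is a minimal illustration with made-up numbers: the review labels and the 90% real-world positive rate are assumptions for demonstration, not data from any real source.

```python
from collections import Counter

# Hypothetical review dataset: reviews skew negative because people
# tend to write reviews only after unusually bad (or memorable) experiences.
reviews = (["negative"] * 70) + (["positive"] * 30)

# Assumed real-world base rate (an illustrative assumption): most
# transactions are uneventful and would be rated positive.
real_world_positive_rate = 0.90

counts = Counter(reviews)
dataset_positive_rate = counts["positive"] / len(reviews)

print(f"dataset positive rate:    {dataset_positive_rate:.2f}")
print(f"real-world positive rate: {real_world_positive_rate:.2f}")
# A large gap suggests reporting bias: memorable (negative) outcomes
# are overrepresented relative to their real-world frequency.
```

A gap this large would not prove bias on its own, but it flags a distribution worth investigating before training on the data.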
Automation Bias
Automation bias is a tendency to favor results generated by automated systems over those
generated by non-automated systems, irrespective of the error rates of each.
Selection Bias
Selection bias occurs if a data set's examples are chosen in a way
that is not reflective of their real-world distribution. Selection bias can take many different forms:
Coverage bias: Data is not selected in a representative fashion.
Non-response bias (or participation bias): Data ends up being unrepresentative due to
participation gaps in the data-collection process.
Sampling bias: Proper randomization is not used during data collection.
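The forms above can be made concrete with a small simulation. In this hypothetical sketch, a population of users is ordered by signup date, so surveying only the earliest signups (a convenience sample) overrepresents early adopters, while a properly randomized sample tracks the true population rate. The population sizes and group names are illustrative assumptions.

```python
import random

# Hypothetical population: 10,000 users, 20% of whom are "early adopters".
# The list is ordered by signup date, so early adopters come first.
population = ["early_adopter"] * 2000 + ["typical_user"] * 8000

# Coverage/sampling bias: surveying only the first 500 signups
# selects exclusively early adopters.
biased_sample = population[:500]

# Proper randomization draws a sample that mirrors the population.
random_sample = random.sample(population, 500)

def early_adopter_rate(sample):
    """Fraction of a sample made up of early adopters."""
    return sum(1 for u in sample if u == "early_adopter") / len(sample)

print(f"population rate:     {2000 / 10000:.2f}")
print(f"biased sample rate:  {early_adopter_rate(biased_sample):.2f}")
print(f"random sample rate:  {early_adopter_rate(random_sample):.2f}")
```

The biased sample's rate is wildly off (1.00 versus a true 0.20), while the random sample lands close to the population rate, which is the point of proper randomization during data collection.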
Group Attribution Bias
Group attribution bias is a tendency to generalize what is true of individuals to an entire group to which
they belong. Two key manifestations of this bias are:
In-group bias: A preference for members of a group to which you also belong, or for characteristics
that you also share.
Out-group homogeneity bias: A tendency to stereotype individual members of a group to which you do not
belong, or to see their characteristics as more uniform.
Implicit Bias
Implicit bias occurs when assumptions are made based on one's own mental models and personal experiences
that do not necessarily apply more generally.
A common form of implicit bias is confirmation bias, where model builders unconsciously process data
in ways that affirm preexisting beliefs and hypotheses. In some cases, a model builder may actually keep
training a model until it produces a result that aligns with their original hypothesis; this is called
experimenter's bias.
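One guard against this failure mode is to fix the evaluation protocol before running any experiments, rather than rerunning until a result agrees with the hypothesis. The sketch below is a toy illustration: `train_and_evaluate` is a hypothetical stand-in that returns a noisy accuracy score, and the numbers are assumptions, not real results.

```python
import random

def train_and_evaluate(seed):
    """Stand-in for a real training run; returns a hypothetical
    accuracy that varies with the random seed."""
    rng = random.Random(seed)
    return 0.70 + rng.uniform(-0.05, 0.05)

# Experimenter's bias: rerunning the experiment many times and
# reporting only the run that best supports the hypothesis.
cherry_picked = max(train_and_evaluate(s) for s in range(20))

# A pre-registered protocol commits to a single run chosen in advance
# and reports that result, whatever it turns out to be.
preregistered = train_and_evaluate(seed=0)

print(f"best of 20 runs (biased): {cherry_picked:.3f}")
print(f"pre-chosen single run:    {preregistered:.3f}")
# The cherry-picked number overstates what the model reliably achieves.
```

Because the cherry-picked score is the maximum over many runs, it is always at least as high as any single pre-chosen run, which is exactly why it is a misleading estimate.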