This course has walked through many common data traps, from dataset quality
to flawed thinking to visualization and statistical analysis.
ML practitioners should ask:

- How well do I understand the characteristics of my datasets and the
  conditions under which that data was collected?
- What quality or bias issues exist in my data? Are confounding factors
  present?
- What potential downstream issues could arise from using these particular
  datasets?
- When training a model that makes predictions or classifications: does
  the dataset that the model is trained on contain all relevant variables?
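
A few of the questions above can be partly answered with a quick mechanical audit before any modeling starts. The following is a minimal sketch, not a method prescribed by this course: the function name `audit_rows`, the column names, and the toy records are all hypothetical and chosen only for illustration. It reports how often each column is missing, how many rows are exact duplicates, and how the label values are distributed.

```python
from collections import Counter


def audit_rows(rows, label_key):
    """Return basic quality statistics for a list of dict records.

    Hypothetical helper for illustration; it is not part of any library.
    """
    if not rows:
        raise ValueError("rows must contain at least one record")

    columns = sorted({key for row in rows for key in row})
    total = len(rows)

    # Fraction of rows where each column is absent, None, or an empty string.
    missing_rate = {
        col: sum(1 for row in rows if row.get(col) in (None, "")) / total
        for col in columns
    }

    # Count rows that are exact copies of another row.
    seen = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicate_rows = sum(count - 1 for count in seen.values())

    # Share of each label value; a heavy skew may hint at sampling bias.
    label_counts = Counter(row.get(label_key) for row in rows)
    label_share = {label: n / total for label, n in label_counts.items()}

    return {
        "rows": total,
        "missing_rate": missing_rate,
        "duplicate_rows": duplicate_rows,
        "label_share": label_share,
    }


# Toy records invented purely for this example.
sample = [
    {"age": 34, "region": "north", "label": "yes"},
    {"age": None, "region": "north", "label": "yes"},
    {"age": 51, "region": "south", "label": "no"},
    {"age": 34, "region": "north", "label": "yes"},
]
report = audit_rows(sample, label_key="label")
print(report)
```

A report like this only surfaces symptoms. It cannot say how the data was collected, whether a confounding factor is present, or whether a relevant variable is missing altogether; those questions still need domain knowledge about the dataset.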
Whatever they find, ML practitioners should always examine themselves for
confirmation bias, check their findings against intuition and common sense,
and investigate wherever the data conflicts with either.
Additional reading
Cairo, Alberto. How Charts Lie: Getting Smarter about Visual Information. NY:
W.W. Norton, 2019.
Huff, Darrell. How to Lie with Statistics. NY: W.W. Norton, 1954.
Monmonier, Mark. How to Lie with Maps, 3rd ed. Chicago: U of Chicago P, 2018.
Jones, Ben. Avoiding Data Pitfalls. Hoboken, NJ: Wiley, 2020.
Wheelan, Charles. Naked Statistics: Stripping the Dread from the Data. NY:
W.W. Norton, 2013.
Last updated 2025-08-25 UTC.