ML Systems in the Real World: Cancer Prediction

In this lesson, you'll debug a real-world ML problem* related to cancer prediction.

Real World Example: Cancer Prediction

  • Model was trained to predict "probability patient has cancer" from medical records
  • Features included patient age, gender, prior medical conditions, hospital name, vital signs, test results
  • Model gave excellent performance on held-out test data
  • But model performed terribly on new patients -- why?
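
Before guessing at the cause, it can help to make the symptom measurable. The sketch below is only an illustration and is not taken from the original study: it scores one model's predicted probabilities on two sets, the held-out test data and a batch of new patients, and reports the difference in AUC. The helper names (`auc`, `generalization_gap`) and all of the numbers are hypothetical, chosen only to show the shape of the comparison.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example
    receives the higher score. Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def generalization_gap(held_out, new_patients):
    """AUC on held-out data minus AUC on new patients.
    Each argument is a (labels, scores) pair."""
    return auc(*held_out) - auc(*new_patients)


# Made-up labels (1 = cancer) and predicted probabilities, for illustration only.
held_out = ([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
new_patients = ([1, 1, 0, 0], [0.6, 0.3, 0.5, 0.4])

print("held-out AUC:    ", auc(*held_out))
print("new-patient AUC: ", auc(*new_patients))
print("gap:             ", generalization_gap(held_out, new_patients))
```

A large gap like this one says that something the model relied on in the original data does not carry over to new patients. It does not say what that something is, which is the question this lesson asks you to answer.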

Why do you think the model was unable to perform well on new patients? See if you can figure out the problem, and then click the Play button ▶ below to find out if you're correct.

* We based this module very loosely (making some modifications along the way) on "Leakage in data mining: formulation, detection, and avoidance" by Kaufman, Rosset, and Perlich.