ML Systems in the Real World: Literature

In this lesson, you'll debug a real-world ML problem* related to 18th century literature.

Real World Example: 18th Century Literature

  • A professor of 18th Century Literature wanted to predict the political affiliation of authors based only on the "mind metaphors" those authors used.
  • A team of researchers built a large labeled data set from many authors' works, sentence by sentence, and split it into training, validation, and test sets.
  • The trained model performed nearly perfectly on the test data, but the researchers felt the results were suspiciously accurate. What might have gone wrong?

[Image: old books]

Why do you think test accuracy was suspiciously high? See if you can figure out the problem before reading on.

  • Data Split A: Researchers put some of each author's examples in the training set, some in the validation set, and some in the test set.
[Diagram: author examples across the training, validation, and test sets. Examples from each of the three authors appear in all three sets.]
  • Data Split B: Researchers put all of each author's examples in a single set. For example, all of Richardson's examples might be in the training set, while all of Swift's examples might be in the validation set.
[Diagram: the training set contains only examples from Swift, the validation set contains only examples from Blake, and the test set contains only examples from Defoe.]
  • Results: The model trained on Data Split A had much higher accuracy than the model trained on Data Split B.

Why? With Data Split A, the same authors appeared in all three sets, so the model likely learned to recognize each author's individual style rather than anything about political affiliation; it could then score well on test examples from authors it had already seen. Data Split B forced the model to generalize to unseen authors, and accuracy fell.

The moral: carefully consider how you split examples.

Know what the data represents.
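
As a concrete illustration of Data Split B, here is a minimal sketch of a leakage-safe, author-grouped split using scikit-learn's GroupShuffleSplit. The sentences, authors, and labels below are hypothetical toy data, not from the actual study:

```python
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical toy data: one sentence per row, with its author and a
# (made-up) political-affiliation label.
sentences = [
    "the mind is a fortress",
    "thought runs like a river",
    "reason is a lamp",
    "the soul is a garden",
    "memory is a ledger",
    "the mind is a workshop",
]
authors = ["Swift", "Swift", "Blake", "Blake", "Defoe", "Defoe"]
labels = [0, 0, 1, 1, 0, 0]

# Hold out roughly a third of the *authors* (not the sentences) for
# testing. Grouping by author guarantees no author's sentences are
# split across sets, mirroring Data Split B.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.33, random_state=42)
train_idx, test_idx = next(splitter.split(sentences, labels, groups=authors))

train_authors = {authors[i] for i in train_idx}
test_authors = {authors[i] for i in test_idx}
assert train_authors.isdisjoint(test_authors)  # no author leakage
print("train authors:", train_authors)
print("test authors:", test_authors)
```

Because each author is treated as an indivisible group, accuracy measured on the test set reflects generalization to unseen authors rather than recognition of familiar ones.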

* We based this module very loosely (making some modifications along the way) on "Meaning and Mining: the Impact of Implicit Assumptions in Data Mining for the Humanities" by Sculley and Pasanek.