We looked at a process of using a test set and a training set
to drive iterations of model development. On each iteration, we'd
train on the training data and evaluate on the test data, using the
evaluation results on the test data to guide the choice of, and changes to,
various model hyperparameters such as the learning rate and the set of
features. Is there anything wrong with this approach? (Pick only one answer.)
Totally fine: we're training on training data and evaluating on
separate, held-out test data.

Actually, there's a subtle issue here. Think about what might happen
if we ran many, many iterations of this form.
Doing many rounds of this procedure might cause us to implicitly fit
to the peculiarities of our specific test set.

Yes indeed! The more often we evaluate on a given test set, the greater
our risk of implicitly overfitting to that one test set.
We'll look at a better protocol next.
This is computationally inefficient; we should just pick a default set of
hyperparameters and live with them to save resources.

Although iterations of this sort are expensive, they are a critical part
of model development. Hyperparameter settings can make an enormous difference in
model quality, and we should always budget some time and computational
resources to ensure we're getting the best quality we can.
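The better protocol alluded to above is to carve out a third, separate validation set: tune hyperparameters against the validation set, and touch the test set exactly once, at the very end. Here is a minimal sketch of that idea in NumPy, using synthetic data and a polynomial degree as a stand-in hyperparameter (the data, split sizes, and function names are illustrative assumptions, not part of the original lesson):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical): y = 3x + noise.
x = rng.uniform(-1, 1, 300)
y = 3 * x + rng.normal(scale=0.3, size=300)

# Three-way split: train / validation / test.
# Hyperparameters are compared on the validation set; the test set
# is evaluated exactly once, at the very end.
idx = rng.permutation(300)
train, val, test = idx[:200], idx[200:250], idx[250:]

def mse(degree, fit_idx, eval_idx):
    """Fit a polynomial of the given degree on fit_idx, score on eval_idx."""
    coeffs = np.polyfit(x[fit_idx], y[fit_idx], degree)
    pred = np.polyval(coeffs, x[eval_idx])
    return float(np.mean((pred - y[eval_idx]) ** 2))

# Iterate over the hyperparameter (polynomial degree) using ONLY
# the validation set to choose among candidates.
best_degree = min(range(1, 10), key=lambda d: mse(d, train, val))

# A single final evaluation on the held-out test set.
test_mse = mse(best_degree, train, test)
print(best_degree, round(test_mse, 3))
```

Because the test set never influences which hyperparameter wins, its final score remains an honest estimate of generalization, which is exactly what repeated evaluation on the test set erodes.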