The pipeline testing guidelines cannot be demonstrated in a Colab. Instead, the following exercises help you practice the guidelines. The next page describes resources for implementing the guidelines.
For the following questions, click on your selection to expand and check your answer.
After launching your unicorn appearance predictor, you must keep your
predictor fresh by retraining on new data. Because you are gathering too
much new data to train on all of it, you decide to limit the training data by
sampling the new data over a window of time. You also need to account for
daily and annual patterns in unicorn appearances, and the fastest you can
launch new model versions is every three months.
What window of time do you choose?
One day, because a larger window would produce
too much data and your model would take too long to train.
Incorrect. You can adjust the data sampling rate to limit
the size of the dataset. Given that you can only update your model
every three months, a model trained on a day's worth of data
will gradually become stale.
One week, so that your dataset
is not too large but you can still smooth out patterns.
Incorrect. You can adjust the data sampling rate to limit
the size of the dataset. Given that you can only update your model
every three months, a model trained on a week's worth of data
will gradually become stale.
One year, to ensure that your model
is not biased by daily or yearly patterns.
Correct! You should choose a representative dataset so that
your model learns to predict across all scenarios.
You launch your unicorn appearance predictor. It's working well! You
go on vacation and return after three weeks to find that your
model quality has dropped significantly. Assume that
unicorn behavior is unlikely to change significantly
in three weeks. What is the most likely explanation for the decrease
in quality?
Training-serving skew.
Correct. While unicorn behavior probably didn't change,
perhaps the underlying data reporting or data formatting changed
in the serving data after the training data was collected.
Detect potential training-serving skew by validating the serving
data against a schema derived from the training data.
You forgot to test model quality against a fixed threshold.
Incorrect. Testing model quality would help catch a decrease
in quality, but would not explain why that decrease occurred.
Your model is stale.
Incorrect, assuming that your training data covers all cycles
of unicorn behavior, as discussed in the previous question.
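The schema check described in the correct answer above can be sketched as follows. The feature names, the inferred type check, and the training-range check are illustrative assumptions, not the API of any particular validation library:

```python
def infer_schema(training_examples):
    """Record each feature's type and observed value range from training data."""
    schema = {}
    for ex in training_examples:
        for name, value in ex.items():
            entry = schema.setdefault(
                name, {"type": type(value), "min": value, "max": value})
            entry["min"] = min(entry["min"], value)
            entry["max"] = max(entry["max"], value)
    return schema

def check_serving_example(example, schema):
    """Return a list of skew warnings for one serving example."""
    warnings = []
    for name, spec in schema.items():
        if name not in example:
            warnings.append(f"missing feature: {name}")
            continue
        value = example[name]
        if not isinstance(value, spec["type"]):
            warnings.append(f"{name}: expected {spec['type'].__name__}, "
                            f"got {type(value).__name__}")
        elif not (spec["min"] <= value <= spec["max"]):
            warnings.append(f"{name}: value {value} outside training range")
    return warnings
```

A serving-side change in data formatting, such as a numeric feature arriving as a string, would surface here as a type warning before it degrades model quality.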
You wisely decide to monitor predictions for Antarctica because you
lack sufficient training data there. Your prediction quality mysteriously
drops for a few days at a time, especially in winter. What could be
the cause?
An environmental factor.
Correct. You discover that storms in Antarctica correlate with
decreases in your prediction quality. During these storms,
unicorn behavior changes. Furthermore, collecting data during storms in
Antarctica is impossible, meaning your model cannot train for
such conditions.
Your model becomes stale.
Incorrect. If this cause were correct, then quality would
drop continuously as your model became stale, rather than dropping
for only a few days at a time.
No cause necessary. ML models have inherent randomness.
Incorrect. If your model quality fluctuates, you should
investigate the cause. Try to eliminate randomness in
your model training to increase reproducibility.
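One way to distinguish an episodic cause like storms from gradual staleness is to flag runs of consecutive low-quality days in a monitoring log. A rough sketch, where the daily quality series, baseline, and tolerance are all assumed values:

```python
def flag_quality_drops(daily_quality, baseline, tolerance=0.05):
    """Return runs of consecutive day indices where quality fell below
    baseline - tolerance.

    Short, recurring runs suggest an episodic cause (e.g. storms);
    a steady downward trend would instead suggest staleness.
    """
    runs, current = [], []
    for day, quality in enumerate(daily_quality):
        if quality < baseline - tolerance:
            current.append(day)
        elif current:
            runs.append(current)  # a run of bad days just ended
            current = []
    if current:
        runs.append(current)
    return runs
```

Cross-referencing the flagged runs against external records (such as Antarctic storm dates) is what would reveal the environmental correlation described in the correct answer.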
Your unicorn appearance predictor has operated for a year. You've
fixed many problems, and quality is now high. However, you
notice a small but persistent problem. Your model quality has drifted
slightly lower in urban areas. What might be the cause?
The high quality of your predictions leads users to
easily find unicorns, affecting unicorn appearance behavior itself.
Correct. Unicorns responded to increased attention by
changing their behavior in urban areas. As your model's predictions
adapt to the changing behavior, unicorns continue to
change their behavior. Such a situation, where
your model's behavior affects the training data itself, is called
a feedback loop.
You should try modifying your training-serving skew detection to
detect changes in serving data that correspond to changes in unicorn
behavior.
Unicorn appearances are reported multiple times in
heavily populated areas, skewing your training data.
Incorrect. This is probably not the cause, because such skew
would have lowered your quality from launch onward.
Urban areas are difficult to model.
Incorrect. If your model were having trouble predicting in
urban areas, quality would have been low from the start, instead of
drifting lower after launch.
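A simple way to surface slow, slice-specific drift like this is to compare per-slice quality between the early and late halves of a time-ordered prediction log. A sketch under assumed inputs (the record format and the `area` slice key are hypothetical):

```python
from collections import defaultdict

def per_slice_drift(records, slice_key):
    """Compare mean quality per slice between the first and second half
    of a time-ordered prediction log.

    A slice with a negative delta while other slices hold steady is a
    candidate for a feedback loop.
    """
    half = len(records) // 2

    def slice_means(chunk):
        sums = defaultdict(lambda: [0.0, 0])
        for record in chunk:
            entry = sums[record[slice_key]]
            entry[0] += record["quality"]
            entry[1] += 1
        return {key: total / count for key, (total, count) in sums.items()}

    early, late = slice_means(records[:half]), slice_means(records[half:])
    return {key: late[key] - early[key] for key in early if key in late}
```

Running this over the year of predictions would show the urban slice drifting down while rural slices stay flat, which is the signature of the feedback loop in the correct answer.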