Dynamic (Online) Training
Explore the options below.
Which one of the following statements is true of dynamic (online)
training?
The model stays up to date as new data arrives.
This is the primary benefit of online training—we can avoid many
staleness issues by allowing the model to train on new data as
it comes in.
Very little monitoring of training jobs needs to be done.
Actually, you must continuously monitor training jobs to ensure that
they are healthy and working as intended. You'll also need
supporting infrastructure like the ability to roll a model back
to a previous snapshot in case something goes wrong in training,
such as a buggy job or corruption in input data.
Very little monitoring of input data needs to be done at
inference time.
Just like a static, offline model, it is also important to
monitor the inputs to the dynamically updated models. We are
likely not at risk for large seasonality effects, but sudden,
large changes to inputs (such as an upstream data source going
down) can still cause unreliable predictions.
Static (Offline) Training
Explore the options below.
Which of the following statements are true about static (offline)
training?
The model stays up to date as new data arrives.
Actually, if we train offline, then the model has no way to
incorporate new data as it arrives. This can lead to model
staleness, if the distribution we are trying to learn from
changes over time.
You can verify the model before applying it in production.
Yes, offline training gives ample opportunity to verify model
performance before introducing the model in production.
Offline training requires less monitoring of training jobs
than online training.
In general, monitoring requirements at training time are more modest
for offline training, which insulates us from many production
considerations. However, the more frequently you train your model,
the higher the investment you'll need to make in monitoring. You'll
also want to validate regularly to ensure that changes to your code
(and its dependencies) don't adversely affect model quality.
Very little monitoring of input data needs to be done at
inference time.
Counterintuitively, you do need to monitor input data at serving
time. If the input distributions change, then our model's
predictions may become unreliable. Imagine, for example, a model
trained only on summertime clothing data suddenly being used to
predict clothing buying behavior in wintertime.