Production ML systems: Static versus dynamic training
Stay organized with collections
Save and categorize content based on your preferences.
Broadly speaking, you can train a model in either of two ways:
Static training (also
called offline training) means that you train a model
only once. You then serve that same trained model for a while.
Dynamic training (also
called online training) means that you train a model
continuously or at least frequently. You usually serve the most
recently trained model.
Table 1. Primary advantages and disadvantages.
Static training
Dynamic training
Advantages
Simpler. You only need to develop and test the model once.
More adaptable. Your model will keep up with any
changes to the relationship between features and labels.
Disadvantages
Sometimes staler. If the relationship between features and
labels changes over time, your model's predictions will degrade.
More work. You must build, test, and release a new product
all the time.
If your dataset truly isn't changing over time, choose static training because
it is cheaper to create and maintain than dynamic training. However, datasets
tend to change over time, even those with features that you think are as
constant as, say, sea level. The takeaway: even with static
training, you must still monitor your input data for change.
For example, consider a model trained to predict the probability that users
will buy flowers. Because of time pressure, the model is trained only once
using a dataset of flower buying behavior during July and August.
The model works fine for several months but then makes terrible predictions
around Valentine's Day because
user behavior during that floral holiday period changes dramatically.
For a more detailed exploration of static and dynamic training, see the
Managing ML Projects
course.
Exercises: Check your understanding
Which two of the following statements are true about
static (offline) training?
The model stays up to date as new data arrives.
Actually, if you train offline, then the model has no way to
incorporate new data as it arrives. This can lead to model
staleness, if the distribution you are trying to learn from
changes over time.
You can verify the model before applying it in production.
Yes, offline training gives ample opportunity to verify model
performance before introducing the model in production.
Offline training requires less monitoring of training jobs
than online training.
In general, monitoring requirements at training time are more modest
for offline training, which insulates you from many production
considerations. However, the more frequently you train your model,
the higher the investment you'll need to make in monitoring. You'll
also want to validate regularly to ensure that changes to your code
(and its dependencies) don't adversely affect model quality.
Very little monitoring of input data needs to be done at
inference time.
Counterintuitively, you do need to monitor input data at serving
time. If the input distributions change, then our model's
predictions may become unreliable. Imagine, for example, a model
trained only on summertime clothing data suddenly being used to
predict clothing buying behavior in wintertime.
Which one of the following statements is true of
dynamic (online) training?
The model stays up to date as new data arrives.
This is the primary benefit of online training; you can avoid many
staleness issues by allowing the model to train on new data as
it comes in.
Very little monitoring of training jobs needs to be done.
Actually, you must continuously monitor training jobs to ensure that
they are healthy and working as intended. You'll also need
supporting infrastructure like the ability to roll a model back
to a previous snapshot in case something goes wrong in training,
such as a buggy job or corruption in input data.
Very little monitoring of input data needs to be done at
inference time.
Just like a static, offline model, it is also important to
monitor the inputs to the dynamically updated models. You are
likely not at risk for large seasonality effects, but sudden,
large changes to inputs (such as an upstream data source going
down) can still cause unreliable predictions.