Static (Offline) vs. Dynamic (Online) Inference: Check Your Understanding
Static (Offline) Inference
In offline inference, we make predictions on a big batch of data all at once.
We then put those predictions in a look-up table for later use.
Which of the following are true of offline inference?
We will have predictions for all possible inputs.
No, we will not have predictions for all possible inputs.
This is one of the drawbacks of offline inference. We will only
be able to serve a prediction for those examples that we already
know about. This is fine if the set of things that we're predicting
is limited, like world cities. But for things like user queries
that have a long tail of unusual or rare items, we may not be able
to provide full coverage with an offline-inference system.
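The coverage gap can be illustrated with a minimal sketch. Everything named here (`toy_model`, `build_lookup_table`, `serve`) is a hypothetical stand-in, not part of any real serving system: the "model" is a toy function, and the look-up table is an in-memory dictionary.

```python
from typing import Callable, Dict, Iterable, Optional


def toy_model(city: str) -> float:
    """Hypothetical stand-in for a trained model: scores a city name."""
    return round(len(city) / 10.0, 2)


def build_lookup_table(
    model: Callable[[str], float], known_inputs: Iterable[str]
) -> Dict[str, float]:
    """Offline step: run the model once over every input we know about."""
    return {item: model(item) for item in known_inputs}


def serve(table: Dict[str, float], item: str) -> Optional[float]:
    """Serving step: a plain dictionary read; None means no coverage."""
    return table.get(item)


table = build_lookup_table(toy_model, ["Paris", "Tokyo", "Nairobi"])
print(serve(table, "Tokyo"))     # a precomputed prediction
print(serve(table, "Atlantis"))  # None: this input was never in the batch
```

Note that `serve` never calls the model. That is why serving is fast, and also why an input missing from the batch has no prediction at all.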
After generating the predictions, we can verify them before applying them.
This is indeed one useful thing about offline inference. We can sanity
check and verify all of our predictions before they are used.
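One possible shape for such a check is sketched below, assuming the predictions are probabilities that should lie in [0, 1]. The function name `find_invalid_predictions` and the chosen bounds are illustrative assumptions; a real pipeline would apply whatever checks suit its own outputs.

```python
from typing import Dict, List


def find_invalid_predictions(
    predictions: Dict[str, float], low: float = 0.0, high: float = 1.0
) -> List[str]:
    """Return the keys whose prediction is NaN or falls outside [low, high]."""
    bad = []
    for key, value in predictions.items():
        # value != value is True only for NaN.
        if value != value or not (low <= value <= high):
            bad.append(key)
    return bad


# Hypothetical batch output, checked before it is published for serving.
candidate = {"Paris": 0.5, "Tokyo": 0.5, "Nairobi": 1.7}
invalid = find_invalid_predictions(candidate)
if invalid:
    print("Hold back publication; invalid rows:", invalid)
else:
    print("All predictions passed; safe to publish.")
```

Because the whole batch exists before anything is served, a failed check can block publication entirely.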
For a given input, we can serve a prediction more quickly than with
online inference.
One of the great things about offline inference is that once
the predictions have been written to some look-up table, they
can be served with minimal latency. No feature computation or model
inference needs to be done at request time.
We will need to carefully monitor our input signals over a long
period of time.
This is the one case where we don't actually need to monitor input
signals over a long period of time. This is because once the
predictions have been written to a look-up table, we're no longer
dependent on the input features. Note that any subsequent update
of the model will require a new round of input verification.
We will be able to react quickly to changes in the world.
No, this is a drawback of offline inference. We'll need to wait
until a new set of predictions has been written to the look-up table
before we can respond differently based on any changes in the world.
Dynamic (Online) Inference
Dynamic (online) inference means making predictions on demand. That is,
in online inference, we put the trained model on a server and issue
inference requests as needed. Which of the following are true of
online inference?
You can provide predictions for all possible items.
Yes, this is a strength of online inference. Any request that
comes in will be given a score. Online inference handles long-tail
distributions (those with many rare items), like the space of all
possible sentences written in movie reviews.
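A minimal sketch of this on-demand scoring follows. The bag-of-words weights and the `predict` function are hypothetical, chosen only to show that features are computed and the model is evaluated at request time, so a never-before-seen sentence still receives a score.

```python
import math
from typing import Dict

# Hypothetical weights for a tiny bag-of-words sentiment model.
WEIGHTS: Dict[str, float] = {"great": 1.5, "boring": -2.0, "fun": 1.0}
BIAS = 0.0


def predict(review: str) -> float:
    """Compute features and score the review at request time."""
    tokens = review.lower().split()
    total = BIAS + sum(WEIGHTS.get(token, 0.0) for token in tokens)
    return 1.0 / (1.0 + math.exp(-total))  # logistic function


print(predict("great fun"))
print(predict("an oddly specific sentence nobody has written before"))
```

The second call contains no known words, so it gets the model's neutral score rather than a missing entry. The cost is that every request pays for feature computation and model evaluation.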
You can do post-verification of predictions before they get used.
In general, it's not possible to do a post-verification of all
predictions before they get used because predictions are being
made on demand. You can, however, monitor aggregate prediction
quality as a form of sanity checking, but such monitoring will
sound the fire alarm only after the fire has already spread.
You must carefully monitor input signals.
Yes. Signals could change suddenly due to upstream issues,
harming our predictions.
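One simple way to watch an input signal is sketched below: compare the recent mean of a feature against a baseline window. The function `input_shifted` and the threshold of three standard deviations are assumptions made for illustration, not a recommended production rule.

```python
from statistics import mean, stdev
from typing import Sequence


def input_shifted(
    baseline: Sequence[float], recent: Sequence[float], max_z: float = 3.0
) -> bool:
    """Flag a feature whose recent mean is far from its baseline mean.

    "Far" means more than max_z baseline standard deviations, a threshold
    picked here purely for illustration.
    """
    spread = stdev(baseline)
    if spread == 0:
        return mean(recent) != mean(baseline)
    return abs(mean(recent) - mean(baseline)) / spread > max_z


baseline = [10.0, 11.0, 9.0, 10.5, 9.5]
print(input_shifted(baseline, [10.2, 9.8, 10.1]))  # False: within normal range
print(input_shifted(baseline, [0.0, 0.0, 0.0]))    # True: upstream feed likely broke
```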
When performing online inference, you do not need to worry
about prediction latency (the lag time for returning predictions)
as much as when performing offline inference.
Prediction latency is often a real concern in online inference.
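Unfortunately, you can't necessarily fix prediction latency issues
by adding more inference servers. Extra servers raise throughput,
but they do not shorten the time a single request spends on
feature computation and model evaluation.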