Static vs. Dynamic Inference: Check Your Understanding

Static (Offline) Inference

Explore the options below.

In offline inference, we make predictions on a large batch of data all at once and then store those predictions in a lookup table for later use. Which of the following are true of offline inference?
We must create predictions for all possible inputs.
Yes. To use offline inference, we must generate predictions for all possible inputs and store them in a cache or lookup table. This is one of the drawbacks of offline inference: we can only serve predictions for examples we already know about. That's fine if the set of things we're predicting is limited, like all world cities or all items in a database table. But for freeform inputs like user queries, which have a long tail of unusual or rare items, an offline-inference system cannot provide full coverage.
After generating the predictions, we can verify them before applying them.
This is indeed one useful property of offline inference: we can sanity-check and verify all of our predictions before they are used.
For a given input, we can serve a prediction more quickly than with online inference.
One of the great things about offline inference is that once the predictions have been written to a lookup table, they can be served with minimal latency. No feature computation or model inference needs to happen at request time (see the sketch after this question).
We will need to carefully monitor our input signals over a long period of time.
This is the one case where we don't actually need to monitor input signals over a long period of time. This is because once the predictions have been written to a lookup table, we're no longer dependent on the input features. Note that any subsequent update of the model will require a new round of input verification.
We will be able to react quickly to changes in the world.
No, this is a drawback of offline inference. We'll need to wait until a new set of predictions has been written to the lookup table before we can respond differently to any changes in the world.
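
The whole offline pattern fits in a few lines. Below is a minimal sketch in Python; `StubModel`, the city list, and the score range are illustrative stand-ins, not any particular library's API.

```python
# Minimal sketch of offline (static) inference: batch-score a known,
# bounded input space, verify the results, then serve by lookup.

class StubModel:
    """Illustrative stand-in for a trained model."""
    def predict(self, example):
        return len(example) / 100.0  # fake score in [0, 1]

def build_prediction_table(model, all_known_inputs):
    """Score every known input once, up front, in one batch job."""
    return {example: model.predict(example) for example in all_known_inputs}

# Batch job: enumerate the (limited) input space and score all of it.
table = build_prediction_table(StubModel(), ["London", "Paris", "Tokyo"])

# Because every prediction exists before serving starts, we can verify
# them all here, e.g. check that scores fall in the expected range.
assert all(0.0 <= score <= 1.0 for score in table.values())

# Serving is a plain lookup: no features, no model, minimal latency.
print(table["Paris"])

# Drawback: inputs we never precomputed get no prediction at all.
print(table.get("Springfield", "no prediction available"))
```

Note how the lookup table makes three of the answers above concrete: full input coverage is required to build it, verification can happen before anything is served, and serving itself is just a dictionary access.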

Dynamic (Online) Inference

Explore the options below.

Dynamic (online) inference means making predictions on demand. That is, in online inference, we put the trained model on a server and issue inference requests as needed. Which of the following are true of dynamic inference?
You can provide predictions for all possible items.
Yes, this is a strength of online inference. Any request that comes in will be given a score. Online inference handles long-tail distributions (those with many rare items), like the space of all possible sentences written in movie reviews (see the first sketch after this question).
You can do post-verification of predictions before they are used.
In general, it's not possible to post-verify all predictions before they get used, because predictions are being made on demand. You can, however, monitor aggregate prediction quality to provide some level of sanity checking, but these checks will sound the fire alarm only after the fire has already spread (see the monitoring sketch after this question).
You must carefully monitor input signals.
Yes. Signals could change suddenly due to upstream issues, harming our predictions; the monitoring sketch after this question shows one simple way to catch such a shift.
When performing online inference, you do not need to worry about prediction latency (the lag time for returning predictions) as much as when performing offline inference.
No. Prediction latency is often a real concern in online inference, because feature computation and model inference happen on every request. Unfortunately, you can't necessarily fix prediction latency issues by adding more inference servers: extra replicas increase throughput, but each individual request still has to wait for the model itself to run.
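
For contrast with the offline sketch earlier, here is a minimal sketch of on-demand serving. `StubModel` and `compute_features` are illustrative stand-ins, not a real serving stack, but the per-request work they represent is the point.

```python
# Minimal sketch of online (dynamic) inference: features and model
# inference are computed per request, for any input whatsoever.

import time

class StubModel:
    """Illustrative stand-in for a trained model."""
    def predict(self, features):
        return sum(features) / (sum(features) + 10.0)  # fake score

def compute_features(raw_input):
    """Stand-in featurizer; real systems do real feature work here."""
    return [len(raw_input), raw_input.count(" ")]

def handle_request(model, raw_input):
    """Each request pays for featurization and inference, which is
    exactly where online prediction latency comes from."""
    start = time.monotonic()
    features = compute_features(raw_input)   # done at request time
    prediction = model.predict(features)     # also at request time
    latency_ms = (time.monotonic() - start) * 1000
    return prediction, latency_ms

# Any input can be scored, including never-before-seen, long-tail ones.
model = StubModel()
print(handle_request(model, "a movie review nobody has written before"))
```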
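
Because an online system stays dependent on its input signals, monitoring matters for its whole lifetime. Here is a toy drift check of the kind hinted at above; the baseline statistics and the z-score threshold are assumptions for illustration, not a prescribed method.

```python
# Toy input-signal monitor: flag feature values that drift far from
# their training-time distribution, e.g. after an upstream pipeline
# breaks. Baseline stats and the threshold are illustrative.

def signal_looks_healthy(value, baseline_mean, baseline_std, max_z=4.0):
    """Return False when a value sits too many standard deviations
    from the mean observed at training time."""
    z = abs(value - baseline_mean) / baseline_std
    return z <= max_z

# Suppose a feature averaged 10.0 (std 2.0) during training.
assert signal_looks_healthy(11.5, 10.0, 2.0)      # normal traffic
assert not signal_looks_healthy(45.0, 10.0, 2.0)  # sudden upstream shift
```

The same idea applies to aggregate prediction quality: track the running statistics of the scores being served and alert on drift, bearing in mind that such alarms fire only after bad predictions have already gone out.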