Static (Offline) Inference
In offline inference, we make predictions on a big batch of data all at
once.
We then put those predictions in a look-up table for later use.
Which of the following are true of offline inference?
We must create predictions for all possible inputs.
Yes, to use offline inference we have to make predictions for all
possible inputs and store them in a cache or look-up table.
This is one of the drawbacks of offline inference. We will only
be able to serve a prediction for those examples that we already
know about. This is fine if the set of things that we're predicting
is limited, like all world cities or all items in a database table.
But for freeform inputs like user queries that have a long tail of
unusual or rare items, we would not be able to provide full coverage
with an offline-inference system.
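A minimal sketch of the coverage limitation, assuming a plain Python dictionary as the look-up table. The names `build_lookup_table`, `serve`, and `toy_model` are hypothetical and stand in for a real batch job, serving layer, and trained model.

```python
from typing import Callable, Dict, Iterable, Optional


def build_lookup_table(
    model: Callable[[str], float], known_inputs: Iterable[str]
) -> Dict[str, float]:
    """Run the model once over every known input (the offline batch job)."""
    return {item: model(item) for item in known_inputs}


def serve(table: Dict[str, float], item: str) -> Optional[float]:
    """Serve from the table only; no model call happens at request time."""
    return table.get(item)  # None means the input was never precomputed


# Hypothetical stand-in for a trained model: scores a city by name length.
def toy_model(city: str) -> float:
    return len(city) / 10.0


table = build_lookup_table(toy_model, ["Paris", "Lagos", "Lima"])
covered = serve(table, "Paris")      # 0.5, served by a dictionary lookup
missing = serve(table, "Atlantis")   # None: a long-tail input with no entry
```

Serving is a single dictionary lookup, which is why it is fast, but any input outside the precomputed set gets no prediction at all.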
After generating the predictions, we can verify them before applying
them.
This is indeed one useful thing about offline inference. We can
sanity check and verify all of our predictions before they are
used.
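One way to picture this verification step is a gate between the batch job and the live table. The sketch below assumes that predictions are probabilities in [0, 1]; the range check and the names `find_invalid` and `publish_if_valid` are illustrative assumptions, not a prescribed procedure.

```python
from typing import Dict, List


def find_invalid(
    predictions: Dict[str, float], low: float = 0.0, high: float = 1.0
) -> List[str]:
    """Return keys whose predictions fall outside the expected range."""
    return [
        key for key, value in predictions.items()
        if not (low <= value <= high)  # also catches NaN
    ]


def publish_if_valid(candidate: Dict[str, float], live: Dict[str, float]) -> bool:
    """Replace the live table only when every candidate prediction passes."""
    if find_invalid(candidate):
        return False  # keep serving the previous table
    live.clear()
    live.update(candidate)
    return True


live_table = {"Paris": 0.4}
bad_batch = {"Paris": 0.5, "Lagos": 1.7}   # 1.7 is outside [0, 1]
good_batch = {"Paris": 0.5, "Lagos": 0.7}

rejected = publish_if_valid(bad_batch, live_table)   # False, live table unchanged
accepted = publish_if_valid(good_batch, live_table)  # True, live table replaced
```

Because the whole batch exists before anything is served, a failed check costs nothing more than a delayed update.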
For a given input, we can serve a prediction more quickly than with
online inference.
One of the great things about offline inference is that once
the predictions have been written to some look-up table, they
can be served with minimal latency. No feature computation or model
inference needs to be done at request time.
We will need to carefully monitor our input signals over a long
period of time.
This is the one case where we don't actually need to monitor input
signals over a long period of time. This is because once the
predictions have been written to a look-up table, we're no longer
dependent on the input features. Note that any subsequent update
of the model will require a new round of input verification.
We will be able to react quickly to changes in the world.
No, this is a drawback of offline inference. We'll need to wait
until a new set of predictions has been written to the look-up
table before we can respond differently based on any changes in the
world.
Dynamic (Online) Inference
Dynamic (online) inference means making predictions on demand. That is,
in online inference, we put the trained model on a server and issue
inference requests as needed. Which of the following are true of
dynamic inference?
You can provide predictions for all possible items.
Yes, this is a strength of online inference. Any request that
comes in will be given a score. Online inference handles long-tail
distributions (those with many rare items), like the space of all
possible sentences written in movie reviews.
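As a rough illustration, the sketch below scores text at request time, so a sentence that has never been seen before still gets a prediction. The word-counting `toy_model` and the `make_online_scorer` wrapper are hypothetical stand-ins for a trained model and a serving endpoint.

```python
from typing import Callable, List


def make_online_scorer(model: Callable[[List[str]], float]) -> Callable[[str], float]:
    """Wrap a model so features are computed and scored at request time."""
    def score(text: str) -> float:
        tokens = text.lower().split()   # feature computation per request
        return model(tokens)            # model inference per request
    return score


# Hypothetical stand-in for a trained sentiment model.
POSITIVE = {"great", "good", "loved"}


def toy_model(tokens: List[str]) -> float:
    if not tokens:
        return 0.0
    return sum(token in POSITIVE for token in tokens) / len(tokens)


score = make_online_scorer(toy_model)
seen_before = score("great movie")
never_seen = score("a great if slightly baffling space western")
```

The cost of this coverage is that feature computation and model inference both run inside every request.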
You can do post-verification of predictions before they
are used.
In general, it's not possible to verify all predictions before
they are used because predictions are made on demand. You can
monitor aggregate prediction quality to provide some level of
sanity checking, but such monitoring will sound the fire alarm
only after the fire has already spread.
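Aggregate monitoring could look something like the sketch below, which tracks the mean of the most recent predictions. The `PredictionMonitor` class, its window size, and its tolerance are assumptions chosen for illustration.

```python
from collections import deque
from typing import Deque


class PredictionMonitor:
    """Track the mean of recent predictions and flag large shifts."""

    def __init__(self, expected_mean: float, tolerance: float, window: int) -> None:
        self.expected_mean = expected_mean
        self.tolerance = tolerance
        self.recent: Deque[float] = deque(maxlen=window)

    def record(self, prediction: float) -> bool:
        """Store a served prediction; return True if the window looks off."""
        self.recent.append(prediction)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough data to judge yet
        mean = sum(self.recent) / len(self.recent)
        return abs(mean - self.expected_mean) > self.tolerance


monitor = PredictionMonitor(expected_mean=0.5, tolerance=0.2, window=3)
# The last three predictions simulate a model that has started to misbehave.
alerts = [monitor.record(p) for p in [0.5, 0.5, 0.5, 0.9, 0.9, 0.9]]
```

In this run the first suspicious prediction (the fourth value) is served without an alert, and the alarm is raised only on the fifth, which matches the point that the alarm comes after the problem has started.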
You must carefully monitor input signals.
Yes. Signals could change suddenly due to upstream issues,
harming our predictions.
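A simple form of input monitoring is to validate each request's features against expected ranges before scoring. In the sketch below, the feature names, the ranges, and the `check_input_signals` helper are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def check_input_signals(
    features: Dict[str, float], expected_ranges: Dict[str, Tuple[float, float]]
) -> List[str]:
    """Return names of features that are missing, NaN, or out of range."""
    problems = []
    for name, (low, high) in expected_ranges.items():
        value = features.get(name)
        if value is None or math.isnan(value) or not (low <= value <= high):
            problems.append(name)
    return problems


expected = {"age": (0.0, 120.0), "price_usd": (0.0, 10000.0)}

healthy = check_input_signals({"age": 34.0, "price_usd": 19.99}, expected)
broken = check_input_signals({"age": 34.0, "price_usd": -1.0}, expected)
dropped = check_input_signals({"age": 34.0}, expected)  # upstream stopped sending price
```

A real system would more likely track such violations over time and alert on their rate, rather than inspect single requests in isolation.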
When performing online inference, you do not need to worry
about prediction latency (the lag time for returning predictions)
as much as when performing offline inference.
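Prediction latency is often a real concern in online inference.
Unfortunately, you can't necessarily fix prediction latency issues
by adding more inference servers: extra servers let you handle more
requests at once, but each request still has to wait for its own
feature computation and model inference.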
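The arithmetic behind this can be sketched with a deliberately simplified model: all requests arrive at the same moment, every request takes the same time to score, and each server handles one request at a time. The function name and the 50 ms figure are assumptions for illustration.

```python
from typing import List


def completion_times(
    num_requests: int, servers: int, service_time_ms: float
) -> List[float]:
    """Latency of each request when all arrive at once.

    Request i waits for the (i // servers) rounds ahead of it and
    then needs one full service time of its own.
    """
    return [((i // servers) + 1) * service_time_ms for i in range(num_requests)]


one_server = completion_times(4, servers=1, service_time_ms=50.0)
four_servers = completion_times(4, servers=4, service_time_ms=50.0)
eight_servers = completion_times(4, servers=8, service_time_ms=50.0)
```

Going from one server to four removes the queueing delay, but going from four to eight changes nothing: no request finishes faster than the 50 ms the model itself needs.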