Dynamic (online) inference means making predictions on demand. That is,
in online inference, we put the trained model on a server and issue
inference requests as needed. Which of the following are true of
dynamic (online) inference?
You can provide predictions for all possible items.
Yes, this is a strength of online inference. Any request that
comes in will be given a score. Online inference handles long-tail
distributions (those with many rare items), like the space of all
possible sentences written in movie reviews.
You can do post-verification of predictions before they
are used.
In general, it's not possible to post-verify every prediction
before it gets used, because predictions are made on demand.
You can, however, monitor aggregate prediction quality to provide
some level of sanity checking, but such monitoring will sound the
fire alarm only after the fire has already spread.
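
As a rough illustration of aggregate monitoring, the sketch below keeps a rolling mean of recent prediction scores and flags drift from a baseline. The class name, the baseline value, the tolerance, and the window size are all hypothetical choices made for this example, not part of any particular serving system.

```python
from collections import deque


class RollingMeanMonitor:
    """Tracks the mean of the most recent prediction scores (hypothetical helper)."""

    def __init__(self, baseline_mean, tolerance, window=100):
        self.baseline_mean = baseline_mean  # assumed: mean score seen during validation
        self.tolerance = tolerance          # assumed: allowed absolute drift
        self.scores = deque(maxlen=window)  # keeps only the last `window` scores

    def record(self, score):
        """Call after a prediction has been returned to the caller."""
        self.scores.append(score)

    def is_alarming(self):
        """True once the window is full and its mean has drifted too far."""
        if len(self.scores) < self.scores.maxlen:
            return False
        mean = sum(self.scores) / len(self.scores)
        return abs(mean - self.baseline_mean) > self.tolerance


monitor = RollingMeanMonitor(baseline_mean=0.5, tolerance=0.1, window=3)
alarms = []
for score in [0.5, 0.5, 0.5, 0.9, 0.9, 0.9]:
    monitor.record(score)  # this prediction has already been served
    alarms.append(monitor.is_alarming())
```

In this run the alarm first fires on the fourth score, which by then has already been served. That is the limitation described above: the check reacts to bad predictions rather than preventing them.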
You must carefully monitor input signals.
Yes. Signals could change suddenly due to upstream issues,
harming our predictions.
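
One simple way to watch an input signal is to compare each incoming value against statistics recorded at training time. The sketch below is a minimal version of that idea; the function name, the z-score threshold, and the example statistics are assumptions for illustration only.

```python
import math


def check_feature(value, train_mean, train_std, max_z=4.0):
    """Return a warning label if the value looks anomalous, else None.

    train_mean and train_std are assumed to have been computed on the
    training data; max_z is an arbitrary illustrative threshold.
    """
    # A missing or NaN value often means an upstream source stopped reporting.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "missing"
    # Distance from the training mean, in training standard deviations.
    z = abs(value - train_mean) / train_std
    if z > max_z:
        return "out_of_range"
    return None


# Hypothetical feature with training mean 0.0 and standard deviation 1.0.
normal = check_feature(0.5, train_mean=0.0, train_std=1.0)
shifted = check_feature(10.0, train_mean=0.0, train_std=1.0)
absent = check_feature(None, train_mean=0.0, train_std=1.0)
```

A per-value check like this catches sudden breaks, such as a unit change upstream. Slower drift would need a comparison of distributions over a window of requests.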
When performing online inference, you do not need to worry
about prediction latency (the lag time for returning predictions)
as much as when performing offline inference.
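No. Prediction latency is often a real concern in online inference.
Unfortunately, you can't necessarily fix prediction latency issues
by adding more inference servers: extra servers raise throughput,
but each request still has to wait for the model to compute its
prediction.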
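
To see whether latency is a problem, it helps to measure it per request and look at tail percentiles rather than the average. The following is a minimal sketch under stated assumptions: `timed_predict` and `percentile` are hypothetical helpers, `predict_fn` stands in for whatever call runs the model, and the latency values are made up.

```python
import math
import time


def timed_predict(predict_fn, features):
    """Run one prediction and return (prediction, elapsed milliseconds)."""
    start = time.perf_counter()
    prediction = predict_fn(features)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return prediction, elapsed_ms


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list, with q in [0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up latencies in milliseconds for five requests.
latencies_ms = [30, 10, 50, 20, 40]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)

# Timing a stand-in "model" that just doubles its input.
prediction, elapsed_ms = timed_predict(lambda x: x * 2, 3)
```

The median can look healthy while the 95th percentile does not, so tracking both gives a clearer picture of what callers experience.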