Scoring

After candidate generation, another model scores and ranks the generated candidates to select the set of items to display. The recommendation system may have multiple candidate generators that use different sources, such as the following:

Examples
  • Related items from a matrix factorization model.
  • User features that account for personalization.
  • "Local" vs "distant" items; that is, taking geographic information into account.
  • Popular or trending items.
  • A social graph; that is, items liked or recommended by friends.

The system combines these different sources into a common pool of candidates that are then scored by a single model and ranked according to that score. For example, the system can train a model to predict the probability of a user watching a video on YouTube given the following:

  • query features (for example, user watch history, language, country, time)
  • video features (for example, title, tags, video embedding)

The system can then rank the videos in the pool of candidates according to the prediction of the model.

Why Not Let the Candidate Generator Score?

Since candidate generators compute a score (such as the similarity measure in the embedding space), you might be tempted to use them to do ranking as well. However, you should avoid this practice for the following reasons:

  • Some systems rely on multiple candidate generators. The scores of these different generators might not be comparable.
  • With a smaller pool of candidates, the system can afford to use more features and a more complex model that may better capture context.

Choosing an Objective Function for Scoring

As you may remember from Introduction to ML Problem Framing, ML can act like a mischievous genie: very happy to learn the objective you provide, but you have to be careful what you wish for. This mischievous quality also applies to recommendation systems. The choice of scoring function can dramatically affect the ranking of items, and ultimately the quality of the recommendations.

For Example:

Click the plus icons to learn what happens as a result of using each objective.

An image of the Google Play
 store home page that is displaying new and updated games as well as
 recommended apps with the bottom items highlighted.

Positional Bias in Scoring

Items that appear lower on the screen are less likely to be clicked than items appearing higher on the screen. However, when scoring videos, the system usually doesn't know where on the screen a link to that video will ultimately appear. Querying the model with all possible positions is too expensive. Even if querying multiple positions were feasible, the system still might not find a consistent ranking across multiple ranking scores.

Solutions

  • Create position-independent rankings.
  • Rank all the candidates as if they are in the top position on the screen.