Scoring
After candidate generation, another model scores and ranks the generated
candidates to select the set of items to display. The recommendation system
may have multiple candidate generators that use different sources, such
as the following:
Examples
- Related items from a matrix factorization model.
- User features that account for personalization.
- "Local" vs "distant" items; that is, taking geographic information into account.
- Popular or trending items.
- A social graph; that is, items liked or recommended by friends.
The system combines these different sources into a common pool of
candidates that are then scored by a single model and ranked according to
that score. For example, the system can train a model to predict the
probability of a user watching a video on YouTube given the following:
- query features (for example, user watch history, language, country, time)
- video features (for example, title, tags, video embedding)
The system can then rank the videos in the pool of candidates according
to the prediction of the model.
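This pool-then-score flow can be sketched as follows. This is a minimal illustration, not a real API: `generators` and `scoring_model` are hypothetical stand-ins for the candidate generators and the trained scoring model.

```python
def rank_candidates(user_features, generators, scoring_model, k=10):
    """Merge candidates from several generators, score, and rank."""
    # Union the candidate sets, deduplicating by video id.
    pool = {}
    for generate in generators:
        for video in generate(user_features):
            pool[video["id"]] = video

    # Score every candidate with the same model so scores are comparable.
    scored = [
        (scoring_model(user_features, video), video)
        for video in pool.values()
    ]

    # Rank by the model's prediction, highest first, and keep the top k.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [video for _, video in scored[:k]]
```

Note that each video is scored once, regardless of how many generators proposed it.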
Why not let the candidate generator score?
Since candidate generators compute a score (such as the similarity measure
in the embedding space), you might be tempted to use them to do ranking as
well. However, you should avoid this practice for the following reasons:
- Some systems rely on multiple candidate generators. The scores of these different generators might not be comparable.
- With a smaller pool of candidates, the system can afford to use more features and a more complex model that may better capture context.
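The comparability problem is easy to see with made-up numbers: one generator might emit cosine similarities in [-1, 1] while another emits raw view counts, so sorting the union by raw score is meaningless.

```python
# Illustrative scores only; the generators and values are invented.
matrix_factorization = [("video_a", 0.92), ("video_b", 0.85)]  # cosine similarity
trending = [("video_c", 15400.0), ("video_d", 9800.0)]         # raw view counts

merged = sorted(matrix_factorization + trending,
                key=lambda pair: pair[1], reverse=True)
# The view counts swamp the similarities, so the personalized
# candidates can never outrank the trending ones.
```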
Choosing an objective function for scoring
As you may remember from Introduction to ML Problem
Framing,
ML can act like a mischievous genie: very happy to learn the objective
you provide, but you have to be careful what you wish for. This mischievous
quality also applies to recommendation systems. The choice of scoring
function can dramatically affect the ranking of items, and ultimately the
quality of the recommendations.
Example:
Consider what happens as a result of using each objective.
Maximize Click Rate
If the scoring function optimizes for clicks, the system may recommend
click-bait videos. This scoring function generates clicks but does not
make for a good user experience, and users' interest may quickly fade.
Maximize Watch Time
If the scoring function optimizes for watch time, the system might
recommend very long videos, which might lead to a poor user experience.
Note that multiple short watches can be just as good as one long watch.
Increase Diversity and Maximize Session Watch Time
Recommend shorter videos, but ones that are more likely to keep the
user engaged.
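One hypothetical way to express that last objective is to reward expected session watch time rather than raw clicks or single-video length. All names here are illustrative, not part of any real system:

```python
def session_score(p_click, expected_watch_minutes, completion_rate):
    """Expected watch time earned by showing the video, discounted by
    how often users abandon it early (a crude engagement proxy)."""
    return p_click * expected_watch_minutes * completion_rate

# A short video that users finish can outscore a long one they abandon:
short_video = session_score(p_click=0.30, expected_watch_minutes=4.0,
                            completion_rate=0.9)
long_video = session_score(p_click=0.30, expected_watch_minutes=40.0,
                           completion_rate=0.05)
```

Under these assumed inputs, the short, frequently finished video wins.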
Positional bias in scoring
Items that appear lower on the screen are less likely to be clicked than
items appearing higher on the screen. However, when scoring videos, the
system usually doesn't know where on the screen a link to that video will
ultimately appear. Querying the model with all possible positions is too
expensive. Even if querying multiple positions were feasible, the system
still might not find a consistent ranking across multiple ranking scores.
Solutions
- Create position-independent rankings.
- Rank all the candidates as if they are in the top position on the screen.
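One common way to apply the second idea (a sketch, with a hypothetical `model.predict` interface) is to include display position as a training feature, then hold it fixed at the top slot when scoring, so every candidate is ranked as if it would be shown first:

```python
TOP_POSITION = 1

def position_independent_scores(model, query_features, candidates):
    """Score each candidate with its position feature pinned to the
    top slot, removing position as a source of ranking variation."""
    return [
        model.predict({**query_features,
                       **video_features,
                       "position": TOP_POSITION})
        for video_features in candidates
    ]
```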
Last updated 2025-08-25 UTC.