Google is committed to advancing racial equity for Black communities. See how.

Explore the options below.

Imagine a linear model with two strongly correlated features; that is, these two features are nearly identical copies of one another but one feature contains a small amount of random noise. If we train this model with L2 regularization, what will happen to the weights for these two features?
Both features will have roughly equal, moderate weights.
L2 regularization will force the features towards roughly equivalent weights that are approximately half of what they would have been had only one of the two features been in the model.
One feature will have a large weight; the other will have a weight of almost 0.0.
L2 regularization penalizes large weights more than small weights. So, even if one weight started to drop faster than the other, L2 regularization would tend to force the bigger weight to drop more quickly than the smaller weight.
One feature will have a large weight; the other will have a weight of exactly 0.0.
L2 regularization rarely forces weights to exactly 0.0. By contrast, L1 regularization (discussed later) does force weights to exactly 0.0.