Explore the options below.
Imagine a linear model with two strongly correlated features; that is,
these two features are nearly identical copies of one another, but one
feature contains a small amount of random noise. If we train this
model with L2 regularization, what will happen to the weights
for these two features?
Both features will have roughly equal, moderate weights.
L2 regularization will force the two features toward
roughly equivalent weights that are approximately half of
what either weight would have been had only one of the two features
been in the model, as the code sketch after these options illustrates.
One feature will have a large weight; the other will have a
weight of almost 0.0.
L2 regularization penalizes large weights more
than small weights. So even if one weight started to drop
faster than the other, L2 regularization would
push the larger remaining weight down more quickly than
the smaller one, keeping the two weights roughly equal rather
than letting one of them collapse toward 0.0.
One feature will have a large weight; the other will have a
weight of exactly 0.0.
L2 regularization rarely forces
weights to exactly 0.0. By contrast, L1 regularization
(discussed later) can drive weights to exactly 0.0.
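
To see this behavior concretely, here is a minimal sketch, assuming a scikit-learn setup: two nearly duplicate features are fit with a Ridge (L2) model and, for contrast, a Lasso (L1) model. The sample size, noise scales, and regularization strengths (alpha values) are illustrative assumptions, not values from this course.

```python
# Minimal sketch (illustrative assumptions: data size, noise scales, and
# alpha values are made up) comparing how L2 vs. L1 regularization treats
# two nearly duplicate features.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)

n = 1000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)      # near-copy of x1 plus a little noise
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=n)  # the true signal uses x1 only

# L2 (Ridge): splits the weight roughly evenly between the duplicates.
ridge = Ridge(alpha=10.0).fit(X, y)
print("L2 (Ridge) weights:", ridge.coef_)     # roughly [1.5, 1.5]

# L1 (Lasso): tends to keep one feature and push the other to (or very near) 0.0.
lasso = Lasso(alpha=0.1, max_iter=10_000).fit(X, y)
print("L1 (Lasso) weights:", lasso.coef_)     # roughly [2.9, 0.0] (or the reverse)
```

If the two features were exact duplicates, L2 would split the weight exactly in half between them; the small noise term only nudges that split slightly, which is why both Ridge weights land near 1.5 rather than near 3.0 and 0.0.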