*L*_{2} Regularization


Explore the options below.

Imagine a linear model with 100 input features:

- 10 are highly informative.
- 90 are non-informative.

Assume that all features have values between -1 and 1. Which of the following statements are true?

- *L*_{2} regularization will encourage many of the non-informative weights to be nearly (but not exactly) 0.0.

  Yes, *L*_{2} regularization encourages weights to be near 0.0, but not exactly 0.0.

- *L*_{2} regularization will encourage most of the non-informative weights to be exactly 0.0.

  *L*_{2} regularization does not tend to force weights to exactly 0.0. *L*_{2} regularization penalizes larger weights more than smaller weights. As a weight gets close to 0.0, *L*_{2} "pushes" less forcefully toward 0.0.

- *L*_{2} regularization may cause the model to learn a moderate weight for some **non-informative** features.

  Surprisingly, this can happen when a non-informative feature happens to be correlated with the label. In this case, the model incorrectly gives such non-informative features some of the "credit" that should have gone to informative features.
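The "near but not exactly 0.0" behavior is easy to check numerically. The sketch below (not from the course; the data, the regularization strength `lam`, and all variable names are illustrative) fits a ridge-regularized linear model on synthetic data with 10 informative and 90 non-informative features, then inspects the learned weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_inf, d_noise = 1000, 10, 90

# Features in [-1, 1]; only the first 10 influence the label.
X = rng.uniform(-1, 1, size=(n, d_inf + d_noise))
true_w = np.concatenate([rng.uniform(1, 2, d_inf), np.zeros(d_noise)])
y = X @ true_w + rng.normal(0, 0.1, n)

lam = 10.0  # L2 regularization strength (hypothetical value)

# Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Non-informative weights shrink toward 0.0 but are not exactly 0.0.
print(np.abs(w[d_inf:]).max())       # small
print(np.count_nonzero(w[d_inf:]))   # still 90 nonzero weights
```

With *L*_{1} regularization instead, many of those 90 weights would land at exactly 0.0; *L*_{2} only shrinks them.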

*L*_{2} Regularization and Correlated Features


Explore the options below.

Imagine a linear model with two strongly correlated features; that is, these two features are nearly identical copies of one another, but one feature contains a small amount of random noise. If we train this model with *L*_{2} regularization, what will happen to the weights for these two features?

- Both features will have roughly equal, moderate weights.

  *L*_{2} regularization will force the features toward roughly equivalent weights that are approximately half of what they would have been had only one of the two features been in the model.

- One feature will have a large weight; the other will have a weight of **almost** 0.0.

  *L*_{2} regularization penalizes large weights more than small weights. So, even if one weight started to drop faster than the other, *L*_{2} regularization would tend to force the bigger weight to drop more quickly than the smaller weight.

- One feature will have a large weight; the other will have a weight of **exactly** 0.0.

  *L*_{2} regularization rarely forces weights to exactly 0.0. By contrast, *L*_{1} regularization (discussed later) *does* force weights to exactly 0.0.
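The weight-splitting claim can also be verified with a small experiment. This sketch (illustrative, not course code; `lam` and the noise scales are arbitrary choices) fits ridge regression once with two near-duplicate features and once with a single feature, and compares the weights:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

f = rng.uniform(-1, 1, n)
# Two strongly correlated features: the second is a noisy copy of the first.
X2 = np.column_stack([f, f + rng.normal(0, 0.01, n)])
y = 2.0 * f + rng.normal(0, 0.1, n)

lam = 5.0  # L2 regularization strength (hypothetical value)

def ridge(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam * I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_two = ridge(X2, y, lam)         # model with both correlated features
w_one = ridge(f[:, None], y, lam) # model with only one of the features

print(w_two)  # two roughly equal weights
print(w_one)  # a single weight roughly equal to the sum of the two above
```

The two correlated features end up with roughly equal weights, each about half of the weight the single-feature model learns, matching the first answer above.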