Regularization for Simplicity: Check Your Understanding

L2 Regularization

Explore the options below.

Imagine a linear model with 100 input features:
  • 10 are highly informative.
  • 90 are non-informative.
Assume that all features have values between -1 and 1. Which of the following statements are true?

• L2 regularization will encourage many of the non-informative weights to be nearly (but not exactly) 0.0.
  Yes, L2 regularization encourages weights to be near 0.0, but not exactly 0.0.

• L2 regularization will encourage most of the non-informative weights to be exactly 0.0.
  L2 regularization does not tend to force weights to exactly 0.0. L2 regularization penalizes larger weights more than smaller weights. As a weight gets close to 0.0, L2 "pushes" less forcefully toward 0.0.

• L2 regularization may cause the model to learn a moderate weight for some non-informative features.
  Surprisingly, this can happen when a non-informative feature happens to be correlated with the label. In this case, the model incorrectly gives such non-informative features some of the "credit" that should have gone to informative features.
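
The intuition behind these answers is that the L2 penalty's gradient is proportional to the weight itself (the derivative of λw² with respect to w is 2λw), so the push toward 0.0 weakens as a weight shrinks and never snaps it to exactly 0.0. The sketch below illustrates this with a synthetic dataset and a closed-form ridge (L2-regularized) fit in NumPy; the feature counts, noise levels, and regularization strength lam are illustrative assumptions, not values from the course.

    # Minimal sketch (illustrative assumptions): 10 informative and 90
    # non-informative features, values in [-1, 1], fit with L2 regularization.
    import numpy as np

    rng = np.random.default_rng(0)
    n_samples, n_features, n_informative = 1000, 100, 10

    X = rng.uniform(-1, 1, size=(n_samples, n_features))

    # Only the first 10 features actually influence the label.
    true_w = np.zeros(n_features)
    true_w[:n_informative] = rng.uniform(1.0, 2.0, size=n_informative)
    y = X @ true_w + rng.normal(scale=0.5, size=n_samples)

    # Closed-form ridge solution: w = (X^T X + lam * I)^-1 X^T y.
    lam = 10.0
    w = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

    print("mean |w|, informative:    ", np.abs(w[:n_informative]).mean())
    print("mean |w|, non-informative:", np.abs(w[n_informative:]).mean())
    print("non-informative weights exactly 0.0:", np.sum(w[n_informative:] == 0.0))

Running this typically shows the non-informative weights clustered near 0.0, yet none of them exactly 0.0, matching the answers above.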

L2 Regularization and Correlated Features

Explore the options below.

Imagine a linear model with two strongly correlated features; that is, these two features are nearly identical copies of one another, but one feature contains a small amount of random noise. If we train this model with L2 regularization, what will happen to the weights for these two features?

• Both features will have roughly equal, moderate weights.
  L2 regularization will force the features towards roughly equivalent weights that are approximately half of what they would have been had only one of the two features been in the model.

• One feature will have a large weight; the other will have a weight of almost 0.0.
  L2 regularization penalizes large weights more than small weights. So, even if one weight started to drop faster than the other, L2 regularization would tend to force the bigger weight to drop more quickly than the smaller weight, pushing the two weights back toward roughly equal values rather than toward one large and one near-zero weight.

• One feature will have a large weight; the other will have a weight of exactly 0.0.
  L2 regularization rarely forces weights to exactly 0.0. By contrast, L1 regularization (discussed later) does force weights to exactly 0.0.
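
As a rough check of the first answer, the sketch below (again NumPy, with illustrative values for the noise scales and lam) fits a ridge model once with a single copy of a feature and once with two nearly identical copies; the weight is split roughly in half between the two copies.

    # Minimal sketch (illustrative assumptions): two nearly identical features
    # under L2 regularization end up sharing the weight roughly equally.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 5000

    base = rng.uniform(-1, 1, size=n)
    x1 = base
    x2 = base + rng.normal(scale=0.01, size=n)   # near-duplicate with a little noise
    y = 2.0 * base + rng.normal(scale=0.1, size=n)

    def ridge(X, y, lam=10.0):
        # Closed-form ridge solution: w = (X^T X + lam * I)^-1 X^T y.
        return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

    print("one copy of the feature:", ridge(x1[:, None], y))                 # roughly [2.0]
    print("two correlated copies:  ", ridge(np.column_stack([x1, x2]), y))   # each roughly 1.0

With these settings the single-feature model learns a weight near 2.0, while the two-feature model learns two weights of roughly 1.0 each, consistent with the "roughly equal, moderate weights" answer.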