Regularization for Sparsity: Check Your Understanding

L1 Regularization

Explore the options below.

Imagine a linear model with 100 input features:
  • 10 are highly informative.
  • 90 are non-informative.
  • Assume that all features have values between -1 and 1.

Which of the following statements are true?
  • L1 regularization will encourage many of the non-informative weights to be nearly (but not exactly) 0.0. (False)

    In general, L1 regularization of sufficient lambda tends to encourage non-informative weights to become exactly 0.0. Unlike L2 regularization, L1 regularization "pushes" just as hard toward 0.0 no matter how far the weight is from 0.0. (See the sketch after this question.)

  • L1 regularization will encourage most of the non-informative weights to be exactly 0.0. (True)

    L1 regularization of sufficient lambda tends to encourage non-informative weights to become exactly 0.0. By doing so, these non-informative features leave the model.

  • L1 regularization may cause informative features to get a weight of exactly 0.0. (True)

    Be careful: L1 regularization may cause the following kinds of features to be given weights of exactly 0:
      • Weakly informative features.
      • Strongly informative features on different scales.
      • Informative features strongly correlated with other similarly informative features.
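
To make the difference concrete, here is a minimal Python sketch (not part of the course; the weights, lambda, and learning rate are illustrative assumptions). It applies one regularization update: the L2 pull toward 0.0 is proportional to the weight, so small weights merely get smaller, while the L1 soft-thresholding step applies a constant-size pull and snaps sufficiently small weights to exactly 0.0.

    # Minimal sketch (not from the course): how one regularization step moves each
    # weight under L2 vs. L1. The weights, lambda, and learning rate are made up.
    import numpy as np

    weights = np.array([0.8, 0.05, -0.03, 0.007, -0.9])
    lam = 0.01   # regularization strength (lambda)
    lr = 1.0     # learning rate, kept at 1.0 so the effect is easy to read

    # L2: the pull toward 0.0 is proportional to the weight (d/dw of lambda * w^2
    # is 2 * lambda * w), so small weights shrink but never land exactly on 0.0.
    l2_step = weights - lr * 2 * lam * weights

    # L1: the pull toward 0.0 has constant magnitude lambda (the subgradient of
    # lambda * |w| is +/- lambda), and the soft-thresholding update snaps any weight
    # within lr * lambda of 0.0 to exactly 0.0.
    l1_step = np.sign(weights) * np.maximum(np.abs(weights) - lr * lam, 0.0)

    print("After an L2 step:", l2_step)   # every weight is still nonzero
    print("After an L1 step:", l1_step)   # 0.007 becomes exactly 0.0

Running it, the L2 step leaves every weight nonzero, while the L1 step zeroes out any weight whose magnitude is below lr * lambda.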
L1 vs. L2 Regularization

Explore the options below.

Imagine a linear model with 100 input features, all having values between -1 and 1:
  • 10 are highly informative.
  • 90 are non-informative.

Which type of regularization will produce the smaller model?
  • L2 regularization. (Incorrect)

    L2 regularization rarely reduces the number of features. In other words, L2 regularization rarely reduces the model size.

  • L1 regularization. (Correct)

    L1 regularization tends to reduce the number of features. In other words, L1 regularization often reduces the model size, as the sketch below illustrates.
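
As a rough illustration, the sketch below uses scikit-learn's Lasso (L1) and Ridge (L2) as stand-ins (an assumption; the course does not prescribe this library). It fits both models to synthetic data with 100 features, only 10 of them informative, and counts how many weights survive.

    # Rough sketch: compare model size under L1 vs. L2 regularization using
    # scikit-learn's Lasso and Ridge (an assumed stand-in; the course does not
    # prescribe this library). 100 features, only 10 informative.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, Ridge

    X, y = make_regression(n_samples=1000, n_features=100, n_informative=10,
                           noise=5.0, random_state=0)
    X = X / np.abs(X).max(axis=0)   # scale each feature into [-1, 1]

    l1_model = Lasso(alpha=0.1).fit(X, y)   # L1 regularization
    l2_model = Ridge(alpha=0.1).fit(X, y)   # L2 regularization

    print("Nonzero weights with L1:", np.sum(l1_model.coef_ != 0))   # close to 10
    print("Nonzero weights with L2:", np.sum(l2_model.coef_ != 0))   # all 100

The L1-regularized model typically keeps only a handful of nonzero weights, while the L2-regularized model keeps all 100, which is why L1 regularization produces the smaller model.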