L1 Regularization
Explore the options below.
Imagine a linear model with 100 input features:
10 are highly informative.
90 are non-informative.
Assume that all features have values between -1 and 1.
Which of the following statements are true?
L1 regularization will encourage many of the non-informative weights
to be nearly (but not exactly) 0.0.
In general, L1 regularization with a sufficiently large lambda tends
to drive the weights of non-informative features to exactly 0.0.
Unlike L2 regularization, L1 regularization "pushes" just as hard
toward 0.0 no matter how far the weight is from 0.0.
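The difference in "push" can be sketched with penalty-only updates that ignore the data loss entirely. This is a minimal illustration, not a full training loop: the function names, the learning rate, and the rule that clips an L1 step at 0.0 so it cannot overshoot are all assumptions made for the example.

```python
def l1_step(w, lam, lr):
    """One penalty-only step for lam * |w|. The step size is constant,
    and it is clipped so the weight lands on 0.0 instead of crossing it."""
    step = lr * lam
    if abs(w) <= step:
        return 0.0
    return w - step if w > 0 else w + step


def l2_step(w, lam, lr):
    """One penalty-only step for lam * w**2. The pull is proportional
    to w, so it weakens as w approaches 0.0."""
    return w - lr * 2.0 * lam * w


w_l1 = w_l2 = 1.0
for _ in range(200):
    w_l1 = l1_step(w_l1, lam=0.1, lr=0.1)
    w_l2 = l2_step(w_l2, lam=0.1, lr=0.1)

print("L1 weight after 200 steps:", w_l1)
print("L2 weight after 200 steps:", w_l2)
```

Under these assumptions the L1 weight reaches exactly 0.0, while the L2 weight keeps shrinking by a constant factor and stays small but nonzero.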
L1 regularization will encourage most of the non-informative weights
to be exactly 0.0.
With a sufficiently large lambda, L1 regularization tends to drive
non-informative weights to exactly 0.0. A feature whose weight is 0.0
no longer contributes to predictions, so it effectively leaves the model.
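The scenario above can be tried on synthetic data. The following is a rough sketch, assuming a squared-error loss, a true weight of 2.0 for each informative feature, a small amount of Gaussian noise, and a plain coordinate-descent solver. None of these choices come from the original exercise, and the names (fit_l1, soft_threshold) are hypothetical.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward 0 by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1(X, y, lam, sweeps=30):
    """Coordinate descent for (1 / 2n) * ||y - Xw||^2 + lam * ||w||_1."""
    n = len(X)
    cols = list(zip(*X))          # one tuple of values per feature
    w = [0.0] * len(cols)
    residual = list(y)            # equals y - Xw, and w starts at 0
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            # Correlation of feature j with the residual computed without it.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            scale = sum(c * c for c in col) / n
            new_wj = soft_threshold(rho, lam) / scale
            delta = w[j] - new_wj
            if delta != 0.0:
                residual = [r + c * delta for c, r in zip(col, residual)]
            w[j] = new_wj
    return w


random.seed(0)
n_samples, n_features, n_informative = 400, 100, 10
# All feature values lie between -1 and 1, as in the exercise.
X = [[random.uniform(-1.0, 1.0) for _ in range(n_features)]
     for _ in range(n_samples)]
# Assumed ground truth: the first 10 features matter, the other 90 do not.
true_w = [2.0] * n_informative + [0.0] * (n_features - n_informative)
y = [sum(w * x for w, x in zip(true_w, row)) + random.gauss(0.0, 0.1)
     for row in X]

weights = fit_l1(X, y, lam=0.1)
zero_noise = sum(1 for w in weights[n_informative:] if w == 0.0)
kept_informative = sum(1 for w in weights[:n_informative] if w != 0.0)
print("non-informative weights at exactly 0.0:", zero_noise, "of 90")
print("informative weights kept:", kept_informative, "of 10")
```

The zeros here come from the soft-thresholding step, which returns exactly 0.0 whenever a feature's correlation with the residual falls below lambda.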
L1 regularization may cause informative features to get a
weight of exactly 0.0.
Be careful: L1 regularization may cause the following kinds of
features to be given weights of exactly 0.0:
Weakly informative features.
Strongly informative features on different scales.
Informative features strongly correlated with other
similarly informative features.
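The scale case in the list above can be illustrated with a single feature, where the L1-regularized least-squares weight has a closed form. This is only a sketch under stated assumptions: no bias term, a squared-error loss averaged over examples, and made-up data in which the label is exactly 3 times the feature. The helper names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward 0 by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_single_feature_l1(xs, ys, lam):
    """Minimize (1 / 2n) * sum((y - w * x)^2) + lam * |w| for one feature."""
    n = len(xs)
    rho = sum(x * y for x, y in zip(xs, ys)) / n    # feature-label correlation
    scale = sum(x * x for x in xs) / n              # mean squared feature value
    return soft_threshold(rho, lam) / scale


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [3.0 * x for x in xs]            # the feature fully determines the label
lam = 0.1

w_unit = fit_single_feature_l1(xs, ys, lam)

# Same information, but the feature is recorded on a 100x smaller scale.
xs_small = [0.01 * x for x in xs]
w_small = fit_single_feature_l1(xs_small, ys, lam)

print("weight at original scale:", w_unit)
print("weight at 0.01x scale:", w_small)
```

The rescaled feature would need a weight 100 times larger to make the same predictions, so the L1 penalty outweighs the loss reduction and the weight is set to exactly 0.0.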
L1 vs. L2 Regularization
Explore the options below.
Imagine a linear model with 100 input features, all having values
between -1 and 1:
10 are highly informative.
90 are non-informative.
Which type of regularization will produce the smaller model?
L2 regularization.
L2 regularization shrinks weights toward 0.0 but rarely makes them
exactly 0.0, so it rarely reduces the number of features.
In other words, L2 regularization rarely reduces the
model size.
L1 regularization.
L1 regularization tends to reduce the number of
features. In other words, L1 regularization often
reduces the model size.
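To make the size comparison concrete, here is a small sketch under a simplifying assumption that is not part of the exercise: the features are orthonormal, so each weight can be regularized independently of the others. In that setting L2 rescales each unregularized weight and L1 soft-thresholds it. The starting weights below are made-up values, not results from a trained model.

```python
def l2_shrink(w, lam):
    """Minimizer of 0.5 * (v - w)^2 + 0.5 * lam * v^2: a uniform rescaling."""
    return w / (1.0 + lam)


def l1_shrink(w, lam):
    """Minimizer of 0.5 * (v - w)^2 + lam * |v|: soft-thresholding."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


# Hypothetical unregularized weights: 10 large ones for the informative
# features and 90 small ones (pure noise fit) for the rest.
unregularized = [1.5] * 10 + [0.03 * (-1) ** i for i in range(90)]
lam = 0.1

l2_weights = [l2_shrink(w, lam) for w in unregularized]
l1_weights = [l1_shrink(w, lam) for w in unregularized]

l2_nonzero = sum(1 for w in l2_weights if w != 0.0)
l1_nonzero = sum(1 for w in l1_weights if w != 0.0)
print("features kept with L2:", l2_nonzero)
print("features kept with L1:", l1_nonzero)
```

With these assumed values, L2 keeps all 100 features with smaller weights, while L1 keeps only the 10 whose unregularized weights exceed lambda.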