Examining L2 regularization
This exercise uses a small, noisy training data set. In this kind of setting, overfitting is a real concern. Fortunately, L2 regularization can help.
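As a quick refresher, the regularization rate scales an L2 penalty that is added to the ordinary training loss. Here's a minimal sketch of the regularized objective, with lambda standing in for the rate you'll adjust in this exercise:

```latex
\text{minimize: } \; \mathrm{Loss}(\text{Data} \mid \text{Model}) \; + \; \lambda \sum_{i} w_i^2
```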
This exercise consists of three related tasks. To simplify comparisons
across the three tasks, run each task in a separate tab.
Task 1: Run the model as given for at least 500 epochs. Note
the following:
Test loss.
The delta between Test loss and Training loss.
The learned weights of the features and the feature crosses.
(The relative thickness of each line running from FEATURES to OUTPUT
represents the learned weight for that feature or feature cross.
You can find the exact weight values by hovering over
each line.)
Task 2: (Consider doing this task in a separate tab.) Increase the
regularization rate from 0 to 0.3. Then, run the
model for at least 500 epochs and find answers to the following questions:
How does the Test loss in Task 2 differ from the Test loss in Task
1?
How does the delta between Test loss and Training loss in Task 2
differ from that of Task 1?
How do the learned weights of each feature and feature cross in Task 2
differ from those in Task 1?
What do your results say about model complexity?
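If you'd like to reproduce the flavor of Task 2's comparison outside the Playground, here's a hedged sketch using scikit-learn's Ridge regression on a small, noisy synthetic data set. Ridge is L2-regularized linear regression; its `alpha` parameter plays the role of the regularization rate, and the data set here is invented purely for illustration (it is not the Playground's data):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Small, noisy synthetic data set -- a stand-in for the Playground's data.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 15))
true_w = np.zeros(15)
true_w[:3] = [1.5, -2.0, 0.5]           # only a few features actually matter
y = X @ true_w + rng.normal(scale=1.0, size=40)  # heavy noise -> overfitting risk

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

for rate in (0.0, 0.3):
    # In scikit-learn's Ridge, `alpha` is the L2 regularization rate.
    model = Ridge(alpha=rate).fit(X_train, y_train)
    train_loss = np.mean((model.predict(X_train) - y_train) ** 2)
    test_loss = np.mean((model.predict(X_test) - y_test) ** 2)
    print(f"rate={rate}: training loss={train_loss:.3f}, "
          f"test loss={test_loss:.3f}, delta={test_loss - train_loss:.3f}")
```

Because the data are random, your exact numbers will differ from run to run, just as they do in the Playground.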
Task 3: Experiment with the regularization rate, trying to find the
optimal value.
(Answers appear just below the exercise.)
Increasing the regularization rate from 0 to 0.3 produces the following
effects:
Test loss drops significantly.
Note: While test loss decreases, training loss actually
increases. This is expected, because you've added another
term to the loss function to penalize complexity; the short worked sketch
after this list makes that arithmetic concrete. Ultimately, all that
matters is test loss, as that's the true measure of the model's ability to
make good predictions on new data.
The delta between Test loss and Training loss drops significantly.
The weights of the features and some of the feature crosses have lower
absolute values, which implies that model complexity drops.
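Here is that worked sketch. The weight values and the data-fit loss are made up purely for illustration; the point is only to show how the penalty term enters the training loss:

```python
# Hypothetical learned weights and data-fit loss, invented for illustration.
weights = [0.8, -1.2, 0.3]
data_loss = 0.25  # loss from the data-fit term alone

for rate in (0.0, 0.3):
    # The L2 penalty is the regularization rate times the sum of squared weights.
    l2_penalty = rate * sum(w ** 2 for w in weights)
    total_training_loss = data_loss + l2_penalty
    print(f"rate={rate}: L2 penalty={l2_penalty:.3f}, "
          f"training loss={total_training_loss:.3f}")
```

In the real model the optimizer also shrinks the weights in response to the penalty, which worsens the data-fit term somewhat, so the total training loss still tends to end up higher than with no regularization.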
Given the randomness in the data set, we can't say which
regularization rate produced the best results for you.
For us, a regularization rate of either 0.3 or 1 generally produced
the lowest Test loss.
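If you'd rather search systematically than by trial and error, a simple sweep over candidate rates works. The sketch below reuses the same kind of synthetic Ridge setup as the earlier example; the candidate rates roughly mirror the Playground's regularization-rate menu, and the "best" rate you find will vary with the random data:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic, noisy data set (for illustration only, not the Playground's data).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 15))
true_w = np.zeros(15)
true_w[:3] = [1.5, -2.0, 0.5]
y = X @ true_w + rng.normal(scale=1.0, size=40)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1)

# Candidate rates, roughly matching the Playground's dropdown.
candidates = [0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]

test_losses = {}
for rate in candidates:
    model = Ridge(alpha=rate).fit(X_train, y_train)
    test_losses[rate] = np.mean((model.predict(X_test) - y_test) ** 2)

for rate, loss in test_losses.items():
    print(f"rate={rate}: test loss={loss:.3f}")
best_rate = min(test_losses, key=test_losses.get)
print(f"Lowest test loss at rate={best_rate}")
```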
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2024-08-21 UTC."],[[["This exercise explores the impact of *L~2~* regularization on model performance, particularly in addressing overfitting with a small, noisy dataset."],["The tasks involve observing the effects of varying regularization rates on test loss, training loss, and learned feature weights."],["Increasing the regularization rate generally decreases test loss and model complexity, but might increase training loss."],["Experimentation is encouraged to identify the optimal regularization rate for achieving the lowest test loss on unseen data."]]],[]]