Examining L2 regularization
This exercise uses a small, noisy training data set. In this kind of setting, overfitting is a real concern. Fortunately, L2 regularization can help.
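As a quick refresher, the regularization rate scales an L2 penalty that is added to the ordinary training loss. Here's a minimal sketch of the regularized objective, with lambda standing in for the rate you'll adjust in this exercise:

```latex
\text{minimize: } \; \mathrm{Loss}(\text{Data} \mid \text{Model}) \; + \; \lambda \sum_{i} w_i^2
```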
This exercise consists of three related tasks. To simplify comparisons
across the three tasks, run each task in a separate tab.
Task 1: Run the model as given for at least 500 epochs. Note
the following:
Test loss.
The delta between Test loss and Training loss.
The learned weights of the features and the feature crosses.
(The relative thickness of each line running from FEATURES to OUTPUT
represents the learned weight for that feature or feature cross.
You can find the exact weight values by hovering over
each line.)
Task 2: (Consider doing this task in a separate tab.) Increase the
regularization rate from 0 to 0.3. Then, run the
model for at least 500 epochs and find answers to the following questions:
How does the Test loss in Task 2 differ from the Test loss in Task
1?
How does the delta between Test loss and Training loss in Task 2
differ from that of Task 1?
How do the learned weights of each feature and feature cross in Task 2
differ from those in Task 1?
What do your results say about model complexity?
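If you'd like to reproduce the flavor of Task 2's comparison outside the Playground, here's a hedged sketch using scikit-learn's Ridge regression on a small, noisy synthetic data set. Ridge is L2-regularized linear regression; its `alpha` parameter plays the role of the regularization rate, and the data set here is invented purely for illustration (it is not the Playground's data):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Small, noisy synthetic data set -- a stand-in for the Playground's data.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 15))
true_w = np.zeros(15)
true_w[:3] = [1.5, -2.0, 0.5]           # only a few features actually matter
y = X @ true_w + rng.normal(scale=1.0, size=40)  # heavy noise -> overfitting risk

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

for rate in (0.0, 0.3):
    # In scikit-learn's Ridge, `alpha` is the L2 regularization rate.
    model = Ridge(alpha=rate).fit(X_train, y_train)
    train_loss = np.mean((model.predict(X_train) - y_train) ** 2)
    test_loss = np.mean((model.predict(X_test) - y_test) ** 2)
    print(f"rate={rate}: training loss={train_loss:.3f}, "
          f"test loss={test_loss:.3f}, delta={test_loss - train_loss:.3f}")
```

Because the data are random, your exact numbers will differ from run to run, just as they do in the Playground.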
Task 3: Experiment with the regularization rate, trying to find the
optimal value.
(Answers appear just below the exercise.)
Increasing the regularization rate from 0 to 0.3 produces the following
effects:
Test loss drops significantly.
Note: While test loss decreases, training loss actually
increases. This is expected, because you've added another
term to the loss function to penalize complexity; the short worked sketch
after this list makes that arithmetic concrete. Ultimately, all that
matters is test loss, as that's the true measure of the model's ability to
make good predictions on new data.
The delta between Test loss and Training loss drops significantly.
The weights of the features and some of the feature crosses have lower
absolute values, which implies that model complexity drops.
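Here is that worked sketch. The weight values and the data-fit loss are made up purely for illustration; the point is only to show how the penalty term enters the training loss:

```python
# Hypothetical learned weights and data-fit loss, invented for illustration.
weights = [0.8, -1.2, 0.3]
data_loss = 0.25  # loss from the data-fit term alone

for rate in (0.0, 0.3):
    # The L2 penalty is the regularization rate times the sum of squared weights.
    l2_penalty = rate * sum(w ** 2 for w in weights)
    total_training_loss = data_loss + l2_penalty
    print(f"rate={rate}: L2 penalty={l2_penalty:.3f}, "
          f"training loss={total_training_loss:.3f}")
```

In the real model the optimizer also shrinks the weights in response to the penalty, which worsens the data-fit term somewhat, so the total training loss still tends to end up higher than with no regularization.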
Given the randomness in the data set, we can't say which
regularization rate produced the best results for you.
For us, a regularization rate of either 0.3 or 1 generally produced
the lowest Test loss.
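If you'd rather search systematically than by trial and error, a simple sweep over candidate rates works. The sketch below reuses the same kind of synthetic Ridge setup as the earlier example; the candidate rates roughly mirror the Playground's regularization-rate menu, and the "best" rate you find will vary with the random data:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic, noisy data set (for illustration only, not the Playground's data).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 15))
true_w = np.zeros(15)
true_w[:3] = [1.5, -2.0, 0.5]
y = X @ true_w + rng.normal(scale=1.0, size=40)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1)

# Candidate rates, roughly matching the Playground's dropdown.
candidates = [0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]

test_losses = {}
for rate in candidates:
    model = Ridge(alpha=rate).fit(X_train, y_train)
    test_losses[rate] = np.mean((model.predict(X_test) - y_test) ** 2)

for rate, loss in test_losses.items():
    print(f"rate={rate}: test loss={loss:.3f}")
best_rate = min(test_losses, key=test_losses.get)
print(f"Lowest test loss at rate={best_rate}")
```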
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2024-08-21 UTC."],[[["This exercise explores the impact of *L~2~* regularization on model performance, particularly in addressing overfitting with a small, noisy dataset."],["The tasks involve observing the effects of varying regularization rates on test loss, training loss, and learned feature weights."],["Increasing the regularization rate generally decreases test loss and model complexity, but might increase training loss."],["Experimentation is encouraged to identify the optimal regularization rate for achieving the lowest test loss on unseen data."]]],[]]