Regularization for Simplicity

Regularization means penalizing the complexity of a model to reduce overfitting.


The loss on the training set gradually declines. By contrast, the loss on the validation set declines at first and then starts to rise, a sign that the model is overfitting the training data.
  • We want to avoid model complexity where possible.
  • We can bake this idea into the optimization we do at training time.
  • Empirical Risk Minimization:
    • aims for low training error
    • $$ \text{minimize: } Loss(Data\;|\;Model) $$

  • Structural Risk Minimization:
    • aims for low training error
    • while balancing against complexity
    • $$ \text{minimize: } Loss(Data\;|\;Model) + complexity(Model) $$
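As a rough illustration of the two objectives, here is a minimal sketch in Python. It is not from the course material: it assumes a linear model with mean squared error standing in for $Loss(Data\;|\;Model)$, and leaves $complexity(Model)$ as a function to be supplied. The names `data_loss`, `empirical_risk`, and `structural_risk` are hypothetical.

```python
import numpy as np

def data_loss(weights, features, labels):
    # Loss(Data | Model): here assumed to be the mean squared error
    # of a linear model with the given weights.
    predictions = features @ weights
    return np.mean((predictions - labels) ** 2)

def empirical_risk(weights, features, labels):
    # Empirical Risk Minimization: training loss only.
    return data_loss(weights, features, labels)

def structural_risk(weights, features, labels, complexity):
    # Structural Risk Minimization: training loss plus a
    # model-complexity penalty supplied by the caller.
    return data_loss(weights, features, labels) + complexity(weights)
```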

  • How to define complexity(Model)?
  • Prefer smaller weights
  • Diverging from this should incur a cost
  • Can encode this idea via L2 regularization (a.k.a. ridge)
    • complexity(model) = sum of the squares of the weights
    • Penalizes really big weights
    • For linear models: prefers flatter slopes
    • Bayesian prior:
      • weights should be centered around zero
      • weights should be normally distributed

$$ Loss(Data\;|\;Model) + \lambda \left(w_1^2 + \ldots + w_n^2 \right) $$

\(\text{Where:}\)

  • \(Loss\): aims for low training error
  • \(\lambda\): a scalar value that controls how the regularization term is balanced against the data loss
  • \(w_1^2+\ldots+w_n^2\): the square of the \(L_2\) norm
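Below is a minimal sketch of this objective, again assuming mean squared error for the data loss; the names `l2_penalty`, `l2_regularized_loss`, and `lam` are illustrative, not from the text.

```python
import numpy as np

def l2_penalty(weights):
    # complexity(model): sum of the squares of the weights (squared L2 norm).
    return np.sum(weights ** 2)

def l2_regularized_loss(weights, features, labels, lam):
    # Loss(Data | Model) + lambda * (w_1^2 + ... + w_n^2)
    predictions = features @ weights
    mse = np.mean((predictions - labels) ** 2)  # data loss, assumed to be MSE
    return mse + lam * l2_penalty(weights)

# One very large weight is penalized far more heavily than several small,
# evenly spread weights, which is why L2 regularization prefers flatter slopes.
print(l2_penalty(np.array([4.0, 0.1, 0.1])))  # about 16.02
print(l2_penalty(np.array([1.0, 1.0, 1.0])))  # 3.0
```

Raising `lam` pushes the optimizer toward smaller weights at the expense of training error; setting it to zero recovers plain empirical risk minimization.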