Regularization for Simplicity

Regularization means penalizing the complexity of a model to reduce overfitting.

Generalization Curve

The loss on the training set gradually declines. By contrast, the loss on the validation set declines at first, but then starts to rise: a sign that the model has begun to overfit the training data.
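
As a rough illustration of how such a curve can be produced, the sketch below (with made-up data and a deliberately over-parameterized model, not anything from the course) tracks the loss on both sets at regular intervals during training:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical regression task: noisy data split into training and validation sets.
x = rng.uniform(-1.0, 1.0, size=60)
y = np.sin(3.0 * x) + rng.normal(scale=0.3, size=x.shape)
x_train, y_train, x_val, y_val = x[:40], y[:40], x[40:], y[40:]

# An over-parameterized polynomial feature map makes overfitting easy to observe.
def features(x, degree=9):
    return np.vander(x, degree + 1, increasing=True)

X_train, X_val = features(x_train), features(x_val)
w = np.zeros(X_train.shape[1])

def mse(X, y, w):
    return np.mean((X @ w - y) ** 2)

# Track both losses during training; plotting them against the step number
# traces out a generalization curve like the one described above.
for step in range(5001):
    grad = 2.0 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= 0.1 * grad
    if step % 1000 == 0:
        print(step, mse(X_train, y_train, w), mse(X_val, y_val, w))
```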

Penalizing Model Complexity

  • We want to avoid model complexity where possible.
  • We can bake this idea into the optimization we do at training time.
  • Empirical Risk Minimization:
    • aims for low training error
    • $$ \text{minimize: } Loss(Data\;|\;Model) $$
  • Structural Risk Minimization:
    • aims for low training error
    • while balancing against complexity, as sketched below
    • $$ \text{minimize: } Loss(Data\;|\;Model) + complexity(Model) $$
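
A minimal sketch of the two objectives, assuming a squared-error loss and treating the complexity term as a placeholder (both are illustrative choices, not prescribed above):

```python
import numpy as np

def data_loss(X, y, w):
    # Loss(Data | Model): a squared-error loss, purely for illustration.
    return np.mean((X @ w - y) ** 2)

def complexity(w):
    # complexity(Model): some measure of model complexity; the next section
    # defines it as the sum of the squared weights (L2 regularization).
    return np.sum(w ** 2)

def empirical_risk(X, y, w):
    # Empirical Risk Minimization: minimize the training loss alone.
    return data_loss(X, y, w)

def structural_risk(X, y, w):
    # Structural Risk Minimization: training loss plus a complexity penalty.
    return data_loss(X, y, w) + complexity(w)
```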

Regularization

  • How to define complexity(Model)?
  • Prefer smaller weights
  • Diverging from this should incur a cost
  • Can encode this idea via L2 regularization (a.k.a. ridge)
    • complexity(Model) = sum of the squares of the weights (sketched below)
    • Penalizes very large weights
    • For linear models: prefers flatter slopes
    • Bayesian prior:
      • weights should be centered around zero
      • weights should be normally distributed
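
A one-function sketch of this complexity measure (the example weights are made up):

```python
import numpy as np

def l2_complexity(weights):
    # complexity(Model) under L2 regularization: the sum of the squared weights.
    return np.sum(np.square(weights))

# Squaring means large weights dominate the penalty: a single weight of 5
# contributes 25, while five weights of 1 together contribute only 5.
print(l2_complexity(np.array([0.5, -0.3, 5.0])))  # 0.25 + 0.09 + 25.0 = 25.34
```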

A Loss Function with L2 Regularization

$$ Loss(Data|Model) + \lambda \left(w_1^2 + \ldots + w_n^2 \right) $$

Where:

  • \(Loss\): aims for low training error
  • \(\lambda\): scalar value (the regularization rate) that controls how the data loss and the weight penalty are balanced
  • \(w_1^2+\ldots+w_n^2\): the square of the \(L_2\) norm of the weight vector
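
Putting the pieces together, here is a minimal sketch of the full objective, again assuming a squared-error data loss; the function name and signature are illustrative:

```python
import numpy as np

def l2_regularized_loss(X, y, w, lam):
    """Loss(Data | Model) + lambda * (w_1^2 + ... + w_n^2).

    `lam` is the regularization rate: with lam = 0 this reduces to plain
    empirical risk, while larger values push the trained weights toward zero.
    """
    data_loss = np.mean((X @ w - y) ** 2)   # aims for low training error
    l2_penalty = np.sum(np.square(w))       # square of the L2 norm of the weights
    return data_loss + lam * l2_penalty
```

During gradient descent the penalty contributes \(2\lambda w_i\) to each weight's gradient, so every step shrinks the weights toward zero in addition to fitting the data.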
